SGDClassifier Giving Different Accuracy Each Time for Text Classification
Solution 1:
This happens because your prepare_data() method shuffles the data randomly with this call:
random.shuffle(data)
The shuffle changes the order in which the estimator sees the training samples, and hence the results.
Try commenting out or removing that line while keeping random_state set in the SGDClassifier. You will then get exactly the same results each time.
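If you would rather keep the shuffle, a seeded shuffle is another way to make the order repeatable. This is only a minimal sketch using the standard library: shuffled_copy and the toy data list are hypothetical names for illustration, not part of the original prepare_data().

```python
import random


def shuffled_copy(items, seed=None):
    """Return a shuffled copy of items; a fixed seed makes the order repeatable."""
    # A private generator, so the global random module state is left untouched.
    rng = random.Random(seed)
    result = list(items)
    rng.shuffle(result)
    return result


data = list(range(10))  # stand-in for the real samples (assumption)
first = shuffled_copy(data, seed=42)
second = shuffled_copy(data, seed=42)
print(first == second)  # same seed, same order
```

With seed=None the generator is seeded from the operating system, so the order generally differs from run to run, which is the behaviour described above.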
Suggestion: Try different estimators to see which one performs best. If you are keen on using the SGDClassifier, I would recommend reading up on the n_iter parameter (replaced by max_iter in newer scikit-learn releases). Try changing it to a larger value, and you will see the run-to-run difference in accuracy become smaller and smaller (even with your shuffling of data).
You can look at this answer for more details on it:
Solution 2:
All classifiers which split or shuffle data (h/t to Vivek) have an optional random_state variable in the constructor with a default value of None. When the random_state is passed, it is checked by an internal check_random_state function.
From the documentation:
check_random_state: create a np.random.RandomState object from a parameter random_state. If random_state is None or np.random, then a randomly-initialized RandomState object is returned. If random_state is an integer, then it is used to seed a new RandomState object. If random_state is a RandomState object, then it is passed through.
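The three cases in that description can be checked directly. This is a small sketch, assuming scikit-learn and NumPy are installed; the variable names are illustrative only.

```python
import numpy as np
from sklearn.utils import check_random_state

# Integer: each call seeds a fresh RandomState, so the draws match.
rng_a = check_random_state(0)
rng_b = check_random_state(0)
same_draws = np.array_equal(rng_a.rand(3), rng_b.rand(3))

# Existing RandomState: returned unchanged (passed through).
existing = np.random.RandomState(1)
passed_through = check_random_state(existing) is existing

# None: a RandomState that was not seeded by you, so its draws are not reproducible.
unseeded = check_random_state(None)

print(same_draws, passed_through, type(unseeded).__name__)
```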
Because you're using the default None, you have some uncontrolled stochastic noise in your code.
Pass a seed for reproducibility.
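As a rough illustration of passing a seed, the sketch below fits SGDClassifier twice with the same random_state and compares the learned coefficients. The synthetic numeric data, the fit_coef helper, and the seed value 42 are assumptions made for the example; the original question used text features.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Synthetic two-class data standing in for the real features (assumption).
data_rng = np.random.RandomState(0)
X = data_rng.randn(200, 5)
y = (X[:, 0] + X[:, 1] > 0).astype(int)


def fit_coef(seed):
    """Fit an SGDClassifier with the given seed and return a copy of its weights."""
    clf = SGDClassifier(max_iter=1000, tol=1e-3, random_state=seed)
    clf.fit(X, y)
    return clf.coef_.copy()


coef_a = fit_coef(42)
coef_b = fit_coef(42)
identical = np.array_equal(coef_a, coef_b)
print(identical)  # same data order and same seed give the same weights
```

The data order must also be fixed for this to hold: if the samples are shuffled differently before each fit, the seed on the classifier alone is not enough.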