
SGDClassifier Giving Different Accuracy Each Time for Text Classification

I'm using an SVM classifier to classify text as either good text or gibberish. I'm using Python's scikit-learn and doing it as follows: ''' Created on May 5, 2017 ''' import re im

Solution 1:

This is because in your prepare_data() method, you are randomly shuffling the data. This is what you are doing:

random.shuffle(data)

Since the shuffle is unseeded, the estimator is trained on a different split every run, and the results change accordingly.

Try commenting out or removing that line (or seeding it), and also set random_state in the SGDClassifier. You will then get exactly the same results each time.
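A minimal sketch of that fix, assuming the question's prepare_data() shuffles a list called data; the loss setting and the seed value are arbitrary choices for illustration, not taken from the question:

```python
import random
from sklearn.linear_model import SGDClassifier

SEED = 42  # any fixed integer works; 42 is an arbitrary choice

def prepare_data(data):
    # Seeding the shuffle (or removing it entirely) makes the
    # train/test ordering the same on every run.
    random.seed(SEED)
    random.shuffle(data)
    return data

# SGDClassifier also shuffles samples between epochs by default,
# so its own random_state must be fixed as well.
clf = SGDClassifier(loss="hinge", random_state=SEED)
```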

Suggestion: Try different estimators to see which one performs best. If you are keen on using the SGDClassifier, then I would recommend reading up on the n_iter parameter (renamed max_iter in scikit-learn 0.19 and later). Try increasing it, and you will see that the run-to-run difference in accuracy becomes smaller and smaller (even with your shuffling of the data).
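As a rough illustration of that effect on synthetic data (not the asker's text corpus), the spread of accuracies across differently shuffled runs tends to shrink as the number of epochs grows; the dataset sizes and seeds below are assumptions for the demo:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=50, random_state=0)

def accuracy_spread(n_epochs, n_runs=10):
    scores = []
    for run in range(n_runs):
        # Different shuffle each run, mimicking the unseeded random.shuffle(data).
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.3, random_state=run)
        clf = SGDClassifier(loss="hinge", max_iter=n_epochs, tol=None,
                            random_state=run)
        scores.append(clf.fit(X_tr, y_tr).score(X_te, y_te))
    return np.ptp(scores)  # max - min accuracy across the runs

print("spread with 5 epochs:   ", accuracy_spread(5))
print("spread with 1000 epochs:", accuracy_spread(1000))
```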

You can look at this answer for more details on it.

Solution 2:

All classifiers that split or shuffle data (h/t to Vivek) take an optional random_state parameter in their constructor, with a default value of None. Whatever value is passed is processed by the internal check_random_state function.

From the documentation:

check_random_state: turn the random_state parameter into a np.random.RandomState object. If random_state is None or np.random, the global RandomState instance used by np.random is returned. If random_state is an integer, it is used to seed a new RandomState object. If random_state is already a RandomState object, it is passed through unchanged.
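A quick sketch of those three cases using sklearn.utils.check_random_state; the seed value 42 is arbitrary:

```python
import numpy as np
from sklearn.utils import check_random_state

rs_none = check_random_state(None)                       # reuses the global np.random generator
rs_int  = check_random_state(42)                         # fresh RandomState seeded with 42
rs_obj  = check_random_state(np.random.RandomState(42))  # existing object passed through

print(isinstance(rs_none, np.random.RandomState))   # True: the shared global state
print(rs_int.randint(100) == rs_obj.randint(100))   # True: both were seeded with 42
```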

Because you're using the default None, you have some uncontrolled stochastic noise in your code.

Pass a seed for reproducibility.
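A minimal end-to-end sketch of that advice; the placeholder texts, the TF-IDF pipeline, and the seed value are assumptions standing in for the asker's good-text/gibberish setup:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

SEED = 0  # any fixed integer

# Placeholder data standing in for the asker's corpus.
texts = ["this is a sentence", "another normal sentence", "reads quite well",
         "plain english text", "asdf qwer zxcv", "qqq www eee",
         "zzkx jqpf vbnm", "xkcd zzzz qqqq"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Seed both the split and the classifier so every run is identical.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=SEED, stratify=labels)

model = make_pipeline(TfidfVectorizer(), SGDClassifier(random_state=SEED))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # the same number on every run
```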
