Predict Training Data In Sklearn

February 28, 2024

I use scikit-learn's SVM like so:

clf = svm.SVC()
clf.fit(td_X, td_y)

My question is: when I use the classifier to predict the class of a member of the training set, could the cla

Solution 1:

Yes, definitely. Run this code, for example:

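from sklearn import svm
import numpy as np

clf = svm.SVC()  # defaults: kernel='rbf', C=1.0
np.random.seed(seed=42)
# 100 random 2-D points with random 0/1 labels
x = np.random.normal(loc=0.0, scale=1.0, size=[100, 2])
y = np.random.randint(2, size=100)
clf.fit(x, y)
# accuracy measured on the same data the model was trained on
print(clf.score(x, y))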

The score is 0.61, so nearly 40% of the training data is misclassified. Part of the reason is that, even though the default kernel is 'rbf' (which in theory can fit any training set perfectly, as long as no two identical training points have different labels), SVC also applies regularization to reduce overfitting. The strength of that regularization is set by the parameter C, which defaults to C=1.0: smaller values of C regularize more strongly, while larger values penalize misclassified training points more heavily.
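The caveat about identical points can be checked directly. This is a minimal sketch, assuming a tiny hand-made dataset (the arrays below are purely illustrative): two identical inputs carry different labels, so the classifier must give them the same prediction and at least one of them is misclassified, no matter how large C is.

```python
import numpy as np
from sklearn import svm

# Illustrative data: rows 0 and 1 are the same point with different labels.
X = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
y = np.array([0, 1, 0, 1])

# A very large C makes training errors expensive, but cannot resolve the conflict.
clf = svm.SVC(kernel="rbf", C=1e6)
clf.fit(X, y)

pred = clf.predict(X)
print("predictions:", pred)
# At most 3 of the 4 points can be classified correctly.
print("training accuracy:", clf.score(X, y))
```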

If you run the same code as above but change clf = svm.SVC() to clf = svm.SVC(C=200000), the training accuracy rises to 0.94.
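To see how C affects predictions on the training set, the experiment above can be repeated over several values. This is a minimal sketch reusing the same random data; the list of C values is an arbitrary choice for illustration, and exact accuracies may differ across scikit-learn versions. It also checks that `clf.score(x, y)` equals the fraction of training points for which `clf.predict(x)` returns the true label.

```python
import numpy as np
from sklearn import svm

# Same setup as above: random features and random labels, so there is no real pattern.
np.random.seed(seed=42)
x = np.random.normal(loc=0.0, scale=1.0, size=[100, 2])
y = np.random.randint(2, size=100)

results = {}
for C in [0.1, 1.0, 100.0, 200000.0]:  # arbitrary sweep; smaller C = stronger regularization
    clf = svm.SVC(C=C)
    clf.fit(x, y)
    pred = clf.predict(x)
    n_wrong = int(np.sum(pred != y))  # training points the fitted model gets wrong
    results[C] = clf.score(x, y)
    print(f"C={C:>9}: training accuracy={results[C]:.2f}, misclassified={n_wrong}")
    # score() on the training set is the accuracy of predict() on the training set
    assert np.isclose(results[C], np.mean(pred == y))
```

Because the labels here are random, a high training accuracy at large C reflects memorization of the training points rather than a learned pattern; accuracy on fresh data would not be expected to improve.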
