Using sample_weight in GridSearchCV

March 27, 2024

Is it possible to perform a GridSearchCV (to find the best C for an SVM) and still specify sample_weight with scikit-learn? Here's my code and the error I'm faced with: gs = GridSea

Solution 1:

Just trying to close out this long hanging question...

You needed to upgrade to the latest version of scikit-learn and use the following:

gs.fit(Xtrain, ytrain, fit_params={'sample_weight': sw_train})

However, it is more in line with the documentation to pass fit_params to the constructor:

gs = GridSearchCV(svm.SVC(C=1), [{'kernel': ['linear'], 'C': [.1, 1, 10], 'probability': [True]}], fit_params={'sample_weight': sw_train})

gs.fit(Xtrain, ytrain)

Solution 2:

The previous answers are now obsolete. The dictionary fit_params should be passed to the fit method.

From the documentation for GridSearchCV:

fit_params : dict, optional
Parameters to pass to the fit method.
Deprecated since version 0.19: fit_params as a constructor argument was deprecated in version 0.19 and will be removed in version 0.21. Pass fit parameters to the fit method instead.

Solution 3:

In version 0.16.1, if you use Pipeline, you need to pass the param to GridSearchCV constructor:

clf = pipeline.Pipeline([('svm', svm_model)])
model = grid_search.GridSearchCV(estimator = clf, param_grid=param_grid,
    fit_params={'svm__sample_weight': sw_train})
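For newer releases, where the constructor argument no longer exists (see the deprecation note quoted in Solution 2), the same step-prefixed key can be passed to fit instead. This is only a minimal sketch: the synthetic data, the weights in sw_train, and the grid values are assumptions made for illustration, and it assumes metadata routing is left at its default (disabled) setting.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Synthetic data stands in for the real training set (assumption).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Hypothetical per-sample weights: class 1 counts twice as much as class 0.
sw_train = np.where(y == 1, 2.0, 1.0)

clf = Pipeline([("svm", SVC(kernel="linear"))])
param_grid = {"svm__C": [0.1, 1, 10]}

model = GridSearchCV(estimator=clf, param_grid=param_grid, cv=3)

# The "<step name>__<fit parameter>" key routes the weights to the
# fit method of the pipeline step named "svm".
model.fit(X, y, svm__sample_weight=sw_train)

print(model.best_params_)
```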

Solution 4:

The following works in scikit-learn 0.23.1:

grid_cv = GridSearchCV(clf, param_grid=param_grid,
                       scoring='recall', n_jobs=-1, cv=10)

grid_cv.fit(x_train_orig, y=y_train_orig,
            sample_weight=my_sample_weights)
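For anyone who wants to try this end to end, here is a self-contained version of the same call. It is a sketch under assumptions: the imbalanced synthetic data, the weight values, and the small grid are made up for illustration, and cv is reduced to keep it quick.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Imbalanced synthetic data stands in for x_train_orig / y_train_orig (assumption).
x_train_orig, y_train_orig = make_classification(
    n_samples=200, n_features=5, weights=[0.8, 0.2], random_state=0
)

# Hypothetical weights that up-weight the minority class.
my_sample_weights = np.where(y_train_orig == 1, 4.0, 1.0)

clf = SVC(kernel="linear")
param_grid = {"C": [0.1, 1, 10]}

grid_cv = GridSearchCV(clf, param_grid=param_grid, scoring="recall", cv=3)

# sample_weight has one entry per training sample, so GridSearchCV
# slices it alongside X and y for every training fold.
grid_cv.fit(x_train_orig, y=y_train_orig, sample_weight=my_sample_weights)

print(grid_cv.best_params_)
```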

Solution 5:

OP's edit and the other answers are not entirely correct. While fit_params={'sample_weight': weights} works for fitting, those weights will not be used to compute the validation loss! (github issue).

Consequently, cross-validation reports an unweighted loss, so hyperparameter tuning can be steered in the wrong direction.

Here is my workaround for cross-validation with class weights, using accuracy as the metric. It should also work with other metrics.

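from sklearn.metrics import accuracy_score, make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.utils import compute_sample_weight


# make_scorer calls the metric as score_func(y_true, y_pred, **kwargs),
# so y_true has to be the first positional argument.
def weighted_accuracy_eval(y_true, y_pred, **kwargs):
    # Recompute balanced class weights from the labels of the validation fold.
    balanced_class_weights_eval = compute_sample_weight(
        class_weight='balanced',
        y=y_true
    )
    out = accuracy_score(y_true=y_true, y_pred=y_pred, sample_weight=balanced_class_weights_eval, **kwargs)
    return out


weighted_accuracy_eval_skl = make_scorer(weighted_accuracy_eval)

# model, paramGrid, X_train, y_train and fit_params are defined elsewhere.
gridsearch = GridSearchCV(
    estimator=model,
    scoring=weighted_accuracy_eval_skl,  # pass the scorer, not the raw metric function
    param_grid=paramGrid,
)

# fit_params is a dict such as {'sample_weight': weights}; unpack it into fit().
cv_result = gridsearch.fit(
    X_train,
    y_train,
    **fit_params
)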
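As a quick sanity check on the idea, accuracy computed with class-balanced sample weights comes out the same as scikit-learn's balanced_accuracy_score, as long as every class appears in y_true. The toy labels below are made up for illustration. If balanced weighting is all you need, the built-in scoring='balanced_accuracy' string may be a simpler alternative to a custom scorer.

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score
from sklearn.utils import compute_sample_weight

# Toy, imbalanced labels (hypothetical): four samples of class 0, two of class 1.
y_true = np.array([0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 0])

# Class-balanced weights, computed from the true labels.
weights = compute_sample_weight(class_weight="balanced", y=y_true)

plain = accuracy_score(y_true, y_pred)
weighted = accuracy_score(y_true, y_pred, sample_weight=weights)
balanced = balanced_accuracy_score(y_true, y_pred)

# The weighted score matches balanced accuracy and differs from plain accuracy.
print(plain, weighted, balanced)
```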
