
sample_weights array can't be used with GridSearchCV #2879


Closed
shenberg opened this issue Feb 20, 2014 · 21 comments · Fixed by #8278

Comments

@shenberg

The internal cross-validation isn't aware of sample weights, so an exception is thrown if a sample_weights sequence is passed to the grid search, because fit_grid_point does not split the weights into training and test sets.
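
For reference, a minimal sketch that reproduces the failure mode described above; the data, the SVC estimator and the fit_params route are placeholders, and the exact exception depends on the 2014-era scikit-learn version:

import numpy as np
from sklearn.svm import SVC
from sklearn.grid_search import GridSearchCV  # pre-0.18 module path

X = np.random.rand(100, 5)
y = np.random.randint(0, 2, 100)
w = np.random.rand(100)

# fit_grid_point hands each fold a subset of X and y but the full-length
# weight array, so the underlying estimator raises on the length mismatch.
search = GridSearchCV(SVC(), {'C': [1, 10]}, fit_params={'sample_weight': w})
search.fit(X, y)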

@ndawe
Member

ndawe commented Feb 20, 2014

I have added support for sample_weight in GridSearchCV in #1574. Hopefully I can get that merged soon.

@shenberg
Author

Thanks, that was extremely fast turn-around.

As a meta-question, is the sampling weighted? E.g., is it better for stratified folds to try to maintain the ratio of total weight per class instead of the number of samples?

@amueller
Member

sample_weight support was merged long ago

@jnothman
Member

@amueller, sample_weight is still not supported as an argument to fit in *SearchCV

@jnothman jnothman reopened this Oct 29, 2016
@amueller
Member

amueller commented Nov 1, 2016

There is fit_params

@stephen-hoover
Contributor

@amueller , I'm running into a similar issue. It's possible to set a sample_weights array as an instance attribute via the GridSearchCV.fit_params dictionary, and then everything works fine. But this fails in nested cross-validation with model_selection.cross_val_predict (see the sketch below), because the GridSearchCV.fit (and RandomizedSearchCV.fit) method doesn't accept fit parameters.

Is there a reason why the BaseSearchCV subclasses can't accept fit parameters? Having to set data-dependent fit parameters as instance attributes appears to contradict http://scikit-learn.org/stable/developers/contributing.html#fitting .

I would be willing to make that change (adding fit parameters) if you'd accept that PR.
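
For reference, a sketch of the nested-cross-validation failure described above, using the pre-0.19 fit_params constructor argument; the SVC estimator and the random X/y/w are placeholders, not the reporter's setup:

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, cross_val_predict

X = np.random.rand(120, 4)
y = np.random.randint(0, 2, 120)
w = np.random.rand(120)

inner = GridSearchCV(SVC(), {'C': [1, 10]}, cv=3,
                     fit_params={'sample_weight': w})  # full-length weights

# cross_val_predict slices X and y for each outer fold but knows nothing
# about w, so the inner fits see fold-sized data together with the
# full-length weight array and raise a length-mismatch error.
preds = cross_val_predict(inner, X, y, cv=3)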

@jnothman
Member

jnothman commented Feb 2, 2017

Yes, it's broken. One issue is that we need to be clear whether sample_weight is being passed only to scoring, only to fit, or both. Feel free to propose and champion a solution.

@ManasHardas

fit_params is deprecated since 0.19 and will be removed in 0.21
How else can "sample_weights" be made part of cross-validation?

@amueller
Member

@ManasHardas pass them to fit.
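
A sketch of what "pass them to fit" means here, assuming scikit-learn >= 0.19, where *SearchCV.fit forwards extra keyword arguments to the estimator's fit and slices sample-aligned arrays to each training fold; the estimator, grid and data below are placeholders:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X = np.random.rand(100, 4)
y = np.random.randint(0, 2, 100)
w = np.random.rand(100)

search = GridSearchCV(LogisticRegression(), {'C': [0.1, 1.0]}, cv=3)
# sample_weight is routed to LogisticRegression.fit, subset per training fold
search.fit(X, y, sample_weight=w)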

@farfan92

farfan92 commented Feb 5, 2018

@amueller I tried your solution with RandomizedSearchCV and a RandomForestClassifier.

Attempting to pass fit_params={'sample_weight': s_weights}
to the .fit method of RandomizedSearchCV results in

TypeError: fit() got an unexpected keyword argument 'fit_params'

But it will still run (with a deprecation warning) when the dictionary is passed to the constructor instead.
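
For what it's worth, the TypeError above likely arises because fit_params is not itself a keyword of RandomizedSearchCV.fit: any extra keyword is forwarded verbatim to the underlying estimator's fit, which has no fit_params argument. On 0.19+ the entries can instead be passed directly as keyword arguments; a sketch with placeholder data:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X = np.random.rand(100, 4)
y = np.random.randint(0, 2, 100)
s_weights = np.random.rand(100)

rs = RandomizedSearchCV(RandomForestClassifier(), {'n_estimators': [10, 50]},
                        n_iter=2, cv=3)
rs.fit(X, y, sample_weight=s_weights)  # direct keyword argument
# equivalently: rs.fit(X, y, **{'sample_weight': s_weights})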

@jnothman
Member

jnothman commented Feb 5, 2018 via email

@farfan92

farfan92 commented Feb 5, 2018

I see, I should have looked more closely at the docs.

@rishabhgit

Hi @amueller , @jnothman ,

Has this issue been fixed? I'm trying to run Randomized Search CV with sample_weights both as a scoring param and fit param. Here is a code snippet:

from sklearn.metrics import make_scorer, r2_score
from sklearn.model_selection import RandomizedSearchCV

# rf, param_grid, weights, X_train and y_train are defined elsewhere
scorer = make_scorer(r2_score, sample_weight=weights)
rs = RandomizedSearchCV(estimator=rf, param_distributions=param_grid, n_iter=3,
                        scoring=scorer,
                        n_jobs=-1, cv=3)

rs = rs.fit(X_train, y_train)

But, RandomizedSearchCV does not split the sample weights column when splitting the data into train and test sets. Here's some output about the array shapes and the error:

X_train shape  (349978, 367)
y_train shape  (349978,)
Sample weight shape  (349978,)

ValueError: Found input variables with inconsistent numbers of samples: [116660, 116660, 349978]

@jnothman
Member

Has this issue been fixed?

No we don't currently support weighted scoring in cross validation. Sorry. :'( Soon?

@doriang102

Is this still unresolved? GridSearchCV and RandomizedSearchCV do not throw an error when sample_weight is passed to the fit method, yet the cross-validation seems to ignore the weights entirely.

@jnothman
Member

jnothman commented Apr 30, 2019 via email

@doriang102

Thanks. Is there any current workaround other than oversampling the class in a way proportional to the weights?

@jnothman
Member

jnothman commented Apr 30, 2019 via email

@david-r-wasserman

Now that this issue has been fixed, the new feature should be documented at https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html and https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html.

Also, this only works for sample_weights, right? If the estimator's fit() function has other params that are sample-specific, those still won't work, or will they? This should be made clear in the documentation.

@lpossberg

Has this issue been fixed?

No we don't currently support weighted scoring in cross validation. Sorry. :'( Soon?

Hi @jnothman ,
is there any progress concerning the weighted scoring in cross validation?

@Demetrio92

Demetrio92 commented Apr 13, 2021

I have a work-around for evaluating with weights. It's somewhat inefficient, but works.

from sklearn.metrics import accuracy_score, make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.utils import compute_sample_weight


def weighted_accuracy_eval(y_true, y_pred, **kwargs):
    # Recompute balanced weights from the fold's true labels, so the scorer
    # works on whatever subset the CV splitter hands it.
    balanced_class_weights_eval = compute_sample_weight(
        class_weight='balanced',
        y=y_true
    )
    out = accuracy_score(y_true=y_true, y_pred=y_pred,
                         sample_weight=balanced_class_weights_eval, **kwargs)
    return out


# make_scorer calls the function as score_func(y_true, y_pred, ...)
weighted_accuracy_eval_skl = make_scorer(weighted_accuracy_eval)

# model, paramGrid, X_train, y_train and fit_params are defined elsewhere
gridsearch = GridSearchCV(
    estimator=model,
    scoring=weighted_accuracy_eval_skl,
    param_grid=paramGrid,
)

cv_result = gridsearch.fit(
    X_train,
    y_train,
    **fit_params
)
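
For completeness, a hypothetical example of what fit_params in the call above could hold, reusing compute_sample_weight and y_train from the snippet; on recent scikit-learn versions, sample-aligned fit kwargs like this are sliced to each training fold:

fit_params = {
    # training-side weights, forwarded to model.fit on each training fold
    'sample_weight': compute_sample_weight(class_weight='balanced', y=y_train),
}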

@lpossberg @jnothman
