sample_weights array can't be used with GridSearchCV #2879
I have added support for sample_weight in GridSearchCV in #1574. Hopefully I can get that merged soon.
Thanks, that was an extremely fast turn-around. As a meta-question, is sampling weighted? E.g., is it better for stratified folds to try to maintain the ratio of total weight per class instead of the number of samples?
sample_weight support was merged long ago.
@amueller, sample_weight is still not supported as an argument to fit in *SearchCV.
There is
@amueller, I'm running into a similar issue. It's possible to set a Is there a reason why the I would be willing to make that change (adding fit parameters) if you'd accept that PR.
Yes, it's broken. One issue is that we need to be clear whether
fit_params is deprecated since 0.19 and will be removed in 0.21 |
@ManasHardas: pass them to fit.
@amueller Trying your solution with RandomizedSearchCV and a RandomForestClassifier. Attempting to pass:
But it will still run (with the deprecation warning) when passed to the constructor instead.
Just pass keyword arguments to fit these days.
I see, I should have looked at the docs more closely.
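To make the "just pass keyword arguments to fit" advice concrete, here is a minimal sketch (the estimator, data, and parameter grid are invented for illustration). In recent scikit-learn, array-like fit parameters such as sample_weight are sliced per CV fold along with X and y:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X = rng.rand(100, 4)
y = rng.randint(0, 2, size=100)
weights = rng.rand(100)  # one weight per sample

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 20]},
    cv=3,
)
# sample_weight goes to fit, not the constructor; it is forwarded to the
# estimator's fit and split along with X and y for each CV fold
search.fit(X, y, sample_weight=weights)
print(search.best_params_)
```

Note that this only weights *fitting*; the scorer still evaluates each fold unweighted, which is the remaining gap discussed below in this thread.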
Has this issue been fixed? I'm trying to run RandomizedSearchCV with sample_weights both as a scoring param and a fit param. Here is a code snippet:
But RandomizedSearchCV does not split the sample weights column when splitting the data into train and test sets. Here's some output about the array shapes and the error:
No, we don't currently support weighted scoring in cross-validation. Sorry. :'( Soon?
Is this still unresolved? |
Yes, they cannot at present be passed to scoring (without making them part of X), unfortunately.
Thanks. Is there any current workaround other than oversampling the class in a way proportional to the weights?
Well, if you make sample_weight the first column in X, all you need is a scorer that extracts it (lambda est, X, y: orig_scorer(est, X, y, sample_weight=X[:, 0])) and a Pipeline with a FunctionTransformer(lambda X: X[:, 1:]) at the start, so the estimator never sees the weight column. But it's certainly a hack. (And you should probably use something other than lambdas.)
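A runnable sketch of that hack, with invented data and estimator names. The custom scorer pulls the weights out of column 0 of X, while the FunctionTransformer at the front of the pipeline strips that column before the estimator sees it (named functions are used instead of lambdas, as suggested):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

def drop_weight_column(X):
    # Strip column 0 (the weights) so the estimator only sees real features
    return X[:, 1:]

orig_scorer = get_scorer("accuracy")

def weighted_scorer(est, X, y):
    # est is the pipeline, so pass X whole (it strips the column itself);
    # column 0 supplies the per-sample weights for scoring
    return orig_scorer(est, X, y, sample_weight=X[:, 0])

rng = np.random.RandomState(0)
features = rng.rand(100, 3)
y = rng.randint(0, 2, size=100)
weights = rng.rand(100)
Xw = np.column_stack([weights, features])  # weights as first column of X

pipe = make_pipeline(FunctionTransformer(drop_weight_column),
                     LogisticRegression())
search = GridSearchCV(pipe,
                      {"logisticregression__C": [0.1, 1.0]},
                      scoring=weighted_scorer, cv=3)
search.fit(Xw, y)
```

Because the weights travel inside X, they are split into train and test folds automatically, which is the whole point of the trick.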
Now that this issue has been fixed, the new feature should be documented at https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html and https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html. Also, this only works for
Hi @jnothman,
I have a work-around for evaluating with weights. It's somewhat inefficient, but it works:

```python
from sklearn.metrics import accuracy_score, make_scorer
from sklearn.utils import compute_sample_weight

def weighted_accuracy_eval(y_true, y_pred, **kwargs):
    # Re-weight each sample inversely to its class frequency in this fold
    balanced_class_weights_eval = compute_sample_weight(
        class_weight='balanced',
        y=y_true,
    )
    return accuracy_score(
        y_true=y_true,
        y_pred=y_pred,
        sample_weight=balanced_class_weights_eval,
        **kwargs,
    )

weighted_accuracy_eval_skl = make_scorer(weighted_accuracy_eval)

gridsearch = GridSearchCV(
    estimator=model,
    scoring=weighted_accuracy_eval_skl,  # the make_scorer wrapper, not the raw metric
    param_grid=paramGrid,
)

cv_result = gridsearch.fit(
    X_train,
    y_train,
    **fit_params,
)
```
The internal cross-validation isn't aware of sample weights, so an exception is thrown when a sample_weights sequence is passed to the grid search, because fit_grid_point does not split the weights into training and test sets.