-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
MRG Deprecates 'normalize' in LinearRegression (_base.py) #17743
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MRG Deprecates 'normalize' in LinearRegression (_base.py) #17743
Conversation
Co-authored-by: Guillaume Lemaitre <[email protected]>
Co-authored-by: Guillaume Lemaitre <[email protected]>
…/scikit-learn into depreciate_normalize_base
@rth @agramfort @glemaitre (problem with the docs should hopefully be fixed soon: #17745) |
…into depreciate_normalize_base
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you will also need an entry in what's new do document the deprecation
besides LTGM provided CIs are happy (doc build included)
…into depreciate_normalize_base
Co-authored-by: Guillaume Lemaitre <[email protected]>
…into depreciate_normalize_base
Co-authored-by: Guillaume Lemaitre <[email protected]>
Co-authored-by: Guillaume Lemaitre <[email protected]>
…er test the impact of with_mean
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed a new commit to change the new test to see the impact of the with_mean
parameter of the StandardScaler
for dense inputs.
So apparently, both with_mean=True
and with_mean=False
work for dense data on LinearRegression
. I assume the mean feature value is moved to the intercept and therefore scaling with or without mean does change the equivalence asserted in the test.
However I am not sure about how regularization will impact this if we are to write a similar test for Ridge and Lasso for instance.
The test failure is unrelated and reported in a dedicated issue: #19224. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In light of the updated test, I think it's fine to keep an explicit with_mean=False
in the deprecation message.
LGTM for merge once the following comment is addressed:
) | ||
def test_linear_model_sample_weights_normalize_in_pipeline( | ||
estimator, is_sparse, with_mean | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this test is only meant to test LinearRegression
it should be moved to sklearn/linear_model/tests/test_base.py
. If it's meant to be extended to Ridge
, Lasso
... maybe it should be move to a new file, e.g. sklearn/linear_model/tests/test_linear_model.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I recall I was proposing sklearn/linear_model/tests/test_common.py
that is the usual way that we structure common tests for a module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To anticipate this question, I tried to see if this test would pass with the current code for Ridge and Lasso and actually it always fails whether with_mean
is True
or False
on dense data and it also fails with with_mean=False
on sparse data:
sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_model_sample_weights_normalize_in_pipeline[LinearRegression-True-False] PASSED [ 12%]
sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_model_sample_weights_normalize_in_pipeline[LinearRegression-False-True] PASSED [ 25%]
sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_model_sample_weights_normalize_in_pipeline[LinearRegression-False-False] PASSED [ 37%]
sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_model_sample_weights_normalize_in_pipeline[Ridge-True-False] FAILED [ 50%]
sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_model_sample_weights_normalize_in_pipeline[Ridge-False-True] FAILED [ 62%]
sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_model_sample_weights_normalize_in_pipeline[Ridge-False-False] FAILED [ 75%]
sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_model_sample_weights_normalize_in_pipeline[Lasso-False-True] FAILED [ 87%]
sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_model_sample_weights_normalize_in_pipeline[Lasso-False-False] FAILED [100%]
So the deprecation of their normalize
option should not be implemented with the same message I believe.
To keep this PR focused, let's just move this test to sklearn/linear_model/tests/test_base.py
for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes @glemaitre . I answered your comment, but it must have gone lost int the flow of other comments:
#17743 (comment)
I will move it to test_base.py
.
@ogrisel failing for Ridge and Lasso might indeed be a problem as it was supposed to be extended to include them in this test. Why is this the case (for their failing)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@glemaitre
There is no file: sklearn/linear_model/tests/test_common.py
(there is: sklearn/tests/test_common.py
, hence my previous question above).
Should I create it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no file: sklearn/linear_model/tests/test_common.py
We could create one. But it's fine to keep in sklearn/linear_model/tests/test_base.py for now. You can move this test to sklearn/linear_model/tests/test_common.py in a PR that needs to reuse it for another estimator of the sklearn.linear_model
module.
…into depreciate_normalize_base
Merged. The rest of the discussion can be handled in PRs related to Ridge and Lasso. |
congrats and thanks @maikia ! |
Towards: #3020
It deprecates 'normalize' in _base.py (LinearRegression)