Skip to content

MRG Deprecates 'normalize' in LinearRegression (_base.py) #17743

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 141 commits into from
Jan 22, 2021

Conversation

maikia
Copy link
Contributor

@maikia maikia commented Jun 26, 2020

Towards: #3020

It deprecates 'normalize' in _base.py (LinearRegression)

@maikia
Copy link
Contributor Author

maikia commented Jun 26, 2020

@rth @agramfort @glemaitre
what do you think?

(problem with the docs should hopefully be fixed soon: #17745)

Copy link
Member

@agramfort agramfort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you will also need an entry in what's new do document the deprecation

besides LTGM provided CIs are happy (doc build included)

@maikia maikia changed the title WIP: Deprecates 'normalize' in LinearRegression (_base.py) Deprecates 'normalize' in LinearRegression (_base.py) Jun 29, 2020
Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a new commit to change the new test to see the impact of the with_mean parameter of the StandardScaler for dense inputs.

So apparently, both with_mean=True and with_mean=False work for dense data on LinearRegression. I assume the mean feature value is moved to the intercept and therefore scaling with or without mean does change the equivalence asserted in the test.

However I am not sure about how regularization will impact this if we are to write a similar test for Ridge and Lasso for instance.

@ogrisel
Copy link
Member

ogrisel commented Jan 21, 2021

The test failure is unrelated and reported in a dedicated issue: #19224.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In light of the updated test, I think it's fine to keep an explicit with_mean=False in the deprecation message.

LGTM for merge once the following comment is addressed:

)
def test_linear_model_sample_weights_normalize_in_pipeline(
estimator, is_sparse, with_mean
):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this test is only meant to test LinearRegression it should be moved to sklearn/linear_model/tests/test_base.py. If it's meant to be extended to Ridge, Lasso... maybe it should be move to a new file, e.g. sklearn/linear_model/tests/test_linear_model.py

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I recall I was proposing sklearn/linear_model/tests/test_common.py that is the usual way that we structure common tests for a module.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To anticipate this question, I tried to see if this test would pass with the current code for Ridge and Lasso and actually it always fails whether with_mean is True or False on dense data and it also fails with with_mean=False on sparse data:

sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_model_sample_weights_normalize_in_pipeline[LinearRegression-True-False] PASSED                                                          [ 12%]
sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_model_sample_weights_normalize_in_pipeline[LinearRegression-False-True] PASSED                                                          [ 25%]
sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_model_sample_weights_normalize_in_pipeline[LinearRegression-False-False] PASSED                                                         [ 37%]
sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_model_sample_weights_normalize_in_pipeline[Ridge-True-False] FAILED                                                                     [ 50%]
sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_model_sample_weights_normalize_in_pipeline[Ridge-False-True] FAILED                                                                     [ 62%]
sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_model_sample_weights_normalize_in_pipeline[Ridge-False-False] FAILED                                                                    [ 75%]
sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_model_sample_weights_normalize_in_pipeline[Lasso-False-True] FAILED                                                                     [ 87%]
sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_model_sample_weights_normalize_in_pipeline[Lasso-False-False] FAILED                                                                    [100%]

So the deprecation of their normalize option should not be implemented with the same message I believe.

To keep this PR focused, let's just move this test to sklearn/linear_model/tests/test_base.py for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes @glemaitre . I answered your comment, but it must have gone lost int the flow of other comments:
#17743 (comment)

I will move it to test_base.py.

@ogrisel failing for Ridge and Lasso might indeed be a problem as it was supposed to be extended to include them in this test. Why is this the case (for their failing)?

Copy link
Contributor Author

@maikia maikia Jan 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@glemaitre
There is no file: sklearn/linear_model/tests/test_common.py
(there is: sklearn/tests/test_common.py, hence my previous question above).

Should I create it?

Copy link
Member

@ogrisel ogrisel Jan 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no file: sklearn/linear_model/tests/test_common.py

We could create one. But it's fine to keep in sklearn/linear_model/tests/test_base.py for now. You can move this test to sklearn/linear_model/tests/test_common.py in a PR that needs to reuse it for another estimator of the sklearn.linear_model module.

Base automatically changed from master to main January 22, 2021 10:52
@ogrisel ogrisel merged commit 306826f into scikit-learn:main Jan 22, 2021
@ogrisel
Copy link
Member

ogrisel commented Jan 22, 2021

Merged. The rest of the discussion can be handled in PRs related to Ridge and Lasso.

@agramfort
Copy link
Member

congrats and thanks @maikia !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants