MNT Handle NaNs in scipy dev rankdata #24141

Merged
ogrisel merged 2 commits into scikit-learn:main from the handle-rankdata-nan branch on Aug 16, 2022

Conversation

lesteve
Member

@lesteve lesteve commented Aug 8, 2022

One of the failures in the scipy-dev build #23626 (it turns out this fixes all the scipy-dev failures, for some reason ...)

This is a change in scipy 1.10.dev: scipy/scipy#16140

import numpy as np

from scipy.stats import rankdata

print(rankdata([1, 2, np.nan]))

scipy 1.10.dev: array([ 1., 2., nan])
scipy 1.9: array([ 1., 2., 3.])
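
A minimal sketch of one way to get the same ranks on either SciPy version: rank only the non-NaN entries so that `rankdata` never sees a NaN. The helper name `rank_skipping_nan` is hypothetical and this is not the code from this PR.

```python
import numpy as np
from scipy.stats import rankdata


def rank_skipping_nan(values):
    """Rank the non-NaN entries; keep NaN where the input was NaN.

    Hypothetical helper for illustration only.
    """
    values = np.asarray(values, dtype=float)
    ranks = np.full(values.shape, np.nan)
    mask = ~np.isnan(values)
    if mask.any():
        # rankdata only receives NaN-free data here, so its NaN handling
        # (which differs between SciPy versions) never comes into play.
        ranks[mask] = rankdata(values[mask])
    return ranks


ranks = rank_skipping_nan([1, 2, np.nan])
print(ranks)
```

The NaN position stays NaN in the output, so the caller still has to decide what rank, if any, a NaN score should get.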

Note: the change in scipy breaks backward-compatibility for the rank in the cv_results_ attribute. For nan scores, the associated rank will be np.iinfo(np.int32).min i.e. -2147483648. I am not sure how much the ranks are used in general (I would guess not used very much) and how acceptable such a breaking change is.
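
To illustrate how integer ranks could stay backward-compatible, here is a hedged sketch that gives NaN scores the worst rank by replacing them with a value below every valid score before ranking. The function name `rank_scores_nan_last` and the all-NaN fallback are assumptions for illustration, not necessarily what this PR merged.

```python
import numpy as np
from scipy.stats import rankdata


def rank_scores_nan_last(mean_scores):
    """Return int32 ranks (1 = best score); NaN scores share the worst rank.

    Hypothetical helper for illustration only.
    """
    scores = np.asarray(mean_scores, dtype=float)
    if np.isnan(scores).all():
        # Assumption: if every score is NaN, give them all rank 1.
        return np.ones_like(scores, dtype=np.int32)
    # Replace NaN with a value strictly below the smallest valid score,
    # so rankdata never receives a NaN.
    worst = np.nanmin(scores) - 1
    filled = np.nan_to_num(scores, nan=worst)
    # Negate so the highest score gets rank 1; ties share the lowest rank.
    return rankdata(-filled, method="min").astype(np.int32, copy=False)


print(rank_scores_nan_last([0.9, np.nan, 0.7]))
print(np.iinfo(np.int32).min)  # the sentinel value mentioned above
```

Because the ranks are computed on NaN-free data, the result does not depend on how the installed SciPy version treats NaN in `rankdata`.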

@jjerphan jjerphan added Waiting for Reviewer Breaking Change Issue resolution would not be easily handled by the usual deprecation cycle. Needs Decision - Backward Compatibility labels Aug 8, 2022
Member

@thomasjpfan thomasjpfan left a comment

Thank you for the fix. LGTM

@thomasjpfan thomasjpfan changed the title Handle NaNs in scipy dev rankdata MNT Handle NaNs in scipy dev rankdata Aug 8, 2022
@thomasjpfan thomasjpfan removed Breaking Change Issue resolution would not be easily handled by the usual deprecation cycle. Needs Decision - Backward Compatibility labels Aug 8, 2022
@thomasjpfan
Member

I removed the "Breaking Change" label because this preserves the behavior in scikit-learn when SciPy 1.10 is released. In a sense, this PR is fixing something in scikit-learn that would have been broken.

@jjerphan
Member

jjerphan commented Aug 8, 2022

OK, I added them regarding @lesteve's last remark:

Note: the change in scipy breaks backward-compatibility for the rank in the cv_results_ attribute. For nan scores, the associated rank will be np.iinfo(np.int32).min i.e. -2147483648. I am not sure how much the ranks are used in general (I would guess not used very much) and how acceptable such a breaking change is.

@thomasjpfan
Member

thomasjpfan commented Aug 8, 2022

As for @lesteve's comment, we can think about backporting this change to 1.1.3 so that 1.1.X works with SciPy 1.10 with the same behavior as SciPy 1.9. Note, by the time SciPy 1.10 is released, I suspect we would have already released 1.2, which would include this PR.

Member

@ogrisel ogrisel left a comment

Thanks for the fix @lesteve. Let's add a changelog entry targeting scikit-learn 1.2.

If we ever decide to backport this to 1.1.3, we will move the entry then.

@lesteve
Member Author

lesteve commented Aug 16, 2022

OK, I have added a what's new entry.

@ogrisel ogrisel merged commit 98cb31f into scikit-learn:main Aug 16, 2022
@ogrisel
Member

ogrisel commented Aug 16, 2022

Thanks for the fix!

@lesteve lesteve deleted the handle-rankdata-nan branch August 16, 2022 08:53
@lesteve
Member Author

lesteve commented Aug 16, 2022

Great, the scipy-dev build may be green tomorrow morning, which hasn't happened in a while 🤞
