ENH Adds support for drop + handle_unknown=ignore in the OneHotEncoder #19041
I think that we should update the docstring of the OneHotEncoder as well.
doc/modules/preprocessing.rst
All the categories in `X_test` are unknown during transform and will be mapped
to all zeros. This means that unknown categories will have the same mapping
as the dropped category.
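The collision described in this passage can be illustrated with a small dependency-free sketch. It is not scikit-learn's implementation: the helper names `fit_categories` and `encode_drop_first` and the sample data are assumptions made for illustration, and the sketch only mimics dropping the first category of each column while ignoring unknown values.

```python
def fit_categories(rows):
    """Collect the sorted categories seen in each column (hypothetical helper)."""
    return [sorted(set(col)) for col in zip(*rows)]


def encode_drop_first(rows, categories):
    """One-hot encode each row, dropping the first category of every column.

    Values that were not seen during fit match no kept category,
    so their block is all zeros (the "ignore" behavior).
    """
    encoded = []
    for row in rows:
        out = []
        for value, cats in zip(row, categories):
            kept = cats[1:]  # cats[0] is the dropped category
            out.extend(1.0 if value == c else 0.0 for c in kept)
        encoded.append(out)
    return encoded


# Illustrative training data (an assumption, not taken from the PR).
X = [["male", "from US", "uses Safari"],
     ["female", "from Europe", "uses Firefox"]]
categories = fit_categories(X)

# Every value in X_test is unknown.
X_test = [["unknown", "America", "IE"]]
encoded_unknown = encode_drop_first(X_test, categories)

# A row made only of the dropped categories.
encoded_dropped = encode_drop_first(
    [["female", "from Europe", "uses Firefox"]], categories
)

print(encoded_unknown)
print(encoded_dropped)
```

Both rows encode to the same all-zero vector, which is the ambiguity the documentation text warns about.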
It might be good to show the inverse_transform here.
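To make the point about the inverse concrete, here is a rough stand-alone sketch of how an all-zero block could decode back to the dropped category. The function `inverse_drop_first` and the category lists are hypothetical and only model the behavior under discussion; this is not the library's code.

```python
def inverse_drop_first(encoded_rows, categories):
    """Decode rows encoded with the first category of each column dropped.

    An all-zero block carries no information, so it decodes to the
    dropped category (cats[0]), whether the original value was the
    dropped category or an unknown one.
    """
    decoded = []
    for row in encoded_rows:
        out, start = [], 0
        for cats in categories:
            kept = cats[1:]
            block = row[start:start + len(kept)]
            start += len(kept)
            if 1.0 in block:
                out.append(kept[block.index(1.0)])
            else:
                out.append(cats[0])  # all zeros: fall back to the dropped category
        decoded.append(out)
    return decoded


# Assumed categories per column; the first entry of each is the dropped one.
categories = [["female", "male"], ["from Europe", "from US"]]

decoded = inverse_drop_first([[0.0, 0.0], [1.0, 0.0]], categories)
print(decoded)
```

The first row, which could have come from unknown inputs, decodes to the dropped categories, so the round trip does not recover the original unknown values.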
Do you think it would be a good idea to expose an attribute containing the columns with unknown categories? I am wondering whether the warning will be too annoying. I am thinking that we could have two attributes, one containing the column indices and another containing the unknown categories, when it applies. In this case, we could avoid warning, and you could always check the attributes as a sanity check.
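The bookkeeping suggested here might look roughly like the sketch below. The function `find_unknown` and its two return values are purely hypothetical stand-ins for the proposed attributes; nothing in this thread confirms that such attributes exist on the encoder.

```python
def find_unknown(rows, categories):
    """Return (column indices, unknown categories) for values not seen at fit time.

    Hypothetical helper mirroring the two proposed attributes: one list of
    column indices and one list of the unknown categories found in each
    of those columns.
    """
    unknown = {}
    for row in rows:
        for col_idx, (value, cats) in enumerate(zip(row, categories)):
            if value not in cats:
                unknown.setdefault(col_idx, set()).add(value)
    column_indices = sorted(unknown)
    unknown_categories = [sorted(unknown[i]) for i in column_indices]
    return column_indices, unknown_categories


# Assumed categories learned during fit.
categories = [["female", "male"], ["from Europe", "from US"]]
X_test = [["male", "America"], ["female", "Asia"]]

cols, cats = find_unknown(X_test, categories)
print(cols, cats)
```

A caller could inspect these two lists after transforming instead of relying on a warning.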
If the goal is to avoid warnings, we can hope that the documentation is clear enough and remove the warning.
Yep, this is true. Since it would only be rare, we should not warn so much, though.
LGTM
LGTM. I would be +0 for having handle_unknown == "ignore" not warn but handle_unknown == "warn" instead, but we can always do that in a later PR.
In particular @amueller wasn't a big fan of warnings: #18072 (comment)
Reference Issues/PRs
Fixes #18072
What does this implement/fix? Explain your changes.
Adds support for the suggestions stated in #18072