Add strict_na keyword to the assert_.._equal methods for object dtype to help with deprecation #58072

Draft
wants to merge 1 commit into
base: main


jorisvandenbossche
Member

This adds a strict_na keyword to our assert functions. Passing strict_na=False opts out of the strict missing-value check, while the default of True keeps the new behaviour enforced on main. The keyword is only relevant for object dtype (or when that code path is used because check_dtype=False is specified and the dtypes differ).
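
As an illustration only (this is not code from the PR), the difference between the two modes can be sketched with a small pure-Python helper. The name `object_values_equal` and its type-based rule for "matching" missing values are assumptions made for this example; the actual pandas implementation is more involved.

```python
import numpy as np
import pandas as pd


def object_values_equal(left, right, strict_na=True):
    """Hypothetical helper (not pandas API): compare two object-dtype Series.

    Assumes every element is a scalar, so pd.isna returns a plain bool.
    """
    if len(left) != len(right):
        return False
    for a, b in zip(left, right):
        a_missing, b_missing = pd.isna(a), pd.isna(b)
        if a_missing or b_missing:
            if not (a_missing and b_missing):
                return False  # one side is missing, the other is not
            # Simplifying assumption: in strict mode, missing values only
            # match when they have the same Python type (None vs np.nan fails).
            if strict_na and type(a) is not type(b):
                return False
        elif a != b:
            return False
    return True


left = pd.Series([1, None, "x"], dtype=object)
right = pd.Series([1, np.nan, "x"], dtype=object)

strict_result = object_values_equal(left, right, strict_na=True)
lenient_result = object_values_equal(left, right, strict_na=False)
print(strict_result, lenient_result)
```

Under these assumptions the strict comparison rejects `None` vs `np.nan`, while the lenient one accepts any pair of missing values.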

This was brought up in the original PR, see #52081 (comment), and also discussed in the original issue (#18463).

This obviously needs some more docstring and test updates, but I first wanted to see how this is received.

@jorisvandenbossche jorisvandenbossche added the Testing pandas testing functions or related to the test suite label Mar 29, 2024
@jbrockmendel
Member

Is the idea to keep this forever or just a lengthened deprecation cycle?

Could this be modestly more strict by disallowing egregiously mismatched cases, e.g. np.datetime64("NaT") vs np.timedelta64("NaT")?

@jorisvandenbossche
Member Author

or just a lengthened deprecation cycle?

No, it's actually just to make it easier to use, because for certain use cases the previous behaviour was a lot more convenient (so the title "help with deprecation" is a bit wrong, it's not to help with the deprecation process itself, but just to help live with the new enforced behaviour).

@jorisvandenbossche
Member Author

Could this be modestly more strict by disallowing egregiously mismatched cases, e.g. np.datetime64("NaT") vs np.timedelta64("NaT")?

That would again make the keyword logic and implementation more complex. So unless there is a use case for this, I personally wouldn't do that.
(In the end, the keyword that I am adding in this PR needs to be explicitly enabled, so the user of this typically knows why they are specifying it.)
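
For context, a short sketch of the mismatched case being discussed, using only public NumPy and pandas calls. Both NaT scalars are reported as missing by `pd.isna`, although they are different types; the variable names are chosen just for this example.

```python
import numpy as np
import pandas as pd

dt_nat = np.datetime64("NaT")   # missing datetime
td_nat = np.timedelta64("NaT")  # missing timedelta

# Both scalars are considered missing by pd.isna ...
both_missing = bool(pd.isna(dt_nat)) and bool(pd.isna(td_nat))
# ... but they are instances of different NumPy scalar types.
same_type = type(dt_nat) is type(td_nat)
print(both_missing, same_type)
```

A rule based purely on `pd.isna` would therefore treat this pair as equal, which is the looseness the question above points at.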

@WillAyd
Member

WillAyd commented Apr 2, 2024

What is the envisioned scope of this keyword? Does it treat None, pd.NA, pa.null and np.nan all the same? Or just a subset of those?

@WillAyd WillAyd added the Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate label Apr 2, 2024
@jorisvandenbossche
Member Author

jorisvandenbossche commented Apr 2, 2024

It will treat anything that pd.isna(..) considers as missing (for scalar input) as equal.
(so not a subset, as all those scalars can occur in an object-dtype column)
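
A minimal check of that rule with public pandas calls, offered as a sketch: it covers `None`, `np.nan`, `pd.NA` and `pd.NaT` and leaves pyarrow out, since pyarrow is an optional dependency. The dictionary and variable names are only for illustration.

```python
import numpy as np
import pandas as pd

# Scalars that can all appear in a single object-dtype column.
candidates = {"None": None, "np.nan": np.nan, "pd.NA": pd.NA, "pd.NaT": pd.NaT}

# pd.isna on scalar input returns a plain boolean for each of them.
is_missing = {name: bool(pd.isna(value)) for name, value in candidates.items()}

# The same values stored in an object-dtype Series are all flagged by isna().
ser = pd.Series(list(candidates.values()), dtype=object)
print(is_missing)
print(ser.isna().tolist())
```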

Contributor

github-actions bot commented May 3, 2024

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label May 3, 2024