BUG: pd.NA.__format__ fails with format_specs #34740
@topper-123 Thanks for looking into this! Personally, instead of relying on a try/except of NaN to check what is supported, I would rather try to understand how and what works for NaN, and try to implement the same logic here. For example, I suppose that … Now, properly implementing …
Hmm, `np.nan` is just a float, so using the builtin … Another idea: how robust would it be if we format some other value (e.g. `np.nan`) and then replace `"nan"` with `"<NA>"`?
Wouldn't work out of the box, e.g. …

Another idea:

```python
def __format__(self, format_spec):
    try:
        return self.__repr__().__format__(format_spec)
    except ValueError:
        return self.__repr__()
```

This would allow string format_specs to work (as they already do for floats) and make `self.__repr__()` a fallback that always works.
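A minimal, runnable sketch of the proposed fallback, assuming a hypothetical `NAStandIn` class that only mimics `pd.NA`'s `"<NA>"` repr (it is not the pandas class). String-style specs such as `">10"` pad the repr, while float-only specs such as `".2f"` make `str.__format__` raise `ValueError`, so the bare repr is returned instead.

```python
class NAStandIn:
    """Hypothetical stand-in for pandas' NAType, used only to illustrate
    the proposed __format__; it is not the pandas implementation."""

    def __repr__(self) -> str:
        return "<NA>"

    def __format__(self, format_spec: str) -> str:
        try:
            # Delegate to str.__format__, so string-style specs such as
            # ">10" or "^8" align/pad the "<NA>" text.
            return self.__repr__().__format__(format_spec)
        except ValueError:
            # Specs that str rejects (e.g. ".2f") fall back to the bare repr.
            return self.__repr__()


NA = NAStandIn()

print(f"{NA:>10}|")            # right-aligned in a 10-char field
print(f"{NA:.2f}|")            # float-only spec: falls back to "<NA>"
print(f"{1.5:.2f} {NA:.2f}")   # mixed row formats without raising
print(f"{NA:.2}|")             # "<N": a precision is valid for str and truncates
```

The last line shows the limitation discussed in the thread: because the spec is interpreted with string rules, a precision-only spec like `".2"` truncates the repr rather than behaving as it would for a float NaN.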
I don't fully know how the inner Python details of this method work, but I suppose the above would end up calling …
I think that is certainly better (it avoids only accepting the rules valid for floats), but it still wouldn't work for the example I gave of … (now, it's certainly already fixing a set of use cases, so it could also be a good start).
Very quick try with … works for the example you gave, and also for the example I gave: …
Of course, the above still needs to 1) take the 1-char length difference into account in case there is whitespace (like the second example), and 2) still fall back to formatting with the string repr, and finally to the plain `<NA>` string repr (like your example impl at #34740 (comment)).
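The NaN-format-then-replace idea, with the two refinements listed in the comment, could look roughly like this. It is a hypothetical helper (`format_na_like_nan` is not a pandas function) built on the standard library's `math.nan`, which behaves like `np.nan` since both are plain floats. The width correction is a heuristic that only trims space padding, as the comment suggests.

```python
import math


def format_na_like_nan(format_spec: str, na_repr: str = "<NA>") -> str:
    """Hypothetical helper: format NaN with format_spec, then swap the NaN
    text for na_repr, keeping the field width when the padding is spaces."""
    try:
        formatted = format(math.nan, format_spec)
    except ValueError:
        # Spec is not valid for floats (e.g. "d" or ">8s"): try it as a
        # string spec on the repr, and finally fall back to the plain repr.
        try:
            return format(na_repr, format_spec)
        except ValueError:
            return na_repr

    # Lower-case types give "nan"; upper-case ones ("F", "E", "G") give "NAN".
    token = "NAN" if "NAN" in formatted else "nan"
    result = formatted.replace(token, na_repr, 1)

    # "<NA>" is one char longer than "nan": drop that many padding spaces.
    # Heuristic: leading/trailing spaces are treated as width padding, so a
    # " " sign flag would be trimmed too; other fill chars are left alone.
    extra = len(result) - len(formatted)
    if extra > 0:
        if result.startswith(" " * extra):
            result = result[extra:]
        elif result.endswith(" " * extra):
            result = result[:-extra]
    return result


for spec in [">6.1f", "<6", ".2f", "+.1f", ".1%", "G", "d", ">8s"]:
    print(repr(spec), "->", repr(format_na_like_nan(spec)))
```

Float specs keep their float semantics (sign, `%` suffix, field width), which is what distinguishes this approach from delegating straight to `str.__format__`.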
Yeah, the length format spec would be one special case that would need to be handled, but are there others? I don't think so for floats, but could there be for other format_specs?
I've made the simpler implementation that I suggested. I'm a bit hesitant, as adding the special cases could make this too complex.
lgtm cc @jorisvandenbossche
Yeah, I am fine with the simplest solution that at least fixes the basic formatting, for now. I still think it wouldn't be hard to support proper floating point / numeric formatting (with the NaN formatting and replace afterwards).
thanks @topper-123
`pd.NA` fails if passed to a format string and format parameters are supplied. This is different behaviour than `np.nan` and makes converting arrays containing `pd.NA` to strings very brittle and annoying. Examples: …
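A minimal sketch of the failure mode, using a hypothetical `NoFormatNA` stand-in rather than pandas itself. A class that does not define `__format__` inherits `object.__format__`, which only accepts an empty format spec and raises `TypeError` otherwise, while a float NaN (`math.nan` here, equivalent to `np.nan`) accepts float specs.

```python
import math


class NoFormatNA:
    """Hypothetical stand-in for a missing-value sentinel that, like pd.NA
    before this fix, does not define its own __format__."""

    def __repr__(self) -> str:
        return "<NA>"


NA = NoFormatNA()

# np.nan is a plain float, so float format specs work on it.
print(f"{math.nan:.2f}")   # nan

# An empty spec works: object.__format__ falls back to str(self).
print(f"{NA}")             # <NA>

# Any non-empty spec hits object.__format__, which raises TypeError.
try:
    print(f"{NA:.2f}")
except TypeError as exc:
    print("TypeError:", exc)
```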
The new behaviour mirrors the behaviour of `np.nan`.