Json fix normalize #49920
pandas/io/json/_normalize.py
Outdated
@@ -153,7 +153,7 @@ def _normalise_json(
    # to avoid adding the separator to the start of every key
    # GH#43831 avoid adding key if key_string blank
    key_string=new_key
    if new_key[: len(separator)] != separator
    if key_string or new_key[: len(separator)] != separator
Does `key_string` being false imply that `new_key[: len(separator)] == separator`?
If so, then can this be simplified to just `if key_string`?
If `key_string` is set, the implication is that you are within at least one recursive call. It seems the in-place string substitution should only affect the very top of the hierarchy.
There is probably a cleaner way to represent it - this was just a quick bolt-on to the existing code.
Sorry not sure I understand the reply (or perhaps my suggestion was unclear) - I was suggesting:
    _normalise_json(
        data=value,
        # to avoid adding the separator to the start of every key
        # GH#43831 avoid adding key if key_string blank
        key_string=new_key if key_string else removeprefix(new_key, separator),
        normalized_dict=normalized_dict,
        separator=separator,
    )
because if `key_string` is falsey, then `new_key[: len(separator)] != separator` must also be falsey, and so the latter isn't needed (if you have `a or b` and you know `not a` implies `not b`, then `a or b` is the same as `a`). Wouldn't this also be a quick bolt-on to the existing code, but simpler?
Nice idea. We still support Python 3.8 though, right? I think `removeprefix` was added in 3.9.
We can also move this out of the argument list if that helps readability - I agree that even the way it was is less than desirable for readability.
True, but there's a 3.8 version of it in pandas, which is used in some places, e.g.
pandas/pandas/core/ops/common.py
Lines 56 to 64 in 4a5d77f
    if sys.version_info < (3, 9):
        from pandas.util._str_methods import (
            removeprefix,
            removesuffix,
        )
        stripped_name = removesuffix(removeprefix(name, "__"), "__")
    else:
        stripped_name = name.removeprefix("__").removesuffix("__")
If you write it like that (with the `if sys.version_info < (3, 9):` check), then `pyupgrade` will automatically keep only the 3.9+ version when pandas drops 3.8.
Ah, very nice. Cool, let me do a refactor with this; it should make things cleaner.
pandas/io/json/_normalize.py
Outdated
@@ -148,13 +149,13 @@ def _normalise_json(
    if isinstance(data, dict):
        for key, value in data.items():
            new_key = f"{key_string}{separator}{key}"

            if not key_string:
                new_key = removeprefix(new_key, separator)
This is only available on Python 3.8 and below. That is on purpose, because it "forces" you to write it like
-    new_key = removeprefix(new_key, separator)
+    if sys.version_info < (3, 9):
+        from pandas.util._str_methods import removeprefix
+        new_key = removeprefix(new_key, separator)
+    else:
+        new_key = new_key.removeprefix(separator)
and then when Python 3.8 is dropped, `pyupgrade` will rewrite this automatically to keep only
    new_key = new_key.removeprefix(separator)
(you can see what will happen with `pyupgrade pandas/io/json/_normalize.py --py39-plus`)
Ah, gotcha - sorry, I misunderstood before and thought that compat was handled directly in `pandas.util._str_methods`.
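To see what the `if not key_string:` branch in this thread does in context, here is a simplified recursive flattener. It is a sketch only, not the real `_normalise_json`: the name `flatten_sketch` and its signature are assumptions, and it uses slicing instead of `removeprefix` so that it runs on any supported Python version.

```python
from typing import Any, Dict, Optional


def flatten_sketch(
    data: Any,
    key_string: str = "",
    separator: str = ".",
    out: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    if out is None:
        out = {}
    if isinstance(data, dict):
        for key, value in data.items():
            new_key = f"{key_string}{separator}{key}"
            if not key_string:
                # Top level only: new_key always starts with the separator
                # that was just prepended, so drop exactly that one.
                new_key = new_key[len(separator):]
            flatten_sketch(value, new_key, separator, out)
    else:
        out[key_string] = data
    return out


print(flatten_sketch({"a": {"b": 1}, ".c": 2}))
# → {'a.b': 1, '.c': 2}
```

Only the separator added by the f-string is removed, so a top-level key that itself begins with the separator (`".c"` here) keeps its leading character.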
pandas/io/json/_normalize.py
Outdated
@@ -21,6 +21,7 @@
    Scalar,
)
from pandas.util._decorators import deprecate
from pandas.util._str_methods import removeprefix
-from pandas.util._str_methods import removeprefix
Looks good to me, thanks @WillAyd !
(as an aside, this could probably be rewritten better without recursion? I'll take a look when I get a chance)
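One way a non-recursive version might look is with an explicit stack. This is a hedged sketch of the general idea, not the rewrite the reviewer had in mind; `flatten_iterative` is a hypothetical name and it only handles nested dicts.

```python
from typing import Any, Dict, List, Tuple


def flatten_iterative(data: Dict[str, Any], separator: str = ".") -> Dict[str, Any]:
    out: Dict[str, Any] = {}
    # Each stack entry pairs the key prefix built so far with the value under it.
    stack: List[Tuple[str, Any]] = [("", data)]
    while stack:
        prefix, current = stack.pop()
        if isinstance(current, dict):
            # Push in reverse so keys are visited in their original order.
            for key, value in reversed(list(current.items())):
                new_key = f"{prefix}{separator}{key}" if prefix else str(key)
                stack.append((new_key, value))
        else:
            out[prefix] = current
    return out


print(flatten_iterative({"a": {"b": 1, "c": {"d": 2}}, "e": 3}))
# → {'a.b': 1, 'a.c.d': 2, 'e': 3}
```

Because the prefix is only joined with the separator when it is non-empty, no leading separator is ever added at the top level, and there is nothing to strip afterwards.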
@@ -148,13 +149,18 @@ def _normalise_json(
    if isinstance(data, dict):
        for key, value in data.items():
            new_key = f"{key_string}{separator}{key}"

            if not key_string:
                if sys.version_info < (3, 9):
@MarcoGorelli if we use `if not PY310`, where `PY310` is from `pandas.compat`, would `pyupgrade` still flag this?
It wouldn't, no. `pyupgrade` just does static analysis (it wouldn't know what the symbol `PY310` means). In fact, I was kinda tempted to replace all the `PY310` and other `pandas.compat` constants with `sys.version_info` checks, so we don't need to remember what to clean up when dropping versions each year.
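The difference between the two spellings can be illustrated with a toy static check built on the standard `ast` module. This is not how `pyupgrade` is implemented; it is only a sketch of why a tool that looks at source text without running it can recognize a literal `sys.version_info` comparison but not a named constant. `is_literal_version_check` and `SOURCE` are hypothetical.

```python
import ast

# The source is only parsed, never executed, so pandas need not be installed.
SOURCE = '''
import sys
from pandas.compat import PY310

if sys.version_info < (3, 9):
    a = 1
else:
    a = 2

if not PY310:
    b = 1
else:
    b = 2
'''


def is_literal_version_check(test: ast.expr) -> bool:
    """True for `sys.version_info <op> (<int>, ...)` written out literally."""
    if not (isinstance(test, ast.Compare) and len(test.comparators) == 1):
        return False
    left, right = test.left, test.comparators[0]
    return (
        isinstance(left, ast.Attribute)
        and isinstance(left.value, ast.Name)
        and left.value.id == "sys"
        and left.attr == "version_info"
        and isinstance(right, ast.Tuple)
        and all(
            isinstance(elt, ast.Constant) and isinstance(elt.value, int)
            for elt in right.elts
        )
    )


results = [
    is_literal_version_check(node.test)
    for node in ast.walk(ast.parse(SOURCE))
    if isinstance(node, ast.If)
]
print(results)
# → [True, False]
```

The `PY310` branch is just a name lookup in the syntax tree; resolving it to a version would require following the import, which a purely syntactic pass does not do.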
I would be open to this change.
Thanks @WillAyd
This patch may have induced a potential regression. Please check the links below. If any ASVs are parameterized, the combinations of parameters for which a regression has been detected appear as subbullets. This is a partially automated message.
@rhshadrach awesome bot. Will take a look - moving the import to the global space might help
closes #49861