Json fix normalize #49920

WillAyd · 2022-11-26T20:01:42Z

MarcoGorelli · 2022-11-26T22:29:34Z

pandas/io/json/_normalize.py

@@ -153,7 +153,7 @@ def _normalise_json(
                # to avoid adding the separator to the start of every key
                # GH#43831 avoid adding key if key_string blank
                key_string=new_key
-                if new_key[: len(separator)] != separator
+                if key_string or new_key[: len(separator)] != separator


Does key_string being false imply that

new_key[: len(separator)] == separator

?

If so, then can this be simplified to just if key_string?

If key_string is set, the implication is that you are within at least one recursive call. It seems like the in-place string substitution should only affect the very top of the hierarchy.

There is probably a cleaner way to represent it; this was just a quick bolt-on to the existing code.

Sorry not sure I understand the reply (or perhaps my suggestion was unclear) - I was suggesting:

_normalise_json(
    data=value,
    # to avoid adding the separator to the start of every key
    # GH#43831 avoid adding key if key_string blank
    key_string=new_key if key_string else removeprefix(new_key, separator),
    normalized_dict=normalized_dict,
    separator=separator,
)

because if key_string is falsey, then new_key[: len(separator)] != separator must also be falsey, and so the latter isn't needed (if you have a or b and you know not a implies not b, then a or b is the same as a). Wouldn't this also be a quick bolt-on to the existing code, but simpler?
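The equivalence argued above can be checked mechanically. The following is a minimal sketch, not the pandas source: build_new_key mirrors the f-string from the diff, while old_condition and simplified_condition are hypothetical names introduced only for this illustration.

```python
import itertools


def build_new_key(key_string: str, separator: str, key: str) -> str:
    # Mirrors the f-string used in the diff to build the flattened key.
    return f"{key_string}{separator}{key}"


def old_condition(key_string: str, separator: str, key: str) -> bool:
    # The condition as written in the first version of the patch.
    new_key = build_new_key(key_string, separator, key)
    return bool(key_string or new_key[: len(separator)] != separator)


def simplified_condition(key_string: str, separator: str, key: str) -> bool:
    # The proposed simplification: only look at key_string.
    return bool(key_string)


# A small grid of inputs, including keys that themselves start with the separator.
cases = list(
    itertools.product(["", "a", ".a", "a.b"], [".", "__", ""], ["x", ".x", ""])
)
all_agree = all(old_condition(*c) == simplified_condition(*c) for c in cases)
print(all_agree)
```

When key_string is empty, new_key always starts with the separator by construction, so the second half of the "or" can never be true on its own.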

Nice idea. We still support Python 3.8 though right? I think removeprefix was added in 3.9

I can also move this out of the argument list if that helps readability. I agree that even the way it was is less than desirable for readability.

True, but there's a 3.8 version of it in pandas, which is used in some places, e.g.

pandas/pandas/core/ops/common.py

Lines 56 to 64 in 4a5d77f

    if sys.version_info < (3, 9):
        from pandas.util._str_methods import (
            removeprefix,
            removesuffix,
        )

        stripped_name = removesuffix(removeprefix(name, "__"), "__")
    else:
        stripped_name = name.removeprefix("__").removesuffix("__")

If you write it like that (with the if sys.version_info < (3, 9): check) then pyupgrade will automatically only keep the 3.9+ version when pandas drops 3.8
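For readers without a pandas checkout, the guard pattern described above can be sketched with the standard library only. This is an illustration under assumptions: removeprefix_backport and removesuffix_backport are hypothetical stand-ins for the 3.8-compatible helpers and are not the code in pandas.util._str_methods; strip_dunder is likewise a made-up name.

```python
import sys


def removeprefix_backport(string: str, prefix: str) -> str:
    # Hypothetical stand-in mirroring str.removeprefix (Python 3.9+).
    if string.startswith(prefix):
        return string[len(prefix):]
    return string


def removesuffix_backport(string: str, suffix: str) -> str:
    # Hypothetical stand-in mirroring str.removesuffix (Python 3.9+).
    # The explicit emptiness check avoids string[:-0], which would return "".
    if suffix and string.endswith(suffix):
        return string[: -len(suffix)]
    return string


def strip_dunder(name: str) -> str:
    # The version check is written inline so that a static tool can see it
    # and drop the old branch once 3.8 is no longer supported.
    if sys.version_info < (3, 9):
        return removesuffix_backport(removeprefix_backport(name, "__"), "__")
    else:
        return name.removeprefix("__").removesuffix("__")


print(strip_dunder("__add__"))
```

The design point is that the check is spelled out as a literal comparison against sys.version_info rather than hidden behind a helper, which is what lets a purely static rewrite keep only the else branch later.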

Ah, very nice. Cool, let me do a refactor with this; it should make things cleaner.

MarcoGorelli · 2022-11-27T20:51:59Z

pandas/io/json/_normalize.py

@@ -148,13 +149,13 @@ def _normalise_json(
    if isinstance(data, dict):
        for key, value in data.items():
            new_key = f"{key_string}{separator}{key}"
+
+            if not key_string:
+                new_key = removeprefix(new_key, separator)


this is only available in python3.8 and under - this is on purpose, because it "forces" you to write it like

Suggested change

-                new_key = removeprefix(new_key, separator)
+                if sys.version_info < (3, 9):
+                    from pandas.util._str_methods import removeprefix
+
+                    new_key = removeprefix(new_key, separator)
+                else:
+                    new_key = new_key.removeprefix(separator)

and then when Python3.8 is dropped, pyupgrade will rewrite this automatically to only keep

new_key = new_key.removeprefix(separator)

(you can see what will happen with pyupgrade pandas/io/json/_normalize.py --py39-plus)

Ah, gotcha. Sorry, I misunderstood before; I thought that compat was handled directly in pandas.util._str_methods.

WillAyd · 2022-11-27T21:01:47Z

pandas/io/json/_normalize.py

@@ -21,6 +21,7 @@
    Scalar,
 )
 from pandas.util._decorators import deprecate
+from pandas.util._str_methods import removeprefix


Suggested change

from pandas.util._str_methods import removeprefix

MarcoGorelli

Looks good to me, thanks @WillAyd !

(as an aside, this could probably be rewritten better without recursion? I'll take a look when I get a chance)
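As a rough idea of what a non-recursive version might look like, here is a minimal sketch that flattens nested dicts with an explicit stack. It is an assumption-laden illustration, not a proposed patch: flatten_iterative is a hypothetical name, it handles only nested dicts, and it makes no promise about matching the key order or edge-case behaviour of _normalise_json.

```python
from __future__ import annotations

from typing import Any


def flatten_iterative(data: dict[str, Any], separator: str = ".") -> dict[str, Any]:
    flat: dict[str, Any] = {}
    # Each stack entry is (prefix, mapping). A prefix of None marks the top
    # level, so top-level keys are stored without a leading separator even
    # when the key itself happens to start with the separator.
    stack = [(None, data)]
    while stack:
        prefix, current = stack.pop()
        for key, value in current.items():
            new_key = str(key) if prefix is None else f"{prefix}{separator}{key}"
            if isinstance(value, dict):
                # Defer nested mappings instead of recursing into them.
                stack.append((new_key, value))
            else:
                flat[new_key] = value
    return flat


result = flatten_iterative({"a": 1, "b": {"c": 2, "d": {"e": 3}}})
print(result)
```

Tracking the top level with a sentinel rather than an empty string avoids the prefix-stripping step entirely, since no separator is ever prepended to a top-level key.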

mroeschke · 2022-11-28T18:26:06Z

pandas/io/json/_normalize.py

@@ -148,13 +149,18 @@ def _normalise_json(
    if isinstance(data, dict):
        for key, value in data.items():
            new_key = f"{key_string}{separator}{key}"
+
+            if not key_string:
+                if sys.version_info < (3, 9):


@MarcoGorelli if we use "if not PY310", where PY310 is from pandas.compat, would pyupgrade still flag this?

it wouldn't, no, pyupgrade just does static analysis (it wouldn't know what the symbol PY310 means) - in fact, I was kinda tempted to replace all the PY310 and other pandas.compat constants with sys.version_info checks, so we don't need to remember what to clean up when dropping versions each year

I would be open to this change.

mroeschke · 2022-11-28T20:04:14Z

Thanks @WillAyd

rhshadrach · 2022-12-24T20:17:26Z

This patch may have induced a potential regression. Please check the links below. If any ASVs are parameterized, the combinations of parameters for which a regression has been detected appear as subbullets. This is a partially automated message.

https://asv-runner.github.io/asv-collection/pandas/#io.json.NormalizeJSON.time_normalize_json
- orient='columns'; frame='df'
- orient='columns'; frame='df_date_idx'
- orient='columns'; frame='df_int_float_str'
- orient='columns'; frame='df_int_floats'
- orient='columns'; frame='df_td_int_ts'
- orient='index'; frame='df'
- orient='index'; frame='df_date_idx'
- orient='index'; frame='df_int_float_str'
- orient='index'; frame='df_int_floats'
- orient='index'; frame='df_td_int_ts'
- orient='records'; frame='df'
- orient='records'; frame='df_date_idx'
- orient='records'; frame='df_int_float_str'
- orient='records'; frame='df_int_floats'
- orient='records'; frame='df_td_int_ts'
- orient='split'; frame='df'
- orient='split'; frame='df_date_idx'
- orient='split'; frame='df_int_float_str'
- orient='split'; frame='df_int_floats'
- orient='split'; frame='df_td_int_ts'
- orient='values'; frame='df'
- orient='values'; frame='df_date_idx'
- orient='values'; frame='df_int_float_str'
- orient='values'; frame='df_int_floats'
- orient='values'; frame='df_td_int_ts'

WillAyd · 2023-01-03T18:19:15Z

@rhshadrach awesome bot. Will take a look - moving the import to the global space might help

WillAyd added 2 commits November 26, 2022 11:58

added failing test

f7fb9a6

fix + whatsnew

e1d66f1

MarcoGorelli reviewed Nov 26, 2022

WillAyd added 2 commits November 27, 2022 12:45

Refactor for readability

3f54cd3

Merge remote-tracking branch 'upstream/main' into json-fix-normalize

17da75e

MarcoGorelli reviewed Nov 27, 2022

WillAyd commented Nov 27, 2022

Better compat

4e3f0e7

MarcoGorelli approved these changes Nov 27, 2022

MarcoGorelli added the IO JSON read_json, to_json, json_normalize label Nov 27, 2022

MarcoGorelli added this to the 2.0 milestone Nov 27, 2022

mroeschke reviewed Nov 28, 2022

mroeschke approved these changes Nov 28, 2022

mroeschke merged commit cd58f3b into pandas-dev:main Nov 28, 2022

WillAyd deleted the json-fix-normalize branch December 24, 2022 22:05
