Feature branch for issue #19129 (read_json with orient='table' and numeric column names) #60945

Open · wants to merge 1 commit into main
1 change: 1 addition & 0 deletions doc/source/whatsnew/v3.0.0.rst
@@ -63,6 +63,7 @@ Other enhancements
- :meth:`DataFrame.plot.scatter` argument ``c`` now accepts a column of strings, where rows with the same string are colored identically (:issue:`16827` and :issue:`16485`)
- :class:`DataFrameGroupBy` and :class:`SeriesGroupBy` methods ``sum``, ``mean``, ``median``, ``prod``, ``min``, ``max``, ``std``, ``var`` and ``sem`` now accept ``skipna`` parameter (:issue:`15675`)
- :class:`Rolling` and :class:`Expanding` now support aggregations ``first`` and ``last`` (:issue:`33155`)
- :func:`read_json` with ``orient="table"`` now correctly restores non-string column names when reading JSON data, ensuring that column names retain their original types as specified in the schema (:issue:`19129`).
- :func:`read_parquet` accepts ``to_pandas_kwargs`` which are forwarded to :meth:`pyarrow.Table.to_pandas` which enables passing additional keywords to customize the conversion to pandas, such as ``maps_as_pydicts`` to read the Parquet map data type as python dictionaries (:issue:`56842`)
- :meth:`.DataFrameGroupBy.transform`, :meth:`.SeriesGroupBy.transform`, :meth:`.DataFrameGroupBy.agg`, :meth:`.SeriesGroupBy.agg`, :meth:`.SeriesGroupBy.apply`, :meth:`.DataFrameGroupBy.apply` now support ``kurt`` (:issue:`40139`)
- :meth:`DataFrameGroupBy.transform`, :meth:`SeriesGroupBy.transform`, :meth:`DataFrameGroupBy.agg`, :meth:`SeriesGroupBy.agg`, :meth:`RollingGroupby.apply`, :meth:`ExpandingGroupby.apply`, :meth:`Rolling.apply`, :meth:`Expanding.apply`, :meth:`DataFrame.apply` with ``engine="numba"`` now supports positional arguments passed as kwargs (:issue:`58995`)
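A minimal sketch of the behaviour the new whatsnew entry describes: with this change, a DataFrame whose column labels are integers should round-trip through ``to_json(orient="table")`` and ``read_json(orient="table")`` with the labels' original types preserved. The expected output shown is an assumption based on the entry above, not a verified run.

```python
from io import StringIO

import pandas as pd

# DataFrame with integer column labels, the case from issue #19129.
df = pd.DataFrame([[1, 2.5], [3, 4.5]], columns=[1, 2])

# Round-trip through the Table Schema orient.
json_str = df.to_json(orient="table")
result = pd.read_json(StringIO(json_str), orient="table")

# With the fix, the labels should come back as integers, not the strings "1"/"2".
print(result.columns.tolist())  # expected: [1, 2]
```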
25 changes: 23 additions & 2 deletions pandas/io/json/_table_schema.py
@@ -366,17 +366,29 @@ def parse_table_schema(json, precise_float: bool) -> DataFrame:
:class:`Index` name of 'index' and :class:`MultiIndex` names starting
with 'level_' are not supported.

To handle cases where column names are non-string types (e.g., integers),
all column names are first converted to strings when constructing the DataFrame.
After applying the correct data types using `astype(dtypes)`, the column names
are restored to their original types as specified in the schema.
This ensures compatibility with `to_json(orient="table")` while maintaining
the integrity of non-string column names.

See Also
--------
build_table_schema : Inverse function.
pandas.read_json
"""
table = ujson_loads(json, precise_float=precise_float)
col_order = [
    field["name"] if isinstance(field["name"], str) else str(field["name"])
    for field in table["schema"]["fields"]
]
df = DataFrame(table["data"], columns=col_order)[col_order]

dtypes = {
    field["name"]
    if isinstance(field["name"], str)
    else str(field["name"]): convert_json_field_to_pandas_type(field)
    for field in table["schema"]["fields"]
}

@@ -388,6 +400,15 @@ def parse_table_schema(json, precise_float: bool) -> DataFrame:

df = df.astype(dtypes)

# Convert column names back to their original types
original_types = {
str(field["name"])
if not isinstance(field["name"], str)
else field["name"]: field["name"]
for field in table["schema"]["fields"]
}
df.columns = [original_types[col] for col in df.columns]

if "primaryKey" in table["schema"]:
    df = df.set_index(table["schema"]["primaryKey"])
    if len(df.index.names) == 1:
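To make the stringify-then-restore step explained in the docstring easier to follow, here is a standalone sketch of the same idea outside pandas internals. The ``fields`` and ``data`` values are illustrative, not part of any pandas API, and the unconditional ``str(...)`` is a simplification of the conditional used in the patch.

```python
import pandas as pd

# Illustrative Table Schema fragment: one integer-named field plus the index.
fields = [{"name": "index", "type": "integer"}, {"name": 1, "type": "number"}]
# Data records as parsed from JSON: object keys are always strings.
data = [{"index": 0, "1": 2.5}, {"index": 1, "1": 3.5}]

# 1) Select columns by their string form, because the JSON record keys are strings.
col_order = [str(f["name"]) for f in fields]
df = pd.DataFrame(data, columns=col_order)

# 2) Map each stringified name back to the original label from the schema.
original = {str(f["name"]): f["name"] for f in fields}
df.columns = [original[c] for c in df.columns]

print(df.columns.tolist())  # ['index', 1]
```

In the patch itself, ``df.astype(dtypes)`` runs between these two steps, which is why the ``dtypes`` mapping is keyed by the stringified names and the original labels are only restored afterwards.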