0% found this document useful (0 votes)
273 views17 pages

All Pandas Json - Normalize

The document discusses how to use Pandas' json_normalize() function to flatten JSON data into Pandas DataFrames for analysis. It covers flattening simple and nested JSON objects as well as lists of JSON objects. Examples are provided for flattening different JSON structures like those with multiple levels, nested lists, and missing keys. The max_level parameter is introduced to control how many nested levels to flatten.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
273 views17 pages

All Pandas Json - Normalize

The document discusses how to use Pandas' json_normalize() function to flatten JSON data into Pandas DataFrames for analysis. It covers flattening simple and nested JSON objects as well as lists of JSON objects. Examples are provided for flattening different JSON structures like those with multiple levels, nested lists, and missing keys. The max_level parameter is introduced to control how many nested levels to flatten.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 17

5/11/2021 All Pandas json_normalize() you should know for flattening JSON | by B.

Chen | Towards Data Science

Open in app

Follow 592K Followers

This is your last free member-only story this month. Upgrade for unlimited access.

All Pandas json_normalize() you should know


for flattening JSON
Some of the most useful Pandas tricks

B. Chen Feb 22 · 8 min read

All Pandas json_normalize() you should know for flattening JSON (Image by Author using canva.com)

https://towardsdatascience.com/all-pandas-json-normalize-you-should-know-for-flattening-json-13eae1dfb7dd 1/17
5/11/2021 All Pandas json_normalize() you should know for flattening JSON | by B. Chen | Towards Data Science

Reading data is the first step in any data science project. As a machine learning
practitioner or a data scientist, you would have surely come across JSON (JavaScript
Object Notation) data. JSON is a widely used format for storing and exchanging data.
For example, NoSQL database like MongoDB store the data in JSON format, and REST
API’s responses are mostly available in JSON.

Although this format works well for storing and exchanging data, it needs to be
converted into a tabular form for further analysis. You are likely to deal with 2 types of
JSON structure, a JSON object or a list of JSON objects. In internal Python lingo, you are
most likely to deal with a dict or a list of dicts.

A dictionary and a list of dictionaries (Image by author)

In this article, you’ll learn how to use Pandas’s built-in function json_normalize() to
flatten those 2 types of JSON into Pandas DataFrames. This article is structured as
follows:

1. Flattening a simple JSON

https://towardsdatascience.com/all-pandas-json-normalize-you-should-know-for-flattening-json-13eae1dfb7dd 2/17
5/11/2021 All Pandas json_normalize() you should know for flattening JSON | by B. Chen | Towards Data Science

2. Flattening a JSON with multiple levels

3. Flattening a JSON with a nested list

4. Ignoring KeyError if keys are not always present

5. Custom separator using sep

6. Adding prefix for meta and record data

7. Working with a local file

8. Working with a URL

Please check out Notebook for the source code.

1. Flattening a simple JSON


Let’s begin with 2 simple JSON, a simple dict and a list of simple dicts.

When the JSON is a simple dict

a_dict = {
'school': 'ABC primary school',
'location': 'London',
'ranking': 2,
}

df = pd.json_normalize(a_dict)

(image by author)

https://towardsdatascience.com/all-pandas-json-normalize-you-should-know-for-flattening-json-13eae1dfb7dd 3/17
5/11/2021 All Pandas json_normalize() you should know for flattening JSON | by B. Chen | Towards Data Science

The result looks great. Let’s take a look at the data types with df.info() . We can see

that columns that are numerical are cast to numeric types.

>>> df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 school 1 non-null object
1 location 1 non-null object
2 ranking 1 non-null int64
dtypes: int64(1), object(2)
memory usage: 152.0+ bytes

When the data is a list of dicts

json_list = [
{ 'class': 'Year 1', 'student number': 20, 'room': 'Yellow' },
{ 'class': 'Year 2', 'student number': 25, 'room': 'Blue' },
]

pd.json_normalize(json_list)

(image by author)

The result looks great. json_normalize() function is able to convert each record in the
list into a row of tabular form.

What about keys that are not always present, for example, num_of_students is not
available in the 2nd record.
https://towardsdatascience.com/all-pandas-json-normalize-you-should-know-for-flattening-json-13eae1dfb7dd 4/17
5/11/2021 All Pandas json_normalize() you should know for flattening JSON | by B. Chen | Towards Data Science

json_list = [
{ 'class': 'Year 1', 'num_of_students': 20, 'room': 'Yellow' },
{ 'class': 'Year 2', 'room': 'Blue' }, # no num_of_students
]

pd.json_normalize(json_list)

(image by author)

We can see that no error is thrown and those missing keys are shown as NaN .

2. Flattening a JSON with multiple levels


Pandas json_normalize() works great for simple JSON (known as flattened JSON).
What about JSON with multiple levels?

When the data is a dict


Let’s first take a look at the following dict:

json_obj = {
'school': 'ABC primary school',
'location': 'London',
'ranking': 2,
'info': {
'president': 'John Kasich',
'contacts': {
'email': {
'admission': '[email protected]',
'general': '[email protected]'
},
'tel': '123456789',
}
}
}

https://towardsdatascience.com/all-pandas-json-normalize-you-should-know-for-flattening-json-13eae1dfb7dd 5/17
5/11/2021 All Pandas json_normalize() you should know for flattening JSON | by B. Chen | Towards Data Science

The value of info is multiple levels (known as a nested dict). By calling


pd.json_normalize(json_obj) , we get:

The result looks great. All nested values are flattened and converted into separate
columns.

If you don’t want to dig all the way down to each value use the max_level argument.
With the argument max_level=1 , we can see that our nested value contacts is put up

into a single column info.contacts.

pd.json_normalize(data, max_level=1)

(image by author)

When the data is a list of dicts

json_list = [
{
'class': 'Year 1',
'student count': 20,
'room': 'Yellow',
'info': {
'teachers': {
'math': 'Rick Scott',
'physics': 'Elon Mask'
}
}
},
{
'class': 'Year 2',
'student count': 25,
'room': 'Blue',
'info': {

https://towardsdatascience.com/all-pandas-json-normalize-you-should-know-for-flattening-json-13eae1dfb7dd 6/17
5/11/2021 All Pandas json_normalize() you should know for flattening JSON | by B. Chen | Towards Data Science

'teachers': {
'math': 'Alan Turing',
'physics': 'Albert Einstein'
}
}
},
]

pd.json_normalize(json_list)

(image by author)

We can see that all nested values in each record of the list are flattened and converted
into separate columns. Similarly, we can use the max_level argument to limit the
number of levels, for example

pd.json_normalize(json_list, max_level=1)

(image by author)

3. Flattening JSON with a nested list


What about JSON with a nested list?

When the data is a dict


Let’s see how to flatten the following JSON into a DataFrame:

https://towardsdatascience.com/all-pandas-json-normalize-you-should-know-for-flattening-json-13eae1dfb7dd 7/17
5/11/2021 All Pandas json_normalize() you should know for flattening JSON | by B. Chen | Towards Data Science

json_obj = {
'school': 'ABC primary school',
'location': 'London',
'ranking': 2,
'info': {
'president': 'John Kasich',
'contacts': {
'email': {
'admission': '[email protected]',
'general': '[email protected]'
},
'tel': '123456789',
}
},
'students': [
{ 'name': 'Tom' },
{ 'name': 'James' },
{ 'name': 'Jacqueline' }
],
}

Notes the value of students is a nested list. By calling pd.json_normalize(json_obj) , we

get:

(image by author)

We can see that our nested list is put up into a single column students and other values
are flattened. How can we flatten the nested list? To do that, we can set the argument
record_path to ['students'] :

# Flatten students
pd.json_normalize(data, record_path=['students'])

https://towardsdatascience.com/all-pandas-json-normalize-you-should-know-for-flattening-json-13eae1dfb7dd 8/17
5/11/2021 All Pandas json_normalize() you should know for flattening JSON | by B. Chen | Towards Data Science

(image by author)

The result looks great but doesn’t include school and tel. To include them, we can use
the argument meta to specify a list of metadata we want in the result.

pd.json_normalize(
json_obj,
record_path =['students'],
meta=['school', ['info', 'contacts', 'tel']],
)

(image by author)

When the data is a list of dicts

json_list = [
{
'class': 'Year 1',
https://towardsdatascience.com/all-pandas-json-normalize-you-should-know-for-flattening-json-13eae1dfb7dd 9/17
5/11/2021 All Pandas json_normalize() you should know for flattening JSON | by B. Chen | Towards Data Science

'student count': 20,


'room': 'Yellow',
'info': {
'teachers': {
'math': 'Rick Scott',
'physics': 'Elon Mask'
}
},
'students': [
{
'name': 'Tom',
'sex': 'M',
'grades': { 'math': 66, 'physics': 77 }
},
{
'name': 'James',
'sex': 'M',
'grades': { 'math': 80, 'physics': 78 }
},
]
},
{
'class': 'Year 2',
'student count': 25,
'room': 'Blue',
'info': {
'teachers': {
'math': 'Alan Turing',
'physics': 'Albert Einstein'
}
},
'students': [
{ 'name': 'Tony', 'sex': 'M' },
{ 'name': 'Jacqueline', 'sex': 'F' },
]
},
]

pd.json_normalize(json_list)

(image by author)

All nested lists are put up into a single column students and other values are flattened.
To flatten the nested list, we can set the argument record_path to ['students'] . Notices

https://towardsdatascience.com/all-pandas-json-normalize-you-should-know-for-flattening-json-13eae1dfb7dd 10/17
5/11/2021 All Pandas json_normalize() you should know for flattening JSON | by B. Chen | Towards Data Science

that not all records have math and physics, and those missing values are shown as NaN .

pd.json_normalize(json_list, record_path=['students'])

(image by author)

If you would like to include other metadata use the argument meta :

pd.json_normalize(
json_list,
record_path =['students'],
meta=['class', 'room', ['info', 'teachers', 'math']]
)

(image by author)

4. The errors argument

https://towardsdatascience.com/all-pandas-json-normalize-you-should-know-for-flattening-json-13eae1dfb7dd 11/17
5/11/2021 All Pandas json_normalize() you should know for flattening JSON | by B. Chen | Towards Data Science

The errors argument default to 'raise’ and will raise KeyError if keys listed in meta

are not always present. For example, the math teacher is not available from the second
record.

data = [
{
'class': 'Year 1',
'student count': 20,
'room': 'Yellow',
'info': {
'teachers': {
'math': 'Rick Scott',
'physics': 'Elon Mask',
}
},
'students': [
{ 'name': 'Tom', 'sex': 'M' },
{ 'name': 'James', 'sex': 'M' },
]
},
{
'class': 'Year 2',
'student count': 25,
'room': 'Blue',
'info': {
'teachers': {
# no math teacher
'physics': 'Albert Einstein'
}
},
'students': [
{ 'name': 'Tony', 'sex': 'M' },
{ 'name': 'Jacqueline', 'sex': 'F' },
]
},
]

A KeyError will be thrown when trying to flatten the math.

pd.json_normalize(
data,
record_path =['students'],
meta=['class', 'room', ['info', 'teachers', 'math']],
)

https://towardsdatascience.com/all-pandas-json-normalize-you-should-know-for-flattening-json-13eae1dfb7dd 12/17
5/11/2021 All Pandas json_normalize() you should know for flattening JSON | by B. Chen | Towards Data Science

(image by author)

To work around it, set the argument errors to 'ignore' and those missing values are
filled with NaN .

pd.json_normalize(
data,
record_path =['students'],
meta=['class', 'room', ['info', 'teachers', 'math']],
errors='ignore'
)

(image by author)

5. Custom Separator using the sep argument

https://towardsdatascience.com/all-pandas-json-normalize-you-should-know-for-flattening-json-13eae1dfb7dd 13/17
5/11/2021 All Pandas json_normalize() you should know for flattening JSON | by B. Chen | Towards Data Science

By default, all nested values will generate column names separated by . . For example

info.teachers.math. To separate column names with something else, you can use the
sep argument.

pd.json_normalize(
data,
record_path =['students'],
meta=['class', 'room', ['info', 'teachers', 'math']],
sep='->'
)

(image by author)

6. Adding prefix for meta and record data


Sometimes, it may be more descriptive to add prefixes for the column names. To do that
for the meta and record_path , we can simply pass the string to the argument

meta_prefix and record_prefix respectively:

pd.json_normalize(
data,
record_path=['students'],
meta=['class'],
meta_prefix='meta-',
record_prefix='student-'
)

https://towardsdatascience.com/all-pandas-json-normalize-you-should-know-for-flattening-json-13eae1dfb7dd 14/17
5/11/2021 All Pandas json_normalize() you should know for flattening JSON | by B. Chen | Towards Data Science

(image by author)

7. Working with a local file


Often, the JSON data you will be working on is stored locally as a .json file. However,
Pandas json_normalize() function only accepts a dict or a list of dicts. To work around it,
you need help from a 3rd module, for example, the Python json module:

import json
# load data using Python JSON module
with open('data/simple.json','r') as f:
data = json.loads(f.read())

# Flattening JSON data


pd.json_normalize(data)

data = json.loads(f.read()) loads data using Python json module. After that,
json_normalize() is called on the data to flatten it into a DataFrame.

8. Working with a URL


JSON is a standard format for transferring data in REST APIs. Often, you need to work
with API’s response in JSON format. The simplest way to do that is using the Python
request modules:

import requests

URL = 'http://raw.githubusercontent.com/BindiChen/machine-
learning/master/data-analysis/027-pandas-convert-
json/data/simple.json'
data = json.loads(requests.get(URL).text)

# Flattening JSON data


pd.json_normalize(data)

Conclusion
https://towardsdatascience.com/all-pandas-json-normalize-you-should-know-for-flattening-json-13eae1dfb7dd 15/17
5/11/2021 All Pandas json_normalize() you should know for flattening JSON | by B. Chen | Towards Data Science

Pandas json_normalize() function is a quick, convenient, and powerful way for


flattening JSON into a DataFrame.

I hope this article will help you to save time in flattening JSON data. I recommend you to
check out the documentation for the json_normalize() API and to know about other
things you can do.

Thanks for reading. Please check out the notebook for the source code and stay tuned if
you are interested in the practical aspect of machine learning.

You may be interested in some of my other Pandas articles:


Pandas cut() function for transforming numerical data into categorical data

Using Pandas method chaining to improve code readability

How to do a Custom Sort on Pandas DataFrame

All the Pandas shift() you should know for data analysis

When to use Pandas transform() function

Pandas concat() tricks you should know

Difference between apply() and transform() in Pandas

All the Pandas merge() you should know

Working with datetime in Pandas DataFrame

Pandas read_csv() tricks you should know

4 tricks you should know to parse date columns with Pandas read_csv()

More tutorials can be found on my Github

Sign up for The Variable


By Towards Data Science

Every Thursday, the Variable delivers the very best of Towards Data Science: from hands-on tutorials
and cutting-edge research to original features you don't want to miss. Take a look.

https://towardsdatascience.com/all-pandas-json-normalize-you-should-know-for-flattening-json-13eae1dfb7dd 16/17
5/11/2021 All Pandas json_normalize() you should know for flattening JSON | by B. Chen | Towards Data Science

Emails will be sent to [email protected].


Get this newsletter
Not you?

Python Pandas Data Science Json Data Analysis

About Help Legal

Get the Medium app

https://towardsdatascience.com/all-pandas-json-normalize-you-should-know-for-flattening-json-13eae1dfb7dd 17/17

You might also like