
5/11/2021 All Pandas json_normalize() you should know for flattening JSON | by B. Chen | Towards Data Science


All Pandas json_normalize() you should know for flattening JSON
Some of the most useful Pandas tricks

B. Chen Feb 22 · 8 min read

All Pandas json_normalize() you should know for flattening JSON (Image by Author using canva.com)

https://towardsdatascience.com/all-pandas-json-normalize-you-should-know-for-flattening-json-13eae1dfb7dd

Reading data is the first step in any data science project. As a machine learning practitioner or a data scientist, you have surely come across JSON (JavaScript Object Notation) data. JSON is a widely used format for storing and exchanging data. For example, NoSQL databases like MongoDB store data in a JSON-like format, and REST API responses are mostly returned as JSON.

Although this format works well for storing and exchanging data, it needs to be converted into a tabular form for further analysis. You are likely to deal with two types of JSON structure: a JSON object or a list of JSON objects. In Python terms, that is most likely a dict or a list of dicts.

A dictionary and a list of dictionaries (Image by author)

In this article, you'll learn how to use Pandas' built-in function json_normalize() to flatten these two types of JSON into Pandas DataFrames. This article is structured as follows:

1. Flattening a simple JSON


2. Flattening a JSON with multiple levels

3. Flattening a JSON with a nested list

4. Ignoring KeyError if keys are not always present

5. Custom separator using sep

6. Adding prefix for meta and record data

7. Working with a local file

8. Working with a URL

Please check out the Notebook for the source code.

1. Flattening a simple JSON

Let's begin with two simple JSON structures: a simple dict and a list of simple dicts.

When the JSON is a simple dict

```python
import pandas as pd

a_dict = {
    'school': 'ABC primary school',
    'location': 'London',
    'ranking': 2,
}

df = pd.json_normalize(a_dict)
```

(image by author)


The result looks great. Let's take a look at the data types with df.info(). We can see that numerical columns are cast to numeric types: ranking is stored as int64.

```
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   school    1 non-null      object
 1   location  1 non-null      object
 2   ranking   1 non-null      int64
dtypes: int64(1), object(2)
memory usage: 152.0+ bytes
```
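To check the inferred types programmatically rather than reading the df.info() printout, you can inspect the shape and dtypes directly. A minimal sketch, assuming pandas 1.0 or later (where json_normalize is exposed as pd.json_normalize):

```python
import pandas as pd

a_dict = {
    'school': 'ABC primary school',
    'location': 'London',
    'ranking': 2,
}

df = pd.json_normalize(a_dict)

# A single dict becomes a single row, one column per key
print(df.shape)  # (1, 3)

# The integer value is inferred as an integer column
print(pd.api.types.is_integer_dtype(df['ranking']))  # True
print(df.dtypes)
```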

When the data is a list of dicts

```python
json_list = [
    {'class': 'Year 1', 'student number': 20, 'room': 'Yellow'},
    {'class': 'Year 2', 'student number': 25, 'room': 'Blue'},
]

pd.json_normalize(json_list)
```

(image by author)
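A quick way to confirm the one-row-per-record behavior is to check the shape and one column of the result. A minimal sketch reusing json_list from above:

```python
import pandas as pd

json_list = [
    {'class': 'Year 1', 'student number': 20, 'room': 'Yellow'},
    {'class': 'Year 2', 'student number': 25, 'room': 'Blue'},
]

df = pd.json_normalize(json_list)

# One row per dict in the list; the keys become the column names
print(df.shape)             # (2, 3)
print(df['room'].tolist())  # ['Yellow', 'Blue']
```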

The result looks great. The json_normalize() function converts each record in the list into a row of the table.

What about keys that are not always present? For example, num_of_students is not available in the second record:

```python
json_list = [
    {'class': 'Year 1', 'num_of_students': 20, 'room': 'Yellow'},
    {'class': 'Year 2', 'room': 'Blue'},  # no num_of_students
]

pd.json_normalize(json_list)
```

(image by author)

We can see that no error is thrown and the missing keys are shown as NaN.
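One side effect worth knowing: because NaN is a float, a column of integers with a missing entry is upcast to float64. A minimal sketch with the same json_list:

```python
import pandas as pd

json_list = [
    {'class': 'Year 1', 'num_of_students': 20, 'room': 'Yellow'},
    {'class': 'Year 2', 'room': 'Blue'},  # no num_of_students
]

df = pd.json_normalize(json_list)

# The missing value is filled with NaN ...
print(df['num_of_students'].isna().tolist())  # [False, True]
# ... which forces the integer column to a float dtype
print(df['num_of_students'].dtype)            # float64
```

If you need integers back, you can convert the column afterwards, for example with a nullable integer dtype such as 'Int64'.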

2. Flattening a JSON with multiple levels

Pandas json_normalize() works great for simple JSON (known as flattened JSON).
What about JSON with multiple levels?

When the data is a dict

Let’s first take a look at the following dict:

```python
json_obj = {
    'school': 'ABC primary school',
    'location': 'London',
    'ranking': 2,
    'info': {
        'president': 'John Kasich',
        'contacts': {
            'email': {
                'admission': '[email protected]',
                'general': '[email protected]'
            },
            'tel': '123456789',
        }
    }
}
```


The value of info spans multiple levels (known as a nested dict). Let's flatten it by calling pd.json_normalize(json_obj).

The result looks great. All nested values are flattened and converted into separate
columns.
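To see exactly which columns are produced, you can list them. A minimal sketch using the same structure as json_obj, with placeholder email addresses (admission@example.com and info@example.com are hypothetical values):

```python
import pandas as pd

json_obj = {
    'school': 'ABC primary school',
    'location': 'London',
    'ranking': 2,
    'info': {
        'president': 'John Kasich',
        'contacts': {
            'email': {
                'admission': 'admission@example.com',  # placeholder
                'general': 'info@example.com',         # placeholder
            },
            'tel': '123456789',
        },
    },
}

df = pd.json_normalize(json_obj)

# Nested keys are joined with '.' to build the column names
print(df.columns.tolist())
# ['school', 'location', 'ranking', 'info.president',
#  'info.contacts.email.admission', 'info.contacts.email.general',
#  'info.contacts.tel']
```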

If you don't want to dig all the way down to each value, use the max_level argument. With max_level=1, we can see that the nested value contacts is kept as a single column, info.contacts.

```python
pd.json_normalize(json_obj, max_level=1)
```

(image by author)
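To confirm which columns max_level=1 produces, and that the deeper level is kept as a dict rather than flattened, here is a minimal sketch on the same structure (the email addresses are hypothetical placeholders):

```python
import pandas as pd

json_obj = {
    'school': 'ABC primary school',
    'location': 'London',
    'ranking': 2,
    'info': {
        'president': 'John Kasich',
        'contacts': {
            'email': {
                'admission': 'admission@example.com',  # placeholder
                'general': 'info@example.com',         # placeholder
            },
            'tel': '123456789',
        },
    },
}

df = pd.json_normalize(json_obj, max_level=1)

# Only one level below the top is flattened
print(df.columns.tolist())
# ['school', 'location', 'ranking', 'info.president', 'info.contacts']

# The deeper structure survives as a plain dict inside the cell
contacts = df.loc[0, 'info.contacts']
print(contacts['tel'])  # 123456789
```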

When the data is a list of dicts

```python
json_list = [
    {
        'class': 'Year 1',
        'student count': 20,
        'room': 'Yellow',
        'info': {
            'teachers': {
                'math': 'Rick Scott',
                'physics': 'Elon Mask'
            }
        }
    },
    {
        'class': 'Year 2',
        'student count': 25,
        'room': 'Blue',
        'info': {
            'teachers': {
                'math': 'Alan Turing',
                'physics': 'Albert Einstein'
            }
        }
    },
]

pd.json_normalize(json_list)
```

(image by author)

We can see that all nested values in each record of the list are flattened and converted into separate columns. Similarly, we can use the max_level argument to limit the number of levels, for example:

```python
pd.json_normalize(json_list, max_level=1)
```

(image by author)
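The same rule applies per record in a list: with max_level=1, info is opened one level and teachers stays as a dict in each row. A minimal sketch reusing json_list from above:

```python
import pandas as pd

json_list = [
    {
        'class': 'Year 1',
        'student count': 20,
        'room': 'Yellow',
        'info': {'teachers': {'math': 'Rick Scott', 'physics': 'Elon Mask'}},
    },
    {
        'class': 'Year 2',
        'student count': 25,
        'room': 'Blue',
        'info': {'teachers': {'math': 'Alan Turing', 'physics': 'Albert Einstein'}},
    },
]

df = pd.json_normalize(json_list, max_level=1)

print(df.columns.tolist())  # ['class', 'student count', 'room', 'info.teachers']

# Each cell of info.teachers still holds the nested dict
print(df.loc[0, 'info.teachers'])  # {'math': 'Rick Scott', 'physics': 'Elon Mask'}
```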

3. Flattening JSON with a nested list

What about JSON with a nested list?

When the data is a dict

Let’s see how to flatten the following JSON into a DataFrame:


Note that the value of students is a nested list. By calling pd.json_normalize(json_obj), we get:

(image by author)

We can see that the nested list is kept as a single column, students, while the other values are flattened. How can we flatten the nested list? To do that, set the argument record_path to ['students']:

```python
# Flatten students
pd.json_normalize(json_obj, record_path=['students'])
```


(image by author)

The result looks great but doesn’t include school and tel. To include them, we can use
the argument meta to specify a list of metadata we want in the result.

```python
pd.json_normalize(
    json_obj,
    record_path=['students'],
    meta=['school', ['info', 'contacts', 'tel']],
)
```

(image by author)
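A minimal, self-contained sketch of record_path and meta on a dict, using a hypothetical input: the school dict from section 2 (trimmed) plus an assumed students list with illustrative names:

```python
import pandas as pd

# Hypothetical input: trimmed school dict plus an assumed 'students' list
json_obj = {
    'school': 'ABC primary school',
    'location': 'London',
    'ranking': 2,
    'info': {
        'president': 'John Kasich',
        'contacts': {'tel': '123456789'},
    },
    'students': [
        {'name': 'Tom'},
        {'name': 'James'},
        {'name': 'Jacqueline'},
    ],
}

# Without record_path, the list stays intact inside a single 'students' cell
flat = pd.json_normalize(json_obj)
print(flat.loc[0, 'students'])

# With record_path, each list element becomes a row;
# meta copies the listed parent fields onto every row
df = pd.json_normalize(
    json_obj,
    record_path=['students'],
    meta=['school', ['info', 'contacts', 'tel']],
)
print(df.columns.tolist())  # ['name', 'school', 'info.contacts.tel']
print(len(df))              # 3
```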

When the data is a list of dicts

```python
json_list = [
    {
        'class': 'Year 1',
        'student count': 20,
        'room': 'Yellow',
        'info': {
            'teachers': {
                'math': 'Rick Scott',
                'physics': 'Elon Mask'
            }
        },
        'students': [
            {
                'name': 'Tom',
                'sex': 'M',
                'grades': {'math': 66, 'physics': 77}
            },
            {
                'name': 'James',
                'sex': 'M',
                'grades': {'math': 80, 'physics': 78}
            },
        ]
    },
    {
        'class': 'Year 2',
        'student count': 25,
        'room': 'Blue',
        'info': {
            'teachers': {
                'math': 'Alan Turing',
                'physics': 'Albert Einstein'
            }
        },
        'students': [
            {'name': 'Tony', 'sex': 'M'},
            {'name': 'Jacqueline', 'sex': 'F'},
        ]
    },
]

pd.json_normalize(json_list)
```

(image by author)

All nested lists are kept as a single column, students, and the other values are flattened. To flatten the nested list, set the argument record_path to ['students']. Notice that not all records have math and physics grades, and those missing values are shown as NaN.

```python
pd.json_normalize(json_list, record_path=['students'])
```

(image by author)
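To see the NaN filling concretely, here is a minimal sketch on a trimmed copy of json_list (the info field is omitted because record_path ignores it anyway):

```python
import pandas as pd

# Trimmed copy of json_list above; the Year 2 students have no grades
json_list = [
    {
        'class': 'Year 1',
        'room': 'Yellow',
        'students': [
            {'name': 'Tom', 'sex': 'M', 'grades': {'math': 66, 'physics': 77}},
            {'name': 'James', 'sex': 'M', 'grades': {'math': 80, 'physics': 78}},
        ],
    },
    {
        'class': 'Year 2',
        'room': 'Blue',
        'students': [
            {'name': 'Tony', 'sex': 'M'},
            {'name': 'Jacqueline', 'sex': 'F'},
        ],
    },
]

df = pd.json_normalize(json_list, record_path=['students'])

# One row per student; nested grades are flattened with the default '.' separator
print(df.columns.tolist())  # ['name', 'sex', 'grades.math', 'grades.physics']
# Students without grades get NaN
print(df['grades.math'].tolist())  # [66.0, 80.0, nan, nan]
```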

If you would like to include other metadata, use the argument meta:

```python
pd.json_normalize(
    json_list,
    record_path=['students'],
    meta=['class', 'room', ['info', 'teachers', 'math']]
)
```

(image by author)
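The meta values are repeated for every row coming from the same parent record, and the meta columns are placed after the record columns. A minimal sketch on a trimmed copy of json_list (grades and physics teachers omitted):

```python
import pandas as pd

# Trimmed copy of json_list above
json_list = [
    {
        'class': 'Year 1',
        'room': 'Yellow',
        'info': {'teachers': {'math': 'Rick Scott'}},
        'students': [{'name': 'Tom', 'sex': 'M'}, {'name': 'James', 'sex': 'M'}],
    },
    {
        'class': 'Year 2',
        'room': 'Blue',
        'info': {'teachers': {'math': 'Alan Turing'}},
        'students': [{'name': 'Tony', 'sex': 'M'}, {'name': 'Jacqueline', 'sex': 'F'}],
    },
]

df = pd.json_normalize(
    json_list,
    record_path=['students'],
    meta=['class', 'room', ['info', 'teachers', 'math']],
)

# Record fields come first, then the meta columns repeated per student
print(df.columns.tolist())
# ['name', 'sex', 'class', 'room', 'info.teachers.math']
print(df['class'].tolist())  # ['Year 1', 'Year 1', 'Year 2', 'Year 2']
```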

4. The errors argument


The errors argument defaults to 'raise', which raises a KeyError if keys listed in meta are not always present. For example, the math teacher is not available in the second record.

```python
data = [
    {
        'class': 'Year 1',
        'student count': 20,
        'room': 'Yellow',
        'info': {
            'teachers': {
                'math': 'Rick Scott',
                'physics': 'Elon Mask',
            }
        },
        'students': [
            {'name': 'Tom', 'sex': 'M'},
            {'name': 'James', 'sex': 'M'},
        ]
    },
    {
        'class': 'Year 2',
        'student count': 25,
        'room': 'Blue',
        'info': {
            'teachers': {
                # no math teacher
                'physics': 'Albert Einstein'
            }
        },
        'students': [
            {'name': 'Tony', 'sex': 'M'},
            {'name': 'Jacqueline', 'sex': 'F'},
        ]
    },
]
```

A KeyError is raised when trying to flatten info.teachers.math:

```python
pd.json_normalize(
    data,
    record_path=['students'],
    meta=['class', 'room', ['info', 'teachers', 'math']],
)
```


(image by author)

To work around it, set the argument errors to 'ignore'; the missing values are then filled with NaN.

```python
pd.json_normalize(
    data,
    record_path=['students'],
    meta=['class', 'room', ['info', 'teachers', 'math']],
    errors='ignore'
)
```

(image by author)
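Both behaviors can be checked side by side. A minimal sketch on a trimmed, hypothetical copy of data in which Year 2 has no math teacher:

```python
import pandas as pd

# Trimmed copy of data above: Year 2 has no math teacher
data = [
    {
        'class': 'Year 1',
        'info': {'teachers': {'math': 'Rick Scott'}},
        'students': [{'name': 'Tom'}, {'name': 'James'}],
    },
    {
        'class': 'Year 2',
        'info': {'teachers': {}},  # no math teacher
        'students': [{'name': 'Tony'}, {'name': 'Jacqueline'}],
    },
]
meta = ['class', ['info', 'teachers', 'math']]

# Default errors='raise': the missing 'math' key raises a KeyError
raised = False
try:
    pd.json_normalize(data, record_path=['students'], meta=meta)
except KeyError as exc:
    raised = True
    print('KeyError:', exc)

# errors='ignore': the missing meta value becomes NaN instead
df = pd.json_normalize(data, record_path=['students'], meta=meta, errors='ignore')
print(df['info.teachers.math'].tolist())
```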

5. Custom Separator using the sep argument


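By default, nested keys are joined with . to build the column names, for example, info.teachers.math. To separate column names with something else, use the sep argument. Because the second record in data has no math teacher, errors='ignore' is also needed here; otherwise this call raises a KeyError.

```python
pd.json_normalize(
    data,
    record_path=['students'],
    meta=['class', 'room', ['info', 'teachers', 'math']],
    errors='ignore',
    sep='->'
)
```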

(image by author)
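The separator applies both to flattened record fields and to meta paths. A minimal sketch with a single hypothetical record and sep='_':

```python
import pandas as pd

# Hypothetical single record with nested grades and a nested meta path
json_list = [
    {
        'class': 'Year 1',
        'info': {'teachers': {'math': 'Rick Scott'}},
        'students': [{'name': 'Tom', 'grades': {'math': 66, 'physics': 77}}],
    },
]

df = pd.json_normalize(
    json_list,
    record_path=['students'],
    meta=['class', ['info', 'teachers', 'math']],
    sep='_',
)

# Both the record columns and the meta columns use '_' instead of '.'
print(df.columns.tolist())
# ['name', 'grades_math', 'grades_physics', 'class', 'info_teachers_math']
```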

6. Adding prefix for meta and record data

Sometimes, it may be more descriptive to add prefixes to the column names. To do that for the meta and record_path columns, pass a string to the arguments meta_prefix and record_prefix, respectively:

```python
pd.json_normalize(
    data,
    record_path=['students'],
    meta=['class'],
    meta_prefix='meta-',
    record_prefix='student-'
)
```


(image by author)
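A minimal sketch of the resulting column names, on a trimmed copy of data that keeps only class and the students list:

```python
import pandas as pd

# Trimmed copy of data above
data = [
    {'class': 'Year 1',
     'students': [{'name': 'Tom', 'sex': 'M'}, {'name': 'James', 'sex': 'M'}]},
    {'class': 'Year 2',
     'students': [{'name': 'Tony', 'sex': 'M'}, {'name': 'Jacqueline', 'sex': 'F'}]},
]

df = pd.json_normalize(
    data,
    record_path=['students'],
    meta=['class'],
    meta_prefix='meta-',
    record_prefix='student-',
)

# record_prefix goes on the record columns, meta_prefix on the meta columns
print(df.columns.tolist())  # ['student-name', 'student-sex', 'meta-class']
```

Besides readability, prefixes matter when a meta key has the same name as a record field: without a distinguishing prefix, pandas raises a ValueError about conflicting metadata names.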

7. Working with a local file

Often, the JSON data you will be working on is stored locally as a .json file. However, the Pandas json_normalize() function works on Python objects (a dict or a list of dicts), not on file paths. To work around this, you need help from another module, for example, Python's built-in json module:

```python
import json

import pandas as pd

# Load data using the Python json module
with open('data/simple.json', 'r') as f:
    data = json.load(f)

# Flattening JSON data
pd.json_normalize(data)
```

json.load(f) parses the file contents into Python objects using the json module. After that, json_normalize() is called on the data to flatten it into a DataFrame.
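To try the file workflow without an existing data/simple.json, the sketch below writes some hypothetical records to a temporary file first, then loads and flattens them in the same way:

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample records, written to a temporary file for the demo
records = [
    {'class': 'Year 1', 'room': 'Yellow', 'info': {'teachers': {'math': 'Rick Scott'}}},
    {'class': 'Year 2', 'room': 'Blue', 'info': {'teachers': {'math': 'Alan Turing'}}},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / 'simple.json'
    path.write_text(json.dumps(records), encoding='utf-8')

    # Load the file with the json module, then flatten it
    with open(path, 'r', encoding='utf-8') as f:
        data = json.load(f)

df = pd.json_normalize(data)
print(df.columns.tolist())  # ['class', 'room', 'info.teachers.math']
```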

8. Working with a URL

JSON is a standard format for transferring data in REST APIs. Often, you need to work with an API's response in JSON format. The simplest way to do that is to use the Python requests module:

```python
import pandas as pd
import requests

URL = 'https://raw.githubusercontent.com/BindiChen/machine-learning/master/data-analysis/027-pandas-convert-json/data/simple.json'

response = requests.get(URL)
response.raise_for_status()  # fail early on HTTP errors
data = response.json()       # parse the JSON body, same as json.loads(response.text)

# Flattening JSON data
pd.json_normalize(data)
```

Conclusion

The Pandas json_normalize() function is a quick, convenient, and powerful way to flatten JSON into a DataFrame.

I hope this article helps you save time when flattening JSON data. I recommend checking out the documentation for the json_normalize() API to learn about other things you can do.

Thanks for reading. Please check out the notebook for the source code and stay tuned if
you are interested in the practical aspect of machine learning.

You may be interested in some of my other Pandas articles:

Pandas cut() function for transforming numerical data into categorical data

Using Pandas method chaining to improve code readability

How to do a Custom Sort on Pandas DataFrame

All the Pandas shift() you should know for data analysis

When to use Pandas transform() function

Pandas concat() tricks you should know

Difference between apply() and transform() in Pandas

All the Pandas merge() you should know

Working with datetime in Pandas DataFrame

Pandas read_csv() tricks you should know

4 tricks you should know to parse date columns with Pandas read_csv()
