0% found this document useful (0 votes)

283 views3 pages

JSON A Panda Python

The document discusses how to read JSON data directly into a pandas DataFrame using the read_json() function. It provides an example of reading GitHub issue data into a DataFrame and summarizing the columns. Additionally, it demonstrates how to export a DataFrame back to JSON using the to_json() method and various output formats.

Uploaded by

Parra Victor

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

283 views3 pages

JSON A Panda Python

Uploaded by

Parra Victor

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

Reading json directly into pandas

[Link]

Andy Hayden
Reading json directly into pandas
12 Jun 2013
New to pandas 0.12 release, is a read_json function (which uses the speedy ujson under the hood).
Andy Hayden

It's as easy as whacking in the path/url/string of a valid json:

Python Data
Hacker

In [1]: df = pd.read_json('[Link]

resume

Let's inspect a few columns to see how we've done:

In [2]: df[['created_at', 'title', 'body', 'comments']]
Out[2]:
created_at
0 2013-06-12 [Link]

title
DOC add to_datetime to [Link]

1 2013-06-12 [Link]

ci/after_script.sh missing?

2 2013-06-11 [Link]

ENH Prefer requests over urllib2

3 2013-06-11 [Link]
4 2013-06-11 [Link]

Nothing in docs about [Link]

DOC: Clarify quote behavior parameters

Either I'm being thick or `to_da

[Link] the moment we
There's nothing on the docs abou
I've been bit many times recentl

The parse_dates argument has a good crack at parsing any columns which look like they're dates, and
it's worked in this example (converting created_at to Timestamps). It looks carefully at the datatype and at
column names (you can pass also pass a column name explicitly to ensure it gets converted) to choose
which to parse.
After you've done some analysis in your favourite data analysis library, the corresponding to_json allows
you can export results to valid json.
In [4]: res = df[['created_at', 'title', 'body', 'comments']].head()
In [5]: res.to_json()
Out[5]: '{"created_at":{"0":1370695148000000000,"1":1370665875000000000,"2":1370656273000000000
Here, orient decides how we should layout the data:
orient : {'split', 'records', 'index', 'columns', 'values'},
default is 'index' for Series, 'columns' for DataFrame
The format of the JSON string
split : dict like
{index -> [index], columns -> [columns], data -> [values]}
records : list like [{column -> value}, ... , {column -> value}]
index : dict like {index -> {column -> value}}
columns : dict like {column -> {index -> value}}
values : just the values array
For example (note times have been exported as epoch, but we could have used iso via):
In [6]: res.to_json(orient='records')

1 de 3

23/12/14 03:17

Reading json directly into pandas

[Link]

Out[6]: '[{"created_at":1370695148000000000,"title":"CLN: refactored url accessing and filepath

Note, our times have been converted to unix timestamps (which also means we'd need to use the same
pd.to_datetime trick when read_json it back in). Also NaNs, NaTs and Nones will be converted to JSON's
null.
And save it to a le:
In [7]: res.to_json(file_name)
Useful.
Warning: read_json requires valid JSON, so doing something like will cause some Exception:
In [8]: pd.read_json("{'0':{'0':1,'1':3},'1':{'0':2,'1':4}}")
# ValueError, since this isn't valid JSON
In [9]: pd.read_json('{"0":{"0":1,"1":3},"1":{"0":2,"1":4}}')
Out[9]:
0

Just as an further example, here I can get all the issues from github (there's a limit of 100 per request), this
is how easy it is to extract data in pandas:
In [10]: page = 1
df = pd.read_json('[Link]
while df:
dfs[page] = df
page += 1
df = pd.read_json('[Link]
In [11]: [Link]() # 7 requests come back with issues
Out[11]: [1, 2, 3, 4, 5, 6, 7]
In [12]: df = [Link](dfs, ignore_index=True).set_index('number)
In [13]: df
Out[13]:
<class '[Link]'>
Int64Index: 613 entries, 3813 to 39
Data columns (total 18 columns):

2 de 3

assignee

body

613

non-null values

closed_at

comments

613

non-null values

comments_url

613

non-null values

created_at

613

non-null values

events_url
html_url

613
613

non-null values
non-null values

613

non-null values

labels
labels_url

613
613

non-null values
non-null values

23/12/14 03:17

Reading json directly into pandas

milestone
pull_request

586
613

non-null values
non-null values

state

613

non-null values

title

613

non-null values

updated_at

613

non-null values

url

613

non-null values

user

613

non-null values

[Link]

dtypes: datetime64[ns](1), int64(2), object(15)

In [14]: [Link]()
Out[14]:
count

613.000000

mean

3.590538

std
min

9.641128
0.000000

25%
50%

0.000000
1.000000

75%

4.000000

max

185.000000

dtype: float64
It deals with fairly moderately sized les fairly eciently, here's a 200Mb le (this is on my 2009 macbook
air, I'd expect times to be may be faster on better hardware e.g. an SSD):
In [15]: %time pd.read_json('[Link]')
CPU times: user 4.78 s, sys: 684 ms, total: 5.46 s
Wall time: 5.89 s
Thanks to wesm, jreback and Komnomnomnom for putting it together.
Ghostery ha bloqueado los comentarios creados por Disqus.

blog comments powered by Disqus

3 de 3

23/12/14 03:17

Pandas DataFrame Basics Guide
No ratings yet
Pandas DataFrame Basics Guide
41 pages
Handling JSON
No ratings yet
Handling JSON
4 pages
Introduction To Pandas
No ratings yet
Introduction To Pandas
14 pages
Getting Started with Pandas DataFrames
No ratings yet
Getting Started with Pandas DataFrames
38 pages
Python Pandas Guide for Data Analysts
No ratings yet
Python Pandas Guide for Data Analysts
37 pages
pandas DataFrame API Guide
No ratings yet
pandas DataFrame API Guide
33 pages
Overview of Pandas DataFrames
No ratings yet
Overview of Pandas DataFrames
21 pages
Pandas
No ratings yet
Pandas
41 pages
Data Analysis with Pandas Overview
No ratings yet
Data Analysis with Pandas Overview
49 pages
Data Frame
No ratings yet
Data Frame
95 pages
JSON Functions in PySpark 1753482553
No ratings yet
JSON Functions in PySpark 1753482553
9 pages
Using rbind with Pandas DataFrames
No ratings yet
Using rbind with Pandas DataFrames
17 pages
Pandas Worksheets ALL
100% (1)
Pandas Worksheets ALL
8 pages
Pandas Guide for Data Analysts
No ratings yet
Pandas Guide for Data Analysts
9 pages
Lab-3 Pandas Library
No ratings yet
Lab-3 Pandas Library
14 pages
P.no 35 To 52
No ratings yet
P.no 35 To 52
18 pages
Pandas Guide for Data Enthusiasts
No ratings yet
Pandas Guide for Data Enthusiasts
14 pages
Pandas DataFrame Cheat Sheet Guide
No ratings yet
Pandas DataFrame Cheat Sheet Guide
10 pages
Pandas for Data Analysis Beginners
No ratings yet
Pandas for Data Analysis Beginners
89 pages
Pandas Handbook
No ratings yet
Pandas Handbook
33 pages
Lab-3 Pandas Library
No ratings yet
Lab-3 Pandas Library
18 pages
Pandas Summarized Visually in 8
100% (2)
Pandas Summarized Visually in 8
8 pages
Scenario Series 19 - Handling JSON in Pyspark
No ratings yet
Scenario Series 19 - Handling JSON in Pyspark
8 pages
UNIT II Notes
No ratings yet
UNIT II Notes
23 pages
Module 4
No ratings yet
Module 4
38 pages
Data Handling Using Pandas-1
No ratings yet
Data Handling Using Pandas-1
60 pages
Unit Iv
No ratings yet
Unit Iv
63 pages
File Handling
No ratings yet
File Handling
13 pages
Ilovepdf Merged
No ratings yet
Ilovepdf Merged
16 pages
Pandas Complete Notes
No ratings yet
Pandas Complete Notes
105 pages
Pandas DataFrame Notes - 12pages-Pages-1
No ratings yet
Pandas DataFrame Notes - 12pages-Pages-1
1 page
Pandas
No ratings yet
Pandas
57 pages
DWV Unit1
No ratings yet
DWV Unit1
102 pages
Pandas & Numpy
No ratings yet
Pandas & Numpy
32 pages
4python Processing JSON Data
No ratings yet
4python Processing JSON Data
2 pages
Pandas DataFrame Basics Guide
No ratings yet
Pandas DataFrame Basics Guide
9 pages
4 Data Transformation Using Pandas
No ratings yet
4 Data Transformation Using Pandas
59 pages
Pandas DataFrame Cheat Sheet Guide
No ratings yet
Pandas DataFrame Cheat Sheet Guide
12 pages
Pandas DataFrame Notes
No ratings yet
Pandas DataFrame Notes
13 pages
Pandas DataFrame Cheat Sheet
No ratings yet
Pandas DataFrame Cheat Sheet
6 pages
Python GPU DataFrames Guide
No ratings yet
Python GPU DataFrames Guide
2 pages
Python Pandas Cheat Sheet Guide
No ratings yet
Python Pandas Cheat Sheet Guide
11 pages
Python Pandas DataFrame Guide
No ratings yet
Python Pandas DataFrame Guide
53 pages
Pandas
No ratings yet
Pandas
25 pages
Line by Line 12 IP
No ratings yet
Line by Line 12 IP
21 pages
Advance Python Unit 4
No ratings yet
Advance Python Unit 4
13 pages
JSON Data Handling in Python Lab
No ratings yet
JSON Data Handling in Python Lab
5 pages
FDS Module 2 Notes
No ratings yet
FDS Module 2 Notes
24 pages
Pandas Guide for Data Science
No ratings yet
Pandas Guide for Data Science
42 pages
Pandas DataFrame Notes
No ratings yet
Pandas DataFrame Notes
10 pages
Pandas Notes
No ratings yet
Pandas Notes
6 pages
Apache Log Data Processing with PySpark
No ratings yet
Apache Log Data Processing with PySpark
10 pages
Pandas Series and DataFrame Guide
No ratings yet
Pandas Series and DataFrame Guide
10 pages
Pandas Questions
No ratings yet
Pandas Questions
11 pages
Data Analysis With Pandas
No ratings yet
Data Analysis With Pandas
122 pages
DataFrame in Pandas
No ratings yet
DataFrame in Pandas
4 pages
FBI Secutary
100% (2)
FBI Secutary
136 pages
2015 Jweia BB Yt Car Cyclist
No ratings yet
2015 Jweia BB Yt Car Cyclist
16 pages
Cycling Aerodynamics: Drafting Effects
No ratings yet
Cycling Aerodynamics: Drafting Effects
4 pages
Deshidratación en Natación
No ratings yet
Deshidratación en Natación
13 pages
Kinematic Analysis Software Guide
No ratings yet
Kinematic Analysis Software Guide
222 pages
Documentacio Cubes 1.0
No ratings yet
Documentacio Cubes 1.0
167 pages
Z Order Curve
No ratings yet
Z Order Curve
16 pages
Westeros & Essos: A Comprehensive Map
No ratings yet
Westeros & Essos: A Comprehensive Map
1 page
Panciera AlicariaPlautusFestus 2007
No ratings yet
Panciera AlicariaPlautusFestus 2007
5 pages
Wa0007.
No ratings yet
Wa0007.
7 pages
T&D Supply System
No ratings yet
T&D Supply System
93 pages
Answers To Practice Questions For Convolution - PDF
No ratings yet
Answers To Practice Questions For Convolution - PDF
6 pages
Exalted Caste Book Twilight PDF
No ratings yet
Exalted Caste Book Twilight PDF
2 pages
Resume Sopheak Phat
No ratings yet
Resume Sopheak Phat
2 pages
Model 5000 Manual
No ratings yet
Model 5000 Manual
296 pages
Project GAIA-X: Federated Data Infrastructure
100% (2)
Project GAIA-X: Federated Data Infrastructure
56 pages
Photo Guide Siebel IP 17 Installtion
No ratings yet
Photo Guide Siebel IP 17 Installtion
13 pages
TSECO
No ratings yet
TSECO
2 pages
ERP Integration for MBA Students
No ratings yet
ERP Integration for MBA Students
19 pages
1.PFD 561-89-86933
No ratings yet
1.PFD 561-89-86933
8 pages
The Cosmic Pulse of Life - Trevor James Constable
88% (17)
The Cosmic Pulse of Life - Trevor James Constable
337 pages
Recent Advances in Observation Behavior
No ratings yet
Recent Advances in Observation Behavior
10 pages
Schiller AT1 Service Manual
100% (5)
Schiller AT1 Service Manual
114 pages
Physics Students' Catapult Guide
No ratings yet
Physics Students' Catapult Guide
6 pages
Monowall
No ratings yet
Monowall
168 pages
Chapter No. 1: 1.1 Problem Statement (FMEA) Failure Mode Effect Analysis of Welding Defects of Beam Assembly
No ratings yet
Chapter No. 1: 1.1 Problem Statement (FMEA) Failure Mode Effect Analysis of Welding Defects of Beam Assembly
27 pages
Anritsu Understanding Pim
100% (1)
Anritsu Understanding Pim
28 pages
GTPL Gujarati Power Plus Pack
0% (2)
GTPL Gujarati Power Plus Pack
1 page
Rectification Report: Before Corrective Action
No ratings yet
Rectification Report: Before Corrective Action
3 pages
Red Sea Shura Program 7 Civil Specs
No ratings yet
Red Sea Shura Program 7 Civil Specs
779 pages
Test Automation vs. Manual Testing
No ratings yet
Test Automation vs. Manual Testing
186 pages
Column Detail PDF
No ratings yet
Column Detail PDF
1 page
SY MTech Roll Call SemI 19-20
No ratings yet
SY MTech Roll Call SemI 19-20
9 pages
Lab Manual Mad III Year Cse Vi Sem
No ratings yet
Lab Manual Mad III Year Cse Vi Sem
55 pages
hrs1k (H)
No ratings yet
hrs1k (H)
3 pages
With Is With: The Electric Power Research Institute
No ratings yet
With Is With: The Electric Power Research Institute
17 pages
Factors Affecting Self Inductance Experiment
No ratings yet
Factors Affecting Self Inductance Experiment
8 pages
Tokenomy: Global Token Platform
No ratings yet
Tokenomy: Global Token Platform
88 pages

JSON A Panda Python

Uploaded by

JSON A Panda Python

Uploaded by

Reading json directly into pandas

It's as easy as whacking in the path/url/string of a valid json:

Let's inspect a few columns to see how we've done:

ENH Prefer requests over urllib2

Nothing in docs about [Link]

Either I'm being thick or `to_da

Reading json directly into pandas

Out[6]: '[{"created_at":1370695148000000000,"title":"CLN: refactored url accessing and filepath

Reading json directly into pandas

dtypes: datetime64[ns](1), int64(2), object(15)

blog comments powered by Disqus

You might also like