Getting Web Data r5 Json Data
Readme
License:
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License
http://creativecommons.org/licenses/by-nc-sa/4.0/
gastonsanchez.com
CC BY-SA-NC 4.0
Lectures Menu
Slide Decks
1. Introduction
2. Reading files from the Web
3. Basics of XML and HTML
4. Parsing XML / HTML content
5. Handling JSON data
6. HTTP Basics and the RCurl Package
7. Getting data via Web Forms
8. Getting data via Web APIs
JSON Data
Goal
JSON
The goal of these slides is to provide an introduction to handling JSON data in R.
Synopsis
In a nutshell
We'll cover the following topics:
- JSON Basics
Some References
- Introducing JSON: http://www.json.org/
- R package RJSONIO: http://cran.r-project.org/web/packages/RJSONIO/index.html
- R package jsonlite: http://cran.r-project.org/web/packages/jsonlite/vignettes/json-mapping.pdf
- R package rjson: http://cran.r-project.org/web/packages/rjson/index.html
JSON Basics
Basics First
Fundamentals
JSON stands for JavaScript Object Notation, and it is a format for representing data:
- lightweight format
- widely popular
- fairly simple
Understanding JSON
Data types: null, true, false, number, string
Containers: square brackets [ ], curly brackets { }
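As a rough illustration of how these value types surface in R, the sketch below parses one small JSON object with fromJSON() from the jsonlite package (listed in the references and assumed to be installed). The key names are made up for the example.

```r
# Sketch: how JSON value types appear in R after parsing.
# Assumes the jsonlite package is installed; key names are hypothetical.
library(jsonlite)

txt <- '{
  "a_null": null,
  "a_true": true,
  "a_false": false,
  "a_number": 3.5,
  "a_string": "text",
  "an_array": [1, 2, 3]
}'

x <- fromJSON(txt)

# inspect the R type that each JSON value was mapped to
str(x)
```

With jsonlite, true and false become R logicals, numbers become numeric values, strings become character values, and a homogeneous array becomes an atomic vector.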
JSON Arrays
Unnamed Arrays
Square brackets [ ] are used for ordered unnamed arrays
- [ 1, 2, 3, ... ]
Named Arrays
Curly brackets { } are used for named arrays
- { "dollars" : 5, "euros" : 20, ... }
- { "city" : "Berkeley", "state" : "CA", ... }
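To see the two kinds of arrays from the R side, here is a minimal sketch that uses toJSON() from the jsonlite package (assumed to be installed). An unnamed R vector is written with square brackets, and a named R list is written with curly brackets. The auto_unbox = TRUE argument is a choice made for this example so that length-one values are written as scalars.

```r
# Sketch: unnamed vs. named arrays, generated from R objects.
# Assumes the jsonlite package is installed.
library(jsonlite)

# an unnamed vector becomes an ordered unnamed array
unnamed <- toJSON(c(1, 2, 3))
print(unnamed)   # [1,2,3]

# a named list becomes a named array (a JSON object)
named <- toJSON(list(dollars = 5, euros = 20), auto_unbox = TRUE)
print(named)     # {"dollars":5,"euros":20}
```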
JSON Arrays
Containers can be nested
Example A
{
"name": ["X", "Y", "Z"],
"grams": [300, 200, 500],
"qty": [4, 5, null],
"new": [true, false, true]
}
Example B
[
{ "name": "X",
"grams": 300,
"qty": 4,
"new": true },
{ "name": "Y",
"grams": 200,
"qty": 5,
"new": false },
{ "name": "Z",
"grams": 500,
"qty": null,
"new": true }
]
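Examples A and B hold the same values in two different nestings: A is one named array of columns, B is an unnamed array of records. The sketch below parses both with fromJSON() from the jsonlite package (assumed to be installed) to compare what R receives in each case. The variable names are chosen for the example only.

```r
# Sketch: the same data in two nestings, parsed with jsonlite.
# Assumes the jsonlite package is installed; variable names are hypothetical.
library(jsonlite)

example_a <- '{
  "name":  ["X", "Y", "Z"],
  "grams": [300, 200, 500],
  "qty":   [4, 5, null],
  "new":   [true, false, true]
}'

example_b <- '[
  { "name": "X", "grams": 300, "qty": 4,    "new": true  },
  { "name": "Y", "grams": 200, "qty": 5,    "new": false },
  { "name": "Z", "grams": 500, "qty": null, "new": true  }
]'

a <- fromJSON(example_a)  # named list with one vector per field
b <- fromJSON(example_b)  # data frame with one row per record

str(a)
str(b)

# both layouts carry the same values; the JSON null shows up as NA
all(a$grams == b$grams)
is.na(a$qty)
```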
Name       Gender   Homeland  Born     Jedi
Anakin     male     Tatooine  41.9BBY  yes
Amidala    female   Naboo     46BBY    no
Luke       male     Tatooine  19BBY    yes
Leia       female   Alderaan  19BBY    no
Obi-Wan    male     Stewjon   57BBY    yes
Han        male     Corellia  29BBY    no
Palpatine  male     Naboo     82BBY    no
R2-D2      unknown  Naboo     33BBY    no
{
"Name": [ "Anakin", "Amidala", "Luke", ... , "R2-D2" ],
"Gender": [ "male", "female", "male", ... , "unknown" ],
"Homeworld": [ "Tatooine", "Naboo", "Tatooine", ... , "Naboo" ],
"Born": [ "41.9BBY", "46BBY", "19BBY", ... , "33BBY" ],
"Jedi": [ "yes", "no", "yes", ... , "no" ]
}
JSON R packages
R package RJSONIO
If you don't have "RJSONIO", you'll have to install it:
# install RJSONIO
install.packages("RJSONIO", dependencies = TRUE)
R package RJSONIO
Main functions
There are 2 primary functions in "RJSONIO":
- toJSON() converts an R object to a string in JSON
- fromJSON() converts JSON content to R objects
toJSON()
Function toJSON()
toJSON(x, container = isContainer(x, asIs, .level),
collapse = "\n", ...)
fromJSON()
Function fromJSON()
fromJSON(content, handler = NULL, default.size = 100,
depth = 150L, allowComments = TRUE, ...)
R Data Frame
# toy data
sw_data = rbind(
c("Anakin", "male", "Tatooine", "41.9BBY", "yes"),
c("Amidala", "female", "Naboo", "46BBY", "no"),
c("Luke", "male", "Tatooine", "19BBY", "yes"),
c("Leia", "female", "Alderaan", "19BBY", "no"),
c("Obi-Wan", "male", "Stewjon", "57BBY", "yes"),
c("Han", "male", "Corellia", "29BBY", "no"),
c("Palpatine", "male", "Naboo", "82BBY", "no"),
c("R2-D2", "unknown", "Naboo", "33BBY", "no"))
# convert to data.frame and add column names
swdf = data.frame(sw_data)
names(swdf) = c("Name", "Gender", "Homeworld", "Born", "Jedi")
swdf
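##        Name  Gender Homeworld    Born Jedi
## 1    Anakin    male  Tatooine 41.9BBY  yes
## 2   Amidala  female     Naboo   46BBY   no
## 3      Luke    male  Tatooine   19BBY  yes
## 4      Leia  female  Alderaan   19BBY   no
## 5   Obi-Wan    male   Stewjon   57BBY  yes
## 6       Han    male  Corellia   29BBY   no
## 7 Palpatine    male     Naboo   82BBY   no
## 8     R2-D2 unknown     Naboo   33BBY   no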
From R to JSON
# load RJSONIO
library(RJSONIO)
# convert R data.frame to JSON
sw_json = toJSON(swdf)
# what class?
class(sw_json)
## [1] "character"
# display JSON format
cat(sw_json)
## {
## "Name": [ "Anakin", "Amidala", "Luke", "Leia", "Obi-Wan", "Han", "Palpatine", "R2-D2" ],
## "Gender": [ "male", "female", "male", "female", "male", "male", "male", "unknown" ],
## "Homeworld": [ "Tatooine", "Naboo", "Tatooine", "Alderaan", "Stewjon", "Corellia", "Naboo", "Naboo" ],
## "Born": [ "41.9BBY", "46BBY", "19BBY", "19BBY", "57BBY", "29BBY", "82BBY", "33BBY" ],
## "Jedi": [ "yes", "no", "yes", "no", "yes", "no", "no", "no" ]
## }
From JSON to R
# convert JSON string to R list
sw_R = fromJSON(sw_json)
# what class?
class(sw_R)
## [1] "list"
# display the resulting R list
sw_R
## $Name
## [1] "Anakin"    "Amidala"   "Luke"      "Leia"      "Obi-Wan"   "Han"
## [7] "Palpatine" "R2-D2"
##
## $Gender
## [1] "male"    "female"  "male"    "female"  "male"    "male"    "male"
## [8] "unknown"
##
## $Homeworld
## [1] "Tatooine" "Naboo"    "Tatooine" "Alderaan" "Stewjon"  "Corellia"
## [7] "Naboo"    "Naboo"
##
## $Born
## [1] "41.9BBY" "46BBY"   "19BBY"   "19BBY"   "57BBY"   "29BBY"   "82BBY"
## [8] "33BBY"
##
## $Jedi
## [1] "yes" "no"  "yes" "no"  "yes" "no"  "no"  "no"
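Since fromJSON() returns a list of equal-length vectors here, one way to get back to a data frame is as.data.frame(). The sketch below is self-contained and uses a shortened, two-column stand-in for swdf. The name sw_back is an assumption made for the example, and RJSONIO is assumed to be installed.

```r
# Sketch: round trip data.frame -> JSON -> list -> data.frame with RJSONIO.
# Assumes RJSONIO is installed; swdf here is a shortened stand-in for the
# toy data, and sw_back is a hypothetical name.
swdf <- data.frame(
  Name = c("Anakin", "Amidala", "Luke"),
  Jedi = c("yes", "no", "yes"),
  stringsAsFactors = FALSE
)

# R data.frame to JSON string (one JSON array per column)
sw_json <- RJSONIO::toJSON(swdf)

# JSON string to R list (one character vector per column)
sw_R <- RJSONIO::fromJSON(sw_json)

# the list elements have equal length, so they can be bound into a data frame
sw_back <- as.data.frame(sw_R, stringsAsFactors = FALSE)
sw_back
```

The calls are written as RJSONIO::toJSON() and RJSONIO::fromJSON() so the sketch behaves the same even if another package that exports functions with these names is loaded.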
File: miserables.js
We'll read the miserables dataset from:
http://mbostock.github.io/protovis/ex/miserables.js
Reading Issues
Reading miserables.js
# load RJSONIO and jsonlite
library(RJSONIO)
library(jsonlite)
# url with JSON content
miser = "http://mbostock.github.io/protovis/ex/miserables.js"
# import content as text (character vector)
miserables = readLines(miser)
# eliminate first 11 lines (containing comments)
miserables = miserables[-c(1:11)]
Once we have the JSON content in the proper shape, we can parse
it with fromJSON().
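Both RJSONIO and jsonlite export a function called fromJSON(), so with both packages loaded the one attached last masks the other. A minimal sketch of parsing with each package explicitly follows. It uses a tiny inline stand-in for the cleaned content (mis_text is hypothetical and is not the real file) and assumes both packages are installed.

```r
# Sketch: calling each package's fromJSON() explicitly with `::`.
# mis_text is a small hypothetical stand-in for the cleaned miserables content.
mis_text <- '{
  "nodes": [ {"nodeName": "Myriel",   "group": 1},
             {"nodeName": "Napoleon", "group": 1} ],
  "links": [ {"source": 1, "target": 0, "value": 1} ]
}'

# parse with RJSONIO: nested R lists
mis1 <- RJSONIO::fromJSON(mis_text, asText = TRUE)

# parse with jsonlite: arrays of records are simplified to data frames
mis2 <- jsonlite::fromJSON(mis_text)

class(mis1$nodes)
class(mis2$nodes)
```

Qualifying the call with the package name avoids depending on the order of the library() calls.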
# class
class(mis1)
## Error:
## [1] "list"
## $nodes
## [1] 77
##
## $links
## [1] 254
# names
names(mis1)
## [1] "nodes" "links"
## [[1]]
## [[1]]$nodeName
## [1] "Myriel"
##
## [[1]]$group
## [1] 1
##
##
## [[2]]
## [[2]]$nodeName
## [1] "Napoleon"
##
## [[2]]$group
## [1] 1
##
##
## [[3]]
## [[3]]$nodeName
## [1] "Mlle. Baptistine"
##
## [[3]]$group
## [1] 1

## [[1]]
## source target  value
##      1      0      1
##
## [[2]]
## source target  value
##      2      0      8
##
## [[3]]
## source target  value
##      3      0     10
## $nodes
## [1] "data.frame"
##
## $links
## [1] "data.frame"
## [1] 2
# names
names(mis2)

## $nodes
## [1] 77
##
## $links
## [1] 254
##           nodeName group
## 1           Myriel     1
## 2         Napoleon     1
## 3 Mlle. Baptistine     1
## 4    Mme. Magloire     1
## 5   Countess de Lo     1
Parsing Differences