04 Visualizing Data
Visualizing Data
Discover the unexpected in
Garrett Grolemund
Master Instructor, RStudio
August 2014
1. Scatterplots
2. Bar charts
b. Positions
3. Histograms
c. Parameters
5. Saving Graphs
plot
Simple plots in R
plot(iris$Sepal.Width, iris$Sepal.Length)
The first argument is the x variable, the second is the y variable.
• simple
• difficult to customize
ggplot2
[Figure: example ggplot2 graphics, including line charts of Price by Year (2012 to 2022) with a legend for σ (0.01, 0.1, 0.2)]
© 2014 RStudio, Inc. All rights reserved.
Winston Chang, http://shop.oreilly.com/product/0636920023135.do
David B Sparks, http://bit.ly/hn54NW
[Figure: plot with the labels Violent Crime and Density]
(quick) plots in R
qplot(displ, hwy, data = mpg)
[Figure: scatterplot of hwy versus displ]
How would you describe this relationship?
The greatest value of a picture
is when it forces us to notice
what we never expected to
see.
– John Tukey
What other variables would help us understand this pattern?
manufacturer, model, cyl, trans, drv, class
qplot(displ, hwy, data = mpg)
[Figure: scatterplot of hwy versus displ]
Additional variables
Aesthetics
Visual characteristics that can be mapped to data
Aesthetics
qplot(displ, hwy, data = mpg, color = class)
In color = class, color is the aesthetic feature and class is the variable to map it to.
[Figure: scatterplot of hwy versus displ, with points colored by class (2seater, compact, midsize, minivan, pickup, subcompact, suv)]
Legend chosen and displayed automatically.
Your turn
Add color, size, and shape aesthetics to
your graph. Experiment.
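One possible starting point for this exercise, as a minimal sketch. It uses ggplot2's ggplot() interface instead of qplot() (an assumption on my part, chosen because aes() makes each mapping explicit), and the choice of class, cyl, and drv as the mapped variables is purely illustrative.

```r
library(ggplot2)

# Map three aesthetics to variables in mpg; the variable choices are illustrative.
p <- ggplot(mpg, aes(x = displ, y = hwy,
                     color = class,   # discrete variable: one hue per class
                     size = cyl,      # numeric variable: point size
                     shape = drv)) +  # discrete variable: one shape per drive type
  geom_point()

print(p)  # a legend is added for each mapped aesthetic
```

Swapping the variables between aesthetics is a quick way to see which mappings are easy to read and which are not.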
Color
Size
Shape
dark blue
Size: discrete size steps, or a linear mapping between radius and value
Faceting
Summary
Geometric object
the "type" of graph, or
x variable, y variable, data set the variables are in, type of plot
boxplots
[Figure: a boxplot built up over several slides, with these labels added in turn]
median (50th percentile)
(25th percentile)
(75th percentile)
Outliers
Common values
Typical values
Inter-Quartile Range (IQR)
1.5 x IQR
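The quantities labeled on the boxplot slides can be computed by hand. The following is a minimal sketch in base R; iris$Sepal.Width is used only as a convenient built-in example, and quantile() with its default settings is an assumption (plotting functions may compute the box edges slightly differently).

```r
# Summary statistics that a boxplot draws, computed by hand.
# iris$Sepal.Width is used only as a convenient built-in example.
x <- iris$Sepal.Width

q1  <- unname(quantile(x, 0.25))  # 25th percentile (bottom of the box)
med <- median(x)                  # 50th percentile (line inside the box)
q3  <- unname(quantile(x, 0.75))  # 75th percentile (top of the box)
iqr <- q3 - q1                    # inter-quartile range (height of the box)

# Points farther than 1.5 x IQR from the box are drawn individually as outliers.
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr
outliers <- x[x < lower_fence | x > upper_fence]

print(c(q1 = q1, median = med, q3 = q3, iqr = iqr))
print(sort(outliers))
```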
How could we make the
relationship between class
and hwy easier to read?
Help pages
Tips:
Usage: good place to spot default values
Arguments: explanation of each argument
Value: what the function returns
Examples: most helpful section!
qplot(reorder(class, hwy, FUN = median), hwy, data = mpg,
  geom = "boxplot")
http://docs.ggplot2.org/current/
Diamonds
Fill
Geoms that span an area have both a color aesthetic and a fill aesthetic.
qplot(color, data = diamonds, geom = "bar", fill = cut)
[Figure: bar chart of count by color (D to J), with bars filled by cut (Fair, Good, Very Good, Premium, Ideal)]
position adjustment
How your graph arranges geoms that overlap with each other.
identity: overlaps, last on top
fill: displays proportions
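The position adjustments above can be compared side by side. This is a sketch under assumptions: it uses ggplot() with geom_bar() rather than qplot(), and it adds "stack" and "dodge", two other ggplot2 position names that do not appear in the surviving slide text.

```r
library(ggplot2)

bars <- ggplot(diamonds, aes(x = color, fill = cut))

p_stack    <- bars + geom_bar(position = "stack")     # ggplot2's default for bars
p_identity <- bars + geom_bar(position = "identity")  # bars overlap, last drawn on top
p_fill     <- bars + geom_bar(position = "fill")      # each bar rescaled to show proportions
p_dodge    <- bars + geom_bar(position = "dodge")     # bars placed side by side

print(p_fill)
```

Printing each of the four objects in turn shows how the same data and mappings produce different arrangements.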
this plot?
qplot(cty, hwy, data = mpg)
[Figure: scatterplot of hwy versus cty]
The measurements were probably
qplot(cty, hwy, data = mpg, position = "jitter")
qplot(cty, hwy, data = mpg, geom = "jitter")
[Figure: jittered scatterplot of hwy versus cty]
jittering
The jittering adjustment adds random noise to each point.
As a result, points that share the same values are unlikely to overlap.
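The idea can be seen on a few numbers. This sketch uses base R's jitter() rather than ggplot2's position adjustment, and the input values and the noise amount of 0.4 are hypothetical.

```r
set.seed(1)  # make the random noise reproducible

# Values recorded as whole numbers, so several points coincide.
x <- c(18, 18, 18, 21, 21, 24)

# jitter() adds uniform noise in [-amount, amount] to every value.
x_jittered <- jitter(x, amount = 0.4)

anyDuplicated(x) > 0           # TRUE: the raw values overlap
anyDuplicated(x_jittered) > 0  # duplicates are very unlikely after jittering
print(x_jittered)
```

Keeping the noise small relative to the spacing of the data preserves the overall pattern while separating the overlapping points.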
Summary
method effect
binwidth
[Figure: histograms of count by x (axis from 20 to 40), redrawn with different binwidths]
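The effect of binwidth can be checked numerically. The sketch below uses base R's hist() with plot = FALSE; the sample data and the helper bin_counts() are hypothetical and exist only for this illustration.

```r
# Hypothetical sample on the same 20-to-40 scale as the slides.
set.seed(42)
x <- round(runif(100, min = 20, max = 40))

# Count observations per bin for a given binwidth (hypothetical helper).
bin_counts <- function(values, binwidth) {
  breaks <- seq(floor(min(values)), ceiling(max(values)) + binwidth, by = binwidth)
  hist(values, breaks = breaks, right = FALSE, plot = FALSE)$counts
}

narrow <- bin_counts(x, binwidth = 1)  # many bins, small counts
wide   <- bin_counts(x, binwidth = 5)  # few bins, large counts

print(length(narrow))
print(length(wide))
print(sum(narrow) == sum(wide))  # same observations, grouped differently
```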
Parameters
Similar to aesthetics.
parameter name = value
Different geoms use different parameters and aesthetics.
qplot(carat, data = diamonds)
stat_bin: binwidth defaulted to range/30.
Additional variables
zoom <- coord_cartesian(xlim = c(55, 70))
qplot(depth, data = diamonds, binwidth = 0.2) + zoom
[Figure: histogram of count by depth, x-axis from 56 to 70]
qplot(depth, data = diamonds, binwidth = 0.2) +
  zoom + facet_wrap(~ cut)
[Figure: histograms of count by depth, one facet per cut]
But hard to compare because:
1. separated into separate facets
2. shape compressed for smaller groups
What if we just drew a line along the tops of the
histograms, and threw away the bars?
freqpoly
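A frequency polygon draws that line. The sketch below is one way to write it, assuming ggplot() with geom_freqpoly() in place of qplot() and reusing the binwidth and zoom limits from the histogram slides; mapping color to cut is my choice for overlaying the groups in a single panel.

```r
library(ggplot2)

# Same depth data as the histograms, drawn as lines instead of bars.
# Mapping color to cut overlays all groups in one panel.
p <- ggplot(diamonds, aes(x = depth, color = cut)) +
  geom_freqpoly(binwidth = 0.2) +
  coord_cartesian(xlim = c(55, 70))  # zoom without dropping data

print(p)
```

Because the lines share one set of axes, the groups are no longer separated into facets, although smaller groups still sit lower on the count scale.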
# install.packages("hexbin")
qplot(carat, price, data = diamonds, geom = "hex")
density2d
color
group
se
method
getwd()
Could you find that file on your computer?
Working directory
When you start R, it associates itself with a folder
(i.e., a directory) on your computer.
Saving plots
# Uses size on screen:
ggsave("my-plot.pdf")
ggsave("my-plot.png")
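When the on-screen size is not what you want, ggsave() also accepts explicit dimensions. A minimal sketch, assuming a temporary directory as the output location (a hypothetical choice that keeps the example self-contained) and arbitrary dimensions of 6 by 4 inches:

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

# Hypothetical output location; a temporary directory keeps the example self-contained.
out_file <- file.path(tempdir(), "my-plot.pdf")

# Passing the plot and explicit dimensions avoids depending on the on-screen size.
ggsave(out_file, plot = p, width = 6, height = 4, units = "in")

print(file.exists(out_file))
```

With a bare file name instead of a full path, the file is written to the working directory reported by getwd().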
[Figure: a scatterplot of hwy versus displ, built up step by step from the data table below]
hwy  disp  cyl  class
17   5     8    suv
20   2.7   4    pickup
17   4     6    suv
25   2.8   6    compact
27   3.1   6    compact
30   2     4    compact
25   2.8   6    compact
23   2.8   6    compact
26   3     6    midsize
17   5.4   8    pickup
28   2.5   5    subcompact
29   3.5   6    midsize
26   2.4   4    midsize
29   2     4    midsize
15   5.4   8    pickup
29   1.8   4    compact
18   5.7   8    suv
12   4.7   8    pickup
26   2.8   6    compact
24   3.3   6    minivan
Data, Geom, Coordinate system
Aesthetic mappings: y, x, color
Position Adjustment
Facet (or not)
[Figure: the same points split into facets labeled 4, 5, 6, 8]
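The components named on these slides map onto separate pieces of a ggplot2 call. The following is a sketch, not the deck's own code: it uses ggplot() with the full mpg data set, and the choices of "jitter" for the position adjustment and cyl for the facets are assumptions made for illustration.

```r
library(ggplot2)

p <- ggplot(data = mpg) +                              # data
  geom_point(                                          # geom
    mapping = aes(x = displ, y = hwy, color = class),  # aesthetic mappings
    position = "jitter"                                # position adjustment
  ) +
  coord_cartesian() +                                  # coordinate system
  facet_wrap(~ cyl)                                    # facet (or not)

print(p)
```

Removing the facet_wrap() line gives the unfaceted version of the same plot.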