Data Visualization Using R & ggplot2
Karthik Ram
October 6, 2013
Some housekeeping
Section 1
Why ggplot2?
Section 2
The Grammar
Some terminology
data
Must be a data.frame
head(iris)
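##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa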
plyr
iris[1:2, ]
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
library(plyr)
# Note the use of the .() function, which lets Species be used
# without quoting
ddply(iris, .(Species), summarize,
      mean.Sep.Wid = mean(Sepal.Width, na.rm = TRUE))
##      Species mean.Sep.Wid
## 1     setosa        3.428
## 2 versicolor        2.770
## 3  virginica        2.974
reshape2
iris[1:2, ]
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
library(reshape2)
df <- melt(iris, id.vars = "Species")
df[1:2, ]
##   Species     variable value
## 1  setosa Sepal.Length   5.1
## 2  setosa Sepal.Length   4.9
dcast(df, Species ~ variable, mean)
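##      Species Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1     setosa        5.006       3.428        1.462       0.246
## 2 versicolor        5.936       2.770        4.260       1.326
## 3  virginica        6.588       2.974        5.552       2.026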
Section 3
Aesthetics
Some terminology
data
aesthetics
a.k.a. mapping
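Here is a minimal sketch of the difference between *mapping* a variable to an aesthetic inside `aes()` and *setting* a fixed value outside it, using the built-in iris data. The object names `p_mapped` and `p_set` are illustrative only.

```r
library(ggplot2)

# Mapping: colour is driven by the Species column, so each species
# gets its own colour and a legend is drawn
p_mapped <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) +
  geom_point()

# Setting: colour is a constant given outside aes(), so every point
# is drawn in the same colour and no legend is created
p_set <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(colour = "blue")

# layer_data() returns the data ggplot2 actually draws for a layer
length(unique(layer_data(p_mapped)$colour))  # 3 distinct colours
unique(layer_data(p_set)$colour)             # "blue"
```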
Section 4
Geoms
Some terminology
data
aesthetics
geometry
Basic structure
Quick note
[Figure: scatter plots of Sepal.Width against Sepal.Length for iris, first plain, then with points colored by Species (setosa, versicolor, virginica)]
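The basic structure of a ggplot2 call can be sketched as follows. A data frame and an aesthetic mapping go to `ggplot()`, and each geom is added as a layer with `+`. `geom_rug()` is simply one extra geom chosen to show stacking of layers; it is not taken from the slides.

```r
library(ggplot2)

# Basic structure: ggplot(data, aes(...)) + one or more geom_*() layers
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) +
  geom_point() +         # first layer: points
  geom_rug(alpha = 0.3)  # second layer: marginal rug marks, same mapping

# Each geom adds one layer; ggplot_build() returns one data frame per layer
built <- ggplot_build(p)
length(built$data)  # 2
```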
Exercise 1
library(ggplot2)  # the diamonds dataset ships with ggplot2
# Make a small sample of the diamonds dataset
d2 <- diamonds[sample(1:dim(diamonds)[1], 1000), ]
[Figure: price against carat for d2, colored by color]
Section 5
Stats
Some terminology
data
aesthetics
geometry
stats
[Figure: box plots of bwt by factor(race)]
data: low, age, lwt, race, smoke, ptl, ht, ui, ftv, bwt [189x10]
mapping: x = factor(race), y = bwt
faceting: facet_null()
-----------------------------------
geom_boxplot: outlier.colour = black, outlier.shape = 16, outlier.size =
stat_boxplot:
position_dodge: (width = NULL, height = NULL)
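The variables in the printed summary match the `birthwt` data from the MASS package, so this sketch assumes that is the data set being plotted. It shows that `geom_boxplot()` relies on a stat (`stat_boxplot()`) that computes quartiles and whiskers before anything is drawn. `layer_data()` exposes those computed values.

```r
library(ggplot2)

# birthwt ships with the MASS package (installed with standard R distributions)
bw <- MASS::birthwt

# geom_boxplot() uses stat_boxplot() by default: ggplot2 computes the
# quartiles and whiskers from the raw bwt values before drawing
p <- ggplot(bw, aes(x = factor(race), y = bwt)) +
  geom_boxplot()

# The statistics computed for the first (and only) layer, one row per race
stats <- layer_data(p)
stats[, c("x", "ymin", "lower", "middle", "upper", "ymax")]
```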
Section 6
Facets
Some terminology
data
aesthetics
Really powerful
geometry
stats
facets
[Figure: Sepal.Width against Sepal.Length for iris, split into setosa, versicolor and virginica panels, with points colored by Species]
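A short sketch of the two faceting functions, assuming a panel per species as in the figure. `facet_wrap()` takes a one-sided formula and wraps the panels into rows. `facet_grid()` takes an explicit `rows ~ columns` formula, where `.` means "no faceting" on that side.

```r
library(ggplot2)

base <- ggplot(iris, aes(Sepal.Length, Sepal.Width, colour = Species)) +
  geom_point()

# facet_wrap(): one panel per level of Species, wrapped into rows as needed
p_wrap <- base + facet_wrap(~ Species)

# facet_grid(): explicit rows ~ columns layout; "." means no row faceting
p_grid <- base + facet_grid(. ~ Species)

# Every row of the built layer data records which panel it is drawn in
length(unique(layer_data(p_wrap)$PANEL))  # 3
```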
Section 7
Scales
Some terminology
data
aesthetics
geometry
stats
facets
scales
Colors
[Figure: bar plot of value by Species, filled by variable (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)]
[Figure: Sepal.Width against Sepal.Length faceted by Species (setosa, versicolor, virginica), colored by Species]
http://tools.medialab.sciences-po.fr/iwanthue/
[Figure: box plots of bwt by factor(race), with y-axis labels 1 Kg to 4 Kg]
scale_fill_discrete(); scale_colour_discrete()
scale_fill_hue(); scale_color_hue()
scale_fill_manual(); scale_color_manual()
scale_fill_brewer(); scale_color_brewer()
scale_linetype(); scale_shape_manual()
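Two of the scale types listed above are sketched below. The colour values are arbitrary picks. The kilogram labels reproduce the y axis in the box-plot figure and assume `bwt` from `MASS::birthwt`, which is recorded in grams.

```r
library(ggplot2)

# Manual colour scale: the named vector maps each species to a colour
p_col <- ggplot(iris, aes(Sepal.Length, Sepal.Width, colour = Species)) +
  geom_point() +
  scale_colour_manual(values = c(setosa = "darkgreen",
                                 versicolor = "orange",
                                 virginica = "purple"))

# Position scale with custom breaks and labels: bwt is in grams, so the
# breaks at 1000 to 4000 g are relabelled in kilograms
p_bwt <- ggplot(MASS::birthwt, aes(factor(race), bwt)) +
  geom_boxplot() +
  scale_y_continuous(breaks = c(1000, 2000, 3000, 4000),
                     labels = c("1 Kg", "2 Kg", "3 Kg", "4 Kg"))
```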
Section 8
Coordinates
Some terminology
data
aesthetics
geometry
stats
facets
scales
coordinates
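A minimal sketch of two common coordinate systems, using iris purely for illustration. `coord_flip()` swaps the axes. `coord_cartesian()` zooms in without discarding data, so stats such as box-plot medians are still computed from every row.

```r
library(ggplot2)

p <- ggplot(iris, aes(Species, Sepal.Length)) +
  geom_boxplot()

# coord_flip(): swap the x and y axes (horizontal box plots)
p_flipped <- p + coord_flip()

# coord_cartesian(): zoom into a y range without dropping any data,
# so the box-plot statistics are identical to those of the unzoomed plot
p_zoom <- p + coord_cartesian(ylim = c(5, 7))
```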
Section 9
Putting it all together with more examples
Section 10
Histograms
[Figure: two histograms of waiting from the faithful data, drawn with different bin widths]
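Here is a sketch of a histogram of `waiting` from the built-in `faithful` data. `binwidth` is in the units of x (minutes here). The value 5 is an arbitrary choice for illustration, not the one used on the slides.

```r
library(ggplot2)

# binwidth controls how wide each bar is, in the units of the x variable
p <- ggplot(faithful, aes(waiting)) +
  geom_histogram(binwidth = 5)

# The bin counts computed by stat_bin() add up to the 272 observations
sum(layer_data(p)$count)  # 272
```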
Section 11
Line plots
[Figure: two line plots of Anomaly10y against Year, 1920 to 1980]
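The climate data behind these plots is not included here. This sketch therefore substitutes R's built-in `nhtemp` series (annual mean temperatures in New Haven, 1912 to 1971) to show a line plus a smoother with a confidence band. The column names `Year` and `Temp` are chosen here.

```r
library(ggplot2)

# nhtemp is a built-in annual time series; convert it to a data frame
temps <- data.frame(Year = as.numeric(time(nhtemp)),
                    Temp = as.numeric(nhtemp))

p <- ggplot(temps, aes(Year, Temp)) +
  geom_line() +                                   # one line through the data
  geom_smooth(method = "loess", formula = y ~ x)  # trend with a confidence band
```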
Exercise 2
Modify the previous plot so that it shows three lines instead of one line with a confidence band.
[Figure: Anomaly10y against Year, 1920 to 1980]
Section 12
Bar plots
[Figure: bar plots by Species (setosa, versicolor, virginica): Sepal.Length, then value filled by variable (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) in several arrangements]
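A sketch of stacked versus side-by-side (dodged) bars for the long-format iris data. The long table is built with base R here, where the slides use `reshape2::melt`. The side-by-side version plots per-species means computed with `aggregate()`; that summary step is an assumption about what the later bar plots show.

```r
library(ggplot2)

# Long format: one row per (flower, measurement)
iris_long <- data.frame(
  Species  = rep(iris$Species, times = 4),
  variable = rep(names(iris)[1:4], each = nrow(iris)),
  value    = unlist(iris[1:4], use.names = FALSE)
)

# Stacked: every measurement becomes a bar segment, summed within species
stacked <- ggplot(iris_long, aes(Species, value, fill = variable)) +
  geom_bar(stat = "identity")

# Dodged: mean of each measurement per species, bars placed side by side
means <- aggregate(value ~ Species + variable, data = iris_long, FUN = mean)
dodged <- ggplot(means, aes(Species, value, fill = variable)) +
  geom_bar(stat = "identity", position = "dodge")
```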
Exercise 3
Using the d2 dataset you created earlier, generate the plot below.
Take a quick look at the data first to see whether it needs to be binned.
[Figure: bar chart of count by clarity (I1, SI2, SI1, VS2, VS1, VVS2, VVS1, IF), with bars filled by cut (Fair, Good, Very Good, Premium, Ideal)]
Exercise 4
[Figure: Anomaly10y against Year, 1920 to 1980, filled by sign (FALSE/TRUE)]
Section 13
Density Plots
Density plots
ggplot(faithful, aes(waiting)) + geom_density()
[Figure: density of waiting]
Density plots
ggplot(faithful, aes(waiting)) +
geom_density(fill = "blue", alpha = 0.1)
[Figure: filled, semi-transparent density of waiting]
ggplot(faithful, aes(waiting)) +
geom_line(stat = "density")
[Figure: density of waiting drawn as a line]
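Two further density variations are sketched here, using built-in data. The first maps `fill` to a grouping variable so that each species gets its own semi-transparent curve. The second uses `adjust` to narrow the smoothing bandwidth. The value 0.5 is just an illustrative choice.

```r
library(ggplot2)

# One density curve per species; alpha keeps overlapping curves visible
p <- ggplot(iris, aes(Sepal.Width, fill = Species)) +
  geom_density(alpha = 0.3)

# adjust < 1 multiplies the default bandwidth down, giving a wigglier curve
p_narrow <- ggplot(faithful, aes(waiting)) +
  geom_density(adjust = 0.5)
```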
Section 14
Adding smoothers
[Figure: Sepal.Width against Sepal.Length, colored by Species (setosa, versicolor, virginica), before and after adding smoothers]
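A sketch of adding a smoother, assuming a linear fit is wanted (`method = "lm"`). Passing `formula` explicitly silences ggplot2's message about the default formula. Because `colour` is mapped in `ggplot()`, the smoother inherits the grouping and fits one line per species.

```r
library(ggplot2)

# One linear model per species, each with a 95% confidence band (the default)
p <- ggplot(iris, aes(Sepal.Length, Sepal.Width, colour = Species)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Fitted values and band limits computed for the smoother layer
fits <- layer_data(p, 2)
```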
Section 15
Themes
Adding themes
A themed plot
[Figure: Sepal.Width against Sepal.Length faceted by Species (setosa, versicolor, virginica), drawn with a custom theme]
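Here is one way to theme a plot. Start from a complete built-in theme, then override individual elements with `theme()`. The specific choices (`theme_bw()`, a bottom legend, bold strip text) are illustrative, not the settings used on the slide.

```r
library(ggplot2)

p <- ggplot(iris, aes(Sepal.Length, Sepal.Width, colour = Species)) +
  geom_point() +
  facet_wrap(~ Species)

# A complete theme first, then targeted tweaks; later theme() calls win
p_themed <- p +
  theme_bw() +
  theme(legend.position = "bottom",
        strip.text = element_text(face = "bold"))
```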
ggthemes library
install.packages("ggthemes")
library(ggthemes)
# Then add one of these themes to an existing plot object, here called p
p + theme_stata()
p + theme_excel()
p + theme_wsj()
p + theme_solarized()
Section 16
Create functions to automate your plotting
Then just call your function to generate a plot. It's a lot easier to
fix one function than to repeat the same change over and over for many plots.
plot1 <- my_custom_plot(dataset1, title = "Figure 1")
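The slides call `my_custom_plot()` but do not show its body. The sketch below is a hypothetical version. Its arguments and defaults (iris column names, `theme_bw()`) are assumptions chosen so that a call shaped like the one above works. It uses the `.data` pronoun to refer to columns whose names arrive as strings.

```r
library(ggplot2)

# Hypothetical helper: argument names and defaults are assumptions
my_custom_plot <- function(df, x = "Sepal.Length", y = "Sepal.Width",
                           colour = "Species", title = NULL) {
  ggplot(df, aes(x = .data[[x]], y = .data[[y]], colour = .data[[colour]])) +
    geom_point() +
    labs(title = title, x = x, y = y, colour = colour) +
    theme_bw()
}

# Changing the function once updates every figure built with it
plot1 <- my_custom_plot(iris, title = "Figure 1")
```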
Section 17
Publication quality figures
Specify a size
# width and height are in inches unless units is set
ggsave(filename = "/path/to/figure/filename.png", width = 6,
       height = 4)
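A self-contained sketch of saving at a fixed size. The plot is passed explicitly rather than relying on the last plot displayed. The units are spelled out, and the file is written to a temporary PDF so the example runs anywhere. The file name is made up for illustration.

```r
library(ggplot2)

p <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) + geom_point()

# Fixing the physical size keeps text and point sizes consistent across figures
out <- file.path(tempdir(), "iris_scatter.pdf")
ggsave(filename = out, plot = p, width = 6, height = 4, units = "in")

file.exists(out)  # TRUE
```

For raster formats such as PNG, the `dpi` argument of `ggsave()` additionally controls the resolution.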
Further help
Practice
Work together