ggplot2
a layer-based introduction
NYC R Meetup
December 3rd, 2009
Harlan D. Harris
harlan@[Link]
ggplot's philosophy
● Graphics are (should be!) created by combining
a specification with data. (Wilkinson, 2005)
● The specification is not the name of the visual
form (bar graph, scatterplot, histogram).
● The specification is a collection of rules that
together describe how to build a graph, a
Grammar of Graphics
December 3, 2009 Harlan D. Harris 2
graphics as grammar
[Figure: data + specification = graph. Two example bar charts with categories Row 1 to Row 4 and series Column 1 to Column 3.]
data: a table with columns date, ct, sz, z
specification:
● x=date
● y=ct/sz
● use bars
● group by z
advantages
● Flexible
● can define new graph types by changing
specifications
● can combine many forms into single graphs
● Smart
● compact: rules have useful defaults
● graphs always have meaning
● Reusable
● can plug new data into old specification
● can explore many types of plots from a set of data
ggplot2
● Hadley Wickham (Rice Univ.)
● also: reshape, plyr, etc.
● Extends & implements
The Grammar of Graphics (Wilkinson, 1999, 2005)
● Focus on layers; based on grid
● Specification as R objects constructed by functions
● Large library of components with good defaults
● ggplot2: Elegant Graphics for Data Analysis
(Wickham, 2009)
my gripes
● Specification is hierarchical structure;
grammar is left-to-right R expression
● Can't see the structure (usefully)
● Abuses both notation and R semantics
● Deep Magic with lazy evaluation, proto objects
● Existing tutorials lead to conceptual confusion,
require relearning of fundamentals
● Start with the structure, not with the shortcuts
goal
data to plot
ggplot likes “long” data
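A minimal sketch of what "long" means here, assuming a hypothetical wide table with one column per condition (Model, Empirical). The column names Parameter, Errors, and Condition follow the plotting code used in this talk; the numbers are made up for illustration. Base R's reshape() is used so that nothing extra needs installing; melt() from the reshape package does the same job.

```r
# Hypothetical wide data: one row per parameter value, one column per condition.
wide <- data.frame(
  Parameter = 1:3,
  Model     = c(10, 8, 6),
  Empirical = c(11, 7, 6)
)

# Long form: one row per (Parameter, Condition) pair and a single value column.
long <- reshape(
  wide,
  direction = "long",
  varying   = c("Model", "Empirical"),  # columns to stack
  v.names   = "Errors",                 # name of the stacked value column
  timevar   = "Condition",              # name of the new key column
  times     = c("Model", "Empirical"),  # key values, in the order of `varying`
  idvar     = "Parameter"
)
rownames(long) <- NULL
print(long)
```

In the long table every aesthetic (x, y, color) can be mapped to exactly one column, which is why ggplot prefers it.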
will plot model vs. empirical
simplest plot
aes = “aesthetics” = “create mapping”
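A small sketch of what aes() produces, assuming the ggplot2 package is installed. The variable names are the ones used in this talk's plotting code. Note that aes() never touches the data: it only records which variable feeds which aesthetic.

```r
library(ggplot2)

# Build a mapping on its own; no data frame is needed at this point.
m <- aes(x = Parameter, y = Errors, color = Condition)

# The mapping is a named list of unevaluated expressions.
# ggplot2 normalises the American spelling "color" to "colour".
print(names(m))
print(m)
```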
you don't need
to know this!
structure
ggplot(data=[Link], mapping=aes(x=Parameter, y=Errors,
color=Condition)) +
layer(geom="line")
[Diagram: structure of the plot object]
ggplot: data, layers, mapping, scales, coords, facets, options
  data: (copy)
  mapping: x=Param., y=Errs, color=Cond.
  layer[1]: data, mapping, geom, stat, geom_params, stat_params
    geom: line
    stat: identity
● structure(p), str(p)
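A runnable sketch of the one-layer plot and a look inside it. The data frame `model` is a made-up stand-in for the talk's data set, and the explicit stat and position arguments are an assumption about current ggplot2 releases, where layer() no longer fills them in by default. Access through `p$layers` is likewise assumed to be available in the installed version.

```r
library(ggplot2)

# Hypothetical stand-in data in long form.
model <- data.frame(
  Parameter = rep(1:5, times = 2),
  Errors    = c(90, 80, 72, 66, 62, 85, 70, 60, 55, 52),
  Condition = rep(c("A", "B"), each = 5)
)

# Core object (data + default mapping) plus one explicitly specified layer.
p <- ggplot(data = model,
            mapping = aes(x = Parameter, y = Errors, color = Condition)) +
  layer(geom = "line", stat = "identity", position = "identity")

# Inspect the specification without drawing it.
summary(p)
first_layer <- p$layers[[1]]
print(class(first_layer$geom)[1])
print(class(first_layer$stat)[1])
```

summary(p) is usually easier to read than str(p), which prints the whole nested object.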
add empirical data and chance
you don't need
to know this!
structure so far
[Diagram: structure of the plot object after adding layers]
ggplot: data, layers, mapping, scales, coords, facets, options
  data: (copy)
  mapping: x=Param., y=Errs, color=Cond.
  each layer: data, mapping, geom, stat, geom_params, stat_params
  layer: geom=line, stat=identity
  layer: data=(U), geom=point, stat=identity, geom_params size=3
  layer: data=(K), mapping yint=Errs, geom=hline, stat=hline, geom_params size=2
  layer: yint=[64], geom=hline, stat=hline, geom_params size=.5
  (color="black" and linetype=2 also appear among the hline parameters)
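A hedged sketch of stacking further layers on the core object, each with its own data. The data frames `model`, `empirical`, and `chance` are invented for illustration, the chance level of 64 is taken from the diagram, and the hline layer uses stat="identity" with a yintercept mapping, which is an assumption about how current ggplot2 expresses what the diagram calls stat=hline.

```r
library(ggplot2)

# Hypothetical model predictions and empirical observations, both in long form.
model <- data.frame(
  Parameter = rep(1:5, times = 2),
  Errors    = c(90, 80, 72, 66, 62, 85, 70, 60, 55, 52),
  Condition = rep(c("A", "B"), each = 5)
)
empirical <- data.frame(
  Parameter = c(1, 3, 5, 1, 3, 5),
  Errors    = c(92, 70, 63, 83, 62, 50),
  Condition = rep(c("A", "B"), each = 3)
)
chance <- data.frame(level = 64)

p <- ggplot(data = model,
            mapping = aes(x = Parameter, y = Errors, color = Condition)) +
  # Layer 1: model curves, inheriting data and mapping from the core object.
  layer(geom = "line", stat = "identity", position = "identity") +
  # Layer 2: empirical points, same mapping but different data.
  layer(geom = "point", stat = "identity", position = "identity",
        data = empirical, params = list(size = 3)) +
  # Layer 3: dashed black chance line, with its own data and mapping.
  layer(geom = "hline", stat = "identity", position = "identity",
        data = chance, mapping = aes(yintercept = level),
        inherit.aes = FALSE,
        params = list(linetype = 2, colour = "black"))

print(p)
```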
scales
coordinates & scales
● coordinates affect display of axes
● cartesian, polar, map, etc.
● scales affect data mapping
● colors, shapes, lines
● source of confusion
● set axis ticks/breaks and labels with
scale_x_continuous() or scale_y_discrete(), but
● set the visible axis range (zoom, keeping all data) with
coord_cartesian(xlim=c(1,10)); limits set on a
scale drop out-of-range data instead
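A small sketch of the difference between limiting a coordinate system and limiting a scale, using made-up data. It counts how many x values survive in the built layer data rather than relying on the picture.

```r
library(ggplot2)

# Hypothetical data running past the range we want to show.
df <- data.frame(x = 1:20, y = (1:20)^2)
p <- ggplot(df, aes(x = x, y = y)) + geom_point()

# Zoom: the coordinate system only changes what is visible.
zoomed <- p + coord_cartesian(xlim = c(1, 10))

# Scale limits: values outside the limits are turned into missing values.
clipped <- p + scale_x_continuous(limits = c(1, 10))

# Count the non-missing x values that reach the geom in each case.
kept_zoomed  <- sum(!is.na(layer_data(zoomed, 1)$x))
kept_clipped <- sum(!is.na(layer_data(clipped, 1)$x))
print(c(zoomed = kept_zoomed, clipped = kept_clipped))
```

The distinction matters most when a stat is involved, because a stat only sees the data that the scale lets through.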
options
shortcuts
● All those layer() calls are tedious!
● geom_*() creates a layer with a specific geom
(and various defaults, including a stat)
● stat_*() creates a layer with a specific stat
(and various defaults, including a geom)
● qplot() creates a ggplot and a layer
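A sketch comparing an explicit layer() call with its geom_*() shortcut, on invented data. The stat and position arguments to layer() are spelled out because current ggplot2 releases are assumed to require them. qplot() is left out because recent releases deprecate it.

```r
library(ggplot2)

# Hypothetical data in long form.
model <- data.frame(
  Parameter = rep(1:5, times = 2),
  Errors    = c(90, 80, 72, 66, 62, 85, 70, 60, 55, 52),
  Condition = rep(c("A", "B"), each = 5)
)
base <- ggplot(model, aes(x = Parameter, y = Errors, color = Condition))

# Long form: every component of the layer is named.
p_long <- base + layer(geom = "line", stat = "identity", position = "identity")

# Shortcut: geom_line() supplies the same stat and position as defaults.
p_short <- base + geom_line()

# Both specifications should compute the same data for drawing.
same <- isTRUE(all.equal(layer_data(p_long, 1), layer_data(p_short, 1)))
print(same)
```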
quick note on stats
● stat="identity"
● stat="smooth" with method="lm"
● fit y=f(x) with lm(), generate new data to be plotted
by geom_line(), CIs with geom_ribbon()
● stat="smooth" (default method)
● fit y=f(x) with loess()
● stat="summary"
● y=f(x) with arbitrary f()
● stat="bin"
● histograms
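A sketch showing that a stat replaces the raw data with computed data before the geom draws anything. The data are simulated, and the linear fit goes through stat_smooth(method = "lm"), which is how current ggplot2 exposes an lm() fit.

```r
library(ggplot2)

# Simulated data: a noisy linear relationship.
set.seed(1)
df <- data.frame(x = 1:30)
df$y <- 2 * df$x + rnorm(30, sd = 5)

# Linear fit: the stat generates a fitted line plus confidence limits.
p_lm <- ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x)

# Histogram: the stat turns 30 raw values into bin counts.
p_hist <- ggplot(df, aes(x = y)) + stat_bin(bins = 5)

# Look at the data each stat hands to its geom.
fit  <- layer_data(p_lm, 2)
bins <- layer_data(p_hist, 1)
print(head(fit[, c("x", "y", "ymin", "ymax")]))
print(bins[, c("x", "count")])
```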
simplest faceted plot
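A minimal faceting sketch on invented data, assuming one panel per level of Condition is wanted. facet_wrap() is used here; facet_grid() would be the alternative for a rows-by-columns layout.

```r
library(ggplot2)

# Hypothetical data in long form.
model <- data.frame(
  Parameter = rep(1:5, times = 2),
  Errors    = c(90, 80, 72, 66, 62, 85, 70, 60, 55, 52),
  Condition = rep(c("A", "B"), each = 5)
)

# Same specification as an unfaceted line plot, split into one panel per Condition.
p <- ggplot(model, aes(x = Parameter, y = Errors)) +
  geom_line() +
  facet_wrap(~ Condition)

n_panels <- length(unique(layer_data(p, 1)$PANEL))
print(p)
```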
everything else (+alpha)
other things I find useful
● scale_x_continuous(breaks=seq(1,9,2),
labels=c("one", "", "five", "", "nine"))
● geom_text(aes(x=.., y=.., label=..))
● annotate(geom="text", x=14, y=19, label="outlier!")
● geom_density()
● stat_summary([Link]="mean_cl_boot",
geom="crossbar")
● geom_jitter(position=position_jitter(width=.5))
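A short sketch combining two of these helpers, custom axis breaks with labels and a text annotation, on made-up data. The annotation coordinates are chosen to sit next to the invented outlier.

```r
library(ggplot2)

# Hypothetical data with one obvious outlier at x = 8.
df <- data.frame(x = 1:9, y = c(2, 3, 3, 4, 5, 5, 6, 19, 7))

p <- ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  # Ticks at 1, 3, 5, 7, 9, with every other tick left unlabelled.
  scale_x_continuous(breaks = seq(1, 9, 2),
                     labels = c("one", "", "five", "", "nine")) +
  # A one-off text label; annotate() needs no data frame.
  annotate(geom = "text", x = 7, y = 19, label = "outlier!")

note <- layer_data(p, 2)
print(p)
```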
takehomes
● a ggplot graph is generated by a specification +
data
● ggplot specifications are a core object plus
layers
● mappings among data, x/y, scales, and other
attributes are fundamental
● geom and stat shortcuts allow smart/compact
construction of graphs
● ggplot encourages good graphs, with facets,
good use of color, no chartjunk
thanks!
resources
● Wickham, H. (2009) ggplot2: Elegant Graphics
for Data Analysis. Springer.
● [Link]
● [Link]
● [Link]