Ggplot2 Intro
R Package: ggplot2
Used to produce statistical graphics, author = Hadley Wickham
"attempt to take the good things about base and lattice graphics and improve on them with a strong, underlying model"
ggplot2 provides two ways to produce plot objects: qplot() and ggplot()
qplot()
uses some concepts of The Grammar of Graphics, but doesn’t provide full capability
designed to be very similar to plot() and simple to use
ggplot()
may have steeper learning curve but allows much more flexibility when building graphs
Grammar Defines Components of Graphics
data: in ggplot2, data must be stored as an R data frame
scales: for each aesthetic, describe how visual characteristic is converted to display values
- for example, log scales, color scales, size scales, shape scales, ...
facets: describe how data is split into subsets and displayed as multiple small graphs
Workshop Data Frame
extract from 2012 World Population Data Sheet produced by Population Reference Bureau
*definitions:
infant mortality rate – annual number of deaths of infants under age 1 per 1,000 live births
total fertility rate – average number of children a woman would have assuming that current age-specific birth rates remain constant throughout her childbearing years
ggplot()
creates a plot object that can be assigned to a variable
can specify data frame and aesthetics (visual characteristics that represent data)
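A minimal sketch of the idea above, assuming a small made-up data frame in place of WDS2012.csv (the values are illustrative only, not Population Reference Bureau data). It shows that ggplot() returns an object that can be stored and inspected, and that nothing is drawn until a layer is added.

```r
library(ggplot2)

# Made-up stand-in for WDS2012.csv; values are illustrative, not real data
w <- data.frame(le   = c(58, 71, 76, 80),
                tfr  = c(5.1, 2.9, 2.1, 1.6),
                area = c("Africa", "Africa", "Americas", "Europe"))

# The plot object stores the data frame and the aesthetic mapping
p <- ggplot(data = w, aes(x = le, y = tfr, color = area))

# Print a text summary of the stored data, mapping, and faceting
summary(p)

# Adding a layer gives something that can be drawn
p + geom_point()
```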
Layer
purpose:
display the data – allows viewer to see
patterns, overall structure, local structure, outliers, ...
display statistical summaries of the data – allows viewer to see
counts, means, medians, IQRs, model predictions, ...
full specification:
Add a geom Layer
w <- read.csv(file="WDS2012.csv", head=TRUE, sep=",")
p <- ggplot(data=w, aes(x=le, y=tfr, color=area))
p + layer(geom="blank")
p + layer(geom="line")
p + layer(geom="jitter")
p + layer(geom="step")
Add a stat Layer
w <- read.csv(file="WDS2012.csv", head=TRUE, sep=",")
p <- ggplot(data=w, aes(x=le, y=tfr))
p + layer(geom="point", geom_params=list(shape=1)) +
layer(stat="smooth")
... group is <1000, so using loess. Use 'method = x' to change the smoothing method.
p + layer(geom="point", geom_params=list(shape=1)) +
layer(stat="smooth", stat_params=list(method="lm",se=FALSE))
geom_xxx and stat_xxx Shortcut Functions
can use geom_xxx() and stat_xxx() shortcut functions rather than layer()...
Shortcut Functions: Adding a geom Layer
w <- read.csv(file="WDS2012.csv", head=TRUE, sep=",")
p <- ggplot(data=w, aes(x=le, y=tfr, color=area))
p + geom_blank()
p + geom_line()
p + geom_jitter()
p + geom_step()
Add Layers Using Shortcut Functions
geom_xxx()
purpose: display the data –
allows viewer to see patterns, overall structure, local structure, outliers, ...
each geom_xxx() has a default stat (statistical transformation) associated with it,
but the default statistical transformation may be changed using stat parameter
stat_xxx()
purpose: display statistical summaries of the data –
allows viewer to see counts, means, medians, IQRs, model predictions, ...
each stat_xxx() has a default geom (geometric object) associated with it,
but the default geometric object may be changed using geom parameter
Statistical Transformation
w <- read.csv(file="WDS2012.csv", head=TRUE, sep=",")
p <- ggplot(data=w, aes(x=area))
p + stat_bin()
p + stat_bin(geom="bar")
Change Default Geometric Object
w <- read.csv(file="WDS2012.csv", head=TRUE, sep=",")
p <- ggplot(data=w, aes(x=le))
p + stat_bin(geom="line",binwidth=1)
p + stat_bin(geom="line", binwidth=1) + stat_bin(geom="point",binwidth=1)
Use Variables Created by stat_xxx()
stat_xxx() may create new variables in transformed data frame
aesthetics may be mapped to these new variables
  bin  area          ..count..
  1    Africa        48
  2    Americas      25
  3    Asia/Oceania  49
  4    Europe        36
p + stat_bin(aes(fill=..count..))
Already Transformed Data
wb <- read.csv(file="WDS2012areabins.csv", head=TRUE, sep=",")
wb
bin area count
1 1 Africa 48
2 2 Americas 25
3 3 Asia/Oceania 49
4 4 Europe 36
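One way to plot data that has already been binned, sketched here under the assumption that the bar heights should be taken directly from the count column. The data frame is typed in from the printout above so the sketch does not need the CSV file; stat="identity" is what tells geom_bar() not to count rows again.

```r
library(ggplot2)

# Pre-binned counts, copied from the printed wb data frame above
wb <- data.frame(bin   = 1:4,
                 area  = c("Africa", "Americas", "Asia/Oceania", "Europe"),
                 count = c(48, 25, 49, 36))

p <- ggplot(data = wb, aes(x = area, y = count))

# stat="identity" uses the y values as given instead of counting rows
p + geom_bar(stat = "identity")
```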
Aesthetics
describe visual characteristics that represent data
- for example, x position, y position, size, color (outside), fill (inside),
point shape, line type, transparency
most layers have some required aesthetics and some optional aesthetics
w <- read.csv(file="WDS2012.csv", head=TRUE, sep=",")
p <- ggplot(data=w, aes(x=le, y=tfr, color=area))
p + geom_point() + geom_smooth(method="lm", se=FALSE)
Add or Remove Aesthetic Mapping
w <- read.csv(file="WDS2012.csv", head=TRUE, sep=",")
p <- ggplot(data=w, aes(x=le, y=tfr, color=area))
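A hedged sketch of adding and removing a mapping for a single layer, assuming a made-up data frame in place of WDS2012.csv (illustrative values only). A mapping given in a layer's aes() is added to the plot-level mapping for that layer; setting an aesthetic to NULL in a layer removes the inherited mapping for that layer.

```r
library(ggplot2)

# Made-up stand-in for WDS2012.csv; values are illustrative, not real data
w <- data.frame(le   = c(58, 62, 71, 76, 79, 80),
                tfr  = c(5.1, 4.4, 2.9, 2.1, 1.9, 1.6),
                area = c("Africa", "Africa", "Americas",
                         "Americas", "Europe", "Europe"))

p <- ggplot(data = w, aes(x = le, y = tfr, color = area))

# Add a mapping in one layer: point size follows tfr
p + geom_point(aes(size = tfr))

# Remove the inherited color mapping in one layer:
# the points stay colored by area, but a single line is fitted to all rows
p + geom_point() +
  geom_smooth(aes(color = NULL), method = "lm", se = FALSE)
```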
Aesthetic Mapping vs. Parameter Setting
aesthetic mapping
data value determines visual characteristic
use aes()
setting
constant value determines visual characteristic
use layer parameter
w <- read.csv(file="WDS2012.csv", head=TRUE, sep=",")
p <- ggplot(data=w, aes(x=le, y=tfr))
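The distinction can be sketched as follows, assuming a made-up data frame in place of WDS2012.csv (illustrative values only). In the first plot color is mapped inside aes(), so it varies with the data and gets a legend; in the second it is set as a layer parameter, so every point gets the same constant color.

```r
library(ggplot2)

# Made-up stand-in for WDS2012.csv; values are illustrative, not real data
w <- data.frame(le   = c(58, 71, 76, 80),
                tfr  = c(5.1, 2.9, 2.1, 1.6),
                area = c("Africa", "Africa", "Americas", "Europe"))

p <- ggplot(data = w, aes(x = le, y = tfr))

# Mapping: the data value (area) determines the color
p + geom_point(aes(color = area))

# Setting: the constant "red" determines the color of every point
p + geom_point(color = "red")
```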
Position
w <- read.csv(file="WDS2012.csv", head=TRUE, sep=",")
w$tfrGT2 <- w$tfr > 2
p <- ggplot(data=w, aes(x=area, fill=tfrGT2))
p + geom_bar()
p + geom_bar(position="stack")
p + geom_bar(position="dodge")
p + geom_bar(position="fill")
Bar Width
w <- read.csv(file="WDS2012.csv", head=TRUE, sep=",")
p <- ggplot(data=w, aes(x=area))
p + geom_bar(width=.5)
p + geom_bar(width=.97)
Position
w <- read.csv(file="WDS2012.csv", head=TRUE, sep=",")
p <- ggplot(data=w, aes(x=le, y=tfr))
p + geom_point()
p + geom_point(position="jitter")
equivalent to
p + geom_jitter()
Transparency
w <- read.csv(file="WDS2012.csv", head=TRUE, sep=",")
p <- ggplot(data=w, aes(x=le, y=tfr))
p + geom_point(size=3, alpha=1/2)
p + geom_jitter(size=4, alpha=1/2)
techniques for overplotting: adjusting symbol size, shape, jitter, and transparency
Coordinate System
w <- read.csv(file="WDS2012.csv", head=TRUE, sep=",")
p <- ggplot(w, aes(x=factor(1), fill=area))
p + geom_bar() + coord_polar(theta="y", direction=-1)
p + geom_bar() + coord_flip()
Data Frame
each plot layer may contain data from a different data frame
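A small sketch of one layer drawing from a second data frame, assuming made-up values in place of WDS2012.csv. The second data frame (wm, a hypothetical name) holds per-area means and is passed through the layer's data argument; the layer still inherits the x, y, and color mappings from ggplot().

```r
library(ggplot2)

# Made-up stand-in for WDS2012.csv; values are illustrative, not real data
w <- data.frame(le   = c(58, 62, 71, 76, 79, 80),
                tfr  = c(5.1, 4.4, 2.9, 2.1, 1.9, 1.6),
                area = c("Africa", "Africa", "Americas",
                         "Americas", "Europe", "Europe"))

# Second data frame: one row per area with the mean le and mean tfr
wm <- aggregate(cbind(le, tfr) ~ area, data = w, FUN = mean)

p <- ggplot(data = w, aes(x = le, y = tfr, color = area))

# First layer uses w; second layer uses wm and draws large crosses at the means
p + geom_point() +
  geom_point(data = wm, size = 5, shape = 4)
```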
Labels
w <- read.csv(file="WDS2012.csv", head=TRUE, sep=",")
wna <- subset(w, region=="Northern Africa")
p <- ggplot(data=wna, aes(x=le, y=tfr))
p + geom_point() +
geom_text(aes(y=tfr + .2,
label=country), size=4) +
xlim(50,80)
p + geom_point() +
annotate("text", x=55, y=5.5,
label="South Sudan", color="red") +
annotate("text", x=62, y=4.3,
label="Sudan", color="red") +
ggtitle("Northern Africa") +
xlab("life expectancy")
Labels
w <- read.csv(file="WDS2012.csv", head=TRUE, sep=",")
labelset <-c("South Sudan", "Sudan", "Libya", "Tunisia")
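The rest of this slide's code is not shown here, so the following is only a guess at how a label set like this might be used: label just the rows whose country is in the set by giving geom_text() a subset of the data. The data frame and the placeholder country names ("A" to "F") are made up for illustration.

```r
library(ggplot2)

# Made-up data with placeholder country names; not real data
w <- data.frame(country = c("A", "B", "C", "D", "E", "F"),
                le      = c(58, 62, 71, 76, 79, 80),
                tfr     = c(5.1, 4.4, 2.9, 2.1, 1.9, 1.6))

# Hypothetical set of countries to label
labelset <- c("A", "D")

p <- ggplot(data = w, aes(x = le, y = tfr))

# All points are drawn; only rows in labelset get a text label
p + geom_point() +
  geom_text(data = subset(w, country %in% labelset),
            aes(label = country), vjust = -1)
```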
Scale
controls the mapping from data to aesthetic
“takes data and turns it into something that can be perceived visually”
color and fill, shape, size, position
acts as a function from the data space to a place in the aesthetic space
provides axes or legends (“guides”) to allow viewer to perform inverse mapping from
aesthetic space back to data space
required for every aesthetic ... so ggplot2 always provides a default scale
w <- read.csv(file="WDS2012.csv", head=TRUE, sep=",")
w$tfrGT2 <- w$tfr > 2
p <- ggplot(data=w, aes(x=area, fill=tfrGT2))
p + geom_bar(color="black")
equivalent to
p + geom_bar(color="black") +
scale_fill_discrete()
equivalent to
p + geom_bar(color="black") +
scale_fill_hue()
colors equally spaced around color wheel
Fill Scales
w <- read.csv(file="WDS2012.csv", head=TRUE, sep=",")
w$tfrGT2 <- w$tfr > 2
p <- ggplot(data=w, aes(x=area, fill=tfrGT2))
p + geom_bar(color="black") +
scale_fill_grey()
p + geom_bar(color="black") +
scale_fill_brewer()
Fill Scales
library(RColorBrewer)
display.brewer.all()
w <- read.csv(file="WDS2012.csv",
head=TRUE, sep=",")
w$tfrGT2 <- w$tfr > 2
p <- ggplot(data=w,
aes(x=area, fill=tfrGT2))
p + geom_bar(color="black") +
scale_fill_brewer(palette="Set1")
Manual Scales
w <- read.csv(file="WDS2012.csv", head=TRUE, sep=",")
w$tfrGT2 <- w$tfr > 2
p <- ggplot(data=w, aes(x=area, fill=tfrGT2))
p + geom_bar(color="black") +
scale_fill_manual(values=c("red","blue"),
labels=c("no", "yes"))
typical scale arguments: values, labels, breaks, limits, name
p + geom_point(aes(x=le, y=tfr,
shape=area, fill=NULL), size = 3) +
xlab("life expectancy") +
scale_shape_manual(values=c(1,16,2,8))
Position Scales
w <- read.csv(file="WDS2012.csv", head=TRUE, sep=",")
p <- ggplot(data=w, aes(x=le, y=tfr))
p + geom_jitter()
p + geom_jitter() +
  scale_y_reverse()
p <- ggplot(data=w,
  aes(x=le, y=pop2012))
p + geom_jitter()
p + geom_jitter() +
  scale_y_log10(breaks=c(10, 100, 1000), labels=c(10,100,1000))
Theme
controls appearance of non-data elements
... does not affect how data is displayed by geom_xxx() or stat_xxx() functions
theme elements inherit properties from other theme elements, for example:
axis.title.x and axis.title.y inherit from axis.title
axis.title inherits from title
Theme: Titles, Tick Marks, and Tick Labels
w <- read.csv(file="WDS2012.csv", head=TRUE, sep=",")
p <- ggplot(data=w, aes(x=le, y=tfr))
p + geom_jitter() + ggtitle("Life Expectancy and TFR") +
xlab("life expectancy (years)") + ylab("total fertility rate (tfr)") +
scale_x_continuous(breaks=seq(50,80,by=5),
labels=c(50,"fifty-five",60,65,70,75,80)) +
theme(title=element_text(color="blue", size=30),
axis.title=element_text(size=14,face="bold"),
axis.title.x=element_text(color="green"),
axis.text=element_text(size=14),
axis.text.y=element_text(color="black"),
axis.text.x=element_text(color="purple"),
axis.ticks.y=element_blank())
Theme: Legends
w <- read.csv(file="WDS2012.csv", head=TRUE, sep=",")
w$tfrGT2 <- w$tfr > 2
p <- ggplot(data=w, aes(x=area, fill=tfrGT2))
p + geom_bar() +
scale_fill_manual(name="TFR value",
values = c("red","blue"),
labels=c("<=2", ">2")) +
theme(legend.position="left",
legend.text.align=1)
p + geom_point(aes(x=le, y=tfr,
shape=area, fill=NULL), size = 3) +
xlab("life expectancy") +
scale_shape_manual(name="Area: ",
values=c(1,16,2,8)) +
theme(legend.key=element_blank(),
legend.direction="horizontal",
legend.position="bottom")
Theme: Overall Look
w <- read.csv(file="WDS2012.csv", head=TRUE, sep=",")
p <- ggplot(data=w, aes(x=le, y=tfr))
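The plotting code for this slide is not shown here; as a rough sketch, the overall look can be changed by adding one of the complete themes that ship with ggplot2. The data frame is made up for illustration, and the choice of themes shown is an assumption.

```r
library(ggplot2)

# Made-up stand-in for WDS2012.csv; values are illustrative, not real data
w <- data.frame(le  = c(58, 62, 71, 76, 79, 80),
                tfr = c(5.1, 4.4, 2.9, 2.1, 1.9, 1.6))

p <- ggplot(data = w, aes(x = le, y = tfr))

p + geom_point() + theme_grey()     # gray background, white grid lines
p + geom_point() + theme_bw()       # white background, gray grid lines
p + geom_point() + theme_minimal()  # no background or border

# theme_set() changes the default for later plots and returns the old default
old <- theme_set(theme_bw())
p + geom_point()
theme_set(old)                      # restore the previous default
```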
Facets
w <- read.csv(file="WDS2012.csv", head=TRUE, sep=",")
w$tfrGT2 <- w$tfr > 2
p <- ggplot(data=w, aes(x=le, y=imr)) + geom_jitter()
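The faceting calls for this slide are not shown here, so this is a sketch of how the plot object might be split into small multiples, assuming made-up data in place of WDS2012.csv. facet_grid() lays panels out in rows or columns by a variable; facet_wrap() wraps a sequence of panels.

```r
library(ggplot2)

# Made-up stand-in for WDS2012.csv; values are illustrative, not real data
w <- data.frame(le   = c(58, 62, 71, 76, 79, 80),
                imr  = c(70, 55, 24, 15, 6, 4),
                tfr  = c(5.1, 4.4, 2.9, 2.1, 1.9, 1.6),
                area = c("Africa", "Africa", "Americas",
                         "Americas", "Europe", "Europe"))
w$tfrGT2 <- w$tfr > 2

p <- ggplot(data = w, aes(x = le, y = imr)) + geom_jitter()

p + facet_grid(. ~ tfrGT2)   # one column of panels per value of tfrGT2
p + facet_grid(tfrGT2 ~ .)   # one row of panels per value of tfrGT2
p + facet_wrap(~ area)       # one panel per area, wrapped into a grid
```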
Saving Graphs
ggsave()
ggsave(file="le_tfr1.jpg")
ggsave(file="le_tfr2.jpg", scale=2)
ggsave(file="le_tfr3.jpg", width=5, height=5, unit="in")
ggsave(file="le_tfr4.png")
ggsave(file="le_tfr5.pdf")
Part 2: Examples
Contents and Purpose of ggplot2 Graphs
ggplot2 graph is typically created to show:
- data
- data + annotation
- statistical summary
- statistical summary + annotation
- data + statistical summary
- data + statistical summary + annotation
purpose of graph:
- explore data to increase understanding of data
- communicate about data ... often by showing data and/or statistical summary plus annotation
Graph associated with (online) NY Times Op-Ed piece by Thomas B. Edsall, “Does Rising Inequality Make Us Hardhearted?” December 10, 2013.
http://www.nytimes.com/imagepages/2013/12/11/opinion/11edsall-chart4.html?ref=opinion
Show Data
w <- read.csv(file="WDS2012.csv",
head=TRUE, sep=",")
popLT300 <- subset(w,pop2012<300)
p <- ggplot(data=popLT300,
aes(x=area, y=tfr, size=pop2012))
p + geom_jitter(position=
position_jitter(w=.2, h=.1),shape=21) +
scale_size_area(max_size=10)
Data + Annotation
p <- ggplot(data=popLT300,
aes(x=area, y=tfr, size=pop2012))
p + geom_jitter(position=
position_jitter(w=.2, h=.1),shape=21) +
scale_y_continuous(breaks=
c(1,2,3,4,5,6,7)) +
scale_size_area(max_size=10) +
annotate("text", x=1.3,y=7.1,
label="Niger", size=4) +
labs(title="Country Total Fertility Rates
(TFRs), 2012",
x="\nNote: United States, China and
India are not included.",
y="Total\nFertility\nRate\n(TFR)",
size="2012 Population\n
(millions)") +
theme_bw() +
theme(axis.title.x=element_text(size=10,
hjust=0),
axis.title.y=element_text(angle=0),
legend.key=element_blank(),
legend.text.align=1)
Show Data
w <- read.csv(file="WDS2012.csv",
head=TRUE, sep=",")
Data + Statistical Summary
w <- read.csv(file="WDS2012.csv",
head=TRUE, sep=",")
Data + Statistical Summary + Annotation
p <- ggplot(data=subset(w,area=="Africa"),
aes(x=reorder(factor(region),tfr,FUN=median),
y=tfr, color=region))
p + geom_boxplot(outlier.size=0) +
geom_jitter(position=
position_jitter(w=.2,h=0)) +
annotate("text",x=1.2, y=5.5,
label="South Sudan", size=4) +
annotate("text",x=3.3, y=1.5,
label="Mauritius", size=4) +
annotate("text",x=4.8, y=7.1,
label="Niger", size=4) +
annotate("text",x=4, y=3.2,
label="Gabon", size=4) +
labs(title="Country TFR's for Africa, 2012",
x="", y="TFR") +
theme(axis.ticks.x=element_blank(),
axis.title.y=element_text(angle=0),
legend.position="none")
Statistical Summary
violin plot:
kernel density estimates, mirrored to have a symmetrical shape
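A minimal violin plot sketch, assuming simulated values rather than the workshop data (the group names "X" and "Y" and the distributions are made up). geom_violin() computes a kernel density estimate per group and mirrors it; adding jittered points on top is one way to show the data together with the summary.

```r
library(ggplot2)

# Simulated data for illustration only; not the workshop data
set.seed(1)
d <- data.frame(area = rep(c("X", "Y"), each = 50),
                le   = c(rnorm(50, mean = 60, sd = 5),
                         rnorm(50, mean = 75, sd = 4)))

p <- ggplot(data = d, aes(x = area, y = le))

# Violin only: mirrored kernel density estimate for each group
p + geom_violin()

# Violin plus the underlying points, jittered horizontally
p + geom_violin() +
  geom_jitter(position = position_jitter(width = 0.1, height = 0))
```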
Statistical Summaries
w <- read.csv(file="WDS2012.csv", head=TRUE, sep=",")
Statistical Summary
density distribution
w <- read.csv(file="WDS2012.csv", head=TRUE, sep=",")
p <- ggplot(w, aes(x=le, color=area))
p + geom_line(stat="density")
p <- ggplot(w, aes(x=le, fill=area))
p + geom_density()
Statistical Summary + Annotation
w <- read.csv(file="WDS2012.csv", head=TRUE, sep=",")
p <- ggplot(w, aes(x=le, fill=area))
p + geom_density(alpha=.4) +
scale_fill_manual(values=c("red", "green", "blue", "yellow")) +
scale_x_continuous(breaks=c(45,50,55,60,65,70,75,80,85)) +
theme(axis.text=element_text(color="black", size=12)) +
labs(title="Distribution of Life Expectancy, by Area, 2012", x="life expectancy")
Statistical Summaries
w <- read.csv(file="WDS2012.csv", head=TRUE, sep=",")
p <- ggplot(w, aes(x=le))
p + geom_freqpoly(color="red",
  size=1, bin=1)
p + geom_line(stat="density",
  color="red", size=2, bin=1) +
  scale_y_continuous(limits=c(0,0.1))
p + geom_bar(fill="darkgray", bin=1) +
  geom_freqpoly(color="red", size=1, bin=1)
p + geom_bar(aes(y=..density..),
  fill="darkgray", bin=1) +
  geom_line(stat="density", color="red",
  size=2) + ylim(0,0.1)
Show Data
w <- read.csv(file="WDS2012.csv",
head=TRUE, sep=",")
p <- ggplot(data=subset(w,area=="Africa"),
aes(x=tfr,y=reorder(factor(country),tfr)))
p + geom_point()
Show Data
w <- read.csv(file="WDS2012.csv",
head=TRUE, sep=",")
p <- ggplot(data=subset(w,area=="Africa"),
aes(x=tfr,y=reorder(factor(country),tfr)))
p + geom_segment(aes(yend=country,xend=0)) +
geom_point() +
theme_minimal() +
scale_x_continuous(breaks=
c(0,1,2,3,4,5,6,7)) +
labs(x="Total Fertility Rate (TFR)", y="",
title="Total Fertility Rates (TFRs)
in Africa, by Country, 2012") +
theme(panel.grid.major.y=element_blank(),
axis.ticks=element_blank())
Show Data
w <- read.csv(file="WDS2012.csv",
head=TRUE, sep=",")
p <- ggplot(data=subset(w,area=="Africa"),
aes(x=tfr, y=reorder(factor(country),tfr)))
p + geom_segment(aes(yend=country,xend=0),
size=2) +
theme_minimal() +
scale_x_continuous(breaks=
c(0,1,2,3,4,5,6,7)) +
labs(x="Total Fertility Rate (TFR)", y="",
title="Total Fertility Rates (TFRs)
in Africa, by Country, 2012") +
theme(panel.grid.major.y=element_blank(),
axis.ticks=element_blank())
Show Data
w <- read.csv(file="WDS2012.csv",
head=TRUE, sep=",")
p <- ggplot(data=subset(w,area=="Africa"),
aes(x=tfr,
y=reorder(factor(country),tfr)))
p + geom_text(aes(x=tfr+.1, label=country,
hjust=0), size=4) +
geom_segment(aes(yend=country,xend=0), size=2) +
theme_minimal() +
scale_x_continuous(breaks=c(1,2,3,4,5,6,7),
limits=c(0,8)) +
labs(x="", y="",
title="Total Fertility Rates (TFRs)
in Africa, by Country, 2012") +
theme(panel.grid.major.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks=element_blank())
Show Data
w <- read.csv(file="WDS2012.csv",
head=TRUE, sep=",")
p <- ggplot(data=subset(w,area=="Africa"),
aes(x=tfr, y=reorder(factor(country),tfr)))
p + geom_text(aes(x=tfr-.1, label=country,
hjust=1), size=4) +
geom_point() +
theme_minimal() +
scale_x_continuous(breaks=c(1,2,3,4,5,6,7),
limits=c(0,8)) +
labs(x="", y="",
title="Total Fertility Rates (TFRs) in
Africa, by Country, 2012") +
theme(panel.grid.major.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks=element_blank())
Show Data
w <- read.csv(file="WDS2012.csv",
head=TRUE, sep=",")
p <- ggplot(data=subset(w,area=="Africa"),
aes(x=tfr, y=reorder(factor(country),tfr)))
p + geom_text(aes(x=tfr+.1, label=country,
hjust=0), size= 4) +
geom_point() +
theme_minimal() +
scale_x_continuous(breaks=c(1,2,3,4,5,6,7),
limits=c(0,8)) +
labs(x="", y="",
title="Total Fertility Rates (TFRs)
in Africa, by Country, 2012") +
theme(panel.grid.major.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks=element_blank())
Show Data
w <- read.csv(file="WDS2012.csv",
head=TRUE, sep=",")
a <- subset(w,area=="Africa")
a$region <- factor(a$region,levels=
c("Northern Africa","Southern Africa",
"Western Africa", "Middle Africa",
"Eastern Africa" ))
p <- ggplot(data=a,aes(x=tfr,
y=reorder(factor(country),tfr)))
p + geom_segment(aes(yend=country,xend=0)) +
geom_point() + scale_x_continuous(breaks=
c(0,1,2,3,4,5,6,7)) +
labs(x="Total Fertility Rate (TFR)", y="",
title="Total Fertility Rates (TFRs) in
Africa, by Country, 2012") +
theme(
axis.text=element_text(color="black"),
strip.text.y=element_text(size=9),
strip.background=element_rect(fill="white"),
panel.grid.major.y=element_blank(),
panel.grid.minor.x=element_blank(),
axis.ticks=element_blank()) +
facet_grid(region ~ .)
Show Data
w <- read.csv(file="WDS2012.csv",
head=TRUE, sep=",")
a <- subset(w,area=="Africa")
a$region <- factor(a$region,levels=
c("Northern Africa","Southern Africa",
"Western Africa", "Middle Africa",
"Eastern Africa" ))
p <- ggplot(data=a,aes(x=tfr,
y=reorder(factor(country),tfr)))
p +
geom_segment(aes(yend=country,xend=0)) +
geom_point() + scale_x_continuous(breaks=
c(0,1,2,3,4,5,6,7)) +
labs(x="Total Fertility Rate (TFR)",
y="",
title="Total Fertility Rates (TFRs) in
Africa, by Country, 2012") +
theme(
axis.text=element_text(color="black"),
strip.text.y=element_text(size=9),
strip.background=element_rect(fill="white"),
panel.grid.major.y=element_blank(),
panel.grid.minor.x=element_blank(),
axis.ticks=element_blank()) +
facet_grid(region ~ ., scales="free_y")
Show Data
w <- read.csv(file="WDS2012.csv",
head=TRUE, sep=",")
a <- subset(w,area=="Africa")
a$region <- factor(a$region,levels=
c("Northern Africa","Southern Africa",
"Western Africa", "Middle Africa",
"Eastern Africa" ))
p <- ggplot(data=a,aes(x=tfr,
y=reorder(factor(country),tfr)))
p + geom_segment(aes(yend=country,xend=0)) +
geom_point() + scale_x_continuous(breaks=
c(0,1,2,3,4,5,6,7)) +
labs(x="Total Fertility Rate (TFR)", y="",
title="Total Fertility Rates (TFRs) in
Africa, by Country, 2012") +
theme(
axis.text=element_text(color="black"),
strip.text.y=element_text(size=9),
strip.background=element_rect(fill="white"),
panel.grid.major.y=element_blank(),
panel.grid.minor.x=element_blank(),
axis.ticks=element_blank()) +
facet_grid(region ~ .,
scales="free_y", space="free_y")
Show Data
w <- read.csv(file="WDS2012.csv", head=TRUE, sep=",")
p <- ggplot(data=subset(w,area=="Africa"),
aes(x=reorder(factor(country),leF),y=leF))
p + geom_point(color="red") +
geom_point(aes(y=leM), color="blue") +
theme_bw() +
scale_y_continuous(breaks=c(45,50,55,60,65,70,75,80)) +
labs(x="", y="Life Expectancy",
title="Life Expectancy in Africa, by Country and Gender, 2012") +
theme(axis.text.x=element_text(angle=60, hjust=1),
axis.text=element_text(color="black"))
Show Data
library(reshape2)
w <- read.csv(file="WDS2012.csv", head=TRUE, sep=",")
w.melt <- melt(w, measure.vars=c("leM", "leF"))
p <- ggplot(data=subset(w.melt,area=="Africa"),
aes(x=reorder(factor(country),le), y=value, color=variable))
p + geom_point() + theme_bw() +
scale_y_continuous(breaks=c(45,50,55,60,65,70,75,80)) +
scale_color_manual(values=c("blue", "red"), name="", labels=c("male", "female")) +
labs(x="", y="Life Expectancy",
title="Life Expectancy in Africa, by Country and Gender, 2012") +
theme(axis.text.x=element_text(angle=60, hjust=1),
axis.text=element_text(color="black"), legend.key=element_blank())
Show Data
w <- read.csv(file="WDS2012.csv", head=TRUE, sep=",")
p <- ggplot(data=subset(w,area=="Africa"),
aes(x=reorder(factor(country),leF),y=leF))
p + geom_point(color="red") + geom_point(aes(y=leM), color="blue") +
geom_point(x=43, y=48, color="blue") + geom_point(x=43, y=46, color="red") +
annotate("text", x=45, y=48, label="male", color="black") +
annotate("text", x=45.5, y=46, label="female", color="black") +
geom_segment(y=50,x=42,yend=50,xend=48 )+ geom_segment(y=50,x=42,yend=45,xend=42) +
theme_bw() + scale_y_continuous(breaks=c(45,50,55,60,65,70,75,80)) +
labs(x="", y="Life Expectancy",
title="Life Expectancy in Africa, by Country and Gender, 2012") +
theme(axis.text.x=element_text(angle=60, hjust=1),
axis.text=element_text(color="black"))
Statistical Summary
w <- read.csv(file="WDS2012.csv",
  head=TRUE, sep=",")
w$tfrGT2 <- w$tfr > 2
p <- ggplot(data=w,
  aes(x=area, fill=tfrGT2))
p + geom_bar() +
  scale_fill_manual(name="TFR value",
    values = c("red","blue"),
    labels=c("<=2", ">2")) +
  theme(legend.text.align=1)
w <- read.csv(file="WDS2012.csv",
  head=TRUE, sep=",")
w$imrGT15 <- w$imr > 15
p <- ggplot(data=w,
  aes(x=area, fill=imrGT15))
p + geom_bar() +
  scale_fill_manual(name="IMR value",
    values = c("red","blue"),
    labels=c("<=15", ">15")) +
  theme(legend.text.align=1)
Data + Statistical Summary + Annotation
w <- read.csv(file="WDS2012.csv",
head=TRUE, sep=",")
Data + Statistical Summary
w <- read.csv(file="WDS2012.csv",
head=TRUE, sep=",")
p + geom_point(aes(color=area)) +
stat_smooth(method="lm", fill=NA,
color="purple") +
stat_smooth(method="lm", fill=NA, color="black",
linetype="dashed", geom="ribbon") +
scale_color_manual(values=c("red", "blue",
"green", "orange")) +
scale_y_continuous(breaks=c(0,1,2,3,4,5,6,7),
limits=c(0,7.8)) +
scale_x_continuous(breaks=c(0,15,30,45,60,75,
90,105,120)) +
theme_bw() +
theme(legend.position="bottom",
legend.direction="horizontal",
legend.key=element_blank())
Data + Statistical Summary
w <- read.csv(file="WDS2012.csv",
head=TRUE, sep=",")
m <- lm(tfr ~ imr, data=w)
mp <- predict(m, interval="confidence")
wp <- cbind(w, mp)
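The plotting step for this slide is not shown here. One plausible continuation, sketched with made-up data, is to draw the confidence band from the lwr and upr columns that predict() returns, with the fitted line and the points on top. The data values and the gray fill are assumptions for illustration.

```r
library(ggplot2)

# Made-up stand-in for WDS2012.csv, sorted by imr; values are illustrative only
w <- data.frame(imr = c(4, 6, 15, 24, 40, 55, 70, 90),
                tfr = c(1.6, 1.9, 2.1, 2.9, 3.8, 4.4, 5.1, 6.0))

m  <- lm(tfr ~ imr, data = w)
mp <- predict(m, interval = "confidence")  # matrix with columns fit, lwr, upr
wp <- cbind(w, mp)

p <- ggplot(data = wp, aes(x = imr, y = tfr))

# Confidence band first, then fitted line, then the observed points
p + geom_ribbon(aes(ymin = lwr, ymax = upr), fill = "gray80") +
  geom_line(aes(y = fit)) +
  geom_point()
```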
Graphing Regression Diagnostics
approach: make diagnostic data easily available
use all ggplot2 capabilities to visualize data
flexibility
fortify(model)
(Regression Diagnostic) Data + Statistical Summary + Annotation
w <- read.csv(file="WDS2012.csv", head=TRUE, sep=",")
m <- lm(tfr ~ imr, data=w)
mf <- fortify(m)
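The diagnostic plot itself is not shown here. As a sketch under stated assumptions, fortify() on an lm model returns the model data with added columns such as .fitted and .resid, which can then be plotted like any other data frame. The data values are made up, and the residuals-versus-fitted plot is only one example of a diagnostic that could be drawn.

```r
library(ggplot2)

# Made-up stand-in for WDS2012.csv; values are illustrative only
w <- data.frame(imr = c(4, 6, 15, 24, 40, 55, 70, 90),
                tfr = c(1.6, 1.9, 2.1, 2.9, 3.8, 4.4, 5.1, 6.0))

m  <- lm(tfr ~ imr, data = w)
mf <- fortify(m)   # model data plus diagnostic columns such as .fitted, .resid
names(mf)

# Residuals against fitted values, with a reference line at zero
p <- ggplot(data = mf, aes(x = .fitted, y = .resid))
p + geom_hline(yintercept = 0, linetype = "dashed") +
  geom_point()
```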
(Regression Diagnostic) Data + Statistical Summary + Annotation
w <- read.csv(file="WDS2012.csv", head=TRUE, sep=",")
m <- lm(tfr ~ imr, data=w)
wf <- fortify(m,w)
(Regression Diagnostic) Data + Statistical Summary + Annotation
w <- read.csv(file="WDS2012.csv", head=TRUE, sep=",")
m <- lm(tfr ~ imr, data=w)
wf <- fortify(m,w)
Part 3: Recap and Additional Resources
Recap
  country        tfr  imr  area
  Algeria        2.9  24   Africa
  Egypt          2.9  24   Africa
  .              .    .    .
  Canada         1.7  5.1  Americas
  United States  1.9  6.0  Americas
  .              .    .    .
  Armenia        1.7  11   Asia/Oceania
  Azerbaijan     2.3  11   Asia/Oceania
  .              .    .    .
  Denmark        1.8  3.5  Europe
  Estonia        2.5  3.3  Europe
  .              .    .    .
ggplot2
- coordinate system
- statistical transformations of data
- which values will be represented by various visual characteristics (aesthetics)
- how values will be mapped to visual characteristics (scales)
- geometric rendering
- whether data might be displayed as “small multiples” (facets)
- adding additional annotation
Additional Resources
official "Package ggplot2" documentation and help
- http://cran.r-project.org/web/packages/ggplot2/ggplot2.pdf
- http://docs.ggplot2.org/current/
books
- ggplot2: Elegant Graphics for Data Analysis by Hadley Wickham. Springer, 2009.
- R Graphics Cookbook by Winston Chang. O’Reilly, 2012.
- The Grammar of Graphics by Leland Wilkinson. Springer, 2005.
videos
- A Backstage Tour of ggplot2 with Hadley Wickham, Feb. 2012.
http://www.youtube.com/watch?v=RHu5vgBZ1yQ
- Plotting with ggplot2: Part 2 with Roger Peng, Johns Hopkins University, Oct. 2013.
http://www.youtube.com/watch?v=n8kYa9vu1l8