Advanced Data Visualization in R

Iris Malone

November 6, 2015

Iris Malone Advanced Data Visualization in R November 6, 2015 1 / 68


Outline

What I’m Covering:

pros/cons of different graphing packages
ggplot and the grammar of graphics
how to visualize summary stats and regression results
basic spatial visualization


Download code and slides at:

web.stanford.edu/~imalone/VAM.html



Choosing a Visualization Package

Toy example: plot the number of democracies and autocracies over time.

Options:

plot (graphics)


Option 1: Plot
Code:

plot(year, numdems,   # x, y
     # aesthetics (color, plot type, line width)
     col = "blue", type = "l", lwd = 3,
     # y-axis range covering both series, so the second line is not clipped
     ylim = range(c(numdems, numauts), na.rm = TRUE),
     # main title and axis labels
     main = "Number of Democracies and Autocracies, 1800-2012",
     xlab = "Year",
     ylab = "Count")
lines(year, numauts, type = "l", col = "red", lwd = 3)
legend("topleft",   # location of legend
       legend = c("Democracies", "Autocracies"),   # legend labels
       col = c("blue", "red"),   # same colors as the two lines
       lty = c(1, 1),
       title = "Regime Type")   # legend title


Option 1: Plot
Visual:

[Figure: line plot titled "Number of Democracies and Autocracies, 1800-2012"; x-axis Year (1800 to 2000), y-axis Count; legend "Regime Type" with Democracies and Autocracies]


Choosing a Visualization Package

Toy Example.
Options:

plot (graphics) Overly simplistic. Error interp. Limited online support.


xyplot (lattice)



Option 2: xyplot

Step 1: Build First Layer for Number of Democracies.

library(lattice)
layer1 = xyplot(numdems ~ year,   # format: y ~ x
                type = "l",
                # add title
                main = "Number of Democracies and Autocracies, 1800-2012",
                col = "blue", lwd = 1,
                # legend
                key = list(space = "right",   # location
                           # aesthetics (aes)
                           lines = list(col = c("red", "blue"), lty = c(1, 1), lwd = 2),
                           # labels for each line
                           text = list(c("Autocracies", "Democracies"))))


Option 2: xyplot

Step 2: Build Second Layer for Number of Autocracies.

layer2 = xyplot(numauts ~ year,
                type = "l",
                col = "red")


Option 2: xyplot
Step 3: Add the layers.

# need an extra package to overlay one lattice plot on another
suppressMessages(library(latticeExtra))
layer1 + layer2

[Figure: lattice line plot titled "Number of Democracies and Autocracies, 1800-2012"; x-axis year (1800 to 2000), y-axis numdems; key on the right with Autocracies and Democracies]
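lattice can also overlay several series without latticeExtra. If you put more than one variable on the left of the formula (y1 + y2 ~ x), both are drawn in the same panel, and auto.key builds the legend from the formula terms. The democracy and autocracy counts are not bundled with R, so this minimal sketch uses the built-in longley dataset (annual values in thousands, 1947-1962) as a stand-in that also has two series over time.

```r
library(lattice)

# Stand-in data: two annual series (thousands) from the built-in longley dataset.
# Two variables on the left of '~' are overlaid in one panel (outer = FALSE
# is the default), so no second layer and no latticeExtra are needed.
p <- xyplot(Unemployed + Armed.Forces ~ Year, data = longley,
            type = "l",
            ylab = "Thousands",
            # auto.key builds the legend from the formula terms
            auto.key = list(space = "right", lines = TRUE, points = FALSE))
```

Calling print(p), or typing p at the console, draws the plot.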


Graphics Packages

Options:

plot (graphics) Overly simplistic. Error Interp. Limited online support.
xyplot (lattice) Needs supplemental packages (latticeExtra) to layer plots. Default look could be nicer.
ggplot (ggplot2)


Option 3: ggplot

suppressMessages(library(ggplot2))
ggplot(data = NULL) +
  # one line each for democracies and autocracies
  geom_line(aes(x = year, y = numdems, colour = "numdems")) +
  geom_line(aes(x = year, y = numauts, colour = "numauts")) +
  xlab("Year") + ylab("Count") +
  ggtitle("Number of Democracies and Autocracies, 1800-2012") +
  # legend aesthetics
  scale_color_manual(name = "Regime Type",   # legend title
                     labels = c(numdems = "Democracies", numauts = "Autocracies"),   # legend labels
                     values = c(numdems = "blue", numauts = "red"))   # line colors


Option 3: ggplot

Number of Democracies and Autocracies, 1800−2012


100

75

Regime Type
Count

50 Autocracies
Democracies

25

1800 1850 1900 1950 2000


Year

Iris Malone Advanced Data Visualization in R November 6, 2015 13 / 68
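The two-geom_line() version works. A common ggplot idiom, though, is to reshape the data to long format, with one column naming the series, and then map that column to colour. One geom_line() call then draws both lines, and the legend comes from the data. This is a minimal sketch, not the slides' code. It assumes both series live in one data frame. Because the regime counts are not available here, it substitutes R's built-in longley dataset (two annual series in thousands) and reshapes it with base R's stack().

```r
library(ggplot2)

# Stand-in data: two annual series (thousands) from the built-in longley dataset
wide <- longley[, c("Year", "Unemployed", "Armed.Forces")]

# Long format: stack() returns 'values' plus a factor 'ind' naming the source column
long <- stack(wide[, c("Unemployed", "Armed.Forces")])
long$Year <- rep(wide$Year, times = 2)   # stack() keeps column order, so repeat the years

# One geom_line(); colour is mapped to the series name, so the named
# values and labels below are matched to the data by name
p <- ggplot(long, aes(x = Year, y = values, colour = ind)) +
  geom_line() +
  xlab("Year") + ylab("Thousands") +
  scale_colour_manual(name = "Series",
                      values = c(Unemployed = "blue", Armed.Forces = "red"),
                      labels = c(Unemployed = "Unemployed",
                                 Armed.Forces = "Armed forces"))
```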


Summary of Options:

plot (graphics)
xyplot (lattice)
ggplot (ggplot2)



Why ggplot?

Used professionally



Why ggplot?
Example 1



Why ggplot?
Example 2





Why ggplot?

Pros:
- Used professionally
- Very pretty
- Easy to manipulate
- Great support online
- Knowledge transfers to other packages/languages (ggvis, Shiny, Python)

Cons:
- Steep learning curve
- Lots of syntax
- Can be slow
- Defaults to weird colors

Summary: Worth it.


ggplot Syntax

The next few slides are adapted from Samantha Tyner.

Based on Leland Wilkinson’s book The Grammar of Graphics, hence the ‘gg’

Brought to R by New Zealander Hadley Wickham
Analogy: think of the parts of a plot like the parts of a sentence
Warning: qplot




ggplot Syntax
Noun → Data

ggplot(data = df)

Verb → “geom_” + Plot Type

ggplot(data = df) + geom_bar()

Table 1: Geom Types

abline     area        boxplot
errorbar   histogram   line
point      ribbon      smooth
blank      density     jitter
polygon    quantile    vline


ggplot Syntax

Adjectives → Aesthetics (“aes”) (x, y, fill, colour, linetype)

ggplot(data = df, aes(x = categorical.var, fill = group.var)) +
  geom_bar()



ggplot Syntax. Aesthetics Sidebar.
Note the difference between fill, colour, and placement (setting a value outside aes() vs. mapping it inside aes()).
Default.

ggplot(data = NULL, aes(x = numdems)) + geom_bar()

## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x

[Figure: histogram of numdems with grey bars; x-axis numdems (0 to 100), y-axis count]


ggplot Syntax. Aesthetics Sidebar.
Fill.

ggplot(data = NULL, aes(x = numdems)) +
  geom_bar(fill = "red")

## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x

[Figure: the same histogram with red-filled bars]


ggplot Syntax. Aesthetics Sidebar.
Colour.

ggplot(data = NULL, aes(x = numdems)) +
  geom_bar(colour = "red")

## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x

[Figure: the same histogram with grey bars outlined in red]


ggplot Syntax. Aesthetics Sidebar.
Fill and Colour.

ggplot(data = NULL, aes(x = numdems)) +
  geom_bar(fill = "white", colour = "red")

## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x

[Figure: the same histogram with white bars outlined in red]


ggplot Syntax. Aesthetics Sidebar
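Defined inside the aesthetics argument. Ack! Inside aes(), "red" is treated as a data value, not a color. ggplot therefore draws the outlines in its default hue and adds a legend with a single entry, red.

ggplot(data = NULL) +
  geom_bar(aes(x = numdems, colour = "red"))

## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x

[Figure: the same histogram, outlined in ggplot's default hue rather than red, with a legend titled "red" that has one entry, red]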


ggplot Syntax. Aesthetics Sidebar
Defined inside the aesthetics argument.

ggplot(data = NULL) +
  geom_bar(aes(x = numdems,
               fill = factor(I(year) < 1950)))

## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x

[Figure: the same histogram with bars filled by whether year is before 1950; legend factor(I(year) < 1950) with FALSE and TRUE]
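This sidebar comes down to setting versus mapping. A constant given outside aes() sets a property for the whole layer. Anything inside aes() is mapped from data through a scale and gets a legend. The sketch below uses the built-in mtcars data. One version note: these slides bin a continuous variable with geom_bar(), which is how ggplot2 worked in 2015. Since ggplot2 2.0.0, geom_bar() counts distinct values and binning is done with geom_histogram(), so the sketch uses geom_histogram().

```r
library(ggplot2)

# Setting: constants outside aes() apply to the whole layer; no legend
p_set <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2, fill = "white", colour = "red")

# Mapping: a variable inside aes() goes through a scale and gets a legend
p_map <- ggplot(mtcars, aes(x = mpg, fill = factor(am))) +
  geom_histogram(binwidth = 2)

# The pitfall from the slides: a constant inside aes() is treated as data,
# so the outlines get the default hue (not red) plus a one-entry legend
p_oops <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(aes(colour = "red"), binwidth = 2)
```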
ggplot Syntax

Adverb → stat (e.g. identity, bin)

ggplot(data = df, aes(x = categorical.var, fill = group.var)) +
  geom_bar(stat = "bin")

Preposition → position (e.g. fill, dodge, identity)

ggplot(data = df, aes(x = categorical.var, fill = group.var)) +
  geom_bar(stat = "bin", position = "identity", binwidth = 5)
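This is a minimal sketch that shows stat and position side by side on mtcars, with cylinders on the x-axis and gears as the fill group. It assumes a current ggplot2. There, geom_bar() counts rows by default, and stat = "identity" takes the bar heights from a y column instead.

```r
library(ggplot2)

# Counts of cars by number of cylinders (x) and gears (fill group)
base <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(gear)))

p_stack <- base + geom_bar()                    # default position = "stack"
p_dodge <- base + geom_bar(position = "dodge")  # groups side by side
p_fill  <- base + geom_bar(position = "fill")   # stacks rescaled to proportions

# stat = "identity": bar heights are taken from a y column instead of counted
counts <- as.data.frame(table(cyl = mtcars$cyl, gear = mtcars$gear))
p_ident <- ggplot(counts, aes(x = cyl, y = Freq, fill = gear)) +
  geom_bar(stat = "identity", position = "dodge")
```

Recent ggplot2 versions also provide geom_col() as shorthand for geom_bar(stat = "identity").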


Toy Example for Learning Syntax.

Fearon and Laitin (2003). Suppose we want to get a feel for their data by first looking at the number of civil wars in their dataset as a function of two variables: (1) region and (2) time.

library(foreign)
df = read.dta("repdata.dta")
# keep only onset years, so there is one observation per civil war
dfonset = subset(df, onset == 1)

Dependent variable: civil war onset
Independent variables: ethnic fractionalization, mountainous terrain, oil, new state, others


ggplot Syntax. Example. Civil War by Region and Decade

p = ggplot(data = dfonset, aes(x = decade, fill = region)) +
  geom_bar(position = "identity", binwidth = 5)
p

[Figure: overlapping bars of civil war counts per decade (1960s to 1990s), filled by region: western democracies and japan; e. europe and the former soviet union; asia; n. africa and the middle east; sub-saharan africa; latin america and the caribbean; y-axis count]


ggplot Syntax. Example. Civil War by Region and Decade

p = ggplot(data = dfonset, aes(x = decade, fill = region)) +
  geom_bar(position = "dodge", binwidth = 5)
p

[Figure: the same counts with each region's bar placed side by side (dodged) within each decade]
ggplot Syntax: Title
Add Title.

p = p +
  ggtitle("Civil Wars by Space and Time")
p

[Figure: the dodged bar chart, now titled "Civil Wars by Space and Time"]


ggplot Syntax: Axes
Add Axes.

p = p + xlab("Decade") +
  ylab("Civil War Frequency")
p

[Figure: the same chart with axis labels Decade and Civil War Frequency]


ggplot Syntax: Theme

Change Theme.

p = p + theme_bw()
p

[Figure: the same chart on a white background with grey grid lines (theme_bw)]


ggplot Syntax: Theme

Change Theme.

p = p + theme_classic()   # looks like base plot()!
p

[Figure: the same chart with no grid lines and black axis lines (theme_classic)]


ggplot Syntax: Theme

What’s great about ggplot is that you can customize your own theme! This is roughly what theme_classic() does under the hood:

p = p + theme(panel.grid.major = element_blank(),   # grid lines
              panel.grid.minor = element_blank(),
              panel.border = element_blank(),       # border
              panel.background = element_blank(),   # background
              # axis lines and text
              axis.line = element_line(colour = "black"),
              axis.text.x = element_text(colour = "black"),
              axis.text.y = element_text(colour = "black"))
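A theme is just an object, so a customization like the one above can be wrapped in a function and reused across plots. This is a minimal sketch. The name theme_slides() and its settings are illustrative and are not part of ggplot2. The demo plot uses mtcars.

```r
library(ggplot2)

# Hypothetical reusable theme: start from theme_bw(), then drop the grid
# lines and panel border, roughly like theme_classic()
theme_slides <- function(base_size = 14) {
  theme_bw(base_size = base_size) +
    theme(panel.grid.major = element_blank(),
          panel.grid.minor = element_blank(),
          panel.border = element_blank(),
          axis.line = element_line(colour = "black"),
          axis.text = element_text(colour = "black"),
          legend.position = "bottom")
}

p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  theme_slides()
```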


ggplot Syntax: Color.
Change colors. Reds! Use scale_fill_manual for filled shapes such as bars, scale_colour_manual for lines and points, and scale_linetype_manual for line types.

rhg_cols = c("#771C19", "#AA3929", "#E25033", "#F27314",
             "#F8A31B", "#E2C59F", "#556670", "#000000")
p = p + scale_fill_manual(values = rhg_cols)
p

[Figure: the chart with region bars in shades from dark red through orange to beige]


ggplot Syntax: Color.
Or choose others! Blues!

# default ColorBrewer palette (Blues)
p = p +
  scale_fill_brewer()

## Scale for 'fill' is already present. Adding another scale f

p

[Figure: the chart with region bars in shades of blue]
ggplot Syntax: Color.
# default grey
p = p + scale_fill_grey()

## Scale for 'fill' is already present. Adding another scale f

p

[Figure: the chart with region bars in shades of grey]
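The message "Scale for 'fill' is already present" means that the new scale replaces the old one. A plot keeps at most one scale per aesthetic, so only the last fill scale added is used. The small sketch below checks this on mtcars by inspecting the plot's scale list.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(gear))) + geom_bar()
p <- p + scale_fill_brewer(palette = "Reds")   # first fill scale
p <- suppressMessages(p + scale_fill_grey())   # replaces it (message silenced)

# Collect every scale on the plot that handles the fill aesthetic
fill_scales <- Filter(function(s) "fill" %in% s$aesthetics, p$scales$scales)
length(fill_scales)   # only the grey scale is left
```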


ggplot Syntax: Legend.

Change Legend Labels (or Colors)

p = p +
  scale_fill_manual(name = "My Super Awesome Legend Title",
                    values = c("darkblue", "darkred", "darkgreen",
                               "grey", "darkorange", "purple"),
                    labels = c("Western Dems and Japan", "Former USSR", "Asia",
                               "North Africa and \n Middle East", "Sub-Saharan Africa",
                               "Latin America"))

## Scale for 'fill' is already present. Adding another scale f


ggplot Syntax: Legend.

Why define fill (or color or linetype) inside aes? To keep track of variables! Each string inside aes() becomes a key, so the scale can attach a label and a color to each series by name. Recall:

ggplot(data = NULL, aes(x = year)) +
  geom_line(aes(y = numdems, colour = "numdems")) +
  geom_line(aes(y = numauts, colour = "numauts")) +
  scale_color_manual(name = "Regime Type",
                     labels = c(numdems = "Democracies", numauts = "Autocracies"),
                     values = c(numdems = "blue", numauts = "red"))


ggplot Syntax: Legend.

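Note: p is a ggplot object with two components in aes: x = decade and fill = region. The unnamed values and labels in scale_fill_manual() are therefore matched to the region levels in order.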



ggplot Syntax: Legend

[Figure: the chart with the legend titled "My Super Awesome Legend Title" and labels Western Dems and Japan, Former USSR, Asia, North Africa and Middle East (on two lines), Sub-Saharan Africa, Latin America]


ggplot Syntax: Legend.

Change the position of the legend.

p = p + theme(legend.position = "top")
p

[Figure: the chart with the legend laid out horizontally above the plot panel]


ggplot Syntax: Legend.

Change the position of the legend.

# Position the legend inside the plot, where (x, y) runs from (0, 0) (bottom left)
# to (1, 1) (top right)
p = p + theme(legend.position = c(0.10, .8))
p

[Figure: the chart with the legend inside the plot panel, near the top-left corner]
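A version note on numeric legend positions: ggplot2 3.5.0 moved placement inside the panel to legend.position = "inside" plus legend.position.inside = c(x, y), and it deprecates the numeric legend.position form used above. The sketch below picks the right form for the installed version. It uses mtcars as stand-in data, and the coordinates are arbitrary.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) + geom_point()

# Legend inside the panel; (0, 0) is bottom left and (1, 1) is top right
if (packageVersion("ggplot2") >= "3.5.0") {
  p <- p + theme(legend.position = "inside",
                 legend.position.inside = c(0.9, 0.8))
} else {
  p <- p + theme(legend.position = c(0.9, 0.8))
}
```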


ggplot Syntax: Legend.

Remove the legend.

p = p + theme(legend.position = "none")
p

[Figure: the chart with no legend]


ggplot Syntax: Annotation.
Suppose you want to add a label. For example, what’s the one western
democracy with a civil war in the 1960s?
# it's the UK vs. the IRA
p = p + annotate("text", label = "UK vs IRA",
                 x = 1959, y = 2, size = 6, colour = "black")
p

[Figure: the chart with the label "UK vs IRA" near the 1960s bars; the x-axis now starts at 1950 to make room for it]
ggplot Syntax: Multiple plots.
Suppose you want a separate bar plot for every region.

ggplot(data = dfonset, aes(x = decade, group = region,
                           fill = factor(region))) +
  geom_bar(stat = "bin", position = "dodge", binwidth = 5) +
  facet_wrap(~region)   # variable to make separate plots by

[Figure: six panels, one per region (western democracies and japan; e. europe and the former soviet union; asia; n. africa and the middle east; sub-saharan africa; latin america and the caribbean), each a bar chart of civil war counts by decade from 1960 to 1990; legend factor(region)]
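facet_wrap() also accepts layout and axis options. In this minimal sketch on mtcars there is one panel per number of cylinders. ncol fixes the grid, and scales = "free_y" lets each panel choose its own y-axis range.

```r
library(ggplot2)

# One panel per cylinder count; each shows cars by gears, split by transmission
p <- ggplot(mtcars, aes(x = factor(gear), fill = factor(am))) +
  geom_bar(position = "dodge") +
  facet_wrap(~ cyl, ncol = 3, scales = "free_y") +
  xlab("Gears")
```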


ggplot Syntax: Saving results.

Option 1.

pdf("nameoffile.pdf", width = 12, height = 5)   # width and height in inches
print(p)   # explicit print() is needed when this runs inside a function or script
dev.off()

## pdf
## 2

Option 2.

ggsave("nameoffile.pdf", plot = p, width = 12, height = 5)
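You can check both options end to end by writing to R's temporary directory. This is a minimal sketch on mtcars. The file names are placeholders. For both pdf() and ggsave(), width and height are in inches, which is ggsave()'s default unit.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# Option 1: open a PDF device, print the plot, close the device
f1 <- file.path(tempdir(), "nameoffile.pdf")
pdf(f1, width = 12, height = 5)
print(p)             # explicit print() also works inside functions and loops
invisible(dev.off())

# Option 2: ggsave() opens and closes the device itself
f2 <- file.path(tempdir(), "nameoffile2.pdf")
ggsave(f2, plot = p, width = 12, height = 5)
```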


Visualizing Regressions

Visualizing Regression Results

Coef Plots
Not covered here: marginal effects, predicted outcomes



Coef Plot: Canned Function

m1 = glm(onset ~ warl + gdpenl + lpopl1 + lmtnest + ncontig +


library(coefplot)
coefplot(m1)

[Figure: coefplot() output titled "Coefficient Plot": point estimates with interval bars for (Intercept), warl, gdpenl, lpopl1, lmtnest, ncontig, Oil, nwstate, instab, polity2l, ethfrac, relfrac; x-axis Value (-6 to 3), y-axis Coefficient]


Coefplot: Our Own Function
Adapted from Stat Bandit
# Format the data
coefplot.gg = function(model, data){
  # model: a fitted lm/glm object
  # data:  the data frame the model was fit on (used for the t critical value)
  # builds dfplot with one row per coefficient:
  #   names = variable names, modelcoef = point estimate,
  #   modelse = standard error, ylo/yhi = lower/upper 95% limits
  est = summary(model)$coefficients   # columns: Estimate, Std. Error, ...
  modelcoef = est[, 1]
  modelse = est[, 2]
  ylo = modelcoef - qt(.975, nrow(data))*(modelse)
  yhi = modelcoef + qt(.975, nrow(data))*(modelse)
  names = rownames(est)
  dfplot = data.frame(names, modelcoef, modelse, ylo, yhi)
  # ...
}
Coefplot
Define the plot

coefplot.gg = function(model, data){
  # ...
  # define plot
  library(ggplot2)
  p = ggplot(dfplot, aes(x = names,
                         y = modelcoef,
                         ymin = ylo, ymax = yhi)) +
    # red if the 95% interval includes 0, blue otherwise
    geom_pointrange(colour = ifelse(ylo < 0 & yhi > 0, "red", "blue")) +
    theme_bw() + coord_flip() +
    geom_hline(yintercept = 0, lty = 2) +   # dashed reference line at zero
    xlab('Variable') + ylab('')
  return(p)
}


Coefplot
Evaluate the function

coefplot.gg(m1, df)

[Figure: coefficient plot with variables on the vertical axis, from (Intercept) at the bottom to warl at the top; point ranges with 95% intervals; horizontal axis from -6 to 3]
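For reference, here is the two-slide coefplot.gg() idea written as one self-contained function under the hypothetical name coefplot_gg(). It is a sketch, not the original, and it makes three changes. First, it takes the critical value from the model's residual degrees of freedom instead of nrow(data); for a glm, qnorm(0.975) is an equally defensible choice. Second, it maps the "interval spans zero" test to colour through a scale instead of passing a vector of colours. Third, because the Fearon and Laitin data are not bundled, it is demonstrated on a logistic regression fit to the built-in mtcars data.

```r
library(ggplot2)

# Hypothetical one-function version of coefplot.gg(): estimates with 95% intervals
coefplot_gg <- function(model) {
  est  <- summary(model)$coefficients          # Estimate, Std. Error, ...
  crit <- qt(0.975, df = df.residual(model))   # critical value from residual df
  dfplot <- data.frame(names = rownames(est),
                       modelcoef = est[, 1],
                       modelse = est[, 2])
  dfplot$ylo <- dfplot$modelcoef - crit * dfplot$modelse
  dfplot$yhi <- dfplot$modelcoef + crit * dfplot$modelse
  dfplot$spans_zero <- dfplot$ylo < 0 & dfplot$yhi > 0

  ggplot(dfplot, aes(x = names, y = modelcoef, ymin = ylo, ymax = yhi,
                     colour = spans_zero)) +
    geom_pointrange() +
    geom_hline(yintercept = 0, lty = 2) +      # reference line at zero
    # red = interval includes 0, blue = it does not; legend suppressed
    scale_colour_manual(values = c(`TRUE` = "red", `FALSE` = "blue"),
                        guide = "none") +
    coord_flip() + theme_bw() +
    xlab("Variable") + ylab("")
}

# Stand-in model: logistic regression on the built-in mtcars data
m <- glm(am ~ hp + wt, data = mtcars, family = binomial)
p <- coefplot_gg(m)
```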


Spatial Visualization

Packages:

rworldmap
maps
ggmap

suppressMessages(library(maps))
suppressMessages(library(ggmap))
suppressMessages(library(mapproj))



maps

Adapted from Mahbubul Majumder.

dfworldmap = map_data("world")
ggplot() + geom_polygon(aes(x = long, y = lat, group = group),
                        fill = "grey65",
                        data = dfworldmap) + theme_bw()

[Figure: world map drawn as grey polygons; x-axis long, y-axis lat]


Choropleth maps
Suppose we want to map different levels of a variable by some unit, such as a state or country. As a toy example, we’ll map 1973 murder arrest rates (per 100,000 residents) by state using the USArrests data.
Step 1: Format data

suppressMessages(library(dplyr))
us = map_data("state")
head(us)

## long lat group order region subregion
## 1 -87.46201 30.38968 1 1 alabama <NA>
## 2 -87.48493 30.37249 1 2 alabama <NA>
## 3 -87.52503 30.37249 1 3 alabama <NA>
## 4 -87.53076 30.33239 1 4 alabama <NA>
## 5 -87.57087 30.32665 1 5 alabama <NA>
## 6 -87.58806 30.32665 1 6 alabama <NA>
Choropleth maps

head(USArrests)

## Murder Assault UrbanPop Rape
## Alabama 13.2 236 58 21.2
## Alaska 10.0 263 48 44.5
## Arizona 8.1 294 80 31.0
## Arkansas 8.8 190 50 19.5
## California 9.0 276 91 40.6
## Colorado 7.9 204 78 38.7

# mismatch: states are capitalized row names in USArrests but a lowercase
# region column in us, so add a matching variable to the arrest data (1:1)


Choropleth maps
arrest = USArrests %>%
  add_rownames("region") %>%         # move the state names into a region column
  # mutate() comes from dplyr
  mutate(region = tolower(region))   # make it all lowercase to match the map
head(arrest)

## Source: local data frame [6 x 5]
##
## region Murder Assault UrbanPop Rape
## (chr) (dbl) (int) (int) (dbl)
## 1 alabama 13.2 236 58 21.2
## 2 alaska 10.0 263 48 44.5
## 3 arizona 8.1 294 80 31.0
## 4 arkansas 8.8 190 50 19.5
## 5 california 9.0 276 91 40.6
## 6 colorado 7.9 204 78 38.7
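Before mapping, it is worth checking which regions fail to match. This minimal sketch builds the same region column as the dplyr pipeline, but in base R. setdiff() then lists the states in the arrest data that have no polygon in map_data("state"). That map covers only the contiguous states (plus DC), so Alaska and Hawaii drop out of the choropleth.

```r
library(ggplot2)   # map_data() also requires the maps package

us <- map_data("state")
# Same region column as the dplyr pipeline, built with base R
arrest <- data.frame(region = tolower(rownames(USArrests)), USArrests,
                     row.names = NULL)

# States in the arrest data with no polygon in the "state" map
missing_regions <- setdiff(arrest$region, unique(us$region))
missing_regions   # "alaska" "hawaii"
```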
Choropleth maps
Step 2: Plot the base map layer

g = ggplot()
# the map layer must be defined first
g = g + geom_map(data = us, map = us,
                 aes(x = long, y = lat, map_id = region),
                 fill = "#ffffff", color = "#ffffff", size = 0.15)
g

[Figure: the contiguous US states drawn in white on a grey background; x-axis long, y-axis lat]


Choropleth maps

Step 3: Add our arrest data

g = g + geom_map(data = arrest, map = us,
                 aes(fill = Murder, map_id = region),
                 color = "#ffffff", size = 0.15)
g

[Figure: the states filled by Murder on a continuous blue scale (legend from about 5 to 15); x-axis long, y-axis lat]


Choropleth maps
Step 4: Make it look pretty.

g = g + scale_fill_continuous(low = 'thistle2', high = 'darkblue',
                              guide = 'colorbar') + xlab("")
g = g + theme(panel.border = element_blank()) + theme(panel.ba
g

[Figure: the choropleth recolored from thistle (low Murder) to dark blue (high); legend Murder from about 5 to 15]
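The same choropleth can be written as a single call chain. This sketch follows the pattern used in ggplot2's geom_map() documentation. The arrest data supply the fill, and the map supplies the polygons. Because the arrest data have no longitude or latitude columns for the x and y scales to learn from, expand_limits() sets the axis ranges from the map.

```r
library(ggplot2)   # map_data() also requires the maps package

us <- map_data("state")
arrest <- data.frame(region = tolower(rownames(USArrests)), USArrests,
                     row.names = NULL)

g <- ggplot(arrest, aes(map_id = region)) +
  # polygons come from 'map'; fill comes from the arrest data
  geom_map(aes(fill = Murder), map = us, colour = "white") +
  # the arrest data have no long/lat columns, so take the axis ranges from the map
  expand_limits(x = us$long, y = us$lat) +
  scale_fill_continuous(low = "thistle2", high = "darkblue",
                        guide = "colorbar") +
  labs(x = NULL, y = NULL)
```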


ggmap
map1 = suppressMessages(get_map(
  location = 'Stanford University', zoom = 14,   # zoom-in level
  maptype = "satellite"))                        # map type
ggmap(map1)

[Figure: satellite image of the Stanford University area; lat axis from about 37.41 to 37.44]
ggmap
Step 1: Pull a location you want to plot from Google Maps.

map = suppressMessages(get_map(location = 'Europe', zoom = 4))
ggmap(map)

[Figure: Google map of Europe; lat axis from about 40 to 60]


ggmap

Step 2: Get geocoordinates for points or locations you’re interested in.

europegps = suppressMessages(geocode(c(
"Lisbon, Portugal",
"Eiffel Tower",
"Berlin, Germany",
"Crimea, Ukraine"), source="google"))
europegps

## lon lat
## 1 -9.139337 38.72225
## 2 2.294481 48.85837
## 3 13.404954 52.52001
## 4 34.102417 44.95212



ggmap
Step 3: Add the geocoordinates to the ggmap! It works just like any other ggplot object.

ggmap(map) +
  geom_point(aes(x = lon, y = lat), data = europegps,
             size = 4, colour = "red") +
  ggtitle("Place I would like to Visit!")

[Figure: map of Europe with red points at Lisbon, the Eiffel Tower, Berlin, and Crimea, titled "Place I would like to Visit!"; lat axis from about 40 to 60]
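Recent ggmap releases require a Google Maps API key, registered with register_google(), before get_map() or geocode() will use Google. Without a key, a rough substitute is to draw the points on map_data("world") polygons. This sketch types in the coordinates printed by geocode() above instead of calling Google. The axis limits that frame Europe are hand-picked assumptions.

```r
library(ggplot2)   # map_data() also requires the maps package

# Coordinates copied from the geocode() output shown above
europegps <- data.frame(
  place = c("Lisbon, Portugal", "Eiffel Tower", "Berlin, Germany", "Crimea, Ukraine"),
  lon = c(-9.139337, 2.294481, 13.404954, 34.102417),
  lat = c(38.72225, 48.85837, 52.52001, 44.95212))

world <- map_data("world")

p <- ggplot() +
  geom_polygon(data = world, aes(x = long, y = lat, group = group),
               fill = "grey85", colour = "white") +
  geom_point(data = europegps, aes(x = lon, y = lat),
             size = 4, colour = "red") +
  # hand-picked window around Europe; coord_quickmap() keeps a sensible aspect ratio
  coord_quickmap(xlim = c(-15, 45), ylim = c(33, 65)) +
  ggtitle("Place I would like to Visit!")
```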
Summary

ggplot is super powerful but kind of annoying to learn


Ability to make complicated graphs awesome
Benefits outweigh the start-up costs
