R Graphics 1
Michael Friendly
SCS Short Course
March, 2018
http://datavis.ca/courses/RGraphics/
Course outline
1. Overview of R graphics
2. Standard graphics in R
3. Grid & lattice graphics
4. ggplot2
Outline: Session 1
• Session 1: Overview of R graphics, the big picture
Getting started: R, R Studio, R package tools
Roles of graphics in data analysis
• Exploration, analysis, presentation
What can I do with R graphics?
• Anything you can think of!
• Standard data graphs, maps, dynamic, interactive graphics –
we’ll see a sampler of these
• R packages: many application-specific graphs
Reproducible analysis and reporting
• knitr, R markdown
• R Studio
-#-
Outline: Session 2
• Session 2: Standard graphics in R
R object-oriented design
Hadley Wickham, ggplot2: Elegant graphics for data analysis, 2nd Ed.
1st Ed: Online, http://ggplot2.org/book/
ggplot2 Quick Reference: http://sape.inf.usi.ch/quick-reference/ggplot2/
Complete ggplot2 documentation: http://docs.ggplot2.org/current/
Resources: cheat sheets
R Studio provides a variety of handy cheat sheets for aspects of data analysis &
graphics. See: https://www.rstudio.com/resources/cheatsheets/
Download, laminate, paste them on your fridge
Getting started: Tools
• To profit best from this course, you need to install
both R and R Studio on your computer
Publish: A variety of R packages make it easy to write and publish research reports
and slide presentations in various formats (HTML, Word, LaTeX, …), all within R
Studio
Web apps: R now has several powerful tools for building dynamic, web-based
data display and analysis applications.
Getting started: R Studio
[Screenshot of the R Studio window; labeled panes: command history; workspace (your variables); files; plots; packages; help; R console (just like Rterm)]
R Studio navigation
R Studio projects
R Studio projects are a handy way to
organize your work
R Studio projects
An R Studio project for a research paper: R files (scripts), Rmd files (text, R “chunks”)
Graphics: Why plot your data?
• Three data sets with exactly the same bivariate summary
statistics:
Same correlations, linear regression lines, etc
Indistinguishable from standard printed output
Ah ha!
Ooh!
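A minimal sketch of the same point, assuming R's built-in anscombe data as a stand-in (the slide's three data sets are not identified here, and anscombe has four x-y pairs rather than three): the summary statistics agree almost exactly, yet the plots look nothing alike.

```r
# Assumption: the built-in 'anscombe' quartet stands in for the slide's data sets
data(anscombe)

# Correlation and regression coefficients for each (x_i, y_i) pair
stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  fit <- lm(y ~ x)
  c(r = cor(x, y), intercept = coef(fit)[[1]], slope = coef(fit)[[2]])
})
print(round(stats, 2))   # the four columns are nearly identical

# The plots, by contrast, show four quite different patterns
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  plot(x, y, main = paste("Data set", i))
  abline(lm(y ~ x), col = "red")
}
par(op)
```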
The 80-20 rule: Data analysis
• Often ~80% of data analysis time is spent on data preparation
and data cleaning
1. Data entry, importing data sets into R, assigning factor labels, …
2. Data screening: checking for errors, outliers, …
3. Fitting models & diagnostics: whoops! Something wrong, go back to step 1
• Whatever you can do to reduce this gives more time for:
Thoughtful analysis,
Comparing models,
Insightful graphics,
Telling the story of your results and conclusions
The 80-20 rule: Graphics
• Analysis graphs: Happily, 20% of effort can give 80% of a
desired result
Default settings for plots often give something reasonable
90-10 rule: Plot annotations (regression lines, smoothed curves, data
ellipses, …) add additional information to help understand patterns,
trends and unusual features, with only 10% more effort
• Presentation graphs: Sadly, 80% of total effort may be
required to give the remaining 20% of your final graph
Graph title, axis and value labels: should be directly readable
Grouping attributes: visually distinct, allowing for BW vs color
• color, shape, size of point symbols;
• color, line style, line width of lines
Legends: Connect the data in the graph to interpretation
Aspect ratio: need to consider the H x V size and shape
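The progression from a default plot to an annotated one to a presentation-ready one might look like the sketch below. It assumes the built-in mtcars data as a stand-in; the variable choice, colors, and labels are illustrative only.

```r
# Assumption: built-in 'mtcars' data; any two quantitative variables would do

# 1. Analysis graph: the default already gives something reasonable
plot(mpg ~ wt, data = mtcars)

# 2. A little more effort: annotations that help show trend and shape
fit <- lm(mpg ~ wt, data = mtcars)
abline(fit, col = "blue", lwd = 2)                    # linear regression line
lines(lowess(mtcars$wt, mtcars$mpg),
      col = "red", lwd = 2, lty = 2)                  # smoothed curve

# 3. Presentation graph: readable labels, distinct line styles, a legend
plot(mpg ~ wt, data = mtcars, pch = 16,
     xlab = "Weight (1000 lbs)", ylab = "Fuel economy (miles per gallon)",
     main = "Fuel economy declines with weight")
abline(fit, col = "blue", lwd = 2)
lines(lowess(mtcars$wt, mtcars$mpg), col = "red", lwd = 2, lty = 2)
legend("topright", legend = c("Linear fit", "Lowess smooth"),
       col = c("blue", "red"), lty = c(1, 2), lwd = 2)
```

Using both color and line type for the two curves keeps them distinguishable when the graph is printed in black and white.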
What can I do with R graphics?
A wide variety of standard plots (customized)
line graph: plot()
barchart()
hist()
3D plot: persp()
boxplot()
pie()
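As a rough sampler of these functions, the sketch below draws one plot of each kind from data sets that ship with R. The data sets are stand-ins chosen for illustration. Note that the base graphics function for bar charts is barplot(); barchart() is its counterpart in the lattice package.

```r
# Assumption: built-in data sets (pressure, mtcars, faithful, volcano) as examples
cyl_counts <- table(mtcars$cyl)      # frequency table for the bar chart

op <- par(mfrow = c(2, 3))           # 2 x 3 grid of panels
plot(pressure$temperature, pressure$pressure, type = "l",
     xlab = "Temperature", ylab = "Pressure", main = "line graph: plot()")
barplot(cyl_counts, xlab = "Cylinders", main = "bar chart: barplot()")
hist(faithful$eruptions, xlab = "Eruption time (min)", main = "hist()")
persp(volcano, theta = 30, phi = 30, main = "3D plot: persp()")
boxplot(mpg ~ cyl, data = mtcars, main = "boxplot()")
pie(table(mtcars$gear), main = "pie()")
par(op)                              # restore the previous layout
```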
Bivariate plots
R base graphics provide a wide variety of different plot types for bivariate data
Some plotting functions take a matrix argument & plot all columns
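One such function in base R is matplot(), which plots each column of a matrix against a common x. A minimal sketch, with made-up sine and cosine curves as the data:

```r
# Illustrative data: two curves stored as the columns of a matrix
x <- seq(0, 2 * pi, length.out = 50)
Y <- cbind(sin = sin(x), cos = cos(x))

# matplot() draws one line per column of Y
matplot(x, Y, type = "l", lty = 1:2, col = c("blue", "red"),
        xlab = "x", ylab = "value")
legend("bottomleft", legend = colnames(Y),
       lty = 1:2, col = c("blue", "red"))
```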
Bivariate plots
A number of specialized plot types are also available in base R graphics
Plot methods for factors and tables are designed to show the association between
categorical variables
The vcd & vcdExtra packages provide more and better plots for categorical data
Mosaic plots
Similar to a grouped bar chart
Shows a frequency table with tiles,
area ~ frequency
> data(HairEyeColor)
> HEC <- margin.table(HairEyeColor, 1:2)
> HEC
       Eye
Hair    Brown Blue Hazel Green
  Black    68   20    15     5
  Brown   119   84    54    29
  Red      26   17    14    14
  Blond     7   94    10    16
> chisq.test(HEC)
data: HEC
X-squared = 140, df = 9, p-value <2e-16
> round(residuals(chisq.test(HEC)),2)
       Eye
Hair    Brown  Blue Hazel Green
  Black  4.40 -3.07 -0.48 -1.95
  Brown  1.23 -1.95  1.35 -0.35
  Red   -0.07 -1.73  0.85  2.28
  Blond -5.85  7.05 -2.23  0.61
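A minimal sketch of the corresponding display, assuming base R's mosaicplot() (the vcd package offers a more flexible mosaic() function): with shade = TRUE, tiles are shaded according to residuals from the independence model, so the large residuals in the table above stand out.

```r
# Hair x Eye frequency table, collapsed over Sex
HEC <- margin.table(HairEyeColor, 1:2)

# Tile area ~ cell frequency; shading reflects departures from independence
mosaicplot(HEC, shade = TRUE, main = "Hair color and eye color")

# The Pearson residuals shown in the console output
res <- residuals(chisq.test(HEC))
print(round(res, 2))
```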
data(Duncan, package="carData")  # Duncan lives in carData since car 3.0
plot(~ prestige + income + education,
     data=Duncan)
pairs(~ prestige + income + education,
      data=Duncan)
Multivariate plots
These basic plots can be enhanced in
many ways to be more informative.
library(car)
scatterplotMatrix(~prestige + income + education, data=Duncan, id.n=2)
Multivariate plots: corrgrams
For larger data sets, visual
summaries are often more useful
than direct plots of the raw data
See: Friendly, M. Corrgrams: Exploratory displays for correlation matrices. The American Statistician, 2002, 56, 316-324
Multivariate plots: corrgrams
For even larger data sets, more
abstract visual summaries are
necessary to see the patterns of
relationships.
Generalized pairs plots
Generalized pairs plots from the gpairs
package handle both categorical (C) and
quantitative (Q) variables in sensible ways
x   y   plot
Q   Q   scatterplot
C   Q   boxplot
Q   C   barcode
C   C   mosaic
library(gpairs)
data(Arthritis, package="vcd")  # the Arthritis data come from the vcd package
gpairs(Arthritis[, c(5, 2:5)], …)
Models: diagnostic plots
Linear statistical models (ANOVA,
regression), y = X β + ε, require some
assumptions: ε ~ N(0, σ²)
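A minimal sketch of the standard diagnostic display, assuming a model fit to the built-in mtcars data as a stand-in for the course's example: the plot() method for lm objects produces four plots that help check these assumptions.

```r
# Assumption: 'mtcars' and this formula are illustrative stand-ins
mod <- lm(mpg ~ wt + hp, data = mtcars)

# By default plot.lm() shows: residuals vs fitted, normal Q-Q,
# scale-location, and residuals vs leverage
op <- par(mfrow = c(2, 2))
plot(mod)
par(op)
```

The Q-Q plot addresses normality of the residuals; the residuals vs fitted and scale-location plots address linearity and constant variance.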
Models: Added variable plots
The car package has many more functions for plotting linear model objects
Among these, added variable plots show the partial relations of y to each x, holding all
other predictors constant.
library(car)
avPlots(duncan.mod, id.n=2,ellipse=TRUE, …)
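To show what an added variable plot contains, the sketch below builds one by hand in base R, assuming the built-in mtcars data and a two-predictor model as stand-ins. The plot for a predictor x graphs the residuals of y given the other predictors against the residuals of x given the other predictors; the slope of the line in that plot equals the coefficient of x in the full model.

```r
# Assumption: 'mtcars' and these variables are illustrative stand-ins
full <- lm(mpg ~ wt + hp, data = mtcars)

# Added variable plot for wt: remove the linear effect of hp from both mpg and wt
res_y <- residuals(lm(mpg ~ hp, data = mtcars))   # mpg | hp
res_x <- residuals(lm(wt  ~ hp, data = mtcars))   # wt  | hp

plot(res_x, res_y,
     xlab = "wt | others", ylab = "mpg | others",
     main = "Added variable plot for wt")
av_fit <- lm(res_y ~ res_x)
abline(av_fit, col = "blue", lwd = 2)

# The slope of the line matches the wt coefficient in the full model
print(c(av_slope = coef(av_fit)[["res_x"]], full_coef = coef(full)[["wt"]]))
```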
Models: Interpretation
Fitted models are often difficult to interpret from tables of coefficients
Call:
lm(formula = prestige ~ income + education + type, data = Duncan)
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.18503    3.71377  -0.050  0.96051
income        0.59755    0.08936   6.687 5.12e-08 ***
education     0.34532    0.11361   3.040  0.00416 **
typeprof     16.65751    6.99301   2.382  0.02206 *
typewc      -14.66113    6.10877  -2.400  0.02114 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
How to understand effect of each predictor?
Models: Effect plots
Fitted models are more easily interpreted by plotting the predicted values.
Effect plots do this nicely, making plots for each high-order term, controlling for others
library(effects)
duncan.eff1 <- allEffects(duncan.mod1)
plot(duncan.eff1)
Models: Coefficient plots
Sometimes you need to report or display the coefficients from a fitted model.
A plot of coefficients with CIs is sometimes more effective than a table.
library(coefplot)
duncan.mod2 <- lm(prestige ~ income * education, data=Duncan)
coefplot(duncan.mod2, intercept=FALSE, lwdInner=2, lwdOuter=1,
title="Coefficient plot for duncan.mod2")
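The idea behind such a plot can be sketched with base R alone: point estimates with confidence intervals drawn as horizontal segments. This is a hand-rolled illustration on the built-in mtcars data, not the coefplot package's implementation.

```r
# Assumption: 'mtcars' and this formula are illustrative stand-ins
mod <- lm(mpg ~ wt + qsec + factor(am), data = mtcars)

est <- coef(mod)[-1]                       # drop the intercept
ci  <- confint(mod)[-1, , drop = FALSE]    # 95% confidence intervals
k   <- length(est)

# Widen the left margin so the coefficient names fit
op <- par(mar = c(5, 8, 4, 2))
plot(est, seq_len(k), xlim = range(ci), pch = 16, yaxt = "n",
     xlab = "Estimate with 95% CI", ylab = "",
     main = "Coefficient plot")
segments(ci[, 1], seq_len(k), ci[, 2], seq_len(k), lwd = 2)
axis(2, at = seq_len(k), labels = names(est), las = 1)
abline(v = 0, lty = 2)                     # reference line at zero
par(op)
```

Intervals that cross the dashed line at zero correspond to coefficients not significantly different from zero at the 5% level.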
Coefficient plots become
increasingly useful as:
(a) models become more complex
(b) we have several models to compare
[Figure: coefficient plot; visible labels include "family income - wife's income" and "log wage rate for working women"]
It uses:
lattice::wireframe(z ~ x + y, …)
3D graphics: code
1. Generate data for the model z = 10 + .5x +.3y + .2 x*y
b0 <- 10 # intercept
b1 <- .5 # x coefficient
b2 <- .3 # y coefficient
int12 <- .2 # x*y coefficient
g <- expand.grid(x = 1:20, y = 1:20)
g$z <- b0 + b1*g$x + b2*g$y + int12*g$x*g$y
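A minimal sketch of a possible next step, assuming the lattice package (it ships with standard R installations) and illustrative viewing options: regenerate the grid and draw the fitted surface with wireframe().

```r
library(lattice)

# Regenerate the data for z = 10 + .5x + .3y + .2 x*y
b0 <- 10; b1 <- .5; b2 <- .3; int12 <- .2
g <- expand.grid(x = 1:20, y = 1:20)
g$z <- b0 + b1 * g$x + b2 * g$y + int12 * g$x * g$y

# Assumption: the drape and screen settings are illustrative choices
p <- wireframe(z ~ x + y, data = g,
               drape = TRUE,                    # color the surface by height
               screen = list(z = 30, x = -60),  # viewing angle
               xlab = "x", ylab = "y", zlab = "z")
print(p)   # lattice plots must be printed explicitly inside scripts
```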
3D graphics
Statistical animations
Statistical concepts can often be
illustrated in a dynamic plot of some
process.
Maps and spatial visualizations
Spatial visualization in R combines map data sets, statistical models for spatial data,
and a growing number of R packages for map-based display
library(HistData)
SnowMap(density=TRUE,
        main="Snow's Cholera Map, Death Intensity")
SnowMap(density=TRUE,
        main="Snow's Cholera Map with Pump Neighborhoods")
library(igraph)
# Tree graph with 10 nodes, drawn with a tree layout
tree <- graph.tree(10)
tree <- set.edge.attribute(tree, "color", value="black")
plot(tree,
     layout=layout.reingold.tilford(tree,
                                    root=1, flip.y=FALSE))
# Complete graph with 10 nodes, drawn on a circle
full <- graph.full(10)
full <- set.edge.attribute(full, "color", value="black")
plot(full, layout=layout.circle)
Diagrams: Network diagrams
Graphviz (http://www.graphviz.org/) is a comprehensive program for drawing
network diagrams and abstract graphs. It uses a simple notation to describe nodes
and edges.
The Rgraphviz package (from Bioconductor) provides an R interface
Diagrams: Flow charts
The diagram package:
library(sem)
union.mod <- specifyEquations(covs="x1, x2", text="
y1 = gam12*x2
y2 = beta21*y1 + gam22*x2
y3 = beta31*y1 + beta32*y2 + gam31*x1
")
union.sem <- sem(union.mod, union, N=173)
pathDiagram(union.sem,
edge.labels="values",
file="union-sem1",
min.rank=c("x1", "x2"))
Dynamically updated data visualizations
The wind map app, http://hint.fm/wind/, is one of a growing number of R-based
applications that harvest data from standard sources and present a visualization
Web scraping: CRAN package history
R has extensive facilities for extracting and processing information obtained from web
pages. The XML package is one useful tool for this purpose.
This example:
• downloads information about all R
packages from the CRAN web site,
• finds & counts all of those available for
each R version,
• plots the counts with ggplot2, adding a
smoothed curve, and plot annotations
shiny: dynamic app showing downloads of R packages
https://gallery.shinyapps.io/087-crandash/
Reproducible analysis & reporting
R Studio, together with the knitr
and rmarkdown packages, provides
an easy way to combine writing,
analysis, and R output into
complete documents
Output formats and templates
The integration of R, R Studio, knitr,
rmarkdown and other tools is now
highly advanced.
R code chunks
R code chunks are run by knitr, and the results are inserted in the output document
An R chunk:
```{r name, options}
# R code here
```
The R Markdown Cheat Sheet provides most of the details
https://www.rstudio.com/wp-content/uploads/2016/03/rmarkdown-cheatsheet-2.0.pdf
R notebooks
Often, you just want to “compile” an R script, and get the output embedded in the
result, in HTML, Word, or PDF. Just type Ctrl-Shift-K or tap the Compile Report button
Summary & Homework
• Today has been mostly about an overview of R
graphics, but with emphasis on:
R, R Studio, R package tools
Roles of graphics in data analysis,
A small gallery of examples of different kinds of graphic applications in
R; only small samples of R code
Work flow: How to use R productively in analysis & reporting
• Next week: start on skills with traditional graphics
• Homework:
Find one or more examples of data graphs from your research area
• What are the graphic elements: points, lines, areas, regions, text, labels, ???
• How could they be “described” to software such as R?
• How could they be improved?