MDPN460 Lecture06

Uploaded by mohamedggharib02
Copyright © All Rights Reserved
MDPN460 – Industrial Engineering Lab
Lecture 6

Programming Statistical Graphics in R

Today’s Lecture

Simple high level plots

Low level graphics functions

Graphics as a language: ggplot2

Graphics in R

There are several different graphics systems in R.

The oldest one is now known as base graphics,
which is analogous to drawing with ink on paper.

You build up a picture by drawing fixed things on
it, and once something is drawn, it is permanent,
though you might be able to cover it with
something else.

Since the very beginning, base graphics has
been designed to allow easy production of
good-quality scientific plots.
Graphics in R

The grid package provides the basis for a newer
graphics system.

The programmer has access to the individual
pieces of a graph, and can modify them: a graph
is more like a physical model being built and
displayed, rather than just drawn.

The ggplot2 and lattice packages provide
functions for high level plots based on grid
graphics.
Graphics in R

In ggplot2 the code to draw a plot is an abstract
description of the intention of what to show in
the plot, rather than how to draw it.

The package translates that description into grid
commands when you ask to draw it.

There are other more exotic graphics systems
available in R as well, providing interactive
graphics, 3D displays, etc.

Simple High-Level Plots


Bar charts and dot charts

Pie charts

Histograms

Boxplots

Scatterplots

Plotting data from data frames

QQ plots
Bar charts and dot charts

Bar and dot charts are simple graphs that
represent a single set of values.
> head(WorldPhones)
> wf60 <- WorldPhones[6,]
> barplot(wf60)
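Selecting WorldPhones[6, ] relies on row 6 being the 1960 row. A small sketch, assuming only the built-in WorldPhones matrix, shows that selecting the row by its name gives the same vector while making the year explicit; the name wf60_byname is hypothetical.

```r
# WorldPhones: one row per year (row names are the years), one column per region.
rownames(WorldPhones)

# Select the 1960 row by position (as above) and by row name.
wf60 <- WorldPhones[6, ]
wf60_byname <- WorldPhones["1960", ]   # hypothetical name, same data

# Both are the same named numeric vector, one value per region.
identical(wf60, wf60_byname)
names(wf60)
```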

Elements in Bar Charts

Adding titles and axis labels
> barplot(wf60, main = "Telephone Usage in 1960", cex.names = 0.75,
+ cex.axis = 0.75, ylab = "Telephones (in Thousands)", xlab="Region")

cex.names = 0.75 → shrink the region names to 75% of their
default size;

cex.axis = 0.75 → shrink the labels on the vertical axis by
the same amount.

The main argument sets the main title for the plot;

the ylab and xlab arguments add the axis labels.
Dot Charts

An alternative way to plot the same kind of data
is in a dot chart:
> dotchart(wf60, xlab = "Number of phones ('000s)")

Bar Plots

Data sets having more complexity can also be displayed using
these graphics functions.

The barplot() function has a number of options which allow for
side-by-side or stacked styles of displays, legends can be
included using the legend argument, and so on.
– Example: The VADeaths data set in R contains death rates
(number of deaths per 1000 population per year) in various
sub-populations within the state of Virginia in 1940.
> head(VADeaths)
      Rural Male Rural Female Urban Male Urban Female
50-54       11.7          8.7       15.4          8.4
55-59       18.1         11.7       24.3         13.6
60-64       26.9         20.3       37.0         19.3
65-69       41.0         30.9       54.6         35.1
70-74       66.0         54.3       71.1         50.0
Bar Plots

This data set may be displayed as a sequence of bar charts,
one for each subgroup
> barplot(VADeaths, beside = TRUE, ylim = c(0, 90),
+ ylab = "Deaths per 1000",
+ main = "Death rates in Virginia", cex.names=0.75, cex.axis = 0.5,
+ legend = TRUE, args.legend = list(x = "topright",inset = c(0, -0.4)))

The bars correspond to each number in the matrix.

The beside = TRUE argument causes the values in each column to be plotted
side-by-side;

The ylim = c(0, 90) argument modifies the vertical scale of the graph to make room
for the legend.

The main = "Death rates in Virginia" sets the main title for the plot;

cex.names and cex.axis reduce the sizes of the labels on the axes to the stated
fraction of their default size;

legend = TRUE causes the legend in the top right to be added;

args.legend = list(x = "topright", inset = c(0, -0.4)) modifies the position of
the legend, moving it upward so that it does not overlap the bars.
Dot Charts (again)

An alternative way to plot the same kind of data
is in a dot chart:
> dotchart(VADeaths, xlim = c(0, 75), xlab = "Deaths per 1000",
+ main = "Death rates in Virginia", cex = 0.6)

We set the x-axis limits to run from 0 to 75 so that zero is included,
because it is natural to want to compare the total rates in the different
groups.

We have also set cex to 0.6. This shrinks the plotting character to 60% of
its default size, but more importantly, shrinks the axis tick labels to 60% of
their default size.

For this example, the default setting would cause some overlapping of the
tick labels, making them more difficult to read.

Pie Charts

Pie charts display a vector of numbers by breaking up a
circular disk into pieces whose angle (and hence area) is
proportional to each number.

For example, the letter grades assigned to a class might arise
in the proportions A: 18%, B: 30%, C: 32%, D: 10%, and F: 10%.
These data are displayed graphically using the following R code:
> groupsizes <- c(18, 30, 32, 10, 10)
> labels <- c("A", "B", "C", "D", "F")
> pie(groupsizes, labels,
+ col = c("purple", "green", "blue", "red", "yellow"))
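The angle of each slice is its share of the total times 360 degrees. A quick check of that arithmetic for the grade data (the names angles is illustrative):

```r
groupsizes <- c(18, 30, 32, 10, 10)
labels <- c("A", "B", "C", "D", "F")

# Each slice's angle is proportional to its value.
angles <- 360 * groupsizes / sum(groupsizes)
names(angles) <- labels
angles          # A: 64.8, B: 108, C: 115.2, D: 36, F: 36 degrees
sum(angles)     # the slices fill the whole circle: 360
```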

Histograms

A histogram is a special type of bar chart that is used to
show the frequency distribution of a collection of
numbers. Each bar represents the count of x values that
fall in the range indicated by the base of the bar.
> hist(log(1000*islands, 10), xlab = "Area (on base 10 log scale)",
+ main = "Areas of the World's Largest Landmasses")

Histograms – Number of Bars

If you have n values of x, R by default divides the range
into approximately log2(n) + 1 intervals, giving rise to that
number of bars.
> length(islands)
[1] 48
> 2^5
[1] 32
> 2^6
[1] 64
> log(48, base=2)
[1] 5.584963

Since log2(48) + 1 ≈ 6.6, R should choose about 6 or 7 bars. In
fact, it chose 8, because it also attempts to put the
breaks at round numbers (multiples of 0.5 in this case).
Histograms – Number of Bars
The log2(n) + 1 rule (known as “Sturges' rule”) is not
always satisfactory for large values of n, giving too few
bars.

Current research suggests that the number of bars
should increase proportionally to n^(1/3) instead of log2(n).
The breaks = "Scott" and breaks = "Freedman-Diaconis"
options provide variations on this choice.
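The three rules can be compared numerically with the grDevices helpers nclass.Sturges(), nclass.scott() and nclass.FD(). In this sketch the normal sample and seed are arbitrary choices; for 10,000 values the n^(1/3) rules suggest many more bars than Sturges.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
x <- rnorm(10000)

c(Sturges = nclass.Sturges(x),   # grows like log2(n)
  Scott   = nclass.scott(x),     # bin width 3.5 * sd * n^(-1/3)
  FD      = nclass.FD(x))        # bin width 2 * IQR * n^(-1/3)
```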
Histograms – Number of Bars
> r <- sample(1:1000, 10000, replace=TRUE)
> r <- r + sample(-300:300, 10000, replace=TRUE)
> r <- r * sample(-5:5, 10000, replace=TRUE)
> hist(r)
> hist(r, breaks="Freedman-Diaconis")
> hist(r, breaks="Scott")

[Figure: histograms of r with breaks = "Freedman-Diaconis" and breaks = "Scott"]
Boxplots

A boxplot (or “box-and-whisker plot”) is an
alternative to a histogram to give a quick
visual display of the main features of a set of
data.

A rectangular box is drawn, together with lines
which protrude from two opposing sides.

The box gives an indication of the location and
spread of the central portion of the data, while
the extent of the lines (the “whiskers”) provides
an idea of the range of the bulk of the data.

In some implementations, outliers
(observations that are very different from the
rest of the data) are plotted as separate points.
Boxplots

The box thus drawn defines the interquartile range
(IQR). This is the difference between the upper
quartile and the lower quartile.

We use the IQR to give a measure of the amount of
variability in the central portion of the data set, since
about 50% of the data will lie within the box.

The lower whisker is drawn from the lower end of the
box to the smallest value that is no smaller than 1.5
IQR below the lower quartile.

Similarly, the upper whisker is drawn from the middle
of the upper end of the box to the largest value that is
no larger than 1.5 IQR above the upper quartile.

The rationale for these definitions is that when data
are drawn from the normal distribution or other
distributions with a similar shape, about 99% of the
observations will fall between the whiskers.
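The "about 99%" figure can be checked from the normal quantiles. The sketch below computes the whisker limits for a standard normal, then repeats the idea on simulated data with boxplot.stats(); note that boxplot.stats() uses Tukey's hinges, which are close to, but not always identical to, the quartiles.

```r
# Quartiles of the standard normal distribution.
q1 <- qnorm(0.25)
q3 <- qnorm(0.75)
iqr <- q3 - q1

# Whisker limits: 1.5 IQR beyond each quartile (about -2.70 and 2.70).
lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr
c(lower = lower, upper = upper)

# Probability that a normal observation lies between them: about 0.993.
pnorm(upper) - pnorm(lower)

# On simulated data, $stats holds the lower whisker end, lower hinge,
# median, upper hinge and upper whisker end.
set.seed(2)                      # arbitrary seed
z <- rnorm(1000)
s <- boxplot.stats(z)
s$stats
mean(z >= s$stats[1] & z <= s$stats[5])   # fraction between the whiskers
```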
Boxplot example
> boxplot(Sepal.Length ~ Species, data = iris,
+ ylab = "Sepal length (cm)", main = "Iris measurements",
+ boxwex = 0.5)


This example compares the distributions
of the sepal length measurements
between the different species. Here we
have used R’s formula-based interface
to the graphics function: the syntax
Sepal.Length ~ Species is read as
“Sepal.Length depending on Species,”
where both are columns of the data
frame specified by data = iris .

The boxplot() function draws separate
side-by-side boxplots for each species.

From these, we can see substantial
differences between the median lengths
for the species, and that there is one
unusually small specimen among the
virginica samples.
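The center line of each box is the median, and the separately drawn point is an observation beyond the whiskers. Both can be computed directly; this sketch uses tapply() and boxplot.stats() on the built-in iris data.

```r
# Median sepal length for each species (the thick line in each box).
tapply(iris$Sepal.Length, iris$Species, median)

# Points beyond the whiskers, as boxplot() would draw them for virginica.
virginica <- iris$Sepal.Length[iris$Species == "virginica"]
boxplot.stats(virginica)$out
```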
Scatterplots

When doing statistics and data science, most of the
interesting problems have to do with the relationships
between different variables. To study this, one of the
most commonly used plots is the scatterplot, in which
points (x_i, y_i), i = 1, ..., n are drawn using dots or other
symbols.

These are drawn to show relationships between the x_i
and y_i values. In R, scatterplots (and many other kinds of
plots) are drawn using the plot() function.

Its basic usage is plot(x, y, ...) where x and y are numeric
vectors of the same length holding the data to be
plotted.
Scatterplots
> x <- rnorm(100) # assigns 100 random normal observations to x
> y <- rpois(100, 30) # assigns 100 random Poisson observations
# to y; mean value is 30
# the resulting value should be near 30
> mean(y)
[1] 30.39
> plot(x, y, main = "Poisson versus Normal")

Scatterplots

Try the following variants to see their effects.
> plot(x, y, main = "Poisson versus Normal")
> plot(x, y, main = "Poisson versus Normal", pch=15)
> plot(x, y, main = "Poisson versus Normal", pch=10, type="l")
> plot(x, y, main = "Poisson versus Normal", pch=15, type="l")
> plot(x, y, main = "Poisson versus Normal", type="l")
> plot(sort(x), sort(y), main = "Poisson versus Normal", type="l")

Plotting data from data frames
> head(Orange)
  Tree  age circumference
1    1  118            30
2    1  484            58
3    1  664            87
4    1 1004           115
5    1 1231           120
6    1 1372           142
> plot(circumference ~ age, data=Orange)

Plotting data from data frames
> plot(circumference ~ age, data = Orange, pch = as.character(Tree), cex=0.6)

QQ Plots

Quantile-quantile plots (otherwise known as QQ
plots) are a type of scatterplot used to compare
the distributions of two groups or to compare a
sample with a reference distribution.

In the case where there are two groups of equal
size, the QQ plot is obtained by first sorting the
observations in each group: X[1] ≤ ... ≤ X[n] and
Y[1] ≤ ... ≤ Y[n]. Next, draw a scatterplot of
(X[i], Y[i]), for i = 1, ..., n.

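The recipe for two equal-size groups can be written out directly: sort both samples and plot the pairs. This sketch (with an arbitrary seed and arbitrary distributions) also shows that qqplot(..., plot.it = FALSE) returns exactly those sorted pairs.

```r
set.seed(3)                   # arbitrary seed
X <- rnorm(50)
Y <- rexp(50)

# Manual QQ plot: sort each group and pair them up.
plot(sort(X), sort(Y), xlab = "Sorted X", ylab = "Sorted Y")

# qqplot() computes the same pairs; plot.it = FALSE skips the drawing.
qq <- qqplot(X, Y, plot.it = FALSE)
all.equal(qq$x, sort(X))
all.equal(qq$y, sort(Y))
```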
QQ Plots

When the groups are of different sizes, some
scheme must be used to artificially match them.
R reduces the size of the larger group to the size
of the smaller one by keeping the minimum and
maximum values, and choosing equally spaced
quantiles between.

For example, if there were five X values but 20 Y
values, then the X values would be plotted
against the minimum, lower quartile, median,
upper quartile and maximum of the Y values.
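The five-versus-twenty example can be reproduced with simulated data. In this sketch the values plotted against the five X values agree with quantile() at probabilities 0, 0.25, 0.5, 0.75 and 1 (R's default quantile definition).

```r
set.seed(4)                   # arbitrary seed
x5  <- rnorm(5)
y20 <- rnorm(20)

qq <- qqplot(x5, y20, plot.it = FALSE)
qq$y                          # 5 values summarizing the 20 Y values

# Minimum, lower quartile, median, upper quartile and maximum of y20.
quantile(y20, c(0, 0.25, 0.5, 0.75, 1))
```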
QQ Plots

When plotting a single sample against a reference
distribution, theoretical quantiles are used for one coordinate.
R normally puts the theoretical quantiles on the x-axis and the
data on the y-axis, but some authors make the opposite
choice.

To avoid biases, quantiles are chosen corresponding to
probabilities (i − 1/2)/n: these are centered evenly between
zero and one.

When the distributions of X and Y match, the points in the QQ
plot will lie near the line y = x. We will see a different straight
line if one distribution is a linear transformation of the other.

On the other hand, if the two distributions are not the same,
we will see systematic patterns in the QQ plot.
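In R, qqnorm() takes its probabilities from ppoints(), which uses (i - a)/(n + 1 - 2a) with a = 3/8 when n ≤ 10 and a = 1/2 otherwise, so for n > 10 it matches (i − 1/2)/n. A short check, plus a sample from a rescaled normal lying near a straight line (seed, mean and sd are arbitrary):

```r
n <- 50
p <- ppoints(n)                           # probabilities used by qqnorm()
all.equal(p, ((1:n) - 1/2) / n)           # (i - 1/2)/n, since n > 10

set.seed(5)                               # arbitrary seed
y <- rnorm(n, mean = 10, sd = 2)
qq <- qqnorm(y, plot.it = FALSE)          # theoretical quantiles in qq$x
all.equal(sort(qq$x), qnorm(p))

# A linear transformation of a standard normal gives a different straight
# line: here the points should lie near y = 10 + 2x.
qqnorm(y)
abline(a = 10, b = 2)
```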
QQ Plot Examples
> par(mfrow = c(1,4))
> X <- rnorm(1000)
> A <- rnorm(1000)
> qqplot(X, A, main = "A and X are the same")
> B <- rnorm(1000, mean = 3, sd = 2)
> qqplot(X, B, main = "B is rescaled X")
> C <- rt(1000, df = 2)
> qqplot(X, C, main = "C has heavier tails")
> D <- rexp(1000)
> qqplot(X, D, main = "D is skewed to the right")

The mfrow parameter of the par() function sets up a 1 × 4 layout, so the
four plots appear side by side.

The first plot is based on identical normal distributions, the second
plot is based on normal distributions having different means and
standard deviations, the third plot is based on a standard normal and
a t distribution on 2 degrees of freedom, and the fourth plot is based
on a standard normal compared with an exponential distribution.
Low level graphics functions

Functions like barplot() , dotchart() , and plot() do their work by
using low level graphics functions to draw lines and points, to
establish where they will be placed on a page, and so on.

Several functions exist to add components to existing graphs:
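A minimal sketch of building a plot piece by piece with some of these functions (points(), lines(), abline(), text(), title()); the data and seed are arbitrary.

```r
set.seed(6)                              # arbitrary seed
x <- 1:10
y <- x + rnorm(10)

# type = "n" sets up axes and coordinates but draws no points;
# xaxs = "i" and yaxs = "i" make the axes span exactly xlim and ylim.
plot(x, y, type = "n", xlim = c(0, 11), ylim = c(-2, 14),
     xaxs = "i", yaxs = "i", xlab = "x", ylab = "y")
points(x, y, pch = 16)                   # add filled points
lines(x, y, lty = 2)                     # join them with a dashed line
abline(a = 0, b = 1, col = "gray")       # add the reference line y = x
text(2, 12, "y = x + noise")             # add a label centred at (2, 12)
title(main = "Built up one piece at a time")
par("usr")                               # user coordinates: 0 11 -2 14
```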

Add Lines to Scatter Plots

Consider the Orange data frame again. In addition to
using different plotting characters for the different trees,
we will pass lines of best fit (i.e. least-squares regression
lines) through the points corresponding to each tree.
> plot(circumference ~ age, pch = as.numeric(as.character(Tree)),
+ data = Orange)
> abline(lm(circumference ~ age, data = Orange, subset = Tree == "1"),
+ lty = 1)
> abline(lm(circumference ~ age, data = Orange, subset = Tree == "2"),
+ lty = 2)
> abline(lm(circumference ~ age, data = Orange, subset = Tree == "3"),
+ lty = 3, lwd = 2)
> abline(lm(circumference ~ age, data = Orange, subset = Tree == "4"),
+ lty = 4)
> abline(lm(circumference ~ age, data = Orange, subset = Tree == "5"),
+ lty = 5)
> legend("topleft", legend = paste("Tree", 1:5), lty = 1:5, pch = 1:5,
+ lwd = c(1, 1, 2, 1, 1))

The best-fit lines for the five trees are obtained by using lm()
to regress circumference on age separately for each tree;
abline() then draws each fitted line.

A legend has been added to identify which data points come from
the different trees.

In these plots, lty gives the line type, and lwd gives the line width.
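Two details of the code above are worth making explicit. Tree is an ordered factor whose levels are not stored in numeric order, which is why the code converts with as.character() before as.numeric(). The five abline() calls can also be written as a loop; this sketch uses line width 2 for tree 3 to match the legend, and the names slopes and fit are illustrative.

```r
# as.numeric() on the factor gives level positions, not the tree labels.
levels(Orange$Tree)
head(as.numeric(Orange$Tree))                 # level positions
head(as.numeric(as.character(Orange$Tree)))   # the tree labels themselves

plot(circumference ~ age, data = Orange,
     pch = as.numeric(as.character(Tree)))
slopes <- numeric(5)
for (i in 1:5) {
  # Fit circumference on age for one tree at a time.
  fit <- lm(circumference ~ age,
            data = Orange[Orange$Tree == as.character(i), ])
  slopes[i] <- coef(fit)[["age"]]             # growth in circumference per day
  abline(fit, lty = i, lwd = if (i == 3) 2 else 1)
}
legend("topleft", legend = paste("Tree", 1:5), lty = 1:5, pch = 1:5,
       lwd = c(1, 1, 2, 1, 1))
slopes
```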
Connecting lines instead of the best-fit line

Redo the previous commands, replacing the fitted lm() lines with
lines(), which joins each tree's points in the order they appear in
the data (increasing age). The line type is passed to lines() itself.
> plot(circumference ~ age, pch = as.numeric(as.character(Tree)),
+ data = Orange)
> lines(circumference ~ age, data = Orange, subset = Tree == "1", lty = 1)
> lines(circumference ~ age, data = Orange, subset = Tree == "2", lty = 2)
> lines(circumference ~ age, data = Orange, subset = Tree == "3", lty = 3,
+ lwd = 2)
> lines(circumference ~ age, data = Orange, subset = Tree == "4", lty = 4)
> lines(circumference ~ age, data = Orange, subset = Tree == "5", lty = 5)
> legend("topleft", legend = paste("Tree", 1:5), lty = 1:5, pch = 1:5,
+ lwd = c(1, 1, 2, 1, 1))
Graphics as a language - ggplot2

The ideas behind ggplot2 were first described in
a 1999 book called “The Grammar of Graphics”
by Leland Wilkinson.

A second expanded edition was published in
2005.

They were expanded again and popularized
when Wickham published ggplot2 in 2007.

Our own description is based on version 3.3.2 of
that package, published in 2020.
Graphics as a language - ggplot2

The ggplot2 package gives a somewhat abstract
but very rich way to describe graphics.

We will start our discussion with an example
showing how to re-draw the bar chart of the
world telephones presented earlier, adding more
detail as we proceed.

Plotting a Bar Chart

To plot the world phone data that we saw at the
start of this lecture, we would write
> library(ggplot2)
> region <- names(WorldPhones[6,])
> phones60 <- data.frame(Region = factor(region, levels = region),
+ Telephones = WorldPhones[6,])
> ggplot(data = phones60, aes(x=Region, y=Telephones)) + geom_col()

The first lines of this snippet are needed to load the plotting package and
to prepare a data frame consisting of the telephone counts that
correspond to the various world regions.

The new feature is in the ggplot invocation where aes says that we want
the Region names on the x-axis in their original order, and the telephone
counts on the y-axis.

We want to display the data using bars, hence the use of the geom_col
function.
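The levels = region argument is what keeps the regions "in their original order". A base-R sketch of the difference:

```r
region <- names(WorldPhones[6, ])
region                                    # column order of WorldPhones

# factor() on its own sorts the levels alphabetically ...
levels(factor(region))

# ... while levels = region keeps the original column order, and ggplot2
# uses the level order for the discrete x-axis.
levels(factor(region, levels = region))
```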

The Idea Behind ggplot2

The general idea in ggplot2 is that plots are described by
a sum of objects produced by function calls.

As with any addition in R, we use + , but you should think
of the whole expression as a way to describe the plot as
a combination of different components.

Sequence of Using ggplot2

Most ggplot2 plot expressions start with a call to the
ggplot() function.

Its first argument is data, and that’s where we specify the
data component of the plot, which is always a data
frame.

The second component of every plot is called the
“aesthetic mapping” of the plot, or “aesthetics” for short.

This doesn’t refer to the appreciation of beauty; it refers
to the ways that quantities in our data are expressed in
the plot.

We use the aes() function to specify the aesthetics.
About aes()

The aesthetics don’t tell us how Region is displayed on
the x-axis, just that it is. To specify how it is displayed, we
give one or more layers, using geom_*() function calls.

In the previous example, we requested a bar plot by
using the geom_col() function.

Because Region is a factor, geom_col() displays one bar
per level.

Because we had aes(x = Region, y = Telephones), the bars
are vertical.

We could get horizontal bars by using aes(y = Region, x = Telephones).
The ggplot2 Grammar

ggplot2 plots are usually created as a sum of function
calls.

Each of those function calls produces a special object,
which the ggplot2 code knows how to combine, provided
you follow certain rules.

First, you need to start with a "ggplot" object. This can be
produced by a call to ggplot() or to some other function
that calls it, and it can be saved in a variable and used
later in a different plot.

Creating a ggplot2 Object

The "ggplot" object sets certain defaults which can be used by the
layers of the plot. Normally the first argument specifies a data
frame, and that data can be used in all layers of the plot.
> library(ggplot2)
> region <- names(WorldPhones[6,])
> phones60 <- data.frame(Region = factor(region, levels = region),
+ Telephones = WorldPhones[6,])
> g1 <- ggplot(phones60, aes(Region, Telephones))

Because we assigned the result to g1 , it is not printed, and no
graph is displayed. To display it, we can print that object:
> g1

Adding Objects to ggplot2 Objects


The most common objects are the layers produced by
the geom_*() functions (discussed later).

Other less common components include:

Adding Objects to ggplot2 Objects


Scales are more qualitative than the others. We have seen two
scales so far in the WorldPhones example.

Because Region is a factor, it is automatically displayed using a
discrete scale, and because Telephones is a number, it is
displayed on a continuous scale.

These automatic choices could be changed by adding in a call
to a different scale_*() function.
Adding Objects to ggplot2 Objects


Transformations are changes to values before plotting.
For example:
scale_y_continuous(trans = "log10")

will take the base 10 logarithm of the y-axis values
before plotting.
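ggplot2 also provides scale_y_log10() as a shortcut for a log10-transformed continuous y scale. In recent ggplot2 releases (3.5.0 and later, to our knowledge) the trans argument of scale_y_continuous() has been renamed transform, with trans kept as a deprecated alias. The sketch below uses layer_data() to confirm that the plotted y values are the base 10 logarithms of the counts; it assumes ggplot2 is installed.

```r
library(ggplot2)
region <- names(WorldPhones[6, ])
phones60 <- data.frame(Region = factor(region, levels = region),
                       Telephones = WorldPhones[6, ])

# Same bar chart, with the y values log10-transformed before plotting.
g_log <- ggplot(phones60, aes(Region, Telephones)) +
  geom_col() +
  scale_y_log10()

# layer_data() returns the data after the scale transformation.
d <- layer_data(g_log)
d$y
log10(phones60$Telephones)
```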
scale_* example
> g3 <- g1 + geom_col() + scale_y_continuous(trans = "log10")
> g3

Coordinate System with coord_*

The coordinate system determines how the x and y values are
displayed on the plot. For example, to display a pie chart in
ggplot2 , you display a bar plot in polar coordinates:
> ggplot(phones60, aes(x = "", y = Telephones, fill = Region)) +
+ coord_polar(theta = "y") +
+ geom_col()

theme_* example
> g4 <- g3 + theme_dark()
> g4

53 / 67
Layers in ggplot2

There are many ways to display data, and ggplot2 puts
“ways to display data” into the geom_*() layer functions.

Version 3.3.2 of ggplot2 contains 52 of these functions,
and others are available in other contributed packages.

Layers in ggplot2

Each kind of layer works with a different set of
aesthetics.

We have already seen x and y aesthetics; others that are
commonly supported are:

Layers example
> g1 <- ggplot(phones60, aes(Region, Telephones))
> g2 <- g1 + geom_col() + geom_point(col = "red")
> g2
> g3 <- g2 + geom_line(col = "blue", aes(x = as.numeric(Region)))
> g3

Layers example
> g2 <- g1 + geom_col(alpha = 0.4) + geom_point(col = "red")
> g3 <- g2 + geom_line(col = "blue", aes(x = as.numeric(Region)))
> g3
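The as.numeric(Region) trick is needed because, with a discrete x aesthetic, each Region level forms its own group, so geom_line() would have only one point per group and draw nothing. A common alternative, sketched here assuming ggplot2 is installed, keeps the discrete x and puts all points in one group with aes(group = 1).

```r
library(ggplot2)
region <- names(WorldPhones[6, ])
phones60 <- data.frame(Region = factor(region, levels = region),
                       Telephones = WorldPhones[6, ])

# group = 1 places all points in a single group, so the line joins them.
g_line <- ggplot(phones60, aes(Region, Telephones)) +
  geom_col(alpha = 0.4) +
  geom_point(col = "red") +
  geom_line(aes(group = 1), col = "blue")
g_line
```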

Layers example
> ggplot(iris, aes(x = Species, y = Sepal.Length)) + geom_boxplot()

Layers example
> ggplot(iris, aes(x = Species, y = Sepal.Length)) + geom_violin()

Colors in R

There are several different ways to identify colors in R.
They can be specified by name;

The function colors() lists hundreds of names recognized
by R:

> colors()
[1] "white" "aliceblue" "antiquewhite"
[4] "antiquewhite1" "antiquewhite2" "antiquewhite3"


[112] "darkslategray4" "darkslategrey" "darkturquoise"
[115] "darkviolet" "deeppink" "deeppink1"

[652] "yellow" "yellow1" "yellow2"
[655] "yellow3" "yellow4" "yellowgreen"
Hexadecimal Colors

They can also be constructed using hexadecimal (base
16) codes for the levels of red, green, and blue. For
example, red would be specified as "#FF0000" , where
FF , the base 16 representation of 255, is the maximum
level of red, and both green and blue have zero
contribution.

> g2 <- g1 + geom_col(alpha = 0.4, col = "#FF00FF") + geom_point(col = "red")


> g2
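The base R functions rgb() and col2rgb() convert between red, green and blue levels and color specifications, which makes the hexadecimal codes easy to check:

```r
# rgb() builds a hexadecimal color string from 0-255 levels.
rgb(255, 0, 0, maxColorValue = 255)      # pure red: "#FF0000"
rgb(255, 0, 255, maxColorValue = 255)    # the magenta outline used above

# col2rgb() goes the other way, from a name or hex code to the levels.
col2rgb("red")
col2rgb("#FF00FF")
```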

Color Palettes in R

R also maintains a palette of a small number of colors
that can be referenced by number. Since version 4.0.0,
there have been several choices of palettes by name:
> palette.pals()
[1] "R3" "R4" "ggplot2" "Okabe-Ito" "Accent"
[6] "Dark 2" "Paired" "Pastel 1" "Pastel 2" "Set 1"
[11] "Set 2" "Set 3" "Tableau 10" "Classic Tableau" "Polychrome 36"
[16] "Alphabet"
> palette()
[1] "black"   "#DF536B" "#61D04F" "#2297E6" "#28E2E5" "#CD0BBC" "#F5C710"
[8] "gray62"

In the g2 example above, geom_point(col = "red") was used to choose red
points. Any of the standard R specifications for red would have worked
equally well: "red", "#FF0000", or, assuming we are using the "R3"
palette, the number 2.
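Switching palettes is done with palette(), which returns the previous palette invisibly so it can be restored afterwards. A short sketch showing that color number 2 is "red" in the R3 palette (the name r3_second is illustrative):

```r
old <- palette("R3")          # switch to the R3 palette (the default before R 4.0.0)
r3_second <- palette()[2]     # color number 2 in this palette
r3_second                     # "red"
palette(old)                  # restore the palette that was in use before
```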

Specifying Colors in ggplot2

For the bars in a geom_col() layer, col controls the outline
color, and argument fill controls the fill color.

The second way to specify color in ggplot2 is to use the
col or fill aesthetic.
> ggplot(phones60, aes(Region, Telephones, fill = Region)) +
+ geom_col() +
+ scale_fill_brewer(palette = "Set2")

Specifying Colors in ggplot2

When the mapped variable is continuous, ggplot2 will
default to a gradient scale running from dark blue (low values)
to light blue (high values), produced by the scale_fill_gradient()
function. For example,
> ggplot(phones60, aes(Region, Telephones, fill = Telephones)) +
+ geom_col()

Customizing the Look of a Graph


There are several functions to change the labeling on the
graph. The ggtitle() function sets a title at the top, and xlab()
and ylab() set titles on the axes.

The theme() and theme_*() functions can be used to
change many details of the overall look of a graph.

The scale_*() functions can be used to customize the
mapping for each aesthetic.

The annotate() function works like a layer function, but
with fixed vectors of aesthetics, not values taken from
the data set for the plot.
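These customizations can be combined in one plot expression. In this sketch (assuming ggplot2 is installed) the annotation text and its position are illustrative choices; annotate() places the label at fixed coordinates rather than taking them from phones60.

```r
library(ggplot2)
region <- names(WorldPhones[6, ])
phones60 <- data.frame(Region = factor(region, levels = region),
                       Telephones = WorldPhones[6, ])

g5 <- ggplot(phones60, aes(Region, Telephones)) +
  geom_col() +
  ggtitle("Telephone Usage in 1960") +       # title at the top
  xlab("Region") +                           # x-axis title
  ylab("Telephones (in Thousands)") +        # y-axis title
  annotate("text", x = "N.Amer", y = 65000,  # fixed position and text
           label = "Highest count") +
  theme_bw()                                 # one of the theme_*() functions
g5
```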
Faceting


A strategy for displaying relations among three or
more variables is to divide the data into subsets
using the values of some of the variables, and then
draw multiple plots of the values of the other
variables in each of those subsets.

In ggplot2 this is called “faceting,” and the
facet_wrap() and facet_grid() functions are used to
implement it.

facet_wrap()

To study the trends over time in the WorldPhones data,
we first need to convert it to a data frame.
> phones <- data.frame(Year = as.numeric(rep(rownames(WorldPhones), 7)),
+ Region = rep(colnames(WorldPhones), each = 7),
+ Telephones = as.numeric(WorldPhones))
> ggplot(phones, aes(x = Region, y = Telephones, fill = Region)) +
+ geom_col() +
+ facet_wrap(vars(Year)) +
+ theme(axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
+ xlab(element_blank())
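The rep() calls above work because WorldPhones is stored column by column (7 years × 7 regions), so the years cycle fastest. A base-R sketch that builds the same long format with as.data.frame(as.table()) and compares it column by column (the name long is illustrative):

```r
phones <- data.frame(Year = as.numeric(rep(rownames(WorldPhones), 7)),
                     Region = rep(colnames(WorldPhones), each = 7),
                     Telephones = as.numeric(WorldPhones))

# as.table() + as.data.frame() lists every cell, first dimension varying fastest.
long <- as.data.frame(as.table(WorldPhones), stringsAsFactors = FALSE)
names(long) <- c("Year", "Region", "Telephones")
long$Year <- as.numeric(long$Year)
long$Telephones <- as.numeric(long$Telephones)
head(long)

# Same rows in the same order as the rep()-based construction.
all(phones$Year == long$Year)
all(phones$Region == long$Region)
all(phones$Telephones == long$Telephones)
```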
