MDPN460 Lecture06

Uploaded by

mohamedggharib02


MDPN460 – Industrial Engineering Lab
Lecture 6

Programming Statistical Graphics in R

1 / 67
Today’s Lecture
●
Simple high level plots
●
Low level graphics functions
●
Graphics as a language: ggplot2

2 / 67
Graphics in R
●
There are several different graphics systems in R.
●
The oldest one is now known as base graphics
which is analogous to drawing with ink on paper.
●
You build up a picture by drawing fixed things on
it, and once something is drawn, it is permanent,
though you might be able to cover it with
something else.
●
Since the very beginning, base graphics has
been designed to allow easy production of good
quality scientific plots.
3 / 67
Graphics in R
●
The grid package provides the basis for a newer
graphics system.
●
The programmer has access to the individual
pieces of a graph, and can modify them: a graph
is more like a physical model being built and
displayed, rather than just drawn.
●
The ggplot2 and lattice packages provide
functions for high level plots based on grid
graphics.
4 / 67
Graphics in R
●
In ggplot2 the code to draw a plot is an abstract
description of the intention of what to show in
the plot, rather than how to draw it.
●
The package translates that description into grid
commands when you ask to draw it.
●
There are other more exotic graphics systems
available in R as well, providing interactive
graphics, 3D displays, etc.

5 / 67
Simple High-Level Plots

●
Bar charts and dot charts
●
Pie charts
●
Histograms
●
Boxplots
●
Scatterplots
●
Plotting data from data frames
●
QQ plots
6 / 67
Bar charts and dot charts
●
Bar and dot charts are simple graphs that
represent a single set of values.
> head(WorldPhones)
> wf60 <- WorldPhones[6,]
> barplot(wf60)

7 / 67
Elements in Bar Charts
●
Adding titles and axes labels
> barplot(wf60, main = "Telephone Usage in 1960", cex.names = 0.75,
+ cex.axis = 0.75, ylab = "Telephones (in Thousands)", xlab="Region")

8 / 67
Elements in Bar Charts

●
cex.names = 0.75 → reduce the size of the region names to
0.75 of their former size,
●
cex.axis = 0.75 → reduce the labels on the vertical axis by
the same amount.
●
The main argument sets the main title for the plot,
●
the ylab and xlab arguments are used to include axis labels

9 / 67
Dot Charts
●
An alternative way to plot the same kind of data
is in a dot chart:
> dotchart(wf60, xlab = "Number of phones ('000s)")

10 / 67
Bar Plots
●
Data sets having more complexity can also be displayed using
these graphics functions.
●
The barplot() function has a number of options which allow for
side-by-side or stacked styles of displays, legends can be
included using the legend argument, and so on.
– Example: The VADeaths data set in R contains death rates
(number of deaths per 1000 population per year) in various
sub-populations within the state of Virginia in 1940.
> head(VADeaths)
      Rural Male Rural Female Urban Male Urban Female
50-54       11.7          8.7       15.4          8.4
55-59       18.1         11.7       24.3         13.6
60-64       26.9         20.3       37.0         19.3
65-69       41.0         30.9       54.6         35.1
70-74       66.0         54.3       71.1         50.0
11 / 67
Bar Plots
●
This data set may be displayed as a sequence of bar charts,
one for each subgroup
> barplot(VADeaths, beside = TRUE, ylim = c(0, 90),
+ ylab = "Deaths per 1000",
+ main = "Death rates in Virginia", cex.names=0.75, cex.axis = 0.5,
+ legend = TRUE, args.legend = list(x = "topright",inset = c(0, -0.4)))

12 / 67
Bar Plots

●
Each bar corresponds to one number in the matrix.
●
The beside = TRUE argument causes the values in each column to be plotted
side-by-side;
●
The ylim = c(0, 90) argument modifies the vertical scale of the graph to make room
for the legend.
●
The main = "Death rates in Virginia" sets the main title for the plot;
●
cex.names and cex.axis reduce the sizes of the axis labels to the stated
fraction of their default size;
●
legend = TRUE causes the legend in the top right to be added;
●
args.legend = list(x = "topright", inset = c(0, -0.4)) moves the legend
upward so that it does not overlap the bars.
13 / 67
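The stacked style mentioned earlier can be sketched by switching beside to FALSE (the default). This is only an illustration: the names mids_beside and mids_stacked are ours, and the legend placement is an arbitrary choice. barplot() invisibly returns the bar midpoints, which can be useful when adding labels afterwards.

```r
# Side-by-side bars: one bar per matrix entry, grouped by column.
mids_beside <- barplot(VADeaths, beside = TRUE, ylim = c(0, 90),
                       ylab = "Deaths per 1000",
                       main = "Death rates in Virginia (side by side)",
                       cex.names = 0.75)

# Stacked bars: one bar per column, with the age groups stacked inside it.
mids_stacked <- barplot(VADeaths, beside = FALSE,
                        ylab = "Deaths per 1000",
                        main = "Death rates in Virginia (stacked)",
                        cex.names = 0.75, legend.text = TRUE,
                        args.legend = list(x = "topleft", cex = 0.6))

# The return value holds the bar midpoints: a matrix for side-by-side
# bars, a vector with one entry per column for stacked bars.
dim(mids_beside)
length(mids_stacked)
```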
Dot Charts (again)
●
An alternative way to plot the same kind of data
is in a dot chart:
> dotchart(VADeaths, xlim = c(0, 75), xlab = "Deaths per 1000",
+ main = "Death rates in Virginia", cex = 0.6)

14 / 67
Dot Charts (again)

●
We set the x-axis limits to run from 0 to 75 so that zero is included,
because it is natural to want to compare the total rates in the different
groups.
●
We have also set cex to 0.6. This shrinks the plotting character to 60% of
its default size, but more importantly, shrinks the axis tick labels to 60% of
their default size.
●
For this example, the default setting would cause some overlapping of the
tick labels, making them more difficult to read.

15 / 67
Pie Charts
●
Pie charts display a vector of numbers by breaking up a
circular disk into pieces whose angle (and hence area) is
proportional to each number.
●
For example, the letter grades assigned to a class might arise
in the proportions, A: 18%, B: 30%, C: 32%, D: 10%, and F: 10%.
These data are displayed graphically using the following R code:
> groupsizes <- c(18, 30, 32, 10, 10)
> labels <- c("A", "B", "C", "D", "F")
> pie(groupsizes, labels,
+ col = c("purple", "green", "blue", "red", "yellow"))

16 / 67
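To make the "angle proportional to each number" idea concrete, the slice angles for the grade example can be computed by hand. This is a small sketch; the names shares and angles are ours, not part of pie().

```r
groupsizes <- c(A = 18, B = 30, C = 32, D = 10, F = 10)

# Share of the whole disk taken by each grade.
shares <- groupsizes / sum(groupsizes)

# Central angle of each slice in degrees; the angles add up to 360.
angles <- 360 * shares
angles
```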
Histograms
●
A histogram is a special type of bar chart that is used to
show the frequency distribution of a collection of
numbers. Each bar represents the count of x values that
fall in the range indicated by the base of the bar.
> hist(log(1000*islands, 10), xlab = "Area (on base 10 log scale)",
+ main = "Areas of the World's Largest Landmasses")

17 / 67
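The counts behind the bars can be inspected without drawing anything. The sketch below assumes only the islands data used above; log_area and h are names chosen for illustration.

```r
log_area <- log(1000 * islands, 10)

# plot = FALSE returns the histogram object without drawing it.
h <- hist(log_area, plot = FALSE)

h$breaks   # interval boundaries chosen by R
h$counts   # number of observations falling in each interval

# Every observation is counted in exactly one bar.
sum(h$counts) == length(islands)
```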
Histograms – Number of Bars
●
If you have n values of x, R by default divides the range
into approximately log2(n) + 1 intervals, giving rise to that
number of bars.
> length(islands)
[1] 48
> 2^5
[1] 32
> 2^6
[1] 64
> log(48, base=2)
[1] 5.584963
●
It can be seen that R should choose about 5 or 6 bars. In
fact, it chose 8, because it also attempts to put the
breaks at round numbers (multiples of 0.5 in this case).
18 / 67
Histograms – Number of Bars
●
The log2(n) + 1 rule (known as the “Sturges rule”) is not
always satisfactory for large values of n, giving too few
bars.
●
Current research suggests that the number of bars
should increase proportionally to n^(1/3) instead of log2(n).
The breaks = "Scott" and breaks = "Freedman-Diaconis"
options provide variations on this choice.

19 / 67
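The three rules can be compared directly with the nclass.Sturges(), nclass.scott() and nclass.FD() helpers from the grDevices package. The sample below is made up for illustration, and hist() treats these counts as suggestions, so the number of bars actually drawn may differ slightly.

```r
set.seed(1)            # arbitrary seed, only for reproducibility
x <- rnorm(10000)      # illustrative sample, not from the lecture

# Suggested number of bars under each rule.
n_sturges <- nclass.Sturges(x)   # ceiling(log2(n) + 1)
n_scott   <- nclass.scott(x)     # bin width shrinks like n^(-1/3)
n_fd      <- nclass.FD(x)        # same rate, based on the IQR

c(Sturges = n_sturges, Scott = n_scott, FD = n_fd)
```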
Histograms – Number of Bars
> r <- sample(1:1000, 10000, replace=TRUE)
> r <- r + sample(-300:300, 10000, replace=TRUE)
> r <- r * sample(-5:5, 10000, replace=TRUE)
> hist(r)
> hist(r, breaks="Freedman-Diaconis")
> hist(r, breaks="Scott")

[Figure: histograms of r using the Freedman-Diaconis and Scott rules]
20 / 67
Boxplots
●
A boxplot (or “box-and-whisker plot”) is an
alternative to a histogram to give a quick
visual display of the main features of a set of
data.
●
A rectangular box is drawn, together with lines
which protrude from two opposing sides.
●
The box gives an indication of the location and
spread of the central portion of the data, while
the extent of the lines (the “whiskers”) provides
an idea of the range of the bulk of the data.
●
In some implementations, outliers
(observations that are very different from the
rest of the data) are plotted as separate points.
21 / 67
Boxplots
●
The box thus drawn defines the interquartile range
(IQR). This is the difference between the upper
quartile and the lower quartile.
●
We use the IQR to give a measure of the amount of
variability in the central portion of the data set, since
about 50% of the data will lie within the box.
●
The lower whisker is drawn from the lower end of the
box to the smallest value that is no smaller than 1.5
IQR below the lower quartile.
●
Similarly, the upper whisker is drawn from the middle
of the upper end of the box to the largest value that is
no larger than 1.5 IQR above the upper quartile.
●
The rationale for these definitions is that when data
are drawn from the normal distribution or other
distributions with a similar shape, about 99% of the
observations will fall between the whiskers.
22 / 67
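The whisker rule can be checked by hand against boxplot.stats(), the function that boxplot() uses internally. This is a sketch using the virginica sepal lengths from the iris data. All variable names are ours, and note that R measures the box with hinges (from fivenum()), which can differ slightly from the quartiles returned by quantile().

```r
x <- iris$Sepal.Length[iris$Species == "virginica"]
s <- boxplot.stats(x)          # stats: whisker, hinge, median, hinge, whisker

hinges  <- s$stats[c(2, 4)]    # lower and upper ends of the box
box_len <- diff(hinges)        # length of the box (the IQR)

# Limits at 1.5 box lengths beyond each end of the box.
lower_fence <- hinges[1] - 1.5 * box_len
upper_fence <- hinges[2] + 1.5 * box_len

# Whiskers stop at the most extreme observations inside the limits.
lower_whisker <- min(x[x >= lower_fence])
upper_whisker <- max(x[x <= upper_fence])
outliers      <- x[x < lower_fence | x > upper_fence]

# Share of a normal distribution that falls between the two limits.
coverage <- 2 * pnorm(qnorm(0.75) + 1.5 * 2 * qnorm(0.75)) - 1
coverage
```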
Boxplot example
> boxplot(Sepal.Length ~ Species, data = iris,
+ ylab = "Sepal length (cm)", main = "Iris measurements",
+ boxwex = 0.5)

●
This example compares the distributions
of the sepal length measurements
between the different species. Here we
have used R’s formula-based interface
to the graphics function: the syntax
Sepal.Length ~ Species is read as
“Sepal.Length depending on Species,”
where both are columns of the data
frame specified by data = iris .
●
The boxplot() function draws separate
side-by-side boxplots for each species.
●
From these, we can see substantial
differences between the typical (median)
lengths for the species, and that there is one
unusually small specimen among the
virginica samples.
23 / 67
Scatterplots
●
When doing statistics and data science, most of the
interesting problems have to do with the relationships
between different variables. To study this, one of the
most commonly used plots is the scatterplot, in which
points (x_i, y_i), i = 1, ..., n, are drawn using dots or other
symbols.
●
These are drawn to show relationships between the x_i
and y_i values. In R, scatterplots (and many other kinds of
plots) are drawn using the plot() function.
●
Its basic usage is plot(x, y, ...) where x and y are numeric
vectors of the same length holding the data to be
plotted.
24 / 67
Scatterplots
> x <- rnorm(100) # assigns 100 random normal observations to x
> y <- rpois(100, 30) # assigns 100 random Poisson observations
# to y; mean value is 30
# the resulting value should be near 30
> mean(y)
[1] 30.39
> plot(x, y, main = "Poisson versus Normal")

25 / 67
Scatterplots
●
Try the following variants to see their effects.
> plot(x, y, main = "Poisson versus Normal")
> plot(x, y, main = "Poisson versus Normal", pch=15)
> plot(x, y, main = "Poisson versus Normal", pch=10, type="l")
> plot(x, y, main = "Poisson versus Normal", pch=15, type="l")
> plot(x, y, main = "Poisson versus Normal", type="l")
> plot(sort(x), sort(y), main = "Poisson versus Normal", type="l")

26 / 67
Plotting data from data frames
> head(Orange)
  Tree  age circumference
1    1  118            30
2    1  484            58
3    1  664            87
4    1 1004           115
5    1 1231           120
6    1 1372           142
> plot(circumference ~ age, data=Orange)

27 / 67
Plotting data from data frames
> plot(circumference ~ age, data = Orange, pch = as.character(Tree), cex=0.6)

28 / 67
QQ Plots
●
Quantile-quantile plots (otherwise known as QQ
plots) are a type of scatterplot used to compare
the distributions of two groups or to compare a
sample with a reference distribution.
●
In the case where there are two groups of equal
size, the QQ plot is obtained by first sorting the
observations in each group: X[1] ≤ · · · ≤ X[n] and
Y[1] ≤ · · · ≤ Y[n]. Next, draw a scatterplot of
(X[i],Y[i]), for i = 1, . . . ,n.

29 / 67
QQ Plots
●
When the groups are of different sizes, some
scheme must be used to artificially match them.
R reduces the size of the larger group to the size
of the smaller one by keeping the minimum and
maximum values, and choosing equally spaced
quantiles between.
●
For example, if there were five X values but 20 Y
values, then the X values would be plotted
against the minimum, lower quartile, median,
upper quartile and maximum of the Y values.
30 / 67
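The matching of unequal groups can be checked numerically. In this sketch the samples x5 and y20 are made-up data. With plot.it = FALSE, qqplot() returns the coordinates it would have plotted.

```r
set.seed(2)                 # arbitrary seed
x5  <- rnorm(5)             # smaller group
y20 <- rnorm(20)            # larger group

# plot.it = FALSE returns the plotted coordinates without drawing.
qq <- qqplot(x5, y20, plot.it = FALSE)

# The five X values are sorted; the 20 Y values are reduced to five
# equally spaced quantiles: minimum, quartiles, median and maximum.
qq$x
qq$y
quantile(y20, probs = c(0, 0.25, 0.5, 0.75, 1))
```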
QQ Plots
●
When plotting a single sample against a reference
distribution, theoretical quantiles are used for one coordinate.
R normally puts the theoretical quantiles on the x-axis and the
data on the y-axis, but some authors make the opposite
choice.
●
To avoid biases, quantiles are chosen corresponding to
probabilities (i − 1/2)/n: these are centered evenly between
zero and one.
●
When the distributions of X and Y match, the points in the QQ
plot will lie near the line y = x. We will see a different straight
line if one distribution is a linear transformation of the other.
●
On the other hand, if the two distributions are not the same,
we will see systematic patterns in the QQ plot.
31 / 67
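A normal QQ plot can be built by hand from the probabilities (i − 1/2)/n. The sketch below uses a made-up exponential sample and names of our own choosing. It agrees with qqnorm() for samples of more than 10 values, while for smaller samples R uses a slightly different offset.

```r
set.seed(3)              # arbitrary seed
n <- 50
sample_data <- rexp(n)   # illustrative right-skewed sample

# Probabilities (i - 1/2)/n and the matching standard normal quantiles.
p <- (seq_len(n) - 0.5) / n
theoretical <- qnorm(p)

# Manual normal QQ plot: theoretical quantiles on x, sorted data on y.
plot(theoretical, sort(sample_data),
     xlab = "Theoretical quantiles", ylab = "Sample quantiles")

# qqnorm() computes the same coordinates (in the original data order).
qq <- qqnorm(sample_data, plot.it = FALSE)
```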
QQ Plot Examples
> par(mfrow = c(1,4))
> X <- rnorm(1000)
> A <- rnorm(1000)
> qqplot(X, A, main = "A and X are the same")
> B <- rnorm(1000, mean = 3, sd = 2)
> qqplot(X, B, main = "B is rescaled X")
> C <- rt(1000, df = 2)
> qqplot(X, C, main = "C has heavier tails")
> D <- rexp(1000)
> qqplot(X, D, main = "D is skewed to the right")

32 / 67
QQ Plot Examples

●
The mfrow parameter of the par() function sets up a 1 × 4 layout of plots (one row, four columns).
●
The first plot is based on identical normal distributions, the second
plot is based on normal distributions having different means and
standard deviations, the third plot is based on a standard normal and
a t distribution on 2 degrees of freedom, and the fourth plot is based
on a standard normal compared with an exponential distribution.
33 / 67
Low level graphics functions
●
Functions like barplot() , dotchart() , and plot() do their work by
using low level graphics functions to draw lines and points, to
establish where they will be placed on a page, and so on.
●
Several functions exist to add components to existing graphs:

34 / 67
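A few of the low level functions that add to an existing plot are points(), lines(), abline(), text() and legend(). The sketch below builds a plot for one orange tree piece by piece. The choice of tree, labels and line types is ours, for illustration only.

```r
tree1 <- subset(Orange, Tree == "1")
fit <- lm(circumference ~ age, data = tree1)

# Start with an empty plot region (type = "n" draws axes but no data).
plot(circumference ~ age, data = tree1, type = "n",
     main = "Orange tree 1")

points(tree1$age, tree1$circumference, pch = 16)   # add the points
lines(tree1$age, tree1$circumference, lty = 3)     # join them, dotted
abline(fit, lty = 1)                               # least-squares line
text(tree1$age[1], tree1$circumference[1],
     labels = "first measurement", pos = 4, cex = 0.7)
legend("topleft", legend = c("observed", "least squares"),
       lty = c(3, 1), pch = c(16, NA))
```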
Add Lines to Scatter Plots
●
Consider the Orange data frame again. In addition to
using different plotting characters for the different trees,
we will pass lines of best fit (i.e. least-squares regression
lines) through the points corresponding to each tree.
> plot(circumference ~ age, pch = as.numeric(as.character(Tree)),
+ data = Orange)
> abline(lm(circumference ~ age, data = Orange, subset = Tree == "1"),
+ lty = 1)
> abline(lm(circumference ~ age, data = Orange, subset = Tree == "2"),
+ lty = 2)
> abline(lm(circumference ~ age, data = Orange, subset = Tree == "3"),
+ lty = 3)
> abline(lm(circumference ~ age, data = Orange, subset = Tree == "4"),
+ lty = 4)
> abline(lm(circumference ~ age, data = Orange, subset = Tree == "5"),
+ lty = 5)
> legend("topleft", legend = paste("Tree", 1:5), lty = 1:5, pch = 1:5,
+ lwd = c(1, 1, 2, 1, 1))
35 / 67
Add Lines to Scatter Plots
●
The best-fit lines for the five trees can be obtained using the lm()
function which relates circumference to age for each tree.
●
A legend has been added to identify which data points come from
the different trees.
●
In these plots lty gives the line type, and lwd gives the line width.
36 / 67
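The five nearly identical abline() calls can also be written as a loop. This is one possible sketch, not the form used in the lecture. The list fits and the vector trees are names we introduce here.

```r
plot(circumference ~ age, data = Orange,
     pch = as.numeric(as.character(Tree)))

# One least-squares fit per tree, stored in a list for later inspection.
trees <- 1:5
fits <- lapply(trees, function(i) {
  lm(circumference ~ age, data = Orange[Orange$Tree == as.character(i), ])
})

# Draw each fitted line with its own line type.
for (i in trees) abline(fits[[i]], lty = i)

legend("topleft", legend = paste("Tree", trees), lty = trees, pch = trees)
```

Keeping the fitted models in a list makes it easy to look at them afterwards, for example with sapply(fits, coef).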
Add Lines to Scatter Plots

37 / 67
Connecting lines instead of the
best fit line
●
Redo the previous commands, replacing each abline(lm(...)) call with a
call to lines(), which joins the points of a tree in the order in which
they appear in the data.
> plot(circumference ~ age, pch = as.numeric(as.character(Tree)),
+ data = Orange)
> lines(circumference ~ age, data = Orange, subset = Tree == "1", lty = 1)
> lines(circumference ~ age, data = Orange, subset = Tree == "2", lty = 2)
> lines(circumference ~ age, data = Orange, subset = Tree == "3", lty = 3)
> lines(circumference ~ age, data = Orange, subset = Tree == "4", lty = 4)
> lines(circumference ~ age, data = Orange, subset = Tree == "5", lty = 5)
> legend("topleft", legend = paste("Tree", 1:5), lty = 1:5, pch = 1:5,
+ lwd = c(1, 1, 2, 1, 1))
38 / 67
Graphics as a language - ggplot2
●
The ideas behind ggplot2 were first described in
a 1999 book called “The Grammar of Graphics”
by Leland Wilkinson.
●
A second expanded edition was published in
2005.
●
They were expanded again and popularized
when Wickham published ggplot2 in 2007.
●
Our own description is based on version 3.3.2 of
that package, published in 2020.
39 / 67
Graphics as a language - ggplot2
●
The ggplot2 package gives a somewhat abstract
but very rich way to describe graphics.
●
We will start our discussion with an example
showing how to re-draw the bar chart of the
world telephones presented earlier, adding more
detail as we proceed.

40 / 67
Plotting a Bar Chart
●
To plot the world phone data that we saw at the
start of this lecture, we would write
> library(ggplot2)
> region <- names(WorldPhones[6,])
> phones60 <- data.frame(Region = factor(region, levels = region),
+ Telephones = WorldPhones[6,])
> ggplot(data = phones60, aes(x=Region, y=Telephones)) + geom_col()

41 / 67
Plotting a Bar Chart
●
The first lines of this snippet are needed to load the plotting package and
to prepare a data frame consisting of the telephone counts that
correspond to the various world regions.
●
The new feature is in the ggplot invocation where aes says that we want
the Region names on the x-axis in their original order, and the telephone
counts on the y-axis.
●
We want to display the data using bars, hence the use of the geom_col
function.

42 / 67
The Idea Behind ggplot2
●
The general idea in ggplot2 is that plots are described by
a sum of objects produced by function calls.
●
As with any addition in R, we use + , but you should think
of the whole expression as a way to describe the plot as
a combination of different components.

43 / 67
Sequence of Using ggplot2
●
Most ggplot2 plot expressions start with a call to the
ggplot() function.
●
Its first argument is data, and that’s where we specify the
data component of the plot, which is always a data
frame.
●
The second component of every plot is called the
“aesthetic mapping” of the plot, or “aesthetics” for short.
●
This doesn’t refer to the appreciation of beauty; it refers
to the ways that quantities in our data are expressed in
the plot.
●
We use the aes() function to specify the aesthetics.
44 / 67
About aes()
●
The aesthetics don’t tell us how Region is displayed on
the x-axis, just that it is. To specify how it is displayed, we
give one or more layers, using geom_*() function calls.
●
In the previous example, we requested a bar plot by
using the geom_col() function.
●
Because Region is a factor, geom_col() displays one bar
per level.
●
Because we had aes(x = Region, y = Telephones) the bars
are vertical.
●
We could get horizontal bars by using aes(y = Region, x =
Telephones) .
45 / 67
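The effect of swapping the two aesthetics can be seen by building both plots from the same data frame. This sketch assumes that the ggplot2 package is installed. Drawing horizontal bars this way relies on ggplot2 3.3.0 or later, and p_vertical and p_horizontal are names chosen for illustration.

```r
library(ggplot2)

region <- names(WorldPhones[6, ])
phones60 <- data.frame(Region = factor(region, levels = region),
                       Telephones = WorldPhones[6, ])

# Vertical bars: the factor is mapped to x.
p_vertical <- ggplot(phones60, aes(x = Region, y = Telephones)) +
  geom_col()

# Horizontal bars: the same layer with the two aesthetics swapped.
p_horizontal <- ggplot(phones60, aes(y = Region, x = Telephones)) +
  geom_col()

print(p_vertical)
print(p_horizontal)
```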
The ggplot2 Grammar
●
ggplot2 plots are usually created as a sum of function
calls.
●
Each of those function calls produces a special object,
which the ggplot2 code knows how to combine, provided
you follow certain rules.
●
First, you need to start with a "ggplot" object. This can be
produced by a call to ggplot() or to some other function
that calls it, and it can be saved in a variable and used
later in a different plot.

46 / 67
Creating a ggplot2 Object
●
The "ggplot" object sets certain defaults which can be used by the
layers of the plot. Normally the first argument specifies a data
frame, and that data can be used in all layers of the plot.
> library(ggplot2)
> region <- names(WorldPhones[6,])
> phones60 <- data.frame(Region = factor(region, levels = region),
+ Telephones = WorldPhones[6,])
> g1 <- ggplot(phones60, aes(Region, Telephones))
●
Because we assigned the result to g1 , it is not printed, and no
graph is displayed. To display it, we can print that object:
> g1

47 / 67
Adding Objects to ggplot2 Objects

●
The most common objects are the layers produced by
the geom_*() functions (discussed later).
●
Other less common components include:

48 / 67
Adding Objects to ggplot2 Objects

●
Scales are more qualitative than the others. We have seen two
scales so far in WorldPhones example.
●
Because Region is a factor, it is automatically displayed using a
discrete scale, and because Telephones is a number, it is
displayed on a continuous scale.
●
These automatic choices could be changed by adding in a call
to a different scale_*() function.
49 / 67
Adding Objects to ggplot2 Objects

●
Transformations are changes to values before plotting.
For example,
scale_y_continuous(trans = "log10")
will take the base 10 logarithm of the y-axis values
before plotting.
50 / 67
scale_* example
> g3 <- g1 + geom_col() + scale_y_continuous(trans = "log10")
> g3

51 / 67
Coordinate System with coord_*
●
The coordinate system determines how the x and y values are
displayed on the plot. For example, to display a pie chart in
ggplot2 , you display a bar plot in polar coordinates:
> ggplot(phones60, aes(x = "", y = Telephones, fill = Region)) +
+ coord_polar(theta = "y") +
+ geom_col()

52 / 67
theme_* example
> g4 <- g3 + theme_dark()
> g4

53 / 67
Layers in ggplot2
●
There are many ways to display data, and ggplot2 puts
“ways to display data” into the geom_*() layer functions.
●
Version 3.3.2 of ggplot2 contains 52 of these functions,
and others are available in other contributed packages.

54 / 67
Layers in ggplot2
●
Each kind of layer works with a different set of
aesthetics.
●
We have already seen x and y aesthetics; others that are
commonly supported are:

55 / 67
Layers example
> g1 <- ggplot(phones60, aes(Region, Telephones))
> g2 <- g1 + geom_col() + geom_point(col = "red")
> g2
> g3 <- g2 + geom_line(col = "blue", aes(x = as.numeric(Region)))
> g3

56 / 67
Layers example
> g2 <- g1 + geom_col(alpha = 0.4) + geom_point(col = "red")
> g3 <- g2 + geom_line(col = "blue", aes(x = as.numeric(Region)))
> g3

57 / 67
Layers example
> ggplot(iris, aes(x = Species, y = Sepal.Length)) + geom_boxplot()

58 / 67
Layers example
> ggplot(iris, aes(x = Species, y = Sepal.Length)) + geom_violin()

59 / 67
Colors in R
●
There are several different ways to identify colors in R.
They can be specified by name;
●
The function colors() lists hundreds of names recognized
by R:

> colors()
[1] "white" "aliceblue" "antiquewhite"
[4] "antiquewhite1" "antiquewhite2" "antiquewhite3"
…
…
[112] "darkslategray4" "darkslategrey" "darkturquoise"
[115] "darkviolet" "deeppink" "deeppink1"
…
[652] "yellow" "yellow1" "yellow2"
[655] "yellow3" "yellow4" "yellowgreen"
60 / 67
Hexadecimal Colors
●
They can also be constructed using hexadecimal (base
16) codes for the levels of red, green, and blue. For
example, red would be specified as "#FF0000" , where
FF , the base 16 representation of 255, is the maximum
level of red, and both green and blue have zero
contribution.

> g2 <- g1 + geom_col(alpha = 0.4, col = "#FF00FF") + geom_point(col = "red")

> g2

61 / 67
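Hexadecimal codes can be built and decoded with a few base R functions. The sketch below uses rgb(), strtoi() and col2rgb(), and the variable names are ours.

```r
# Build a color code from red, green and blue levels on a 0-255 scale.
pure_red <- rgb(255, 0, 0, maxColorValue = 255)
magenta  <- rgb(255, 0, 255, maxColorValue = 255)

# Decode in the other direction: hex digits to a number, and a color
# name to its red, green and blue levels.
ff_as_number <- strtoi("FF", base = 16L)
red_levels   <- col2rgb("red")

pure_red
magenta
ff_as_number
red_levels
```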
Color Palettes in R
●
R also maintains a palette of a small number of colors
that can be referenced by number. Since version 4.0.0,
there have been several choices of palettes by name:
> palette.pals()
[1] "R3" "R4" "ggplot2" "Okabe-Ito" "Accent"
[6] "Dark 2" "Paired" "Pastel 1" "Pastel 2" "Set 1"
[11] "Set 2" "Set 3" "Tableau 10" "Classic Tableau" "Polychrome 36"
[16] "Alphabet"
> palette()
[1] "black"   "#DF536B" "#61D04F" "#2297E6" "#28E2E5" "#CD0BBC" "#F5C710"
[8] "gray62"

●
In the layers examples we used geom_point(col = "red") to choose red points.
Any of the standard R specifications for red would have worked equally well:
"red", "#FF0000", or, assuming we are using the "R3" palette, the number 2.

62 / 67
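Referring to colors by number depends on the active palette. As a small sketch, assuming R 4.0.0 or later, the classic "R3" palette can be selected, inspected, and then restored. The names old, r3_colors and r3_second are ours.

```r
old <- palette()           # remember the current palette

palette("R3")              # switch to the classic palette
r3_colors <- palette()
r3_second <- r3_colors[2]  # the color that the number 2 refers to

plot(1:8, pch = 16, cex = 2, col = 1:8,
     main = "Colors 1 to 8 in the R3 palette")

palette(old)               # restore the previous palette
```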
Specifying Colors in ggplot2
●
For the bars in a geom_col() layer, col controls the outline
color, and argument fill controls the fill color.
●
The second way to specify color in ggplot2 is to use the
col or fill aesthetic.
> ggplot(phones60, aes(Region, Telephones, fill = Region)) +
+ geom_col() +
+ scale_fill_brewer(palette = "Set2")

63 / 67
Specifying Colors in ggplot2
●
When the mapped variable is continuous, ggplot2 will
default to a gradient scale running from dark blue for low
values to light blue for high values,
produced by the scale_fill_gradient() function. For
example,
> ggplot(phones60, aes(Region, Telephones, fill = Telephones)) +
+ geom_col()

64 / 67
Customizing the Look of a Graph

●
There are several functions to change the labeling on the
graph. The ggtitle() function sets a title at the top, and xlab()
and ylab() set titles on the axes.
●
The theme() and theme_*() functions can be used to
change many details of the overall look of a graph.
●
The scale_*() functions can be used to customize the
mapping for each aesthetic.
●
The annotate() function works like a layer function, but
with fixed vectors of aesthetics, not values taken from
the data set for the plot.
65 / 67
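The labeling functions can be combined in a single plot expression. The sketch below assumes that ggplot2 is installed. The title, the choice of theme_minimal(), and the position and text of the annotation are illustrative choices of ours.

```r
library(ggplot2)

region <- names(WorldPhones[6, ])
phones60 <- data.frame(Region = factor(region, levels = region),
                       Telephones = WorldPhones[6, ])

p_labeled <- ggplot(phones60, aes(Region, Telephones)) +
  geom_col() +
  ggtitle("Telephone Usage in 1960") +
  xlab("Region") +
  ylab("Telephones (in thousands)") +
  theme_minimal() +
  # annotate() takes fixed positions and text, not columns of phones60
  annotate("text", x = "Africa", y = 20000,
           label = "illustrative note", size = 3)

print(p_labeled)
```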
Faceting

●
A strategy for displaying relations among three or
more variables is to divide the data into subsets
using the values of some of the variables, and then
draw multiple plots of the values of the other
variables in each of those subsets.
●
In ggplot2 this is called “faceting,” and the
facet_wrap() and facet_grid() functions are used to
implement it.

66 / 67
facet_wrap()
●
To study the trends over time in the WorldPhones data,
we first need to convert it to a data frame.
> phones <- data.frame(Year = as.numeric(rep(rownames(WorldPhones), 7)),
+ Region = rep(colnames(WorldPhones), each = 7),
+ Telephones = as.numeric(WorldPhones))
> ggplot(phones, aes(x = Region, y = Telephones, fill = Region)) +
+ geom_col() +
+ facet_wrap(vars(Year)) +
+ theme(axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
+ xlab(element_blank())

67 / 67
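As an alternative sketch of the matrix-to-data-frame conversion (not the approach used in the slide), base R can reshape the matrix through as.table(). The name phones_long is ours. Here Region comes out as a character column, so regions would be ordered alphabetically in a plot unless it is converted to a factor.

```r
# as.table() followed by as.data.frame() yields one row per
# (year, region) pair, in the same column-by-column order as
# as.numeric(WorldPhones).
phones_long <- as.data.frame(as.table(WorldPhones),
                             stringsAsFactors = FALSE)
names(phones_long) <- c("Year", "Region", "Telephones")
phones_long$Year <- as.numeric(phones_long$Year)

head(phones_long)
```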
