Applied Statistics Bo Markussen

Statistical Methods for the Biosciences November, 2021

Graphics with the R-package “ggplot2”

ggplot2 is an R package for making statistical plots. It was created by
Hadley Wickham, and it is based on the Grammar of Graphics (Wilkinson,
2005). A grammar is the part of a language, such as Danish or English, that
lays down its syntax and semantics. The purpose of a Grammar of Graphics is
to describe a statistical plot to the computer. Once the computer knows the
details of the wanted plot, it can draw the plot. It is also possible to store
the grammatical description of the plot, which may then be extended later.
Using ggplot2 is not the only way to make graphics in R. The old Base
Graphics system inside R also produces very nice (but not as nice) plots, and
sometimes I prefer to use it. And before ggplot2 became available the
lattice package was very popular (but it is no longer recommended).
For one of my publications (Asare et al., 2017) I wanted to make the following
plot:

The only way I know of making such a plot in a visually appealing layout is
via ggplot2. From a statistical point of view the plot contains two things:

1. Observed data (the points).

2. Predictions from a statistical model (the lines).

In general I like to include such a figure in my scientific papers. It not only
presents the underlying data to the reader of the paper, but also displays the
applicability of the statistical model used as well as the biological variation.
The figure above has quite a few features:

• The y-axis is transformed, more precisely by the cube root. Although
  the cube root isn’t a standard transformation (like the logarithm), it
  can still be implemented rather easily (but we will not try this in this
  exercise). Note also that the reader of the figure doesn’t need to know
  the precise transformation in order to perceive the information
  contained in the graphics.

• The plot has been split into 3 × 4 = 12 panels arranged in 3 columns
  and 2 × 2 = 4 rows. The columns follow the levels of one categorical
  variable (with 3 levels), and the rows follow the combinations of the
  levels of two further categorical variables (with 2 levels each).

• The same x- and y-axes are used in all 12 panels. This makes it possible
  to compare data across the panels.

• Within each panel 4 different plotting symbols and 4 different line types
  are used to distinguish observed data and model predictions from 4
  different regions. These symbols and line types are summarized in the
  legend placed to the right of the 12 panels.

• The model predictions (i.e. the lines) are only drawn within the range of
  the observed data, and they are omitted if there are no observed data in
  the corresponding panel.
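
To make the listed features a little more concrete, here is a minimal sketch of a plot with the same panel layout. It is not the code behind the published figure: the data are simulated, all variable names (site, season, treatment, region) are hypothetical, and the lines are simple per-group regression lines rather than predictions from the model used in the paper.

```r
library(ggplot2)

# Simulated stand-in data; every variable name here is hypothetical.
set.seed(1)
dat <- expand.grid(
  site      = c("S1", "S2", "S3"),        # 3 levels -> 3 columns
  season    = c("dry", "wet"),            # 2 levels
  treatment = c("control", "treated"),    # 2 levels -> 2 x 2 = 4 rows
  region    = c("A", "B", "C", "D"),      # 4 symbols and line types per panel
  x         = 1:5
)
dat$y <- 2 + 0.5 * dat$x + rnorm(nrow(dat), sd = 0.3)

p <- ggplot(dat, aes(x = x, y = y, shape = region, linetype = region)) +
  geom_point() +
  # one straight line per region and panel, fitted by ordinary regression
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "black") +
  # rows: season x treatment, columns: site; shared axes are the default
  facet_grid(season + treatment ~ site)
p
```

The formula inside facet_grid() puts the row variables to the left of the tilde and the column variable to the right, which gives the 4 × 3 arrangement described above.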

Now it is time for you to work

Please work on the following questions in small groups of 2 to 4 persons:

1. Discuss the features of the figure at the beginning of this exercise. Do
you agree with the description I gave above?

2. In order to try ggplot2 yourselves, open RStudio and install the
ggplot2 package (if you haven’t got the package already).
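
One possible way to do the installation from the Console is sketched below. The CRAN mirror URL is my choice for this sketch; RStudio's Packages pane works just as well.

```r
# Install ggplot2 only if it is missing, then load it.
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2", repos = "https://cloud.r-project.org")
}
library(ggplot2)
```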

3. The basic reference on ggplot2 is the following book (which should be
freely available to students at KU via the link embedded in this pdf):

Hadley Wickham, “ggplot2: Elegant Graphics for Data Analysis”, Springer, Use R! series, 2009.

The main data example in Chapter 2 of this book is the diamonds
dataset, which contains information on the price of about 54,000
diamonds. We will also be using this dataset for this exercise. Execute the
following R code in the Console window, and read about the diamonds
dataset:

> library(ggplot2)
> ?diamonds

4. The data frame diamonds is very big, with almost 54,000 observations.
Sometimes it can be useful to make a random selection of the
observations in order to avoid having a lot of points plotted on top of each
other. Execute the following R code, and discuss what it does:

> mypoints <- diamonds[sample(1:nrow(diamonds),1000),]

> mypoints
> head(mypoints)
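
Because sample() draws at random, each group will get a different subset and slightly different plots. If you want a selection that can be reproduced, a minimal sketch is to fix the random seed first. The seed value is arbitrary and my own choice.

```r
library(ggplot2)

set.seed(2021)   # arbitrary seed; any fixed number gives a reproducible draw
mypoints <- diamonds[sample(nrow(diamonds), 1000), ]   # 1000 rows, no replacement

dim(mypoints)          # number of rows and columns in the subset
table(mypoints$cut)    # how the sampled diamonds are spread over the cuts
```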

5. Let’s make our first plots using ggplot2! Try the following lines of R
code one by one (!) and discuss the relation between the graphical output
and the R code. Can you see what is plotted?

> ggplot(mypoints,aes(x=carat,y=price)) + geom_point()

> ggplot(mypoints,aes(x=carat,y=price,shape=color)) + geom_point()
> ggplot(mypoints,aes(x=carat,y=price,color=color)) + geom_point()
> ggplot(mypoints,aes(x=carat,y=price,color=color,size=cut)) + geom_point()

6. Basically the ggplot() function has two arguments:

• The first argument is a data frame. This is somewhat similar to
  an Excel sheet, that is, a spreadsheet-like organization of data.
  In our example this is the data frame that we called mypoints in
  item 4 above.
• The second argument is a so-called aesthetic mapping, which tells
  R how the variables inside the data frame should be used. This is
  defined using another R function, namely aes(x=carat,y=price).
  Here we tell R that the carat value should be on the x-axis, and
  the price should be on the y-axis. Ok?

The ggplot() call only sets up the data and the plotting situation. To plot
anything you must also tell R what to plot. In the examples above this is
done by adding points. If we instead add lines, then we get something less
useful (do you agree?) in this example:

> ggplot(mypoints,aes(x=carat,y=price,color=color)) + geom_line()
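
The division of labour between ggplot() and the geom functions can be sketched as follows. The two arguments are written here with their names, data and mapping, and the variable name base is just my choice for this illustration.

```r
library(ggplot2)
set.seed(1)
mypoints <- diamonds[sample(nrow(diamonds), 1000), ]

# Data and aesthetic mapping only: no geom has been added yet
base <- ggplot(data = mypoints, mapping = aes(x = carat, y = price))

base                  # draws the axes, but no data
base + geom_point()   # same set-up, now with points
```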

7. Above we have written (almost) the same ggplot() code 5 times. To
avoid doing this even more times, let’s store it in a variable. Please
try:

> myplot <- ggplot(mypoints,aes(x=carat,y=price,color=color))

> myplot + geom_point() + aes(shape=cut)

Note that after executing the first line the variable “myplot” appears
in the Environment window. This variable now contains the result of
the ggplot() call, and can be used instead of typing the call again.

8. It’s easy to subdivide the plot into several panels. Let’s try this
according to the categorical variables cut and clarity. Execute the following
lines of R code one by one and discuss the output:

> myplot + geom_point() + facet_grid(cut~clarity)

> myplot + geom_point() + facet_grid(clarity~cut)
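
When discussing the output it may help to know how many panels to expect and how many points fall in each. The following sketch counts this directly from the data; the seed is arbitrary, so your own subset will give slightly different counts.

```r
library(ggplot2)
set.seed(1)
mypoints <- diamonds[sample(nrow(diamonds), 1000), ]

# cut has 5 levels and clarity has 8, so at most 5 x 8 = 40 panels
nlevels(diamonds$cut) * nlevels(diamonds$clarity)

# Number of sampled diamonds in each cut x clarity combination
tab <- table(mypoints$cut, mypoints$clarity)
tab
sum(tab == 0)   # combinations without any points
```

In facet_grid(cut~clarity) the variable to the left of the tilde defines the rows and the variable to the right defines the columns.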

9. To export the most recent figure, such that it can be inserted in a paper
or a report, use the ggsave() function. The following code saves the
plot to your working directory in png and pdf format, respectively (see
footnote 1):

> ggsave("diamonds.png")
> ggsave("diamonds.pdf")

Try to insert the generated figure in a Word document.
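
By default ggsave() saves the last displayed plot at the size of the current graphics window. A minimal sketch with an explicitly named plot and a fixed size is given below. The file location (a temporary folder) and the dimensions are assumptions made for this illustration only.

```r
library(ggplot2)
set.seed(1)
mypoints <- diamonds[sample(nrow(diamonds), 1000), ]
p <- ggplot(mypoints, aes(x = carat, y = price, color = color)) + geom_point()

# Save a named plot object with a fixed size instead of the last plot shown
outfile <- file.path(tempdir(), "diamonds.pdf")   # temporary folder for this sketch
ggsave(outfile, plot = p, width = 16, height = 10, units = "cm")

file.exists(outfile)
```

The same width, height and units arguments can be used when saving in png format.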

10. As already hinted at in several places above, the output of a call to
ggplot() is not a graphical output, but a grammatical description of that
output. What you see on the screen is a print() of that description. One
implication of this is that you can add more “layers” to the description
before it is printed. The symbol for adding components is “+”, which
in this context shouldn’t be confused with the mathematical operation
of adding numbers. To change the axes to be logarithmic you add this
information. As above, try it and think about the result:

> myplot + geom_point() + scale_x_log10() + scale_y_log10()

Footnote 1: I recommend png format if you use Word, and pdf if you use PDFLaTeX.

What happens if you remove either scale_x_log10() or scale_y_log10()?

I guess that you are unhappy with the appearance of the x-axis. There
are too few tick marks, right? This can be fixed by hand via

> myplot + geom_point() +
    scale_x_log10(breaks=seq(0.5,3,0.5)) + scale_y_log10()
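
The distinction between the description of a plot and its print() can be sketched like this. The plot is built up in several steps and nothing is drawn until print() is called. The loop over facet_wrap() is only there to illustrate that printing is not automatic inside loops and functions.

```r
library(ggplot2)
set.seed(1)
mypoints <- diamonds[sample(nrow(diamonds), 1000), ]

p <- ggplot(mypoints, aes(x = carat, y = price, color = color))  # nothing drawn yet
p <- p + geom_point()                                            # still nothing drawn
p <- p + scale_x_log10() + scale_y_log10()                       # description extended

print(p)   # the plot is drawn only now

# Inside loops and functions there is no automatic printing,
# so print() has to be called explicitly.
for (v in c("cut", "clarity")) {
  print(p + facet_wrap(v))
}
```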

11. You can also add smoothing lines and other statistical output to the
graph:

> myplot + geom_point() + geom_smooth()

Note that a smoothing line is generated for each of the diamond colors.
The reason for this is that the separation into distinct colors is inherited
from the aes() code inside our variable “myplot”. To make a single
smoothing line for all diamonds we must turn off the inheritance
and restate the necessary aesthetics. Thus,

> myplot + geom_point() +
    geom_smooth(aes(x=carat,y=price),inherit.aes = FALSE)

Note that the smoothing method has changed from loess (= local
polynomial regression fitting) to gam (= generalized additive model).

• Can you find out why this is the case? Hint: see the help page by
  executing ?geom_smooth in the R console.
• Can you change back to loess smoothing? Hint: use the option
  method="loess"
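
As a starting point for the two questions, the sketch below looks at how many observations go into each smoothing line, which is the quantity the help page refers to when it describes the default method, and then requests loess explicitly. The formula argument is spelled out only to avoid the message that ggplot2 otherwise prints; the seed is arbitrary.

```r
library(ggplot2)
set.seed(1)
mypoints <- diamonds[sample(nrow(diamonds), 1000), ]
myplot <- ggplot(mypoints, aes(x = carat, y = price, color = color))

table(mypoints$color)   # group sizes when the colors are smoothed separately
nrow(mypoints)          # group size when all diamonds form a single group

# Explicitly request loess for the single smoothing line
p <- myplot + geom_point() +
  geom_smooth(aes(x = carat, y = price), inherit.aes = FALSE,
              method = "loess", formula = y ~ x)
p
```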

12. A linear relation between price and carat appears to be plausible on
the log-log scale:

> myplot + geom_point() + scale_x_log10() + scale_y_log10() +
    geom_smooth(aes(x=carat,y=price), inherit.aes = FALSE, method="lm")

Now try to make a figure that contains both a non-linear smoothing
line (either loess or gam) and the linear regression line in the same plot!
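
In ggplot2 the scale transformations are applied before the statistics are computed, so the straight line drawn with method="lm" on log-log axes is a regression of log10(price) on log10(carat). A minimal sketch of the same fit done outside ggplot2 is shown below; the seed is arbitrary, so the estimates depend on the random subset.

```r
library(ggplot2)
set.seed(1)
mypoints <- diamonds[sample(nrow(diamonds), 1000), ]

# The regression behind the straight line on the log-log plot
fit <- lm(log10(price) ~ log10(carat), data = mypoints)
coef(fit)      # intercept and slope on the log10 scale
summary(fit)   # standard errors and R-squared for the same fit
```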

13. We have written the two options “aes(x=carat,y=price)” and “inherit.aes = FALSE”
many times above. Discuss whether it would have been smarter to
define “myplot” as

> myplot <- ggplot(mypoints,aes(x=carat,y=price))

How would this change the solution code for the above questions?

Please note that the lines in the figure at the beginning of this exercise were
not generated automatically by ggplot2. Instead I used predictions from a linear
mixed effects model fitted via the R-package lme4. In order to do that I
made another data frame with the model predictions, and used this new data
frame together with the geom_line() function. I hope to be able to give an
example of this technique later in the course.
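
Until then, here is a rough sketch of the idea on the diamonds data. It is not the code used for the paper: an ordinary linear model fitted with lm() stands in for the lme4 mixed model, and the names fit and pred as well as the model formula are my own choices for this illustration.

```r
library(ggplot2)
set.seed(1)
mypoints <- diamonds[sample(nrow(diamonds), 1000), ]

# Stand-in model: an ordinary linear model instead of the lme4 mixed model
fit <- lm(log10(price) ~ log10(carat) + cut, data = mypoints)

# Prediction grid: for each cut, 50 carat values within the observed range only
pred <- do.call(rbind, lapply(split(mypoints, mypoints$cut, drop = TRUE), function(d) {
  data.frame(cut   = d$cut[1],
             carat = seq(min(d$carat), max(d$carat), length.out = 50))
}))
pred$price <- 10^predict(fit, newdata = pred)   # back-transform to the price scale

# Points come from the observed data, lines from the prediction data frame
ggplot(mypoints, aes(x = carat, y = price, color = cut)) +
  geom_point(alpha = 0.4) +
  geom_line(data = pred) +
  scale_x_log10() + scale_y_log10()
```

Because the prediction grid is built separately for each cut, the lines stop where the observed data stop, as in the figure from the paper.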
If you want to read more about ggplot2, then you might start at the
web page:

http://www.r-bloggers.com/basic-introduction-to-ggplot2/

Or perhaps even better read Chapter 3 in the book:

http://r4ds.had.co.nz/

End of exercise
