0% found this document useful (0 votes)
15 views10 pages

Tutorials

Uploaded by

Guta
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
15 views10 pages

Tutorials

Uploaded by

Guta
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 10

TUTORIALS

13JUN 2021

ONE-WAY ANOVA AND BOX PLOT IN R

ANOVA, ggplot, compact letter display,


Tukey’s test
 Loading and checking the data
 Analysis of variance for one factor – One-Way ANOVA
 Tukey’s test
 Compact letter display to indicate significant differences
 Creating a table with the summarised data and the compact letter
display
 Basic boxplot
 Customizing x and y titles
 Formating the overall visualisation
 Adding compact letter display from Tukey’s test
 Adding colours to the boxplots
 Boxplots coloured according to the factor (explanatory variable)
 Boxplots coloured according to the median
 Saving the final figure

In this R tutorial, you are going to learn how to:

 perform analysis of variance and Tukey’s test


 obtain the compact letter display to indicate significant
differences
 build a boxplot with the results
 add the compact letter display to the boxplot
 customize the boxplot colours
 colour the boxes according to the median value.

We are going to use the results of a one-factor experiment conducted


to measure and compare the effectiveness of various feed supplements
on the growth rate of chickens.1 The data file (chickwts) is available in
the R datasets library.
We are going to start by loading the appropriate libraries,
the datasets to access the data file, the ggplot2 for the
plots, multcompView to obtain the compact letter display, and
the dplyr for building a table with the summarized data.

# loading the appropriate libraries


library(datasets)
library(ggplot2)
library(multcompView)
library(dplyr)

Loading and checking the data


The first step of the analysis is to load the data file. We will use
the str function to check the file structure.

# loading and checking the data


str(chickwts)
## 'data.frame': 71 obs. of 2 variables:
## $ weight: num 179 160 136 227 217 168 108 124 143 140 ...
## $ feed : Factor w/ 6 levels "casein","horsebean",..: 2 2 2 2 2 2 2 2 2
2 ...
The data file presents one column with the response variable (weight)
and another column for the studied factor (feed). The file structure
shows that we have 71 observations of 2 variables, weight is numeric,
and feed is a factor.

Analysis of variance for one factor –


One-Way ANOVA
The next step is to perform the analysis of variance, mostly known as
ANOVA, using the aov function. The arguments are the response
variable “weight” as a function of the explanatory variable “feed”. The
result will be stored in an object called “anova”, and to visualise it, we
need to run the function summary.

# analysis of variance
anova <- aov(weight ~ feed, data = chickwts)
summary(anova)
## Df Sum Sq Mean Sq F value Pr(>F)
## feed 5 231129 46226 15.37 5.94e-10 ***
## Residuals 65 195556 3009
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Tukey’s test
The means comparison by Tukey’s test can be run on the object
resulting from the analysis of variance (anova). The result (below) is an
extensive table with all pairwise comparisons and the p-value for each
one of them. This data can be tricky to interpret and it is usual to use
letters to indicate significant differences among the means.

# Tukey's test
tukey <- TukeyHSD(anova)
print(tukey)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = weight ~ feed, data = chickwts)
##
## $feed
## diff lwr upr p adj
## horsebean-casein -163.383333 -232.346876 -94.41979 0.0000000
## linseed-casein -104.833333 -170.587491 -39.07918 0.0002100
## meatmeal-casein -46.674242 -113.906207 20.55772 0.3324584
## soybean-casein -77.154762 -140.517054 -13.79247 0.0083653
## sunflower-casein 5.333333 -60.420825 71.08749 0.9998902
## linseed-horsebean 58.550000 -10.413543 127.51354 0.1413329
## meatmeal-horsebean 116.709091 46.335105 187.08308 0.0001062
## soybean-horsebean 86.228571 19.541684 152.91546 0.0042167
## sunflower-horsebean 168.716667 99.753124 237.68021 0.0000000
## meatmeal-linseed 58.159091 -9.072873 125.39106 0.1276965
## soybean-linseed 27.678571 -35.683721 91.04086 0.7932853
## sunflower-linseed 110.166667 44.412509 175.92082 0.0000884
## soybean-meatmeal -30.480519 -95.375109 34.41407 0.7391356
## sunflower-meatmeal 52.007576 -15.224388 119.23954 0.2206962
## sunflower-soybean 82.488095 19.125803 145.85039 0.0038845
Compact letter display to indicate
significant differences
The use of letters to indicate significant differences in pairwise
comparisons is called compact letter display, and can simplify the
visualisation and discussion of significant differences among means.
We are going to use the multcompLetters4 function from
the multcompView package. The arguments are the object from
an aov function and the object from the TukeyHSD function.

# compact letter display


cld <- multcompLetters4(anova, tukey)
print(cld)
## $feed
## sunflower casein meatmeal soybean linseed horsebean
## "a" "a" "ab" "b" "bc" "c"

Creating a table with the


summarised data and the compact
letter display
We are going to build a table with the mean, the third quantile and the
letters for each treatment (feed). The information on this table will be
used to add the letters indicating significant differences to the boxplot.
As the compact letter display was generated with the means arranje in
drecreasing order, we will also build the table with the means in
decreasing order.

# table with factors and 3rd quantile


Tk <- group_by(chickwts, feed) %>%
summarise(mean=mean(weight), quant = quantile(weight, probs =
0.75)) %>%
arrange(desc(mean))

# extracting the compact letter display and adding to the Tk


table
cld <- as.data.frame.list(cld$feed)
Tk$cld <- cld$Letters

print(Tk)
## # A tibble: 6 x 4
## feed mean quant cld
## <fct> <dbl> <dbl> <chr>
## 1 sunflower 329. 340. a
## 2 casein 324. 371. a
## 3 meatmeal 277. 320 ab
## 4 soybean 246. 270 b
## 5 linseed 219. 258. bc
## 6 horsebean 160. 176. c

Basic boxplot
We are going to use the function ggplot to build the boxplots. The first
argument is the data file, chickwts, and the second argument is the
aesthetics aes, where we define the x and y variables, feed and weight.
However, if we run only this code, we will have a blank plot. We also
need to define the geom, and is this case, geom_boxplot() for the
boxplot.

# boxplot
ggplot(chickwts, aes(feed, weight)) +
geom_boxplot()
Customizing x and y titles
Let’s now customise the x and y titles using the function ‘labs’.

# boxplot
ggplot(chickwts, aes(feed, weight)) +
geom_boxplot() +
labs(x="Feed Type", y="Weight (g)")

The plot could be used as it is, but there is still some space for
improvement.

Formating the overall visualisation


The next step is to change the overall theme of the plot. I have chosen
the theme_bw. Additionally, I will delete the major and minor gridlines, as
they are normally not used in scientific plots.
# boxplot
ggplot(chickwts, aes(feed, weight)) +
geom_boxplot() +
labs(x="Feed Type", y="Weight (g)") +
theme_bw() +
theme(panel.grid.major = element_blank(), panel.grid.minor =
element_blank())

Adding compact letter display from


Tukey’s test
And finally, we can add the compact letter display to the plot using
the geom_text function. The label is the column cdl in the Tk file. The
letters’ position is defined as the feed in the x-axis, and the third
quantile (quant) as the y value.

# boxplot
ggplot(chickwts, aes(feed, weight)) +
geom_boxplot() +
labs(x="Feed Type", y="Weight (g)") +
theme_bw() +
theme(panel.grid.major = element_blank(), panel.grid.minor =
element_blank()) +
geom_text(data = Tk, aes(x = feed, y = quant, label = cld))

As we can see, the labels of the Tukey’s test were centralized on the
third quantile. To relocate them up and to the right we are going to
define their position in relation to this point using the
arguments hjust and vjust. I am also going to decrease the size of the
letters.

# boxplot
ggplot(chickwts, aes(feed, weight)) +
geom_boxplot() +
labs(x="Feed Type", y="Weight (g)") +
theme_bw() +
theme(panel.grid.major = element_blank(), panel.grid.minor =
element_blank()) +
geom_text(data = Tk, aes(x = feed, y = quant, label = cld),
size = 3, vjust=-1, hjust =-1)

The result is a grey-scale boxplot suitable to be used in scientific


reports and presentations.

Adding colours to the boxplots


To create a more attractive boxplot, we can add some colours. In the
next example, I have defined fill = "lightblue" and color =
"darkblue" for geom_boxplot and geom_text functions.

# boxplot
ggplot(chickwts, aes(feed, weight)) +
geom_boxplot(fill = "lightblue", color = "darkblue") +
labs(x="Feed Type", y="Weight (g)") +
theme_bw() +
theme(panel.grid.major = element_blank(), panel.grid.minor =
element_blank()) +
geom_text(data = Tk, aes(x = feed, y = quant, label = cld),
size = 3, vjust=-1, hjust =-1, color = "darkblue")

Boxplots coloured according to the


factor (explanatory variable)
Another alternative is to colour the boxplots according to the factor
“feed”. To do it, we need to define fill = feed in the aesthetics of
the geom_boxplot() function.

# boxplot
ggplot(chickwts, aes(feed, weight)) +
geom_boxplot(aes(fill = feed)) +
labs(x="Feed Type", y="Weight (g)") +
theme_bw() +
theme(panel.grid.major = element_blank(), panel.grid.minor =
element_blank()) +
geom_text(data = Tk, aes(x = feed, y = quant, label = cld),
size = 3, vjust=-1, hjust =-1)
As the legend is not necessary in this case, we can hide it using the
code show.legend = FALSE in the geom_boxplot() arguments, and change
the colours to a more interesting palette using
the scale_fill_brewer() function. I have chosen the qualitative palette
“Pastel1” for this example.

# boxplot
ggplot(chickwts, aes(feed, weight)) +
geom_boxplot(aes(fill = feed), show.legend = FALSE) +
labs(x="Feed Type", y="Weight (g)") +
theme_bw() +
theme(panel.grid.major = element_blank(), panel.grid.minor =
element_blank()) +
geom_text(data = Tk, aes(x = feed, y = quant, label = cld),
size = 3, vjust=-1, hjust =-1) +
scale_fill_brewer(palette = "Pastel1")

Boxplots coloured according to the


median
Another very interesting alternative is to colour the boxplots according
to the median value; we need to define fill = factor(..middle..)) in the
aesthetics of the geom_boxplot() function. The
argument ..middle.. returns the median value in a boxplot. I have
chosen the sequential palette “PuBuGn” for this example.

# boxplot
ggplot(chickwts, aes(feed, weight)) +
geom_boxplot(aes(fill = factor(..middle..)), show.legend =
FALSE) +
labs(x="Feed Type", y="Weight (g)") +
theme_bw() +
theme(panel.grid.major = element_blank(), panel.grid.minor =
element_blank()) +
geom_text(data = Tk, aes(x = feed, y = quant, label = cld),
size = 3, vjust=-1, hjust =-1) +
scale_fill_brewer(palette = "Blues")
In the resulting plot, the higher the median, the darker the colour hue in
the boxplot.

Saving the final figure


The final look of a specific ggplot object depends on the size and
aspect ratio used. The plots shown in this tutorial were built for a figure
size 4×3 inches (width x height). I suggest saving the final plot as a png
file with 1000 dpi resolution as shown in the code below.

# saving the final figure


ggsave("boxplot.png", width = 4, height = 3, dpi = 1000)

1. Source: Anonymous (1948) Biometrika, 35, 214. Reference:


McNeil, D. R. (1977) Interactive Data Analysis. New York: Wiley.↩︎

One-Way ANOVA Step-by-Step – R tutorial


One-Way ANOVA and Box Plot in R data analysis, data visualisation, ggplot2, R
(statdoe.com)
r - How to add ANOVA one-way p-value to ggplot with ggsignif - Stack Overflow
ANOVA in R | A Complete Step-by-Step Guide with Examples (scribbr.com)s
The Boxplot and its pitfalls (data-to-viz.com)

You might also like