Solutions - Lab 4 - Assumptions & Multiple Comparisons: Learning Outcomes

The document provides information and exercises for analyzing data from an experiment on maize planting densities. It includes: 1) Data from an experiment with three planting density treatments and the results of an ANOVA showing a significant effect of density. 2) Instructions to calculate least significant differences and identify which treatment pair means are significantly different. 3) An explanation of why testing assumptions using residuals is better than using all observations, as the residuals remove treatment effects. 4) A third exercise using diatom diversity data to test assumptions with residual diagnostics and identify significant differences with LSDs.

Uploaded by

Abery Au

Solutions - Lab 4 - Assumptions & Multiple Comparisons

Learning outcomes

At the end of this Lab students should be able to:

• test the assumptions of ANOVA using residual diagnostics;
• use least significant difference (LSD) tests to determine which pairs of groups are significantly different;
• use R to perform the analyses.

All of the data for this practical are in the Data4.xlsx file.

Exercise 1 - Finding pairs of groups that are significantly different

These are the same data as in Exercise 2 of Lab 3.

Your experiment involved growing maize as a fodder crop, where three levels of planting density were examined, namely 20, 30 and 40 plants per unit area. Each density was trialled on five different plots; all 15 plots were considered to be similar in most respects. The sample means and standard deviations (kg of dry matter/plot) for each planting density were as follows:

Treatment             Mean    Std.Dev   Variance
20 plants/unit area   17.58   2.7       7.29
30 plants/unit area   27.18   1.89      3.577
40 plants/unit area   27.14   2.02      4.093
Overall               23.97   5.11      NA

An analysis of variance was undertaken to determine if the density of planting influenced the total dry weight
of maize for the plot. The results are shown below.

Source_Variation   Degree_Freedom   Sum_Squares   Mean_Square   F_Statistic   P_Value
Planting_Density   2                305.925       152.96        30.67         <0.01
Residual           12               59.84         4.99          NA            NA
Total              14               365.765       NA            NA            NA

Calculate the standard error for the difference between any two treatment means,

$$\mathrm{SED} = \sqrt{\text{Residual MS} \times \frac{2}{r}},$$

and obtain the upper 2.5% critical t-value, $t_{crit} = t^{0.025}_{\text{res df}}$, from the Statistical Tables for ENVX2001.pdf on the eLearning site.
Then calculate the least significant difference at the 5% level:

$$\mathrm{LSD}(0.05) = t_{crit} \times \mathrm{SED}.$$

$$\mathrm{SED} = \sqrt{\text{Residual MS} \times \frac{2}{r}} = \sqrt{4.9867 \times \frac{2}{5}} = 1.412 \text{ kg}$$

$$t_{crit} = t^{0.025}_{12} = 2.179$$

$$\mathrm{LSD} = 2.179 \times 1.412 = 3.077 \text{ kg}$$
Determine which pairs of means are significantly different from each other.
Comparisons:
• P20 vs P30: |17.58 - 27.18| = 9.60, significant (since abs mean diff > LSD);
• P20 vs P40: |17.58 - 27.14| = 9.56, significant (since abs mean diff > LSD);
• P30 vs P40: |27.18 - 27.14| = 0.04, not significant (since abs mean diff < LSD).
So P20 has a significantly lower mean yield than both P30 and P40, but there is no significant difference between P30 and P40 (P > 0.05).
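As a cross-check on the hand calculation, the same quantities can be computed in R. This is a minimal sketch that assumes only the summary values printed in the tables above; the object names (resid_ms, means, significant and so on) are illustrative and not part of the original lab, and qt() stands in for the printed statistical tables.

```r
# Summary values copied from the tables above (not recomputed from raw data)
resid_ms <- 59.84 / 12        # residual mean square = SS / df = 4.9867
resid_df <- 12                # residual degrees of freedom
r        <- 5                 # plots (replicates) per planting density
means    <- c(P20 = 17.58, P30 = 27.18, P40 = 27.14)

# Standard error of the difference between two treatment means
sed <- sqrt(resid_ms * 2 / r)

# Upper 2.5% critical t-value, from qt() rather than the printed tables
t_crit <- qt(0.975, df = resid_df)

# Least significant difference at the 5% level
lsd <- t_crit * sed

# All pairwise absolute differences, flagged TRUE when they exceed the LSD
abs_diff    <- abs(outer(means, means, "-"))
significant <- abs_diff > lsd

round(c(SED = sed, t_crit = t_crit, LSD = lsd), 3)
significant
```

The logical matrix has TRUE for the P20 vs P30 and P20 vs P40 cells and FALSE for P30 vs P40, which agrees with the comparisons listed above.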

Exercise 2 - Why using residuals to assess assumptions is best

In this Topic you are encouraged to test the assumptions using the residuals. This exercise illustrates why it is not ideal to test the normality assumption using all of the observations irrespective of the treatments.
First we create two synthetic datasets, each a sample of 50 observations (n=50) drawn from a normally distributed population. Both underlying populations have the same variation (sd=3) but different means (mean=10 and mean=40). We then plot histograms of each group individually, of both groups combined, and of the combined residuals (observation minus group mean).
par(mfrow=c(2,2))
group1<-rnorm(n=50,mean=10,sd=3)
group2<-rnorm(n=50,mean=40,sd=3)
hist(group1,main="A: Group1",xlab="")
hist(group2,main="B: Group2",xlab="")
hist(c(group1,group2),main="C: Group1&2",xlab="")
hist(c(group1-mean(group1),group2-mean(group2)),main="D: Residuals Group1&2",xlab="")

[Figure: four histograms in a 2 x 2 layout. A: Group1; B: Group2; C: Group1&2; D: Residuals Group1&2. Vertical axis: Frequency.]

We can see that the histogram of each group looks normally distributed (A, B); however, when we combine the data we have two distinct groupings centred on the mean of each group (C). Therefore, if we look at the raw data irrespective of the groups we would not see a normally distributed dataset. This is because the effects of the individual treatments (or groups) differ, so each observation is shifted according to the treatment it receives or the group it is in. If we examine the residuals (D), the treatment (or group) effects have been removed and we can then assess whether the data are normal and have constant variance. Obtaining residuals requires fitting a model to the data, in this case a 1-way ANOVA model. This is why we test the assumptions on the residuals. You could look at the distribution of each group separately, but in some experiments the replication is small, which makes it hard to assess normality; using residuals allows all of the observations to be pooled together.
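The link between panel D and a fitted model can be checked directly. The sketch below is illustrative only: the seed, the data frame dat and the other object names are assumptions added here, not part of the original lab. It shows that the residuals of a 1-way ANOVA fitted with aov are the observations minus their group means.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Two synthetic groups, as in the exercise above
dat <- data.frame(
  y     = c(rnorm(n = 50, mean = 10, sd = 3),
            rnorm(n = 50, mean = 40, sd = 3)),
  group = factor(rep(c("group1", "group2"), each = 50))
)

# Residuals computed by hand: observation minus its own group mean
manual_resid <- dat$y - ave(dat$y, dat$group)

# Residuals from the fitted 1-way ANOVA model
fit         <- aov(y ~ group, data = dat)
model_resid <- residuals(fit)

# The two sets of residuals agree (up to floating-point error)
all.equal(unname(model_resid), manual_resid)

# The fitted values are just the two group means
unique(round(fitted(fit), 6))
```

With more than two groups the same identity holds, which is why a single set of pooled residuals can be used for the diagnostics in the following exercises.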

Exercise 3 - Diatoms in streams

This exercise uses the data from Exercise 1 in Practical 3. Here we will test the assumptions using residual diagnostics and find significant differences using LSDs. The data are found in the Diatoms worksheet.

Importing and processing data, then fitting an ANOVA model

library(readxl)

## Warning: package 'readxl' was built under R version 3.5.1

diatoms<-read_excel("Data4.xlsx",sheet="Diatoms")
diatoms$Zinc<-as.factor(diatoms$Zinc)
str(diatoms)

## Classes 'tbl_df', 'tbl' and 'data.frame': 35 obs. of 5 variables:

## $ Stream : chr "Eagle" "Blue" "Blue" "Blue" ...
## $ Zinc : Factor w/ 4 levels "BACK","HIGH",..: 1 1 1 1 1 1 1 1 3 3 ...
## $ Diversity: num 2.27 1.7 2.05 1.98 2.2 1.53 0.76 1.89 1.4 2.18 ...
## $ Group : num 1 1 1 1 1 1 1 1 2 2 ...
## $ X__1 : chr NA NA NA NA ...
anova.diatoms<-aov(Diversity~Zinc,data=diatoms)
summary(anova.diatoms)

## Df Sum Sq Mean Sq F value Pr(>F)

## Zinc 3 2.567 0.8555 3.939 0.0176 *
## Residuals 30 6.516 0.2172
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 1 observation deleted due to missingness

Testing assumptions

To test the model assumptions we encourage you to produce 3 figures:

• a histogram of the residuals;
• a QQ plot of the residuals;
• a plot of the residuals against the fitted values.
It is good practice to base these on the standardised residuals, which can be extracted from a model object using the rstandard function. Standardised residuals are approximately N(0, 1), which makes it easier to interpret the plots for outliers. Based on the normal distribution, about 95% of observations fall within +/- 2 SDs of the mean, or in the case of standardised residuals within +/- 2.
The figure below presents the histogram of the standardised residuals. The majority of the observations plot as a bell-shaped (normal) distribution. The exceptions are 2 observations less than -2. Given there are 34 observations, this is about 6% of the dataset, which is acceptable given we expect about 95% of observations to be in the interval ~[-2, 2].
hist(rstandard(anova.diatoms))

[Figure: Histogram of rstandard(anova.diatoms). Horizontal axis: rstandard(anova.diatoms); vertical axis: Frequency.]
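
The number of standardised residuals beyond +/- 2 can also be counted rather than read off the histogram. The sketch below is illustrative only: because Data4.xlsx is not reproduced here, it uses R's built-in PlantGrowth dataset as a stand-in, and the helper name prop_outside is hypothetical.

```r
# Stand-in data: PlantGrowth ships with R (30 plants, 3 treatment groups).
# In the lab you would pass anova.diatoms instead of this model.
fit <- aov(weight ~ group, data = PlantGrowth)

# Hypothetical helper: proportion of standardised residuals beyond +/- cutoff
prop_outside <- function(model, cutoff = 2) {
  z <- rstandard(model)          # standardised residuals, roughly N(0, 1)
  mean(abs(z) > cutoff)          # share of observations outside the band
}

z <- rstandard(fit)
sum(abs(z) > 2)                  # how many residuals lie outside [-2, 2]
prop_outside(fit)                # compare with the roughly 5% expected
```

A proportion well above 5% would be a prompt to look more closely at the QQ plot and at the individual observations involved.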

The QQ plot below shows that the observed quantiles match the theoretical quantiles (assuming normality), since the observations reasonably follow the 1:1 line. We can assume the residuals are normally distributed.
qqnorm(rstandard(anova.diatoms))
abline(0,1)

[Figure: Normal Q-Q Plot of the standardised residuals. Horizontal axis: Theoretical Quantiles; vertical axis: Sample Quantiles.]

The plot below shows the standardised residuals plotted against the fitted values (the group means in this case). To satisfy the assumption of constant variance we want to see the same spread of residuals across the range of fitted values. This is the case here. We don't want to see fanning, where the spread of residuals increases or decreases as the fitted values increase.
plot(fitted(anova.diatoms),rstandard(anova.diatoms))

[Figure: standardised residuals against fitted values. Horizontal axis: fitted(anova.diatoms); vertical axis: rstandard(anova.diatoms).]

A different approach to test the assumption of constant variance

Statistics is made up of different tribes, and some tribes use hypothesis testing to see if a dataset meets the assumptions of normality and constant variance. One option is Bartlett's test for constant variances. The mechanics are not important but the function and syntax are shown below. The hypotheses are:

• $H_0: \sigma^2_{BACK} = \sigma^2_{LOW} = \sigma^2_{MED} = \sigma^2_{HIGH}$

• $H_1$: not all $\sigma^2_i$ are equal ($i$ = BACK, LOW, MED, HIGH)

We prefer to use numerical and graphical diagnostics, e.g. residual plots, but this is more to show you other possibilities. You can use this as a different line of evidence for testing assumptions if you wish. It is not reliable if the data are non-normal, and you should only use it if the data have one treatment factor with a completely randomised design!
bartlett.test(Diversity ~ Zinc, data = diatoms)

##
## Bartlett test of homogeneity of variances
##
## data: Diversity by Zinc
## Bartlett's K-squared = 0.25294, df = 3, p-value = 0.9686
Based on the P-value being > 0.05 we retain the null hypothesis; there is no evidence that the variances are unequal.

Identify significant differences

In Topic 3 we used the lsmeans package to extract means for each group and their associated 95% CI. The lsmeans function is useful for producing a plot showing the means and 95% CIs, which is a nice way to present the results.
#install.packages("lsmeans",repos="http://cran.csiro.au/")
#library(lsmeans)
#lsmeans(anova.diatoms, "Zinc")
#plot(lsmeans(anova.diatoms, "Zinc"))

Based on the non-overlapping confidence intervals, the only pair of groups that is significantly different is HIGH and LOW. However, more correctly we are testing whether the difference in means = 0, which is a slightly different question to seeing if the 95% CIs around the means overlap. Looking at the 95% CIs around the means is a conservative test in that it will under-estimate the number of times a significant difference occurs. Therefore, the better approach is to use a least significant difference test, which we can obtain using the agricolae package.
library(agricolae)
LSD.test(anova.diatoms,"Zinc",console=TRUE)

##
## Study: anova.diatoms ~ "Zinc"
##
## LSD t Test for Diversity
##
## Mean Square Error: 0.2172137
##
## Zinc, means and individual ( 95 %) CI
##
## Diversity std r LCL UCL Min Max
## BACK 1.797500 0.4852613 8 1.4609789 2.134021 0.76 2.27
## HIGH 1.277778 0.4268717 9 0.9605026 1.595053 0.63 1.90
## LOW 2.032500 0.4449960 8 1.6959789 2.369021 1.40 2.83
## MED 1.717778 0.5030104 9 1.4005026 2.035053 0.80 2.19
##
## Alpha: 0.05 ; DF Error: 30
## Critical Value of t: 2.042272
##
## Groups according to probability of means differences and alpha level( 0.05 )
##
## Treatments with the same letter are not significantly different.
##
## Diversity groups
## LOW 2.032500 a
## BACK 1.797500 a
## MED 1.717778 ab
## HIGH 1.277778 b
The LSD.test function also gives the 95% CI around each mean, but the last part of the output identifies which pairs of means are significantly different. You will note the following pairs are different:
• ‘LOW’ and ‘HIGH’;
• ‘BACK’ and ‘HIGH’.
We have one more pair being significant compared to looking at the 95% CIs. Make sure you can interpret the letter notation for identifying pairs of groups with significantly different means.
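
To see why comparing confidence intervals is the more conservative approach, the BACK vs HIGH comparison can be reworked from the values printed in the LSD.test output above. This is a sketch that assumes only those printed summary values; the object names are illustrative.

```r
# Values copied from the LSD.test output above
mse    <- 0.2172137                  # Mean Square Error
t_crit <- qt(0.975, df = 30)         # critical t on 30 error df (2.042272)
m      <- c(BACK = 1.797500, HIGH = 1.277778)   # group means
r      <- c(BACK = 8,        HIGH = 9)          # replicates per group

abs_diff <- abs(m[["BACK"]] - m[["HIGH"]])

# LSD for unequal replication: t * sqrt(MSE * (1/n1 + 1/n2))
lsd <- t_crit * sqrt(mse * sum(1 / r))

# The two 95% CIs fail to overlap only when the difference exceeds the
# SUM of the two half-widths, which is always larger than the LSD
half_width  <- t_crit * sqrt(mse / r)
overlap_gap <- sum(half_width)

c(difference = abs_diff, LSD = lsd, CI_threshold = overlap_gap)
abs_diff > lsd           # TRUE: significant by the LSD test
abs_diff > overlap_gap   # FALSE: yet the 95% CIs still overlap
```

The difference between BACK and HIGH falls between the two thresholds, which is why the LSD test flags this pair while the overlapping intervals do not.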

Exercise 4 - Mean comparisons, residual diagnostics and back-transformations

In this exercise we will add a layer of complexity by considering a transformation. If our data do not meet the assumptions we need to transform the data; possible transformations are the square root (weak) and the log (strong). When we transform the data we need to be careful about how we interpret the results.
The concentration of prolactin (units g/L) in the pituitary glands of nine-spined stickleback fish was assessed. The fish were kept in either saltwater or freshwater prior to assay, and different batches were examined on three successive occasions. Cysts tend to develop in fish when kept in saltwater and sometimes develop in freshwater populations. Four different groups of fish were used in a preliminary experiment to examine the effects of cysts, whether induced by saltwater or normally present, on the prolactin production of the pituitary gland.
The four groups of fish were coded as follows, with 10 fish per group:
• A = saltwater cysts, day 1;
• B = freshwater, no cysts, day 1;
• C = freshwater, no cysts, day 2;
• D = freshwater, cysts, day 3.
The data is found in the Prolactin worksheet.
(i) Import the data into R, perform some exploratory data analysis to make tentative suggestions about
differences between means and the likelihood of the data meeting the assumptions.
library(readxl)
fish<-read_excel("Data4.xlsx",sheet="Prolactin")
fish$Treatment <- as.factor(fish$Treatment)
str(fish)

## Classes 'tbl_df', 'tbl' and 'data.frame': 40 obs. of 2 variables:

## $ Treatment: Factor w/ 4 levels "A","B","C","D": 1 1 1 1 1 1 1 1 1 1 ...
## $ Prolactin: num 15.1 25.6 5.6 11.4 35.7 17.2 13.3 21.1 11.6 12.3 ...
First we create boxplots for each group, which show there are differences in the medians between the treatments. There looks to be some treatment effect. In terms of the assumptions, the spread of data in each treatment looks different based on the size of the boxes and length of the whiskers. However, for each group the upper and lower whisker lengths are similar, so the distributions are likely to be symmetrical (normal).
boxplot(Prolactin ~ Treatment, ylab = "Prolactin concentration", data = fish)

[Figure: boxplots of prolactin concentration for Treatments A, B, C and D. Vertical axis: Prolactin concentration.]

Next we generate summary statistics. The variances are very different (ratio of largest:smallest > 4:1), therefore the assumption of constant variance is unlikely to be met. The mean and median are similar for each treatment so the normality assumption could be met.
aggregate(Prolactin ~ Treatment, mean, data = fish)

## Treatment Prolactin
## 1 A 16.89
## 2 B 28.22
## 3 C 52.27
## 4 D 60.73
aggregate(Prolactin ~ Treatment, median, data = fish)

## Treatment Prolactin
## 1 A 14.20
## 2 B 27.35
## 3 C 49.60
## 4 D 56.20
aggregate(Prolactin ~ Treatment, sd, data = fish)

## Treatment Prolactin
## 1 A 8.629723
## 2 B 10.786700
## 3 C 26.165628
## 4 D 23.211302
(ii) Fit an ANOVA model and test the assumption of normality using a QQ plot and a histogram - both based on standardised residuals.
The QQ plot and the histogram indicate the data is normally distributed.
pro.aov <- aov(Prolactin ~ Treatment, data = fish)
qqnorm(rstandard(pro.aov))
abline(0,1)

[Figure: Normal Q-Q Plot of rstandard(pro.aov). Horizontal axis: Theoretical Quantiles; vertical axis: Sample Quantiles.]

hist(rstandard(pro.aov))

[Figure: Histogram of rstandard(pro.aov). Horizontal axis: rstandard(pro.aov); vertical axis: Frequency.]

(iii) Assess the assumption of constant variance by:

• examining the plot of the standardised residuals against fitted values;
This plot shows an increasing spread (fanning) of the residuals as the fitted values increase. The assumption of constant variance is not tenable.
plot(fitted(pro.aov),rstandard(pro.aov))

[Figure: standardised residuals against fitted values. Horizontal axis: fitted(pro.aov); vertical axis: rstandard(pro.aov).]

• using Bartlett's test;

The P-value is less than 0.05 so we reject the null hypothesis. The variances are not equal.
bartlett.test(Prolactin ~ Treatment, data = fish)

##
## Bartlett test of homogeneity of variances
##
## data: Prolactin by Treatment
## Bartlett's K-squared = 13.651, df = 3, p-value = 0.003421
• calculating the ratio of the largest SD:smallest SD to see if it is below 2:1;
The ratio is 3.03, which is further evidence of the variances being unequal.
out<-tapply(fish$Prolactin,fish$Treatment,sd)
out

## A B C D
## 8.629723 10.786700 26.165628 23.211302
out[3]/out[1]

## C
## 3.032036
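The two numerical checks in (iii) can be bundled so that the groups with the largest and smallest SD do not have to be picked out by index. The sketch below is illustrative only: the helper name variance_checks is hypothetical, and because Data4.xlsx is not reproduced here it is demonstrated on R's built-in PlantGrowth dataset.

```r
# Hypothetical helper bundling the two numerical checks used above.
# 'response' is a numeric vector and 'group' a factor of the same length.
variance_checks <- function(response, group) {
  sds <- tapply(response, group, sd)           # SD within each group
  c(sd_ratio   = max(sds) / min(sds),          # rule of thumb: below 2:1
    bartlett_p = bartlett.test(response, group)$p.value)
}

# Stand-in data shipped with R; in the lab the call would be
# variance_checks(fish$Prolactin, fish$Treatment)
res <- variance_checks(PlantGrowth$weight, PlantGrowth$group)
res

# Using max() and min() avoids hard-coding which group has the
# largest or smallest SD
```

Neither number replaces the residual plot; they are additional lines of evidence, as noted in Exercise 3.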
(iv) The data does not meet the assumptions so log transform (‘log’ function) the response and repeat (ii) and (iii) to test the assumptions;
In R you can transform data within the model formula (see below), or you could create a new column in your data frame, for example fish$logProlactin<-log(fish$Prolactin). The log transformation has not changed the distribution dramatically; the residuals are still normally distributed.
pro.aov <- aov(log(Prolactin) ~ Treatment, data = fish)
qqnorm(rstandard(pro.aov))
abline(0,1)

[Figure: Normal Q-Q Plot of rstandard(pro.aov) after the log transformation. Horizontal axis: Theoretical Quantiles; vertical axis: Sample Quantiles.]

hist(rstandard(pro.aov))

[Figure: Histogram of rstandard(pro.aov) after the log transformation. Horizontal axis: rstandard(pro.aov); vertical axis: Frequency.]

All of the ways to assess the constant variance assumption indicate the variances are equal after the log
transformation.
plot(fitted(pro.aov),rstandard(pro.aov))

[Figure: standardised residuals against fitted values after the log transformation. Horizontal axis: fitted(pro.aov); vertical axis: rstandard(pro.aov).]

bartlett.test(log(Prolactin) ~ Treatment, data = fish)

##
## Bartlett test of homogeneity of variances
##
## data: log(Prolactin) by Treatment
## Bartlett's K-squared = 1.5541, df = 3, p-value = 0.6698
out<-tapply(log(fish$Prolactin),fish$Treatment,sd)
out

## A B C D
## 0.5097462 0.3994729 0.5382265 0.3785816
out[3]/out[4]

## C
## 1.421692
(v) If the assumptions are met and there is a significant F-test, perform LSD tests and identify which pairs are significantly different.
The ANOVA table indicates we reject the null hypothesis.
summary(pro.aov)

## Df Sum Sq Mean Sq F value Pr(>F)

## Treatment 3 10.717 3.572 16.76 5.57e-07 ***
## Residuals 36 7.672 0.213
## ---

## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The results of the LSD test are shown below.
library(agricolae)
LSD.test(pro.aov,"Treatment",console=TRUE)

##
## Study: pro.aov ~ "Treatment"
##
## LSD t Test for log(Prolactin)
##
## Mean Square Error: 0.2131079
##
## Treatment, means and individual ( 95 %) CI
##
## log.Prolactin. std r LCL UCL Min Max
## A 2.713137 0.5097462 10 2.417071 3.009202 1.722767 3.575151
## B 3.270747 0.3994729 10 2.974681 3.566812 2.653242 3.795489
## C 3.833124 0.5382265 10 3.537058 4.129190 3.054001 4.593098
## D 4.042180 0.3785816 10 3.746114 4.338245 3.446808 4.680278
##
## Alpha: 0.05 ; DF Error: 36
## Critical Value of t: 2.028094
##
## least Significant Difference: 0.4186999
##
## Treatments with the same letter are not significantly different.
##
## log(Prolactin) groups
## D 4.042180 a
## C 3.833124 a
## B 3.270747 b
## A 2.713137 c
(vi) One issue is that we have performed our hypothesis testing on the log scale. This means there are some
steps to be made if we wish to interpret the data on the original scale; e.g. provide a 95% CI on the original
scale. We will step through these.
Suppose the biologist was primarily interested in comparing the prolactin concentrations for A (saltwater
cysts, day 1) vs B (freshwater, no cysts, day 1).
• Calculate the difference in the means;
out<-tapply(log(fish$Prolactin),fish$Treatment,mean)
out

## A B C D
## 2.713137 3.270747 3.833124 4.042180
diffm<-out[2]-out[1]
diffm

## B
## 0.5576099
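• Use the R output to calculate the Least Significant Difference (LSD):

$$\mathrm{LSD} = t^{0.025}_{\text{resid df}} \times \mathrm{SED} = t_{crit} \times \sqrt{\text{ResidMS}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}.$$

This is given in the output of the LSD.test function; LSD = 0.4187.
• Calculate the lower and upper end-points of the 95% CI around the difference in means, based on the 95% CI being:

$$95\%\ \mathrm{CI} = y \pm t_{crit} \times \sqrt{\text{ResidMS}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)},$$

where $y$ is the difference in the means.
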
The lower and upper CIs are

l95<-diffm-0.4186999
u95<-diffm+0.4186999
l95

## B
## 0.13891
u95

## B
## 0.9763098
• The CI and mean are on the log scale, so backtransform the difference in the means (‘exp’ function),
the lower and upper end-point 95% CI. Note that the upper and lower tail are not of equal length on
the original scale.
exp(diffm)

## B
## 1.746493
exp(l95)

## B
## 1.149021
exp(u95)

## B
## 2.654642
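The whole of step (vi) can also be reproduced from the printed LSD.test output without the raw data. This is a sketch that assumes only the summary values shown above; the object names are illustrative and qt() is used for the critical value.

```r
# Values copied from the LSD.test output above (log scale)
log_means <- c(A = 2.713137, B = 3.270747)   # means of log(Prolactin)
mse       <- 0.2131079                       # Mean Square Error
resid_df  <- 36                              # DF Error
n         <- 10                              # fish per group

# Difference in means on the log scale (B minus A)
diff_log <- log_means[["B"]] - log_means[["A"]]

# LSD = t_crit * sqrt(ResidMS * (1/n1 + 1/n2))
lsd <- qt(0.975, df = resid_df) * sqrt(mse * (1 / n + 1 / n))

# 95% CI on the log scale, then back-transformed with exp()
ci_log <- c(lower = diff_log - lsd, upper = diff_log + lsd)
ratio  <- exp(diff_log)      # back-transformed difference in means
ci     <- exp(ci_log)        # back-transformed 95% CI end-points

round(c(ratio = ratio, ci), 3)

# The interval is asymmetric around the estimate on the original scale
c(below = ratio - ci[["lower"]], above = ci[["upper"]] - ratio)
```

The last line makes the unequal tail lengths explicit: the distance from the estimate to the upper end-point is larger than the distance to the lower end-point.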
(vii) We now have an estimate of the difference in the means on the original scale. It actually corresponds to a ratio on the original scale. The reason is based on log laws: we can write the difference between 2 logged numbers (A and B) as the log of their ratio (A/B);

$$\log(A) - \log(B) = \log\left(\frac{A}{B}\right).$$

If we backtransform the log of their ratio we get the ratio on the original scale;

$$e^{\log\left(\frac{A}{B}\right)} = \frac{A}{B}.$$

So the backtransformed difference between a pair of means is a ratio.

Note: if we were to backtransform the group means on the log scale we would get the geometric mean on
the original scale.
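The note about the geometric mean can be checked with the ten prolactin values that str(fish) prints for Treatment A. The sketch below assumes those ten printed values are the complete Treatment A group (their mean matches the 16.89 reported above); the object names are illustrative.

```r
# Prolactin values for Treatment A, copied from the str(fish) output above
a <- c(15.1, 25.6, 5.6, 11.4, 35.7, 17.2, 13.3, 21.1, 11.6, 12.3)

arith_mean <- mean(a)              # arithmetic mean on the original scale
log_mean   <- mean(log(a))         # mean on the log scale
geo_mean   <- exp(log_mean)        # back-transformed mean

# exp(mean(log(x))) is the n-th root of the product, i.e. the geometric mean
all.equal(geo_mean, prod(a)^(1 / length(a)))

c(arithmetic = arith_mean, log_scale = log_mean, geometric = geo_mean)
```

The back-transformed mean is smaller than the arithmetic mean, so a back-transformed group mean should be reported as a geometric mean rather than as the ordinary mean of the original observations.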
Provide a biological interpretation for this estimate and confidence interval. Use the CI to decide if there is a significant difference between Treatment A and Treatment B.
Since we interpret the back-transformed difference as a ratio, we conclude that the mean prolactin concentration in Treatment B is estimated as 1.75× that in Treatment A. However, the 95% CI for this ratio extends from 1.15× to 2.65×. Since the 95% CI for the ratio does not include 1, this also demonstrates that the mean prolactin concentrations for Treatments A and B do differ significantly (P < 0.05). Note that if the ratio (on the original prolactin scale) is 1, this implies meanB / meanA = 1, i.e. meanB = meanA.

Exercise 5 - Broiler Chickens

This exercise is an analysis of a set of growth data. It is an open question for you to gain more practice.
The effect of selection on weight gain in dressed broiler chickens was determined after five generations. Group
A was bred by using only the heaviest 10% in each generation; groups B and C were bred using respectively
the heaviest 30% and 50%; group D was obtained by crossing groups A and C of the previous generation.
The dressed weights (kg) of 25 birds from each group have been recorded.
The data is found in the Broilers worksheet.
(i) Write down the null and alternate hypotheses. What is the treatment factor, and how many levels does it have? What are the sample sizes for each group ($n_i$)?
$H_0: \mu_A = \mu_B = \mu_C = \mu_D$
$H_1$: not all $\mu_i$ are equal
where $\mu_i$ ($i$ = A, B, C, D) is the population mean weight gain for broilers in selection group $i$.
The treatment factor is the selection group, with t = 4 levels in this factor. There are r = 25 chicks in each selection group (equal replication).
(ii) Import the data into R, and then obtain some numerical and graphical summaries of the data, by each
group. How would you interpret these data? From these summaries, is the assumption of homogeneity of
variances met? What about normality? Try a formal Bartlett’s test using the bartlett.test function. Use
residual diagnostics to assess the assumptions.
The summary statistics by group indicate the data are likely to be normally distributed (mean ~ median) and the variances are equal. This is confirmed by boxplots for each group.
library(readxl)
broilers<-read_excel("Data4.xlsx",sheet="Broilers")
str(broilers)

## Classes 'tbl_df', 'tbl' and 'data.frame': 100 obs. of 2 variables:

## $ WtGain: num 1.67 1.64 1.6 1.66 1.4 1.48 1.7 1.5 1.67 1.52 ...
## $ Group : chr "A" "A" "A" "A" ...
broilers$Group<-as.factor(broilers$Group)
aggregate(WtGain ~ Group, summary, data = broilers)

## Group WtGain.Min. WtGain.1st Qu. WtGain.Median WtGain.Mean

## 1 A 1.4000 1.5000 1.5500 1.5644
## 2 B 1.3200 1.4500 1.4800 1.5160
## 3 C 1.3000 1.4100 1.5000 1.4860
## 4 D 1.3500 1.4700 1.5700 1.5424
## WtGain.3rd Qu. WtGain.Max.
## 1 1.6400 1.7000
## 2 1.5800 1.8100
## 3 1.5500 1.6900
## 4 1.6000 1.7600
boxplot(WtGain ~ Group, ylab = "Weight gain (kg)", data = broilers)

[Figure: boxplots of weight gain for Groups A, B, C and D. Vertical axis: Weight gain (kg).]

aggregate(WtGain ~ Group, sd, data = broilers)

## Group WtGain
## 1 A 0.09046546
## 2 B 0.10984838
## 3 C 0.09673848
## 4 D 0.09925892
Bartlett's test indicates the variances are equal.
bartlett.test(WtGain ~ Group, data = broilers)

##
## Bartlett test of homogeneity of variances
##
## data: WtGain by Group
## Bartlett's K-squared = 0.93192, df = 3, p-value = 0.8177
The residual diagnostics indicate the data is normally distributed and variances are equal.
broilers.aov <- aov(WtGain ~ Group, data = broilers)
qqnorm(rstandard(broilers.aov))
abline(0,1)

[Figure: Normal Q-Q Plot of rstandard(broilers.aov). Horizontal axis: Theoretical Quantiles; vertical axis: Sample Quantiles.]

hist(rstandard(broilers.aov))

[Figure: Histogram of rstandard(broilers.aov). Horizontal axis: rstandard(broilers.aov); vertical axis: Frequency.]

plot(fitted(broilers.aov),rstandard(broilers.aov))

[Figure: standardised residuals against fitted values. Horizontal axis: fitted(broilers.aov); vertical axis: rstandard(broilers.aov).]

(iii) Note that the results of the analysis can only be used when the assumptions of the analysis have been met. If you believe that the assumptions are met, then what would your conclusions of the analysis of variance be? You should use the summary function applied to your aov object to obtain the ANOVA table.
The ANOVA table indicates we reject the null hypothesis (P = 0.0387).
summary(broilers.aov)

## Df Sum Sq Mean Sq F value Pr(>F)

## Group 3 0.0859 0.028648 2.904 0.0387 *
## Residuals 96 0.9471 0.009865
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(iv) Without any formal analysis, consider the result of the group means in relation to the group treatment - i.e. type of selection. Would this pattern be expected? If appropriate perform an LSD test.
Looking at the sample means, this pattern might be expected from the breeding experiment: weight gain appears to be related to the degree of weight selection in each generation. For Group 1 (A), the heaviest 10% of birds were used, and this group did have the highest weight gain, with lesser gains for Groups 2 (B) and 3 (C). Similarly, since Group 4 (D) was obtained by crossing Groups 1 and 3, an intermediate result might be expected, and this is what occurred here.
library(agricolae)
LSD.test(broilers.aov,"Group",console=TRUE)

##
## Study: broilers.aov ~ "Group"
##

## LSD t Test for WtGain
##
## Mean Square Error: 0.009865333
##
## Group, means and individual ( 95 %) CI
##
## WtGain std r LCL UCL Min Max
## A 1.5644 0.09046546 25 1.524969 1.603831 1.40 1.70
## B 1.5160 0.10984838 25 1.476569 1.555431 1.32 1.81
## C 1.4860 0.09673848 25 1.446569 1.525431 1.30 1.69
## D 1.5424 0.09925892 25 1.502969 1.581831 1.35 1.76
##
## Alpha: 0.05 ; DF Error: 96
## Critical Value of t: 1.984984
##
## least Significant Difference: 0.05576452
##
## Treatments with the same letter are not significantly different.
##
## WtGain groups
## A 1.5644 a
## D 1.5424 a
## B 1.5160 ab
## C 1.4860 b
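
The letter groupings can be checked by comparing every pairwise difference with the LSD. This is a sketch that assumes only the means and the LSD printed in the output above; the object names are illustrative.

```r
# Values copied from the LSD.test output above
means <- c(A = 1.5644, B = 1.5160, C = 1.4860, D = 1.5424)
lsd   <- 0.05576452      # least Significant Difference reported by LSD.test

# Absolute difference between every pair of group means
abs_diff <- abs(outer(means, means, "-"))

# A pair differs significantly when its absolute difference exceeds the LSD
significant <- abs_diff > lsd

round(abs_diff, 4)
significant

# List only the significant pairs, each counted once
idx <- which(significant & upper.tri(significant), arr.ind = TRUE)
paste(rownames(significant)[idx[, "row"]],
      colnames(significant)[idx[, "col"]], sep = " vs ")
```

Only the A vs C and C vs D differences exceed the LSD, which matches the letters: C (b) shares no letter with A or D (a), while B (ab) shares a letter with every other group.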

34 pages
Northill Apartments - Brochure
No ratings yet
Northill Apartments - Brochure
10 pages
PASS SQLSaturday PowerApps Flow
No ratings yet
PASS SQLSaturday PowerApps Flow
57 pages
Lecture 1 - BIOL933 Design, Analysis, and Interpretation of Experiments PDF
No ratings yet
Lecture 1 - BIOL933 Design, Analysis, and Interpretation of Experiments PDF
43 pages
Practical 1 Lab Safety and DNA Isolation PDF
No ratings yet
Practical 1 Lab Safety and DNA Isolation PDF
14 pages
Vibedeskoak Spec Sheet
No ratings yet
Vibedeskoak Spec Sheet
1 page
H3C S5560X-EI Series Converged Gigabit Switches: Product Overview
No ratings yet
H3C S5560X-EI Series Converged Gigabit Switches: Product Overview
16 pages
British Medical Journal July 1989 - Interview With DR Marietta Higgs
No ratings yet
British Medical Journal July 1989 - Interview With DR Marietta Higgs
1 page
Cognitive Enrichment and Welfare: Current Approaches and Future Directions
No ratings yet
Cognitive Enrichment and Welfare: Current Approaches and Future Directions
28 pages
Ref QLD - Town-Planning-Info-Sheet
No ratings yet
Ref QLD - Town-Planning-Info-Sheet
4 pages
Storage Requirement NFPA
No ratings yet
Storage Requirement NFPA
12 pages
During Ketu Mahadasha
No ratings yet
During Ketu Mahadasha
10 pages
Tai County Silicones Co., Ltd. DSA-88 Antifoam Compound: Description Applications
No ratings yet
Tai County Silicones Co., Ltd. DSA-88 Antifoam Compound: Description Applications
1 page
Data and Computer Communications
0% (1)
Data and Computer Communications
39 pages
ELEMENTS SENTENCE-CONSTRUCTION-for-Printing
No ratings yet
ELEMENTS SENTENCE-CONSTRUCTION-for-Printing
27 pages
Faarfield: FAARFIELD V 1.42 - Airport Pavement Design
No ratings yet
Faarfield: FAARFIELD V 1.42 - Airport Pavement Design
2 pages
Schools of Thought: Structuralism
No ratings yet
Schools of Thought: Structuralism
29 pages
Getting Started With Calc Manager For HFM Calc Manager For HFM
No ratings yet
Getting Started With Calc Manager For HFM Calc Manager For HFM
48 pages
Syllabus For Jee Mains
No ratings yet
Syllabus For Jee Mains
11 pages
Online Education Dashboard
No ratings yet
Online Education Dashboard
4 pages
SS HIS G4-History-Term-4-Exam-Memo
No ratings yet
SS HIS G4-History-Term-4-Exam-Memo
3 pages
Instruction: Erection Manual
100% (1)
Instruction: Erection Manual
53 pages
HLS-220 D 750.23.5.028 (031033036043) - Englisch
No ratings yet
HLS-220 D 750.23.5.028 (031033036043) - Englisch
13 pages
Mohd
No ratings yet
Mohd
1 page
Nanjing Insulators Brochure
No ratings yet
Nanjing Insulators Brochure
33 pages
Fault Level Calculation: Short Circuit Electric System
No ratings yet
Fault Level Calculation: Short Circuit Electric System
2 pages
Ce PDF
No ratings yet
Ce PDF
13 pages
Ramezankhani 2018
No ratings yet
Ramezankhani 2018
51 pages
Ajmal Steel - Product Catalog V2 1
No ratings yet
Ajmal Steel - Product Catalog V2 1
42 pages
CIS Debit Trading Deutsche
No ratings yet
CIS Debit Trading Deutsche
4 pages
Moons m2 Series Ac Servo System User Manual 2017 en v1.0
No ratings yet
Moons m2 Series Ac Servo System User Manual 2017 en v1.0
175 pages
Kontak Metamorf
No ratings yet
Kontak Metamorf
71 pages
JD - Private Sector Engagement Manager
No ratings yet
JD - Private Sector Engagement Manager
2 pages
Plano Hidraulico TH255C
No ratings yet
Plano Hidraulico TH255C
7 pages
