
Solutions - Lab 4 - Assumptions & Multiple Comparisons

Learning outcomes

At the end of this Lab students should be able to:


• test the assumptions of ANOVA using residual diagnostics;
• use least significant difference (LSD) tests to determine which pairs of groups are significantly different;
• use R to perform the analyses.
All of the data for this practical is in the Data4.xlsx file.

Exercise 1 - Finding pairs of groups that are significantly different

This is the same data from Exercise 2 - Lab 3.


Your experiment involved growing maize as a fodder crop, where three levels of planting density were ex-
amined, namely 20, 30 and 40 plants per unit area. Each density was trialled on five different plots; all 15
plots were considered to be similar in most respects. The sample means and standard deviations (kg of dry
matter/plot) for each planting density were as follows:

Treatment             Mean    Std.Dev   Variance
20 plants/unit area   17.58   2.70      7.29
30 plants/unit area   27.18   1.89      3.577
40 plants/unit area   27.14   2.02      4.093
Overall               23.97   5.11      NA

An analysis of variance was undertaken to determine if the density of planting influenced the total dry weight
of maize for the plot. The results are shown below.

Source_Variation   Degree_Freedom   Sum_Squares   Mean_Square   F_Statistic   P_Value
Planting_Density   2                305.925       152.96        30.67         <0.01
Residual           12               59.84         4.99          NA            NA
Total              14               365.765       NA            NA            NA
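
As a quick check, the F statistic and its P-value can be reproduced in R from the sums of squares in the table (a minimal sketch):

ms_density  <- 305.925 / 2                          # treatment mean square
ms_residual <- 59.84 / 12                           # residual mean square
f_stat <- ms_density / ms_residual                  # ~30.67
pf(f_stat, df1 = 2, df2 = 12, lower.tail = FALSE)   # P-value, well below 0.01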

Calculate the standard error for the difference between any two treatment means;

SED = sqrt(Residual MS × 2/r),

and obtain the upper 2.5% critical t-value (t_crit = t_{0.025, residual df}) from the Statistical Tables for ENVX2001.pdf on the eLearning site.
Then calculate the least significant difference at the 5% level:

LSD(0.05) = t_crit × SED.

SED = sqrt(Residual MS × 2/r) = sqrt(4.9867 × 2/5) = 1.412 kg

t_crit = t_{0.025, 12} = 2.179

LSD = 2.179 × 1.412 = 3.077 kg
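
These quantities can also be computed directly in R as a cross-check (a minimal sketch using the values from the ANOVA table above):

res_ms <- 59.84 / 12                 # residual mean square
r      <- 5                          # replicates per treatment
sed    <- sqrt(res_ms * 2 / r)       # standard error of a difference
t_crit <- qt(0.975, df = 12)         # upper 2.5% t-value on 12 residual df
lsd    <- t_crit * sed
round(c(SED = sed, t = t_crit, LSD = lsd), 3)   # ~1.412, 2.179, 3.077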
Determine which pairs of means are significantly different from each other.
Comparisons:
P20 vs P30: |17.58 − 27.18| = 9.60 > LSD, significant
P20 vs P40: |17.58 − 27.14| = 9.56 > LSD, significant
P30 vs P40: |27.18 − 27.14| = 0.04 < LSD, not significant
So P20 has a significantly lower mean yield than both P30 and P40, but there is no significant difference between P30 and P40 (P > 0.05).
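
The same comparisons can be reproduced in one step in R (a sketch using the treatment means from the table above):

means <- c(P20 = 17.58, P30 = 27.18, P40 = 27.14)   # treatment means
lsd   <- 3.077                                      # LSD from the calculation above
abs(outer(means, means, "-")) > lsd                 # TRUE where |mean difference| > LSD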

Exercise 2 - Why using residuals to assess assumptions is best

In this Topic you are encouraged to test the assumptions using the residuals. This exercise illustrates why
it is not ideal to test the normality assumption using all of the observations irrespective of the treatments.
First we create two synthetic datasets, each sampled 50 times (n=50) from a normally distributed population. Both underlying populations have the same variation (sd=3) but different means (mean=10 and mean=40). We then plot the histograms for each group individually, for both groups combined, and for the combined residuals (observation minus group mean).
par(mfrow=c(2,2))
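# note: rnorm draws new random samples on each run; call set.seed() first if you want reproducible figures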
group1<-rnorm(n=50,mean=10,sd=3)
group2<-rnorm(n=50,mean=40,sd=3)
hist(group1,main="A: Group1",xlab="")
hist(group2,main="B: Group2",xlab="")
hist(c(group1,group2),main="C: Group1&2",xlab="")
hist(c(group1-mean(group1),group2-mean(group2)),main="D: Residuals Group1&2",xlab="")

[Figure: histograms of A: Group1, B: Group2, C: Group1&2 combined, and D: residuals of Group1&2]
We can see that the histogram of each group is normally distributed (A, B); however, when we combine the data we see 2 distinct groupings centred on the mean of each group (C). Therefore, if we look at the raw data irrespective of the groups we would not see a normally distributed dataset. This is because the effect of individual treatments (or groups) differs, so each observation is perturbed according to the treatment it receives or group it is in. If we examine the residuals (D), the treatment (or group) effects have been removed and we can then test whether the data are normal or have constant variance. This requires fitting a model to the data, in this case a 1-way ANOVA model. This is why we test the assumptions on the residuals. You could look at the distribution of each group separately, but for some experiments the replication is small, so it is hard to assess normality; using residuals allows all of the observations to be pooled together.

Exercise 3 - Diatoms in streams

This exercise is from Exercise 1 in Practical 3. Here we will test the assumptions using residual diagnostics and find significant differences using LSDs. The data is found in the Diatoms worksheet.

Importing and processing data, then fitting an ANOVA model

library(readxl)

## Warning: package 'readxl' was built under R version 3.5.1


diatoms<-read_excel("Data4.xlsx",sheet="Diatoms")
diatoms$Zinc<-as.factor(diatoms$Zinc)
str(diatoms)

## Classes 'tbl_df', 'tbl' and 'data.frame': 35 obs. of 5 variables:


## $ Stream : chr "Eagle" "Blue" "Blue" "Blue" ...
## $ Zinc : Factor w/ 4 levels "BACK","HIGH",..: 1 1 1 1 1 1 1 1 3 3 ...
## $ Diversity: num 2.27 1.7 2.05 1.98 2.2 1.53 0.76 1.89 1.4 2.18 ...
## $ Group : num 1 1 1 1 1 1 1 1 2 2 ...
## $ X__1 : chr NA NA NA NA ...
anova.diatoms<-aov(Diversity~Zinc,data=diatoms)
summary(anova.diatoms)

## Df Sum Sq Mean Sq F value Pr(>F)


## Zinc 3 2.567 0.8555 3.939 0.0176 *
## Residuals 30 6.516 0.2172
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 1 observation deleted due to missingness

Testing assumptions

To test the model assumptions we encourage you to produce 3 figures:


• a histogram of the residuals;
• a QQ plot of the residuals;
• a plot of the residuals against the fitted values.
It is good practice to base this on the standardised residuals, which can be extracted from a model object using the rstandard function. Standardised residuals are ~ N(0, 1), which makes it easier to screen the plots for outliers: based on the normal distribution, 95% of observations fall within +/- 2 SDs of the mean, or in the case of standardised residuals within +/- 2.
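
If you prefer to view all three diagnostics at once, a minimal sketch is shown below; each plot is then discussed individually.

res <- rstandard(anova.diatoms)                    # standardised residuals
par(mfrow = c(1, 3))
hist(res, main = "Standardised residuals", xlab = "")
qqnorm(res); abline(0, 1)                          # QQ plot with 1:1 line
plot(fitted(anova.diatoms), res,
     xlab = "Fitted values", ylab = "Standardised residuals")
par(mfrow = c(1, 1))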
The figure below presents the histogram of the standardised residuals. The majority of the observations plot as a bell-shaped (normal) distribution. The exceptions are 2 observations less than -2. Given there are 34 observations this is about 6% of the dataset, which is acceptable given we expect 95% of observations to fall in the interval of ~[-2, 2].
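
This can be checked with a quick count of the standardised residuals falling outside +/- 2 (a sketch):

sum(abs(rstandard(anova.diatoms)) > 2)    # number of large residuals
mean(abs(rstandard(anova.diatoms)) > 2)   # proportion, ~0.06 (about 6%)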
hist(rstandard(anova.diatoms))

[Figure: histogram of rstandard(anova.diatoms)]

The QQ plot below shows that the observed quantiles match the theoretical quantiles (assuming normality), with the observations reasonably following the 1:1 line. We can assume the data is normally distributed.
qqnorm(rstandard(anova.diatoms))
abline(0,1)

[Figure: normal Q-Q plot of the standardised residuals]

The plot below shows the standardised residuals plotted against the fitted values (the group means in this case). To test the assumption of constant variance we want to see the same spread of observations across the range of fitted values, which is the case here. We do not want to see fanning, where the spread of the residuals increases or decreases as the fitted values increase.
plot(fitted(anova.diatoms),rstandard(anova.diatoms))

[Figure: standardised residuals plotted against fitted values]

A different approach to test the assumption of constant variance

Statistics is made up of different tribes, and some tribes use hypothesis testing to see if a dataset meets the assumptions of normality and constant variance. One option is Bartlett's test for constant variance. The mechanics are not important, but the function and syntax are shown below. The hypotheses are:
• H0: σ²_BACK = σ²_LOW = σ²_MED = σ²_HIGH

• H1: not all σ²_i are equal (i = BACK, LOW, MED, HIGH)


We prefer to use numerical and graphical diagnostics, e.g. residual plots, but this is shown to illustrate other possibilities. You can use it as a different line of evidence for testing assumptions if you wish. It won't work if the data is non-normal, and only use it if the data has one treatment factor with a
completely randomised design!
bartlett.test(Diversity ~ Zinc, data = diatoms)

##
## Bartlett test of homogeneity of variances
##
## data: Diversity by Zinc
## Bartlett's K-squared = 0.25294, df = 3, p-value = 0.9686
Based on the P-value being > 0.05 we could state that we retain the null hypothesis and that the variances
are equal.

Identify significant differences

In Topic 3 we used the lsmeans package to extract the means for each group and their associated 95% CIs. The lsmeans package is also useful for producing a plot of the means and 95% CIs, which is a nice way to present the results.
#install.packages("lsmeans",repos="http://cran.csiro.au/")
#library(lsmeans)
#lsmeans(anova.diatoms, "Zinc")
#plot(lsmeans(anova.diatoms, "Zinc"))
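
Note that the lsmeans package has since been superseded on CRAN by emmeans; an equivalent sketch, assuming emmeans is installed, is:

#library(emmeans)
#emmeans(anova.diatoms, "Zinc")         # group means and 95% CIs
#plot(emmeans(anova.diatoms, "Zinc"))   # plot of the means with CIs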

Based on the non-overlapping confidence intervals, the only pair of groups that is significantly different is HIGH and LOW. However, more correctly we are testing whether the difference in means = 0, which is a slightly different question from checking whether the 95% CIs around the means overlap. Looking at the 95% CIs around the means is a conservative test in that it will under-estimate the number of times a significant difference occurs. Therefore, the better approach is to use a least significant difference test, which we can obtain using the agricolae package.
library(agricolae)
LSD.test(anova.diatoms,"Zinc",console=T)

##
## Study: anova.diatoms ~ "Zinc"
##
## LSD t Test for Diversity
##
## Mean Square Error: 0.2172137
##
## Zinc, means and individual ( 95 %) CI
##
## Diversity std r LCL UCL Min Max
## BACK 1.797500 0.4852613 8 1.4609789 2.134021 0.76 2.27
## HIGH 1.277778 0.4268717 9 0.9605026 1.595053 0.63 1.90
## LOW 2.032500 0.4449960 8 1.6959789 2.369021 1.40 2.83
## MED 1.717778 0.5030104 9 1.4005026 2.035053 0.80 2.19
##
## Alpha: 0.05 ; DF Error: 30
## Critical Value of t: 2.042272
##
## Groups according to probability of means differences and alpha level( 0.05 )
##
## Treatments with the same letter are not significantly different.
##
## Diversity groups
## LOW 2.032500 a
## BACK 1.797500 a
## MED 1.717778 ab
## HIGH 1.277778 b
The LSD.test function also gives the 95% CI around each mean, but the last part of the output identifies which pairs of means are significantly different. You will note the following pairs are different:
• 'LOW' and 'HIGH';
• 'BACK' and 'HIGH'.
We have one more significant pair compared to looking at the 95% CIs. Make sure you can interpret the letter notation for identifying pairs of groups with significantly different means: groups that share a letter are not significantly different.

Exercise 4 - Mean comparisons, residual diagnostics and back-transformations

In this exercise we will add a layer of complexity by considering a transformation. If our data does not meet the assumptions we need to transform the data; possible transformations are the square root (weaker) and the log (stronger). When we transform the data we need to be careful about how we interpret the results.
Concentration of prolactin (units g/L) in the pituitary glands of nine-spined stickleback fish was assessed. The fish were kept in either saltwater or freshwater prior to assay, and different batches were examined on three successive occasions. Cysts tend to develop in fish kept in saltwater and sometimes develop in freshwater populations. Four different groups of fish were used in a preliminary experiment to examine the effects of cysts, whether induced by saltwater or normally present, on the prolactin production of the pituitary gland.
The four groups of fish were coded as follows, with 10 fish per group:
• A = saltwater cysts, day 1;
• B = freshwater, no cysts, day 1;
• C = freshwater, no cysts, day 2;
• D = freshwater, cysts, day 3.
The data is found in the Prolactin worksheet.
(i) Import the data into R, perform some exploratory data analysis to make tentative suggestions about
differences between means and the likelihood of the data meeting the assumptions.
library(readxl)
fish<-read_excel("Data4.xlsx",sheet="Prolactin")
fish$Treatment <- as.factor(fish$Treatment)
str(fish)

## Classes 'tbl_df', 'tbl' and 'data.frame': 40 obs. of 2 variables:


## $ Treatment: Factor w/ 4 levels "A","B","C","D": 1 1 1 1 1 1 1 1 1 1 ...
## $ Prolactin: num 15.1 25.6 5.6 11.4 35.7 17.2 13.3 21.1 11.6 12.3 ...
First we create boxplots for each group, which show that there are differences in the medians between the treatments, so there appears to be some treatment effect. In terms of the assumptions, the spread of the data in each treatment looks different based on the size of the boxes and the length of the whiskers. However, for each group the upper and lower whisker lengths are similar, so the distributions are likely to be symmetrical (normal).
boxplot(Prolactin ~ Treatment, ylab = "Prolactin concentration", data = fish)

[Figure: boxplots of prolactin concentration by treatment]

Next we generate summary statistics. The variances are very different (ratio of largest to smallest variance > 4:1), therefore the assumption of constant variance is unlikely to be met. The mean and median are similar for each treatment so the normality assumption could be met.
aggregate(Prolactin ~ Treatment, mean, data = fish)

## Treatment Prolactin
## 1 A 16.89
## 2 B 28.22
## 3 C 52.27
## 4 D 60.73
aggregate(Prolactin ~ Treatment, median, data = fish)

## Treatment Prolactin
## 1 A 14.20
## 2 B 27.35
## 3 C 49.60
## 4 D 56.20
aggregate(Prolactin ~ Treatment, sd, data = fish)

## Treatment Prolactin
## 1 A 8.629723
## 2 B 10.786700
## 3 C 26.165628
## 4 D 23.211302
(ii) Fit an ANOVA model and test the assumption of normality using a QQ plot and a histogram, both based on standardised residuals.
The QQ plot and the histogram indicate the data is normally distributed.
pro.aov <- aov(Prolactin ~ Treatment, data = fish)
qqnorm(rstandard(pro.aov))
abline(0,1)

[Figure: normal Q-Q plot of rstandard(pro.aov)]

hist(rstandard(pro.aov))

[Figure: histogram of rstandard(pro.aov)]

(iii) Assess the assumption of constant variance by:


• examining the plot of the standardised residuals against the fitted values;

This plot shows an increasing spread (fanning) of the residuals as the fitted values increase. The assumption of constant variance is not tenable.
plot(fitted(pro.aov),rstandard(pro.aov))

[Figure: standardised residuals against fitted values for pro.aov]

• using Bartlett's test;

The P-value is less than 0.05 so we reject the null hypothesis: the variances are not equal.
bartlett.test(Prolactin ~ Treatment, data = fish)

##
## Bartlett test of homogeneity of variances
##
## data: Prolactin by Treatment
## Bartlett's K-squared = 13.651, df = 3, p-value = 0.003421
• calculating the ratio of the largest SD to the smallest SD to see if it is below 2:1.
The ratio is 3.03, which is further evidence that the variances are unequal.
out<-tapply(fish$Prolactin,fish$Treatment,sd)
out

## A B C D
## 8.629723 10.786700 26.165628 23.211302
out[3]/out[1]

## C
## 3.032036
(iv) The data does not meet the assumptions, so log transform ('log' function) the response and repeat (ii) and (iii) to test the assumptions.
In R you can transform the data in the model formula (see below) or you could create a new column in your data frame, for example fish$logProlactin <- log(fish$Prolactin). The log transformation has not changed the shape of the distribution dramatically; it is still normally distributed.
pro.aov <- aov(log(Prolactin) ~ Treatment, data = fish)
qqnorm(rstandard(pro.aov))
abline(0,1)

[Figure: normal Q-Q plot of rstandard(pro.aov) after the log transformation]

hist(rstandard(pro.aov))

[Figure: histogram of rstandard(pro.aov) after the log transformation]

All of the ways to assess the constant variance assumption indicate the variances are equal after the log
transformation.
plot(fitted(pro.aov),rstandard(pro.aov))

[Figure: standardised residuals against fitted values after the log transformation]

bartlett.test(log(Prolactin) ~ Treatment, data = fish)

##
## Bartlett test of homogeneity of variances
##
## data: log(Prolactin) by Treatment
## Bartlett's K-squared = 1.5541, df = 3, p-value = 0.6698
out<-tapply(log(fish$Prolactin),fish$Treatment,sd)
out

## A B C D
## 0.5097462 0.3994729 0.5382265 0.3785816
out[3]/out[4]

## C
## 1.421692
(v) If the assumptions are met and there is a significant F-test, perform LSD tests and identify which pairs are significantly different.
The ANOVA table indicates we reject the null hypothesis.
summary(pro.aov)

## Df Sum Sq Mean Sq F value Pr(>F)


## Treatment 3 10.717 3.572 16.76 5.57e-07 ***
## Residuals 36 7.672 0.213
## ---

## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The results of the LSD test are shown below.
library(agricolae)
LSD.test(pro.aov,"Treatment",console=T)

##
## Study: pro.aov ~ "Treatment"
##
## LSD t Test for log(Prolactin)
##
## Mean Square Error: 0.2131079
##
## Treatment, means and individual ( 95 %) CI
##
## log.Prolactin. std r LCL UCL Min Max
## A 2.713137 0.5097462 10 2.417071 3.009202 1.722767 3.575151
## B 3.270747 0.3994729 10 2.974681 3.566812 2.653242 3.795489
## C 3.833124 0.5382265 10 3.537058 4.129190 3.054001 4.593098
## D 4.042180 0.3785816 10 3.746114 4.338245 3.446808 4.680278
##
## Alpha: 0.05 ; DF Error: 36
## Critical Value of t: 2.028094
##
## least Significant Difference: 0.4186999
##
## Treatments with the same letter are not significantly different.
##
## log(Prolactin) groups
## D 4.042180 a
## C 3.833124 a
## B 3.270747 b
## A 2.713137 c
(vi) One issue is that we have performed our hypothesis testing on the log scale. This means some steps are needed if we wish to interpret the results on the original scale, e.g. to provide a 95% CI on the original scale. We will step through these.
Suppose the biologist was primarily interested in comparing the prolactin concentrations for A (saltwater cysts, day 1) vs B (freshwater, no cysts, day 1).
• Calculate the difference in the means;
out<-tapply(log(fish$Prolactin),fish$Treatment,mean)
out

## A B C D
## 2.713137 3.270747 3.833124 4.042180
diffm<-out[2]-out[1]
diffm

## B
## 0.5576099
• Use the R output to calculate the Least Significant Difference (LSD):

LSD = t_crit × SED = t_{0.025, resid df} × sqrt(Resid MS × (1/n1 + 1/n2)).

This is given in the output of the LSD.test function; LSD = 0.4187.
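
As a manual check of this value (a sketch using the Mean Square Error and residual df reported in the output above):

res_ms <- 0.2131079                       # Mean Square Error from the LSD.test output
t_crit <- qt(0.975, df = 36)              # critical t on 36 residual df (~2.028)
t_crit * sqrt(res_ms * (1/10 + 1/10))     # ~0.4187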
• Calculate the lower and upper end-points of the 95% CI around the difference in means, the 95% CI being:

95% CI = (difference in means) ± t_crit × sqrt(Resid MS × (1/n1 + 1/n2)).

The lower and upper CIs are


l95<-diffm-0.4186999
u95<-diffm+0.4186999
l95

## B
## 0.13891
u95

## B
## 0.9763098
• The CI and mean difference are on the log scale, so back-transform the difference in the means ('exp' function) and the lower and upper end-points of the 95% CI. Note that the upper and lower tails are not of equal length on the original scale.
exp(diffm)

## B
## 1.746493
exp(l95)

## B
## 1.149021
exp(u95)

## B
## 2.654642
(vii) We now have an estimate of the difference in the means back-transformed to the original scale. It actually corresponds to a ratio on the original scale. The reason is based on the log laws: we can write the difference between two logged numbers (A and B) as the log of their ratio (A/B):

log(A) − log(B) = log(A/B).

If we back-transform the log of their ratio we get the ratio on the original scale:

e^(log(A/B)) = A/B.

So the back-transformed difference between a pair of means is a ratio.


Note: if we were to backtransform the group means on the log scale we would get the geometric mean on
the original scale.
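
A quick numerical check of this point (a sketch using group A):

x <- fish$Prolactin[fish$Treatment == "A"]
exp(mean(log(x)))          # back-transformed mean of the logs for group A
prod(x)^(1/length(x))      # geometric mean computed directly; same value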
Provide a biological interpretation for this estimate and confidence interval. Use the CI to decide if there is a significant difference between Treatment A and Treatment B.
Since we interpret the back-transformed difference as a ratio, we conclude that the mean prolactin concentration in Treatment B is estimated to be 1.75 times that in Treatment A. However, the 95% CI for this ratio extends from 1.15 times to 2.65 times. Since the 95% CI for the ratio does not include 1, this also demonstrates that the mean prolactin concentrations for Treatments A and B differ significantly (P < 0.05). Note that if the ratio (on the original prolactin scale) were 1, this would imply meanB / meanA = 1, i.e. meanB = meanA.

Exercise 5 - Broiler Chickens

This exercise is an analysis of a set of growth data. It is an open question for you to gain more practice.
Weight gain in dressed broiler chickens was determined after five generations of selection. Group A was bred by using only the heaviest 10% in each generation; groups B and C were bred using respectively the heaviest 30% and 50%; group D was obtained by crossing groups A and C of the previous generation. The dressed weights (kg) of 25 birds from each group were recorded.
The data is found in the Broilers worksheet.
(i) Write down the null and alternate hypotheses. What is the treatment factor, and how many levels does it have? What are the sample sizes for each group (n_i)?
H0: µA = µB = µC = µD
H1: not all µi are equal
where µi (i = A, B, C, D) is the population mean weight gain for broilers in selection group i.
The treatment factor is the selection group, with t = 4 levels. There are r = 25 chicks in each selection group (equal replication).
(ii) Import the data into R, and then obtain some numerical and graphical summaries of the data, by each
group. How would you interpret these data? From these summaries, is the assumption of homogeneity of
variances met? What about normality? Try a formal Bartlett’s test using the bartlett.test function. Use
residual diagnostics to assess the assumptions.
The summary statistics by group indicate the data is likely to be normally distributed (mean ≈ median) and the variances are equal. This is confirmed by the boxplots for each group.
library(readxl)
broilers<-read_excel("Data4.xlsx",sheet="Broilers")
str(broilers)

## Classes 'tbl_df', 'tbl' and 'data.frame': 100 obs. of 2 variables:


## $ WtGain: num 1.67 1.64 1.6 1.66 1.4 1.48 1.7 1.5 1.67 1.52 ...
## $ Group : chr "A" "A" "A" "A" ...
broilers$Group<-as.factor(broilers$Group)
aggregate(WtGain ~ Group, summary, data = broilers)

## Group WtGain.Min. WtGain.1st Qu. WtGain.Median WtGain.Mean


## 1 A 1.4000 1.5000 1.5500 1.5644
## 2 B 1.3200 1.4500 1.4800 1.5160
## 3 C 1.3000 1.4100 1.5000 1.4860
## 4 D 1.3500 1.4700 1.5700 1.5424
## WtGain.3rd Qu. WtGain.Max.
## 1 1.6400 1.7000
## 2 1.5800 1.8100
## 3 1.5500 1.6900
## 4 1.6000 1.7600
boxplot(WtGain ~ Group, ylab = "Weight gain (kg)", data = broilers)

[Figure: boxplots of weight gain (kg) by group]

aggregate(WtGain ~ Group, sd, data = broilers)

## Group WtGain
## 1 A 0.09046546
## 2 B 0.10984838
## 3 C 0.09673848
## 4 D 0.09925892
Bartlett's test indicates the variances are equal.
bartlett.test(WtGain ~ Group, data = broilers)

##
## Bartlett test of homogeneity of variances
##
## data: WtGain by Group
## Bartlett's K-squared = 0.93192, df = 3, p-value = 0.8177
The residual diagnostics indicate the data is normally distributed and variances are equal.
broilers.aov <- aov(WtGain ~ Group, data = broilers)
qqnorm(rstandard(broilers.aov))
abline(0,1)

[Figure: normal Q-Q plot of rstandard(broilers.aov)]

hist(rstandard(broilers.aov))

[Figure: histogram of rstandard(broilers.aov)]

plot(fitted(broilers.aov),rstandard(broilers.aov))

[Figure: standardised residuals against fitted values for broilers.aov]

(iii) Note that the results of the analysis can only be used when the assumptions of the analysis have been met. If you believe that the assumptions are met, then what would your conclusions from the analysis of variance be? You should use the summary function applied to your aov object to obtain the ANOVA table.
The ANOVA table indicates we reject the null hypothesis.
summary(broilers.aov)

## Df Sum Sq Mean Sq F value Pr(>F)


## Group 3 0.0859 0.028648 2.904 0.0387 *
## Residuals 96 0.9471 0.009865
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(iv) Without any formal analysis, consider the pattern of the group means in relation to the group treatment, i.e. the type of selection. Would this pattern be expected? If appropriate perform an LSD test.
Looking at the sample means, this pattern might be expected from the breeding experiment: weight gain appears to be related to the degree of weight selection in each generation. For Group 1 (A), the heaviest 10% of birds were used, and this group did result in the highest weight gain, with lesser increases for Groups 2 (B) and 3 (C). Similarly, since Group 4 (D) was obtained by crossing Groups 1 and 3, an intermediate result might be expected, and this is what occurred here.
library(agricolae)
LSD.test(broilers.aov,"Group",console=T)

##
## Study: broilers.aov ~ "Group"
##

## LSD t Test for WtGain
##
## Mean Square Error: 0.009865333
##
## Group, means and individual ( 95 %) CI
##
## WtGain std r LCL UCL Min Max
## A 1.5644 0.09046546 25 1.524969 1.603831 1.40 1.70
## B 1.5160 0.10984838 25 1.476569 1.555431 1.32 1.81
## C 1.4860 0.09673848 25 1.446569 1.525431 1.30 1.69
## D 1.5424 0.09925892 25 1.502969 1.581831 1.35 1.76
##
## Alpha: 0.05 ; DF Error: 96
## Critical Value of t: 1.984984
##
## least Significant Difference: 0.05576452
##
## Treatments with the same letter are not significantly different.
##
## WtGain groups
## A 1.5644 a
## D 1.5424 a
## B 1.5160 ab
## C 1.4860 b
