
DEPARTMENT OF STATISTICS

Chapter 12 One-Way Analysis of Variance

Section 1 - One-Way ANOVA

 Definitions
 Background - Why use variance?
 Notation
 Hypotheses
 Assumptions
 Model
 Statistics
 Test Statistic - conceptual
 ANOVA Table
 Hypothesis Test - Summary
 t-test vs. F-test

Section 2 - Determining Differences

 How to determine which means are different?
 Why not use two-sample independent?
 General method
 LSD (Fisher's)
 Bonferroni
 Tukey
 Dunnett
 Graphical Display
 Procedure

ANOVA
(ANalysis Of VAriance)


One-Way ANOVA Statistical Question

A research group is studying the growth rates of bacteria in four different
sugar solutions: glucose, sucrose, fructose, and lactose. The group
is interested in determining whether there is any difference in the average growth
rate of bacteria between the different sugar solutions.


ANOVA Terminology

One-way ANOVA (ANalysis Of Variance) is a statistical


inference technique used to compare the means of three or
more groups that are independent of each other. The population
groups are distinguishable by a factor variable and the levels of
the factor define the different populations.


One-Way ANOVA


Two-Way ANOVA Statistical Question

A research group is studying the growth rates of bacteria in four different
sugar solutions (glucose, sucrose, fructose, and lactose) across four
different temperature levels: 20, 25, 30, and 35. The group is interested in
determining whether there is any difference in the average growth rate of bacteria
between different sugar solutions and whether this relationship is affected by
temperature.


ANOVA Terminology
Two-way ANOVA (ANalysis Of Variance) is a statistical
technique used to analyze the effects of two independent
factor variables on a continuous dependent variable. The
primary objective of two-way ANOVA is to determine whether
there is a significant difference in means between the groups
defined by the two factors, as well as exploring the interaction
between the two factors.

Two-way ANOVA is not covered in this course.

One-Way ANOVA Examples


In each of the following situations, what is the factor and how many levels are there?

1) Do five different brands of gasoline influence automobile efficiency?


Factor: gasoline brand levels: 5

2) Does the hardwood concentration in pulp (%) influence tensile strength of bags made from the pulp?
Factor: hardwood concentration levels: Needs to be determined

3) Does the resulting color density of a fabric depend on the amount of dye used?
Factor: amount of dye levels: Needs to be determined


One-Way ANOVA Notation (Parameters and Hypotheses)

k populations under investigation (factor with k levels)

Hypotheses
Null Hypothesis H0: μ1 = μ2 = … = μk
Alternative Hypothesis (choose one of the following statements; they are all equivalent):
 Ha: at least one μi is different
 Ha: at least two μi are different
 Ha: μi ≠ μj for some i ≠ j

One-Way ANOVA Comparing Means?


Example (Coffee House):
How do five coffeehouses around campus differ in the demographics of their customers? Are certain
coffee houses more popular among graduate students? Do professors tend to favor one coffeehouse?

A reporter from the student newspaper asks 50 random customers of each of the five coffeehouses to
respond to a questionnaire. One variable of interest is the customer’s age.
Note: There was some non-response.

Variables:
Quantitative Response: age of the customer
Factor: Coffeehouse Levels: (5-levels)

Question: Is there a statistically significant difference between the average
age of the customers at the different coffeehouses around campus?


One-Way ANOVA Comparing Means?


One-Way ANOVA Notation (Statistics)

 k populations under investigation (factor with k levels), with sample sizes n1, n2, …, nk

 Total number of observations: n = n1 + n2 + … + nk



One-Way ANOVA Notation (Statistics- Means)

 Data: x_ij such that j = 1, …, n_i, for i = 1, …, k
 (x_ij is the j-th observation in the i-th group)

 Sample mean for the i-th group: x̄_i. = (1/n_i) Σ_{j=1}^{n_i} x_ij, for i = 1, …, k
 Sample mean for the entire sample: x̄.. = (1/n) Σ_{i=1}^{k} Σ_{j=1}^{n_i} x_ij

The dot subscript signifies that we averaged over the second index, i.e., the
observations for the i-th group. The double-dot subscript signifies that we
averaged over both sets of indices.

One-Way ANOVA Notation (Statistics-Variance)

 Data: x_ij such that j = 1, …, n_i, for i = 1, …, k
 (x_ij is the j-th observation in the i-th group)

 Sample variance for the i-th group:
 s_i² = (1/(n_i − 1)) Σ_{j=1}^{n_i} (x_ij − x̄_i.)², for i = 1, …, k

The dot subscript signifies that we averaged over the second index, i.e., the
observations for the i-th group.


One-Way ANOVA Assumptions

1. Each group is a sample from a distinct population. We have an SRS from each of the k
populations of interest: for each i we have x_i1, …, x_i n_i, for i = 1, …, k.

2. The responses in each group are independent of those in the other groups.

3. Each of the k populations is normally distributed, or the CLT holds for the statistics that we
measure; therefore, the sample means x̄_i. are (approximately) normally distributed for each of
the k groups.

4. In traditional one-way ANOVA (what we cover) we assume homogeneity of variance
(pooled estimator): σ_i² = σ² for all i.

Checking Assumption of Homogeneity of Variance

Check the assumption σ_i² = σ² for all i.

Rule of thumb: the ratio of the maximum sample standard deviation of the groups to the
minimum sample standard deviation of the groups should not be too large:

 max_{1≤i≤k} s_i / min_{1≤i≤k} s_i ≤ 2

Residual plots: plot the residual errors of the fitted model (the differences between the
observed values and the fitted values of the model).
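The rule of thumb above is easy to compute in R; a minimal sketch on simulated data (the data frame and column names here are illustrative, not from the slides):

```r
# Rule-of-thumb check, max(s_i)/min(s_i) <= 2, on simulated data.
set.seed(1)
df <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 20)),
  y     = rnorm(60, mean = rep(c(10, 12, 11), each = 20), sd = 2)
)
group_sd <- tapply(df$y, df$group, sd)  # one sample SD per group
ratio <- max(group_sd) / min(group_sd)  # compare this ratio to 2
ratio
```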


Checking Assumption of Homogeneity of Variance

Testing the assumption σ_i² = σ² for all i (beyond course material).

Hypothesis testing procedures for homogeneity of variance:
 H0: σ1² = σ2² = … = σk²
 Ha: σi² ≠ σj² for some i ≠ j

Bartlett's test is formed from a measure that compares the pooled variance estimate to the
individual estimates, looking for significant departures. The test is sensitive to departures
from normality.

Levene's test is a robust procedure that uses the absolute deviations of the individual
observations from their respective group means; it measures how much the variance of the
group means differs from the variance of the individual observations.


What if the Assumption of Homogeneity is Violated?


Beyond course material.

Welch's F-test for one-way ANOVA is similar to the two independent sample procedure: a
modified procedure that does not assume equal variance. The test statistic has a complex
form for an approximate degrees of freedom and is approximately F-distributed.

Nonparametric Procedures

The Kruskal-Wallis test is a nonparametric test that uses the ranks of the observations across
the groups rather than the actual values; its test statistic is based on the sums of the ranks.
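Both alternatives are available in base R; a sketch on simulated data with unequal spreads (the group structure is illustrative):

```r
# Welch's one-way test and the Kruskal-Wallis test, neither of which
# assumes equal variances; data below are simulated for illustration.
set.seed(2)
df <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 15)),
  y     = rnorm(45, mean = rep(c(5, 6, 7), each = 15),
                sd = rep(c(1, 2, 4), each = 15))  # unequal spreads
)
oneway.test(y ~ group, data = df, var.equal = FALSE)  # Welch's ANOVA
kruskal.test(y ~ group, data = df)                    # rank-based test
```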


F-distribution
F(df_A = k − 1, df_E = n − k)

k populations under investigation (factor with k levels); total number of observations n.

1. The values of F are always positive.
2. The distribution is right-skewed (not symmetric).
3. The mean of the distribution is approximately 1.
4. The shape of the distribution is controlled by the degrees of freedom (df_A and df_E).
 df_A = k − 1 is the numerator degrees of freedom (treatment degrees of freedom)
 df_E = n − k is the denominator degrees of freedom (error degrees of freedom)
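These properties can be checked numerically with R's F-distribution functions (the k and n values below are illustrative):

```r
# F-distribution facts for k = 5 groups and n = 200 observations.
dfA <- 5 - 1     # numerator (treatment) degrees of freedom
dfE <- 200 - 5   # denominator (error) degrees of freedom
qf(0.95, dfA, dfE)                     # upper 5% critical value
dfE / (dfE - 2)                        # exact mean of F(dfA, dfE); close to 1
pf(2.5, dfA, dfE, lower.tail = FALSE)  # upper-tail area, as used for p-values
```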


F-distribution
F(df_A = k − 1, df_E = n − k): density curves for various degrees of freedom.


ANOVA Model
Group means μ_i, for i = 1, …, k; data x_ij such that j = 1, …, n_i, for i = 1, …, k
(x_ij is the j-th observation in the i-th group); errors ε_ij.

ANOVA Generative Model

 x_ij = μ_i + ε_ij, with ε_ij ∼ N(0, σ²), for j = 1, …, n_i and i = 1, …, k

Note the single σ² is because of the equal variance assumption. The error term represents
all the sources of variability that are not accounted for by the group means in the
one-way ANOVA model.
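The generative model can be simulated directly; a sketch with illustrative group means and a common σ:

```r
# Simulate x_ij = mu_i + eps_ij with eps_ij ~ N(0, sigma^2).
set.seed(3)
ni    <- c(10, 12, 8, 10)          # group sizes (may be unequal)
mu    <- c(20, 22, 25, 20)         # one population mean per group
sigma <- 3                         # common SD (equal-variance assumption)
group <- factor(rep(seq_along(ni), times = ni))
x     <- rep(mu, times = ni) + rnorm(sum(ni), mean = 0, sd = sigma)
tapply(x, group, mean)             # sample means estimate the mu_i
```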

Modeling the Sources of Variability

Sample variance for the i-th group: s_i² = (1/(n_i − 1)) Σ_{j=1}^{n_i} (x_ij − x̄_i.)²,
for i = 1, …, k. The dot subscript signifies that we averaged over the second index,
i.e., the observations for the i-th group.

 Mean Squares = Sum of Squares (SS) / df


Sum of Squares Between Groups/Treatments (SSA)

A reporter from the student newspaper


asks 50 random customers of each of
the five coffeehouses to respond to a
questionnaire. One variable of interest is
the customer’s age.
 SSA = Σ_{i=1}^{k} n_i (x̄_i. − x̄..)²,  df_A = k − 1

Mean Squared Treatments

 MSA = SSA / df_A = (1/(k − 1)) Σ_{i=1}^{k} n_i (x̄_i. − x̄..)²

Estimate of the variation that is explained by the differences due to the population means.
If H0 were true, this would be an unbiased estimate of the variance σ², provided the equal
variance assumption holds. If H0 is false, it overestimates the variance.

Sum of Squares Error/Residuals (Pooled Variance)


Sum of Squares Within Groups

A reporter from the student newspaper


asks 50 random customers of each of
the five coffeehouses to respond to a
questionnaire. One variable of interest is
the customer’s age.

 SSE = Σ_{i=1}^{k} Σ_{j=1}^{n_i} (x_ij − x̄_i.)² = Σ_{i=1}^{k} (n_i − 1) s_i²,  df_E = n − k

Mean Squared Error

 MSE = SSE / df_E = (1/(n − k)) Σ_{i=1}^{k} (n_i − 1) s_i²

Estimate of the variation that is unexplained by the differences due to the population means.
The mean squared error is always an unbiased estimate of the variance of the error term in
the model when the equal variance assumption holds.

Sum of Squares Total

A reporter from the student newspaper


asks 50 random customers of each of
the five coffeehouses to respond to a
questionnaire. One variable of interest is
the customer’s age.

 SST = Σ_{i=1}^{k} Σ_{j=1}^{n_i} (x_ij − x̄..)²,  df_T = n − 1

Mean Squared Total

 MST = SST / df_T = (1/(n − 1)) Σ_{i=1}^{k} Σ_{j=1}^{n_i} (x_ij − x̄..)²

Estimate of the total variation in the data: how each point deviates from the overall mean.
It is an unbiased estimator of the error variance when H0 is true; otherwise it also
overestimates it.

Decomposition of the Sum of Squares

 SST = SSE + SSA

 Σ_{i=1}^{k} Σ_{j=1}^{n_i} (x_ij − x̄..)² = Σ_{i=1}^{k} Σ_{j=1}^{n_i} (x_ij − x̄_i.)² + Σ_{i=1}^{k} n_i (x̄_i. − x̄..)²


Decomposition of the Sum of Squares

 Σ_{i=1}^{k} Σ_{j=1}^{n_i} (x_ij − x̄..)² = Σ_{i=1}^{k} Σ_{j=1}^{n_i} (x_ij − x̄_i. + x̄_i. − x̄..)²

 = Σ_{i=1}^{k} Σ_{j=1}^{n_i} (x_ij − x̄_i.)² + Σ_{i=1}^{k} Σ_{j=1}^{n_i} (x̄_i. − x̄..)²
  + 2 Σ_{i=1}^{k} Σ_{j=1}^{n_i} (x_ij − x̄_i.)(x̄_i. − x̄..)

 = Σ_{i=1}^{k} Σ_{j=1}^{n_i} (x_ij − x̄_i.)² + Σ_{i=1}^{k} n_i (x̄_i. − x̄..)² + 0

(The cross term is zero because Σ_{j=1}^{n_i} (x_ij − x̄_i.) = 0 for each group i.)

Therefore SST = SSE + SSA:

 Σ_{i=1}^{k} Σ_{j=1}^{n_i} (x_ij − x̄..)² = Σ_{i=1}^{k} Σ_{j=1}^{n_i} (x_ij − x̄_i.)² + Σ_{i=1}^{k} n_i (x̄_i. − x̄..)²
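The identity SST = SSE + SSA can be verified numerically on any data set; a sketch on simulated data:

```r
# Numerical check of the decomposition SST = SSE + SSA.
set.seed(4)
g <- factor(rep(1:3, times = c(8, 10, 12)))
x <- rnorm(30, mean = c(5, 7, 6)[as.integer(g)])
xbar_g <- ave(x, g)                # group mean, repeated per observation
SST <- sum((x - mean(x))^2)
SSE <- sum((x - xbar_g)^2)
SSA <- sum((xbar_g - mean(x))^2)   # equals sum of n_i * (xbar_i. - xbar..)^2
all.equal(SST, SSE + SSA)          # TRUE, up to floating-point tolerance
```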



ANOVA Hypothesis Test


One-Way ANOVA (ANalysis Of Variance) is a statistical hypothesis testing procedure that is
used to compare the means of three or more treatment groups or populations.

One-Way ANOVA (ANalysis Of Variance) assesses whether the observed variation in the
sample means can be attributed to true differences in the population means, or if it is
simply due to chance.

One-Way ANOVA (ANalysis Of Variance) partitions the total variability in the data into different
sources, including the variability within each group and the variability between groups. It then
compares the magnitude of the between-group variability with the within-group variability to
determine if the differences in the means of the groups are statistically significant.


Test Statistic for One-Way ANOVA

 F_TS = Between-Group Variation / Within-Group Variation

 F_TS = MSA / MSE = [ (1/(k − 1)) Σ_{i=1}^{k} n_i (x̄_i. − x̄..)² ] / [ (1/(n − k)) Σ_{i=1}^{k} (n_i − 1) s_i² ]

If H0 is true, F_TS ≈ 1. If H0 and all assumptions are true, F_TS ∼ F(k − 1, n − k).

If H0 is false, F_TS tends to be large.


Hypothesis Test for One-Way ANOVA

Step 1
Identify and describe the parameter(s) of interest for the populations. What is the factor?
What are the levels of the factor? μ1, …, μk are the parameters of interest, representing the
means of the k populations. (In context, state what the population is and what the actual
parameter or attribute represents.)

Step 2
State the hypotheses. The null hypothesis is always the same; pick your favorite way of
stating the alternative.
 H0: μ1 = μ2 = … = μk
 Ha: at least one μi is different
 Ha: at least two μi are different
 Ha: μi ≠ μj for some i ≠ j

Hypothesis Test for One-Way ANOVA

Step 3
Calculate the test statistic and the p-value.

 F_TS = Between-Group Variation / Within-Group Variation = MSA / MSE
 = [ (1/(k − 1)) Σ_{i=1}^{k} n_i (x̄_i. − x̄..)² ] / [ (1/(n − k)) Σ_{i=1}^{k} (n_i − 1) s_i² ]

p-value in R: pf(q = fts, df1 = dfA, df2 = dfE, lower.tail = FALSE)

Typically, we won't do this by hand. If you have the data, we will use a built-in R
function for this entire process:

fit <- aov(quantitativeVariable ~ categoricalVariable, data = dataframe)
summary(fit)
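If only group summary statistics are available, F_TS and the p-value can still be computed by hand; the numbers below are illustrative, not from the coffeehouse data:

```r
# F statistic and p-value from group sizes, means, and variances.
ni   <- c(40, 45, 38, 42, 35)           # group sample sizes
xbar <- c(24.1, 28.3, 26.0, 21.7, 25.2) # group sample means
s2   <- c(95, 110, 88, 102, 99)         # group sample variances
n <- sum(ni); k <- length(ni)
xbar_all <- sum(ni * xbar) / n          # overall mean
MSA <- sum(ni * (xbar - xbar_all)^2) / (k - 1)
MSE <- sum((ni - 1) * s2) / (n - k)
fts <- MSA / MSE
pf(fts, df1 = k - 1, df2 = n - k, lower.tail = FALSE)  # p-value
```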

Decision and Conclusion Template

Step 4
Decision Template
The p-value [actual value of p-value] ≤ [actual value of α]; therefore we have evidence to
reject the null hypothesis H0.
The p-value [actual value of p-value] > [actual value of α]; therefore we do not have
evidence to reject the null hypothesis H0.

Note the process is the same, but the claim is about the difference in the population means
of the levels of the factor variable.

Conclusion Template (must be in terms of Ha)
The data [does or might] [not] give [strong] support
(p-value = [value]) to the claim that [statement of Ha in words].
 [does] if the decision was reject and α is not that close to the p-value.
 [does not] if the decision was do not reject and α is not that close to the p-value.
 [might or might not] if α is close to the p-value.

ANOVA Table

Source      df      Sum of Squares   Mean Squares        F
Treatment   k − 1   SSA              MSA = SSA/(k − 1)   MSA/MSE
Error       n − k   SSE              MSE = SSE/(n − k)
Total       n − 1   SST

Estimate of σ: s = √MSE

Coffeehouse Example
Example (Coffee House):
How do five coffeehouses around campus differ in the demographics of their customers? Are certain
coffee houses more popular among graduate students? Do professors tend to favor one coffeehouse?

A reporter from the student newspaper asks 50 random customers of each of the five coffeehouses to
respond to a questionnaire. One variable of interest is the customer’s age.


Coffeehouse Example

Check the assumption σ_i² = σ² for all i.

Rule of thumb: max_{1≤i≤k} s_i / min_{1≤i≤k} s_i = √12.97 / √6.99 = 1.36217 ≤ 2,
so the equal-variance assumption is reasonable.

Coffeehouse Example
Checking Normality


Coffeehouse Example
Checking Normality

Some indication of
skewness but ANOVA
procedure is robust to
some skewness.


Coffeehouse Example
Example (Coffee House):
How do five coffeehouses around campus differ in the demographics of their customers? Are certain
coffee houses more popular among graduate students? Do professors tend to favor one coffeehouse?

A reporter from the student newspaper asks 50 random customers of each of the five coffeehouses to
respond to a questionnaire. One variable of interest is the customer's age. Use α = 0.01 to conduct a
hypothesis test to see if there is any difference in the mean age at the coffeehouses around campus.

Step 1
The factor is the coffeehouse; it has 5 levels. The mean age μ_i of the customers at each
coffeehouse (i = 1, …, 5) is the parameter of interest.

Step 2
State the hypotheses.
 H0: μ1 = μ2 = μ3 = μ4 = μ5
 Ha: μi ≠ μj for some i ≠ j

Coffeehouse Example
Step 3
Calculate the test statistic and the p-value.

 F_TS = Between-Group Variation / Within-Group Variation = MSA / MSE

coffeehouse_df <- read.csv("FileLocation.csv", header = TRUE)
coffeehouse_df$Coffeehouse <- as.factor(coffeehouse_df$Coffeehouse)
fit <- aov(Age ~ Coffeehouse, data = coffeehouse_df)
summary(fit)


Coffeehouse Example
ANOVA Table Output from R
Df Sum Sq Mean Sq F value Pr(>F)
Coffeehouse 4 8834 2208.4 22.14 4.4e-15 ***
Residuals 195 19451 99.8
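The F value and p-value in the table can be reproduced from the Df and Sum Sq columns:

```r
# Recover the F statistic and p-value from the printed ANOVA table.
MSA <- 8834 / 4      # Mean Sq for Coffeehouse (SSA / dfA)
MSE <- 19451 / 195   # Mean Sq for Residuals  (SSE / dfE)
fts <- MSA / MSE     # about 22.1, matching the "F value" column
pf(fts, df1 = 4, df2 = 195, lower.tail = FALSE)  # about 4.4e-15
```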


Decision and Conclusion Template

Step 4
Decision

The p-value 4.4 × 10⁻¹⁵ ≤ α = 0.01; therefore we have evidence to reject the null
hypothesis H0.

Conclusion

The data does give strong support (p-value = 4.4 × 10⁻¹⁵) to the claim that at least one of
the coffeehouses around campus differs in the mean age of its customers from the rest.


One-Way ANOVA and the Two Independent Sample t-test when there are Two Populations


t-test and F-test Statistic Relationship (2 Groups)

With k = 2 groups, the overall mean is

 x̄.. = (1/n) Σ_{i=1}^{2} Σ_{j=1}^{n_i} x_ij = (1/(n1 + n2)) [ n1 x̄1. + n2 x̄2. ]

and the F statistic is

 F_TS = MSA / MSE = [ n1 (x̄1. − x̄..)² + n2 (x̄2. − x̄..)² ] / s_p²,

where s_p² = (1/(n − 2)) [ (n1 − 1) s1² + (n2 − 1) s2² ] is the pooled variance, so MSE = s_p².

Substituting x̄.. gives

 x̄1. − x̄.. = n2 (x̄1. − x̄2.) / (n1 + n2) and x̄2. − x̄.. = −n1 (x̄1. − x̄2.) / (n1 + n2),

so the numerator becomes

 n1 n2² (x̄1. − x̄2.)² / (n1 + n2)² + n2 n1² (x̄1. − x̄2.)² / (n1 + n2)²
 = n1 n2 (n1 + n2) (x̄1. − x̄2.)² / (n1 + n2)²
 = (x̄1. − x̄2.)² / (1/n1 + 1/n2).

Therefore

 F_TS = (x̄1. − x̄2.)² / [ s_p² (1/n1 + 1/n2) ] = t_TS²

for the two-sample t statistic under the equal variance assumption.
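The identity F_TS = t_TS² can be confirmed numerically for k = 2 (simulated data; names illustrative):

```r
# One-way ANOVA F equals the squared pooled two-sample t when k = 2.
set.seed(5)
df <- data.frame(
  group = factor(rep(c("A", "B"), times = c(12, 15))),
  y     = rnorm(27, mean = rep(c(10, 11), times = c(12, 15)))
)
f_val <- summary(aov(y ~ group, data = df))[[1]][1, "F value"]
t_val <- t.test(y ~ group, data = df, var.equal = TRUE)$statistic
all.equal(unname(t_val)^2, f_val)  # TRUE
```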


t-test and F-test Differences

                      Two Independent Sample Procedure             ANOVA
Variance Assumption   Same or different (in this class we          Same (in this class we assume
                      do not assume equal variance)                equal variance)
Hypothesis Test       One- or two-tailed test                      Only two-tailed test
Distribution          t-distribution (symmetric)                   F-distribution (positive/right skew)
Null Value            Hypothesized difference (or no null value)   None
Number of Levels      A single factor of only two levels           A single factor of any reasonable
                      (populations)                                number of levels (populations)


Multiple Comparison Procedures


After Rejecting the Null Hypothesis of ANOVA

If the p-value [actual p-value] ≤ [actual α], then we have evidence to reject the null
hypothesis H0: the data does give support (p-value = [value]) to the claim that
[statement of Ha in words].

Can we say more? Which population means are actually different?

If the null hypothesis of equality of the population means for the k levels of a factor is
rejected, i.e.,

 Null Hypothesis H0: μ1 = μ2 = … = μk is rejected,

then we have evidence that at least one population mean differs from the others, i.e.,

 Alternative Hypothesis: μi ≠ μj for some i ≠ j (statistically significant results).

We can compare the pairs.

Graphical Check of Differences

After rejecting H0:

An effects plot plots the sample means of each group, allowing us to see which means are
different, but it ignores the variability of each group.

Side-by-side boxplots allow us to see which means differ and let us compare the variability,
to understand which differences are not masked by the variability.

Simultaneous Comparisons
Rejected the null hypothesis, so at least one mean is different, but which?
Ha: μi ≠ μj for some i ≠ j.

We do not know which, so we need to compare all possible pairs.

For all i ≠ j we can run a two-sample independent procedure with the equal variance
assumption, H0: μi = μj, using the pooled estimate s² = MSE. The pooled variance
confidence interval is

 (x̄i. − x̄j.) ± t_{α/2, df} √( MSE (1/ni + 1/nj) )

However, there is an issue with controlling the Type I error.

Simultaneous Comparisons
For all i ≠ j we can run a two-sample independent procedure, H0: μi = μj.

We do not know which pairs differ, so we need to compare all possible pairs
simultaneously and want to control the Type I error for all of them.

A Type I error is the incorrect rejection of a true null hypothesis
(false identification of statistical significance; a false positive).

The probability of a Type I error in this case is the probability that we conclude
that for at least one pairwise comparison there is a significant difference
between the means when in fact there is not.


Controlling the Overall Type I Error

 (x̄i. − x̄j.) ± t* √( MSE (1/ni + 1/nj) )

The interval needs to be wider to control the overall Type I error for all paired
comparisons. We can control the width by changing the critical value t*.

Family-wise Error Rate (FWER)

The total Type I error (family-wise error rate), assuming H0 is true for all tests:
if each comparison-of-two-means test makes a Type I error independently with
probability α_single, and we run c such tests, then

 FWER = 1 − (1 − α_single)^c

The more comparisons we do, the worse the Type I error if we do not control them together.

Family-wise Error Rate (FWER)

The family-wise error rate (FWER) is the probability of obtaining at least one
significant result from a set of multiple statistical tests when H0 is true for all tests.
If you have k distinct populations, then there are c = k(k − 1)/2 different pairwise
comparisons.

 α_Overall = 1 − (1 − α_single)^c

Note this last line is only true if all the tests are truly independent.

Sidak Correction (not discussed in the book): if the tests are truly independent, we could
specify α_Overall, solve

 α_single = 1 − (1 − α_Overall)^(1/c),

and use this for each test.
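For example, with k = 5 groups (c = 10 pairs) and α_Overall = 0.05, the Sidak per-test level works out to about 0.0051:

```r
# Sidak per-test significance level for a given family-wise level.
alpha_overall <- 0.05
c_pairs <- choose(5, 2)                # 10 pairwise comparisons for k = 5
1 - (1 - alpha_overall)^(1 / c_pairs)  # about 0.00512
```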

Bonferroni Correction
If you have k distinct populations, then there are c = k(k − 1)/2 different comparisons.

 FWER ≤ c · α_single

This is known as Boole's inequality. We specify the overall error rate α_Overall and use

 α_single = α_Overall / c

for the individual tests.


Bonferroni Correction
If you have k distinct populations, then there are c = k(k − 1)/2 different comparisons.

 α_single = α_Overall / c

 (x̄i. − x̄j.) ± t_{α_single/2, n−k} √( MSE (1/ni + 1/nj) )

Note the degrees of freedom are from the estimate of the variance (MSE).

The Bonferroni correction is a conservative correction to the individual Type I error rates
to control the family-wise error rate (FWER). By making the alpha level more stringent for
each comparison, the Bonferroni correction reduces the overall probability of making a
Type I error. However, this comes at the cost of reduced statistical power, since the
stricter level makes it more difficult to reject the null hypothesis. Because it is
conservative, it is not the preferred method when there are many comparisons.
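In R, Bonferroni-adjusted pairwise comparisons are available through pairwise.t.test; a sketch on simulated data (the names and group means are illustrative):

```r
# Bonferroni-adjusted pairwise t tests with a pooled SD (matches using MSE).
set.seed(6)
df <- data.frame(
  group = factor(rep(1:4, each = 15)),
  y     = rnorm(60, mean = rep(c(10, 10, 12, 14), each = 15))
)
0.05 / choose(4, 2)  # per-test level alpha_single for alpha_overall = 0.05
pairwise.t.test(df$y, df$group, p.adjust.method = "bonferroni",
                pool.sd = TRUE)
```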


Tukey’s HSD Method

Technically called the Tukey-Kramer method if we have unequal sample sizes.

Tukey's honestly significant difference (Tukey HSD) controls the family-wise error rate
through the critical value: it ensures that the largest group difference is controlled at
the specified significance level, and all other remaining comparisons use the same critical
value, making the other tests conservative. This test is more powerful than the Bonferroni
method.

 Critical value: t* = Q_{α, k, n−k} / √2, where Q_{α, k, n−k} is a quantile of the
 studentized range distribution.

The studentized range distribution is a distribution for the range statistic:

 Q = (X̄_(Max) − X̄_(Min)) / standard error
 = (range of the sample means) / √( (MSE/2)(1/ni + 1/nj) )

The distribution is positively skewed.


Tukey’s HSD Method

Technically called the Tukey-Kramer method if we have unequal sample sizes.

The critical value is t* = Q_{α, k, n−k} / √2, where Q_{α, k, n−k} is a quantile of the
studentized range distribution. The test statistic for each pair is

 Q_TS = (x̄i. − x̄j.) / √( (MSE/2)(1/ni + 1/nj) ), for all i, j such that i ≠ j.


Tukey’s HSD Method

The critical value t* = Q_{α, k, n−k} / √2 uses a quantile of the studentized range
distribution, computed in R by either of:

 qtukey(p = C, nmeans = k, df = n - k, lower.tail = TRUE), with C = 1 - alpha_overall
 qtukey(p = alpha_overall, nmeans = k, df = n - k, lower.tail = FALSE)

The Tukey confidence interval for each pair is

 (x̄i. − x̄j.) ± (Q_{α, k, n−k} / √2) √( MSE (1/ni + 1/nj) )

This is the preferred method.
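As a numerical sketch, for k = 5 groups and n = 200 at a family-wise level of 0.05:

```r
# Tukey critical value from the studentized range distribution.
k <- 5; n <- 200; alpha_overall <- 0.05
Q <- qtukey(p = 1 - alpha_overall, nmeans = k, df = n - k, lower.tail = TRUE)
t_star <- Q / sqrt(2)   # per-pair critical value on the t scale
c(Q = Q, t_star = t_star)
```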

Tukey’s HSD Method

fit <- aov(quantitativeVariable ~ categoricalVariable, data = dataframe)
summary(fit)                                    # if we have the fitted ANOVA

TukeyHSD(fit, ordered = TRUE, conf.level = C)   # C = 1 - alpha_overall


Dunnett’s Method for Comparing with a Control

Dunnett's method is a multiple comparison test used when comparing multiple treatments to a
control group. It controls the family-wise error rate for the k − 1 pairwise comparisons
between each treatment and the control group, rather than for all possible pairwise
comparisons among the treatments and the control.

This method is not implemented in the standard R packages. You are asked to discover an R
package that implements it and learn to use it on your own in a bonus question for Computer
Assignment 8.


Which Paired Comparisons are Significantly Different?


Creating a graphical display of the output of the paired tests.

Steps (given the output from R):

1. Create contextual symbols to represent the means that will be compared.

2. Order your sample means x̄i. (for i = 1, …, k) in increasing order and put them under the symbols.

3. Draw lines under neighboring pairs of means that were identified as not being statistically significantly different.

4. If all pairs are significantly different from each other, then no lines will be drawn.

5. Combine lines if the pairs indicate that three or more of the means are not significantly different from each other.


Paired Comparison Graphical Display Example

Suppose there are four groups that we label A, B, C, and D, where x̄A ≤ x̄B ≤ x̄C ≤ x̄D, and
we know the significance of the pairs; the results are provided in the table below.

 Pair   Significant?
 B – A  No
 C – A  Yes
 D – A  Yes
 C – B  Yes
 D – B  Yes
 D – C  No

What can we say about the population means? The means μC and μD are larger than the means
μA and μB, but we do not know whether μC is larger than μD or vice versa.

Paired Comparison Graphical Display Example

Suppose there are four groups that we label A, B, C, and D, where x̄A ≤ x̄B ≤ x̄C ≤ x̄D, and
we know the significance of the pairs; the results are provided in the table below.

 Pair   Significant?
 B – A  Yes
 C – A  Yes
 D – A  Yes
 C – B  No
 D – B  No
 D – C  No

What can we say about the population means? The means μB, μC, and μD are larger than the
mean μA, but we do not know which of μB, μC, μD is largest. If A was a control and the
others were treatments of different dosages, then it would say that any dose is
significantly different from the control.


Paired Comparison Graphical Display Example

Suppose there are four groups that we label A, B, C, and D, where x̄A ≤ x̄B ≤ x̄C ≤ x̄D, and
we know the significance of the pairs; the results are provided in the table below.

 Pair   Significant?
 B – A  Yes
 C – A  Yes
 D – A  Yes
 C – B  No
 D – B  Yes
 D – C  No

What can we say about the population means?
• The means μB, μC, and μD are larger than the mean μA.
• μD is statistically significantly larger than μB.
• μB and μC are not statistically significantly different.
• μD is significantly larger than μA and μB, but its difference from μC is not significant;
it may be reasonable to assume that μD is the largest.


Full ANOVA Procedure with Possible Multiple Testing


1. Perform ANOVA: Determine if there is a significant difference among the groups being compared. If not,
no further tests are needed.
2. Choose a family-wise significance level α_Overall.
3. Select a method for pairwise comparisons:
• Use Dunnett's method when comparing to a control group.
• Use Tukey's method when comparing all possible pairs.
4. Calculate the appropriate critical value based on the method chosen in step 3.
5. Compute confidence intervals. (If you have the data R can do both step 4 and 5.)
6. Determine from the confidence intervals which pairs of means are statistically different:
• If 0 is not in the interval, it is statistically significant, and the means are considered different.
Results can be presented in a table.
7. Visually display results: create a graph to display the pairwise comparisons. This is only
necessary for Tukey's method and must be done by hand (or with your own algorithm).
8. Write a conclusion: Summarize results and draw conclusions in complete English sentences that answer
the question.
Steps 2, 3, 7 and 8 must be done by hand.

Coffeehouse Example
Example (Coffee House):
How do five coffeehouses around campus differ in the demographics of their customers? Are certain
coffee houses more popular among graduate students? Do professors tend to favor one coffeehouse?

A reporter from the student newspaper asks 50 random customers of each of the five coffeehouses to
respond to a questionnaire. One variable of interest is the customer’s age.

The data does give strong support (p-value = 4.4 × 10⁻¹⁵) to the claim that at least one of
the coffeehouses around campus differs in the mean age of its customers from the rest.

But which coffeehouses differ and how? Use pairwise comparisons!

Coffeehouse Example
But which coffeehouses differ and how?
Use Pairwise comparisons!
Tukey's honestly significant difference (Tukey HSD): select the method.
We will use a family-wise confidence level of C = 0.95.

coffeehouse_df <- read.csv("FileLocation.csv", header = TRUE)
coffeehouse_df$Coffeehouse <- as.factor(coffeehouse_df$Coffeehouse)
fit <- aov(Age ~ Coffeehouse, data = coffeehouse_df)   # we have already fit the model
summary(fit)

TukeyHSD(fit, ordered = TRUE, conf.level = 0.95)       # get the pairwise CIs


Coffeehouse Example
But which coffeehouses differ and how?
Use Pairwise comparisons!
Tukey’s honestly significant difference (Tukey HSD):

The p-values in the TukeyHSD output are adjusted for multiple comparisons.


Coffeehouse Example
But which coffeehouses differ and how?
Use Pairwise comparisons!
Tukey’s honestly significant difference (Tukey HSD):
Create graphical display.

We can conclude that Coffeehouse4 has the smallest mean age and Coffeehouse2 has the
largest mean age. However, the comparison of Coffeehouse3 with Coffeehouse2 is only barely
significant.
