Chapter Seven: Multi-Sample Methods



The independent samples t test and the independent samples Z test
for a difference between proportions are designed to analyze data
from research designs that employ two groups of subjects.
You will now study methods that can be used to analyze data from
two or more groups.
We will also consider a few multiple comparison procedures (MCPs).

7.1 Introduction 2/52

One-Way ANOVA: Hypotheses

You will recall that the null hypothesis tested by the independent
samples t test is H0 : µ1 = µ2 . This can be interpreted as asserting
that the treatments afforded two groups of subjects were equal in
their effect.
The null hypothesis tested by the One-Way ANOVA F test is

H0 : µ1 = µ2 = · · · = µk

which can be interpreted as asserting that the treatments afforded a
specified number of groups (k) of subjects were equal in their effect.

7.2 One-Way ANOVA F Test 3/52

Hypotheses (continued)

For a three group design the null hypothesis is

H0 : µ1 = µ2 = µ3
The alternative is any condition that makes the null hypothesis false,
which for the three group design would be

1 µ1 = µ2 ≠ µ3
2 µ1 ≠ µ2 = µ3
3 µ1 = µ3 ≠ µ2
4 µ1 ≠ µ2 ≠ µ3

7.2 One-Way ANOVA F Test 4/52

Obtained F

The test statistic for the One-Way ANOVA F test is a ratio given by

$$F = \frac{MS_b}{MS_w}$$

The numerator is termed the mean square between (MSb) while
the denominator is termed the mean square within (MSw).

7.2 One-Way ANOVA F Test 5/52

The Mean Square Within (MSw )

The mean square within (MSw) is also a ratio and is defined as

$$MS_w = \frac{SS_w}{N - k}$$

Here SSw is the sum of squares within, N is the total number of
observations, and k is the number of groups. The quantity N − k is
termed the denominator degrees of freedom. For example, if there
are three groups with five subjects in each, N = 15, k = 3 and the
denominator degrees of freedom is 15 − 3 = 12.

7.2 One-Way ANOVA F Test 6/52

The Sum of Squares Within (SSw )

The sum of squares within is the sum of the sums of squares for the
individual groups or

SSw = SS1 + SS2 + · · · + SSk

The sum of squares for a given group can be calculated by

$$SS = \sum (x - \bar{x})^2$$

or equivalently

$$SS = \sum x^2 - \frac{\left(\sum x\right)^2}{n}$$
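As a quick check that the two expressions for SS agree, here is a minimal Python sketch. The function names `ss_definitional` and `ss_computational` and the three sample values are illustrative assumptions and do not come from the text.

```python
def ss_definitional(xs):
    """Sum of squared deviations of each observation from the group mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def ss_computational(xs):
    """Sum of the squared observations minus (sum of observations)^2 / n."""
    return sum(x * x for x in xs) - sum(xs) ** 2 / len(xs)


# Illustrative values only (not taken from the text).
sample = [2, 4, 6]
print(ss_definitional(sample), ss_computational(sample))
```

Both calls should print the same value, which is why the computational form can be used in hand calculations.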

7.2 One-Way ANOVA F Test 7/52

The Sum of Squares Within (SSw ) (continued)

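Thus, SSw can be calculated by

$$SS_w = \left[\sum x_1^2 - \frac{\left(\sum x_1\right)^2}{n_1}\right] + \left[\sum x_2^2 - \frac{\left(\sum x_2\right)^2}{n_2}\right] + \cdots + \left[\sum x_k^2 - \frac{\left(\sum x_k\right)^2}{n_k}\right]$$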

7.2 One-Way ANOVA F Test 8/52


The (fictitious) data in the accompanying table represent the weights
of subjects who have been engaged in three different dieting
regimens. Use these data to calculate MSw.

Diet One   Diet Two   Diet Three
   198        214        174
   211        200        176
   240        259        213
   189        194        201
   178        188        158

7.2 One-Way ANOVA F Test 9/52


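The sums of squares for the three individual groups are as follows.

$$SS_1 = \sum x_1^2 - \frac{\left(\sum x_1\right)^2}{n_1} = 208730 - \frac{(1016)^2}{5} = 2278.8$$

$$SS_2 = \sum x_2^2 - \frac{\left(\sum x_2\right)^2}{n_2} = 225857 - \frac{(1055)^2}{5} = 3252.0$$

$$SS_3 = \sum x_3^2 - \frac{\left(\sum x_3\right)^2}{n_3} = 171986 - \frac{(922)^2}{5} = 1969.2$$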

7.2 One-Way ANOVA F Test 10/52

Solution (continued)

SSw is then by Equation 7.4

SSw = SS1 + SS2 + SS3 = 2278.8 + 3252.0 + 1969.2 = 7500.0

Then by Equation 7.3

$$MS_w = \frac{SS_w}{N - k} = \frac{7500.0}{15 - 3} = 625$$
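The hand calculation of MSw for the dieting data can be reproduced with a short Python sketch. The helper names `sum_of_squares` and `mean_square_within` are hypothetical and chosen only for illustration; the data are the weights from the table on slide #9.

```python
def sum_of_squares(xs):
    """Computational form: sum of x^2 minus (sum of x)^2 / n."""
    return sum(x * x for x in xs) - sum(xs) ** 2 / len(xs)


def mean_square_within(groups):
    """SSw divided by the denominator degrees of freedom N - k."""
    ss_w = sum(sum_of_squares(g) for g in groups)
    n_total = sum(len(g) for g in groups)
    return ss_w / (n_total - len(groups))


# Weights from the dieting example (slide #9).
diet_one = [198, 211, 240, 189, 178]
diet_two = [214, 200, 259, 194, 188]
diet_three = [174, 176, 213, 201, 158]
groups = [diet_one, diet_two, diet_three]

ss_by_group = [sum_of_squares(g) for g in groups]
ms_w = mean_square_within(groups)
print([round(ss, 1) for ss in ss_by_group], round(ms_w, 2))
```

The group sums of squares and MSw printed here should match the values worked out by hand, up to floating-point rounding.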

7.2 One-Way ANOVA F Test 11/52

The Mean Square Between (MSb )

As with the mean square within, the mean square between is a ratio
of a sum of squares to a degrees of freedom. More precisely,

$$MS_b = \frac{SS_b}{k - 1}$$

where SSb is the sum of squares between and k is the number of
groups. The quantity k − 1 is termed the numerator degrees of
freedom. For example, if there are three groups the numerator
degrees of freedom is 3 − 1 = 2.

7.2 One-Way ANOVA F Test 12/52

The Sum of Squares Between SSb

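$$SS_b = \frac{\left(\sum_{i=1}^{n_1} x_{i1}\right)^2}{n_1} + \frac{\left(\sum_{i=1}^{n_2} x_{i2}\right)^2}{n_2} + \cdots + \frac{\left(\sum_{i=1}^{n_k} x_{ik}\right)^2}{n_k} - \frac{\left(\sum_{\text{All}} x_{..}\right)^2}{N}$$
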
The terms before the minus sign indicate that the observations in
each group are to be summed with the sum then being squared and
the result then being divided by the number of observations in the
group. This calculation is carried out for each group with the results
then being summed. The term after the minus sign indicates that all
observations are to be summed and the result squared. The division
of this term is by N which represents the total number of
observations—i.e., n1 + n2 + · · · + nk .

7.2 One-Way ANOVA F Test 13/52


Use the data in the table on slide #9 to calculate MSb. Then
calculate the One-Way ANOVA F statistic.

7.2 One-Way ANOVA F Test 14/52


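By Equation 7.8

$$\begin{aligned}
SS_b &= \frac{\left(\sum_{i=1}^{n_1} x_{i1}\right)^2}{n_1} + \frac{\left(\sum_{i=1}^{n_2} x_{i2}\right)^2}{n_2} + \frac{\left(\sum_{i=1}^{n_3} x_{i3}\right)^2}{n_3} - \frac{\left(\sum_{\text{All}} x_{..}\right)^2}{N} \\
&= \frac{(1016)^2}{5} + \frac{(1055)^2}{5} + \frac{(922)^2}{5} - \frac{(2993)^2}{15} \\
&= 599073 - 597203.267 \\
&= 1869.73
\end{aligned}$$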

7.2 One-Way ANOVA F Test 15/52

Solution (continued)

Dividing SSb by the numerator degrees of freedom of
k − 1 = 3 − 1 = 2 yields MSb = 1869.73/2 = 934.87.¹
Using the mean square within calculation from slide #11

$$F = \frac{MS_b}{MS_w} = \frac{934.87}{625.00} = 1.50$$

Thus, obtained F for the test of significance is 1.50.

¹ The result of 934.88 provided on page 269 of the text is based on a different
calculation where rounding was done a bit differently.
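The SSb, MSb and F calculations for the dieting data can be checked with a brief Python sketch. The function name `sum_of_squares_between` is a hypothetical helper, and MSw is taken as 625 from the earlier hand calculation rather than recomputed.

```python
def sum_of_squares_between(groups):
    """Sum of (group total)^2 / n_j over groups, minus (grand total)^2 / N."""
    grand_total = sum(sum(g) for g in groups)
    n_total = sum(len(g) for g in groups)
    per_group = sum(sum(g) ** 2 / len(g) for g in groups)
    return per_group - grand_total ** 2 / n_total


# Weights from the dieting example (slide #9).
groups = [
    [198, 211, 240, 189, 178],
    [214, 200, 259, 194, 188],
    [174, 176, 213, 201, 158],
]

ss_b = sum_of_squares_between(groups)
ms_b = ss_b / (len(groups) - 1)  # numerator degrees of freedom k - 1
ms_w = 625.0                     # mean square within from slide #11
f_obtained = ms_b / ms_w
print(round(ss_b, 2), round(ms_b, 2), round(f_obtained, 2))
```

Because the sketch carries full precision throughout, its MSb may differ from a hand calculation in the last decimal place.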
7.2 One-Way ANOVA F Test 16/52
The Test of Significance

The test of significance is conducted by comparing obtained F to
critical F. If the former is greater than or equal to the latter, the null
hypothesis is rejected. Otherwise, the null hypothesis is not rejected.
Critical F is obtained by first noting that the numerator degrees of
freedom for the analysis are k − 1 = 3 − 1 = 2 and the denominator
degrees of freedom are N − k = 15 − 3 = 12. To use Appendix C, the
numerator degrees of freedom are located across the top of the table
and the denominator degrees of freedom down the side. For α = .05
with 2 and 12 degrees of freedom, Appendix C shows that critical F is
3.89. In this case 1.50 is not greater than or equal to 3.89 so the null
hypothesis is not rejected.

7.2 One-Way ANOVA F Test 17/52


Suppose a study is conducted with treatments being administered to
four independent groups of subjects. Suppose further that n1 = 5,
n2 = 7, n3 = 8 and n4 = 4. Obtained F is calculated to be 4.19.
Use this information to conduct a One-Way ANOVA F test at
α = .05. What is the null hypothesis being tested? What is your
decision regarding the null hypothesis?

7.2 One-Way ANOVA F Test 18/52


The numerator degrees of freedom are k − 1 = 4 − 1 = 3.
N = 5 + 7 + 8 + 4 = 24 so that the denominator degrees of freedom
are N − k = 24 − 4 = 20.
Reference to Appendix C gives critical F as 3.10.
Because obtained F of 4.19 is greater than critical F of 3.10, the null
hypothesis H0 : µ1 = µ2 = µ3 = µ4 is rejected.
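The degrees-of-freedom bookkeeping and the decision rule can be sketched in Python as follows. The function names are hypothetical, and the critical value is assumed to be looked up in a table such as Appendix C and passed in by the user; the sketch does not compute critical values itself.

```python
def anova_degrees_of_freedom(group_sizes):
    """Return (numerator df, denominator df) = (k - 1, N - k)."""
    k = len(group_sizes)
    n_total = sum(group_sizes)
    return k - 1, n_total - k


def reject_null(obtained_f, critical_f):
    """Reject H0 when obtained F is greater than or equal to critical F."""
    return obtained_f >= critical_f


# Four-group example with unequal sample sizes.
df_four_groups = anova_degrees_of_freedom([5, 7, 8, 4])

# Dieting example: obtained F = 1.50, critical F = 3.89 for 2 and 12 df.
df_diet = anova_degrees_of_freedom([5, 5, 5])
diet_decision = reject_null(1.50, 3.89)
print(df_four_groups, df_diet, diet_decision)
```

Keeping the table lookup outside the function mirrors the hand procedure, where the degrees of freedom are computed first and then used to enter the table.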

7.2 One-Way ANOVA F Test 19/52

The ANOVA Table

The results of a One-Way ANOVA analysis are traditionally reported in a
table similar to the one shown here.

Source of    Sum of           Mean                    Critical
Variation    Squares   df     Squares      F Ratio    F          p-value
Between      SSb       k−1    SSb/(k−1)    MSb/MSw    (table)    (computer)
Within       SSw       N−k    SSw/(N−k)
Total        SSt       N−1

7.2 One-Way ANOVA F Test 20/52


The assumptions underlying the ANOVA F test are the same as those
underlying the independent samples t test, namely
1 Population normality
2 Homogeneous variances
3 Independence of observations

7.2 One-Way ANOVA F Test 21/52

The 2 By k Chi-Square Test: Hypotheses

In Chapter 6 on page 230 you learned to test the null hypothesis
H0 : π1 = π2 by means of an independent samples Z test.
The 2 by k chi-square test extends this concept to test for equality of
any number of proportions.
This null hypothesis is stated as

H0 : π1 = π2 = · · · = πk

which asserts that all population proportions are equal.

The notation indicates that the equality extends to any number of
groups with the last group characterized as group k.

7.3 The 2 By k Chi-Square Test 22/52

The Alternative Hypothesis

The alternative hypothesis is any condition that renders the null
hypothesis false. Thus, given three groups, any of the following
conditions, barring a Type II error, would cause rejection of the null
hypothesis.
1 π1 = π2 ≠ π3
2 π1 ≠ π2 = π3
3 π1 = π3 ≠ π2
4 π1 ≠ π2 ≠ π3
When the null hypothesis is rejected, there is no way to know which
of the four conditions listed above caused the rejection.

7.3 The 2 By k Chi-Square Test 23/52

Obtained χ²

As with other statistics with which you are now familiar, the
hypothesis test is carried out by calculating an obtained value with a
subsequent comparison to a critical value.
For the chi-square test the obtained value is calculated by

$$\chi^2 = \sum_{\text{all cells}} \left[\frac{(f_o - f_e)^2}{f_e}\right]$$

where fo and fe are referred to respectively as the observed and
expected frequencies. The observed frequency is simply the number
of outcomes occurring in the given cell as shown in the table on the
next slide (slide #25).

7.3 The 2 By k Chi-Square Test 24/52

2 by 3 Chi-Square Table

In this table we have used double subscripts to indicate the row and
column of each cell entry.

             Group One   Group Two   Group Three
Outcome 1    fo11        fo12        fo13
             fe11        fe12        fe13
Outcome 2    fo21        fo22        fo23
             fe21        fe22        fe23

7.3 The 2 By k Chi-Square Test 25/52

Observed (fo) and expected (fe) frequencies

The observed frequency is simply the number of outcomes falling
into a given cell of the chi-square table.
The expected frequency represents the expected number of
outcomes to be found in each cell if the null hypothesis is true.
The expected frequency is calculated as follows.

$$f_e = \frac{(N_R)(N_C)}{N}$$

where NR is the row total for the cell whose expected frequency is
being calculated, NC is the column total for the same cell, and N is
the total number of observations.

7.3 The 2 By k Chi-Square Test 26/52


Suppose that in the treatment of a terminal illness, the following
results are obtained. Of the patients receiving treatment one, 17 are
dead at the end of five years while 52 are still alive. For treatment
two, 29 are dead while 54 remain alive and for treatment three 11 are
dead and 26 remain alive. Use these data to construct a chi-square
table then test the hypothesis H0 : π1 = π2 = π3 .

7.3 The 2 By k Chi-Square Test 27/52

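We begin by placing the observed frequency of each cell into a
chi-square table as shown on the next slide (#29). We then calculate
the expected frequency for each cell as follows, where ND and NA are
the row totals for dead and alive patients and NG1, NG2 and NG3 are the
column totals for the three treatment groups.

$$f_{e11} = \frac{(N_D)(N_{G1})}{N} = \frac{(57)(69)}{189} = 20.81$$

$$f_{e12} = \frac{(N_D)(N_{G2})}{N} = \frac{(57)(83)}{189} = 25.03$$

$$f_{e13} = \frac{(N_D)(N_{G3})}{N} = \frac{(57)(37)}{189} = 11.16$$

$$f_{e21} = \frac{(N_A)(N_{G1})}{N} = \frac{(132)(69)}{189} = 48.19$$

$$f_{e22} = \frac{(N_A)(N_{G2})}{N} = \frac{(132)(83)}{189} = 57.97$$

$$f_{e23} = \frac{(N_A)(N_{G3})}{N} = \frac{(132)(37)}{189} = 25.84$$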

7.3 The 2 By k Chi-Square Test 28/52

Solution (continued)

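Observed frequencies are shown in brackets and expected frequencies in
parentheses.

           Group One   Group Two   Group Three   Total
Dead       [17]        [29]        [11]          57
           (20.81)     (25.03)     (11.16)
Alive      [52]        [54]        [26]          132
           (48.19)     (57.97)     (25.84)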

7.3 The 2 By k Chi-Square Test 29/52

Solution (continued)

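Obtained chi-square is then

$$\begin{aligned}
\chi^2 &= \sum_{\text{all cells}} \left[\frac{(f_o - f_e)^2}{f_e}\right] \\
&= \frac{(17 - 20.81)^2}{20.81} + \frac{(29 - 25.03)^2}{25.03} + \frac{(11 - 11.16)^2}{11.16} + \frac{(52 - 48.19)^2}{48.19} \\
&\quad + \frac{(54 - 57.97)^2}{57.97} + \frac{(26 - 25.84)^2}{25.84} \\
&= .70 + .63 + .00 + .30 + .27 + .00 \\
&= 1.9
\end{aligned}$$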
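The expected frequencies and obtained chi-square for the treatment data can be reproduced with a minimal Python sketch. The function names `expected_frequencies` and `chi_square` are hypothetical; rows are assumed to hold the outcomes (dead, alive) and columns the three treatment groups.

```python
def expected_frequencies(observed):
    """Expected count for each cell: (row total)(column total) / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n_total = sum(row_totals)
    return [[r * c / n_total for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Sum over all cells of (fo - fe)^2 / fe."""
    expected = expected_frequencies(observed)
    return sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )


# Rows: dead, alive. Columns: treatments one, two, three.
observed = [[17, 29, 11], [52, 54, 26]]
expected = expected_frequencies(observed)
chi2_obtained = chi_square(observed)
print([[round(fe, 2) for fe in row] for row in expected], round(chi2_obtained, 2))
```

The sketch uses unrounded expected frequencies, so its result may differ slightly from a hand calculation that rounds each term to two decimals.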

7.3 The 2 By k Chi-Square Test 30/52

Solution (continued)

The critical value is obtained by entering Appendix D with k − 1
degrees of freedom where k is the number of groups. For α = .05 and
3 − 1 = 2 degrees of freedom, critical χ² is 5.991.
The null hypothesis is rejected when obtained chi-square is greater
than or equal to critical chi-square. Because 1.9 is less than 5.991,
the null hypothesis is not rejected.
We conclude, therefore, that a difference between population
proportions cannot be demonstrated. In research terms, we conclude
that we could not show a difference in the effectiveness of the three
treatments.

7.3 The 2 By k Chi-Square Test 31/52

Multiple Comparison Procedures: Introduction

You have learned that rejecting a true null hypothesis when
conducting a significance test results in a Type I error and that the
probability of such is α. For the purposes that follow we will term this
type of rejection a Per Comparison Error (PCE) and will
symbolize the probability of such as αPCE .
A Familywise Error (FWE) occurs when one or more true null
hypotheses are rejected in a series of tests. The probability of such is
symbolized αFWE .

7.4 Multiple Comparison Procedures 32/52

Introduction (continued)

Familywise errors occur in two broad contexts.

1 Multiple comparison analysis refers to the situation where multiple
groups are being compared on a single outcome variable.
2 Multiple endpoint analysis refers to the situation where two groups
are being compared on multiple outcome measures.

7.4 Multiple Comparison Procedures 33/52

Determinants of Familywise Error

The following observations are demonstrated in the table on the next
slide (#35).
Other factors remaining fixed, as the number of comparisons (i.e.
significance tests) increases, αFWE increases.
Other factors remaining fixed, as αPCE decreases (increases), αFWE
decreases (increases).

7.4 Multiple Comparison Procedures 34/52

Relationship Between αPCE and αFWE

        Number of   Number of
αPCE    Groups      Comparisons   αFWE
.05     3           3             .122
        5           10            .286
        10          45            .630
        20          190           .920

.01     3           3             .027
        5           10            .075
        10          45            .231
        20          190           .528
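The Number of Comparisons column is consistent with counting all pairwise comparisons among k groups, k(k − 1)/2. The Python sketch below computes that count and, purely for orientation, the familywise error rate that would hold if the tests were independent, 1 − (1 − αPCE)^m. That independence formula is an assumption that does not appear in the slides, and it does not reproduce the tabled αFWE values (for three tests at .05 it gives about .143 rather than .122), presumably because pairwise comparisons that share a group are not independent. Both function names are hypothetical.

```python
def n_pairwise_comparisons(k):
    """Number of distinct pairs among k groups: k(k - 1) / 2."""
    return k * (k - 1) // 2


def fwe_if_independent(alpha_pce, n_tests):
    """Familywise error rate under the ASSUMPTION of independent tests."""
    return 1 - (1 - alpha_pce) ** n_tests


comparison_counts = {k: n_pairwise_comparisons(k) for k in (3, 5, 10, 20)}
print(comparison_counts)
print(round(fwe_if_independent(0.05, 3), 3))
```

Either way, the qualitative pattern is the same as in the table: more comparisons or a larger αPCE both push αFWE upward.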

7.4 Multiple Comparison Procedures 35/52

Controlling Familywise Errors

When you reject a single null hypothesis the interpretation is clear.
You have an αPCE probability that you did so incorrectly.
When you perform a series of tests and reject one or more null
hypotheses, the interpretation is not so clear. Did you reject these
hypotheses because they are false or because the familywise Type I
error rate is so high that rejections were highly likely even in the face
of true null hypotheses?
You were confident in your result for the single test because you were
able to control the probability of a false rejection at αPCE . You could
gain this same confidence in your results for multiple tests if you
could control αFWE to some specified level—.05 for example.

7.4 Multiple Comparison Procedures 36/52

The Bonferroni Method Of Controlling Familywise Error

As shown in the table on slide #35, αFWE can be reduced by
reducing αPCE . But suppose you wish to establish αFWE at some
specified value, for example .05.
How low must you set αPCE in order to have αFWE be .05?
One of the oldest, simplest, and most widely used methods for finding
this level is known as the Bonferroni adjustment.

7.4 Multiple Comparison Procedures 37/52

The Bonferroni Method (continued)

The adjustment is made by the following equation.

$$\alpha_{PCE} = \frac{\alpha_{FWE}}{N_T}$$

where NT represents the number of tests to be performed.
Thus, for example, if we wish to control αFWE at .05 while we
perform three tests, each test would be carried out at the .05/3 = .017
level of significance.
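The Bonferroni adjustment is a single division, as the following minimal Python sketch shows. The function name `bonferroni_alpha` is a hypothetical choice for illustration.

```python
def bonferroni_alpha(alpha_fwe, n_tests):
    """Per-comparison level that holds familywise error at alpha_fwe."""
    return alpha_fwe / n_tests


# Three tests with familywise error controlled at .05.
alpha_pce = bonferroni_alpha(0.05, 3)
print(round(alpha_pce, 3))
```

Each individual p-value would then be compared to this per-comparison level rather than to .05.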

7.4 Multiple Comparison Procedures 38/52

The Step-Down Bonferroni Method

In 1979, Holm proposed a modification to the Bonferroni procedure
that is usually more powerful than, is never less powerful than, and
maintains familywise error at the same level as, the classical
Bonferroni procedure.
This modified Bonferroni, or more properly, step-down Bonferroni
procedure is illustrated on slide #41 and is carried out as follows.

7.4 Multiple Comparison Procedures 39/52

The Step-Down Bonferroni Method (continued)
1 The multiple test statistics are calculated.
2 The p-value for each statistic calculated in 1 is obtained.
3 The p-values are ordered from smallest to largest with the smallest
being designated p(1) , the second smallest p(2) and so forth with the
largest being p(NT ) where NT is the number of tests.
4 At the first step, p(1) is compared to αFWE/NT. If p(1) ≤ αFWE/NT, the test is
declared significant and the second step is carried out. If p(1) > αFWE/NT,
the test is declared nonsignificant and testing ceases with all
remaining comparisons being declared not significant.
5 If the first step is significant, step two is carried out by comparing p(2)
with αFWE/(NT − 1). If p(2) ≤ αFWE/(NT − 1), the result is declared significant and
testing continues to the next step. Otherwise, the test is declared
nonsignificant and testing ceases with all remaining tests being
declared nonsignificant.
6 The steps are continued as shown in the figure on slide #41 until a
nonsignificant result is obtained or until the last step is completed.
7.4 Multiple Comparison Procedures 40/52
The Step-Down Bonferroni Method (continued)

Figure: An illustration of the step-down Bonferroni multiple comparison procedure

             Step One   Step Two       Step Three     ...   Step NT
P-value      p(1)       p(2)           p(3)           ...   p(NT)
Step-down    αFWE/NT    αFWE/(NT−1)    αFWE/(NT−2)    ...   αFWE/1
Classical    αFWE/NT    αFWE/NT        αFWE/NT        ...   αFWE/NT

7.4 Multiple Comparison Procedures 41/52


A researcher involved in a study employing multiple groups of subjects
wishes to test a series of null hypotheses by means of independent
samples t tests. The null hypotheses with accompanying p-values
associated with each test are given below. Use these results to
perform a step-down Bonferroni procedure with αFWE not to exceed
.05. How do these results compare to results that would be obtained
from classical Bonferroni tests?

H0          p-value
µ1 = µ3     .0111
µ2 = µ4     .0419
µ2 = µ5     .0090
µ3 = µ4     .0200
µ4 = µ5     .0181

7.4 Multiple Comparison Procedures 42/52


The five p values, along with the hypothesis test from which each was
derived, are listed in ascending order below. Also shown are the
step-down values of αPCE (S-D) and the classical Bonferroni values of
αPCE (CB) for each test of significance.
As may be seen the tests of µ2 − µ5 and µ1 − µ3 are significant while
the remaining tests are not.
It is important to understand that the tests of µ3 − µ4 and µ2 − µ4
are automatically declared nonsignificant at this point due to the
stopping rule.
Notice that had the researcher employed the classical Bonferroni
method, which unfortunately is still common practice, only µ2 − µ5
would have been significant.

7.4 Multiple Comparison Procedures 43/52

Solution (continued)

Test        µ2 − µ5   µ1 − µ3   µ4 − µ5   µ3 − µ4   µ2 − µ4
p-value     .0090     .0111     .0181     .0200     .0419
S-D αPCE    .0100     .0125     .0167     .0250     .0500
CB αPCE     .0100     .0100     .0100     .0100     .0100
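The step-down procedure, including its stopping rule, can be sketched in Python as follows. The function names `step_down_bonferroni` and `classical_bonferroni` and the string labels for the hypotheses are illustrative assumptions; the p-values are those given in the example.

```python
def step_down_bonferroni(p_values, alpha_fwe=0.05):
    """Return labels declared significant by the step-down rule, in test order."""
    n_tests = len(p_values)
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    significant = []
    for step, (label, p) in enumerate(ordered):
        # Step 1 uses alpha/NT, step 2 uses alpha/(NT - 1), and so on.
        if p <= alpha_fwe / (n_tests - step):
            significant.append(label)
        else:
            break  # stopping rule: all remaining tests are nonsignificant
    return significant


def classical_bonferroni(p_values, alpha_fwe=0.05):
    """Return labels significant when every test uses alpha/NT."""
    cutoff = alpha_fwe / len(p_values)
    return [label for label, p in p_values.items() if p <= cutoff]


p_values = {
    "mu1 = mu3": 0.0111,
    "mu2 = mu4": 0.0419,
    "mu2 = mu5": 0.0090,
    "mu3 = mu4": 0.0200,
    "mu4 = mu5": 0.0181,
}
step_down_result = step_down_bonferroni(p_values)
classical_result = classical_bonferroni(p_values)
print(step_down_result, classical_result)
```

Note that the `break` matters: the p-value of .0200 is below its nominal step-down level of .0250, yet it is never tested because the third step already failed.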


7.4 Multiple Comparison Procedures 44/52

Tukey’s HSD Method

Tukey’s HSD (Honestly Significant Difference) test is designed for use
in multiple comparison settings where all pairwise comparisons of
group means are to be carried out.
These tests are conducted by computing the test statistic, commonly
symbolized as q, for each of the k(k − 1)/2 comparisons with the
resultant q statistics then being referenced to an appropriate table of
critical values.

7.4 Multiple Comparison Procedures 45/52

Tukey’s HSD Method (continued)

The test statistic is defined as follows.

$$q_{ij} = \frac{\bar{x}_i - \bar{x}_j}{\sqrt{MS_w / n_h}}$$

The subscripts i and j denote the two groups being compared so that
x̄i and x̄j are the means of groups i and j respectively. MSw is the
mean square within as computed for a one-way ANOVA via Equations
7.3 and 7.4.

7.4 Multiple Comparison Procedures 46/52

Tukey’s HSD Method (continued)

The symbol nh represents the harmonic mean of the two sample sizes
and is computed as

$$n_h = \frac{2}{\frac{1}{n_i} + \frac{1}{n_j}}$$

When ni = nj , nh = n which is the sample size of either group.

7.4 Multiple Comparison Procedures 47/52


Use the data from the dieting study depicted on slide #9 to perform
Tukey’s HSD test. Begin by stating the null hypotheses to be tested,
then perform the tests and finally, state your conclusions. Maintain
αFWE at .05.

7.4 Multiple Comparison Procedures 48/52


Because there are three groups and we wish to make all pairwise
comparisons, we will have 3(2)/2 = 3 hypotheses to test. They are

H0 : µ1 = µ2
H0 : µ1 = µ3
H0 : µ2 = µ3

7.4 Multiple Comparison Procedures 49/52

Solution (continued)

The means of the three groups are as follows.

x̄1 = 203.2
x̄2 = 211.0
x̄3 = 184.4

Previous calculations (see slides #10 and 11) obtained when
performing a one-way ANOVA on these data provide the following.

MSw = 625

Because sample sizes are the same for all groups, nh will be

$$n_h = \frac{2}{\frac{1}{n_i} + \frac{1}{n_j}} = \frac{2}{\frac{1}{5} + \frac{1}{5}} = 5$$

for all comparisons.

7.4 Multiple Comparison Procedures 50/52
Solution (continued)

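The test statistics for the three comparisons are by Equation 7.13

$$q_{12} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{MS_w / n_h}} = \frac{203.2 - 211.0}{\sqrt{625 / 5}} = -.698$$

$$q_{13} = \frac{\bar{x}_1 - \bar{x}_3}{\sqrt{MS_w / n_h}} = \frac{203.2 - 184.4}{\sqrt{625 / 5}} = 1.682$$

$$q_{23} = \frac{\bar{x}_2 - \bar{x}_3}{\sqrt{MS_w / n_h}} = \frac{211.0 - 184.4}{\sqrt{625 / 5}} = 2.379$$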

7.4 Multiple Comparison Procedures 51/52

Solution (continued)

Critical values of q are obtained from Appendix E. The table is
entered with the number of means in the analysis and the appropriate
degrees of freedom which are N − k.
Referencing Appendix E for 3 means and 12 degrees of freedom yields
a critical value of 3.773.
As may be seen, none of the hypotheses are rejected so that no
differences between group means can be demonstrated.
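The three q statistics and the resulting decisions can be reproduced with a short Python sketch. The function names `harmonic_n` and `tukey_q` are hypothetical, the critical value 3.773 is the one read from Appendix E in the text, and comparing the absolute value of q to the critical value is an assumption made here so that the sign of the mean difference does not affect the decision.

```python
import math
from itertools import combinations


def harmonic_n(n_i, n_j):
    """Harmonic mean of the two sample sizes."""
    return 2 / (1 / n_i + 1 / n_j)


def tukey_q(mean_i, mean_j, ms_w, n_i, n_j):
    """Mean difference divided by sqrt(MSw / nh)."""
    return (mean_i - mean_j) / math.sqrt(ms_w / harmonic_n(n_i, n_j))


means = {1: 203.2, 2: 211.0, 3: 184.4}  # group means from the dieting data
sizes = {1: 5, 2: 5, 3: 5}
MS_W = 625.0        # mean square within from the one-way ANOVA
CRITICAL_Q = 3.773  # Appendix E value for 3 means and 12 df, per the text

q_stats = {
    (i, j): tukey_q(means[i], means[j], MS_W, sizes[i], sizes[j])
    for i, j in combinations(sorted(means), 2)
}
rejected = [pair for pair, q in q_stats.items() if abs(q) >= CRITICAL_Q]
print({pair: round(q, 3) for pair, q in q_stats.items()}, rejected)
```

An empty `rejected` list corresponds to the conclusion that no pairwise difference between group means can be demonstrated.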

7.4 Multiple Comparison Procedures 52/52
