Unit 5 Mba 1ST

The document provides an overview of key concepts in hypothesis testing and analysis of variance (ANOVA). It discusses the goals of hypothesis testing as examining opposing hypotheses (H0 and HA) through collecting sample evidence. The null hypothesis states the assumption to be tested, while the alternative hypothesis challenges the status quo. One-sample and two-sample tests are introduced, along with concepts like p-values, type I and type II errors, and parametric vs. non-parametric tests. ANOVA is introduced as a framework to study the effect of qualitative variables on a quantitative outcome, with examples given of one-way ANOVA and partitioning total variation.


Hypothesis Testing and ANOVA
Goals

• Overview of key elements of hypothesis testing

• Review of common one and two sample tests

• Introduction to ANOVA
Hypothesis Testing

• The intent of hypothesis testing is to formally examine two opposing conjectures (hypotheses), H0 and HA

• These two hypotheses are mutually exclusive and exhaustive, so that one is true to the exclusion of the other

• We accumulate evidence - collect and analyze sample information - for the purpose of determining which of the two hypotheses is true and which is false
The Null and Alternative Hypothesis
The null hypothesis, H0:
• States the assumption (numerical) to be tested
• Begin with the assumption that the null hypothesis is TRUE
• Always contains the ‘=’ sign

The alternative hypothesis, Ha:


• Is the opposite of the null hypothesis
• Challenges the status quo
• Never contains just the ‘=’ sign
• Is generally the hypothesis that is believed to be true by
the researcher
One and Two Sided Tests
• Hypothesis tests can be one or two sided (tailed)

• One tailed tests are directional:

H0: μ1 - μ2 ≤ 0
HA: μ1 - μ2 > 0

• Two tailed tests are not directional:

H0: μ1 - μ2 = 0
HA: μ1 - μ2 ≠ 0
P-values
• Calculate a test statistic from the sample data that is relevant to the hypothesis being tested

• After calculating the test statistic, we convert it to a P-value by comparing its value to the distribution of the test statistic under the null hypothesis

• The P-value is a measure of how likely the observed test statistic value is under the null hypothesis

P-value ≤ α  →  Reject H0 at level α
P-value > α  →  Do not reject H0 at level α
When To Reject H0
Level of significance, α: specified before an experiment to define the rejection region
Rejection region: the set of all test statistic values for which H0 will be rejected

One sided test (α = 0.05): Critical Value = −1.64
Two sided test (α = 0.05): Critical Values = −1.96 and +1.96
Some Notation
• In general, critical values for an α level test are denoted as:

One sided test: Xα
Two sided test: Xα/2

where X depends on the distribution of the test statistic

• For example, if X ~ N(0,1):

One sided test: zα (e.g., z0.05 = 1.64)
Two sided test: zα/2 (e.g., z0.05/2 = z0.025 = 1.96)
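As a quick cross-check of these critical values, the short R sketch below uses qnorm, the standard normal quantile function (the alpha value is the 0.05 used above):

  alpha <- 0.05
  qnorm(1 - alpha)      # one-sided critical value: approximately 1.64
  qnorm(1 - alpha / 2)  # two-sided critical value: approximately 1.96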
Errors in Hypothesis Testing: Type I and Type II Errors

                     Actual Situation ("Truth")
Decision             H0 True                                  H0 False
Do Not Reject H0     Correct Decision (prob. = 1 − α)         Incorrect Decision: Type II Error (prob. = β)
Reject H0            Incorrect Decision: Type I Error         Correct Decision (prob. = 1 − β)
                     (prob. = α)

α = P(Type I Error)        β = P(Type II Error)

Power = 1 − β
Parametric and Non-Parametric Tests

• Parametric Tests: rely on theoretical distributions of the test statistic under the null hypothesis and on assumptions about the distribution of the sample data (i.e., normality)

• Non-Parametric Tests: referred to as "distribution free" as they do not assume that the data are drawn from any particular distribution
Whirlwind Tour of One and Two Sample Tests

Type of Data

Goal                                         Gaussian             Non-Gaussian                 Binomial
Compare one group to a hypothetical value    One sample t-test    Wilcoxon Test                Binomial Test
Compare two paired groups                    Paired t-test        Wilcoxon Test                McNemar's Test
Compare two unpaired groups                  Two sample t-test    Wilcoxon-Mann-Whitney Test   Chi-Square or Fisher's Exact Test
General Form of a t-test

            One Sample                           Two Sample

Statistic   T = (x̄ − μ) / (s / √n)              T = ( x̄ − ȳ − (μ1 − μ2) ) / ( sp √(1/m + 1/n) )

df          n − 1; critical value t(α, n−1)      m + n − 2; critical value t(α, m+n−2)
Non-Parametric Alternatives

• Wilcoxon Test: non-parametric analog of the one sample t-test

• Wilcoxon-Mann-Whitney test: non-parametric analog of the two sample t-test
Hypothesis Tests of a Proportion

• Large sample test (prop.test):

z = (p̂ − p0) / √( p0 (1 − p0) / n )

• Small sample test (binom.test):
  - Calculated directly from the binomial distribution

Confidence Intervals

• Confidence interval: an interval of plausible values for the parameter being estimated, where the degree of plausibility is specified by a "confidence level"

• General form:

estimate ± critical value × se

For example, for the difference of two means:

x̄ − ȳ ± t(α, m+n−2) × sp √(1/m + 1/n)
Interpreting a 95% CI
• We calculate a 95% CI for a hypothetical sample mean to be
between 20.6 and 35.4. Does this mean there is a 95%
probability the true population mean is between 20.6 and 35.4?

• NO! The correct interpretation relies on the long-run frequency interpretation of probability

• Why is this so?
Hypothesis Tests of 3 or More Means
• Suppose we measure a quantitative trait in a group of N
individuals and also genotype a SNP in our favorite
candidate gene. We then divide these N individuals into
the three genotype categories to test whether the
average trait value differs among genotypes.

• What statistical framework is appropriate here?

• Why not perform all pair-wise t-tests?


Basic Framework of ANOVA
• Want to study the effect of one or more
qualitative variables on a quantitative
outcome variable

• Qualitative variables are referred to as factors (e.g., the SNP)

• The characteristics that differentiate a factor are referred to as levels (e.g., the three genotypes of a SNP)
One-Way ANOVA
• Simplest case is One-Way (Single Factor) ANOVA

 - The outcome variable is the variable you're comparing
 - The factor variable is the categorical variable being used to define the groups (we will assume k samples, or groups)
 - The "one-way" is because each value is classified in exactly one way

• ANOVA easily generalizes to more factors
Assumptions of ANOVA

• Independence

• Normality

• Homogeneity of variances (aka homoscedasticity)
One-Way ANOVA: Null Hypothesis

• The null hypothesis is that the means are all equal


H0: μ1 = μ2 = ... = μk
• The alternative hypothesis is that at least one of
the means is different
– Think about the Sesame Street® game where three of
these things are kind of the same, but one of these
things is not like the other. They don’t all have to be
different, just one of them.
Motivating ANOVA

• A quantitative trait was measured in a random sample of individuals drawn from a population

• Genotyping of a single SNP


– AA: 82, 83, 97
– AG: 83, 78, 68
– GG: 38, 59, 55
Rationale of ANOVA

• Basic idea is to partition total variation of the


data into two sources

1. Variation within levels (groups)

2. Variation between levels (groups)

• If H0 is true the standardized variances are equal


to one another
The Details
Our Data:

AA: 82, 83, 97     x̄1· = (82 + 83 + 97) / 3 = 87.3
AG: 83, 78, 68     x̄2· = (83 + 78 + 68) / 3 = 76.3
GG: 38, 59, 55     x̄3· = (38 + 59 + 55) / 3 = 50.6

• Let xij denote the data from the ith level (group) and jth observation

• The overall, or grand, mean is:

x̄·· = Σi Σj xij / N

x̄·· = (82 + 83 + 97 + 83 + 78 + 68 + 38 + 59 + 55) / 9 = 71.4
Partitioning Total Variation
• Recall, variation is simply the average squared deviation from the mean

SST = SSTG + SSTE

Σi Σj (xij − x̄··)²  =  Σi ni (x̄i· − x̄··)²  +  Σi Σj (xij − x̄i·)²

SST: sum of squared deviations about the grand mean across all N observations
SSTG: sum of squared deviations of each group mean about the grand mean
SSTE: sum of squared deviations of all observations within each group from that group mean, summed across all groups
In Our Example

SST = SSTG + SSTE


K J K K J

(x ij  x.. )
2
 n  (xi i.  x.. )
2
(x ij  x i. )2
i1 j1 i1 j1
i1

(82  71.4)2  (83  71.4)2  (97  71.4)2  3(87.3  71.4)2  (82  87.3)2  (83  87.3)2  (97  87.3)2 
(83  71.4)2  (78  71.4)2  (68  71.4)2  3(76.3  71.4)2  (83  76.3)2  (78  76.3)2  (68  76.3)2 
(38  71.4)2  (59  71.4)2  (55  71.4)2  3(50.6  71.4)2  (38  50.6)2  (59  50.6)2  (55  50.6)2 

2630.2 2124.2 506


In Our Example

SST = SSTG + SSTE

[Figure: the nine observations plotted by genotype group, with the group means x̄1·, x̄2·, x̄3· and the grand mean x̄·· marked to illustrate the between-group and within-group deviations.]
Calculating Mean Squares

• To make the sums of squares comparable, we divide each one by its associated degrees of freedom
  • SSTG: k − 1  (3 − 1 = 2)
  • SSTE: N − k  (9 − 3 = 6)
  • SST:  N − 1  (9 − 1 = 8)

• MSTG = 2124.2 / 2 = 1062.1

• MSTE = 506 / 6 = 84.3

Almost There… Calculating the F Statistic
• The test statistic is the ratio of the group and error mean squares

F = MSTG / MSTE = 1062.1 / 84.3 = 12.6

• If H0 is true, MSTG and MSTE are expected to be roughly equal, so F is close to 1

• The critical value for the rejection region is Fα, k−1, N−k

• If we define α = 0.05, then F0.05, 2, 6 = 5.14, so here we reject H0

ANOVA Table

Source of Variation   df      Sum of Squares   MS                    F
Group                 k − 1   SSTG             MSTG = SSTG/(k−1)     MSTG / MSTE
Error                 N − k   SSTE             MSTE = SSTE/(N−k)
Total                 N − 1   SST


Non-Parametric Alternative

• Kruskal-Wallis Rank Sum Test: non-parametric analog to ANOVA

• In R, kruskal.test()
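A one-line sketch of the Kruskal-Wallis test on the same genotype data (re-entering the vectors so the snippet stands alone):

  trait    <- c(82, 83, 97, 83, 78, 68, 38, 59, 55)
  genotype <- factor(rep(c("AA", "AG", "GG"), each = 3))
  kruskal.test(trait ~ genotype)   # non-parametric analog of the one-way ANOVA above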
T - Test
Tests of Hypotheses on Population Means

There are two general methods used to make a "good guess" as to the true value of the population mean μ.
The first involves determining a confidence interval (CI) for μ.
The second is concerned with making a guess as to the value of μ and then testing to see whether such a guess is compatible with the observed data. This method is called hypothesis testing.

• The Null Hypothesis is denoted by H0. If a population mean is equal to a hypothesized mean, the Null Hypothesis can be written as H0: μ = μ0.
• The Alternative Hypothesis is the negation of the null hypothesis and is denoted by H1. If the Null is given as H0: μ = μ0, then the Alternative Hypothesis can be written as H1: μ ≠ μ0.
• Significance means the percentage risk of rejecting a null hypothesis when it is true, and it is denoted by α, generally taken as 1%, 5%, or 10%; (1 − α) is the confidence level within which the null hypothesis is retained when it is true.

[Figures: rejection regions at the 5% significance level for a two-tailed test, a one-tailed (left) test, and a one-tailed (right) test.]
• Decide on a test statistic (z-test, t-test, or F-test) and calculate its value.
• Find the critical value T_critical at the given significance level from the table and compare it with the calculated value |T statistic|: if T_critical > |T statistic|, accept H0; if T_critical < |T statistic|, reject H0.

The steps can be summarized as:
1. State the null (H0) and alternative (Ha) hypotheses.
2. State a significance level: 1%, 5%, or 10%.
3. Decide on a test statistic: z-test, t-test, or F-test.
4. Calculate the value of the test statistic.
5. Find T_critical at the given significance level from the table and compare it with the calculated value |T statistic|:
   T_critical > |T statistic|  →  Accept H0
   T_critical < |T statistic|  →  Reject H0

Table of t-critical values: [table not reproduced in this extract]
Types of t-tests
A t-test is a hypothesis test of the mean of one or two normally distributed
populations. Several types of t-tests exist for different situations, but they all
use a test statistic that follows a t-distribution under the null hypothesis:

Test: 1-sample t-test
Purpose: Tests whether the mean of a single population is equal to a target value
Example: Is the mean height of female college students greater than 5.5 feet?

Test: 2-sample t-test
Purpose: Tests whether the difference between the means of two independent populations is equal to a target value
Example: Does the mean height of female college students significantly differ from the mean height of male college students?

Test: Paired t-test
Purpose: Tests whether the mean of the differences between dependent (paired) observations is equal to a target value
Example: If you measure the weight of male college students before and after each subject takes a weight-loss pill, is the mean weight loss significant enough to conclude that the pill works?

One Sample T-Test for testing means

Test conditions:
• The population is normal
• The population standard deviation σ is unknown
• H1 may be one-sided or two-sided

The hypotheses are:

Null hypothesis
H0: μ = μ0   The population mean (μ) equals the hypothesized mean (μ0).
Alternative hypothesis
H1: μ ≠ μ0   The population mean (μ) differs from the hypothesized mean (μ0).

Test Statistic

t = (x̄ − μ) / (s / √n),   with d.f. (degrees of freedom) = n − 1

where s = √( Σ(xi − x̄)² / (n − 1) )

Here μ denotes the hypothesized population mean, s the sample standard deviation, and x̄ the mean of the sample.

Example
The following data represent hemoglobin values in gm/dl for 10 patients:

10.5  9  6.5  8  11  7  7.5  8.5  9.5  12

Does the mean value for these patients differ significantly from the mean value of the general population (12 gm/dl)? Evaluate the role of chance (α = 0.05).
Solution
Mention all the steps of testing the hypothesis.
First, we compute the mean (average) of this sample:

x̄ = Σx / n = (10.5 + 9 + 6.5 + 8 + 11 + 7 + 7.5 + 8.5 + 9.5 + 12) / 10 = 8.95

In the above example, there is some new mathematical notation.

x      x̄      (x − x̄)²
10.5   8.95   2.4025
9      8.95   0.0025
6.5    8.95   6.0025
8      8.95   0.9025
11     8.95   4.2025
7      8.95   3.8025
7.5    8.95   2.1025
8.5    8.95   0.2025
9.5    8.95   0.3025
12     8.95   9.3025
x̄ = 89.5 / 10 = 8.95        Σ(x − x̄)² = 29.2250

s = √( Σ(xi − x̄)² / (n − 1) ) = √( 29.2250 / 9 ) = 1.802

t = (x̄ − μ) / (s / √n) = (8.95 − 12) / (1.802 / √10) = −5.35

Then compare with the tabulated value for 9 d.f. at the 5% level of significance, which is 2.262. Since the calculated |t| = 5.35 is greater than the tabulated value, we reject H0 and conclude that there is a statistically significant difference between the sample mean and the population mean, and this difference is unlikely to be due to chance.
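The same one-sample t-test can be checked in one line of R (the vector name hb is arbitrary):

  hb <- c(10.5, 9, 6.5, 8, 11, 7, 7.5, 8.5, 9.5, 12)
  t.test(hb, mu = 12)   # t = -5.35 on 9 df, two-sided p < 0.001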

Hypothesis t-tests on the difference between two population means (μ1 − μ2)

Two types of t-tests for testing the significance of the difference between means will be presented: the pooled t-test and the paired t-test. The distinction between these two lies in the method by which the samples are drawn.

The pooled t-test
The characteristic feature of the pooled t-test is that the individual samples represent independent random samples from their respective populations. For example, in testing the effects of a new drug, an investigator may assign individuals at random to the treatment group and to the control group. Observations made on individuals in the treatment group are independent of those made on individuals in the control group. For the pooled case, the number of individuals in the two samples need not be the same. The value for t is given by

t = (x̄1 − x̄2) / ( sp √(1/n1 + 1/n2) )

where sp is called the pooled standard deviation, and is given by

sp = √( [ (n1 − 1)s1² + (n2 − 1)s2² ] / (n1 + n2 − 2) )

so that

t = (x̄1 − x̄2) / ( √( [ (n1 − 1)s1² + (n2 − 1)s2² ] / (n1 + n2 − 2) ) × √(1/n1 + 1/n2) )

d.f. = n1 + n2 − 2
Example
The following data represent weight in kg for 10 males and 12 females.
Males:
80 75 95 55 60 70 75 72 80 65

Females:
60 70 50 85 45 60 80 65 70 62 77 82

Two independent samples (cont.): Is there a statistically significant difference between the mean weight of males and females? Let alpha = 0.01.
To solve it, follow the steps and use this equation:

t = (x̄1 − x̄2) / ( √( [ (n1 − 1)s1² + (n2 − 1)s2² ] / (n1 + n2 − 2) ) × √(1/n1 + 1/n2) )

Males                        Females
x     (x − x̄1)²              x     (x − x̄2)²
80    53.29                  60    51.36111
75    5.29                   70    8.027778
95    497.29                 50    294.6944
55    313.29                 85    318.0278
60    161.29                 45    491.3611
70    7.29                   60    164.6944
75    5.29                   65    4.694444
72    0.49                   60    51.36111
80    53.29                  70    8.027778
65    59.29                  62    26.69444
                             77    96.69444
                             82    220.0278
Mean1 = 72.7                 Mean2 = 67.16667
s1² = Σ(x − x̄1)²/(n1 − 1)    s2² = Σ(x − x̄2)²/(n2 − 1)
    = 128.4556                   = 157.7879

t = (72.7 − 67.167) / ( √( [ (10 − 1)(128.4556) + (12 − 1)(157.7879) ] / (10 + 12 − 2) ) × √(1/10 + 1/12) )

t = 1.074

The tabulated t (two-tailed, alpha = 0.01, 20 d.f.) is 2.845. Since 1.074 < 2.845, we accept H0 and conclude that there is no significant difference between the two means; the observed difference may be due to chance.

Note: the same t-test can be calculated using Excel (here with α = 0.05) with the tool "t-Test: Two-Sample Assuming Equal Variances":

Variable 1 Variable 2
Mean 72.7 67.16666667
Variance 128.4555556 157.7878788
Observations 10 12
Pooled Variance 144.5883333
Hypothesized Mean Difference 0
df 20
t Stat 1.074730292
P(T<=t) one-tail 0.147645482
t Critical one-tail 1.724718243
P(T<=t) two-tail 0.295290964
t Critical two-tail 2.085963447

Decision:
We perform a two-tailed test. If t Stat < t Critical, we accept the null hypothesis. In this case 1.07473 < 2.0859, therefore we accept the null hypothesis; that means the two population means do not differ (μ1 = μ2). We can also make the decision from the values of α and the P-value: if α < P-value, we accept the null hypothesis. In this case 0.05 < 0.2952, therefore we accept the null hypothesis; that means μ1 = μ2.
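The same pooled (equal-variance) two-sample t-test as a quick R sketch:

  males   <- c(80, 75, 95, 55, 60, 70, 75, 72, 80, 65)
  females <- c(60, 70, 50, 85, 45, 60, 80, 65, 70, 62, 77, 82)
  t.test(males, females, var.equal = TRUE)   # t = 1.0747, df = 20, two-sided p = 0.295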
The paired t – test
For the paired case, pairs are randomly selected from a single population. Each member of a pair is randomly assigned to one of the two treatments. The null hypothesis is that the mean difference among pairs is zero. An example of paired observations is before-and-after measurements on the same individuals.

Formula:

t = D̄ / SE_diff

i.e., t is the mean difference divided by its standard error, where

SE_diff = SD_D / √(n_pairs)

and

SD_D = √( [ Σd² − (Σd)² / n ] / (n − 1) )

The standard error is found by first finding the difference between each pair of observations. The standard deviation of these differences is SD_D. Divide SD_D by √(number of pairs) to get SE_diff.

t = D̄ / ( SD_D / √(n_pairs) ),   with d.f. = n − 1
Example: Blood pressure of 8 patients, before and after treatment

BP (before)   BP (after)   d     d²
180           140          40    1600
200           145          55    3025
230           150          80    6400
240           155          85    7225
170           120          50    2500
190           130          60    3600
200           140          60    3600
165           130          35    1225

Σd = 465,   D̄ = 465 / 8 = 58.125,   Σd² = 29175

Results and conclusion

• t = 9.387
• Tabulated t (d.f. = 7), at the 0.05 level of significance, two tails, = 2.36
• P < 0.05
We reject H0 and conclude that there is a significant difference between BP readings before and after treatment.

Note: the same paired t-test calculated using Excel with α = 0.05 (t-Test: Paired Two Sample for Means):

Variable 1 Variable 2
Mean 196.875 138.75
Variance 720.9821429 133.9285714
Observations 8 8
Pearson Correlation 0.882107431
Hypothesized Mean Difference 0
df 7
t Stat 9.387578897
P(T<=t) one-tail 1.62001E-05
t Critical one-tail 1.894578605
P(T<=t) two-tail 3.24001E-05
t Critical two-tail 2.364624252
Decision:
We perform a two-tailed test. If |t Stat| > t Critical, we reject the null hypothesis. In this case 9.38757 > 2.3646, therefore we reject the null hypothesis; that means the mean BP before and after treatment differ (μ1 ≠ μ2). We can also make the decision from the values of α and the P-value: if α > P-value, we reject the null hypothesis. In this case 0.05 > 3.24001E-05, therefore we reject the null hypothesis.
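The same paired t-test as a quick R sketch:

  before <- c(180, 200, 230, 240, 170, 190, 200, 165)
  after  <- c(140, 145, 150, 155, 120, 130, 140, 130)
  t.test(before, after, paired = TRUE)   # t = 9.39 on 7 df, p = 3.2e-05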

Z test

Changing Standard Deviations
Whatever the mean and standard deviation:
• If the distribution is normally distributed, the 68-95-99.7 rule applies.
• About 68% of all cases (roughly two-thirds) fall within ±1 standard deviation of the mean,
• 95% of the cases fall within ±2 standard deviations of the mean, and 99.7% of the cases fall within ±3 SD of the mean.
Z test versus T test
Generally, z-tests are used when we
have large sample sizes (n > 30), whereas
t-tests are most helpful with a smaller
sample size (n < 30). Both methods
assume a normal distribution of the
data, but the z-tests are most useful
when the standard deviation is known.
Z test
DEFINITION: A Z-test is a statistical procedure used to test an alternative hypothesis against a null hypothesis. It is any statistical hypothesis test used to determine whether two population (or sample) means are different when the variances are known and the sample size is large (n ≥ 30).
Z Scores
• We call these standard deviation values “Z-
scores”
• Z score is defined as the number of standard
units any score or value is from the mean.
• Z score states how many standard deviations
the observation X falls away from the mean
and in which direction – plus or minus.
Formula for z score

z = (x − μ) / σ

Examples of computing z-scores

X    X̄     X − X̄   SD   z = (X − X̄)/SD
5    3      2       2    1
6    3      3       2    1.5
5    10    −5       4    −1.25
6    3      3       4    0.75
4    8     −4       2    −2

Computing raw scores from z-scores

X = μ + zσ,  or  X = z·SD + X̄

z     SD   z·SD   X̄    X
1     2     2     3    5
−2    2    −4     2    −2
0.5   4     2     10   12
−1    5    −5     10   5
example
• Part-time employee salaries in a company are normally distributed with mean $20,000 and standard deviation $1,000.
• How many standard deviations is $18,500 away from the mean?

• Z = (18,500 − 20,000) / 1,000 = −1.5
• (the negative sign specifies the direction: below the mean)

Example
• How many standard deviations is $19,371 away from the mean?

• Z = (x − µ) / SD
• Z = (19,371 − 20,000) / 1,000 ≈ −0.63 standard deviations away
z-scores and conversions
• What is a z-score?
– A measure of an observation’s distance from the
mean.
– The distance is measured in standard deviation
units.
• If a z-score is zero, it’s on the mean.
• If a z-score is positive, it’s above the mean.
• If a z-score is negative, it’s below the mean.
• If a z-score is 1, it’s 1 SD above the mean.
• If a z-score is –2, it’s 2 SDs below the mean.
IQ is normally distributed with a mean of 100 and sd of 15.
How do you interpret a score of 109?
Use Z score

z = (x − μ) / σ = (109 − 100) / 15 = 9 / 15 = 0.6

What does this Z-score of 0.60 mean?

It does not mean that 60 percent of cases fall below this score; rather, this Z-score is 0.60 standard units above the mean.

We need the Z-table to interpret this!


Calculating z-scores
The amount of time it takes for a pizza delivery
is approximately normally distributed with a
mean of 25 minutes and a standard deviation
of 2 minutes. Convert 21 minutes to a z score.

z = (x − μ) / σ = (21 − 25) / 2 = −2.00
Calculating z-scores
Mean delivery time = 25 minutes
Standard deviation = 2 minutes
Convert 29.7 minutes to a z score.

z = (x − μ) / σ = (29.7 − 25) / 2 = 2.35
To find the area to the left of z = 1.34:
From the Z-table, the area between 0 and z = 1.34 is 0.4099.
Area to the left of z = 1.34 = 0.5 + 0.4099 = 0.9099

Excerpt from the Z-table:

z      ...   0.03     0.04     0.05     ...
1.2    ...   0.3907   0.3925   0.3944   ...
1.3    ...   0.4082   0.4099   0.4115   ...
1.4    ...   0.4222   0.4236   0.4251   ...
Patterns for Finding Areas Under the Standard Normal Curve

• To find the area to the left of a given negative z, or to the left of a given positive z: use the table (see figures).
• To find the area between z values on either side of zero: subtract the area to the left of z1 from the area to the left of z2.
• To find the area between z values on the same side of zero: subtract the area to the left of z1 from the area to the left of z2.
• To find the area to the right of a positive z value or to the right of a negative z value: subtract the area to the left of the given z from 1.0000. (The area under the entire curve is 1.000.)
Z score

• What is the probability of selecting someone who is between 2 SD above the mean and 1 SD below the mean in a normal distribution curve?
• From the mean up to 2 SD above the mean: ½ × 0.95 = 0.475
• From 1 SD below the mean up to the mean: ½ × 0.68 = 0.34
• Total probability ≈ 0.475 + 0.34 = 0.815
• Or we can use the z-score table to find the solution
Use of the Normal Probability Table

a. P(z < 1.24) = .8925

b. P(0 < z < 1.60) = .4452

c. P( - 2.37 < z < 0) = .4911


Normal Probability

d. P( - 3 < z < 3 ) = .9974

e. P( - 2.34 < z < 1.57 ) = .9322

f. P( 1.24 < z < 1.88 ) = .0774


Normal Probability

g. P( - 2.44 < z < - 0.73 ) = .2254

h. P( z < 1.64 ) = .9495

i. P( z > 2.39 ) = .0084


Normal Probability

j. P ( z > - 1.43 ) = .9236

k. P( z < - 2.71 ) = .0034


Application of the Normal Curve

The amount of time it takes for a pizza delivery is


approximately normally distributed with a mean of 25
minutes and a standard deviation of 2 minutes. If you order
a pizza, find the probability that the delivery time will be:

a. between 25 and 27 minutes. a. .3413

b. less than 30 minutes. b. .9938

c. less than 22.7 minutes. c. .1251
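These three probabilities can be checked in R with pnorm, the normal cumulative distribution function:

  # Delivery time ~ Normal(mean = 25, sd = 2)
  pnorm(27, mean = 25, sd = 2) - pnorm(25, mean = 25, sd = 2)   # a. 0.3413
  pnorm(30, mean = 25, sd = 2)                                  # b. 0.9938
  pnorm(22.7, mean = 25, sd = 2)                                # c. 0.1251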


Z score

• What is the probability of selecting someone who is between −1.3 SD and 1.8 SD in a normal distribution curve?
From the z-score table:
• Z1 = −1.3 SD → 0.4032
• Z2 = 1.8 SD → 0.4641
• Probability between Z1 and Z2 = 0.4032 + 0.4641 = 0.8673

• The heights of a large group of men were found to follow closely a normal distribution with a mean of 172.5 cm and a standard deviation of 6.2 cm. What is the probability of getting a value:
• A) above 180 cm?
• B) below 170 cm?
• C) below 185 cm?
• D) between 165 cm and 175 cm?

A) Above 180 cm:
z = (x − μ) / σ = (180 − 172.5) / 6.2 ≈ 1.2
From the z-table: Z = 1.2 SD → 0.3849
Above 180 cm = 0.5 − 0.3849 = 0.115 = 11.5%

B) Below 170 cm:
• Z = (170 − 172.5) / 6.2 = −0.40
• From the table: Z = −0.40 → 0.1554
• Below 170 cm = 0.5 − 0.1554 = 0.345 = 34.5%

C) Below 185 cm:
• Z = (185 − 172.5) / 6.2 ≈ 2.02
• From the table: Z = 2.02 → 0.4778
• Below 185 cm = 0.5 + 0.4778 = 0.978 = 97.8%

D) Between 165 cm and 175 cm:
• Z1 = (165 − 172.5) / 6.2 = −1.21
• Z2 = (175 − 172.5) / 6.2 = 0.40
From the table:
• Z1 = −1.21 → 0.3869
• Z2 = 0.40 → 0.1554
• The probability of a value between 165 cm and 175 cm = 0.3869 + 0.1554 = 0.5423 = 54.2%
• Hypothesis test:

Z = (observed mean − null mean) / (s / √n)

• Confidence interval: observed mean ± Zα/2 × (s / √n)

• 140 children had their urinary lead concentration measured in µmol/24 hours. The mean = 2.18 and SD = 0.87 µmol/24 hours.
A) What are the 95% probability limits?
• Limits = mean ± 1.96 SD
• 2.18 ± 1.96 × 0.87
• = 0.475 to 3.88
• B) How many SD is a reading of 4.8 µmol/24 hours from the mean, what is the probability of getting such a reading, and does this reading differ significantly from the mean or not?
• Z = (4.8 − 2.18) / 0.87 = 3.01
• P(Z > 3.01) = 0.0013
• That means the probability is very small that such a reading comes from the same population.
• The mean diastolic blood pressure among 72 printers was found to be 88 mm Hg, with an SD of 4.5. One printer's diastolic blood pressure was found to be 100 mm Hg.
• A) Is this significantly different or not at the 95% level?
• B) Is this significantly different or not at the 99% level?
A) C I = The mean ± 1.96 SD
• 95% confidence interval = 88 ± 1.96* 4.5
• = 79.2 – 96.8 mm Hg
• So 100 mm Hg is outside this interval.
B) 99% confidence interval = 88 ± 3* 4.5
• = 74.5 – 101.5
• So 100 mm Hg lies within this interval
• We can use SE instead of SD
C I = The mean ± 1.96 SE
• SE = SD/√n
• = 4.5/√72 = 0.53 mm Hg
• so the answers will be:
• A) 95% CI = 88 ± 1.96* 0.53
• = 86.96 – 89.04
• B) 99% CI = 88 ± 3 * 0.53
• = 86.41 – 89.59
Z value application in the difference between two means

Z = (mean of 1st sample − mean of 2nd sample) / (standard error of the difference between the two means)

Z = (x̄1 − x̄2) / √( SD1²/n1 + SD2²/n2 )

95% CI for the difference = (x̄1 − x̄2) ± 1.96 × SE of the difference

Null hypothesis : there is no difference
Alternative hypothesis : there is difference
Example

Group      Mean diastolic blood pressure   SD    n
Printers   88                              4.5   72
Farmers    79                              4.2   49

Is there any significant difference between the two means or not?

Z = (88 − 79) / √( 4.5²/72 + 4.2²/49 ) = 9 / 0.80 = 11.2

So the probability is p < 0.001, and it is unlikely that the two samples come from the same population.
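A quick R computation of this two-sample Z statistic and its two-sided p-value, using the summary statistics above:

  z <- (88 - 79) / sqrt(4.5^2 / 72 + 4.2^2 / 49)
  z                    # about 11.2
  2 * pnorm(-abs(z))   # two-sided p-value, far below 0.001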
Z test
hypothesis testing
• Choosing the statistical test.
• Statistical hypothesis.
Null hypothesis and Alternative hypothesis
• Level of significance. α= 0.05
• Calculating the statistical test.
• Statistical decision.
• Resume the situation.
One sample Z test
Example:
A local telephone company claims that the average length of a phone call is 8
minutes. In a random sample of 58 phone calls, the sample mean was 7.8
minutes and the standard deviation was 0.5 minutes. Is there enough
evidence to support this claim at  = 0.05?

H0:  = 8 (Claim) Ha:   8

The level of significance is  = 0.05.

The rejection region is two-tailed, with 0.025 in each tail; the critical values are −z0 = −1.96 and +z0 = 1.96.

Testing the hypothesis (one sample Z test, example continued):

H0: μ = 8 (Claim)    Ha: μ ≠ 8

The standardized test statistic is

z = (x̄ − µ) / (σ / √n) = (7.8 − 8) / (0.5 / √58) ≈ −3.05

The test statistic falls in the rejection region (beyond −1.96), so H0 is rejected.

At the 5% level of significance, there is enough evidence to reject the claim that the average length of a phone call is 8 minutes.
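The one-sample Z statistic and its p-value can be computed directly in R (a sketch; base R has no built-in one-sample z-test function):

  xbar <- 7.8; mu0 <- 8; s <- 0.5; n <- 58
  z <- (xbar - mu0) / (s / sqrt(n))
  z                    # about -3.05
  2 * pnorm(-abs(z))   # two-sided p-value, about 0.002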
Two-Sample Z test

• The usual rate (430 births/day)


• What if we instead look at difference
between weekends and weekdays?

Sun Mon Tue Wed Thu Fri Sat


Weekdays Weekends
452 470 431 448 467 377
344 449 440 457 471 463 405
377 453 499 461 442 444 415
356 470 519 443 449 418 394
399 451 468 432

Two-Sample Z test
• We want to test the null hypothesis that the two populations have the same mean
• H0: μ1 = μ2, or equivalently, μ1 − μ2 = 0
• Two-sided alternative hypothesis: μ1 − μ2 ≠ 0
• If we assume our population SDs σ1 and σ2 are known, we can calculate a two-sample Z statistic:

Z = (x̄1 − x̄2) / √( σ1²/n1 + σ2²/n2 )

• We can then calculate a p-value from this Z statistic using the standard normal distribution

Two samples Z test
• In Tikrit Medical College, we want to decide whether there is any difference between the mean ages of two student samples: students from Tikrit center and students from outside Tikrit.
• From Tikrit: mean1 = 18, SD1 = 1.5, n1 = 60
• From outside Tikrit: mean2 = 19, SD2 = 2, n2 = 60

• Z = (18 − 19) / √( 1.5²/60 + 2²/60 ) ≈ −3.1

• Since |Z| > 1.96, there is a significant difference
F-TEST and Analysis of Variance (ANOVA)

Introduction

Analysis of variance (ANOVA) is a statistical technique used for analyzing the differences between the means of more than two samples. It is a parametric test of hypothesis. It is a stepwise estimation procedure (partitioning the "variation" among and between groups) used to test the equality of two or more population means.

ANOVA was developed by the statistician and eugenicist Ronald Fisher. Though many statisticians, including Fisher, worked on the development of the ANOVA model, it became widely known after being included in Fisher's 1925 book "Statistical Methods for Research Workers". ANOVA is based on the law of total variance, where the observed variance in a particular variable is partitioned into components attributable to different sources of variation. ANOVA provides an analytical framework for testing the differences among group means and thus generalizes the t-test beyond two means. ANOVA uses F-tests to statistically test the equality of means.

Concept of Variance

Variance is an important tool in the sciences including statistical science. In the Theory of
Probability and statistics, variance is the expectation of the squared deviation of a random
variable from its mean. Actually, it is measured to find out the degree to which the data in series
are scattered around its average value. Variance is widely used in statistics, its use is ranging
from descriptive statistics to statistical inference and testing of hypothesis.

Relationship Among Variables

Under this analysis, we examine the differences in the mean values of the dependent variable associated with the effect of the controlled independent variables, after taking into account the influence of the uncontrolled independent variables.

We take the null hypothesis that there is no significant difference between the means of
different populations. In its simplest form, analysis of variance must have a dependent
variable that is metric (measured using an interval or ratio scale). There must also be
one or more independent variables. The independent variables must be all categorical
(non-metric). Categorical independent variables are also called factors. A particular
combination of factor levels, or categories, is called a treatment.

What type of analysis would be made for examining the variations depends upon the
number of independent variables taken into account for the study purpose. One-way
analysis of variance involves only one categorical variable, or a single factor. If two or
more factors are involved, the analysis is termed n-way (eg. Two-Way, Three-Way etc.)
Analysis of Variance.

F Tests

F-tests are named after the name of Sir Ronald Fisher. The F-statistic is simply a ratio of two
variances. Variance is the square of the standard deviation. For a common person, standard
deviations are easier to understand than variances because they’re in the same units as the data
rather than squared units. F-statistics are based on the ratio of mean squares. The term “mean
squares” may sound confusing but it is simply an estimate of population variance that accounts
for the degrees of freedom (DF) used to calculate that estimate.

For carrying out the test of significance, we calculate the ratio F, which is defined as:

F = S1² / S2²,  where  S1² = Σ(X1 − X̄1)² / (n1 − 1)  and  S2² = Σ(X2 − X̄2)² / (n2 − 1)

It should be noted that S1² is always taken as the larger estimate of variance, i.e., S1² > S2²:

F = (larger estimate of variance) / (smaller estimate of variance)

ν1 = n1 − 1 and ν2 = n2 − 1

ν1 = degrees of freedom for the sample having the larger variance
ν2 = degrees of freedom for the sample having the smaller variance

The calculated value of F is compared with the table value for ν1 and ν2 at the 5% or 1% level of significance. If the calculated value of F is greater than the table value, then the F ratio is considered
significant and the null hypothesis is rejected. On the other hand, if the calculated value of F is
less than the table value the null hypothesis is accepted and it is inferred that both the samples
have come from the population having same variance.

Illustration 1: Two random samples were drawn from two normal populations and their values
are:

A 65 66 73 80 82 84 88 90 92
B 64 66 74 78 82 85 87 92 93 95 97

Test whether the two populations have the same variance at the 5% level of significance.

(Given: F = 3.36 at the 5% level for ν1 = 10 and ν2 = 8.)

Solution: Let us take the null hypothesis that the two populations have the same variance.

Applying the F-test:

F = S1² / S2²

X1       x1 = X1 − X̄1   x1²      X2       x2 = X2 − X̄2   x2²
65       −15             225      64       −19             361
66       −14             196      66       −17             289
73       −7              49       74       −9              81
80       0               0        78       −5              25
82       2               4        82       −1              1
84       4               16       85       2               4
88       8               64       87       4               16
90       10              100      92       9               81
92       12              144      93       10              100
                                  95       12              144
                                  97       14              196
ΣX1 = 720  Σx1 = 0  Σx1² = 798    ΣX2 = 913  Σx2 = 0  Σx2² = 1298

X̄1 = ΣX1 / n1 = 720 / 9 = 80
X̄2 = ΣX2 / n2 = 913 / 11 = 83

S1² = Σx1² / (n1 − 1) = 798 / 8 = 99.75
S2² = Σx2² / (n2 − 1) = 1298 / 10 = 129.8

Taking the larger estimate of variance in the numerator:

F = 129.8 / 99.75 = 1.30,  with ν1 = 10 and ν2 = 8

At the 5 percent level of significance, for ν1 = 10 and ν2 = 8, the table value is F0.05 = 3.36.

The calculated value of F is less than the table value, so the null hypothesis is accepted. Hence the two populations may be regarded as having the same variance.
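A rough cross-check in R is shown below. Note that var.test reports the ratio var(A)/var(B), i.e. approximately 99.75/129.8 = 0.77, together with a two-sided p-value, rather than the larger-over-smaller convention used above:

  A <- c(65, 66, 73, 80, 82, 84, 88, 90, 92)
  B <- c(64, 66, 74, 78, 82, 85, 87, 92, 93, 95, 97)
  var(A)          # 99.75
  var(B)          # 129.8
  var.test(A, B)  # F test to compare the two variances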

TESTING EQUALITY OF POPULATION (TREATMENT) MEANS:

ONE-WAY CLASSIFICATION

In one-way classification, the following steps are carried out for computing the F-ratio through the most popular method, i.e., the short-cut method:

1. First, square all the observations in the different samples (columns).

2. Get the sum of the sample observations in each column: ΣX1, ΣX2, ..., ΣXk.

3. Get the sum of the squared values in each column: ΣX1², ΣX2², ..., ΣXk².

4. Find the value of T by adding up all the column sums: T = ΣX1 + ΣX2 + ... + ΣXk.

5. Compute the Correction Factor by the formula:

CF = T² / N

6. Find the Total Sum of Squares (SST) from the squared values and CF:

SST = (ΣX1² + ΣX2² + ... + ΣXk²) − CF

7. Find the Sum of Squares between the samples (SSC) by the following formula:

SSC = (ΣX1)²/n1 + (ΣX2)²/n2 + ... + (ΣXk)²/nk − CF

8. Finally, find the Sum of Squares within samples (SSE):

SSE = SST − SSC

ANALYSIS OF VARIANCE (ANOVA) TABLE

Source of Variation            Sum of squares (SS)   Degrees of freedom (ν)   Mean square (MS)    Variance ratio F
Between samples (treatments)   SSC                   ν1 = C − 1               MSC = SSC / ν1      F = MSC / MSE
Within samples (error)         SSE                   ν2 = N − C               MSE = SSE / ν2
Total                          SST                   N − 1

SSC = sum of squares between samples (columns)
SST = total sum of squares of variation
SSE = sum of squares within the samples
MSC = mean sum of squares between samples
MSE = mean sum of squares within samples

Illustration 2: To test the significance of variation in the retail prices of a commodity in three principal cities, Kanpur, Lucknow and Delhi, four shops were chosen at random in each city and the prices observed, in rupees, were as follows:

Kanpur    15   7    11   13
Lucknow   14   10   10   6
Delhi     4    10   8    8

Do the data indicate that the prices in the three cities are significantly different?

Solution: Let us take the null hypothesis that there is no significant difference in the prices of the commodity in the three cities.

Calculations for the analysis of variance are as under:

Sample 1 (Kanpur)      Sample 2 (Lucknow)     Sample 3 (Delhi)
x1       x1²           x2       x2²           x3       x3²
15       225           14       196           4        16
7        49            10       100           10       100
11       121           10       100           8        64
13       169           6        36            8        64
Σx1=46   Σx1²=564      Σx2=40   Σx2²=432      Σx3=30   Σx3²=244

There are r = 3 treatments (samples) with n1 = 4, n2 = 4, n3 = 4, and n = 12.

T = Sum of all the observations in the three samples = Σx1 + Σx2 + Σx3 = 46 + 40 + 30 = 116

CF = Correction Factor = T² / n = (116)² / 12 = 1121.33

SST = Total sum of squares
    = (Σx1² + Σx2² + Σx3²) − CF = (564 + 432 + 244) − 1121.33 = 118.67

SSC = Sum of squares between the samples
    = [ (Σx1)²/n1 + (Σx2)²/n2 + (Σx3)²/n3 ] − CF
    = [ (46)²/4 + (40)²/4 + (30)²/4 ] − 1121.33
    = [ 529 + 400 + 225 ] − 1121.33
    = 1154 − 1121.33 = 32.67

SSE = SST − SSC = 118.67 − 32.67 = 86

Degrees of freedom: df1 = r-1= 3-1 = 2 and df2 =n-r= 12-3=9

Thus MSC = SSC / df1 = 32.67 / 2 = 16.335 and MSE = SSE / df2 = 86 / 9 = 9.55

ANOVA TABLE

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Squares                Test Statistic
Between samples       SSC = 32.67      r − 1 = 2            MSC = 32.67/2 = 16.335      F = MSC/MSE = 16.335/9.55 = 1.71
Within samples        SSE = 86         n − r = 9            MSE = 86/9 = 9.55
Total                 SST = 118.67     n − 1 = 11

The table value of F for df1 = 2, df2 = 9, and α = 5% level of significance is 4.26. Since the calculated value of F is less than its critical (or table) value, the null hypothesis is accepted. Hence we conclude that the prices of the commodity in the three cities do not differ significantly.
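A sketch of the same one-way ANOVA in R, using the city labels from the table above:

  price <- c(15, 7, 11, 13,  14, 10, 10, 6,  4, 10, 8, 8)
  city  <- factor(rep(c("Kanpur", "Lucknow", "Delhi"), each = 4))
  summary(aov(price ~ city))   # F = 1.71 on 2 and 9 df, not significant at the 5% level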

6
TESTING EQUALITY OF POPULATION (TREATMENT) MEANS: TWO-WAY CLASSIFICATION

ANOVA TABLE FOR TWO-WAY CLASSIFICATION

Source of Variation   Sum of squares   Degrees of freedom   Mean Square                   Variance ratio F
Between columns       SSC              c − 1                MSC = SSC/(c−1)               F_treatment = MSC/MSE
Between rows          SSR              r − 1                MSR = SSR/(r−1)               F_blocks = MSR/MSE
Residual error        SSE              (c−1)(r−1)           MSE = SSE/((c−1)(r−1))
Total                 SST              n − 1

Total variation consists of three parts: (i) variation between columns, SSC; (ii) variation between
rows, SSR; and (iii) actual variation due to random error, SSE. That is,

SST = SSC + SSR + SSE.

The degrees of freedom associated with SST are cr-1, where c and r are the number of columns
and rows, respectively.

Degrees of freedom between columns= c-1

Degrees of freedom between rows= r-1

Degrees of freedom for residual error=(c-1)(r-1)

The test statistics F for the analysis of variance are given by:

F_treatment = MSC / MSE if MSC > MSE (otherwise MSE / MSC)

F_blocks = MSR / MSE if MSR > MSE (otherwise MSE / MSR)

Illustration 3: The following table gives the number of refrigerators sold by 4 salesmen in three months, March, April and May:

Month    Salesman
         A    B    C    D
March    50   40   48   39
April    46   48   50   45
May      39   44   40   39
Is there a significant difference in the sales made by the four salesmen? Is there a significant
difference in the sales made during different months?

Solution: Let us take the following null hypothesis:

𝐻𝑂 ∶ There is no significant difference in the sales made by the four salesmen.

𝐻𝑂 ∶ There is no significant difference in the sales made during different months.

The given data are coded by subtracting 40 from each observation. Calculations for a two-criteria (month and salesman) analysis of variance are shown below:

Two-way ANOVA calculations (coded data)

Month     A (x1)   x1²   B (x2)   x2²   C (x3)   x3²   D (x4)   x4²   Row Sum
March     10       100   0        0     8        64    −1       1     17
April     6        36    8        64    10       100   5        25    29
May       −1       1     4        16    0        0     −1       1     2
Column    15       137   12       80    18       164   3        27    48
sum

T = Sum of all observations in the three months = 48

CF = Correction Factor = T² / n = (48)² / 12 = 192

SSC = Sum of squares between salesmen (columns)
    = [ (15)²/3 + (12)²/3 + (18)²/3 + (3)²/3 ] − 192
    = (75 + 48 + 108 + 3) − 192 = 42

SSR = Sum of squares between months (rows)
    = [ (17)²/4 + (29)²/4 + (2)²/4 ] − 192
    = (72.25 + 210.25 + 1) − 192 = 91.5

SST = Total sum of squares
    = (Σx1² + Σx2² + Σx3² + Σx4²) − CF
    = (137 + 80 + 164 + 27) − 192 = 216

SSE = SST − (SSC + SSR) = 216 − (42 + 91.5) = 82.5
The total degrees of freedom are df= n-1=12-1=11.

So dfc= c-1 = 4-1 = 3, dfr = r-1=3-1=2; df =(c-1)(r-1)= 3x2=6

Thus, MSC= SSC/(c-1) = 42/3=14

MSR= SSR/(r-1)= 91.5/2= 45.75

MSE= SSE/(c-1)(r-1) = 82.5/6=13.75

The ANOVA table is shown below:

Source of variation   Sum of squares   Degrees of freedom   Mean Squares                       Variance Ratio
Between salesmen      SSC = 42.0       c − 1 = 3            MSC = SSC/(c−1) = 14.00            F_treatment = MSC/MSE = 14/13.75 = 1.018
Between months        SSR = 91.5       r − 1 = 2            MSR = SSR/(r−1) = 45.75            F_blocks = MSR/MSE = 45.75/13.75 = 3.327
Residual error        SSE = 82.5       (c−1)(r−1) = 6       MSE = SSE/((c−1)(r−1)) = 13.75
Total                 SST = 216        n − 1 = 11

(a) The table value of F is 4.75 for df1 = 3, df2 = 6, and α = 5%. Since the calculated value of F_treatment = 1.018 is less than its table value, the null hypothesis is accepted. Hence we conclude that the sales made by the salesmen do not differ significantly.

(b) The table value of F is 5.14 for df1 = 2, df2 = 6, and α = 5%. Since the calculated value of F_blocks = 3.327 is less than its table value, the null hypothesis is accepted. Hence we conclude that sales made during different months do not differ significantly.
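A sketch of this two-way ANOVA (one observation per cell) in R, using the raw sales figures rather than the coded values:

  sales    <- c(50, 40, 48, 39,  46, 48, 50, 45,  39, 44, 40, 39)
  salesman <- factor(rep(c("A", "B", "C", "D"), times = 3))
  month    <- factor(rep(c("March", "April", "May"), each = 4))
  summary(aov(sales ~ salesman + month))   # F(salesman) = 1.02, F(month) = 3.33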

Analysis of Variance (ANOVA)

Recall, when we wanted to compare two population means, we used the 2-sample t procedures .
Now let’s expand this to compare k ≥ 3 population means. As with the t-test, we can graphically get an
idea of what is going on by looking at side-by-side boxplots. (See Example 12.3, p. 748, along with Figure
12.3, p. 749.)

1 Basic ANOVA concepts

1.1 The Setting

Generally, we are considering a quantitative response variable as it relates to one or more explanatory
variables, usually categorical. Questions which fit this setting:

(i) Which academic department in the sciences gives out the lowest average grades? (Explanatory vari-
able: department; Response variable: student GPA’s for individual courses)
(ii) Which kind of promotional campaign leads to greatest store income at Christmas time? (Explanatory
variable: promotion type; Response variable: daily store income)

(iii) How do the type of career and marital status of a person relate to the total cost in annual claims
she/he is likely to make on her health insurance. (Explanatory variables: career and marital status;
Response variable: health insurance payouts)

Each value of the explanatory variable (or value-pair, if there is more than one explanatory variable) repre-
sents a population or group. In the Physicians’ Health Study of Example 3.3, p. 238, there are two factors
(explanatory variables): aspirin (values are “taking it” or “not taking it”) and beta carotene (values again are
“taking it” or “not taking it”), and this divides the subjects into four groups corresponding to the four cells
of Figure 3.1 (p. 239). Had the response variable for this study been quantitative—like systolic blood pres-
sure level—rather than categorical, it would have been an appropriate scenario in which to apply (2-way)
ANOVA.

1.2 Hypotheses of ANOVA

These are always the same.

H0: The (population) means of all groups under consideration are equal.

Ha: The (pop.) means are not all equal. (Note: This is different than saying “they are all unequal ”!)

1.3 Basic Idea of ANOVA

Analysis of variance is a perfectly descriptive name of what is actually done to analyze sample data ac-
quired to answer problems such as those described in Section 1.1. Take a look at Figures 12.2(a) and 12.2(b)
(p. 746) in your text. Side-by-side boxplots like these in both figures reveal differences between samples
taken from three populations. However, variations like those depicted in 12.2(a) are much less convincing
that the population means for the three populations are different than if the variations are as in 12.2(b). The
reason is because the ratio of variation between groups to variation within groups is much
smaller for 12.2(a) than it is for 12.2(b).
1.4 Assumptions of ANOVA

Like so many of our inference procedures, ANOVA has some underlying assumptions which should be in
place in order to make the results of calculations completely trustworthy. They include:

(i) Subjects are chosen via a simple random sample.


(ii) Within each group/population, the response variable is normally distributed.
(iii) While the population means may be different from one group to the next, the population standard
deviation is the same for all groups.

Fortunately, ANOVA is somewhat robust (i.e., results remain fairly trustworthy despite mild violations of
these assumptions). Assumptions (ii) and (iii) are close enough to being true if, after gathering SRS samples
from each group, you:

(ii) look at normal quantile plots for each group and, in each case, see that the data points fall close to a
line.
(iii) compute the standard deviations for each group sample, and see that the ratio of the largest to the
smallest group sample s.d. is no more than two.

2 One-Way ANOVA

When there is just one explanatory variable, we refer to the analysis of variance as one-way ANOVA.

2.1 Notation

Here is a key to symbols you may see as you read through this section.

k = the number of groups/populations/values of the explanatory variable/levels of treatment


ni = the sample size taken from group i
xij = the jth response sampled from the ith group/population.
x̄i = the sample mean of responses from the ith group = (1/ni) Σ_{j=1..ni} xij

si = the sample standard deviation from the ith group = √( (1/(ni − 1)) Σ_{j=1..ni} (xij − x̄i)² )

n = the (total) sample size, irrespective of groups = Σ_{i=1..k} ni

x̄ = the mean of all responses, irrespective of groups = (1/n) Σ_{i,j} xij

2.2 Splitting the Total Variability into Parts

Viewed as one sample (rather than k samples from the individual groups/populations), one might measure the total amount of variability among observations by summing the squares of the differences between each xij and x̄:

SST (stands for sum of squares total) = Σ_{i=1..k} Σ_{j=1..ni} (xij − x̄)².

This variability has two sources:

1. Variability between group means (specifically, variation of the group means around the overall mean x̄):

SSG := Σ_{i=1..k} ni (x̄i − x̄)², and

2. Variability within groups (specifically, variation of observations about their group mean x̄i):

SSE := Σ_{i=1..k} Σ_{j=1..ni} (xij − x̄i)² = Σ_{i=1..k} (ni − 1) si².

It is the case that

SST = SSG + SSE.

2.3 The Calculations

If the variability between groups/treatments is large relative to the variability within groups/treatments,
then the data suggest that the means of the populations from which the data were drawn are significantly
different. That is, in fact, how the F statistic is computed: it is a measure of the variability between treat-
ments divided by a measure of the variability within treatments. If F is large, the variability between
treatments is large relative to the variation within treatments, and we reject the null hypothesis of equal
means. If F is small, the variability between treatments is small relative to the variation within treatments,
and we do not reject the null hypothesis of equal means. (In this case, the sample data is consistent with
the hypothesis that population means are equal between groups.)
To compute this ratio (the F statistic) is difficult and time consuming. Therefore we are always going to let
the computer do this for us. The computer generates what is called an ANOVA table:

Source           SS    df      MS                  F
Model/Group      SSG   k − 1   MSG = SSG/(k−1)     F = MSG/MSE
Residual/Error   SSE   n − k   MSE = SSE/(n−k)
Total            SST   n − 1

What are these things?

• The source (of variability) column tells us SS=Sum of Squares (sum of squared deviations):
SST measures variation of the data around the overall mean x¯
SSG measures variation of the group means around the overall mean
SSE measures the variation of each observation around its group mean x̄i

• Degrees of freedom

k − 1 for SSG, since it measures the variation of the k group means about the overall
mean
n−k for SSE, since it measures the variation of the n observations about k group means
n−1 for SST, since it measures the variation of all n observations about the overall mean
• MS = Mean Square = SS/df :
This is like a standard deviation. Look at the formula we learned back in Chapter 1 for sample stan-
dard deviation (p. 51). Its numerator was a sum of squared deviations (just like our SS formulas), and
it was divided by the appropriate number of degrees of freedom.
It is interesting to note that another formula for MSE is

MSE = [ (n1 − 1)s1² + (n2 − 1)s2² + · · · + (nk − 1)sk² ] / [ (n1 − 1) + (n2 − 1) + · · · + (nk − 1) ],

which may remind you of the pooled sample estimate for the population variance for 2-sample procedures (when we believe the two populations have the same variance). In fact, the quantity MSE is also called sp².

• The F statistic = MSG/MSE


If the null hypothesis is true, the F statistic has an F distribution with k − 1 and n − k degrees of freedom in the numerator and denominator respectively. If the alternative hypothesis is true, then F tends
to be large. We reject H0 in favor of Ha if the F statistic is sufficiently large.
As with other hypothesis tests, we determine whether the F statistic is large by finding a correspond-
ing P-value. For this, we use Table E. Since the alternative hypothesis is always the same (no 1-sided
vs. 2-sided distinction), the test is single-tailed (like the chi-squared test). Nevertheless, to read the
correct P-value from the table requires knowledge of the number of degrees of freedom associated
with both the numerator (MSG) and denominator (MSE) of the F-value.
Look at Table E. On the top are the numerator df, and down the left side are the denominator df. In
the table are the F values, and the P-values (the probability of getting an F statistic larger than that if
the null hypothesis is true) are down the left side.
Example: Determine
P(F3,6 > 9.78) = 0.01
P(F2,20 > 5) = between 0.01 and 0.025
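These tail probabilities can also be obtained in R with pf, the F cumulative distribution function:

  1 - pf(9.78, df1 = 3, df2 = 6)   # about 0.01
  1 - pf(5, df1 = 2, df2 = 20)     # about 0.017, i.e. between 0.01 and 0.025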
Example: A firm wishes to compare four programs for training workers to perform a certain manual task.
Twenty new employees are randomly assigned to the training programs, with 5 in each program. At the
end of the training period, a test is conducted to see how quickly trainees can perform the task. The number
of times the task is performed per minute is recorded for each trainee.
Program 1 Program 2 Program 3 Program 4
9 10 12 9
12 6 14 8
14 9 11 11
11 9 13 7
13 10 11 8
The data was entered into Stata which produced the following ANOVA table.

Analysis of Variance
Source SS df MS F Prob > F

Between groups 54.95 3 18.3166667 7.04 XXXXXX


Within groups 41.6 16 2.6

Total 96.55 19 5.08157895

Bartlett’s test for equal variances: chi2(3) = 0.5685 Prob>chi2 = 0.904

The X-ed out P-value is between 0.001 and 0.01 .


Stata gives us a line at the bottom—the one about Bartlett's test—which reinforces our belief that the variances are the same for all groups.
Example: In an experiment to investigate the performance of six different types of spark plugs intended for
use on a two-stroke motorcycle, ten plugs of each brand were tested and the number of driving miles (at a
constant speed) until plug failure was recorded. A partial ANOVA table appears below. Fill in the missing
values:
Source df SS MS F
Brand 5 55961.5 11192.3 2.3744
Error 54 254539.26 4,713.69
Total 59 310,500.76

Note: One more thing you will often find on an ANOVA table is R2 (the coefficient of determination). It
indicates the ratio of the variability between group means in the sample to the overall sample variability,
meaning that it has a similar interpretation to that for R2 in linear regression.

2.4 Multiple Comparisons

As with other tests of significance, one-way ANOVA has the following steps:

1. State the hypotheses (see Section 1.2)


2. Compute a test statistic (here it is Fdf numer., df denom.), and use it to determine a probability of
getting a sample as extreme or more so under the null hypothesis.
3. Apply a decision rule: At the α level of significance, reject H0 if P(Fk−1,n−k > Fcomputed) < α. Do
not reject H0 if P > α.

If P > α, then we have no reason to reject the null hypothesis. We state this as our conclusion along with
the relevant information (F-value, df-numerator, df-denominator, P-value). Ideally, a person conducting the
study will have some preconceived hypotheses (more specialized than the H0, Ha we stated for ANOVA,
and ones which she held before ever collecting/looking at the data) about the group means that she wishes
to investigate. When this is the case, she may go ahead and explore them (even if ANOVA did not indicate
an overall difference in group means), often employing the method of contrasts. We will not learn this
method as a class, but if you wish to know more, some information is given on pp. 762–769.
When we have no such preconceived leanings about the group means, it is, generally speaking, inappro-
priate to continue searching for evidence of a difference in means if our F-value from ANOVA was not
significant. If, however, P < α, then we know that at least two means are not equal, and the door is open
to our trying to determine which ones. In this case, we follow up a significant F-statistic with pairwise
comparisons of the means, to see which are significantly different from each other.
This involves doing a t-test between each pair of means. This we do using the pooled estimate for the
(assumed) common standard deviation of all groups (see the MS bullet in Section 2.3):
tij = (x̄i − x̄j) / ( sp √(1/ni + 1/nj) )

To determine if this tij is statistically significant, we could just go to Table D with n − k degrees of freedom
(the df associated with sp). However, depending on the number k of groups, we might be doing many
comparisons, and recall that statistical significance can occur simply by chance (that, in fact, is built into
the interpretation of the P-value), and it becomes more and more likely to occur as the number of tests we
conduct on the same dataset grows. If we are going to conduct many tests, and want the overall probability
of rejecting any of the null hypotheses (equal means between group pairs) in the process to be no more
than α, then we must adjust the significance level for each individual comparison to be much smaller than
α. There are a number of different approaches which have been proposed for choosing the individual-test
significance level so as to get an overall family significance of α, and we need not concern ourselves with the
details of such proposals. When software is available that can carry out the details and report the results to
us, we are most likely agreeable to using whichever proposal(s) have been incorporated into the software.
The most common method of adjusting for the fact that you are doing multiple comparisons is a method
developed by Tukey. Stata provides, among others, the Bonferroni approach for pairwise comparisons,
which is an approach mentioned in our text, pp. 770-771.
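As a hedged sketch of the pooled pairwise procedure described above (the function name and the list-of-samples layout are illustrative, not taken from the text), the comparisons and a Bonferroni adjustment might be carried out as follows:

    import numpy as np
    from itertools import combinations
    from scipy import stats

    def bonferroni_pairwise(groups):
        # 'groups' is a hypothetical list of samples, one sequence of observations per group
        k = len(groups)
        n = sum(len(g) for g in groups)
        sse = sum(((np.asarray(g) - np.mean(g)) ** 2).sum() for g in groups)
        sp = np.sqrt(sse / (n - k))                  # pooled standard deviation (see Section 2.3)
        m = k * (k - 1) // 2                         # number of pairwise comparisons
        for i, j in combinations(range(k), 2):
            diff = np.mean(groups[i]) - np.mean(groups[j])
            se = sp * np.sqrt(1 / len(groups[i]) + 1 / len(groups[j]))
            t = diff / se
            p = 2 * stats.t.sf(abs(t), df=n - k)     # unadjusted two-sided P-value
            print(i + 1, j + 1, diff, min(1.0, m * p))   # Bonferroni: multiply P by m, cap at 1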
Example: Recall our data from a company looking at training programs for manual tasks carried out by its
employees. The results were statistically significant to conclude that not all of the training programs had
the same mean result. As a next step, we use Bonferroni multiple comparisons, providing here the results
as reported by Stata

Comparison of post-test by program
(Bonferroni)

Row Mean -|
Col Mean  |        1        2        3
----------+---------------------------
        2 |       -3
          |    0.057
          |
        3 |       .4      3.4
          |    1.000    0.025
          |
        4 |     -3.2      -.2     -3.6
          |    0.038    1.000    0.017

Where the row labeled ‘2’ meets the column labeled ‘1’, we are told that the sample mean response for
Program 2 was 3 lower than the mean response for Program 1 (Row Mean - Col Mean = -3), and that the
adjusted Bonferroni probability is 0.057. Thus, this difference is not statistically significant at the 5% level
to conclude the mean response from Program 1 is actually different than the mean response from Program
2. Which programs have statistically significant (at significance level 5%) mean responses?

Program 3 is different from program 2, with program 3 apparently better.


Program 4 is different from program 1, with program 1 apparently better.
Program 4 is different from program 3, with program 3 apparently better.

Apparently, programs 1 and 3 are the most successful, with no statistically-significant difference between
them. At this stage, other factors, such as how much it will cost the company to implement the two pro-
grams, may be used to determine which program will be set in place.
Note: Sometimes instead of giving P-values, a software package will generate confidence intervals for the
differences between means. Just remember that if the CI includes 0, there is no statistically significant
difference between the means.

3 Two-Way ANOVA

Two-way ANOVA allows us to compare population means when the populations are classified according to
two (categorical) factors.
Example. We might like to look at SAT scores of students who are male or female (first factor) and either
have or have not had a preparatory course (second factor).
Example. A researcher wants to investigate the effects of the amounts of calcium and magnesium in a
rat’s diet on the rat’s blood pressure. Diets including high, medium and low amounts of each mineral (but
otherwise identical) will be fed to the rats, and after a specified time on the diet the blood pressure will be
measured. Notice that the design includes nine different treatments because there are three levels of each
of the two factors.

Comparisons of Two-way to One-Factor-at-a-Time

• usually have a smaller total sample size, since you’re studying two things at once [rat diet example,
p. 800]

• removes some of the random variability (some of the random variability is now explained by the
second factor, so you can more easily find significant differences)

• we can look at interactions between factors (a significant interaction means the effect of one variable
changes depending on the level of the other factor).

Examples of (potential) interaction.

• Radon (high/medium/low) and smoking.


High radon levels increase the rate of lung cancer somewhat. Smoking increases the risk of lung
cancer. But if you are exposed to radon and smoke, then your lung cancer rates skyrocket. Therefore,
the effect of radon on lung cancer rates is small for non-smokers but big for smokers. We can’t talk
about the effect of radon without talking about whether or not the person is a smoker.

• age of person (0-10, 11-20, 21+) and effect of pesticides (low/high)


• gender and effect of different legal drugs (different standard doses)

Two-way ANOVA table

Below is the outline of a two-way ANOVA table, with factors A and B, having I and J groups, respectively.

Source   df               SS     MS     F          p-value
A        I − 1            SSA    MSA    MSA/MSE
B        J − 1            SSB    MSB    MSB/MSE
A × B    (I − 1)(J − 1)   SSAB   MSAB   MSAB/MSE
Error    n − IJ           SSE    MSE
Total    n − 1            SST
The general layout of the ANOVA table should be familiar to us from the ANOVA tables we have seen for
regression and one-way ANOVA. Notice that this time we are dividing the variation into four components:

1. the variation explained by factor A


2. the variation explained by factor B
3. the variation explained by the interaction of A and B
4. the variation explained by randomness

Since there are three different values of F, we must be doing three different hypothesis tests at once. We’ll
get to the hypotheses of these tests shortly.

The Two-way ANOVA model

The model for two-way ANOVA is that each of the IJ groups has a normal distribution with potentially
different means (µij), but with a common standard deviation (σ). That is,
    x_ijk = µ_ij + ε_ijk ,  where ε_ijk ∼ N(0, σ),
with µ_ij the group mean and ε_ijk the residual.

As usual, we will use two-way ANOVA provided it is reasonable to assume normal group distributions
and the ratio of the largest group standard deviation to the smallest group standard deviation is at most 2.

Main Effects

Example. We consider whether classifying by diagnosis (anxiety, depression, DCFS/Court referred)
and prior abuse (yes/no) is related to mean BC (Being Cautious) score. Below is a table where each cell
contains the mean BC score for people who were in that group.

Diagnosis              Abused   Not abused   Mean
Anxiety                  24.7         18.2   21.2
Depression               27.7         23.7   26.6
DCFS/Court Referred      29.8         16.4   20.8
Mean                     27.1         19.4
Here is the ANOVA table:

               df   SS        MS       F      p-value
Diagnosis       2    222.3    111.15   2.33   .11
Ever abused     1    819.06   819.06   17.2   .0001*
D × E           2    165.2     82.60   1.73   .186
Error          62   2958.0     47.71
Total          67
The table has three P-values, corresponding to three tests of significance:

I. H0: The mean BC score is the same for each of the three diagnoses.
Ha: The mean BC score is not the same for all three diagnoses.
The evidence here is not significant to reject the null hypothesis. (F = 2.33, df 1 = 2, df 2 = 62, P =
0.11)
II. H0: There is no main effect due to ever being abused.
Ha: There is a main effect due to being abused.
The evidence is significant to conclude a main effect exists. (F = 17.2, df 1 = 1, df 2 = 62, P = 0.0001)
III. H0: There is no interaction effect between diagnosis and ever being abused.
Ha: There is an interaction effect between the two variables.
The evidence here is not significant to reject the null hypothesis. (F = 1.73, df 1 = 2, df 2 = 62, P =
0.186)

When a main effect has been found for just one variable without variable interactions, we might combine
data across diagnoses and perform one of the other tests we know that is applicable. (Two-sample t or One-
way ANOVA are both options here, since the combining of information leaves us with just two groups.)
But we might also perform a simpler task: Draw a plot of the main effects due to abuse.
Interaction Effects

Example. We consider whether the mean BSI (Belonging/Social Interest) is the same after classifying people
on the basis of whether or not they’ve been abused and diagnosis.

Diagnosis              Abused   Not abused   Mean
Anxiety                  27.0         26.8   29.7
Depression               27.3         31.7   32.4
DCFS/Court Referred      23.0         37.1   26.9
Mean                     26.6         31.7

ANOVA table:

               df   SS        MS      F      p-value
Diagnosis       2    118.0     59.0   1.89   .1602
Ever abused     1    483.6    483.6   15.5   .0002*
D × E           2    387.0    193.5   6.19   .0035*
Error          62   1938.12    31.26
Total          67

Since the interaction is significant, let’s look at the individual mean BSI at each level of diagnosis on an
interaction plot.

Knowing how many people were reflected in each category (information that is not provided here) would
allow us to conduct 2-sample t tests at each level of diagnosis. Such tests reveal that there is a significant
difference in mean BSI between those who have ever been abused and those not abused only for those who
have a DCFS/Court Referred disorder. There is no statistically significant difference between these two
groups for those with a Depressive or Anxiety disorder (though it’s pretty close for those with an Anxiety
disorder).
Example. Promotional fliers. [Exercise 13.15, p. 821 in Moore/McCabe]

Means:
discount
promos 10 20 30 40
1 4.423 4.225 4.689 4.920
3 4.284 4.097 4.524 4.756
5 4.058 3.890 4.251 4.393
7 3.780 3.760 4.094 4.269

Standard Deviations:

discount
promos 10 20 30 40
1 0.18476 0.38561 0.23307 0.15202
3 0.20403 0.23462 0.27073 0.24291
5 0.17599 0.16289 0.26485 0.26854
7 0.21437 0.26179 0.24075 0.26992

Counts:
discount
promos 10 20 30 40
1 10 10 10 10
3 10 10 10 10
5 10 10 10 10
7 10 10 10 10

Sum Sq Df F value Pr(>F)


promos 8.36 3 47.73 <2e-16
discount 8.31 3 47.42 <2e-16
promos:discount 0.23 9 0.44 0.91
Residuals 8.41 144
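For readers who want to reproduce an analysis of this kind, the sketch below shows how a two-way ANOVA with interaction could be requested from statsmodels in Python. The data frame here is simulated stand-in data (the individual observations are not listed in the text), so only the call pattern, not the resulting numbers, should be taken from it.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    rows = [{"promos": p, "discount": d, "expPrice": rng.normal(4.4, 0.25)}
            for p in (1, 3, 5, 7) for d in (10, 20, 30, 40) for _ in range(10)]
    data = pd.DataFrame(rows)                       # 10 simulated observations per cell

    model = smf.ols("expPrice ~ C(promos) * C(discount)", data=data).fit()
    print(sm.stats.anova_lm(model, typ=2))          # SS, df, F and P for both factors and interaction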

[Two interaction plots of the data: mean of expPrice against promos with one line per discount level, and mean of expPrice against discount with one line per promos level; the vertical axes run from about 3.8 to 4.8.]

Question: Was it worth plotting the interaction effects, or would we have learned the same things plotting
only the main effect?
44.1 One-Way Analysis of Variance
44.2 Two-Way Analysis of Variance
44.3 Experimental Design


One-Way Analysis of Variance
Introduction
Problems in engineering often involve the exploration of the relationships between values taken by
a variable under different conditions. Workbook 41 introduced hypothesis testing, which enables us to
compare two population means using hypotheses of the general form
H0 : µ1 = µ2
H1 : µ1 ≠ µ2
or, in the case of more than two populations,
H0 : µ1 = µ2 = µ3 = . . . = µk
H1 : H0 is not true
If we are comparing more than two population means, using the type of hypothesis testing referred
to above gets very clumsy and very time consuming. As you will see, the statistical technique called
Analysis of Variance (ANOVA) enables us to compare several populations simultaneously. We
might, for example need to compare the shear strengths of five different adhesives or the surface
toughness of six samples of steel which have received different surface hardening treatments.

1. One-way ANOVA
In this Workbook we deal with one-way analysis of variance (one-way ANOVA) and two-way analysis of
variance (two-way ANOVA). One-way ANOVA enables us to compare several means simultaneously
by using the F -test and enables us to draw conclusions about the variance present in the set of
samples we wish to compare.
Multiple (greater than two) samples may be investigated using the techniques of two-population
hypothesis testing. As an example, it is possible to do a comparison looking for variation in the
surface hardness present in (say) three samples of steel which have received different surface hardening
treatments by using hypothesis tests of the form
H0 : µ1 = µ2
H1 : µ1 ≠ µ2
We would have to compare all possible pairs of samples before reaching a conclusion. If we are
dealing with three samples we would need to perform a total of
    ³C₂ = 3! / (1! 2!) = 3
hypothesis tests. From a practical point of view this is not an efficient way of dealing with the
problem, especially since the number of tests required rises rapidly with the number of samples
involved. For example, an investigation involving ten samples would require
    ¹⁰C₂ = 10! / (8! 2!) = 45
separate hypothesis tests.
There is also another crucially important reason why techniques involving such batteries of tests are
unacceptable. In the case of 10 samples mentioned above, if the probability of correctly accepting a
given null hypothesis is 0.95, then the probability of correctly accepting the null hypothesis
H0 : µ1 = µ2 = . . . = µ10
is (0.95)⁴⁵ ≈ 0.10, so we have only a 10% chance of correctly accepting the null hypothesis for
all 45 tests. Clearly, such a low success rate is unacceptable. These problems may be avoided by
simultaneously testing the significance of the difference between a set of more than two population
means by using techniques known as the analysis of variance.
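A quick sketch of the arithmetic behind this point (the per-test acceptance probability of 0.95 is the figure used above):

    from math import comb

    for k in (3, 10):
        m = comb(k, 2)                      # number of pairwise tests for k samples
        print(k, m, round(0.95 ** m, 3))    # k=3 gives 3 tests, 0.857; k=10 gives 45 tests, about 0.10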
Essentially, we look at the variance between samples and the variance within samples and draw
conclusions from the results. Note that the variation between samples is due to assignable (or
controlled) causes, often referred to in general as treatments, while the variation within samples is due
to chance. In the example above concerning the surface hardness present in three samples of steel
which have received different surface hardening treatments, the following diagrams illustrate the
differences which may occur when between sample and within sample variation is considered.
Case 1
In this case the variation within samples is roughly on a par with that occurring between samples.

[Figure 1: three heavily overlapping sample distributions, with the means of Samples 1, 2 and 3 marked.]
Case 2
In this case the variation within samples is considerably less than that occurring between samples.

[Figure 2: three well-separated sample distributions, with the means of Samples 1, 2 and 3 marked.]
We argue that the greater the variation present between samples in comparison with the variation
present within samples the more likely it is that there are ‘real’ differences between the population
means, say µ1, µ2 and µ3. If such ‘real’ differences are shown to exist at a sufficiently high level
of significance, we may conclude that there is sufficient evidence to enable us to reject the null
hypothesis H0 : µ1 = µ2 = µ3.

Example of variance in data


This example looks at variance in data. Four machines are set up to produce alloy spacers for use in
the assembly of microlight aircraft. The spacers are supposed to be identical but the four machines
give rise to the following varied lengths in mm.
Machine A Machine B Machine C Machine D
46 56 55 49
54 55 51 53
48 56 50 57
46 60 51 60
56 53 53 51

Since the machines are set up to produce identical alloy spacers it is reasonable to ask if the evidence
we have suggests that the machine outputs are the same or different in some way. We are really
asking whether the sample means, say X̄A , X̄B , X̄C and X̄D , are different because of differences in
the respective population means, say µA , µB , µC and µD , or whether the differences in X̄A , X̄B , X̄C
and X̄D may be attributed to chance variation. Stated in terms of a hypothesis test, we would write
H0 : µA = µB = µC = µD
H1 : At least one mean is different from the others
In order to decide between the hypotheses, we calculate the mean of each sample and the overall mean
(the mean of the means) and use these quantities to calculate the variation present between the
samples. We then calculate the variation present within samples. The following tables illustrate the
calculations.
Machine A Machine B Machine C Machine D
46 56 55 49
54 55 51 53
48 56 50 57
46 60 51 60
56 53 53 51

X̄A = 50 X̄B = 56 X̄C = 52 X̄D = 54


The mean of the means is clearly
    X̄ = (50 + 56 + 52 + 54) / 4 = 53
so the variation present between samples may be calculated as
    S²Tr = 1/(n − 1) Σ (X̄i − X̄)²     (summing over i = A, . . . , D)
         = 1/(4 − 1) [ (50 − 53)² + (56 − 53)² + (52 − 53)² + (54 − 53)² ]
         = 20/3 = 6.67 to 2 d.p.

Note that the notation S²Tr reflects the general use of the word ‘treatment’ to describe assignable
causes of variation between samples. This notation is not universal but it is fairly common.
Variation within samples
We now calculate the variation due to chance errors present within the samples and use the results to
obtain a pooled estimate of the variance, say S²E, present within the samples. After this calculation
we will be able to compare the two variances and draw conclusions. The variance present within the
samples may be calculated as follows.

Sample A
    Σ (X − X̄A)² = (46 − 50)² + (54 − 50)² + (48 − 50)² + (46 − 50)² + (56 − 50)² = 88
Sample B
    Σ (X − X̄B)² = (56 − 56)² + (55 − 56)² + (56 − 56)² + (60 − 56)² + (53 − 56)² = 26
Sample C
    Σ (X − X̄C)² = (55 − 52)² + (51 − 52)² + (50 − 52)² + (51 − 52)² + (53 − 52)² = 16
Sample D
    Σ (X − X̄D)² = (49 − 54)² + (53 − 54)² + (57 − 54)² + (60 − 54)² + (51 − 54)² = 80

An obvious extension of the formula for a pooled variance gives

    S²E = [ Σ(X − X̄A)² + Σ(X − X̄B)² + Σ(X − X̄C)² + Σ(X − X̄D)² ] / [ (nA − 1) + (nB − 1) + (nC − 1) + (nD − 1) ]

where nA, nB, nC and nD represent the number of members (5 in each case here) in each sample.
Note that the quantities comprising the denominator, nA − 1, . . . , nD − 1, are the numbers of degrees
of freedom present in each of the four samples. Hence our pooled estimate of the variance present
within the samples is given by
    S²E = (88 + 26 + 16 + 80) / (4 + 4 + 4 + 4) = 13.13
We are now in a position to ask whether the variation between samples, S²Tr, is large in comparison
with the variation within samples, S²E. The answer to this question enables us to decide whether the
difference in the calculated variations is sufficiently large to conclude that there is a difference in the
population means. That is, do we have sufficient evidence to reject H0?

Using the F-test

At first sight it seems reasonable to use the ratio
    F = S²Tr / S²E
but in fact the ratio
    F = n S²Tr / S²E ,
where n is the sample size, is used, since it can be shown that if H0 is true this ratio will have a value
of approximately unity while if H0 is not true the ratio will have a value greater than unity. This is
because the variance of a sample mean is σ²/n.
The test procedure (three steps) for the data used here is as follows.

(a) Find the value of F ;


(b) Find the number of degrees of freedom for both the numerator and denominator of the
ratio;
(c) Accept or reject depending on the value of F compared with the appropriate tabulated
value.
Step 1
The value of F is given by
    F = n S²Tr / S²E = (5 × 6.67) / 13.13 = 2.54
Step 2
The number of degrees of freedom for S²Tr (the numerator) is
    Number of samples − 1 = 3
The number of degrees of freedom for S²E (the denominator) is
    Number of samples × (sample size − 1) = 4 × (5 − 1) = 16
Step 3
The critical value (5% level of significance) from the F-tables (Table 1 at the end of this Workbook)
is F(3,16) = 3.24 and since 2.54 < 3.24 we see that we cannot reject H0 on the basis of the evidence
available and conclude that in this case the variation present is due to chance. Note that the test
used is one-tailed.
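The same conclusion can be checked with a short Python sketch; scipy reports the equivalent ratio MSTr/MSE, which equals nS²Tr/S²E for these data.

    from scipy.stats import f_oneway

    machine_a = [46, 54, 48, 46, 56]
    machine_b = [56, 55, 56, 60, 53]
    machine_c = [55, 51, 50, 51, 53]
    machine_d = [49, 53, 57, 60, 51]
    f, p = f_oneway(machine_a, machine_b, machine_c, machine_d)
    print(f, p)    # F is about 2.54 on (3, 16) degrees of freedom; P is well above 0.05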

ANOVA tables
It is usual to summarize the calculations we have seen so far in the form of an ANOVA table.
Essentially, the table gives us a method of recording the calculations leading to both the numerator
and the denominator of the expression
2
nSTr
F =
SE2
In addition, and importantly, ANOVA tables provide us with a useful means of checking the accuracy
of our calculations. A general ANOVA table is presented below with explanatory notes.
Define a = number of treatments, n = number of observations per sample.
Source of Variation          Sum of Squares SS             Degrees of Freedom   Mean Square MS                 Value of F Ratio
Between samples              SSTr = n Σi (X̄i − X̄)²         a − 1                MSTr = SSTr/(a − 1) = nS²Tr    F = MSTr/MSE = nS²Tr/S²E
(due to treatments):
differences between
means X̄i and X̄
Within samples               SSE = Σi Σj (Xij − X̄i)²       a(n − 1)             MSE = SSE/[a(n − 1)] = S²E
(due to chance errors):
differences between
individual observations
Xij and means X̄i
TOTALS                       SST = Σi Σj (Xij − X̄)²        an − 1

In order to demonstrate this table for the example above we need to calculate
    SST = Σi Σj (Xij − X̄)²
a measure of the total variation present in the data. Such calculations are easily done using a
computer (Microsoft Excel was used here), the result being
    SST = Σi Σj (Xij − X̄)² = 310
The ANOVA table becomes

Source of Variation                      SS     df    MS                F Ratio
Between samples (due to treatments):     100     3    MSTr = 100/3      F = 33.33/13.13
differences between means X̄i and X̄                   = 33.33             = 2.54
Within samples (due to chance errors):   210    16    MSE = 210/16
differences between individual                        = 13.13
observations Xij and means X̄i
TOTALS                                   310    19

It is possible to show theoretically that
    SST = SSTr + SSE
that is,
    Σi Σj (Xij − X̄)² = n Σi (X̄i − X̄)² + Σi Σj (Xij − X̄i)²

As you can see from the table, SSTr and SSE do indeed sum to give SST even though we can
calculate them separately. The same is true of the degrees of freedom.
Note that calculating these quantities separately does offer a check on the arithmetic but that using
the relationship can speed up the calculations by obviating the need to calculate (say) SST . As
you might expect, it is recommended that you check your calculations! However, you should note
that it is usual to calculate SST and SSTr and then find SSE by subtraction. This saves a lot of
unnecessary calculation but does not offer a check on the arithmetic. This shorter method will be
used throughout much of this Workbook.

Unequal sample sizes
So far we have assumed that the number of observations in each sample is the same. This is not a
necessary condition for the one-way ANOVA.

Key Point 1
Suppose that the number of samples is a and the numbers of observations are n1, n2, . . . , na. Then
the between-samples sum of squares can be calculated using
    SSTr = Σi (Ti² / ni) − G²/N
where Ti is the total for sample i, G = Σi Ti is the overall total and N = Σi ni.
It has a − 1 degrees of freedom.

The total sum of squares can be calculated as before, or using
    SST = Σi Σj Xij² − G²/N
It has N − 1 degrees of freedom.

The within-samples sum of squares can be found by subtraction:
    SSE = SST − SSTr
It has (N − 1) − (a − 1) = N − a degrees of freedom.
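These formulas translate directly into a short Python sketch; the helper name and the list-of-lists layout are illustrative only, not part of the Workbook.

    def one_way_ss(samples):
        # samples: a list of observation lists, possibly of different lengths
        totals = [sum(s) for s in samples]                    # T_i
        sizes = [len(s) for s in samples]                     # n_i
        G = sum(totals)                                       # overall total
        N = sum(sizes)                                        # total number of observations
        ss_tr = sum(t * t / n for t, n in zip(totals, sizes)) - G * G / N   # a - 1 df
        ss_t = sum(x * x for s in samples for x in s) - G * G / N           # N - 1 df
        return ss_tr, ss_t - ss_tr, ss_t                      # SSTr, SSE (N - a df), SST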
Three fuel injection systems are tested for efficiency and the following coded data
are obtained.
System 1 System 2 System 3
48 60 57
56 56 55
46 53 52
45 60 50
50 51 51

Do the data support the hypothesis that the systems offer equivalent levels of efficiency?

Answer
Appropriate hypotheses are
H0 : µ1 = µ2 = µ3
H1 : At least one mean is different to the others
Variation between samples
System 1   System 2   System 3
    48         60         57
    56         56         55
    46         53         52
    45         60         50
    50         51         51
X̄1 = 49    X̄2 = 56    X̄3 = 53

The mean of the means is X̄ = (49 + 56 + 53)/3 = 52.67 and the variation present between samples
is
    S²Tr = 1/(n − 1) Σi (X̄i − X̄)² = 1/(3 − 1) [ (49 − 52.67)² + (56 − 52.67)² + (53 − 52.67)² ] = 12.33
Variation within samples
System 1
    Σ (X − X̄1)² = (48 − 49)² + (56 − 49)² + (46 − 49)² + (45 − 49)² + (50 − 49)² = 76
System 2
    Σ (X − X̄2)² = (60 − 56)² + (56 − 56)² + (53 − 56)² + (60 − 56)² + (51 − 56)² = 66
System 3
    Σ (X − X̄3)² = (57 − 53)² + (55 − 53)² + (52 − 53)² + (50 − 53)² + (51 − 53)² = 34
Hence
    S²E = [ Σ(X − X̄1)² + Σ(X − X̄2)² + Σ(X − X̄3)² ] / [ (n1 − 1) + (n2 − 1) + (n3 − 1) ] = (76 + 66 + 34) / (4 + 4 + 4) = 14.67
The value of F is given by F = nS²Tr / S²E = (5 × 12.33) / 14.67 = 4.20
The number of degrees of freedom for S²Tr is: number of samples − 1 = 2
The number of degrees of freedom for S²E is: number of samples × (sample size − 1) = 3 × (5 − 1) = 12
The critical value (5% level of significance) from the F -tables (Table 1 at the end of this Workbook)
is F(2,12) = 3.89 and since 4.20 > 3.89 we conclude that we have sufficient evidence to reject H0
so that the injection systems are not of equivalent efficiency.
Exercises
1. The yield of a chemical process, expressed in percentage of the theoretical maximum, is mea-
sured with each of two catalysts, A, B, and with no catalyst (Control: C). Five observations
are made under each condition. Making the usual assumptions for an analysis of variance, test
the hypothesis that there is no difference in mean yield between the three conditions. Use the
5% level of significance.

Catalyst A Catalyst B Control C


79.2 81.5 74.8
80.1 80.7 76.5
77.4 80.5 74.7
77.6 81.7 74.8
77.8 80.6 74.9

2. Four large trucks, A, B, C, D, are used to move stone in a quarry. On a number of days,
the amount of fuel, in litres, used per tonne of stone moved is calculated for each truck. On
some days a particular truck might not be used. The data are as follows. Making the usual
assumptions for an analysis of variance, test the hypothesis that the mean amount of fuel used
per tonne of stone moved is the same for each truck. Use the 5% level of significance.

Truck Observations
A 0.21 0.21 0.21 0.21 0.20 0.19 0.18 0.21 0.22 0.21
B 0.22 0.22 0.25 0.21 0.21 0.22 0.20 0.23
C 0.21 0.18 0.18 0.19 0.20 0.18 0.19 0.19 0.20 0.20 0.20
D 0.20 0.20 0.21 0.21 0.21 0.19 0.20 0.20 0.21

Answers

1. We calculate the treatment totals for A: 392.1, B: 405.0 and C: 375.7. The overall total is
1172.8 and Σ Σ y² = 91792.68.

The total sum of squares is
    91792.68 − 1172.8²/15 = 95.357
on 15 − 1 = 14 degrees of freedom.

The between treatments sum of squares is
    (1/5)(392.1² + 405.0² + 375.7²) − 1172.8²/15 = 86.257
on 3 − 1 = 2 degrees of freedom.

By subtraction, the residual sum of squares is

95.357 − 86.257 = 9.100

on 14 − 2 = 12 degrees of freedom.

The analysis of variance table is as follows:

Source of variation   Sum of squares   Degrees of freedom   Mean square   Variance ratio
Treatment                     86.257                    2        43.129           56.873
Residual                       9.100                   12         0.758
Total                         95.357                   14

The upper 5% point of the F2,12 distribution is 3.89. The observed variance ratio is greater
than this so we conclude that the result is significant at the 5% level and we reject the null
hypothesis at this level. The evidence suggests that there are differences in the mean yields
between the three treatments.
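As a quick check, a scipy sketch (not part of the Workbook's solution) reproduces this variance ratio directly from the raw data:

    from scipy.stats import f_oneway

    catalyst_a = [79.2, 80.1, 77.4, 77.6, 77.8]
    catalyst_b = [81.5, 80.7, 80.5, 81.7, 80.6]
    control_c = [74.8, 76.5, 74.7, 74.8, 74.9]
    print(f_oneway(catalyst_a, catalyst_b, control_c))   # F is about 56.9 on (2, 12) df; P is tiny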
Answer

2. We can summarise the data as follows.


Truck    Σy     Σy²      n
A        2.05   0.4215   10
B        1.76   0.3888    8
C        2.12   0.4096   11
D        1.83   0.3725    9
Total    7.76   1.5924   38

The total sum of squares is
    1.5924 − 7.76²/38 = 7.7263 × 10⁻³
on 38 − 1 = 37 degrees of freedom.

The between trucks sum of squares is
    2.05²/10 + 1.76²/8 + 2.12²/11 + 1.83²/9 − 7.76²/38 = 3.4581 × 10⁻³
on 4 − 1 = 3 degrees of freedom.

By subtraction, the residual sum of squares is

7.7263 × 10−3 − 3.4581 × 10−3 = 4.2682 × 10−3

on 37 − 3 = 34 degrees of freedom.

The analysis of variance table is as follows:

Source of variation   Sum of squares   Degrees of freedom   Mean square       Variance ratio
Trucks                 3.4581 × 10⁻³                    3    1.1527 × 10⁻³            9.1824
Residual               4.2682 × 10⁻³                   34    0.1255 × 10⁻³
Total                  7.7263 × 10⁻³                   37

The upper 5% point of the F3,34 distribution is approximately 2.9. The observed variance
ratio is greater than this so we conclude that the result is significant at the 5% level and we
reject the null hypothesis at this level. The evidence suggests that there are differences in the
mean fuel consumption per tonne moved between the four trucks.


Two-Way Analysis of Variance
Introduction
In the one-way analysis of variance (Section 44.1) we consider the effect of one factor on the values
taken by a variable. Very often, in engineering investigations, the effects of two or more factors are
considered simultaneously.
The two-way ANOVA deals with the case where there are two factors. For example, we might
compare the fuel consumptions of four car engines under three types of driving conditions (e.g.
urban, rural, motorway). Sometimes we are interested in the effects of both factors. In other cases
one of the factors is a ‘nuisance factor’ which is not of particular interest in itself but, if we allow for
it in our analysis, we improve the power of our test for the other factor.
We can also allow for interaction effects between the two factors.
1. Two-way ANOVA without interaction
The previous Section considered a one-way classification analysis of variance, that is we looked at the
variations induced by one set of values of a factor (or treatments as we called them) by partitioning
the variation in the data into components representing ‘between treatments’ and ‘within treatments.’
In this Section we will look at the analysis of variance involving two factors or, as we might say,
two sets of treatments. In general terms, if we have two factors say A and B, there is no absolute
reason to assume that there is no interaction between the factors. However, as an introduction to
the two-way analysis of variance, we will consider the case occurring when there is no interaction
between factors and an experiment is run only once. Note that some authors take the view that
interaction may occur and that the residual sum of squares contains the effects of this interaction
even though the analysis does not, at this stage, allow us to separate it out and check its possible
effects on the experiment.
The following example builds on the previous example where we looked at the one-way analysis of
variance.

Example of variance in data


In Section 44.1 we considered an example concerning four machines producing alloy spacers. This
time we introduce an extra factor by considering both the machines producing the spacers and the
performance of the operators working with the machines. In this experiment, the data appear as
follows (spacer lengths in mm). Each operator made one spacer with each machine.
Operator Machine 1 Machine 2 Machine 3 Machine 4
1 46 56 55 47
2 54 55 51 56
3 48 56 50 58
4 46 60 51 59
5 51 53 53 55
In a case such as this we are looking for discernible difference between the operators (‘operator
effects’) on the one hand and the machines (‘machine effects’) on the other.
We suppose that the observation for operator i and machine j is taken from a normal distribution
with mean
µij = µ + αi + βj
Here αi is an operator effect and βj is a machine effect. Our hypotheses may be stated as follows.
Operator effects:
    H0 : µ1j = µ2j = µ3j = µ4j = µ5j = µ + βj, that is α1 = α2 = α3 = α4 = α5 = 0
    H1 : At least one of the operator effects is different to the others
Machine effects:
    H0 : µi1 = µi2 = µi3 = µi4 = µ + αi, that is β1 = β2 = β3 = β4 = 0
    H1 : At least one of the machine effects is different to the others
Note that the five operators and four machines give rise to data which has only one observation per
‘cell.’ For example, operator 2 using machine 3 produces a spacer 51 mm long, while operator 1 using
machine 2 produces a spacer which is 56 mm long. Note also that in this example we have referred
to the machines by number and not by letter. This is not particularly important but it will simplify

some of the notation used when we come to write out a general two-way ANOVA table shortly. We
obtain one observation per cell and cannot measure variation within a cell. In this case we cannot
check for interaction between the operator and the machine - the two factors used in this example.
Running an experiment several times results in multiple observations per cell and in this case we
should assume that there may be interaction between the factors and check for this. In the case
considered here (no interaction between factors), the required sums of squares build easily on the
relationship used in the one-way analysis of variance
SST = SSTr + SSE
to become
SST = SSA + SSB + SSE
where SSA and SSB represent the sums of squares corresponding to factors A and B. In order to calculate
the required sums of squares we lay out the table slightly more efficiently as follows.

Operator (j)                Machine (i)              Operator          (X̄.j − X̄)   Operator SS
                            1     2     3     4      means (X̄.j)                    (X̄.j − X̄)²
1                          46    56    55    47          51              −2              4
2                          54    55    51    56          54               1              1
3                          48    56    50    58          53               0              0
4                          46    60    51    59          54               1              1
5                          51    53    53    55          53               0              0
Machine means (X̄i.)        49    56    52    55      X̄ = 53         Sum = 0       6 × 4 = 24
(X̄i. − X̄)                  −4     3    −1     2                     Sum = 0
Machine SS (X̄i. − X̄)²      16     9     1     4                                   30 × 5 = 150

Note 1
The ‘.’ notation means that summation takes place over that variable. For example, the five operator
means X̄.j are obtained as X̄.1 = (46 + 56 + 55 + 47)/4 = 51 and so on, while the four machine means
X̄i. are obtained as X̄1. = (46 + 54 + 48 + 46 + 51)/5 = 49 and so on. Put more generally (and this is
just an example),
    X̄.j = (1/m) Σᵢ₌₁ᵐ xij

Note 2
Multiplying factors were used in the calculation of the machine sum of squares (four in this case
since there are four machines) and the operator sum of squares (five in this case since there are five
operators).
Note 3
The two statements ‘Sum = 0’ are included purely as arithmetic checks.
We also know that SSO = 24 and SSM = 150.
Calculating the error sum of squares
Note that the total sum of squares is easy to obtain and that the error sum of squares is then obtained
by straightforward subtraction.
The total sum of squares is given by summing the quantities (Xij − X̄)² for the table of entries.
Subtracting X̄ = 53 from each table member and squaring gives:

Operator (j)    Machine (i)
                 1    2    3    4
1               49    9    4   36
2                1    4    4    9
3               25    9    9   25
4               49   49    4   36
5                4    0    0    4

The total sum of squares is SST = 330.
The error sum of squares is given by the result
    SSE = SST − SSA − SSB = 330 − 24 − 150 = 156

At this stage we display the general two-way ANOVA table and then particularise the table for the
example we are engaged in and draw conclusions by using the test as we have previously done with
one-way ANOVA.
A General Two-Way ANOVA Table

Source of Variation                SS                                     df                MS                           F Ratio
Between samples (due to factor A;  SSA = b Σi (X̄i. − X̄)²                 a − 1             MSA = SSA/(a − 1)            F = MSA/MSE
differences between means X̄i.
and X̄)
Between samples (due to factor B;  SSB = a Σj (X̄.j − X̄)²                 b − 1             MSB = SSB/(b − 1)            F = MSB/MSE
differences between means X̄.j
and X̄)
Within samples (due to chance      SSE = Σi Σj (Xij − X̄i. − X̄.j + X̄)²    (a − 1)(b − 1)    MSE = SSE/[(a − 1)(b − 1)]
errors; differences between
individual observations and
fitted values)
Totals                             SST = Σi Σj (Xij − X̄)²                ab − 1
Hence the two-way ANOVA table for the example under consideration is

Source of Variation                         SS    df   MS            F Ratio
Between samples (due to factor A;            24    4   24/4 = 6      F = 6/13 = 0.46
differences between operator means and X̄)
Between samples (due to factor B;           150    3   150/3 = 50    F = 50/13 = 3.85
differences between machine means and X̄)
Within samples (due to chance errors;       156   12   156/12 = 13
differences between individual
observations and fitted values)
TOTALS                                      330   19

From the F-tables (at the end of the Workbook) F4,12 = 3.26 and F3,12 = 3.49. Since 0.46 < 3.26
we conclude that we do not have sufficient evidence to reject the null hypothesis that there is no
difference between the operators. Since 3.85 > 3.49 we conclude that we do have sufficient evidence
at the 5% level of significance to reject the null hypothesis that there is no difference between the
machines.
If we have two factors, A and B, with a levels of factor A and b levels of factor B, and one
observation per cell, we can calculate the sum of squares as follows.
The sum of squares for factor A is
    SSA = (1/b) Σi Ai² − G²/N     with a − 1 degrees of freedom
and the sum of squares for factor B is
    SSB = (1/a) Σj Bj² − G²/N     with b − 1 degrees of freedom
where
    Ai = Σj Xij is the total for level i of factor A,
    Bj = Σi Xij is the total for level j of factor B,
    G = Σi Σj Xij is the overall total of the data, and
    N = ab is the total number of observations.

The total sum of squares is
    SST = Σi Σj Xij² − G²/N     with N − 1 degrees of freedom

The within-samples, or ‘error’, sum of squares can be found by subtraction. So
    SSE = SST − SSA − SSB
with
    (N − 1) − (a − 1) − (b − 1) = (ab − 1) − (a − 1) − (b − 1) = (a − 1)(b − 1) degrees of freedom
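A short numpy sketch of these formulas, applied to the operator/machine spacer data used earlier (rows are the five operator levels, columns the four machine levels; the variable names are illustrative only):

    import numpy as np

    x = np.array([[46, 56, 55, 47],
                  [54, 55, 51, 56],
                  [48, 56, 50, 58],
                  [46, 60, 51, 59],
                  [51, 53, 53, 55]], dtype=float)    # rows: operators, columns: machines

    a, b = x.shape
    N = a * b
    G = x.sum()
    ss_t = (x ** 2).sum() - G ** 2 / N
    ss_a = (x.sum(axis=1) ** 2).sum() / b - G ** 2 / N    # row totals A_i
    ss_b = (x.sum(axis=0) ** 2).sum() / a - G ** 2 / N    # column totals B_j
    ss_e = ss_t - ss_a - ss_b
    print(ss_a, ss_b, ss_e, ss_t)                         # 24, 150, 156, 330 as before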
A vehicle manufacturer wishes to test the ability of three types of steel-alloy panels
to resist corrosion when three different paint types are applied. Three panels with
differing steel-alloy composition are coated with three types of paint. The following
coded data represent the ability of the painted panels to resist weathering.
Paint Steel-Alloy Steel-Alloy Steel-Alloy
Type 1 2 3
1 40 51 56
2 54 55 50
3 47 56 50
Use a two-way ANOVA procedure to determine whether any difference in the ability
of the panels to resist corrosion may be assigned to either the type of paint or the
steel-alloy composition of the panels.

Your solution
Do your working on separate paper and enter the main conclusions here.

Answer
Our hypotheses may be stated as follows.
H 0 : µ 1 = µ 2 = µ3
Paint type
H1 : At least one of the means is different from the others
H0 : µ1 = µ2 = µ3
Steel-Alloy
H1 : At least one of the means is different from the others
Following the methods of calculation outlined above we obtain:

Paint Type (j)             Steel-Alloy (i)        Paint Means    (X̄.j − X̄)    Paint SS
                            1     2     3         (X̄.j)                        (X̄.j − X̄)²
1                          40    51    56            49             −2              4
2                          54    55    50            53              2              4
3                          47    56    50            51              0              0
Steel-Alloy Means (X̄i.)    47    54    52        X̄ = 51         Sum = 0      8 × 3 = 24
(X̄i. − X̄)                  −4     3     1                       Sum = 0
Steel-Alloy SS (X̄i. − X̄)²  16     9     1                                    26 × 3 = 78

Hence SSPa = 24 and SSSt = 78. We now require SSE. The calculations are as follows.
In the table below, the predicted outputs are given in parentheses.

Paint Type (j)             Steel-Alloy (i)                      Paint Means (X̄.j)   (X̄.j − X̄)
                            1         2         3
1                          40 (45)   51 (52)   56 (50)              49                 −2
2                          54 (49)   55 (56)   50 (54)              53                  2
3                          47 (47)   56 (54)   50 (52)              51                  0
Steel-Alloy Means (X̄i.)    47        54        52               X̄ = 51            Sum = 0
(X̄i. − X̄)                  −4         3         1                                  Sum = 0
Answers continued
A table of squared residuals is easily obtained as

Paint (j)    Steel (i)
              1    2    3
1            25    1   36
2            25    1   16
3             0    4    4

Hence the residual sum of squares is SSE = 112. The total sum of squares is given by subtracting
X̄ = 51 from each table member and squaring to obtain

Paint (j)    Steel (i)
              1    2    3
1           121    0   25
2             9   16    1
3            16   25    1

The total sum of squares is SST = 214. We should now check to see that SST = SSPa + SSSt + SSE.
Substitution gives 214 = 24 + 78 + 112 which is correct.
The values of F are calculated as shown in the ANOVA table below.

Source of Variation                      SS    df   MS            F Ratio
Between samples (due to treatment A,      24    2   MSA = 24/2    F = 12/28
say, paint)                                         = 12            = 0.429
Between samples (due to treatment B,      78    2   MSB = 78/2    F = 39/28
say, steel-alloy)                                   = 39            = 1.393
Within samples (due to chance errors)    112    4   MSE = 112/4
                                                    = 28
Totals                                   214    8

From the F-tables the critical value of F2,4 = 6.94 and since both of the calculated F values are
less than 6.94 we conclude that we do not have sufficient evidence to reject either null hypothesis.

2. Two-way ANOVA with interaction
The previous subsection looked at two-way ANOVA under the assumption that there was no inter-
action between the factors A and B. We will now look at the developments of two-way ANOVA
to take into account possible interaction between the factors under consideration. The following
analysis allows us to test to see whether we have sufficient evidence to reject the null hypothesis that
the amount of interaction is effectively zero.
To see how we might consider interaction between factors A and B taking place, look at the following
table which represents observations involving a two-factor experiment.
Factor B
Factor A 1 2 3 4 5
1 3 5 1 9 12
2 4 6 2 10 13
3 6 8 4 12 15
A brief inspection of the numbers in the five columns reveals that there is a constant difference
between any two rows as we move from column to column. Similarly there is a constant difference
between any two columns as we move from row to row. While the data are clearly contrived, they
do illustrate that in this case no interaction arises from variations in the differences between
either rows or columns. Real data do not exhibit such behaviour in general, of course, and we expect
differences to occur and so we must check to see if the differences are large enough to provide
sufficient evidence to reject the null hypothesis that the amount of interaction is effectively zero.
Notation
Let a represent the number of ‘levels’ present for factor A, denoted i = 1, . . . , a.
Let b represent the number of ‘levels’ present for factor B, denoted j = 1, . . . , b.
Let n represent the number of observations per cell. We assume that it is the same for each cell.
In the table above, a = 3, b = 5, n = 1. In the examples we shall consider, n will be greater than 1
and we will be able to check for interaction between the factors.
We suppose that the observations at level i of factor A and level j of factor B are taken from a
normal distribution with mean µij. When we assumed that there was no interaction, we used the
additive model
µij = µ + αi + βj
So, for example, the difference µi1 − µi2 between the means at levels 1 and 2 of factor B is equal
to β1 − β2 and does not depend upon the level of factor A. When we allow interaction, this is not
necessarily true and we write
    µij = µ + αi + βj + γij
Here γij is an interaction effect. Now µi1 − µi2 = β1 − β2 + γi1 − γi2, so the difference between
two levels of factor B depends on the level of factor A.

Fixed and random effects
Often the levels assigned to a factor will be chosen deliberately. In this case the factors are said to be
fixed and we have a fixed effects model. If the levels are chosen at random from a population of all
possible levels, the factors are said to be random and we have a random effects model. Sometimes
one factor may be fixed while one may be random. In this case we have a mixed effects model. In
effect, we are asking whether we are interested in certain particular levels of a factor (fixed effects) or
whether we just regard the levels as a sample and are interested in the population in general (random
effects).
Calculation method
The data you will be working with will be set out in a manner similar to that shown below.
The table assumes n observations per cell and is shown along with a variety of totals and means
which will be used in the calculations of the various test statistics to follow.
                          Factor B
Factor A     Level 1          Level 2          ...   Level j          ...   Level b          Totals
Level 1      x111 ... x11n    x121 ... x12n    ...   x1j1 ... x1jn    ...   x1b1 ... x1bn    T1··
Level 2      x211 ... x21n    x221 ... x22n    ...   x2j1 ... x2jn    ...   x2b1 ... x2bn    T2··
  ...
Level i      xi11 ... xi1n    ...                    xij1 ... xijn    ...   xib1 ... xibn    Ti··
  ...
Level a      xa11 ... xa1n    xa21 ... xa2n    ...   xaj1 ... xajn    ...   xab1 ... xabn    Ta··
Totals       T·1·             T·2·             ...   T·j·             ...   T·b·             T···

The sum of the data in cell (i, j) is Tij· = Σₖ xijk, summing over the n observations in the cell.
Notes

(a) T··· represents the grand total of the data values so that
    T··· = Σj T·j· = Σi Ti·· = Σi Σj Σk xijk

(b) Ti.. represents the total of the data in the ith row.
(c) T.j. represents the total of the data in the jth column.
(d) The total number of data entries is given by N = nab.
Partitioning the variation
We are now in a position to consider the partition of the total sum of the squared deviations from
the overall mean, which we estimate as
    x̄ = T··· / N
The total sum of the squared deviations is
    Σi Σj Σk (xijk − x̄)²
and it can be shown that this quantity can be written as
    SST = SSA + SSB + SSAB + SSE
where SST is the total sum of squares, given by
    SST = Σi Σj Σk xijk² − T···²/N ;
SSA is the sum of squares due to variations caused by factor A, given by
    SSA = Σi Ti··²/(bn) − T···²/N
SSB is the sum of squares due to variations caused by factor B, given by
    SSB = Σj T·j·²/(an) − T···²/N
Note that bn means b × n, which is the number of observations at each level of A, and an means
a × n, which is the number of observations at each level of B.
SSAB is the sum of squares due to variations caused by the interaction of factors A and B and
is given by
    SSAB = Σi Σj Tij·²/n − T···²/N − SSA − SSB .
Note that the quantity Tij· = Σk xijk is the sum of the data in the (i, j)th cell and that the quantity
Σi Σj Tij·²/n − T···²/N is the sum of squares between cells.
SSE is the sum of squares due to chance or experimental error and is given by
    SSE = SST − SSA − SSB − SSAB
The number of degrees of freedom (N − 1) is partitioned as follows:
    SST       SSA     SSB     SSAB              SSE
    N − 1     a − 1   b − 1   (a − 1)(b − 1)    N − ab

Note that there are ab − 1 degrees of freedom between cells and that the number of degrees of
freedom for SSAB is given by
ab − 1 − (a − 1) − (b − 1) = (a − 1)(b − 1)
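These sums of squares are easy to compute from an array holding the n replicates in each of the a × b cells; the function below is a sketch of that bookkeeping (the name two_way_ss is not from the text).

    import numpy as np

    def two_way_ss(x):
        # x has shape (a, b, n): a levels of factor A, b levels of factor B, n replicates per cell
        a, b, n = x.shape
        N = a * b * n
        T = x.sum()                                                      # grand total T...
        ss_t = (x ** 2).sum() - T ** 2 / N
        ss_a = (x.sum(axis=(1, 2)) ** 2).sum() / (b * n) - T ** 2 / N    # row totals T_i..
        ss_b = (x.sum(axis=(0, 2)) ** 2).sum() / (a * n) - T ** 2 / N    # column totals T_.j.
        ss_ab = (x.sum(axis=2) ** 2).sum() / n - T ** 2 / N - ss_a - ss_b   # cell totals T_ij.
        ss_e = ss_t - ss_a - ss_b - ss_ab
        return ss_a, ss_b, ss_ab, ss_e, ss_t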
This gives rise to the following two-way ANOVA tables.

Two-Way ANOVA Table - Fixed-Effects Model

Source of Variation   SS     df               MS                             F Ratio
Factor A              SSA    a − 1            MSA = SSA/(a − 1)              F = MSA/MSE
Factor B              SSB    b − 1            MSB = SSB/(b − 1)              F = MSB/MSE
Interaction           SSAB   (a − 1)(b − 1)   MSAB = SSAB/[(a − 1)(b − 1)]   F = MSAB/MSE
Residual Error        SSE    N − ab           MSE = SSE/(N − ab)
Totals                SST    N − 1

Two-Way ANOVA Table - Random-Effects Model

Source of Variation   SS     df               MS                             F Ratio
Factor A              SSA    a − 1            MSA = SSA/(a − 1)              F = MSA/MSAB
Factor B              SSB    b − 1            MSB = SSB/(b − 1)              F = MSB/MSAB
Interaction           SSAB   (a − 1)(b − 1)   MSAB = SSAB/[(a − 1)(b − 1)]   F = MSAB/MSE
Residual Error        SSE    N − ab           MSE = SSE/(N − ab)
Totals                SST    N − 1
Two-Way ANOVA Table - Mixed-Effects Model

Case (i) A fixed and B random.

Source of Variation   SS     df               MS                             F Ratio
Factor A              SSA    a − 1            MSA = SSA/(a − 1)              F = MSA/MSAB
Factor B              SSB    b − 1            MSB = SSB/(b − 1)              F = MSB/MSE
Interaction           SSAB   (a − 1)(b − 1)   MSAB = SSAB/[(a − 1)(b − 1)]   F = MSAB/MSE
Residual Error        SSE    N − ab           MSE = SSE/(N − ab)
Totals                SST    N − 1

Case (ii) A random and B fixed.

Source of Variation   SS     df               MS                             F Ratio
Factor A              SSA    a − 1            MSA = SSA/(a − 1)              F = MSA/MSE
Factor B              SSB    b − 1            MSB = SSB/(b − 1)              F = MSB/MSAB
Interaction           SSAB   (a − 1)(b − 1)   MSAB = SSAB/[(a − 1)(b − 1)]   F = MSAB/MSE
Residual Error        SSE    N − ab           MSE = SSE/(N − ab)
Totals                SST    N − 1
Example 1
In an experiment to compare the effects of weathering on paint of three different
types, two identical surfaces coated with each type of paint were exposed in each
of four environments. Measurements of the degree of deterioration were made as
follows.
Environment 1 Environment 2 Environment 3 Environment 4
Paint A 10.89 10.74 9.94 11.25 9.88 10.13 14.11 12.84
Paint B 12.28 13.11 14.45 11.17 11.29 11.10 13.44 11.37
Paint C 10.68 10.30 10.89 10.97 10.61 11.00 12.22 11.32
Making the assumptions of normality, independence and equal variance, derive the
appropriate ANOVA tables and state the conclusions which may be drawn at the
5% level of significance in the following cases.

(a) The types of paint and the environments are chosen deliberately be-
cause the interest is in these paints and these environments.
(b) The types of paint are chosen deliberately because the interest is in
these paints but the environments are regarded as a sample of possible
environments.
(c) The types of paint are regarded as a random sample of possible paints
and the environments are regarded as a sample of possible environ-
ments.

Solution
We know that case (a) is described as a fixed-effects model, case (b) is described as a mixed-effects
model (paint type fixed) and case (c) is described as a random-effects model. In all three cases the
calculations necessary to find MSP (paints), MSN (environments), MSPN (interaction) and MSE are identical.
Only the calculation and interpretation of the test statistics will be different. The calculations are
shown below.
Subtracting 10 from each observation, the data become:
Environment 1 Environment 2 Environment 3 Environment 4 Total
Paint A 0.89 0.74 −0.06 1.25 −0.12 0.13 4.11 2.84 9.78
(total 1.63) (total 1.19) (total 0.01) (total 6.95)
Paint B 2.28 3.11 4.45 1.17 1.29 1.10 3.44 1.37 18.21
(total 5.39) (total 5.62) (total 2.39) (total 4.81)
Paint C 0.68 0.30 0.89 0.97 0.61 1.00 2.22 1.32 7.99
(total 0.98) (total 1.86) (total 1.61) (total 3.54)
Total 8.00 8.67 4.01 15.30 35.98

The total sum of squares is
    SST = 0.89² + 0.74² + . . . + 1.32² − 35.98²/24 = 36.910
We can simplify the calculation by finding the between samples sum of squares
    SSS = (1/2)(1.63² + 5.39² + . . . + 3.54²) − 35.98²/24 = 26.762

Solution (contd.)
Sum of squares for paints is
    SSP = (1/8)(9.78² + 18.21² + 7.99²) − 35.98²/24 = 7.447
Sum of squares for environments is
    SSN = (1/6)(8.00² + 8.67² + 4.01² + 15.30²) − 35.98²/24 = 10.950
So the interaction sum of squares is SSPN = SSS − SSP − SSN = 8.365 and
the residual sum of squares is SSE = SST − SSS = 10.148. The results are combined in the following
ANOVA table.
                         Deg. of    Sum of     Mean     Variance        Variance        Variance
                         Freedom    Squares    Square   Ratio (fixed)   Ratio (mixed)   Ratio (random)
Paints                      2        7.447     3.724    4.40            2.67            2.67
                                                        F2,12 = 3.89    F2,6 = 5.14     F2,6 = 5.14
Environments                3       10.950     3.650    4.31            4.31            2.61
                                                        F3,12 = 3.49    F3,12 = 3.49    F3,6 = 4.76
Interaction                 6        8.365     1.394    1.65            1.65            1.65
                                                        F6,12 = 3.00    F6,12 = 3.00    F6,12 = 3.00
Treatment combinations     11       26.762     2.433
Residual                   12       10.148     0.846
Total                      23       36.910

The following conclusions may be drawn. There is insufficient evidence to support the interaction
hypothesis in any case. Therefore we can look at the tests for the main effects.
Case (a) Since 4.40 > 3.89 we have sufficient evidence to conclude that paint type affects the
degree of deterioration. Since 4.31 > 3.49 we have sufficient evidence to conclude that environment
affects the degree of deterioration.
Case (b) Since 2.67 < 5.14 we do not have sufficient evidence to reject the hypothesis that paint
type has no effect on the degree of deterioration. Since 4.31 > 3.49 we have sufficient evidence to
conclude that environment affects the degree of deterioration.
Case (c) Since 2.67 < 5.14 we do not have sufficient evidence to reject the hypothesis that paint
type has no effect on the degree of deterioration. Since 2.61 < 4.76 we do not have sufficient
evidence to reject the hypothesis that environment has no effect on the degree of deterioration.
If the test for interaction had given a significant result then we would have concluded that there
was an interaction effect. Therefore the differences between the average degree of deterioration for
different paint types would have depended on the environment and there might have been no overall
‘best paint type’. We would have needed to compare combinations of paint types and environments.
However the relative sizes of the mean squares would have helped to indicate which effects were
most important.
A motor company wishes to check the influences of tyre type and shock absorber
settings on the roadholding of one of its cars. Two types of tyre are selected
from the tyre manufacturer who normally provides tyres for the company’s new
vehicles. A shock absorber with three possible settings is chosen from a range of
shock absorbers deemed to be suitable for the car. An experiment is conducted by carrying out
roadholding tests using each tyre type and shock absorber setting.
The (coded) data resulting from the experiment are given below.
Factor Shock Absorber Setting
Tyre B1=Comfort B2=Normal B3=Sport
5 8 6
Type A1 6 5 9
8 3 12
9 10 12
Type A2 7 9 10
7 8 9
Decide whether an appropriate model has random-effects, mixed-effects or fixed-
effects and derive the appropriate ANOVA table. State clearly any conclusions
that may be drawn at the 5% level of significance.

Your solution
Do the calculations on separate paper and use the space here and on the following page for your
summary and conclusions.
Answer
We know that both the tyres and the shock absorbers are not chosen at random from populations
consisting of all possible tyre types and shock absorber types so that their influence is described by
a fixed-effects model. The calculations necessary to find MSA, MS B , MSAB and MSE are shown
below.
B1 B2 B3 Totals
5 8 6
A1 6 5 9
8 3 12
T11 = 19 T12 = 16 T13 = 27 T1·· = 62
9 10 12
A2 7 9 10
7 8 9
T21 = 23 T22 = 27 T23 = 31 T2·· = 81
Totals T·1· = 42 T·2· = 43 T·3· = 58 T··· = 143
The sums of squares calculations are:
    SST = Σi Σj Σk xijk² − T···²/N = 5² + 6² + . . . + 10² + 9² − 143²/18 = 1233 − 143²/18 = 96.944
    SSA = Σi Ti··²/(bn) − T···²/N = (62² + 81²)/(3 × 3) − 143²/18 = 10405/9 − 143²/18 = 20.056
    SSB = Σj T·j·²/(an) − T···²/N = (42² + 43² + 58²)/(2 × 3) − 143²/18 = 6977/6 − 143²/18 = 26.778
    SSAB = Σi Σj Tij·²/n − T···²/N − SSA − SSB = (19² + . . . + 31²)/3 − 143²/18 − 20.056 − 26.778
         = 3565/3 − 143²/18 − 20.056 − 26.778 = 5.444
    SSE = SST − SSA − SSB − SSAB = 96.944 − 20.056 − 26.778 − 5.444 = 44.666
The results are combined in the following ANOVA table.

Source           SS       DoF   MS       F (Fixed)            5% critical value
Factor A         20.056     1   20.056   MSA/MSE = 5.39       F1,12 = 4.75
Factor B         26.778     2   13.389   MSB/MSE = 3.60       F2,12 = 3.89
Interaction AB    5.444     2    2.722   MSAB/MSE = 0.731     F2,12 = 3.89
Residual E       44.666    12    3.722
Totals           96.944    17

Answer
The following conclusions may be drawn:
Interaction: There is insufficient evidence to support the hypothesis that interaction takes place
between the factors.
Factor A: Since 5.39 > 4.75 we have sufficient evidence to reject the hypothesis that tyre type does
not affect the roadholding of the car.
Factor B: Since 3.60 < 3.89 we do not have sufficient evidence to reject the hypothesis that shock
absorber settings do not affect the roadholding of the car.
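The tyre/shock-absorber sums of squares can be checked with a brief numpy sketch (data entered as a 2 × 3 × 3 array, tyre type × setting × repeat; the variable names are illustrative only):

    import numpy as np

    x = np.array([[[5, 6, 8], [8, 5, 3], [6, 9, 12]],        # tyre type A1: B1, B2, B3
                  [[9, 7, 7], [10, 9, 8], [12, 10, 9]]],     # tyre type A2: B1, B2, B3
                 dtype=float)
    a, b, n = x.shape
    N = a * b * n
    T = x.sum()
    ss_t = (x ** 2).sum() - T ** 2 / N                                 # about 96.94
    ss_a = (x.sum(axis=(1, 2)) ** 2).sum() / (b * n) - T ** 2 / N      # about 20.06
    ss_b = (x.sum(axis=(0, 2)) ** 2).sum() / (a * n) - T ** 2 / N      # about 26.78
    ss_ab = (x.sum(axis=2) ** 2).sum() / n - T ** 2 / N - ss_a - ss_b  # about 5.44
    ss_e = ss_t - ss_a - ss_b - ss_ab                                  # about 44.67
    print(ss_a, ss_b, ss_ab, ss_e, ss_t)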

The variability of a measured characteristic of an electronic assembly is a source


of trouble for a manufacturer with global manufacturing and sales facilities. To
investigate the possible influences of assembly machines and testing stations on
the characteristic, an engineer chooses three testing stations and three assembly
machines from the large number of stations and machines in the possession of
the company. For each testing station - assembly machine combination, three
observations of the characteristic are made.
The (coded) data resulting from the experiment are given below.
Factor Testing Station
Assembly Machine B1 B2 B3
2.3 3.7 3.1
A1 3.4 2.8 3.2
3.5 3.7 3.5
3.5 3.9 3.3
A2 2.6 3.9 3.4
3.6 3.4 3.5
2.4 3.5 2.6
A3 2.7 3.2 2.6
2.8 3.5 2.5
Decide whether an appropriate model has random-effects, mixed-effects or fixed-
effects and derive the appropriate ANOVA table.
State clearly any conclusions that may be drawn at the 5% level of significance.

Your solution
Do the calculations on separate paper and use the space here and on the following page for your
summary and conclusions.
Your solution contd.

Answer
Both the machines and the testing stations are effectively chosen at random from populations
consisting of all possible types so that their influence is described by a random-effects model. The
calculations necessary to find MSA, MS B , MSAB and MSE are shown below.
B1 B2 B3 Totals
2.3 3.7 3.1
A1 3.4 2.8 3.2
3.5 3.7 3.5
T11 = 9.2 T12 = 10.2 T13 = 9.8 T1·· = 29.2
3.5 3.9 3.3
A2 2.6 3.9 3.4
3.6 3.4 3.5
T21 = 9.7 T22 = 11.2 T23 = 10.2 T2·· = 31.1
2.4 3.5 2.6
A3 2.7 3.2 2.6
2.8 3.5 2.5
T31 = 7.9 T32 = 10.2 T33 = 7.7 T3·· = 25.8
Totals T·1· = 26.8 T·2· = 31.6 T·3· = 27.7 T··· = 86.1
a = 3, b = 3, n = 3, N = 27 and the sums of squares calculations are:
    SST = Σi Σj Σk xijk² − T···²/N = 2.3² + 3.4² + . . . + 2.6² + 2.5² − 86.1²/27 = 5.907
    SSA = Σi Ti··²/(bn) − T···²/N = (29.2² + 31.1² + 25.8²)/(3 × 3) − 86.1²/27 = 1.602
    SSB = Σj T·j·²/(an) − T···²/N = (26.8² + 31.6² + 27.7²)/(3 × 3) − 86.1²/27 = 1.447
    SSAB = Σi Σj Tij·²/n − T···²/N − SSA − SSB
         = (9.2² + 10.2² + . . . + 10.2² + 7.7²)/3 − 86.1²/27 − 1.602 − 1.447 = 0.398
    SSE = SST − SSA − SSB − SSAB = 5.907 − 1.602 − 1.447 − 0.398 = 2.46
Answer continued
The results are combined in the following ANOVA table.

Source                  SS      DoF   MS         F (Random)           5% critical value
Factor A (Machines)     1.602     2   0.801      MSA/MSAB = 8.05      F2,4 = 6.94
Factor B (Stations)     1.447     2   0.724      MSB/MSAB = 7.28      F2,4 = 6.94
Interaction AB          0.398     4   0.099(5)   MSAB/MSE = 0.728     F4,18 = 2.93
Residual E              2.460    18   0.136
Totals                  5.907    26
The following conclusions may be drawn.
Interaction: There is insufficient evidence to support the hypothesis that interaction takes place
between the factors.
Factor A: Since 8.05 > 6.94 we have sufficient evidence to reject the hypothesis that the assembly
machines do not affect the assembly characteristic.
Factor B: Since 7.28 > 6.94 we have sufficient evidence to reject the hypothesis that the choice of
testing station does not affect the assembly characteristic.

3. Two-way ANOVA versus one-way ANOVA


You should note that a two-way ANOVA design is rather more efficient than a one-way design. In
the last example, we could fix the testing station and look at the electronic assemblies produced by a
variety of machines. We would have to replicate such an experiment for every testing station. It would
be very difficult (impossible!) to exactly duplicate the same conditions for all of the experiments.
This implies that the consequent experimental error could be very large. Remember also that in a
one-way design we cannot check for interaction between the factors involved in the experiment. The
three main advantages of a two-way ANOVA may be stated as follows:

(a) It is possible to simultaneously test the effects of two factors. This saves both time and
money.
(b) It is possible to determine the level of interaction present between the factors involved.
(c) The effect of one factor can be investigated over a variety of levels of another and so
any conclusions reached may be applicable over a range of situations rather than a single
situation.

Exercises
1. The temperatures, in Celsius, at three locations in the engine of a vehicle are measured after
each of five test runs. The data are as follows. Making the usual assumptions for a two-
way analysis of variance without replication, test the hypothesis that there is no systematic
difference in temperatures between the three locations. Use the 5% level of significance.

Location Run 1 Run 2 Run 3 Run 4 Run 5


A 72.8 77.3 82.9 69.4 74.6
B 71.5 72.4 80.7 67.0 74.0
C 70.8 74.0 79.1 69.0 75.4

2. Waste cooling water from a large engineering works is filtered before being released into the
environment. Three separate discharge pipes are used, each with its own filter. Five samples
of water are taken on each of four days from each of the three discharge pipes and the
concentrations of a pollutant, in parts per million, are measured. The data are given below.
Analyse the data to test for differences between the discharge pipes. Allow for effects due to
pipes and days and for an interaction effect. Treat the pipe effects as fixed and the day effects
as random. Use the 5% level of significance.

Day Pipe A
1 160 181 163 173 178
2 175 170 219 166 171
3 169 186 179 178 183
4 230 206 216 195 250
Day Pipe B
1 172 164 186 185 172
2 177 170 156 140 155
3 193 194 189 156 181
4 212 235 195 206 209
Day Pipe C
1 214 196 207 219 200
2 186 184 181 189 179
3 209 220 199 185 228
4 254 293 283 262 259
Answers
1. We calculate totals as follows.
Run Total Location Total
1 215.1 A 377.0
2 223.7 B 365.6
3 242.7 C 368.3
4 205.4 Total 1110.9
5 224.0
Total 1110.9
ΣΣ y²ij = 82552.17

The total sum of squares is

82552.17 − 1110.9²/15 = 278.916   on 15 − 1 = 14 degrees of freedom.

The between-runs sum of squares is

(1/3)(215.1² + 223.7² + 242.7² + 205.4² + 224.0²) − 1110.9²/15 = 252.796

on 5 − 1 = 4 degrees of freedom.

The between-locations sum of squares is

(1/5)(377.0² + 365.6² + 368.3²) − 1110.9²/15 = 14.196   on 3 − 1 = 2 degrees of freedom.

By subtraction, the residual sum of squares is

278.916 − 252.796 − 14.196 = 11.924   on 14 − 4 − 2 = 8 degrees of freedom.
The analysis of variance table is as follows.

Source of variation   Sum of squares   Degrees of freedom   Mean square   Variance ratio
Runs                       252.796              4               63.199
Locations                   14.196              2                7.098          4.762
Residual                    11.924              8                1.491
Total                      278.916             14

The upper 5% point of the F2,8 distribution is 4.46. The observed variance ratio is greater than this
so we conclude that the result is significant at the 5% level and reject the null hypothesis at this
level. The evidence suggests that there are systematic differences between the temperatures at the
three locations. Note that the Runs mean square is large compared to the Residual mean square
showing that it was useful to allow for differences between runs.
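A short NumPy sketch (an assumed illustration, not part of the original answer) reproduces the sums of squares and the variance ratio for this unreplicated two-way layout.

import numpy as np

# rows = locations A, B, C; columns = runs 1 to 5
temps = np.array([
    [72.8, 77.3, 82.9, 69.4, 74.6],
    [71.5, 72.4, 80.7, 67.0, 74.0],
    [70.8, 74.0, 79.1, 69.0, 75.4],
])
r, c = temps.shape
correction = temps.sum() ** 2 / temps.size

SS_total     = (temps ** 2).sum() - correction                      # 278.916
SS_runs      = (temps.sum(axis=0) ** 2).sum() / r - correction      # 252.796
SS_locations = (temps.sum(axis=1) ** 2).sum() / c - correction      # 14.196
SS_residual  = SS_total - SS_runs - SS_locations                    # 11.924

F_locations = (SS_locations / (r - 1)) / (SS_residual / ((r - 1) * (c - 1)))
print(round(F_locations, 3))   # ~4.762, compared with the F(2,8) upper 5% point of 4.46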

Answers continued
2. We calculate totals as follows.
Day 1 Day 2 Day 3 Day 4 Total
Pipe A 855 901 895 1097 3748
Pipe B 879 798 913 1057 3647
Pipe C 1036 919 1041 1351 4347
Total 2770 2618 2849 3505 11742
ΣΣΣ y²ijk = 2356870

The total number of observations is N = 60.

The total sum of squares is

2356870 − 11742²/60 = 58960.6

on 60 − 1 = 59 degrees of freedom.

The between-cells sum of squares is

(1/5)(855² + · · · + 1351²) − 11742²/60 = 48943.0

on 12 − 1 = 11 degrees of freedom, where by "cell" we mean the combination of a pipe and a day.

By subtraction, the residual sum of squares is

58960.6 − 48943.0 = 10017.6

on 59 − 11 = 48 degrees of freedom.

The between-days sum of squares is

(1/15)(2770² + 2618² + 2849² + 3505²) − 11742²/60 = 30667.3

on 4 − 1 = 3 degrees of freedom.

The between-pipes sum of squares is

(1/20)(3748² + 3647² + 4347²) − 11742²/60 = 14316.7

on 3 − 1 = 2 degrees of freedom.

By subtraction, the interaction sum of squares is

48943.0 − 30667.3 − 14316.7 = 3959.0

on 11 − 3 − 2 = 6 degrees of freedom.

Answers continued
The analysis of variance table is as follows.

Source of variation   Sum of squares   Degrees of freedom   Mean square   Variance ratio
Pipes                     14316.7              2              7158.4          10.85
Days                      30667.3              3             10222.4          48.98
Interaction                3959.0              6               659.8           3.16
Cells                     48943.0             11              4449.4          21.32
Residual                  10017.6             48               208.7
Total                     58960.6             59

Notice that, because Days are treated as a random effect, we divide the Pipes mean square by the
Interaction mean square rather than by the Residual mean square.
The upper 5% point of the F6,48 distribution is approximately 2.3. Thus the Interaction variance
ratio is significant at the 5% level and we reject the null hypothesis of no interaction. We must
therefore conclude that there are differences between the means for pipes and for days and that
the difference between one pipe and another varies from day to day. Looking at the mean squares,
however, we see that both the Pipes and Days mean squares are much bigger than the Interaction
mean square, so the interaction effect appears to be relatively small compared with the
differences between days and between pipes.
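The quoted 5% points can be checked with SciPy. The sketch below is an assumed illustration; scipy.stats.f is the standard F-distribution object in SciPy.

from scipy.stats import f

print(round(f.ppf(0.95, 6, 48), 2))   # upper 5% point of F(6,48), approximately 2.3
print(round(f.ppf(0.95, 2, 6), 2))    # upper 5% point of F(2,6), ~5.14, relevant when the
                                      # Pipes mean square is tested against the Interaction mean square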
What is Decision Theory?
Decision theory is an interdisciplinary approach to arrive at the decisions that are the most
advantageous given an uncertain environment.

Key Takeaways

 Decision theory is an interdisciplinary approach to arrive at the decisions that are the
most advantageous given an uncertain environment.
 Decision theory brings together psychology, statistics, philosophy, and mathematics to
analyze the decision-making process.
 Descriptive, prescriptive, and normative are three main areas of decision theory and each
studies a different type of decision making.

Understanding Decision Theory


Decision theory brings together psychology, statistics, philosophy, and mathematics to analyze
the decision-making process. Decision theory is closely related to game theory and is studied
within the context of understanding the activities and decisions underpinning activities such as
auctions, evolution, and marketing.

There are three main areas of decision theory. Each studies a different type of decision making.

1. Descriptive decision theory: examines how agents actually make decisions, including the irrational or inconsistent ways in which real people decide.


2. Prescriptive decision theory: tries to provide guidelines for agents to make the best
possible decisions given an uncertain decision-making framework.
3. Normative decision theory: provides guidance for making decisions given a set of values.

Decision theory framework generally identifies three types of decision classes:

1. Decisions under certainty: an abundance of information leads to an obvious decision


2. Decisions under uncertainty: analysis of known and unknown variables lead to the best
probabilistic decision.
3. Decisions under conflict: a reactive approach that involves anticipating the potential
consequences of the decision before it is made.
Decision Under Uncertainty: Prisoner's Dilemma
A common example of decision theory stems from the prisoner's dilemma in which two
individuals are faced with an uncertain decision where the outcome is not only based on their
personal decision, but also on that of the other individual. Since both parties do not know what
actions the other person will take, this results in an uncertain decision framework. While
mathematics and statistical models determine what the optimal decision should be, psychology
and philosophy introduce factors of human behaviors to suggest the most likely outcome.
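As a rough illustration of this interplay, the sketch below sets up a standard prisoner's dilemma payoff matrix (the payoff numbers are assumed for illustration, not taken from the text) and shows that defecting is each player's best response regardless of what the other player does, even though mutual cooperation pays more.

# Payoffs are (row player, column player); larger numbers are better.
payoffs = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}

for others_move in ("cooperate", "defect"):
    best = max(("cooperate", "defect"),
               key=lambda my_move: payoffs[(my_move, others_move)][0])
    print(f"If the other player plays {others_move}, the best response is {best}")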

Decision Tree

Decision tree: the decision tree is one of the most powerful and popular tools for classification and
prediction. A decision tree is a flowchart-like tree structure in which each internal node denotes a
test on an attribute, each branch represents an outcome of the test, and each leaf (terminal) node
holds a class label.
A decision tree for the concept PlayTennis.

Construction of Decision Tree :


A tree can be "learned" by splitting the source set into subsets based on an attribute value test.
This process is repeated on each derived subset in a recursive manner, called recursive
partitioning. The recursion is complete when every record in the subset at a node has the same
value of the target variable, or when splitting no longer adds value to the predictions. The
construction of a decision tree classifier does not require any domain knowledge or parameter
setting, and is therefore appropriate for exploratory knowledge discovery. Decision trees can
handle high-dimensional data, and in general decision tree classifiers have good accuracy. Decision
tree induction is a typical inductive approach to learning classification knowledge.
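As an illustration of recursive partitioning in practice, the sketch below fits a tree with scikit-learn. The data values follow the usual PlayTennis textbook example and are an assumption here (they are not given in this document); note also that scikit-learn builds binary splits on the encoded attribute values rather than the multiway splits of classical ID3.

import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.DataFrame({
    "Outlook":     ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast",
                    "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain"],
    "Temperature": ["Hot", "Hot", "Hot", "Mild", "Cool", "Cool", "Cool",
                    "Mild", "Cool", "Mild", "Mild", "Mild", "Hot", "Mild"],
    "Humidity":    ["High", "High", "High", "High", "Normal", "Normal", "Normal",
                    "High", "Normal", "Normal", "Normal", "High", "Normal", "High"],
    "Wind":        ["Weak", "Strong", "Weak", "Weak", "Weak", "Strong", "Strong",
                    "Weak", "Weak", "Weak", "Strong", "Strong", "Weak", "Strong"],
    "PlayTennis":  ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
                    "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"],
})

# Encode the categorical attributes as integers, then grow the tree recursively.
X = OrdinalEncoder().fit_transform(data.drop(columns="PlayTennis"))
y = data["PlayTennis"]
tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(export_text(tree, feature_names=list(data.columns[:-1])))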

Decision Tree Representation :


Decision trees classify instances by sorting them down the tree from the root to some leaf node,
which provides the classification of the instance. An instance is classified by starting at the root
node of the tree, testing the attribute specified by this node, and then moving down the tree branch
corresponding to the value of the attribute, as shown in the figure above. This process is then
repeated for the subtree rooted at the new node.

The decision tree in the figure above classifies a particular morning according to whether it is
suitable for playing tennis, returning the classification (in this case Yes or No) associated with
the leaf that is reached.

For example, the instance

(Outlook = Rain, Temperature = Hot, Humidity = High, Wind = Strong)

would be sorted down the branch corresponding to Outlook = Rain and then Wind = Strong, and would
therefore be classified as a negative instance.

In other words, a decision tree represents a disjunction of conjunctions of constraints on the
attribute values of instances, for example:

(Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak)
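This disjunction of conjunctions can be written directly as a small predicate. The function below is an illustrative sketch; the attribute names come from the PlayTennis example.

def play_tennis(outlook: str, humidity: str, wind: str) -> bool:
    # (Outlook = Sunny AND Humidity = Normal) OR (Outlook = Overcast) OR (Outlook = Rain AND Wind = Weak)
    return ((outlook == "Sunny" and humidity == "Normal")
            or outlook == "Overcast"
            or (outlook == "Rain" and wind == "Weak"))

print(play_tennis("Rain", "High", "Strong"))   # False, i.e. a negative instance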

Strengths and Weaknesses of the Decision Tree Approach

The strengths of decision tree methods are:

• Decision trees are able to generate understandable rules.
• Decision trees perform classification without requiring much computation.
• Decision trees are able to handle both continuous and categorical variables.
• Decision trees provide a clear indication of which fields are most important for
  prediction or classification.

The weaknesses of decision tree methods are:

• Decision trees are less appropriate for estimation tasks where the goal is to predict
  the value of a continuous attribute.
• Decision trees are prone to errors in classification problems with many classes and a
  relatively small number of training examples.
• Decision trees can be computationally expensive to train. At each node, each candidate
  splitting field must be sorted before its best split can be found. In some algorithms,
  combinations of fields are used and a search must be made for optimal combining
  weights. Pruning algorithms can also be expensive since many candidate sub-trees must
  be formed and compared.
For the students of

M. Com. (Applied Economics) Sem. IV

Paper: Research Methodology (Unit IV)

Note: Study material may be useful for the courses wherever Research Methodology paper is
being taught.

Prepared by:
Dr. Anoop Kumar Singh
Dept. of Applied Economics,
University of Lucknow
Topic: Chi-Square Test

The χ² test (pronounced as "chi-square test") is an important and popular test of hypothesis which
is categorized as a non-parametric test. This test was first introduced by Karl Pearson in the
year 1900.

It is used to find out whether there is any significant difference between observed frequencies
and expected frequencies pertaining to a particular phenomenon. The frequencies are shown in the
different cells (categories) of a contingency table. It is noteworthy that the observations are
taken in categorical form or rank order, not as continuous or normally distributed measurements.
The test assesses how likely the observed frequencies would be if the null hypothesis were true.
This test is also useful in ascertaining the independence of two random variables based on
observations of these variables.
This is a non-parametric test which is extensively used for the following reasons:
1. It is a distribution-free method which does not rely on the assumption that the data are
drawn from a given parametric family of probability distributions.
2. It is easier to compute and simpler to understand than parametric tests.
3. It can be used in situations where parametric tests are not appropriate or where the
measurements prohibit the use of parametric tests.
It is defined as:

χ² = ∑ (O − E)² / E

where O refers to the observed frequencies and E refers to the expected frequencies.

Uses of Chi-Square Test

The chi-square test has a large number of applications where parametric tests cannot be applied.
Its uses can be summarized as follows, along with examples:

(A) A test of independence

This test is helpful in detecting the association between two or more attributes. Suppose we
have N observations classified according to two attributes. By applying this test to the given
observations (data) we try to find out whether the attributes are associated or independent; the
association may be positive, negative or absent. For example, we can find out whether there is any
association between regularity in attending class and the division in which students pass;
similarly, we can find out whether quinine is effective in controlling fever. In order to test
whether or not the attributes are associated, we take the null hypothesis that there is no
association between the attributes under study, in other words that the two attributes are
independent.

After computing the value of chi-square, we compare the calculated value with the corresponding
critical value for the given degrees of freedom at a chosen level of significance. If the
calculated value of χ² is less than the critical (table) value, the null hypothesis is accepted
and it is concluded that the two attributes have no association, i.e. they are independent. On the
other hand, if the calculated value is greater than the table value, the results of the experiment
do not support the null hypothesis; it is rejected, and it is concluded that the attributes are
associated.

Illustration 1: From the data given in the following table, find out whether there is any
relationship between gender and the preference of colour.

Colour Male Female Total


Red 25 45 70
Blue 45 25 70
Green 50 10 60
Total 120 80 200

(Given: for ν = 2, χ²0.05 = 5.991)

Solution: Let us take the following hypothesis:

Null Hypothesis 𝐻0 : There is no relationship between gender and preference of colour.

Alternative Hypothesis 𝐻𝑎 : There is relationship between gender and preference of colour.

We have to first calculate the expected value for the observed frequencies. These are shown
below along with the observed frequencies:

Colour   Gender   O    E    O−E   (O−E)²   (O−E)²/E
Red      M        25   42   −17    289       6.88
         F        45   28    17    289      10.32
Blue     M        45   42     3      9       0.21
         F        25   28    −3      9       0.32
Green    M        50   36    14    196       5.44
         F        10   24   −14    196       8.16
                                   χ² = 31.33

The degrees of freedom are (r − 1)(c − 1) = (3 − 1)(2 − 1) = 2.

The critical value of χ² for 2 degrees of freedom at the 5% level of significance is 5.991.

Since the calculated χ² = 31.33 exceeds the critical value of χ², the null hypothesis is rejected.
Hence, the conclusion is that there is a definite relationship between gender and preference of
colour.
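The same result can be checked in software. The sketch below is an assumed illustration using scipy.stats.chi2_contingency on the observed contingency table.

import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[25, 45],    # Red:   male, female
                     [45, 25],    # Blue:  male, female
                     [50, 10]])   # Green: male, female

chi2, p, dof, expected = chi2_contingency(observed)
print(round(chi2, 2), dof)   # ~31.33 with 2 degrees of freedom
print(p < 0.05)              # True, so the null hypothesis of independence is rejected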

(B) A test of goodness of fit

This is the most important use of the chi-square test. The method is used to establish whether an
observed frequency distribution differs from an expected (theoretical) frequency distribution.
When an ideal frequency curve, whether normal or of some other type, is fitted to the data, we are
interested in finding out how well this curve fits the observed facts.

The following steps are followed for this purpose:

i. A null and an alternative hypothesis pertaining to the enquiry are established.

ii. A level of significance is chosen for rejection of the null hypothesis.

iii. A random sample of observations is drawn from the relevant statistical population.

iv. On the basis of the actual observations, expected or theoretical frequencies are derived,
generally by assuming that a particular probability distribution applies to the statistical
population under consideration.

v. The observed frequencies are compared with the expected or theoretical frequencies.

vi. If the calculated value of χ² is less than the table value at a certain level of significance
(generally the 5% level) and for the relevant degrees of freedom, the fit is considered to be good,
i.e. the divergence between the actual and expected frequencies is attributed to fluctuations of
simple sampling. On the other hand, if the calculated value of χ² is greater than the table value,
the fit is considered to be poor, i.e. the divergence cannot be attributed to fluctuations of
simple sampling but is due to the inadequacy of the theory to fit the observed facts.

Illustration 2:

In an anti malaria campaign in a certain area, quinine was administered to 812 persons out of a
total population of 3248. The number of fever cases is shown below:

Treatment Fever (A) No fever (a) Total


Quinine (B) 140(AB) 30 (aB) 170 (B)
No Quinine (b) 60(Ab) 20 (ab) 80 (b)
Total 200(A) 50 (a) 250 (N)

Discuss the usefulness of quinine in checking malaria.

(Given: for ν = 1, χ²0.05 = 3.84)

Solution: Let us take the following hypotheses:

Null Hypothesis 𝐻𝑂 : Quinine is not effective in checking malaria.

Alternative Hypothesis 𝐻𝑎 : Quinine is effective in checking malaria.

Applying the χ² test:

Expected frequency of AB = (A) × (B) / N = (200 × 170) / 250 = 136

i.e. E1, the expected frequency corresponding to the first row and first column, is 136; the
remaining expected frequencies are obtained in the same way.

The table of expected frequencies is:

Treatment Fever No Fever Total


Quinine 136 34 170
No quinine 64 16 80
Total 200 50 250 (N)

Computation of Chi Square value

O E (O-E)2 (O-E)2 /E
140 136 16 0.118
60 64 16 0.250
30 34 16 0.471
20 16 16 1.000
χ² = ∑ (O − E)²/E = 1.839

Degree of freedom = (r-1) (c-1) = (2-1) (2-1) = 1

Table value: for ν = 1, χ²0.05 = 3.84

Since the calculated value of χ² (1.839) is less than the table value (3.84), the null hypothesis
is accepted. Hence the data provide no evidence that quinine is useful in checking malaria.
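A quick software check (an assumed illustration, not part of the original notes): with Yates' continuity correction turned off, SciPy reproduces both the expected frequencies and the hand-calculated value of χ².

import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[140, 30],    # quinine:    fever, no fever
                     [ 60, 20]])   # no quinine: fever, no fever

chi2, p, dof, expected = chi2_contingency(observed, correction=False)
print(expected)          # [[136.  34.] [ 64.  16.]]
print(round(chi2, 3))    # ~1.839, which is less than 3.84, so H0 is not rejected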

(C) A test of homogeneity

The χ² test of homogeneity is an extension of the χ² test of independence. Such tests indicate
whether two or more independent samples are drawn from the same population or from different
populations. Instead of one sample, as in the independence problem, we now have two or more
samples. Suppose a test is given to students in two different higher secondary schools, with the
same sample size in both cases. The question we have to ask is: is there any difference between
the two higher secondary schools? In order to find the answer, we set up the null hypothesis that
the two samples come from the same population. The word 'homogeneous' is used frequently in
statistics to indicate 'the same' or 'equal'. Accordingly, we can say that we want to test whether
the two samples are homogeneous; thus, the test is called a test of homogeneity.

Illustration 3: Two hundred bolts were selected at random from the output of each of five
machines. The numbers of defective bolts found were 5, 9, 13, 7 and 6. Is there a significant
difference among the machines? Use the 5% level of significance.

(Given: for ν = 4, χ²0.05 = 9.488)

Solution: Let us take the following hypothesis:

𝐻𝑂 : There is no significant difference among the machines.

𝐻𝑎 : There is significant difference among the machines.

Under the null hypothesis, the total number of defective bolts (40) should be equally distributed
among the five machines. The expected number of defective bolts for each machine is therefore

E = (sum of defective bolts) / (number of machines) = 40/5 = 8.

Computation of Chi Square test

Machine O E O-E (O-E)2 (O-E)2/E


1 5 8 -3 9 1.125
2 9 8 1 1 0.125
3 13 8 5 25 3.125
4 7 8 -1 1 0.125
5 6 8 -2 4 0.5
χ² = ∑(O − E)²/E = 5.00

Decision: The critical value of χ² at the 0.05 level of significance for 4 degrees of freedom is
9.488. As the calculated value of χ² = 5 is less than the critical value, H0 is accepted. In other
words, the difference among the five machines in respect of defective bolts is not significant.
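This calculation can be reproduced with scipy.stats.chisquare, which by default compares the observed counts against equal expected frequencies (here 40/5 = 8); the sketch below is an assumed illustration.

from scipy.stats import chisquare

stat, p = chisquare([5, 9, 13, 7, 6])
print(stat, round(p, 3))   # 5.0 and p ~0.29 > 0.05, so the difference is not significant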

CHI-SQUARE TEST

DR RAMAKANTH
Introduction
• The Chi-square test is one of the most commonly used non-parametric tests; the sampling
distribution of the test statistic is a chi-square distribution when the null hypothesis is true.

• It was introduced by Karl Pearson as a test of association. The Greek letter χ² is used to
denote this test.

• It can be applied when there are few or no assumptions about the population parameters.

• It can be applied to categorical or qualitative data using a contingency table.

• It is used to evaluate unpaired/unrelated samples and proportions.

Chi-squared distribution
• The distribution of the chi-square statistic is called the chi-square
distribution.
• The chi-squared distribution with k degrees of freedom is the distribution
of a sum of the squares of k independent standard normal random
variables. It is determined by the degrees of freedom.
• The simplest chi-squared distribution is the square of a standard normal
distribution.
• The chi-squared distribution is used primarily in hypothesis testing.
• The chi-square distribution has the following properties:
1. The mean of the distribution is equal to the number of degrees of freedom: μ = ν.
2. The variance is equal to two times the number of degrees of freedom: σ² = 2ν.
3. The χ² distribution is not symmetrical and all of its values are positive. The shape of
   the distribution is determined by its degrees of freedom; each value of the degrees of
   freedom gives a different asymmetric curve.
4. As the degrees of freedom increase, the chi-square curve approaches a normal
   distribution.
Cumulative Probability and the Chi-Square Distribution
• The chi-square distribution is constructed so that the total area under the
curve is equal to 1. The area under the curve between 0 and a particular
chi-square value is a cumulative probability associated with that chi-
square value.
• For example, the cumulative probability associated with a chi-square value A is the
probability that the value of the chi-square statistic will fall between 0 and A.
Contingency table
• A contingency table is a type of table in a matrix format that displays
the frequency distribution of the variables.
• They provide a basic picture of the interrelation between two variables and
can help find interactions between them.

• The chi-square statistic compares the observed count in each table cell to
the count which would be expected under the assumption of no association
between the row and column classifications.
Degrees of freedom
• The number of independent pieces of information that are free to vary and that go into the
estimate of a parameter is called the degrees of freedom.
• In general, the degrees of freedom of an estimate of a parameter is equal to
the number of independent scores that go into the estimate minus the
number of parameters used as intermediate steps in the estimation of the
parameter itself (i.e. the sample variance has N-1 degrees of freedom, since
it is computed from N random scores minus the only 1 parameter estimated
as intermediate step, which is the sample mean).
• The number of degrees of freedom for 'n' observations is 'n − k' and is usually denoted by
'ν', where 'k' is the number of independent linear constraints imposed upon them. It is the only
parameter of the chi-square distribution.
• The degrees of freedom for a chi-squared contingency table can be calculated as
ν = (number of rows − 1) × (number of columns − 1).
Chi-Square formula
• The chi-squared test is used to determine whether there is a significant difference between
the expected frequencies and the observed frequencies in one or more categories.
• The value of χ² is calculated as:

χ² = ∑ (O − E)² / E

where the observed frequencies O are the frequencies obtained from observation (sample
frequencies) and the expected frequencies E are the calculated (theoretical) frequencies.
Alternate χ² formula
• The alternate χ² formula applies only to 2×2 tables:

χ² = N(ad − bc)² / [(a + b)(c + d)(a + c)(b + d)]

where a, b, c and d are the four cell frequencies and N = a + b + c + d is the total frequency.
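A small sketch (an assumed illustration) of this shortcut formula, checked against the quinine table used earlier in these notes (a = 140, b = 30, c = 60, d = 20 reading across the rows):

def chi2_2x2(a: float, b: float, c: float, d: float) -> float:
    # chi^2 = N(ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)] for a 2x2 table
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

print(round(chi2_2x2(140, 30, 60, 20), 3))   # ~1.838, matching the longer calculation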
Characteristics of Chi-Square test
1. It is often regarded as a non-parametric test because no assumptions about population
parameters, such as the mean and SD, are required.
2. It is based on frequencies.
3. It encompasses the additive property of differences between observed
and expected frequencies.
4. It tests the hypothesis about the independence of attributes.
5. It is preferred in analyzing complex contingency tables.
Steps in solving problems related to the Chi-Square test
Step 1: Calculate the expected frequencies.
Step 2: Take the difference between the observed and expected frequencies and square these
differences: (O − E)².
Step 3: Divide each value obtained in Step 2 by the corresponding expected frequency E and add
all the values to obtain χ² = ∑ (O − E)²/E.
Conditions for applying Chi-Square test
1. The data used in Chi-Square test must be quantitative and in the form of
frequencies, which must be absolute and not in relative terms.
2. The total number of observations collected for this test must be large ( at
least 10) and should be done on a random basis.
3. Each of the observations which make up the sample of this test must be
independent of each other.
4. The expected frequency of any item or cell must not be less than 5; the frequencies of
adjacent items or cells should be pooled together in order to make each expected frequency at
least 5.
5. This test is used only for drawing inferences through test of the hypothesis,
so it cannot be used for estimation of parameter value.
Practical applications of the Chi-Square test
• The applications of Chi-Square test include testing:
1. The significance of sample and population variances (σ²s and σ²p)
2. The goodness of fit of a theoretical distribution: Testing for goodness of
fit determines if an observed frequency distribution fits/matches a
theoretical frequency distribution (Binomial distribution, Poisson
distribution or Normal distribution). These test results are helpful to
know whether the samples are drawn from identical distributions or
not. When the calculated value of χ2 is less than the table value at
certain level of significance, the fit is considered to be good one and if
the calculated value is greater than the table value, the fit is not
considered to be good.
Table/critical values of χ² (refer to a standard chi-square table)
3. The independence in a contingency table:
– Testing independence determines whether two or more observations
across two populations are dependent on each other.
– If the calculated value is less than the table value at certain level of
significance for a given degree of freedom, then it is concluded that
null hypothesis is true, which means that two attributes are
independent and hence not associated.
– If calculated value is greater than the table value, then the null
hypothesis is rejected, which means that two attributes are
dependent.
4. The chi-square test can be used to test the significance of the association between exposure
and disease in a cohort study, an unmatched case-control study, or a cross-sectional study.
The Chi-Square test is a non-parametric test (in contrast to parametric tests) and is used for:
• testing independence,
• testing goodness of fit, and
• comparing variances.
Interpretation of Chi-Square values
• The χ² statistic is calculated under the assumption of no association.
• A large value of the χ² statistic ⇒ a small probability of the result occurring by chance alone
(p < 0.05) ⇒ conclude that an association exists between disease and exposure (null hypothesis
rejected).
• A small value of the χ² statistic ⇒ a large probability of the result occurring by chance alone
(p > 0.05) ⇒ conclude that no association exists between disease and exposure (null hypothesis
accepted).
• In the chi-square table, the left-hand column gives the degrees of freedom. If the calculated
value of χ² falls in the acceptance region, the null hypothesis H0 is accepted, and vice versa.
Limitations of the Chi-Square Test
1. The chi-square test does not give us much information about the strength
of the relationship. It only conveys the existence or nonexistence of the
relationships between the variables investigated.
2. The chi-square test is sensitive to sample size. This may make a weak
relationship statistically significant if the sample is large enough. Therefore,
chi-square should be used together with measures of association like
lambda, Cramer's V or gamma to guide in deciding whether a relationship
is important and worth pursuing.
3. The chi-square test is also sensitive to small expected frequencies. It can be
used only when not more than 20% of the cells have an expected frequency
of less than 5.
4. Cannot be used when samples are related or matched.
EXAMPLES:
Estrogen supplementation to delay or prevent the onset of
Alzheimer's disease in postmenopausal women.

The null hypothesis (H0): Estrogen supplementation in postmenopausal women is unrelated to
Alzheimer's onset.
The alternative hypothesis (HA): Estrogen supplementation in postmenopausal women delays/prevents
Alzheimer's onset.
Of the women who did not receive estrogen supplementation, 16.3%
(158/968) showed signs of Alzheimer's disease onset during the five-
year period; whereas, of the women who did receive estrogen
supplementation, only 5.8% (9/156) showed signs of disease onset.
• Next step: To calculate expected cell frequencies
The next step is to refer the calculated value of chi-square to the appropriate sampling
distribution, which is defined by the applicable number of degrees of freedom.
• For this example, there are 2 rows and 2 columns. Hence,

df = (2 − 1)(2 − 1) = 1
• The calculated value of χ2 =11.01 exceeds the value of chi-square (10.83)
required for significance at the 0.001 level.
• Hence we can say that the observed result is significant beyond the 0.001
level.
• Thus, the null hypothesis can be rejected with a high degree of
confidence.
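The quoted value of χ² can be reproduced in software. The sketch below is an assumed illustration: the cell counts are reconstructed from the percentages given above, and scipy.stats.chi2_contingency applies Yates' continuity correction for a 2×2 table, which yields the quoted value of about 11.01.

import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[158, 968 - 158],   # no estrogen: onset, no onset
                     [  9, 156 -   9]])  # estrogen:    onset, no onset

chi2, p, dof, expected = chi2_contingency(observed)   # Yates' correction applied (dof = 1)
print(round(chi2, 2), dof)   # ~11.01 with 1 degree of freedom
print(p < 0.001)             # True, i.e. significant beyond the 0.001 level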
