Unit 5 MBA 1st: Hypothesis Testing, t-Tests, z-Tests, and ANOVA
Goals
• Introduction to ANOVA
Hypothesis Testing
HA: µ1 − µ2 > 0 (one-sided alternative)
HA: µ1 − µ2 ≠ 0 (two-sided alternative)
P-values
• Calculate a test statistic in the sample data that is
relevant to the hypothesis being tested
Do Not Reject H0
Reject H0
Errors in Hypothesis Testing
Power = 1 − β
Parametric and Non-Parametric Tests

Type of data:
Goal                                      | Gaussian          | Non-Gaussian               | Binomial
Compare one group to a hypothetical value | One-sample t-test | Wilcoxon test              | Binomial test
Compare two paired groups                 | Paired t-test     | Wilcoxon test              | McNemar's test
Compare two unpaired groups               | Two-sample t-test | Wilcoxon-Mann-Whitney test | Chi-square or Fisher's exact test
General Form of a t-test

One-sample:  T = (x̄ − µ0) / (s/√n),  compared with t(α, n − 1)
Two-sample:  T = ((x̄ − ȳ) − (µ1 − µ2)) / (sp √(1/m + 1/n)),  compared with t(α, m + n − 2)
Non-Parametric Alternatives
• General form: x̂ ± (critical value) × se
• For a difference of two means:

  (x̄ − ȳ) ± t(α/2, m + n − 2) × sp √(1/m + 1/n)
Interpreting a 95% CI
• We calculate a 95% CI for a hypothetical sample mean to be between 20.6 and 35.4. Does this mean there is a 95% probability the true population mean is between 20.6 and 35.4?
• No: the population mean is a fixed (unknown) number. The correct interpretation is that 95% of intervals constructed by this procedure will contain the true population mean.
ANOVA Assumptions
• Independence
• Normality
• Let Xij denote the data from the ith level and jth observation
x̄·· = (82 + 83 + 97 + 83 + 78 + 68 + 38 + 59 + 55) / 9 = 71.4
Partitioning Total Variation
• Recall, variation is simply the average squared deviation from the mean:

Σi Σj (xij − x̄··)² = Σi ni (x̄i· − x̄··)² + Σi Σj (xij − x̄i·)²

Left side: sum of squared deviations about the grand mean across all N observations.
First term on the right: sum of squared deviations of each group mean about the grand mean.
Second term on the right: sum of squared deviations of all observations within each group from that group mean, summed across all groups.
In Our Example

Total: Σi Σj (xij − x̄··)²
  = (82 − 71.4)² + (83 − 71.4)² + (97 − 71.4)² + (83 − 71.4)² + (78 − 71.4)² + (68 − 71.4)² + (38 − 71.4)² + (59 − 71.4)² + (55 − 71.4)²

Between groups: Σi ni (x̄i· − x̄··)² = 3(87.3 − 71.4)² + 3(76.3 − 71.4)² + 3(50.6 − 71.4)²

Within groups: Σi Σj (xij − x̄i·)²
  = (82 − 87.3)² + (83 − 87.3)² + (97 − 87.3)² + (83 − 76.3)² + (78 − 76.3)² + (68 − 76.3)² + (38 − 50.6)² + (59 − 50.6)² + (55 − 50.6)²

Here the group means are x̄1· = 87.3, x̄2· = 76.3, x̄3· = 50.6, and the grand mean is x̄·· = 71.4.
Calculating Mean Squares

F = MSTG / MSTE = 1062.2 / 84.3 = 12.59

Source of Variation | Sum of Squares | df    | MS           | F
Group               | SSTG           | k − 1 | SSTG/(k − 1) | [SSTG/(k − 1)] / [SSTE/(N − k)]
Error               | SSTE           | N − k | SSTE/(N − k) |

• In R, kruskal.test() gives the non-parametric alternative.
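A minimal sketch of this example in R (the group labels g1 to g3 are illustrative):

  scores <- c(82, 83, 97, 83, 78, 68, 38, 59, 55)
  group  <- factor(rep(c("g1", "g2", "g3"), each = 3))
  summary(aov(scores ~ group))   # one-way ANOVA: F = 1062.2/84.3, about 12.6
  kruskal.test(scores ~ group)   # rank-based (non-parametric) alternative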
T-Test
Tests of Hypotheses on Population Means
There are two general methods used to make a "good guess" as to the true value of the population mean µ.
The first involves determining a confidence interval (CI) for µ.
The second is concerned with making a guess as to the value of µ and then testing to see whether such a guess is compatible with the observed data. This method is called hypothesis testing.
Two-tailed test, 5% significance level
The steps of hypothesis testing are:
1. State the null (H0) and alternative (Ha) hypotheses.
2. State a significance level: 1%, 5%, or 10%.
3. Decide on a test statistic (z-test, t-test, or F-test) and calculate its value.
4. Find the critical value at the given significance level from the table and compare it with the calculated |T statistic|: if T critical > |T statistic|, accept H0; if T critical < |T statistic|, reject H0.
Table of t-critical values
Types of t-tests
A t-test is a hypothesis test of the mean of one or two normally distributed populations. Several types of t-tests exist for different situations, but they all use a test statistic that follows a t-distribution under the null hypothesis:

1-sample t-test: tests whether the mean of a single population is equal to a target value. Example: Is the mean height of female college students greater than 5.5 feet?

2-sample t-test: tests whether the difference between the means of two independent populations is equal to a target value. Example: Does the mean height of female college students significantly differ from the mean height of male college students?

Paired t-test: tests whether the mean of the differences between dependent or paired observations is equal to a target value. Example: If you measure the weight of male college students before and after each subject takes a weight-loss pill, is the mean weight loss significant enough to conclude that the pill works?

In each case σ is unknown, and H1 may be one-sided or two-sided.
H0: μ = µ0 The population mean (μ) equals the hypothesized mean
(µ0).
Alternative hypothesis
H1: μ ≠ µ0 The population mean (μ) differs from the hypothesized
mean (µ0).
Test Statistic

t = (x̄ − µ) / (s/√n), with d.f. (degrees of freedom) = n − 1

where s = √[ Σ(xi − x̄)² / (n − 1) ]

In this formula, µ indicates the hypothesized population mean, s the sample standard deviation, and x̄ the mean of the sample.
Example
The following data represent hemoglobin values in gm/dl for 10 patients:
10.5, 9, 6.5, 8, 11, 7, 7.5, 8.5, 9.5, 12
Does the mean value for these patients differ significantly from the mean value of the general population (12 gm/dl)? Evaluate the role of chance (α = 0.05).
Solution
Mention all steps of testing hypothesis.
First, we must compute the mean (or average) of this sample:
x     x̄     (x − x̄)²
10.5  8.95  2.4025
9     8.95  0.0025
6.5   8.95  6.0025
8     8.95  0.9025
11    8.95  4.2025
7     8.95  3.8025
7.5   8.95  2.1025
8.5   8.95  0.2025
9.5   8.95  0.3025
12    8.95  9.3025

x̄ = 89.5/10 = 8.95,  Σ(x − x̄)² = 29.2250

s = √[ Σ(x − x̄)² / (n − 1) ] = √(29.2250/9) = 1.802005

t = (x̄ − µ) / (s/√n) = (8.95 − 12) / (1.802005/√10) = −5.35234
Then compare with the tabulated value for 9 df and a 5% level of significance, which is 2.262. Since |calculated value| > tabulated value, reject H0 and conclude that there is a statistically significant difference between the sample mean and the population mean; this difference is unlikely to be due to chance.
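The same test can be run in R, using the ten values reconstructed above:

  hb <- c(10.5, 9, 6.5, 8, 11, 7, 7.5, 8.5, 9.5, 12)
  t.test(hb, mu = 12)   # one-sample t-test: t about -5.35 on 9 df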
Where sp is called the pooled standard deviation, and is given by

sp = √[ ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2) ]

t = (x̄1 − x̄2) / ( sp √(1/n1 + 1/n2) )

d.f. = n1 + n2 − 2
Example
The following data represent weight in kg for 10 males and 12 females.
Males:
80 75 95 55 60 70 75 72 80 65
Females:
60 70 50 85 45 60 80 65 70 62 77 82
Males x | (x − x̄1)² | Females x | (x − x̄2)²
80      | 53.29      | 60        | 51.3611
75      | 5.29       | 70        | 8.0278
95      | 497.29     | 50        | 294.6944
55      | 313.29     | 85        | 318.0278
60      | 161.29     | 45        | 491.3611
70      | 7.29       | 60        | 51.3611
75      | 5.29       | 80        | 164.6944
72      | 0.49       | 65        | 4.6944
80      | 53.29      | 70        | 8.0278
65      | 59.29      | 62        | 26.6944
        |            | 77        | 96.6944
        |            | 82        | 220.0278

Mean1 = 72.7,     s1² = Σ(x − x̄1)²/(n1 − 1) = 1156.10/9 = 128.4556
Mean2 = 67.1667,  s2² = Σ(x − x̄2)²/(n2 − 1) = 1735.67/11 = 157.7879
t = (72.7 − 67.1667) / ( √[ ((12 − 1)157.7879 + (10 − 1)128.4556) / (12 + 10 − 2) ] × √(1/12 + 1/10) )

t = 1.074
The tabulated two-sided t for α = 0.01 and 20 df is 2.845. Since 1.074 < 2.845, we accept H0 and conclude that there is no significant difference between the two means; the observed difference may be due to chance.
Note: to calculate the t-test in Excel (α = 0.05), use "t-Test: Two-Sample Assuming Equal Variances":
Variable 1 Variable 2
Mean 72.7 67.16666667
Variance 128.4555556 157.7878788
Observations 10 12
Pooled Variance 144.5883333
Hypothesized Mean Difference 0
df 20
t Stat 1.074730292
P(T<=t) one-tail 0.147645482
t Critical one-tail 1.724718243
P(T<=t) two-tail 0.295290964
t Critical two-tail 2.085963447
Decision:
We perform a two-tail test. If t Stat < t Critical, we accept the null hypothesis. Here 1.07473 < 2.0859, so we accept the null hypothesis; that is, µ1 = µ2. We can also base the decision on the values of α and the P-value: if α < P-value, we accept the null hypothesis. Here 0.05 < 0.2952, so again we accept the null hypothesis (µ1 = µ2).
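A sketch of the same analysis in R (data from the example above):

  male   <- c(80, 75, 95, 55, 60, 70, 75, 72, 80, 65)
  female <- c(60, 70, 50, 85, 45, 60, 80, 65, 70, 62, 77, 82)
  t.test(male, female, var.equal = TRUE)   # t about 1.0747, two-tail p about 0.295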
The Paired t-Test
For the paired case, pairs are randomly selected from a single population. Each member of a pair is randomly assigned to one of the two treatments. The null hypothesis is that the mean difference among pairs is zero. An example of paired observations is before-and-after measurements on the same individuals.
Formula:

t = D̄ / SEdiff

i.e., t is the mean difference divided by its standard error, where

SEdiff = SDD / √npairs

and SDD = √[ (Σd² − (Σd)²/n) / (n − 1) ]

The standard error is found by computing the difference between each pair of observations. The standard deviation of these differences is SDD; divide SDD by √(number of pairs) to get SEdiff. Hence

t = D̄ / ( SDD / √npairs ), with d.f. = n − 1
Example: Blood pressure of 8 patients, before and after treatment.

BP (before) | BP (after) | d  | d²
180         | 140        | 40 | 1600
200         | 145        | 55 | 3025
230         | 150        | 80 | 6400
240         | 155        | 85 | 7225
170         | 120        | 50 | 2500
190         | 130        | 60 | 3600
200         | 140        | 60 | 3600
165         | 130        | 35 | 1225

Σd = 465,  D̄ = 465/8 = 58.125,  Σd² = 29175
Variable 1 Variable 2
Mean 196.875 138.75
Variance 720.9821429 133.9285714
Observations 8 8
Pearson Correlation 0.882107431
Hypothesized Mean Difference 0
df 7
t Stat 9.387578897
P(T<=t) one-tail 1.62001E-05
t Critical one-tail 1.894578605
P(T<=t) two-tail 3.24001E-05
t Critical two-tail 2.364624252
Decision:
We perform a two-tail test. If t Stat > t Critical, we reject the null hypothesis. Here 9.38757 > 2.3646, so we reject the null hypothesis; that is, µ1 ≠ µ2. We can also base the decision on the values of α and the P-value: if α > P-value, we reject the null hypothesis. Here 0.05 > 3.24001E-05, so again we reject the null hypothesis (µ1 ≠ µ2).
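The paired analysis can be reproduced in R:

  before <- c(180, 200, 230, 240, 170, 190, 200, 165)
  after  <- c(140, 145, 150, 155, 120, 130, 140, 130)
  t.test(before, after, paired = TRUE)   # t about 9.39, p about 3.2e-05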
Z test
Changing Standard Deviations
Whatever the mean and standard deviation:
• If the distribution is normally distributed, the 68-95-99.7 rule applies.
• About 68% of all cases (roughly two-thirds) fall within ±1 standard deviation of the mean,
• 95% of the cases fall within ±2 standard deviations of the mean, and 99.7% of the cases within ±3 SD of the mean.
Z test versus T test
Generally, z-tests are used when we
have large sample sizes (n > 30), whereas
t-tests are most helpful with a smaller
sample size (n < 30). Both methods
assume a normal distribution of the
data, but the z-tests are most useful
when the standard deviation is known.
Z test
DEFINITION: A z-test is a statistical procedure used to test an alternative hypothesis against a null hypothesis. More specifically, a z-test is a statistical hypothesis test used to determine whether two population means are different when the variances are known and the sample size is large (n ≥ 30).
Z Scores
• We call these standard deviation values "z-scores".
• A z-score is defined as the number of standard units any score or value is from the mean.
• A z-score states how many standard deviations the observation x falls away from the mean and in which direction: plus or minus.

Formula for z-score:

z = (x − µ) / σ
Examples of computing z-scores

X | X̄  | X − X̄ | SD | z = (X − X̄)/SD
5 | 3   | 2      | 2  | 1
6 | 3   | 3      | 2  | 1.5
5 | 10  | −5     | 4  | −1.25
6 | 3   | 3      | 4  | 0.75
4 | 8   | −4     | 2  | −2
Computing raw scores from z-scores

X = z·SD + X̄

z    | SD | z·SD | X̄  | X
1    | 2  | 2    | 3   | 5
−2   | 2  | −4   | 2   | −2
0.5  | 4  | 2    | 10  | 12
−1   | 5  | −5   | 10  | 5
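Both conversions are one-line vectorized calculations in R; the vectors below reproduce the two tables above:

  # Raw score to z-score: z = (X - mean) / SD
  X <- c(5, 6, 5, 6, 4); M <- c(3, 3, 10, 3, 8); SD <- c(2, 2, 4, 4, 2)
  (X - M) / SD            #  1.00  1.50 -1.25  0.75 -2.00

  # z-score back to raw score: X = z * SD + mean
  z <- c(1, -2, 0.5, -1); SD2 <- c(2, 2, 4, 5); M2 <- c(3, 2, 10, 10)
  z * SD2 + M2            #  5 -2 12  5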
Example
• Part-time employee salaries in a company are normally distributed with mean $20,000 and standard deviation $1,000.
• How many standard deviations is $18,500 away from the mean?
• z = (18,500 − 20,000)/1,000 = −1.5 (the negative sign specifies the direction)
Example
• How many standard deviations is $19,371 away from the mean?
• z = (x − µ)/SD
• z = (19,371 − 20,000)/1,000 = −0.629 standard deviations away
z-scores and conversions
• What is a z-score?
– A measure of an observation’s distance from the
mean.
– The distance is measured in standard deviation
units.
• If a z-score is zero, it’s on the mean.
• If a z-score is positive, it’s above the mean.
• If a z-score is negative, it’s below the mean.
• If a z-score is 1, it’s 1 SD above the mean.
• If a z-score is –2, it’s 2 SDs below the mean.
IQ is normally distributed with a mean of 100 and SD of 15. How do you interpret a score of 109? Use the z-score:

z = (109 − 100)/15 = 9/15 = 0.6

What does this z-score of 0.60 mean? It does not mean that 60 percent of cases fall below this score, but rather that this score is 0.60 standard units above the mean.
Calculating z-scores
Mean delivery time = 25 minutes; standard deviation = 2 minutes.
Convert 21 minutes to a z-score: z = (21 − 25)/2 = −2.00
Convert 29.7 minutes to a z-score: z = (29.7 − 25)/2 = 2.35
To find the area to the left of z = 1.34: the table gives 0.4099 (the area between 0 and 1.34), so the area to the left of this value = 0.5 + 0.4099 = 0.9099.
Patterns for Finding Areas Under the Standard Normal Curve
• To find the area to the left of a given negative z, or the area to the left of a given positive z: [figures showing the shaded regions]
• To find the area between z values on either side of zero: subtract the area to the left of z1 from the area to the left of z2.
• To find the area between z values on the same side of zero: subtract the area to the left of z1 from the area to the left of z2.
• To find the area to the right of a positive z value or to the right of a negative z value: subtract from 1.0000 the area to the left of the given z. (The area under the entire curve = 1.000.)
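In R, these table lookups are handled by pnorm(), which returns the cumulative area to the left of a given z:

  pnorm(1.34)            # area to the left of z = 1.34: 0.9099
  1 - pnorm(1.34)        # area to the right of z = 1.34
  pnorm(2) - pnorm(-2)   # area between z = -2 and z = 2: about 0.954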
Z scores and confidence intervals

confidence interval = observed mean ± z(α/2) × (s/√n)
• 140 children had urinary lead concentration measured in µmol/24 hours. The mean = 2.18 and SD = 0.87 µmol/24 hours.
A) What are the 95% probability limits for this mean?
• CI = the mean ± 1.96 SD
• 2.18 ± 1.96 × 0.87
• 0.475 to 3.885
B) How many SDs is a reading of 4.8 µmol/24 hours from the mean, what is the probability of getting such a reading, and does this reading differ significantly from the mean or not?
• z = (4.8 − 2.18)/0.87 = 3.01
• P = 0.0013
• This probability is very small, so such a reading is unlikely to come from the same population.
• Mean diastolic blood pressure among printers was found to be 88 mm Hg with an SD of 4.5. One printer's diastolic blood pressure was found to be 100 mm Hg.
• A) Is this significantly different or not at the 95% level?
• B) Is this significantly different or not at the 99% level?
A) CI = the mean ± 1.96 SD
• 95% confidence interval = 88 ± 1.96 × 4.5 = 79.2 to 96.8 mm Hg
• So 100 mm Hg is outside this interval.
B) 99% confidence interval = 88 ± 3 × 4.5 = 74.5 to 101.5
• So 100 mm Hg lies within this interval.
• We can use SE instead of SD:
CI = the mean ± 1.96 SE
• SE = SD/√n = 4.5/√72 = 0.53 mm Hg
• So the answers become:
• A) 95% CI = 88 ± 1.96 × 0.53 = 86.96 to 89.04
• B) 99% CI = 88 ± 3 × 0.53 = 86.41 to 89.59
Z value applied to the difference between two means:

Z = (x̄1 − x̄2) / √( SD1²/n1 + SD2²/n2 )

[Figure: two-tailed test with 0.025 in each tail; rejection regions beyond z0 = −1.96 and z0 = +1.96]
Two-Sample Z Test
• We want to test the null hypothesis that the two populations have the same mean:
• H0: µ1 = µ2 or, equivalently, µ1 − µ2 = 0
• Two-sided alternative hypothesis: µ1 − µ2 ≠ 0
• If we assume our population SDs σ1 and σ2 are known, we can calculate a two-sample Z statistic:

Z = (x̄1 − x̄2) / √( σ1²/n1 + σ2²/n2 )
Two-Sample Z Test: Example
• In Tikrit Medical College, we want to decide whether there is any difference between the mean ages of two student samples: those from outside Tikrit and those from Tikrit center.
• From Tikrit: mean1 = 18, SD1 = 1.5, n1 = 60
• From outside Tikrit: mean2 = 19, SD2 = 2, n2 = 60
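The text does not finish the calculation; a minimal sketch in R, using the summary figures above:

  z <- (18 - 19) / sqrt(1.5^2/60 + 2^2/60)
  z                    # about -3.10
  2 * pnorm(-abs(z))   # two-sided p-value, about 0.002

Since |z| = 3.10 > 1.96, the difference between the two mean ages is significant at the 5% level.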
Introduction
Analysis of variance (ANOVA) is a statistical technique used for analyzing the difference between the means of more than two samples. It is a parametric test of hypothesis: a stepwise estimation procedure (partitioning the "variation" among and between groups) used to test the equality of two or more population means.
ANOVA was developed by the statistician and eugenicist Ronald Fisher. Although many statisticians, including Fisher, worked on the development of the ANOVA model, it became widely known after being included in Fisher's 1925 book "Statistical Methods for Research Workers". ANOVA is based on the law of total variance, where the observed variance in a particular variable is partitioned into components attributable to different sources of variation. ANOVA provides an analytical framework for testing the differences among group means and thus generalizes the t-test beyond two means. ANOVA uses F-tests to statistically test the equality of means.
Concept of Variance
Variance is an important tool in the sciences, including statistical science. In the theory of probability and statistics, variance is the expectation of the squared deviation of a random variable from its mean. It is measured to find out the degree to which the data in a series are scattered around their average value. Variance is widely used in statistics; its uses range from descriptive statistics to statistical inference and testing of hypotheses.
In its general form, ANOVA examines the differences in the mean values of the dependent variable associated with the effect of the controlled independent variables, after taking into account the influence of the uncontrolled independent variables.
We take the null hypothesis that there is no significant difference between the means of
different populations. In its simplest form, analysis of variance must have a dependent
variable that is metric (measured using an interval or ratio scale). There must also be
one or more independent variables. The independent variables must be all categorical
(non-metric). Categorical independent variables are also called factors. A particular
combination of factor levels, or categories, is called a treatment.
The type of analysis to be made for examining the variations depends upon the number of independent variables taken into account for the study. One-way analysis of variance involves only one categorical variable, or a single factor. If two or more factors are involved, the analysis is termed n-way (e.g., two-way, three-way, etc.) analysis of variance.
F Tests
F-tests are named after Sir Ronald Fisher. The F-statistic is simply a ratio of two
variances. Variance is the square of the standard deviation. For a common person, standard
deviations are easier to understand than variances because they’re in the same units as the data
rather than squared units. F-statistics are based on the ratio of mean squares. The term “mean
squares” may sound confusing but it is simply an estimate of population variance that accounts
for the degrees of freedom (DF) used to calculate that estimate.
For carrying out the test of significance, we calculate the ratio F, which is defined as:

F = S1² / S2²,  where S1² = Σ(X1 − X̄1)² / (n1 − 1) and S2² = Σ(X2 − X̄2)² / (n2 − 1)

It should be noted that S1² is always the larger estimate of variance, i.e., S1² > S2².
The calculated value of F is compared with the table value for ν1 and ν2 degrees of freedom at the 5% or 1% level of significance. If the calculated value of F is greater than the table value, the F ratio is considered significant and the null hypothesis is rejected. On the other hand, if the calculated value of F is less than the table value, the null hypothesis is accepted and it is inferred that both samples have come from populations having the same variance.
Illustration 1: Two random samples were drawn from two normal populations and their values
are:
A 65 66 73 80 82 84 88 90 92
B 64 66 74 78 82 85 87 92 93 95 97
Test whether the two populations have the same variance at the 5% level of significance.
Solution: Let us take the null hypothesis that the two populations have the same variance. Applying the F-test (with the larger estimate of variance in the numerator):

X̄1 = ΣX1/n1 = 720/9 = 80 and X̄2 = ΣX2/n2 = 913/11 = 83

Sample A: S² = Σ(X1 − X̄1)²/(n1 − 1) = 798/(9 − 1) = 99.75

Sample B: S² = Σ(X2 − X̄2)²/(n2 − 1) = 1298/(11 − 1) = 129.8

F = 129.8/99.75 = 1.30

At the 5 percent level of significance, for ν1 = 10 and ν2 = 8, the table value of F0.05 = 3.36. The calculated value of F is less than the table value, so the null hypothesis is accepted. Hence the two populations have the same variance.
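The same comparison can be reproduced in R with var.test(), placing the sample with the larger variance first:

  A <- c(65, 66, 73, 80, 82, 84, 88, 90, 92)
  B <- c(64, 66, 74, 78, 82, 85, 87, 92, 93, 95, 97)
  var.test(B, A)   # F = 129.8/99.75, about 1.30 on (10, 8) df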
ONE-WAY CLASSIFICATION
In one-way classification, the following steps are carried out to compute the F-ratio by the most popular method, i.e., the short-cut method:
1. First obtain the squared value of all the observations in the different samples (columns).
2. Obtain the sum of the sample observations in each column: ΣX1, ΣX2, ..., ΣXk.
3. Obtain the sum of the squared values in each column: ΣX1², ΣX2², ..., ΣXk².
4. Find the value of T by adding up all the sums of sample observations: T = ΣX1 + ΣX2 + ... + ΣXk.
5. Work out the correction factor: CF = T²/N.
6. Find the total sum of squares (SST) from the squared values and CF: SST = (ΣX1² + ΣX2² + ... + ΣXk²) − CF.
7. Find the sum of squares between the samples (SSC): SSC = (ΣX1)²/n1 + (ΣX2)²/n2 + ... + (ΣXk)²/nk − CF.
8. Finally, find the sum of squares within samples (SSE): SSE = SST − SSC.
The results are arranged in an ANOVA table:

Source of variation          | SS  | df         | MS              | F
Between samples (treatments) | SSC | ν1 = k − 1 | MSC = SSC/(k−1) | F = MSC/MSE
Within samples (error)       | SSE | ν2 = N − k | MSE = SSE/(N−k) |
Total                        | SST | N − 1      |                 |
Illustration 2: To test the significance of variation in the retail prices of a commodity in three principal cities, Kanpur, Lucknow, and Delhi, four shops were chosen at random in each city and the prices observed (in rupees) were as follows:

Kanpur   15  7   11  13
Lucknow  14  10  10  6
Delhi    4   10  8   8

Do the data indicate that the prices in the three cities are significantly different?
Solution: Let us take the null hypothesis that there is no significant difference in the prices of the commodity in the three cities.

CF = correction factor = T²/n = (116)²/12 = 1121.33

SST = (15² + 7² + ... + 8²) − CF = 1240 − 1121.33 = 118.67
SSC = (46² + 40² + 30²)/4 − CF = 1154 − 1121.33 = 32.67
SSE = SST − SSC = 118.67 − 32.67 = 86

ANOVA TABLE

Source of variation | SS     | df | MS    | F
Between cities      | 32.67  | 2  | 16.33 | 16.33/9.56 = 1.71
Within cities       | 86.00  | 9  | 9.56  |
Total               | 118.67 | 11 |       |

The table value of F for df1 = 2, df2 = 9, and α = 5% level of significance is 4.26. Since the calculated value of F is less than its critical (or table) value, the null hypothesis is accepted. Hence we conclude that the prices of the commodity in the three cities do not differ significantly.
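A sketch of the same one-way analysis in R (prices from the table above):

  price <- c(15, 7, 11, 13,  14, 10, 10, 6,  4, 10, 8, 8)
  city  <- factor(rep(c("Kanpur", "Lucknow", "Delhi"), each = 4))
  summary(aov(price ~ city))   # F about 1.71 on (2, 9) df, below the 4.26 critical value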
TESTING EQUALITY OF POPULATION (TREATMENT) MEANS: TWO-WAY CLASSIFICATION
Total variation consists of three parts: (i) variation between columns, SSC; (ii) variation between rows, SSR; and (iii) actual variation due to random error, SSE. That is,
SST = SSC + SSR + SSE.
The degrees of freedom associated with SST are cr − 1, where c and r are the number of columns and rows, respectively.
Illustration 3: The following table gives the number of refrigerators sold by 4 salesmen in three months, March, April and May:

Month   | Salesman
        | A   B   C   D
March   | 50  40  48  39
April   | 46  48  50  45
May     | 39  44  40  39
Is there a significant difference in the sales made by the four salesmen? Is there a significant difference in the sales made during the different months?
The given data are coded by subtracting 40 from each observation. Calculations for a two-criteria (month and salesman) analysis of variance are shown below. With the coded data, T = 48 and CF = T²/n = 48²/12 = 192:

SSC (between salesmen) = (15² + 12² + 18² + 3²)/3 − 192 = (75 + 48 + 108 + 3) − 192 = 42
SSR (between months) = (17² + 29² + 2²)/4 − 192 = 283.5 − 192 = 91.5
SST = (137 + 80 + 164 + 27) − 192 = 216
SSE = SST − SSC − SSR = 216 − 42 − 91.5 = 82.5

Hence MSC = 42/3 = 14, MSR = 91.5/2 = 45.75, and MSE = 82.5/6 = 13.75, giving F(treatment) = 14/13.75 = 1.018 and F(block) = 45.75/13.75 = 3.327.
The total degrees of freedom are df= n-1=12-1=11.
(a) The table value of F = 4.75 for df1 =3, df2 = 6, and =5%. Since the calculated value of
𝐹𝑇𝑟𝑒𝑎𝑡𝑚𝑒𝑛𝑡 = 1.018 is less than its table value, the null hypothesis is accepted. Hence we
conclude that the sales made by the salesmen do not differ significantly.
(b) The table value of F= 5.14 for df1=2, df2=6, and = 5%. Since the calculated value of 𝐹𝐵𝑙𝑜𝑐𝑘=
3.327 is less than its table value, the null hypothesis is accepted. Hence we conclude that sales
made during different months do not differ significantly.
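A sketch of the same two-way (no interaction) analysis in R:

  sales    <- c(50, 40, 48, 39,  46, 48, 50, 45,  39, 44, 40, 39)
  salesman <- factor(rep(c("A", "B", "C", "D"), times = 3))
  month    <- factor(rep(c("March", "April", "May"), each = 4))
  summary(aov(sales ~ salesman + month))   # F about 1.018 (salesmen), 3.327 (months)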
Analysis of Variance (ANOVA)
Recall, when we wanted to compare two population means, we used the 2-sample t procedures .
Now let’s expand this to compare k ≥ 3 population means. As with the t-test, we can graphically get an
idea of what is going on by looking at side-by-side boxplots. (See Example 12.3, p. 748, along with Figure
12.3, p. 749.)
Generally, we are considering a quantitative response variable as it relates to one or more explanatory
variables, usually categorical. Questions which fit this setting:
(i) Which academic department in the sciences gives out the lowest average grades? (Explanatory vari-
able: department; Response variable: student GPA’s for individual courses)
(ii) Which kind of promotional campaign leads to greatest store income at Christmas time? (Explanatory
variable: promotion type; Response variable: daily store income)
(iii) How do the type of career and marital status of a person relate to the total cost in annual claims she/he is likely to make on her/his health insurance? (Explanatory variables: career and marital status; Response variable: health insurance payouts)
Each value of the explanatory variable (or value-pair, if there is more than one explanatory variable) repre-
sents a population or group. In the Physicians’ Health Study of Example 3.3, p. 238, there are two factors
(explanatory variables): aspirin (values are “taking it” or “not taking it”) and beta carotene (values again are
“taking it” or “not taking it”), and this divides the subjects into four groups corresponding to the four cells
of Figure 3.1 (p. 239). Had the response variable for this study been quantitative—like systolic blood pres-
sure level—rather than categorical, it would have been an appropriate scenario in which to apply (2-way)
ANOVA.
H0: The (population) means of all groups under consideration are equal.
Ha: The (pop.) means are not all equal. (Note: This is different than saying “they are all unequal ”!)
Analysis of variance is a perfectly descriptive name of what is actually done to analyze sample data ac-
quired to answer problems such as those described in Section 1.1. Take a look at Figures 12.2(a) and 12.2(b)
(p. 746) in your text. Side-by-side boxplots like these in both figures reveal differences between samples
taken from three populations. However, variations like those depicted in 12.2(a) are much less convincing
that the population means for the three populations are different than if the variations are as in 12.2(b). The
reason is because the ratio of variation between groups to variation within groups is much
smaller for 12.2(a) than it is for 12.2(b).
1.4 Assumptions of ANOVA
Like so many of our inference procedures, ANOVA has some underlying assumptions which should be in place in order to make the results of calculations completely trustworthy. They include:
(i) each sample is an independent simple random sample (SRS) from its population;
(ii) each population is normally distributed;
(iii) the populations all have the same standard deviation.
Fortunately, ANOVA is somewhat robust (i.e., results remain fairly trustworthy despite mild violations of these assumptions). Assumptions (ii) and (iii) are close enough to being true if, after gathering SRS samples from each group, you:
(ii) look at normal quantile plots for each group and, in each case, see that the data points fall close to a
line.
(iii) compute the standard deviations for each group sample, and see that the ratio of the largest to the
smallest group sample s.d. is no more than two.
2 One-Way ANOVA
When there is just one explanatory variable, we refer to the analysis of variance as one-way ANOVA.
2.1 Notation
Here is a key to symbols you may see as you read through this section.

si = the sample standard deviation of the ith group = √[ (1/(ni − 1)) Σ_{j=1}^{ni} (xij − x̄i)² ]
Viewed as one sample (rather than k samples from the individual groups/populations), one might measure the total amount of variability among observations by summing the squares of the differences between each xij and x̄:

SST := Σ_{i=1}^{k} Σ_{j=1}^{ni} (xij − x̄)²

This variability has two sources:
1. Variability between group means (specifically, variation around the overall mean x̄):

SSG := Σ_{i=1}^{k} ni (x̄i − x̄)², and

2. Variability within groups (specifically, variation of observations about their group mean x̄i):

SSE := Σ_{i=1}^{k} Σ_{j=1}^{ni} (xij − x̄i)²
If the variability between groups/treatments is large relative to the variability within groups/treatments,
then the data suggest that the means of the populations from which the data were drawn are significantly
different. That is, in fact, how the F statistic is computed: it is a measure of the variability between treat-
ments divided by a measure of the variability within treatments. If F is large, the variability between
treatments is large relative to the variation within treatments, and we reject the null hypothesis of equal
means. If F is small, the variability between treatments is small relative to the variation within treatments,
and we do not reject the null hypothesis of equal means. (In this case, the sample data is consistent with
the hypothesis that population means are equal between groups.)
To compute this ratio (the F statistic) is difficult and time consuming. Therefore we are always going to let
the computer do this for us. The computer generates what is called an ANOVA table:
Source         | SS  | df    | MS              | F
Model/Group    | SSG | k − 1 | MSG = SSG/(k−1) | MSG/MSE
Residual/Error | SSE | n − k | MSE = SSE/(n−k) |
Total          | SST | n − 1 |                 |
• The source (of variability) column names the component; SS = Sum of Squares (sum of squared deviations):
  - SST measures variation of the data around the overall mean x̄
  - SSG measures variation of the group means around the overall mean
  - SSE measures the variation of each observation around its group mean x̄i
• Degrees of freedom:
  - k − 1 for SSG, since it measures the variation of the k group means about the overall mean
  - n − k for SSE, since it measures the variation of the n observations about k group means
  - n − 1 for SST, since it measures the variation of all n observations about the overall mean
• MS = Mean Square = SS/df:
This is like a standard deviation. Look at the formula we learned back in Chapter 1 for sample standard deviation (p. 51). Its numerator was a sum of squared deviations (just like our SS formulas), and it was divided by the appropriate number of degrees of freedom.
It is interesting to note that another formula for MSE is the pooled sample variance:

MSE = sp² = [ (n1 − 1)s1² + (n2 − 1)s2² + ... + (nk − 1)sk² ] / (n − k)

Software ANOVA output is laid out with the columns: Source, SS, df, MS, F, Prob > F.
Note: One more thing you will often find on an ANOVA table is R2 (the coefficient of determination). It
indicates the ratio of the variability between group means in the sample to the overall sample variability,
meaning that it has a similar interpretation to that for R2 in linear regression.
As with other tests of significance, one-way ANOVA has the following steps:
If P > α, then we have no reason to reject the null hypothesis. We state this as our conclusion along with
the relevant information (F-value, df-numerator, df-denominator, P-value). Ideally, a person conducting the
study will have some preconceived hypotheses (more specialized than the H0, Ha we stated for ANOVA,
and ones which she held before ever collecting/looking at the data) about the group means that she wishes
to investigate. When this is the case, she may go ahead and explore them (even if ANOVA did not indicate
an overall difference in group means), often employing the method of contrasts. We will not learn this
method as a class, but if you wish to know more, some information is given on pp. 762–769.
When we have no such preconceived leanings about the group means, it is, generally speaking, inappro-
priate to continue searching for evidence of a difference in means if our F-value from ANOVA was not
significant. If, however, P < α, then we know that at least two means are not equal, and the door is open
to our trying to determine which ones. In this case, we follow up a significant F-statistic with pairwise
comparisons of the means, to see which are significantly different from each other.
This involves doing a t-test between each pair of means, using the pooled estimate for the (assumed) common standard deviation of all groups (see the MS bullet in Section 2.3):

tij = (x̄i − x̄j) / ( sp √(1/ni + 1/nj) )
Where the row labeled ‘2’ meets the column labeled ‘1’, we are told that the sample mean response for
Program 2 was 3 lower than the mean response for Program 1 (Row Mean - Col Mean = -3), and that the
adjusted Bonferroni probability is 0.057. Thus, this difference is not statistically significant at the 5% level
to conclude the mean response from Program 1 is actually different than the mean response from Program
2. Which programs have statistically significant (at significance level 5%) mean responses?
Apparently, programs 1 and 3 are the most successful, with no statistically-significant difference between
them. At this stage, other factors, such as how much it will cost the company to implement the two pro-
grams, may be used to determine which program will be set in place.
Note: Sometimes instead of giving P-values, a software package will generate confidence intervals for the
differences between means. Just remember that if the CI includes 0, there is no statistically significant
difference between the means.
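In R, such pairwise comparisons with a Bonferroni adjustment can be run with pairwise.t.test(); the data below are hypothetical placeholders, since the study data are not reproduced here:

  set.seed(1)
  program  <- factor(rep(c("P1", "P2", "P3"), each = 10))   # hypothetical groups
  response <- rnorm(30, mean = c(10, 10, 12)[as.integer(program)], sd = 2)
  # Pairwise t-tests using the pooled SD (the default) with Bonferroni-adjusted p-values
  pairwise.t.test(response, program, p.adjust.method = "bonferroni")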
3 Two-Way ANOVA
Two-way ANOVA allows us to compare population means when the populations are classified according to two (categorical) factors.
Example. We might like to look at SAT scores of students who are male or female (first factor) and either
have or have not had a preparatory course (second factor).
Example. A researcher wants to investigate the effects of the amounts of calcium and magnesium in a rat's diet on the rat's blood pressure. Diets including high, medium and low amounts of each mineral (but otherwise identical) will be fed to the rats. After a specified time on the diet, the blood pressure will be measured. Notice that the design includes nine different treatments because there are three levels to each of the two factors.
Advantages of two-way ANOVA over separate one-way analyses:
• it usually requires a smaller total sample size, since you're studying two things at once [rat diet example, p. 800]
• it removes some of the random variability (some of the random variability is now explained by the second factor, so you can more easily find significant differences)
• we can look at interactions between factors (a significant interaction means the effect of one variable changes depending on the level of the other factor)
Below is the outline of a two-way ANOVA table, with factors A and B, having I and J groups, respectively.
Source | df           | SS   | MS   | F
A      | I − 1        | SSA  | MSA  | MSA/MSE
B      | J − 1        | SSB  | MSB  | MSB/MSE
A × B  | (I−1)(J−1)   | SSAB | MSAB | MSAB/MSE
Error  | n − IJ       | SSE  | MSE  |
Total  | n − 1        | SST  |      |
The general layout of the ANOVA table should be familiar to us from the ANOVA tables we have seen for regression and one-way ANOVA. Notice that this time we are dividing the variation into four components. Since there are three different values of F, we must be doing three different hypothesis tests at once. We'll get to the hypotheses of these tests shortly.
The model for two-way ANOVA is that each of the IJ groups has a normal distribution with potentially different means (µij), but with a common standard deviation (σ). That is,

xijk = µij + εijk, where εijk ∼ N(0, σ)

(group mean plus residual). As usual, we will use two-way ANOVA provided it is reasonable to assume normal group distributions and the ratio of the largest group standard deviation to the smallest group standard deviation is at most 2.
Main Effects
Example. We consider whether classifying by diagnosis (anxiety, depression, DCFS/Court referred) and prior abuse (yes/no) is related to mean BC (Being Cautious) score. Below is a table where each cell contains the mean BC score for people who were in that group.
df SS MS F p-value
Diagnosis 2 222.3 111.15 2.33 .11
Ever abused 1 819.06 819.06 17.2 .0001*
D × E 2 165.2 82.60 1.73 .186
Error 62 2958.0 47.71
Total 67
The table has three P-values, corresponding to three tests of significance:
I. H0: The mean BC score is the same for each of the three diagnoses.
Ha: The mean BC score is not the same for all three diagnoses.
The evidence here is not significant to reject the null hypothesis. (F = 2.33, df 1 = 2, df 2 = 62, P =
0.11)
II. H0: There is no main effect due to ever being abused.
Ha: There is a main effect due to being abused.
The evidence is significant to conclude a main effect exists. (F = 17.2, df 1 = 1, df 2 = 62, P = 0.0001)
III. H0: There is no interaction effect between diagnosis and ever being abused.
Ha: There is an interaction effect between the two variables.
The evidence here is not significant to reject the null hypothesis. (F = 1.73, df 1 = 2, df 2 = 62, P =
0.186)
When a main effect has been found for just one variable without variable interactions, we might combine
data across diagnoses and perform one of the other tests we know that is applicable. (Two-sample t or One-
way ANOVA are both options here, since the combining of information leaves us with just two groups.)
But we might also perform a simpler task: Draw a plot of the main effects due to abuse.
Interaction Effects
Example. We consider whether the mean BSI (Belonging/Social Interest) is the same after classifying people
on the basis of whether or not they’ve been abused and diagnosis.
ANOVA table:
df SS MS F p-value
Diagnosis 2 118.0 59.0 1.89 .1602
Ever abused 1 483.6 483.6 15.5 .0002*
D × E 2 387.0 193.5 6.19 .0035*
Error 62 1938.12 31.26
Total 67
Since the interaction is significant, let’s look at the individual mean BSI at each level of diagnosis on an
interaction plot.
Knowing how many people were reflected in each category (information that is not provided here) would
allow us to conduct 2-sample t tests at each level of diagnosis. Such tests reveal that there is a significant
difference in mean BSI between those who have ever been abused and those not abused only for those who
have a DCFS/Court Referred disorder. There is no statistically significant difference between these two
groups for those with a Depressive or Anxiety disorder (though it’s pretty close for those with an Anxiety
disorder).
Example. Promotional fliers. [Exercise 13.15, p. 821 in Moore/McCabe]
Means:
discount
promos 10 20 30 40
1 4.423 4.225 4.689 4.920
3 4.284 4.097 4.524 4.756
5 4.058 3.890 4.251 4.393
7 3.780 3.760 4.094 4.269
Standard Deviations:
discount
promos 10 20 30 40
1 0.18476 0.38561 0.23307 0.15202
3 0.20403 0.23462 0.27073 0.24291
5 0.17599 0.16289 0.26485 0.26854
7 0.21437 0.26179 0.24075 0.26992
Counts:
discount
promos 10 20 30 40
1 10 10 10 10
3 10 10 10 10
5 10 10 10 10
7 10 10 10 10
[Interaction plots: mean of expPrice versus promos (one line per discount level) and mean of expPrice versus discount (one line per number of promos); both vertical axes run from about 3.8 to 4.8.]
Question: Was it worth plotting the interaction effects, or would we have learned the same things plotting
only the main effect?
44.1 One-Way Analysis of Variance

Introduction
Problems in engineering often involve the exploration of the relationships between values taken by a variable under different conditions. Workbook 41 introduced hypothesis testing, which enables us to compare two population means using hypotheses of the general form

H0 : µ1 = µ2
H1 : µ1 ≠ µ2
or, in the case of more than two populations,
H0 : µ1 = µ2 = µ3 = . . . = µk
H1 : H0 is not true
If we are comparing more than two population means, using the type of hypothesis testing referred
to above gets very clumsy and very time consuming. As you will see, the statistical technique called
Analysis of Variance (ANOVA) enables us to compare several populations simultaneously. We
might, for example need to compare the shear strengths of five different adhesives or the surface
toughness of six samples of steel which have received different surface hardening treatments.
1. One-way ANOVA
In this Workbook we deal with one-way analysis of variance (one-way ANOVA) and two-way analysis of
variance (two-way ANOVA). One-way ANOVA enables us to compare several means simultaneously
by using the F -test and enables us to draw conclusions about the variance present in the set of
samples we wish to compare.
Multiple (greater than two) samples may be investigated using the techniques of two-population
hypothesis testing. As an example, it is possible to do a comparison looking for variation in the
surface hardness present in (say) three samples of steel which have received different surface hardening
treatments by using hypothesis tests of the form
H0 : µ1 = µ2
H1 : µ1 ≠ µ2
We would have to compare all possible pairs of samples before reaching a conclusion. If we are dealing with three samples we would need to perform a total of

³C₂ = 3!/(1! 2!) = 3

hypothesis tests. From a practical point of view this is not an efficient way of dealing with the problem, especially since the number of tests required rises rapidly with the number of samples involved. For example, an investigation involving ten samples would require

¹⁰C₂ = 10!/(8! 2!) = 45

separate hypothesis tests.
There is also another crucially important reason why techniques involving such batteries of tests are
unacceptable. In the case of 10 samples mentioned above, if the probability of correctly accepting a
given null hypothesis is 0.95, then the probability of correctly accepting the null hypothesis
H0 : µ1 = µ2 = . . . = µ10
is (0.95)45 ≈0.10 and we have only a 10% chance of correctly accepting the null hypothesis for
all 45 tests. Clearly, such a low success rate is unacceptable. These problems may be avoided by
simultaneously testing the significance of the difference between a set of more than two population
means by using techniques known as the analysis of variance.
Essentially, we look at the variance between samples and the variance within samples and draw conclusions from the results. Note that the variation between samples is due to assignable (or controlled) causes, often referred to in general as treatments, while the variation within samples is due to chance. In the example above concerning the surface hardness present in three samples of steel which have received different surface hardening treatments, the following diagrams illustrate the differences which may occur when between-sample and within-sample variation is considered.
Case 1
In this case the variation within samples is roughly on a par with that occurring between samples.

[Figure 1: three heavily overlapping sample distributions s̄1, s̄2, s̄3]

Case 2
In this case the variation within samples is considerably less than that occurring between samples.

[Figure 2: three well-separated sample distributions s̄1, s̄2, s̄3]
We argue that the greater the variation present between samples in comparison with the variation
present within samples the more likely it is that there are ‘real’ differences between the population
means, say µ1, µ2 and µ3. If such ‘real’ differences are shown to exist at a sufficiently high level
of significance, we may conclude that there is sufficient evidence to enable us to reject the null
hypothesis H0 : µ1 = µ2 = µ3.
Since the machines are set up to produce identical alloy spacers it is reasonable to ask if the evidence
we have suggests that the machine outputs are the same or different in some way. We are really
asking whether the sample means, say X̄A , X̄B , X̄C and X̄D , are different because of differences in
the respective population means, say µA , µB , µC and µD , or whether the differences in X̄A , X̄B , X̄C
and X̄D may be attributed to chance variation. Stated in terms of a hypothesis test, we would write
H0 : µA = µB = µC = µD
H1 : At least one mean is different from the others
In order to decide between the hypotheses, we calculate the mean of each sample and overall mean
(the mean of the means) and use these quantities to calculate the variation present between the
samples. We then calculate the variation present within samples. The following tables illustrate the
calculations.
Machine A Machine B Machine C Machine D
46 56 55 49
54 55 51 53
48 56 50 57
46 60 51 60
56 53 53 51
The variation present between samples is

S²Tr = (1/(4 − 1)) Σ_{i=A}^{D} (X̄i − X̄)²
     = (1/3) [ (50 − 53)² + (56 − 53)² + (52 − 53)² + (54 − 53)² ]
     = 20/3 = 6.67 to 2 d.p.
Sample A
Σ(X − X̄A)² = (46 − 50)² + (54 − 50)² + (48 − 50)² + (46 − 50)² + (56 − 50)² = 88
Sample B
Σ(X − X̄B)² = (56 − 56)² + (55 − 56)² + (56 − 56)² + (60 − 56)² + (53 − 56)² = 26
Sample C
Σ(X − X̄C)² = (55 − 52)² + (51 − 52)² + (50 − 52)² + (51 − 52)² + (53 − 52)² = 16
Sample D
Σ(X − X̄D)² = (49 − 54)² + (53 − 54)² + (57 − 54)² + (60 − 54)² + (51 − 54)² = 80
ANOVA tables
It is usual to summarize the calculations we have seen so far in the form of an ANOVA table. Essentially, the table gives us a method of recording the calculations leading to both the numerator and the denominator of the expression

F = nS²Tr / S²E

In addition, and importantly, ANOVA tables provide us with a useful means of checking the accuracy of our calculations. A general ANOVA table is presented below with explanatory notes. Define a = number of treatments, n = number of observations per sample.

Source of variation | Sum of Squares SS | Degrees of Freedom | Mean Square MS | Value of F ratio
Between samples (due to treatments; differences between means X̄i and X̄) | SSTr = n Σ_{i=1}^{a} (X̄i − X̄)² | a − 1 | MSTr = SSTr/(a − 1) = nS²Tr | F = MSTr/MSE = nS²Tr/S²E
Within samples (due to chance errors; differences between individual observations Xij and means X̄i) | SSE = Σ_{i=1}^{a} Σ_{j=1}^{n} (Xij − X̄i)² | a(n − 1) | MSE = SSE/(a(n − 1)) = S²E |
Totals | SST = Σ_{i=1}^{a} Σ_{j=1}^{n} (Xij − X̄)² | an − 1 | |
In order to demonstrate this table for the example above we need to calculate

SST = Σ_{i=1}^{a} Σ_{j=1}^{n} (Xij − X̄)²

a measure of the total variation present in the data. Such calculations are easily done using a computer (Microsoft Excel was used here), the result being

SST = Σ_{i=1}^{a} Σ_{j=1}^{n} (Xij − X̄)² = 310
The ANOVA table for the example is:

Source of variation | SS  | df | MS                    | F
Between samples (due to treatments) | 100 | 3  | MSTr = 100/3 = 33.33 | F = 33.33/13.13 = 2.54
Within samples (due to chance errors) | 210 | 16 | MSE = 210/16 = 13.13 |
TOTALS              | 310 | 19 |                       |
As you can see from the table, SSTr and SSE do indeed sum to give SST even though we can
calculate them separately. The same is true of the degrees of freedom.
Note that calculating these quantities separately does offer a check on the arithmetic but that using
the relationship can speed up the calculations by obviating the need to calculate (say) SST . As
you might expect, it is recommended that you check your calculations! However, you should note
that it is usual to calculate SST and SSTr and then find SSE by subtraction. This saves a lot of
unnecessary calculation but does not offer a check on the arithmetic. This shorter method will be
used throughout much of this Workbook.
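The whole table can be generated in R from the machine data above:

  output  <- c(46, 54, 48, 46, 56,   # Machine A
               56, 55, 56, 60, 53,   # Machine B
               55, 51, 50, 51, 53,   # Machine C
               49, 53, 57, 60, 51)   # Machine D
  machine <- factor(rep(c("A", "B", "C", "D"), each = 5))
  summary(aov(output ~ machine))   # SSTr = 100, SSE = 210, F about 2.54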
Unequal sample sizes
So far we have assumed that the number of observations in each sample is the same. This is not a necessary condition for the one-way ANOVA.

Key Point 1
Suppose that the number of samples is a and the numbers of observations are n1, n2, . . . , na. Then the between-samples sum of squares can be calculated using

SSTr = Σ_{i=1}^{a} Ti²/ni − G²/N

where Ti is the total for sample i, G = Σ_{i=1}^{a} Ti is the overall total and N = Σ_{i=1}^{a} ni.
Do the data support the hypothesis that the systems offer equivalent levels of efficiency?
Answer
Appropriate hypotheses are
H0 : µ1 = µ2 = µ3
H1 : At least one mean is different from the others
Variation between samples
System 1 System 2 System 3
48 60 57
56 56 55
46 53 52
45 60 50
50 51 51
X̄1 = 49 X̄2 = 56 X̄3 = 53
The mean of the means is X̄ = (49 + 56 + 53)/3 = 52.67 and the variation present between samples is

S²Tr = (1/(3 − 1)) Σ_{i=1}^{3} (X̄i − X̄)² = (1/2) [ (49 − 52.67)² + (56 − 52.67)² + (53 − 52.67)² ] = 12.33
Variation within samples
System 1
Σ(X − X̄1)² = (48 − 49)² + (56 − 49)² + (46 − 49)² + (45 − 49)² + (50 − 49)² = 76
System 2
Σ(X − X̄2)² = (60 − 56)² + (56 − 56)² + (53 − 56)² + (60 − 56)² + (51 − 56)² = 66
System 3
Σ(X − X̄3)² = (57 − 53)² + (55 − 53)² + (52 − 53)² + (50 − 53)² + (51 − 53)² = 34
Hence

S²E = [ Σ(X − X̄1)² + Σ(X − X̄2)² + Σ(X − X̄3)² ] / [ (n1 − 1) + (n2 − 1) + (n3 − 1) ] = (76 + 66 + 34)/(4 + 4 + 4) = 14.67
The value of F is given by F = nS²Tr/S²E = (5 × 12.33)/14.67 = 4.20
The number of degrees of freedom for S²Tr is (number of samples) − 1 = 2
The number of degrees of freedom for S²E is (number of samples) × (sample size − 1) = 12
The critical value (5% level of significance) from the F-tables (Table 1 at the end of this Workbook) is F(2,12) = 3.89, and since 4.20 > 3.89 we conclude that we have sufficient evidence to reject H0, so that the injection systems are not of equivalent efficiency.
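The same analysis in R (data from the table above):

  eff    <- c(48, 56, 46, 45, 50,  60, 56, 53, 60, 51,  57, 55, 52, 50, 51)
  system <- factor(rep(c("S1", "S2", "S3"), each = 5))
  summary(aov(eff ~ system))   # F about 4.20 on (2, 12) df, p < 0.05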
Exercises
1. The yield of a chemical process, expressed in percentage of the theoretical maximum, is mea-
sured with each of two catalysts, A, B, and with no catalyst (Control: C). Five observations
are made under each condition. Making the usual assumptions for an analysis of variance, test
the hypothesis that there is no difference in mean yield between the three conditions. Use the
5% level of significance.
2. Four large trucks, A, B, C, D, are used to move stone in a quarry. On a number of days,
the amount of fuel, in litres, used per tonne of stone moved is calculated for each truck. On
some days a particular truck might not be used. The data are as follows. Making the usual
assumptions for an analysis of variance, test the hypothesis that the mean amount of fuel used
per tonne of stone moved is the same for each truck. Use the 5% level of significance.
Truck Observations
A 0.21 0.21 0.21 0.21 0.20 0.19 0.18 0.21 0.22 0.21
B 0.22 0.22 0.25 0.21 0.21 0.22 0.20 0.23
C 0.21 0.18 0.18 0.19 0.20 0.18 0.19 0.19 0.20 0.20 0.20
D 0.20 0.20 0.21 0.21 0.21 0.19 0.20 0.20 0.21
Answers
1. We calculate the treatment totals for A: 392.1, B: 405.0 and C: 375.7. The overall total is 1172.8 and ΣΣy² = 91792.68.
on 14 − 2 = 12 degrees of freedom.
The upper 5% point of the F2,12 distribution is 3.89. The observed variance ratio is greater
than this so we conclude that the result is significant at the 5% level and we reject the null
hypothesis at this level. The evidence suggests that there are differences in the mean yields
between the three treatments.
Answer
on 37 − 3 = 34 degrees of freedom.
The upper 5% point of the F3,34 distribution is approximately 2.9. The observed variance
ratio is greater than this so we conclude that the result is significant at the 5% level and we
reject the null hypothesis at this level. The evidence suggests that there are differences in the
mean fuel consumption per tonne moved between the four trucks.
Two-Way Analysis of Variance
Introduction
In the one-way analysis of variance (Section 44.1) we consider the effect of one factor on the values
taken by a variable. Very often, in engineering investigations, the effects of two or more factors are
considered simultaneously.
The two-way ANOVA deals with the case where there are two factors. For example, we might
compare the fuel consumptions of four car engines under three types of driving conditions (e.g.
urban, rural, motorway). Sometimes we are interested in the effects of both factors. In other cases
one of the factors is a ‘nuisance factor’ which is not of particular interest in itself but, if we allow for
it in our analysis, we improve the power of our test for the other factor.
We can also allow for interaction effects between the two factors.
1. Two-way ANOVA without interaction
The previous Section considered a one-way classification analysis of variance, that is we looked at the
variations induced by one set of values of a factor (or treatments as we called them) by partitioning
the variation in the data into components representing ‘between treatments’ and ‘within treatments.’
In this Section we will look at the analysis of variance involving two factors or, as we might say,
two sets of treatments. In general terms, if we have two factors say A and B, there is no absolute
reason to assume that there is no interaction between the factors. However, as an introduction to
the two-way analysis of variance, we will consider the case occurring when there is no interaction
between factors and an experiment is run only once. Note that some authors take the view that
interaction may occur and that the residual sum of squares contains the effects of this interaction
even though the analysis does not, at this stage, allow us to separate it out and check its possible
effects on the experiment.
The following example builds on the previous example where we looked at the one-way analysis of
variance.
The worked example above introduces some of the notation used when we come to write out a general two-way ANOVA table shortly. We
obtain one observation per cell and cannot measure variation within a cell. In this case we cannot
check for interaction between the operator and the machine - the two factors used in this example.
Running an experiment several times results in multiple observations per cell and in this case we
should assume that there may be interaction between the factors and check for this. In the case
considered here (no interaction between factors), the required sums of squares build easily on the
relationship used in the one-way analysis of variance
SST = SSTr + SSE
to become
SST = SSA + SSB + SSE
where SSA and SSB represent the sums of squares corresponding to factors A and B. In order to calculate the required sums of squares we lay out the table slightly more efficiently as follows.
Machine means: (X̄i. − X̄) = −4, 3, −1, 2 (Sum = 0)
(X̄i. − X̄)² = 16, 9, 1, 4; Machine SS = (16 + 9 + 1 + 4) × 5 = 30 × 5 = 150
Note 1
The dot notation means that summation takes place over that variable. For example, the five operator means X̄.j are obtained as X̄.1 = (46 + 56 + 55 + 47)/4 = 51 and so on, while the four machine means X̄i. are obtained as X̄1. = (46 + 54 + 48 + 46 + 51)/5 = 49 and so on. Put more generally (and this is just an example),

X̄.j = ( Σ_{i=1}^{m} xij ) / m
Note 2
Multiplying factors were used in the calculation of the machine sum of squares (five in this case since there are five operators) and the operator sum of squares (four in this case since there are four machines).
Note 3
The statements 'Sum = 0' are included purely as arithmetic checks.
We also know that SSO = 24 and SSM = 150.
Calculating the error sum of squares
Note that the total sum of squares is easy to obtain and that the error sum of squares is then obtained
by straightforward subtraction.
The total sum of squares is given by summing the quantities (Xij − X̄ )2 for the table of entries.
Subtracting X̄ = 53 from each table member and squaring gives:
Operator (j) Machine (i)
1 2 3 4
1 49 9 4 36
2 1 4 4 9
3 25 9 9 25
4 49 49 4 36
5 4 0 0 4
The total sum of squares is SST = 330.
The error sum of squares is given by the result

SSE = SST − SSO − SSM = 330 − 24 − 150 = 156
example we are engaged in and draw conclusions by using the test as we have previously done with
one-way ANOVA.
A General Two-Way ANOVA Table
Hence the two-way ANOVA table for the example under consideration is
From the F-tables (at the end of the Workbook) F4,12 = 3.26 and F3,12 = 3.49. Since 0.46 < 3.26 we conclude that we do not have sufficient evidence to reject the null hypothesis that there is no difference between the operators. Since 3.85 > 3.49 we conclude that we do have sufficient evidence at the 5% level of significance to reject the null hypothesis that there is no difference between the machines.
If we have two factors, A and B, with a levels of factor A and b levels of factor B, and one observation per cell, we can calculate the sums of squares as follows.
The sum of squares for factor A is

SSA = (1/b) Σ_{i=1}^{a} Ai² − G²/N, with a − 1 degrees of freedom

and the sum of squares for factor B is

SSB = (1/a) Σ_{j=1}^{b} Bj² − G²/N, with b − 1 degrees of freedom

where

Ai = Σ_{j=1}^{b} Xij is the total for level i of factor A,

Bj = Σ_{i=1}^{a} Xij is the total for level j of factor B,

G is the overall total of the data, and N = ab is the total number of observations.
Answer
Our hypotheses may be stated as follows.

Paint type:   H0 : µ1 = µ2 = µ3
              H1 : At least one of the means is different from the others
Steel-Alloy:  H0 : µ1 = µ2 = µ3
              H1 : At least one of the means is different from the others

Following the methods of calculation outlined above we obtain:

Steel-Alloy means: (X̄.j − X̄) = −4, 3, 1 (Sum = 0)
(X̄.j − X̄)² = 16, 9, 1; Steel-Alloy SS = (16 + 9 + 1) × 3 = 26 × 3 = 78

Hence SSPa = 24 and SSSt = 78. We now require SSE. The calculations are as follows.
In the table below, the predicted outputs are given in parentheses.

             Steel-Alloy
Paint type   1        2        3        Means X̄i.   (X̄i. − X̄)
1            40 (45)  51 (52)  56 (50)  49           −2
2            54 (49)  55 (56)  50 (54)  53           2
3            47 (47)  56 (54)  50 (52)  51           0
Means X̄.j   47       54       52       X̄ = 51      Sum = 0
(X̄.j − X̄)  −4       3        1        Sum = 0
Answers continued
A table of squared residuals is easily obtained as

            Steel (i)
Paint (j)   1    2   3
1           25   1   36
2           25   1   16
3           0    4   4

Hence the residual sum of squares is SSE = 112. The total sum of squares is given by subtracting X̄ = 51 from each table member and squaring to obtain

            Steel (i)
Paint (j)   1     2    3
1           121   0    25
2           9     16   1
3           16    25   1
The total sum of squares is SST = 214. We should now check to see that SST = SSPa+SSSt+SSE.
Substitution gives 214 = 24 + 78 + 112 which is correct.
The values of F are calculated as shown in the ANOVA table below.

Source of variation | SS  | df | MS              | F
Between samples (due to treatment A, say, Paint type)   | 24  | 2 | MSA = 24/2 = 12  | F = 12/28 = 0.429
Between samples (due to treatment B, say, Steel-Alloy)  | 78  | 2 | MSB = 78/2 = 39  | F = 39/28 = 1.393
Within samples (due to chance errors)                   | 112 | 4 | MSE = 112/4 = 28 |
Totals                                                  | 214 | 8 |                  |
From the F-tables the critical value is F2,4 = 6.94, and since both of the calculated F values are less than 6.94 we conclude that we do not have sufficient evidence to reject either null hypothesis.
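The same two-way analysis can be checked in R (data from the table above):

  coating <- c(40, 51, 56,  54, 55, 50,  47, 56, 50)
  paint   <- factor(rep(1:3, each = 3))
  alloy   <- factor(rep(1:3, times = 3))
  summary(aov(coating ~ paint + alloy))   # F about 0.43 (paint) and 1.39 (alloy), both < 6.94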
2. Two-way ANOVA with interaction
The previous subsection looked at two-way ANOVA under the assumption that there was no inter-
action between the factors A and B. We will now look at the developments of two-way ANOVA
to take into account possible interaction between the factors under consideration. The following
analysis allows us to test to see whether we have sufficient evidence to reject the null hypothesis that
the amount of interaction is effectively zero.
To see how we might consider interaction between factors A and B taking place, look at the following
table which represents observations involving a two-factor experiment.
Factor B
Factor A 1 2 3 4 5
1 3 5 1 9 12
2 4 6 2 10 13
3 6 8 4 12 15
A brief inspection of the numbers in the five columns reveals that there is a constant difference between any two rows as we move from column to column. Similarly there is a constant difference between any two columns as we move from row to row. While the data are clearly contrived, they do illustrate that in this case no interaction arises from variations in the differences between either rows or columns. Real data do not exhibit such behaviour in general, of course; we expect differences to occur, and so we must check to see if the differences are large enough to provide sufficient evidence to reject the null hypothesis that the amount of interaction is effectively zero.
Notation
Let a represent the number of ‘levels’ present for factor A, denoted i = 1, . . . , a.
Let b represent the number of ‘levels’ present for factor B, denoted j = 1, . . . , b.
Let n represent the number of observations per cell. We assume that it is the same for each cell.
In the table above, a = 3, b = 5, n = 1. In the examples we shall consider, n will be greater than 1
and we will be able to check for interaction between the factors.
We suppose that the observations at level i of factor A and level j of factor B are taken from a
normal distribution with mean µij. When we assumed that there was no interaction, we used the
additive model
µij = µ + αi + βj
So, for example, the difference µi1 − µi2 between the means at levels 1 and 2 of factor B is equal
to β1 − β2 and does not depend upon the level of factor A. When we allow interaction, this is not
necessarily true and we write

µij = µ + αi + βj + γij

Here γij is an interaction effect. Now µi1 − µi2 = β1 − β2 + γi1 − γi2, so the difference between
two levels of factor B depends on the level of factor A.
Fixed and random effects
Often the levels assigned to a factor will be chosen deliberately. In this case the factors are said to be
fixed and we have a fixed effects model. If the levels are chosen at random from a population of all
possible levels, the factors are said to be random and we have a random effects model. Sometimes
one factor may be fixed while the other is random. In this case we have a mixed effects model. In
effect, we are asking whether we are interested in certain particular levels of a factor (fixed effects) or
whether we just regard the levels as a sample and are interested in the population in general (random
effects).
Calculation method
The data you will be working with will be set out in a manner similar to that shown below.
The table assumes n observations per cell and is shown along with a variety of totals and means
which will be used in the calculations of the various test statistics to follow.
                                     Factor B
Factor A    Level 1        Level 2        . . .   Level j        . . .   Level b        Totals
Level 1     x111 … x11n    x121 … x12n    . . .   x1j1 … x1jn    . . .   x1b1 … x1bn    T1··
Level 2     x211 … x21n    x221 … x22n    . . .   x2j1 … x2jn    . . .   x2b1 … x2bn    T2··
  .
Level i     xi11 … xi1n    xi21 … xi2n    . . .   xij1 … xijn    . . .   xib1 … xibn    Ti··
  .
Level a     xa11 … xa1n    xa21 … xa2n    . . .   xaj1 … xajn    . . .   xab1 … xabn    Ta··
Totals      T·1·           T·2·           . . .   T·j·           . . .   T·b·           T···

The sum of the n data values in cell (i, j) is Tij· = Σk xijk, k = 1, …, n.
Notes
(a) T··· represents the grand total of the data values so that

T··· = Σj T·j· = Σi Ti·· = ΣΣΣ xijk

(b) Ti·· represents the total of the data in the ith row.

(c) T·j· represents the total of the data in the jth column.
(d) The total number of data entries is given by N = nab.
Partitioning the variation
We are now in a position to consider the partition of the total sum of the squared deviations from
the overall mean, which we estimate as

x̄ = T··· / N

The total sum of the squared deviations is

SST = ΣΣΣ (xijk − x̄)²    (sums over i = 1, …, a; j = 1, …, b; k = 1, …, n)

Note that the quantity Tij· = Σk xijk is the sum of the data in the (i, j)th cell and that the quantity

ΣΣ Tij·²/n − T···²/N

is the sum of the squares between cells.
The sums of squares for the main effects and the interaction are

SSA = Σ Ti··²/(bn) − T···²/N,   SSB = Σ T·j·²/(an) − T···²/N,
SSAB = ΣΣ Tij·²/n − T···²/N − SSA − SSB

and SSE, the sum of the squares due to chance or experimental error, is given by

SSE = SST − SSA − SSB − SSAB
The number of degrees of freedom (N − 1) is partitioned as follows:
SST: N − 1,   SSA: a − 1,   SSB: b − 1,   SSAB: (a − 1)(b − 1),   SSE: N − ab
Note that there are ab − 1 degrees of freedom between cells and that the number of degrees of
freedom for SSAB is given by
ab − 1 − (a − 1) − (b − 1) = (a − 1)(b − 1)
This gives rise to the following two-way ANOVA tables.
Two-Way ANOVA Table - Fixed-Effects Model

Source           SS     DoF              MS                             F
Factor A         SSA    a − 1            MSA = SSA/(a − 1)              MSA/MSE
Factor B         SSB    b − 1            MSB = SSB/(b − 1)              MSB/MSE
Interaction      SSAB   (a − 1)(b − 1)   MSAB = SSAB/((a − 1)(b − 1))   MSAB/MSE
Residual Error   SSE    N − ab           MSE = SSE/(N − ab)
Totals           SST    N − 1

Two-Way ANOVA Table - Random-Effects Model

Source           SS     DoF              MS                             F
Factor A         SSA    a − 1            MSA = SSA/(a − 1)              MSA/MSAB
Factor B         SSB    b − 1            MSB = SSB/(b − 1)              MSB/MSAB
Interaction      SSAB   (a − 1)(b − 1)   MSAB = SSAB/((a − 1)(b − 1))   MSAB/MSE
Residual Error   SSE    N − ab           MSE = SSE/(N − ab)
Totals           SST    N − 1

Two-Way ANOVA Table - Mixed-Effects Model

With factor A fixed and factor B random, the fixed factor is tested against the interaction mean square and the random factor against the residual mean square:

Source           SS     DoF              MS                             F
Factor A         SSA    a − 1            MSA = SSA/(a − 1)              MSA/MSAB
Factor B         SSB    b − 1            MSB = SSB/(b − 1)              MSB/MSE
Interaction      SSAB   (a − 1)(b − 1)   MSAB = SSAB/((a − 1)(b − 1))   MSAB/MSE
Residual Error   SSE    N − ab           MSE = SSE/(N − ab)
Totals           SST    N − 1

(With A random and B fixed, the ratios for the two factors are interchanged: F = MSA/MSE and F = MSB/MSAB.)
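In R these tables all correspond to a single aov() fit; only the choice of denominator changes between the models. A minimal sketch, assuming a data frame dat with response y and factors A and B (the names are illustrative):

# Fit the two-way model with interaction; summary() prints the
# fixed-effects table (every F ratio uses MSE as denominator).
fit <- aov(y ~ A * B, data = dat)
tab <- summary(fit)[[1]]            # rows: A, B, A:B, Residuals
ms  <- tab[["Mean Sq"]]
# Random- or mixed-model ratio for a factor tested against the
# interaction mean square rather than the residual mean square:
F_A <- ms[1] / ms[3]
pf(F_A, tab[["Df"]][1], tab[["Df"]][3], lower.tail = FALSE)   # p-value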
Example 1
In an experiment to compare the effects of weathering on paint of three different
types, two identical surfaces coated with each type of paint were exposed in each
of four environments. Measurements of the degree of deterioration were made as
follows.
Environment 1 Environment 2 Environment 3 Environment 4
Paint A 10.89 10.74 9.94 11.25 9.88 10.13 14.11 12.84
Paint B 12.28 13.11 14.45 11.17 11.29 11.10 13.44 11.37
Paint C 10.68 10.30 10.89 10.97 10.61 11.00 12.22 11.32
Making the assumptions of normality, independence and equal variance, derive the
appropriate ANOVA tables and state the conclusions which may be drawn at the
5% level of significance in the following cases.
(a) The types of paint and the environments are chosen deliberately because the interest is in these paints and these environments.
(b) The types of paint are chosen deliberately because the interest is in these paints, but the environments are regarded as a sample of possible environments.
(c) The types of paint are regarded as a random sample of possible paints and the environments are regarded as a sample of possible environments.
Solution
We know that case (a) is described as a fixed-effects model, case (b) is described as a mixed-effects
model (paint type fixed) and case (c) is described as a random-effects model. In all three cases the
calculations necessary to find MSP (paints), MSN (environments), MSPN (interaction) and MSE
(error) are identical. Only the calculation and interpretation of the test statistics will be different.
The calculations are shown below.
Subtracting 10 from each observation, the data become:
Environment 1 Environment 2 Environment 3 Environment 4 Total
Paint A 0.89 0.74 −0.06 1.25 −0.12 0.13 4.11 2.84 9.78
(total 1.63) (total 1.19) (total 0.01) (total 6.95)
Paint B 2.28 3.11 4.45 1.17 1.29 1.10 3.44 1.37 18.21
(total 5.39) (total 5.62) (total 2.39) (total 4.81)
Paint C 0.68 0.30 0.89 0.97 0.61 1.00 2.22 1.32 7.99
(total 0.98) (total 1.86) (total 1.61) (total 3.54)
Total 8.00 8.67 4.01 15.30 35.98
Solution (contd.)
Sum of squares for paints is

SSP = (1/8)(9.78² + 18.21² + 7.99²) − 35.98²/24 = 7.447

Sum of squares for environments is

SSN = (1/6)(8.00² + 8.67² + 4.01² + 15.30²) − 35.98²/24 = 10.950

The between-cells (treatment combinations) sum of squares is

SSS = (1/2)(1.63² + 1.19² + … + 1.61² + 3.54²) − 35.98²/24 = 26.762

and the total sum of squares, calculated from the individual observations in the usual way, is SST = 36.910. So the interaction sum of squares is SSPN = SSS − SSP − SSN = 26.762 − 7.447 − 10.950 = 8.365 and the residual sum of squares is SSE = SST − SSS = 36.910 − 26.762 = 10.148. The results are combined in the following ANOVA table.
Source of variation      DoF   Sum of squares   Mean square   Variance ratio   Variance ratio   Variance ratio
                                                              (fixed)          (mixed)          (random)
Paints                   2     7.447            3.724         4.40             2.67             2.67
                                                              F2,12 = 3.89     F2,6 = 5.14      F2,6 = 5.14
Environments             3     10.950           3.650         4.31             4.31             2.61
                                                              F3,12 = 3.49     F3,12 = 3.49     F3,6 = 4.76
Interaction              6     8.365            1.394         1.65             1.65             1.65
                                                              F6,12 = 3.00     F6,12 = 3.00     F6,12 = 3.00
Treatment combinations   11    26.762           2.433
Residual                 12    10.148           0.846
Total                    23    36.910

(The value shown beneath each variance ratio is the corresponding upper 5% critical point of the F distribution.)
The following conclusions may be drawn. There is insufficient evidence to support the interaction
hypothesis in any case. Therefore we can look at the tests for the main effects.
Case (a) Since 4.40 > 3.89 we have sufficient evidence to conclude that paint type affects the
degree of deterioration. Since 4.31 > 3.49 we have sufficient evidence to conclude that environment
affects the degree of deterioration.
Case (b) Since 2.67 < 5.14 we do not have sufficient evidence to reject the hypothesis that paint
type has no effect on the degree of deterioration. Since 4.31 > 3.49 we have sufficient evidence to
conclude that environment affects the degree of deterioration.
Case (c) Since 2.67 < 5.14 we do not have sufficient evidence to reject the hypothesis that paint
type has no effect on the degree of deterioration. Since 2.61 < 4.76 we do not have sufficient
evidence to reject the hypothesis that environment has no effect on the degree of deterioration.
If the test for interaction had given a significant result then we would have concluded that there
was an interaction effect. Therefore the differences between the average degree of deterioration for
different paint types would have depended on the environment and there might have been no overall
‘best paint type’. We would have needed to compare combinations of paint types and environments.
However the relative sizes of the mean squares would have helped to indicate which effects were
most important.
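The fixed-effects column of the table above can be reproduced in R, and the mixed-model ratio for paints then formed by hand. A minimal sketch (the variable names are mine):

deterioration <- c(10.89, 10.74,  9.94, 11.25,  9.88, 10.13, 14.11, 12.84,  # paint A
                   12.28, 13.11, 14.45, 11.17, 11.29, 11.10, 13.44, 11.37,  # paint B
                   10.68, 10.30, 10.89, 10.97, 10.61, 11.00, 12.22, 11.32)  # paint C
paint <- factor(rep(c("A", "B", "C"), each = 8))
env   <- factor(rep(rep(1:4, each = 2), times = 3))   # two observations per cell
fit <- aov(deterioration ~ paint * env)
summary(fit)                         # fixed-effects F ratios: 4.40, 4.31, 1.65
ms <- summary(fit)[[1]][["Mean Sq"]]
ms[1] / ms[3]                        # mixed-model F for paints, about 2.67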
A motor company wishes to check the influences of tyre type and shock absorber
settings on the roadholding of one of its cars. Two types of tyre are selected
from the tyre manufacturer who normally provides tyres for the company’s new
vehicles. A shock absorber with three possible settings is chosen from a range of
shock absorbers deemed to be suitable for the car. An experiment is conducted in which
roadholding tests are performed using each tyre type and shock absorber setting.
The (coded) data resulting from the experiment are given below.
Factor Shock Absorber Setting
Tyre B1=Comfort B2=Normal B3=Sport
5 8 6
Type A1 6 5 9
8 3 12
9 10 12
Type A2 7 9 10
7 8 9
Decide whether an appropriate model has random-effects, mixed-effects or fixed-
effects and derive the appropriate ANOVA table. State clearly any conclusions
that may be drawn at the 5% level of significance.
Your solution
Do the calculations on separate paper and use the space here and on the following page for your
summary and conclusions.
Answer
We know that neither the tyres nor the shock absorbers are chosen at random from populations
consisting of all possible types, so their influence is described by a fixed-effects model. The
calculations necessary to find MSA, MSB, MSAB and MSE are shown below.
B1 B2 B3 Totals
5 8 6
A1 6 5 9
8 3 12
T11 = 19 T12 = 16 T13 = 27 T1·· = 62
9 10 12
A2 7 9 10
7 8 9
T21 = 23 T22 = 27 T23 = 31 T2·· = 81
Totals T·1· = 42 T·2· = 43 T·3· = 58 T··· = 143
The sums of squares calculations are:
SST = ΣΣΣ x²ijk − T···²/N = 5² + 6² + … + 10² + 9² − 143²/18 = 1233 − 1136.056 = 96.944

SSA = Σ Ti··²/(bn) − T···²/N = (62² + 81²)/(3 × 3) − 143²/18 = 10405/9 − 1136.056 = 20.056

SSB = Σ T·j·²/(an) − T···²/N = (42² + 43² + 58²)/(2 × 3) − 143²/18 = 6977/6 − 1136.056 = 26.778

SSAB = ΣΣ Tij·²/n − T···²/N − SSA − SSB
     = (19² + 16² + 27² + 23² + 27² + 31²)/3 − 143²/18 − 20.056 − 26.778
     = 3565/3 − 1136.056 − 20.056 − 26.778 = 5.444

SSE = SST − SSA − SSB − SSAB = 96.944 − 20.056 − 26.778 − 5.444 = 44.666
The results are combined in the following ANOVA table.
Source           SS       DoF   MS       F (fixed)         Critical value
Factor A         20.056   1     20.056   MSA/MSE = 5.39    F1,12 = 4.75
Factor B         26.778   2     13.389   MSB/MSE = 3.60    F2,12 = 3.89
Interaction AB   5.444    2     2.722    MSAB/MSE = 0.731  F2,12 = 3.89
Residual E       44.666   12    3.722
Totals           96.944   17
Answer
The following conclusions may be drawn:
Interaction: There is insufficient evidence to support the hypothesis that interaction takes place
between the factors.
Factor A: Since 5.39 > 4.75 we have sufficient evidence to reject the hypothesis that tyre type does
not affect the roadholding of the car.
Factor B: Since 3.60 < 3.89 we do not have sufficient evidence to reject the hypothesis that shock
absorber settings do not affect the roadholding of the car.
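As a cross-check, the fixed-effects table above can be reproduced in R. A minimal sketch (the variable names are mine):

roadholding <- c(5, 6, 8,   8, 5, 3,   6, 9, 12,   # tyre A1: B1, B2, B3
                 9, 7, 7,  10, 9, 8,  12, 10, 9)   # tyre A2: B1, B2, B3
tyre    <- factor(rep(c("A1", "A2"), each = 9))
setting <- factor(rep(rep(c("B1", "B2", "B3"), each = 3), times = 2))
summary(aov(roadholding ~ tyre * setting))   # F ratios: 5.39, 3.60, 0.73
qf(0.95, 1, 12)                              # 4.75, critical value for the tyre effect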
Your solution
Do the calculations on separate paper and use the space here and on the following page for your
summary and conclusions.
Your solution contd.
Answer
Both the machines and the testing stations are effectively chosen at random from populations
consisting of all possible types, so their influence is described by a random-effects model. The
calculations necessary to find MSA, MSB, MSAB and MSE are shown below.
B1 B2 B3 Totals
2.3 3.7 3.1
A1 3.4 2.8 3.2
3.5 3.7 3.5
T11 = 9.2 T12 = 10.2 T13 = 9.8 T1·· = 29.2
3.5 3.9 3.3
A2 2.6 3.9 3.4
3.6 3.4 3.5
T21 = 9.7 T22 = 11.2 T23 = 10.2 T2·· = 31.1
2.4 3.5 2.6
A3 2.7 3.2 2.6
2.8 3.5 2.5
T31 = 7.9 T32 = 10.2 T33 = 7.7 T3·· = 25.8
Totals T·1· = 26.8 T·2· = 31.6 T·3· = 27.7 T··· = 86.1
a = 3, b = 3, n = 3, N = 27 and the sums of squares calculations are:
SST = ΣΣΣ x²ijk − T···²/N = 2.3² + 3.4² + … + 2.6² + 2.5² − 86.1²/27 = 280.470 − 274.563 = 5.907

SSA = Σ Ti··²/(bn) − T···²/N = (29.2² + 31.1² + 25.8²)/(3 × 3) − 86.1²/27 = 1.602

SSB = Σ T·j·²/(an) − T···²/N = (26.8² + 31.6² + 27.7²)/(3 × 3) − 86.1²/27 = 1.447

SSAB = ΣΣ Tij·²/n − T···²/N − SSA − SSB
     = (9.2² + 10.2² + … + 10.2² + 7.7²)/3 − 86.1²/27 − 1.602 − 1.447 = 0.398

SSE = SST − SSA − SSB − SSAB = 5.907 − 1.602 − 1.447 − 0.398 = 2.460
Answer continued
The results are combined in the following ANOVA table. Since the model has random effects, each main-effect mean square is tested against the interaction mean square.

Source        SS      DoF   MS       F
Factor A      1.602   2     0.801    MSA/MSAB = 8.05
Factor B      1.447   2     0.724    MSB/MSAB = 7.27
Interaction   0.398   4     0.0995   MSAB/MSE = 0.73
Residual      2.460   18    0.1367
Totals        5.907   26

Since 8.05 > F2,4 = 6.94 and 7.27 > 6.94, both main effects are significant at the 5% level, while there is no evidence of interaction.

In general, two-way ANOVA offers the following advantages:
(a) It is possible to simultaneously test the effects of two factors. This saves both time and
money.
(b) It is possible to determine the level of interaction present between the factors involved.
(c) The effect of one factor can be investigated over a variety of levels of another and so
any conclusions reached may be applicable over a range of situations rather than a single
situation.
Exercises
1. The temperatures, in Celsius, at three locations in the engine of a vehicle are measured after
each of five test runs. The data are as follows. Making the usual assumptions for a two-
way analysis of variance without replication, test the hypothesis that there is no systematic
difference in temperatures between the three locations. Use the 5% level of significance.
2. Waste cooling water from a large engineering works is filtered before being released into the
environment. Three separate discharge pipes are used, each with its own filter. Five samples
of water are taken on each of four days from each of the three discharge pipes and the
concentrations of a pollutant, in parts per million, are measured. The data are given below.
Analyse the data to test for differences between the discharge pipes. Allow for effects due to
pipes and days and for an interaction effect. Treat the pipe effects as fixed and the day effects
as random. Use the 5% level of significance.
Day Pipe A
1 160 181 163 173 178
2 175 170 219 166 171
3 169 186 179 178 183
4 230 206 216 195 250
Day Pipe B
1 172 164 186 185 172
2 177 170 156 140 155
3 193 194 189 156 181
4 212 235 195 206 209
Day Pipe C
1 214 196 207 219 200
2 186 184 181 189 179
3 209 220 199 185 228
4 254 293 283 262 259
Answers
1. We calculate totals as follows.
Run Total Location Total
1 215.1 A 377.0
2 223.7 B 365.6
3 242.7 C 368.3
4 205.4 Total 1110.9
5 224.0
Total 1110.9
ΣΣ y²ij = 82552.17

The total sum of squares is

82552.17 − 1110.9²/15 = 278.916 on 15 − 1 = 14 degrees of freedom.
The between-runs sum of squares is

(1/3)(215.1² + 223.7² + 242.7² + 205.4² + 224.0²) − 1110.9²/15 = 252.796

on 5 − 1 = 4 degrees of freedom.
The between-locations sum of squares is

(1/5)(377.0² + 365.6² + 368.3²) − 1110.9²/15 = 14.196 on 3 − 1 = 2 degrees of freedom.
By subtraction, the residual sum of squares is
278.916 − 252.796 − 14.196 = 11.924 on 14 − 4 − 2 = 8 degrees of freedom.
The analysis of variance table is as follows.

Source      SS        DoF   MS       F
Runs        252.796   4     63.199   42.40
Locations   14.196    2     7.098    4.76
Residual    11.924    8     1.491
Totals      278.916   14
The upper 5% point of the F2,8 distribution is 4.46. The observed variance ratio is greater than this
so we conclude that the result is significant at the 5% level and reject the null hypothesis at this
level. The evidence suggests that there are systematic differences between the temperatures at the
three locations. Note that the Runs mean square is large compared to the Residual mean square
showing that it was useful to allow for differences between runs.
Answers continued
2. We calculate totals as follows.
Day 1 Day 2 Day 3 Day 4 Total
Pipe A 855 901 895 1097 3748
Pipe B 879 798 913 1057 3647
Pipe C 1036 919 1041 1351 4347
Total 2770 2618 2849 3505 11742
ΣΣΣ y²ijk = 2356870
The total number of observations is N = 60.
The total sum of squares is

2356870 − 11742²/60 = 58960.6
on 60 − 1 = 59 degrees of freedom.
The between-cells sum of squares is

(1/5)(855² + · · · + 1351²) − 11742²/60 = 48943.0

on 12 − 1 = 11 degrees of freedom, where by “cell” we mean the combination of a pipe and a day.
By subtraction, the residual sum of squares is
58960.6 − 48943.0 = 10017.6
on 59 − 11 = 48 degrees of freedom.
The between-days sum of squares is

(1/15)(2770² + 2618² + 2849² + 3505²) − 11742²/60 = 30667.3
on 4 − 1 = 3 degrees of freedom.
The between-pipes sum of squares is

(1/20)(3748² + 3647² + 4347²) − 11742²/60 = 14316.7
on 3 − 1 = 2 degrees of freedom.
By subtraction the interaction sum of squares is
48943.0 − 30667.3 − 14316.7 = 3959.0
on 11 − 3 − 2 = 6 degrees of freedom.
Answers continued
The analysis of variance table is as follows.

Source        SS        DoF   MS        F
Pipes         14316.7   2     7158.4    10.85 (vs. interaction)
Days          30667.3   3     10222.4   48.98
Interaction   3959.0    6     659.8     3.16
Residual      10017.6   48    208.7
Totals        58960.6   59
Notice that, because Days are treated as a random effect, we divide the Pipes mean square by the
Interaction mean square rather than by the Residual mean square.
The upper 5% point of the F6,48 distribution is approximately 2.3. Thus the Interaction variance
ratio is significant at the 5% level and we reject the null hypothesis of no interaction. We must
therefore conclude that there are differences between the means for pipes and for days and that
the difference between one pipe and another varies from day to day. Looking at the mean squares,
however, we see that both the Pipes and Days mean squares are much bigger than the Interaction
mean square. Therefore it seems that the interaction effect is relatively small compared to the
differences between days and between pipes.
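For completeness, a minimal R sketch of this pipes-and-days analysis (the variable names are mine); note how the ratio for pipes is formed against the interaction mean square, as described above:

conc <- c(160,181,163,173,178, 175,170,219,166,171,   # pipe A, days 1-2
          169,186,179,178,183, 230,206,216,195,250,   # pipe A, days 3-4
          172,164,186,185,172, 177,170,156,140,155,   # pipe B, days 1-2
          193,194,189,156,181, 212,235,195,206,209,   # pipe B, days 3-4
          214,196,207,219,200, 186,184,181,189,179,   # pipe C, days 1-2
          209,220,199,185,228, 254,293,283,262,259)   # pipe C, days 3-4
pipe <- factor(rep(c("A", "B", "C"), each = 20))
day  <- factor(rep(rep(1:4, each = 5), times = 3))
ms <- summary(aov(conc ~ pipe * day))[[1]][["Mean Sq"]]
ms[1] / ms[3]   # F for pipes = MS(pipes) / MS(interaction)
ms[3] / ms[4]   # F for interaction = MS(interaction) / MS(residual)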
What is Decision Theory?
Decision theory is an interdisciplinary approach to arrive at the decisions that are the most
advantageous given an uncertain environment.
Key Takeaways
• Decision theory brings together psychology, statistics, philosophy, and mathematics to analyze the decision-making process.
• Descriptive, prescriptive, and normative are the three main areas of decision theory, and each studies a different type of decision making.
Decision Tree
A decision tree is one of the most powerful and popular tools for classification and prediction. It is a
flowchart-like tree structure in which each internal node denotes a test on an attribute, each branch
represents an outcome of the test, and each leaf (terminal) node holds a class label.
A decision tree for the concept PlayTennis.
The decision tree in the figure above classifies a particular morning according to whether it is
suitable for playing tennis, returning the classification associated with the leaf that is reached (in
this case Yes or No). For example, an instance whose attribute values send it down the leftmost
branch of this decision tree would therefore be classified as a negative instance.
In other words, a decision tree represents a disjunction of conjunctions of constraints
on the attribute values of instances.
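Since the PlayTennis figure itself is not reproduced here, the following R sketch encodes the standard version of that tree as nested conditions; the attribute names and split values are assumptions taken from the usual textbook example, not from the missing figure.

# The PlayTennis tree as a disjunction of conjunctions (assumed structure).
play_tennis <- function(outlook, humidity, wind) {
  if (outlook == "Sunny")    return(ifelse(humidity == "High", "No", "Yes"))
  if (outlook == "Overcast") return("Yes")
  if (outlook == "Rain")     return(ifelse(wind == "Strong", "No", "Yes"))
  NA   # unknown attribute value
}
play_tennis("Sunny", "High", "Weak")   # leftmost branch: a negative instance ("No")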
• Decision trees are less appropriate for estimation tasks where the goal is to predict the value of
a continuous attribute.
• Decision trees are prone to errors in classification problems with many classes and a relatively
small number of training examples.
• Decision trees can be computationally expensive to train. At each node, each candidate splitting
field must be sorted before its best split can be found. In some algorithms, combinations of fields
are used and a search must be made for optimal combining weights. Pruning algorithms can also
be expensive, since many candidate sub-trees must be formed and compared.
For the students of
Note: Study material may be useful for the courses wherever Research Methodology paper is
being taught.
Prepared by:
Dr. Anoop Kumar Singh
Dept. of Applied Economics,
University of Lucknow
Topic: Chi Square- Test
The χ² test (pronounced chi-square test) is an important and popular test of hypothesis which is
categorized as a non-parametric test. The test was first introduced by Karl Pearson in the
year 1900.
It is used to find out whether there is any significant difference between observed frequencies
and expected frequencies pertaining to any particular phenomenon. Here frequencies are shown
in the different cells (categories) of a so-called contingency table. It is noteworthy that we take
the observations in categorical form or rank order, not as continuous, normally distributed measurements.
The test is applied to assess how likely the observed frequencies would be assuming the null
hypothesis is true.
This test is also useful in ascertaining the independence of two random variables based on
observations of these variables.
This is a non-parametric test which is extensively used for the following reasons:
1. This test is a distribution-free method, which does not rely on assumptions that the data are
drawn from a given parametric family of probability distributions.
2. It is easier to compute and simpler to understand than parametric tests.
3. It can be used in situations where parametric tests are not appropriate or where the level of
measurement prohibits the use of parametric tests.
It is defined as:
χ² = Σ (O − E)² / E
Where O refers to the observed frequencies and E refers to the expected frequencies.
Uses of Chi-Square Test
The chi-square test has a large number of applications where parametric tests cannot be applied.
Its uses can be summarized as under, along with examples:
(A) A test of independence (association)
This test is helpful in detecting the association between two or more attributes. Suppose we
have N observations classified according to two attributes. By applying this test on the given
observations (data) we try to find out whether the attributes have some association or they are
independent. The association may be positive or negative, or there may be no association at all. For example
we can find out whether there is any association between regularity in class and division of
passing of the students, similarly we can find out whether quinine is effective in controlling fever
or not. In order to test whether or not the attributes are associated we take the null hypothesis
that there is no association in the attributes under study. In other words, the two attributes are
independent.
After computing the value of chi-square, we compare the calculated value with its corresponding
critical value for the given degrees of freedom at a certain level of significance. If the calculated
value of χ² is less than the critical or table value, the null hypothesis is accepted and it is
concluded that the two attributes have no association, that is, they are independent. On the
other hand, if the calculated value is greater than the table value, the results of the experiment
do not support the hypothesis; the hypothesis is rejected and it is concluded that the attributes
are associated.
Illustration 1: From the data given in the following table, find out whether there is any
relationship between gender and the preference of colour.
Solution: Let us take the null hypothesis that there is no association between gender and
preference of colour, i.e. the two attributes are independent.
We first calculate the expected value for each observed frequency; these are shown along with
the observed frequencies. (The contingency table of observed and expected frequencies is not
reproduced here.)
Since the calculated χ² = 31.33 exceeds the critical value of χ², the null hypothesis is rejected.
Hence, the conclusion is that there is a definite relationship between gender and preference of
colour.
(B) A test of goodness of fit
This is the most important use of the chi-square test. The method is mainly employed for testing
goodness of fit: it attempts to establish whether an observed frequency distribution differs from a
hypothesized frequency distribution. When an ideal frequency curve, whether normal or of some
other type, is fitted to the data, we are interested in finding out how well this curve fits the
observed facts.
The following steps are followed for the above said purpose:
i. On the basis of the given actual observations, expected or theoretical frequencies are derived
through probability. This generally takes the form of assuming that a particular probability
distribution applies to the observed data.
ii. The observed frequencies are compared with the expected or theoretical frequencies.
iii. If the calculated value of χ² is less than the table value at a certain level of significance
(generally the 5% level) and for the relevant degrees of freedom, the fit is considered to be good,
i.e. the divergence between the actual and expected frequencies is attributed to fluctuations of
simple sampling. On the other hand, if the calculated value of χ² is greater than the table value,
the fit is considered to be poor, i.e. the divergence cannot be attributed to fluctuations of simple
sampling and is regarded as significant.
Illustration 2:
In an anti malaria campaign in a certain area, quinine was administered to 812 persons out of a
total population of 3248. The number of fever cases was recorded in a 2 × 2 contingency table
(not reproduced here); the observed frequencies appear in the calculation below.
Solution: Let us take the null hypothesis that quinine is not effective in checking fever, i.e. that
treatment and fever are independent attributes.
Applying the χ² test, the expected frequency for the cell AB is

E(AB) = (A × B) / N = (200 × 170) / 250 = 136

i.e. E1, the expected frequency corresponding to the first row and first column, is 136.

O      E      (O − E)²   (O − E)²/E
140    136    16         0.118
60     64     16         0.250
30     34     16         0.471
20     16     16         1.000
                         Σ (O − E)²/E = 1.839

χ² = Σ (O − E)² / E = 1.839
Since the calculated value of χ² (1.839) is less than the table value (3.84), the null hypothesis is
accepted. Hence quinine is not useful in checking malaria.
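The same calculation can be reproduced in R with the built-in chisq.test(); Yates' continuity correction is switched off so that the statistic matches the hand calculation. A minimal sketch, with the 2 × 2 table laid out as in the computation above:

obs <- matrix(c(140, 60,
                 30, 20), nrow = 2, byrow = TRUE)
chisq.test(obs, correct = FALSE)   # X-squared = 1.839, df = 1
qchisq(0.95, df = 1)               # 3.84, the 5% critical value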
(C) A test of homogeneity
The χ² test of homogeneity is an extension of the χ² test of independence. Such tests indicate
whether two or more independent samples are drawn from the same population or from different
populations. Instead of one sample, as we use in the independence problem, we shall now have
two or more samples. Suppose a test is given to students in two different higher secondary
schools. The sample size in both cases is the same. The question we have to ask is: is there any
difference between the two higher secondary schools? In order to find the answer, we have to set
up the null hypothesis that the two samples came from the same population. The word
‘homogeneous’ is used frequently in Statistics to indicate ‘the same’ or ‘equal’. Accordingly, we
can say that we want to test in our example whether the two samples are homogeneous. Thus, the
test is called a test of homogeneity.
Illustration 3: Two hundred bolts were selected at random from the output of each of the five
machines. The numbers of defective bolts found were 5, 9, 13, 7 and 6. Is there a significant
difference among the machines? Use the 5% level of significance.
As there are five machines, under the null hypothesis the total of 5 + 9 + 13 + 7 + 6 = 40
defective bolts should be equally distributed among them, giving an expected frequency of
40/5 = 8 defective bolts per machine.
Computation of the chi-square test:

χ² = Σ (O − E)²/E = (5 − 8)²/8 + (9 − 8)²/8 + (13 − 8)²/8 + (7 − 8)²/8 + (6 − 8)²/8
   = (9 + 1 + 25 + 1 + 4)/8 = 40/8 = 5
Decision: The critical value of χ² at the 0.05 level of significance for 4 degrees of freedom is 9.488.
As the calculated value of χ² = 5 is less than the critical value, H0 is accepted. In other words,
the difference among the five machines in respect of defective bolts is not significant.
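In R this test of homogeneity of defect counts reduces to a one-line call, since chisq.test() on a plain vector assumes equal expected frequencies by default. A minimal sketch:

defective <- c(5, 9, 13, 7, 6)
chisq.test(defective)     # X-squared = 5, df = 4, expected count 8 per machine
qchisq(0.95, df = 4)      # 9.488, the 5% critical value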
CHI-SQUARE TEST
DR RAMAKANTH
Introduction
• The chi-square test is one of the most commonly used non-parametric tests, in which the
sampling distribution of the test statistic is a chi-square distribution when the null
hypothesis is true.
• The chi-square statistic compares the observed count in each table cell to
the count which would be expected under the assumption of no association
between the row and column classifications.
Degrees of freedom
• The number of independent pieces of information which are free to vary, that
go into the estimate of a parameter is called the degrees of freedom.
• In general, the degrees of freedom of an estimate of a parameter is equal to
the number of independent scores that go into the estimate minus the
number of parameters used as intermediate steps in the estimation of the
parameter itself (i.e. the sample variance has N-1 degrees of freedom, since
it is computed from N random scores minus the only 1 parameter estimated
as intermediate step, which is the sample mean).
• The number of degrees of freedom for ‘n’ observations is ‘n-k’ and is usually
denoted by ‘ν ’, where ‘k’ is the number of independent linear constraints
imposed upon them. It is the only parameter of the chi-square distribution.
• The degrees of freedom for a chi-squared contingency table can be calculated as:
df = (number of rows − 1) × (number of columns − 1)
Chi Square formula
• The chi-squared test is used to determine whether there is a significant
difference between the expected frequencies and the observed
frequencies in one or more categories.
• The value of χ² is calculated as χ² = Σ (O − E)² / E, where O and E are the observed and
expected frequencies in each cell.
[Diagram: uses of the chi-square test - as a parametric test for testing/comparing a variance, and as a non-parametric test for independence and for goodness of fit.]
Interpretation of Chi-Square values
• The χ² statistic is calculated under the assumption of no association.
• df = (2 − 1)(2 − 1) = 1
• The calculated value of χ² = 11.01 exceeds the value of chi-square (10.83)
required for significance at the 0.001 level.
• Hence we can say that the observed result is significant beyond the 0.001
level.
• Thus, the null hypothesis can be rejected with a high degree of
confidence.