Lecture 11

Uploaded by

muhammad ziyam
Lecture 11: Statistical Inference by Dr. Javed Iqbal
Analysis of Variance:
We have learned how to compare two population means, that is, the means of a single variable for
two different populations. We studied various methods for making such comparisons, one being
the pooled t-procedure. Analysis of variance (ANOVA) provides a method for testing the equality of
several population means. One-way analysis of variance compares the means of a single variable
across several populations (or treatments).

Consider Weiss, Example 16.2, p. 726: the aim there is to compare the average energy
consumption in the four regions of the US.

As before we take random samples from each population.

The ANOVA considers the total variability (SST) in the response variable under study and
partitions this sum of squares into two parts:
i) SSTR: the sum of squares due to treatment (in this case region), the factor we think affects
the response or dependent variable (explained variation due to different regions)
ii) SSE: the sum of squares due to error, or unexplained variation
(unexplained variation: why are there differences in household consumption within a
region?)
Thus we have the famous one-way ANOVA identity:

SST = SSTR + SSE

The logic of ANOVA is to compare the variance due to treatment with the variance due to error. If
the variance due to treatment (i.e. explained variation) is large relative to the variance due to
error (i.e. unexplained variation), then we reject the null hypothesis that the means of all
treatments are the same.

Suppose there are k levels of a factor (i.e. k treatments) and n = n1 + n2 + … + nk observations
in total.

$$SSTR = n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2 + \cdots + n_k(\bar{x}_k - \bar{x})^2$$

$$SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2$$

$$MSTR = \frac{SSTR}{k - 1}, \qquad MSE = \frac{SSE}{n - k}$$

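Here $\bar{x}_i$ and $s_i^2$ are the sample mean and sample variance of treatment i, and $\bar{x}$ is the grand mean of all n observations:

$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \cdots + n_k\bar{x}_k}{n_1 + n_2 + \cdots + n_k}, \qquad n = n_1 + n_2 + \cdots + n_k$$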

Under the null hypothesis H0: μ1 = μ2 = … = μk, the test statistic has the F distribution with k − 1 and n − k degrees of freedom:

$$F = \frac{MSTR}{MSE} \sim F(k - 1,\; n - k)$$
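The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the function name `one_way_anova` and the three small samples are hypothetical, chosen only so the arithmetic is easy to follow. It computes SST directly from the raw observations, so the identity SST = SSTR + SSE can be checked numerically.

```python
from statistics import mean, variance


def one_way_anova(groups):
    """Compute one-way ANOVA quantities from raw samples (one list per treatment)."""
    k = len(groups)                      # number of treatments
    sizes = [len(g) for g in groups]     # n_1, ..., n_k
    n = sum(sizes)                       # total number of observations
    means = [mean(g) for g in groups]    # treatment sample means
    grand_mean = sum(ni * xi for ni, xi in zip(sizes, means)) / n

    # Explained variation: weighted squared deviations of treatment means
    sstr = sum(ni * (xi - grand_mean) ** 2 for ni, xi in zip(sizes, means))
    # Unexplained variation: (n_i - 1) * s_i^2 summed over treatments
    sse = sum((ni - 1) * variance(g) for ni, g in zip(sizes, groups))
    # Total variation, computed directly from every observation
    sst = sum((x - grand_mean) ** 2 for g in groups for x in g)

    mstr = sstr / (k - 1)
    mse = sse / (n - k)
    return {"SSTR": sstr, "SSE": sse, "SST": sst,
            "MSTR": mstr, "MSE": mse, "F": mstr / mse,
            "df": (k - 1, n - k)}


# Hypothetical data (NOT from the lecture), three treatments of unequal size
samples = [[4.0, 6.0, 5.0], [8.0, 9.0, 7.0, 8.0], [3.0, 2.0, 4.0]]
result = one_way_anova(samples)
print(result)
```

For these made-up samples SSTR is 44.4 and SSE is 6, and the directly computed SST equals their sum, 50.4, up to floating-point rounding.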
In the energy consumption example above for the four regions of the US:

k = number of treatments (or number of groups) = 4,
n = total number of observations = 5 + 6 + 4 + 5 = 20.
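$\bar{x}_1 = 11.0$, $\bar{x}_2 = 12.5$, $\bar{x}_3 = 7.5$, $\bar{x}_4 = 7.2$, and the grand mean of all n = 20 observations is $\bar{x} = 9.8$.
Note: the grand mean is

$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + n_3\bar{x}_3 + n_4\bar{x}_4}{n_1 + n_2 + n_3 + n_4} = \frac{5(11) + 6(12.5) + 4(7.5) + 5(7.2)}{5 + 6 + 4 + 5} = 9.8$$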

$$SSTR = n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2 + n_3(\bar{x}_3 - \bar{x})^2 + n_4(\bar{x}_4 - \bar{x})^2$$

$$SSTR = 5(11.0 - 9.8)^2 + 6(12.5 - 9.8)^2 + 4(7.5 - 9.8)^2 + 5(7.2 - 9.8)^2 = 105.9$$
$$s_1^2 = 3.5, \quad s_2^2 = 6.7, \quad s_3^2 = 9.0, \quad s_4^2 = 3.7$$

$$SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + (n_3 - 1)s_3^2 + (n_4 - 1)s_4^2$$

$$SSE = (5 - 1)(3.5) + (6 - 1)(6.7) + (4 - 1)(9.0) + (5 - 1)(3.7) = 89.3$$
$$MSTR = \frac{SSTR}{k - 1} = \frac{105.9}{4 - 1} = 35.3, \qquad MSE = \frac{SSE}{n - k} = \frac{89.3}{20 - 4} = 5.581$$

$$F = \frac{MSTR}{MSE} = \frac{35.3}{5.581} = 6.32$$
Critical region (CR): F > F(0.05, 3, 16) = 3.24
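The arithmetic of this example can be re-checked from the summary statistics alone. The sketch below is one way to do that in Python; the function name `anova_from_summary` is hypothetical, and the critical value 3.24 is simply copied from the lecture rather than computed.

```python
def anova_from_summary(sizes, means, variances):
    """One-way ANOVA quantities from per-treatment n_i, sample means and sample variances."""
    k = len(sizes)
    n = sum(sizes)
    grand_mean = sum(ni * xi for ni, xi in zip(sizes, means)) / n
    sstr = sum(ni * (xi - grand_mean) ** 2 for ni, xi in zip(sizes, means))
    sse = sum((ni - 1) * s2 for ni, s2 in zip(sizes, variances))
    mstr = sstr / (k - 1)
    mse = sse / (n - k)
    return {"grand_mean": grand_mean, "SSTR": sstr, "SSE": sse,
            "MSTR": mstr, "MSE": mse, "F": mstr / mse}


# Summary statistics of the energy consumption example (from the lecture)
sizes = [5, 6, 4, 5]
means = [11.0, 12.5, 7.5, 7.2]
variances = [3.5, 6.7, 9.0, 3.7]

res = anova_from_summary(sizes, means, variances)

# Critical value F(0.05, 3, 16) as quoted in the lecture, not computed here
critical_value = 3.24
reject_h0 = res["F"] > critical_value

print(f"SSTR = {res['SSTR']:.1f}, SSE = {res['SSE']:.1f}")   # SSTR = 105.9, SSE = 89.3
print(f"F = {res['F']:.2f}, reject H0: {reject_h0}")        # F = 6.32, reject H0: True
```

Working from summary statistics is convenient when, as here, only the group sizes, means and variances are reported.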

Conclusion: As the calculated F (6.32) falls in the rejection region (it exceeds 3.24), we reject H0 and
conclude that the average household level of energy consumption differs significantly across the 4
regions of the US.
The box plot of the 4 regions indicates that the highest average energy consumption is for the Midwest
region. The average level of energy consumption for the South and West regions is nearly the same.
Assumptions of ANOVA model:
1) The variances of the k treatments or populations are equal (that is why this procedure is an
extension of the pooled t-test).
2) Treatments are applied randomly and independently to subjects.
3) The error within each population is normally distributed.
Note that one of the assumptions of ANOVA is that the variances are equal across the sub-populations,
which seems to be violated in this case (the sample variances range from 3.5 to 9.0). In such cases
one can transform the data, e.g. by a log transformation, before applying ANOVA.
Anderson, Ex. 9, 10, 12 (PDF p. 644)
Solution to Ex. 9:

SUMMARY
Groups   Count   Sum   Average   Variance
50°      5       165   33        32
60°      5       145   29        17.5
70°      5       140   28        9.5
Grand mean = 30
SSTR = 70, MSTR = 35, SSE = 236, MSE = 19.66667, F = 1.7797, F(0.05, 2, 12) = 3.885
Since F = 1.78 < 3.885, do not reject H0. There is not sufficient evidence of a difference in average
yield at the three temperatures.

ANOVA
Source of Variation   SS    df   MS         F          P-value    F crit
Between Groups        70    2    35         1.779661   0.210447   3.885294
Within Groups         236   12   19.66667
Total                 306   14
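The figures in the solution to Ex. 9 follow from the summary table by the same formulas used in the energy example. As a worked sketch (k = 3 treatments, n = 15 observations, with the sample variances taken from the Variance column):

```latex
\begin{align*}
\bar{x} &= \frac{5(33) + 5(29) + 5(28)}{15} = \frac{450}{15} = 30 \\
SSTR &= 5(33 - 30)^2 + 5(29 - 30)^2 + 5(28 - 30)^2 = 45 + 5 + 20 = 70 \\
SSE &= (5 - 1)(32) + (5 - 1)(17.5) + (5 - 1)(9.5) = 128 + 70 + 38 = 236 \\
MSTR &= \frac{70}{3 - 1} = 35, \qquad MSE = \frac{236}{15 - 3} \approx 19.667 \\
F &= \frac{35}{19.667} \approx 1.78 < F(0.05, 2, 12) = 3.885
\end{align*}
```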

One-way (Single Factor) ANOVA in Excel:

First copy the data into 4 different columns, one per region.
Then in the Data Analysis tool, select Anova: Single Factor and enter the range of the 4 columns
as the Input Range. The result is:

Anova: Single Factor


SUMMARY
Groups      Count   Sum   Average   Variance
Northeast   5       55    11        3.5
Midwest     6       75    12.5      6.7
South       4       30    7.5       9
West        5       36    7.2       3.7

ANOVA
Source of Variation   SS      df   MS        F          P-value   F critical
Between Groups        105.9   3    35.3      6.324748   0.00493   3.238872
Within Groups         89.3    16   5.58125
Total                 195.2   19

In two-way analysis of variance, we are interested in the effect of two factors that
simultaneously affect a response variable; e.g. in agricultural experiments both the variety of seed
and the fertilizer type may affect the response, crop yield. Analysis of variance is the computational
method in the statistical field of Design of Experiments.

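One-way ANOVA in R:

First arrange the data in an Excel file and save the file as csv (call it energy.csv, saved on the D
drive).
Then in R use these commands:
# Read the data; the file needs the columns consumption and Region
data <- read.csv("D:\\energy.csv")
# Fit the one-way ANOVA of consumption on Region
model <- aov(consumption ~ Region, data = data)
# Print the ANOVA table (F statistic and p-value)
summary(model)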

One-way ANOVA in SPSS:

First make a single column of the data (energy consumption here) and in the next column write
codes (1, 2, 3, 4) for each region.
Then choose Analyze > Compare Means > One-Way ANOVA.
The result is the ANOVA table with the calculated F and the p-value of the test (the Sig. value).
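The single-column layout described above (one column of responses plus a column identifying the region) differs from the one-column-per-region layout used for Excel. A minimal Python sketch of the conversion follows. The numbers are hypothetical placeholders, NOT the Weiss data, and the names `wide`, `wide_to_long` and the output column `code` are assumptions made for illustration; the columns `consumption` and `Region` match the names used in the R command.

```python
import csv
import io

# Hypothetical placeholder values (NOT the actual energy data), one list per region
wide = {
    "Northeast": [10.0, 12.0],
    "Midwest": [13.0, 12.0],
    "South": [7.0, 8.0],
    "West": [6.0, 8.0],
}


def wide_to_long(wide_data):
    """Stack one-list-per-region data into rows of (consumption, Region, code)."""
    rows = []
    # Regions are coded 1, 2, 3, 4 in the order they appear in the dictionary
    for code, (region, values) in enumerate(wide_data.items(), start=1):
        for value in values:
            rows.append({"consumption": value, "Region": region, "code": code})
    return rows


rows = wide_to_long(wide)

# Write the stacked data as CSV text (a file object could be used instead of StringIO)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["consumption", "Region", "code"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Each observation becomes one row, so a data set with n observations produces n rows regardless of how many regions there are.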
