Anova Test

The one-way analysis of variance (ANOVA) tests whether the means of three or more groups are equal. It compares the variation between groups to the variation within groups. The ANOVA calculates an F-ratio by dividing the between-group variation by the within-group variation. If the F-ratio is sufficiently large and the p-value is sufficiently small, the null hypothesis that the group means are equal can be rejected, indicating that at least one group mean is different. The one-way ANOVA extends the independent samples t-test to compare more than two groups.

Uploaded by Tumabang Divine

ANOVA

Analysis of Variance
One-Way ANOVA

◼ The one-way analysis of variance is used to test the claim that three or more population means are equal
◼ This is an extension of the two independent samples t-test
One-Way ANOVA
◼ The response variable is the variable you’re comparing
◼ The factor variable is the categorical variable used to define the groups
◼ We will assume k samples (groups)
◼ The “one-way” is because each value is classified in exactly one way
◼ Examples include comparisons by gender, race, political party, color, age group, education level, marital status, etc.
Conditions or Assumptions
◼ The data are randomly sampled
◼ The variances of each sample are assumed equal
◼ The residuals are normally distributed
◼ The null hypothesis is that the means are all equal

H0: μ1 = μ2 = μ3 = ⋯ = μk

◼ The alternative hypothesis is that at least one of the means is different
◼ The ANOVA doesn’t test whether one mean is less than another, only whether they’re all equal or at least one is different.
Example
To verify whether students’ exam scores are influenced by their seating position, the statistics teacher divided the class into three groups (rows) and compared their exam scores.
Front: 82, 83, 97, 93, 55, 67, 53
Middle: 83, 78, 68, 61, 77, 54, 69, 51, 63
Back: 38, 59, 55, 66, 45, 52, 52, 61
The summary statistics for the grades of each row are shown in the table below

Row          Front    Middle   Back
Sample size  7        9        8
Mean         75.71    67.11    53.50
St. Dev      17.63    10.95    8.96
Variance     310.90   119.86   80.29
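The table above can be reproduced with a short Python sketch using only the standard library (the variable names are mine, not from the slides):

```python
# Reproducing the summary table from the raw scores in the example.
from statistics import mean, stdev, variance

front  = [82, 83, 97, 93, 55, 67, 53]
middle = [83, 78, 68, 61, 77, 54, 69, 51, 63]
back   = [38, 59, 55, 66, 45, 52, 52, 61]

for name, row in [("Front", front), ("Middle", middle), ("Back", back)]:
    # stdev/variance use the sample (n - 1) formulas, matching the table
    print(f"{name:6s} n={len(row)} mean={mean(row):.2f} "
          f"sd={stdev(row):.2f} var={variance(row):.2f}")
```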
Variation

Variation is the sum of the squared deviations between each value and the mean

Sum of Squares is abbreviated SS and often followed by a variable in parentheses, such as SS(B) or SS(W), so we know which sum of squares we’re talking about
THREE QUESTIONS

Are all of the values identical?
◼ No, so there is some variation in the data
◼ This is called the total variation
◼ Denoted SS(Total), for the total Sum of Squares (variation)
◼ Sum of Squares is another name for variation

Are all of the sample means identical?
◼ No, so there is some variation between the groups
◼ This is called the between group variation
◼ Sometimes called the variation due to the factor
◼ Denoted SS(B), for the Sum of Squares (variation) between the groups

Are all of the values within each group identical?
◼ No, there is some variation within the groups
◼ This is called the within group variation
◼ Sometimes called the error variation
◼ Denoted SS(W), for the Sum of Squares (variation) within the groups
There are two sources of variation
◼ The variation between the groups, SS(B), or the variation due to the factor
◼ The variation within the groups, SS(W), or the variation that can’t be explained by the factor, so it’s called the error variation
◼ Basic ANOVA table

Source SS df MS F p

Between

Within

Total
Grand Mean
The grand mean is the average of all the values when the factor is ignored

It is a weighted average of the individual sample means

x̄ = (n1·x̄1 + n2·x̄2 + ⋯ + nk·x̄k) / (n1 + n2 + ⋯ + nk) = Σ ni·x̄i / Σ ni

x̄ = [7(75.71) + 9(67.11) + 8(53.50)] / (7 + 9 + 8)
x̄ = 1562 / 24
x̄ = 65.08
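The weighted-average step can be sketched in Python, using the sample sizes and means from the summary table:

```python
# The grand mean as a weighted average of the group means.
ns = [7, 9, 8]
means = [75.71, 67.11, 53.50]

grand_mean = sum(n * m for n, m in zip(ns, means)) / sum(ns)
print(round(grand_mean, 2))  # 65.08
```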
The Between Group Variation for our example is SS(B) ≈ 1902

SS(B) = Σ ni(x̄i − x̄)²,  summed over the k groups

SS(B) = n1(x̄1 − x̄)² + n2(x̄2 − x̄)² + ⋯ + nk(x̄k − x̄)²

SS(B) = 7(75.71 − 65.08)² + 9(67.11 − 65.08)² + 8(53.50 − 65.08)²

SS(B) = 1900.8376 ≈ 1902 (the rounded group means give 1900.84; the exact, unrounded means give 1901.5, hence 1902)
Within Group Variation, SS(W)

◼ The Within Group Variation is the weighted total of the individual variations
◼ The weighting is done with the degrees of freedom
◼ The df for each sample is one less than the sample size for that sample, dfi = ni − 1

SS(W) = Σ dfi·si²,  summed over the k groups

SS(W) = df1·s1² + df2·s2² + ⋯ + dfk·sk²

SS(W) = 6(310.90) + 8(119.86) + 7(80.29)

SS(W) = 3386.31 ≈ 3386
❑ The between group df is one less than the number of groups
❑ We have three groups, so df(B) = 2
❑ The within group df is the sum of the individual df’s of each group
❑ The sample sizes are 7, 9, and 8
❑ df(W) = 6 + 8 + 7 = 21
❑ The total df is one less than the sample size
❑ df(Total) = 24 − 1 = 23
Filling in the degrees of freedom gives this …

Source SS df MS F p

Between 1902 2

Within 3386 21

Total 5288 23
Variances
◼ The variances are also called the Mean of the Squares, abbreviated MS, often with an accompanying variable: MS(B) or MS(W)
◼ They are an average squared deviation from the mean and are found by dividing the variation by the degrees of freedom
◼ MS = SS / df

Variance = Variation / df
❖ MS(B) = 1902 / 2 = 951.0

❖ MS(W) = 3386 / 21 = 161.2

❖ MS(T) = 5288 / 23 = 229.9

❖ Notice that the MS(Total) is NOT the sum of MS(Between) and MS(Within).
❖ This works for the sum of squares, SS(Total), but not the mean square, MS(Total)
❖ The MS(Total) isn’t usually shown


Special Variances
◼ The MS(Within) is also known as the pooled estimate of the variance, since it is a weighted average of the individual variances
◼ The MS(Total) is the variance of the response variable
◼ It is not technically part of the ANOVA table, but useful nonetheless
F test statistic
An F test statistic is the ratio of two sample variances.
The MS(B) and MS(W) are two sample variances, and that’s what we divide to find F.

F = MS(B) / MS(W)

For our data, F = 951.0 / 161.2 = 5.9


◼ Adding F to the table …

Source SS df MS F p

Between 1902 2 951.0 5.9

Within 3386 21 161.2

Total 5288 23 229.9


◼ The F test is a right tail test
◼ The F test statistic has an F distribution with df(B) numerator df and df(W) denominator df
◼ The p-value is the area to the right of the test statistic
◼ P(F2,21 > 5.9) = 0.009
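A sketch of the same test with SciPy (assumed installed). f_oneway carries the unrounded sums of squares, so it reproduces F ≈ 5.9 and p ≈ 0.009 directly from the raw scores:

```python
# One-way ANOVA on the raw scores, and the right-tail area of F(2, 21).
from scipy.stats import f, f_oneway

front  = [82, 83, 97, 93, 55, 67, 53]
middle = [83, 78, 68, 61, 77, 54, 69, 51, 63]
back   = [38, 59, 55, 66, 45, 52, 52, 61]

stat, p = f_oneway(front, middle, back)
print(round(stat, 1), round(p, 3))  # 5.9 0.009

# The p-value is the right-tail area of an F distribution with 2 and 21 df
print(round(f.sf(stat, 2, 21), 3))  # 0.009
```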


Completing the table with the p-value

Source SS df MS F p

Between 1902 2 951.0 5.9 0.009

Within 3386 21 161.2

Total 5288 23 229.9


◼ The p-value is 0.009, which is less than the significance level of 0.05, so we reject the null hypothesis.
◼ The null hypothesis was that the means of the three rows in class were the same; we reject that, so at least one row has a different mean.
◼ There is enough evidence to support the claim that there is a difference in the mean scores of the front, middle, and back rows in class.
◼ The ANOVA doesn’t tell which row is different; you would need to look at confidence intervals or run post hoc tests to determine that.
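One common post hoc choice is Tukey's HSD; a sketch using SciPy's implementation (assumes a reasonably recent SciPy that provides scipy.stats.tukey_hsd):

```python
# Pairwise post hoc comparisons after a significant one-way ANOVA.
from scipy.stats import tukey_hsd

front  = [82, 83, 97, 93, 55, 67, 53]
middle = [83, 78, 68, 61, 77, 54, 69, 51, 63]
back   = [38, 59, 55, 66, 45, 52, 52, 61]

res = tukey_hsd(front, middle, back)
# res.pvalue[i][j] is the p-value for the comparison of groups i and j
print(res.pvalue[0][2] < 0.05)  # front vs back differ at the 5% level
```

Here the front-vs-back comparison drives the overall ANOVA result; the other pairs are closer together.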
Example
Heights (in inches) under four treatments:

Treatment 1  Treatment 2  Treatment 3  Treatment 4
60           50           48           47
67           52           49           67
42           43           50           54
67           67           55           67
56           67           56           68
62           59           61           65
64           67           61           65
59           64           60           56
72           63           59           60
71           65           64           65
Example
Step 1) Calculate the sum of squares between groups:

Mean for group 1 = 62.0
Mean for group 2 = 59.7
Mean for group 3 = 56.3
Mean for group 4 = 61.4
Grand mean = 59.85

SSB = [(62 − 59.85)² + (59.7 − 59.85)² + (56.3 − 59.85)² + (61.4 − 59.85)²] × n per group
SSB = 19.65 × 10 = 196.5
Example
Step 2) Calculate the sum of squares within groups:

SSW = (60 − 62)² + (67 − 62)² + (42 − 62)² + (67 − 62)² + (56 − 62)² + (62 − 62)² + (64 − 62)² + (59 − 62)² + (72 − 62)² + (71 − 62)² + (50 − 59.7)² + (52 − 59.7)² + (43 − 59.7)² + (67 − 59.7)² + (67 − 59.7)² + … (sum of all 40 squared deviations)
SSW = 2060.6
Step 3) Fill in the ANOVA table

Source of variation  d.f.  Sum of Squares  Mean Sum of Squares  F-statistic  p-value
Between              3     196.5           65.5                 1.14         0.344
Within               36    2060.6          57.2
Total                39    2257.1
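Steps 1–3 can be checked in one call with SciPy's one-way ANOVA on the four treatment groups (SciPy assumed installed):

```python
# One-way ANOVA on the four treatment groups from the height example.
from scipy.stats import f_oneway

t1 = [60, 67, 42, 67, 56, 62, 64, 59, 72, 71]
t2 = [50, 52, 43, 67, 67, 59, 67, 64, 63, 65]
t3 = [48, 49, 50, 55, 56, 61, 61, 60, 59, 64]
t4 = [47, 67, 54, 67, 68, 65, 65, 56, 60, 65]

stat, p = f_oneway(t1, t2, t3, t4)
print(round(stat, 2), round(p, 2))  # F ≈ 1.14, p ≈ 0.34: fail to reject H0
```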

INTERPRETATION of ANOVA:
How much of the variance in height is explained by treatment group?
R² = “Coefficient of Determination” = SSB/TSS = 196.5/2257.1 ≈ 9%
Coefficient of Determination

R² = SSB / (SSB + SSE) = SSB / SST

The amount of variation in the outcome variable (dependent variable) that is explained by the predictor (independent variable).
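For the treatment example, the formula above can be sketched directly from the sums of squares in the Step 3 table:

```python
# Coefficient of determination for the treatment example.
ss_b = 196.5    # between-group sum of squares
ss_w = 2060.6   # within-group (error) sum of squares
ss_t = ss_b + ss_w  # total sum of squares, 2257.1

r_squared = ss_b / ss_t
print(round(r_squared, 2))  # 0.09, i.e. about 9% of the variance explained
```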
