
Anova (Sta 305)

1) ANOVA compares the means of different groups to determine if they are statistically different. It compares the variance between groups to the variance within groups. 2) Key terms include dependent variable, independent variable, null hypothesis (that means are equal), alternative hypothesis (that means are different). 3) There are two main types of ANOVA - one-way (one independent variable) and two-way (two independent variables that may interact).

Uploaded by

perfectNja

(PART 1)

STA 305 (ANOVA)


Lecture 1

Course: STA 305
Date: 31/01/2023

Analysis of variance (ANOVA) is a statistical method used to compare the means (or averages) of different groups by analysing their variances.

ANOVA is used in a range of scenarios, usually to determine whether there is any difference between the means of different groups. For example, to study the effectiveness of different diabetes medications, scientists design an experiment to explore the relationship between the type of medicine and the resulting blood sugar level. The sample population is a set of people; we divide the sample into multiple groups, and each group receives a particular medicine for a trial period. Blood sugar levels are measured for each of the individual participants and then for each group. ANOVA helps to compare these means to find out whether they are statistically different or similar.

The outcome of ANOVA is the F-statistic. This ratio compares the between-group variance to the within-group variance, ultimately producing a figure that allows a conclusion about whether the null hypothesis should be accepted or rejected. If there is a significant difference between the groups, the null hypothesis is not supported and the F-ratio will be larger.
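The F-ratio described here can be computed directly from raw data. Below is a minimal Python sketch (our own illustration, not part of the lecture; the function name `f_statistic` is an assumption), shown with the keyboard-design data that appear in the worked example later in these notes:

```python
def f_statistic(groups):
    """F-ratio: between-group variance over within-group variance (equal group sizes)."""
    k = len(groups)                 # number of groups
    n = len(groups[0])              # observations per group
    grand_mean = sum(sum(g) for g in groups) / (n * k)
    # Between-group (column) sum of squares, on k - 1 degrees of freedom
    ssc = n * sum((sum(g) / n - grand_mean) ** 2 for g in groups)
    # Within-group (error) sum of squares, on k(n - 1) degrees of freedom
    sse = sum((x - sum(g) / n) ** 2 for g in groups for x in g)
    return (ssc / (k - 1)) / (sse / (k * (n - 1)))

groups = [[10, 10, 8, 10, 12], [24, 22, 24, 24, 26], [17, 17, 15, 19, 17]]
print(f_statistic(groups))  # 122.5
```

A large F-ratio, as here, indicates that the between-group variance dominates the within-group variance.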

DEFINITION OF TERMS
1. DEPENDENT VARIABLE: This is the item being measured that is theorized to be affected by the independent variable.

2. INDEPENDENT VARIABLE: These are the items that may affect the dependent variable.

3. NULL HYPOTHESIS (Ho): This states that there is no difference between the groups or means. Depending on the results of the ANOVA test, the null hypothesis will either be accepted or rejected.

4. ALTERNATIVE HYPOTHESIS (H1): This states that there is a difference between the groups or means.

5. FACTORS AND LEVELS: In ANOVA terminology, an independent variable that affects the dependent variable is called a factor. Levels denote the different values of the independent variable that are used in an experiment.

6. FIXED FACTOR MODEL: Some experiments use only a discrete set of levels for factors. For example, a fixed-factor test might compare three different dosages of a drug and not look at any other dosages.

7. RANDOM FACTOR MODEL: This model draws a random value of the levels from all the possible values of the independent variable.

THERE ARE TWO TYPES OF ANOVA.


1. One-Way ANOVA: The one-way analysis of variance is also known as single-factor ANOVA or simple ANOVA. As the name suggests, the one-way ANOVA is suitable for experiments with only one independent variable (factor) with two or more levels.

For instance, the factor may be the month of the year in which there are more flowers; that factor would have 12 levels.

ASSUMPTIONS OF ONE-WAY ANOVA


1. Independence: the value of the dependent variable for one observation is independent of the value of any other observation.
2. Normality: the value of the dependent variable is normally distributed.
3. Variance: the variance is comparable in the different experimental groups.
4. Continuity: the dependent variable is continuous and can be measured on a scale which can be subdivided.
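Assumption 3 (comparable variances) can be screened before running the ANOVA. The sketch below is our own illustration: it applies the common informal rule of thumb that the largest sample variance should be no more than about four times the smallest; the threshold and the function name `variances_comparable` are assumptions, not from the lecture.

```python
from statistics import variance

def variances_comparable(groups, max_ratio=4.0):
    """Rule of thumb: largest sample variance at most max_ratio times the smallest."""
    v = [variance(g) for g in groups]
    return max(v) / min(v) <= max_ratio

# Three groups with identical sample variances (2.0 each) pass the check.
print(variances_comparable([[10, 10, 8, 10, 12], [24, 22, 24, 24, 26], [17, 17, 15, 19, 17]]))  # True
```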

2. FULL FACTORIAL ANOVA, ALSO CALLED TWO-WAY ANOVA: This is used when there are two or more independent variables; each of these factors can have multiple levels. Full factorial ANOVA can only be used in the case of a full factorial experiment, in which every possible combination of factors and their levels is used. This might be the month of the year when there are more flowers in the garden together with the number of sunshine hours.

The two-way ANOVA not only measures the effect of each independent variable on the dependent variable but also whether the two factors affect each other.

LEAST SIGNIFICANT DIFFERENCE TEST (L.S.D)


The least significant difference test is used in the context of the analysis of variance, when the F-ratio suggests rejection of the null hypothesis (Ho), i.e. when the difference between the population means is significant. The test helps to identify the populations whose means are statistically different. The basic idea of the test is to compare the populations taken in pairs. It is used to proceed in a one-way or two-way analysis of variance given that the null hypothesis has already been rejected.
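As a sketch of how the procedure works, the code below flags every pair of groups whose sample means differ by more than LSD = t(α/2, k(n−1)) · √(2·MSE/n), the usual form of the least significant difference for equal group sizes. This is our own illustration: the helper name `lsd_pairs` is an assumption, and the critical value t(0.025, 12) ≈ 2.179 is hard-coded; the means, MSE, and n come from the keyboard-design example worked later in these notes.

```python
from itertools import combinations
from math import sqrt

def lsd_pairs(means, mse, n, t_crit):
    """Return the index pairs (i, j) whose sample means differ by more than the LSD."""
    lsd = t_crit * sqrt(2 * mse / n)
    return [(i, j) for i, j in combinations(range(len(means)), 2)
            if abs(means[i] - means[j]) > lsd]

# Keyboard-design example: means 10, 24, 17; MSE = 2; n = 5 per group.
print(lsd_pairs([10, 24, 17], mse=2, n=5, t_crit=2.179))  # [(0, 1), (0, 2), (1, 2)]
```

Here every pairwise difference exceeds the LSD of about 1.95, so all three population means are declared different.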
Lecture 2
Course: STA 305
07/02/2023
One-way classification: Random samples of size n are selected from each of k populations. It will be assumed that the k populations are independent and normally distributed with means μ1, μ2, ..., μk and common variance σ². We wish to derive an appropriate method for testing the hypothesis

Ho: μ1 = μ2 = ... = μk
H1: at least two of the means are not equal.

Decision Criterion

Reject Ho if Fcal > Fα(k − 1, k(n − 1))

For α level of significance.

Let Xij denote the jth observation from the ith population and arrange the data as in Table 1. Ti is the total of all observations in the sample from the ith population, X̄i is the mean of all observations in the sample from the ith population, T.. is the total of all nk observations, and X̄.. is the mean of all nk observations. Each observation may be written in the form

Xij = μi + εij

where εij measures the deviation of the jth observation of the ith sample from the corresponding population mean. An alternative and preferred form of this equation is obtained by substituting μi = μ + αi, where μ is defined to be the mean of all the μi; that is,
Table 1

K Random Samples

Population      1       2      ...     i      ...     k
               x11     x21            xi1            xk1
               x12     x22            xi2            xk2
               ...     ...            ...            ...
               x1n     x2n            xin            xkn
Total          T1      T2             Ti             Tk      T..
Mean           X̄1      X̄2             X̄i             X̄k      X̄..

Formula:

μ = ∑_{i=1}^{k} μi / k

Hence, we may write

Xij = μ + αi + εij

subject to the restriction that ∑ αi = 0.


It is customary to refer to αi as the effect of the ith population. The null hypothesis (Ho) that the k population means are equal against the alternative (H1) that at least two of the means are unequal may now be replaced by the equivalent hypotheses:

Ho: α1 = α2 = ... = αk =0
H1: At least one of the αi is not equal to zero

Our test will be based on a comparison of two independent estimates of the common population variance σ². These estimates will be obtained by splitting the total variability of our data into two components. The variance of all the observations grouped into a single sample of size nk is given by the formula:

S² = ∑_{i=1}^{k} ∑_{j=1}^{n} (Xij − X̄..)² / (nk − 1)

The double summation means that we sum over all possible terms, allowing i to assume values from 1 to k and j values from 1 to n. The numerator of S², called the total sum of squares, measures the total variability of our data. It may be partitioned by means of the following identity.

Theorem 11.1 (Sum of squares identity)

∑_{i=1}^{k} ∑_{j=1}^{n} (Xij − X̄..)² = n ∑_{i=1}^{k} (X̄i − X̄..)² + ∑_{i=1}^{k} ∑_{j=1}^{n} (Xij − X̄i)²
ASSIGNMENT
Prove the sum of squares identity above!

Note;

SST = ∑_{i=1}^{k} ∑_{j=1}^{n} (Xij − X̄..)²  — total sum of squares

SSC = n ∑_{i=1}^{k} (X̄i − X̄..)²  — sum of squares for column means

SSE = ∑_{i=1}^{k} ∑_{j=1}^{n} (Xij − X̄i)²  — error sum of squares

The sum of squares identity can then be represented symbolically by the notation

SST= SSC + SSE


Many authors refer to the sum of squares for column means as the treatment sum of squares. This terminology is derived from the fact that the k different populations are often classified according to different treatments. Thus, the observations Xij (j = 1, 2, ..., n) represent the n measurements corresponding to the ith treatment.
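The identity can be checked numerically. The sketch below (our own illustration, with made-up data) computes SST, SSC and SSE directly from their definitions and verifies that SST = SSC + SSE.

```python
def sums_of_squares(groups):
    """Return (SST, SSC, SSE) for equally sized groups, from the definitions."""
    k, n = len(groups), len(groups[0])
    grand = sum(sum(g) for g in groups) / (n * k)          # X-bar..
    means = [sum(g) / n for g in groups]                   # X-bar i
    sst = sum((x - grand) ** 2 for g in groups for x in g)
    ssc = n * sum((m - grand) ** 2 for m in means)
    sse = sum((x - means[i]) ** 2 for i, g in enumerate(groups) for x in g)
    return sst, ssc, sse

sst, ssc, sse = sums_of_squares([[1, 2, 3], [2, 4, 6], [5, 5, 5]])
assert abs(sst - (ssc + sse)) < 1e-9   # SST = SSC + SSE
print(round(sst, 6), round(ssc, 6), round(sse, 6))  # 24.0 14.0 10.0
```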

One estimate of σ², based on k − 1 degrees of freedom, is given by

S₁² = SSC / (k − 1)

If Ho is true, S₁² is an unbiased estimate of σ². However, if H1 is true, SSC will have a larger numerical value and S₁² overestimates σ². A second independent estimate of σ², based on k(n − 1) degrees of freedom, is given by

S₂² = SSE / (k(n − 1))

The estimate S₂² is unbiased regardless of the truth or falsity of the null hypothesis. We have already seen that the variance of our grouped data, with nk − 1 degrees of freedom, is

S² = SST / (nk − 1)

which is an unbiased estimate of σ² when Ho is true. It is important to note that the sum of squares identity has partitioned not only the total variability of the data but also the total number of degrees of freedom, i.e. nk − 1 = (k − 1) + k(n − 1). When Ho is true, the ratio

F = S₁² / S₂²

has an F distribution with k − 1 and k(n − 1) degrees of freedom, and we reject Ho when

Fcal > Fα(k − 1, k(n − 1))

In practice one usually computes SST and SSC and then obtains SSE from the sum of squares identity:

SSE = SST − SSC

The computations in an analysis of variance problem are usually summarized in tabular form, as shown in Table 2.
ANALYSIS OF VARIANCE FOR ONE WAY CLASSIFICATION

TABLE 2

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square              F
Column Means          SSC              k − 1                S₁² = SSC / (k − 1)      S₁² / S₂²
Error                 SSE              k(n − 1)             S₂² = SSE / (k(n − 1))
Total                 SST              nk − 1


Example 1.
An experiment was conducted to compare three computer keyboard designs with respect to their effect on repetitive stress injuries (RSI). Fifteen businesses of comparable size participated in a study to compare the three keyboard designs: five of the fifteen businesses were randomly selected and their computers were equipped with design one keyboards, five of the remaining ten were selected and equipped with design two keyboards, and the remaining five used design three keyboards. After one year the number of RSIs was recorded for each company. The results are shown in the table below.

Design 1 Design 2 Design 3

10 24 17

10 22 17

8 24 15

10 24 19

12 26 17
Mean 10 24 17
Solution

k = 3
n = 5
nk = 15

SSC = n ∑_{i=1}^{k} (X̄i − X̄..)²

X̄.. = 17

SSC = n(X̄1 − X̄..)² + n(X̄2 − X̄..)² + n(X̄3 − X̄..)²
    = 5(10 − 17)² + 5(24 − 17)² + 5(17 − 17)²
    = 245 + 245 + 0
    = 490
Lecture 3
Course: STA 305

Another method to solve the one-way classification:

SST = ∑_{i=1}^{k} ∑_{j=1}^{n} Xij² − T..²/(nk)

T.. = 10 + 10 + 8 + … + 17 = 255
T..² = 255² = 65025
T..²/(nk) = 65025/15 = 4335

∑∑ Xij² = 10² + 10² + 8² + … + 19² + 17² = 4849

Then:
SST = 4849 − 4335 = 514

To find SSC:

SSC = ∑_{i=1}^{k} Ti²/n − T..²/(nk)

∑_{i=1}^{k} Ti²/n = (50² + 120² + 85²)/5 = 4825

But T..²/(nk) = 4335, so

SSC = 4825 − 4335 = 490

Then:
SSE = SST − SSC = 514 − 490 = 24

Terms;
Mean Square Column = MSC
Mean Square Error = MSE

MSC = SSC/(k − 1) = 490/(3 − 1) = 490/2 = 245

MSE = SSE/(k(n − 1)) = 24/(3(5 − 1)) = 24/12 = 2

The F ratio is the test statistic used to test the above hypothesis:

F = MSC/MSE = 245/2 = 122.5

∴ F ≈ 123

The test statistic has an F distribution with

df1 = k − 1 = 3 − 1 = 2
df2 = k(n − 1) = 3(5 − 1) = 12

Note:
df1 – horizontal (across the top of the F table)
df2 – vertical (down the side of the F table)

The 5% right-hand-tail critical value is F0.05(2, 12) = 3.89, and

F > Fα(k − 1, k(n − 1))
123 > 3.89
Table 1

Source of Variation   Sum of Squares            Degrees of Freedom       Mean Square         F
Column Means          SSC = 4825 − 4335 = 490   k − 1 = 3 − 1 = 2        S₁² = 490/2 = 245   MSC/MSE = 245/2 = 122.5
Error                 SSE = 514 − 490 = 24      k(n − 1) = 3(5 − 1) = 12 S₂² = 24/12 = 2
Total                 SST = 4849 − 4335 = 514   nk − 1 = 3(5) − 1 = 14
The extremely large value of the test statistic suggests that the population means are not equal and that the null hypothesis should be rejected. The differences in the sample means are significant and indicate that design 1 will reduce the number of RSIs.
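The whole example can be reproduced with the shortcut formulas of Lecture 3 above. The sketch below (our own code, using the RSI counts from the table above) recovers SST = 514, SSC = 490, SSE = 24 and F = 122.5.

```python
# RSI counts for the three keyboard designs (Example 1).
data = [[10, 10, 8, 10, 12],   # design 1
        [24, 22, 24, 24, 26],  # design 2
        [17, 17, 15, 19, 17]]  # design 3
k, n = len(data), len(data[0])

t_total = sum(sum(g) for g in data)                     # T.. = 255
correction = t_total ** 2 / (n * k)                     # T..^2 / nk = 4335
sst = sum(x * x for g in data for x in g) - correction  # 4849 - 4335 = 514
ssc = sum(sum(g) ** 2 for g in data) / n - correction   # 4825 - 4335 = 490
sse = sst - ssc                                         # 24
f = (ssc / (k - 1)) / (sse / (k * (n - 1)))             # MSC / MSE
print(sst, ssc, sse, f)  # 514.0 490.0 24.0 122.5
```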

TWO-WAY CLASSIFICATION

SINGLE OBSERVATION PER CELL
A set of observations may be classified according to two criteria at once by means of a rectangular array in which the columns represent one criterion of classification and the rows represent the second criterion of classification.
For example, the rectangular array of observations might be the yields of three varieties of wheat using four different kinds of fertilizer. The yields are given in the table below. Each treatment combination defines a cell in our array, for which we have obtained a single observation.
Table 2
Yields Of Wheat In Bushels Per Acre
Fertilizer Varieties Of Wheat Total
Treatment V1 V2 V3
T1 64 72 74 210
T2 55 57 47 159
T3 59 66 58 183
T4 58 57 53 168
Total 236 252 232 720
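As a quick arithmetic check of Table 2, the sketch below (our own code) recomputes the row totals, column totals, and grand total from the cell entries.

```python
# Wheat yields: rows are fertilizer treatments T1-T4, columns are varieties V1-V3.
yields = [[64, 72, 74],
          [55, 57, 47],
          [59, 66, 58],
          [58, 57, 53]]

row_totals = [sum(row) for row in yields]
col_totals = [sum(col) for col in zip(*yields)]
grand_total = sum(row_totals)
print(row_totals, col_totals, grand_total)
# [210, 159, 183, 168] [236, 252, 232] 720
```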
Table 3

Row       Columns
          1       2      ...     j      ...     c       Total    Mean
1         x11     x12            x1j            x1c     T1.      X̄1.
2         x21     x22            x2j            x2c     T2.      X̄2.
...       ...     ...            ...            ...     ...      ...
i         xi1     xi2            xij            xic     Ti.      X̄i.
...       ...     ...            ...            ...     ...      ...
r         xr1     xr2            xrj            xrc     Tr.      X̄r.
Total     T.1     T.2            T.j            T.c     T..
Mean      X̄.1     X̄.2            X̄.j            X̄.c              X̄..

In this section, we derive formulas that enable us to test whether the variation in our yields is caused by the different varieties of wheat, the different kinds of fertilizer, or differences in both. We shall now generalize and consider a rectangular array consisting of r rows and c columns, as in Table 3, where xij denotes the observation in the ith row and jth column. It will be assumed that the xij are values of independent random variables having normal distributions with means μij and common variance σ². In the table, T.j and X̄.j are the total and mean of the observations in the jth column, and T.. and X̄.. are the total and mean of all rc observations. The average of the population means for the ith row, μi., is defined by

μi. = ∑_{j=1}^{c} μij / c

Similarly, the average of the population means for the jth column, μ.j, is defined by

μ.j = ∑_{i=1}^{r} μij / r

The average of the rc population means, μ, is defined by

μ = ∑_{i=1}^{r} ∑_{j=1}^{c} μij / rc
To determine whether part of the variation in our observations is due to differences among the rows, we consider the test

Ho: μ1. = μ2. = ... = μr. = μ
H1: the μi. are not all equal.

Also, to determine whether part of the variation is due to differences among the columns, we consider the test

Ho: μ.1 = μ.2 = ... = μ.c = μ
H1: the μ.j are not all equal.
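The averages defined above are estimated from the data by the corresponding row and column means. The sketch below (our own illustration) computes them for the wheat-yield data of Table 2.

```python
# Row means estimate the mu_i. and column means estimate the mu_.j of Table 2.
yields = [[64, 72, 74],
          [55, 57, 47],
          [59, 66, 58],
          [58, 57, 53]]
r, c = len(yields), len(yields[0])

row_means = [sum(row) / c for row in yields]        # X-bar i.
col_means = [sum(col) / r for col in zip(*yields)]  # X-bar .j
grand_mean = sum(sum(row) for row in yields) / (r * c)
print(row_means, col_means, grand_mean)
# [70.0, 53.0, 61.0, 56.0] [59.0, 63.0, 58.0] 60.0
```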
