
TOPIC 3: PARAMETRIC METHODS OF ANALYSIS

3.1 Introduction

3.2 Analysis of Variance (ANOVA)

Analysis of Variance (ANOVA) allows us to compare means across more than two groups. It is
a test of hypothesis appropriate for comparing the means of a continuous variable in two or
more independent comparison groups. For example, some clinical trials have more than two
comparison groups: in a clinical trial to evaluate a new medication for asthma, investigators
might compare an experimental medication to a placebo and to a standard treatment.

It is normally used to break down the total variation into components attributable to the different
variables. It enables us to test the significance of the differences among more than two sample
means, that is, to test whether the different samples have been drawn from the same population
with the same characteristics. If x̄1 = x̄2 = x̄3, then the three samples are consistent with having
been drawn from the same population.

3.3 Basis of using ANOVA over other Techniques

If one is examining the means observed among, say, three groups, it might be tempting to
perform three separate pairwise comparisons, but this approach is incorrect because each of
these comparisons fails to take the total data into account, and it increases the likelihood of
incorrectly concluding that there are statistically significant differences, since each comparison
adds to the probability of a type I error. ANOVA avoids these problems by asking a more global
question, that is, whether there are significant differences among the groups, without addressing
differences between any two groups in particular. The fundamental strategy of ANOVA is to
systematically examine the variability within the groups being compared as well as the
variability among the groups being compared.

3.4 Types of ANOVA

3.4.1 One-Way Analysis of Variance


It is suitable for experiments with only one independent variable (factor) with two or more
levels; there is therefore just one explanatory variable. A one-way ANOVA assumes:
 Independence: The value of the dependent variable for one observation is independent of
the value of any other observation.
 Normality: The dependent variable is normally distributed in each group.
 Variance: The variance is comparable in the different experimental groups.
 Continuous: The dependent variable is continuous and can be measured on a scale which
can be subdivided.
For example, assume we are interested in the relationship between number of hours worked per
week and health as measured by BMI. Five categories for number of hours worked are: 0-10,
10-25, 25-35, 35-45, 45+. In this case Health as measured by BMI is the dependent variable and
Number of Hours Worked is the independent variable divided into different levels or groups.

For k samples:
Ho: µ1 = µ2 = µ3 = … = µk (all means equal)
H1: at least one mean is different (not equal)

The test is based on the comparison of two different estimates of the variance of all the samples.
Determine:
1. Variance among the sample means (s_b²)
2. Variance within the samples (s_w²)
If s_b² = s_w², then the samples must have been drawn from the same population.

Variance among the sample means is given by:

s_b² = Σ nᵢ(x̄ᵢ − x̿)² / (k − 1)

Where nᵢ = size of the ith sample

x̄ᵢ = mean of the ith sample
k = number of samples/groups

x̿ = the combined mean, x̿ = Σ nᵢx̄ᵢ / Σ nᵢ

k − 1 = degrees of freedom


Variance within the samples is given by:

s_w² = Σ nᵢsᵢ² / (n − k)

Where
sᵢ² = variance of sample i, sᵢ² = Σ(x − x̄ᵢ)² / nᵢ

(n − k) is the degrees of freedom, with n the total number of observations

If the samples are drawn from the same population then the two variance estimates are equal.

The ratio of the variance among to the variance within, F = s_b² / s_w², is called Fisher's test statistic.

The theoretical F-statistic (F_table) is found by reading the F table at the degrees of freedom of
the numerator (k − 1) and of the denominator (n − k) at the chosen level of significance.
If F_calculated > F_table: Reject Ho
If F_calculated ≤ F_table: Accept Ho

Example
Agriculture extension officers would like to test the effectiveness of 4 different fertilizers on the
yields of tomatoes. They prepared 5 plots for each fertilizer and applied the fertilizers
accordingly. The yields per plot for each of the fertilizers are given below. At the 5% level of
significance, test whether the fertilizers are equally effective.
A B C D
2 3 5 6
3 4 5 8
1 3 5 7
3 5 3 4
1 0 2 10
Total 10 15 20 35
Means 2 3 4 7
Variance 0.8 2.8 1.6 4

Variance among the sample means, with combined mean x̿ = (10 + 15 + 20 + 35)/20 = 4:

x̄ᵢ (x̄ᵢ − x̿) (x̄ᵢ − x̿)² nᵢ(x̄ᵢ − x̿)²
2 -2 4 20
3 -1 1 5
4 0 0 0
7 3 9 45
Total 70

s_b² = 70/(4 − 1) = 23.33

Variance within the samples, s_w² = Σ nᵢsᵢ² / (n − k), where sᵢ² is the variance of sample i and
(n − k) is the degrees of freedom:

s_w² = (5 × 0.8 + 5 × 2.8 + 5 × 1.6 + 5 × 4)/(20 − 4) = 46/16 = 2.875

F_calculated = s_b²/s_w² = 23.33/2.875 = 8.12

The theoretical F-statistic is found at the degrees of freedom of the numerator (k − 1) = 3 and of
the denominator (n − k) = 16: F_0.05(3, 16) = 3.24.

Since 8.12 > 3.24, we reject the Ho.
That is, the means are different. Hence the samples are drawn from different populations; the
fertilizers are not equally effective.
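The worked example can be verified with a short script. The following pure-Python sketch implements the between- and within-sample variance formulas from this section (the function and variable names are illustrative, not from the notes; the notes compute each sample variance with divisor nᵢ, so Σ nᵢsᵢ² = 46 matches the within-sample sum of squares used here):

```python
# One-way ANOVA for the fertilizer example, from first principles.

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Variance among the sample means: sum n_i*(mean_i - grand_mean)^2 / (k-1)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ms_between = ss_between / (k - 1)
    # Variance within the samples: sum (x - mean_i)^2 / (n - k)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, k - 1, n - k

groups = [[2, 3, 1, 3, 1],   # fertilizer A
          [3, 4, 3, 5, 0],   # fertilizer B
          [5, 5, 5, 3, 2],   # fertilizer C
          [6, 8, 7, 4, 10]]  # fertilizer D

F, df1, df2 = one_way_anova(groups)
print(round(F, 2), df1, df2)  # → 8.12 3 16
```

Since 8.12 exceeds the tabulated F_0.05(3, 16) = 3.24, the script reproduces the rejection of Ho.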
The F Distribution Table
The F distribution is a right-skewed distribution used most commonly in Analysis of Variance.
3.4.2 Two-Way Analysis of Variance
Used when there are two independent variables (factors), each of which can have multiple
levels. A two-way ANOVA not only measures the effect of each independent variable on the
dependent variable, but also whether the two factors interact with each other. A two-way
ANOVA assumes:

 Continuous dependent variable: As in a one-way ANOVA, the dependent variable
should be continuous.
 Independence: Each sample is independent of the other samples, with no crossover.
 Variance: The variance in the data across the different groups is the same.
 Normality: The samples are drawn from a normally distributed population.
 Categories: The independent variables should consist of separate categories or groups.

Example
The student marks are presented in the table below; we determine the effect of noise level and
gender on the marks:
Students Low Noise Medium Noise Loud Noise
Male students 10 7 4
12 9 5
11 8 6
9 12 5

Female students 12 13 6
13 15 6
10 12 4
13 12 4

Dependent variable: Student marks

Independent variables: Gender and Noise

Required:
 Does noise have an effect on the marks a student scores?
 Does gender have an effect on the marks a student scores?
 Does gender affect how a student reacts to noise (the joint effect of gender and noise on
the marks)?
Solution
State the null hypotheses:
Ho: Noise has no significant effect on the marks a student scores
Ho: Gender has no significant effect on the marks a student scores
Ho: The interaction of gender and noise has no significant effect on the marks a student scores
Students Low Noise Medium Noise Loud Noise Row Total
Male students 10 7 4 R1 = 98
12 9 5
11 8 6
9 12 5

Female students 12 13 6 R2 = 120


13 15 6
10 12 4
13 12 4

Column Total C1 = 90 C2 = 88 C3 = 40

Correction Factor (CF) = (ΣX)²/N

= (10+12+11+9+7+9+8+12+4+5+6+5+12+13+10+13+13+15+12+12+6+6+4+4)^2 / 24
= 218²/24 = 1980.17

Total Sum of Squares (SS_Total) = ΣX² − CF = 2254 − 1980.17 = 273.83

For Variation in Noise:
SS_Noise = (C1² + C2² + C3²)/(nR) − CF = (90² + 88² + 40²)/8 − 1980.17 = 200.33

For Variation in Gender:
SS_Gender = (R1² + R2²)/(nC) − CF = (98² + 120²)/12 − 1980.17 = 20.17

For the Interaction:
SS_Interaction = SS_Cells − SS_Noise − SS_Gender = 236.83 − 200.33 − 20.17 = 16.33,
where SS_Cells = Σ(cell total)²/n − CF = (42² + 36² + 20² + 48² + 52² + 20²)/4 − 1980.17 = 236.83

Residual Sum of Squares (SS_Residual), the Error:
SS_Residual = SS_Total − SS_Cells = 273.83 − 236.83 = 37.00

Source        Degrees of Freedom (d.f)   Sum of Squares (SS)   Mean of Sum of Squares (MS)   F Ratio

Noise         (C-1) = 2                  200.33                100.17                        48.73

Gender        (R-1) = 1                  20.17                 20.17                         9.81

Interaction   (C-1) x (R-1) = 2          16.33                 8.17                          3.97

Residual      C x R x (n-1) = 18         37.00                 2.06

Total         (N-1) = 23                 273.83

C- No. of columns (noise categories) = 3


R- No of Rows (Student categories) = 2
N – Total number of students = 24
n – No. of students in a group = 4
For Noise:

F_0.05(2, 18) = 3.55. Since 48.73 > 3.55, we reject the Ho. Hence noise does have an effect on marks obtained.

For Gender:
F_0.05(1, 18) = 4.41. Since 9.81 > 4.41, we reject the Ho. Hence gender does have an effect on marks obtained.

For the interaction between Noise and Gender:

F_0.05(2, 18) = 3.55. Since 3.97 > 3.55, we reject the Ho. Hence different genders respond differently to noise.
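The sums of squares above can be reproduced with a short script. This pure-Python sketch computes the correction factor and each sum of squares from the raw marks (the dictionary layout and variable names are illustrative, not from the notes):

```python
# Two-way ANOVA sums of squares for the noise/gender example.

# marks[gender][noise_level] = the 4 students' marks in that cell
marks = {
    "male":   {"low": [10, 12, 11, 9],  "medium": [7, 9, 8, 12],    "loud": [4, 5, 6, 5]},
    "female": {"low": [12, 13, 10, 13], "medium": [13, 15, 12, 12], "loud": [6, 6, 4, 4]},
}

all_vals = [x for g in marks.values() for cell in g.values() for x in cell]
N = len(all_vals)                    # 24 students in total
CF = sum(all_vals) ** 2 / N          # correction factor = (sum X)^2 / N
ss_total = sum(x * x for x in all_vals) - CF

# Column (noise) and row (gender) totals
noise_totals = {lvl: sum(marks[g][lvl][i] for g in marks for i in range(4))
                for lvl in ["low", "medium", "loud"]}
gender_totals = {g: sum(x for cell in marks[g].values() for x in cell) for g in marks}

ss_noise = sum(t * t for t in noise_totals.values()) / 8 - CF    # 8 = n x R
ss_gender = sum(t * t for t in gender_totals.values()) / 12 - CF  # 12 = n x C
ss_cells = sum(sum(cell) ** 2 for g in marks.values() for cell in g.values()) / 4 - CF
ss_interaction = ss_cells - ss_noise - ss_gender
ss_residual = ss_total - ss_cells

print(round(ss_noise, 2), round(ss_gender, 2),
      round(ss_interaction, 2), round(ss_residual, 2))
# → 200.33 20.17 16.33 37.0
```

Dividing each sum of squares by its degrees of freedom and then by the residual mean square reproduces the F ratios in the table above.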

3.5 Multivariate Analysis of Variance (MANOVA)

Multivariate analysis of variance (MANOVA) is an extension of the analysis of variance
(ANOVA) that assesses multiple dependent variables simultaneously. ANOVA statistically tests
the differences between three or more group means. For example, if you have three different
teaching methods and you want to evaluate the average scores for these groups, you can use
ANOVA. However, ANOVA does have a drawback: it can assess only one dependent variable
at a time. This limitation can be an enormous problem in certain circumstances because it can
prevent you from detecting effects that actually exist. MANOVA provides a solution by testing
multiple dependent variables at the same time, hence offering several advantages over ANOVA.

3.5.1 Illustration of MANOVA

For instance, suppose tumor size and PSA concentration are each measured under Treatment A,
Treatment B and Treatment C.

MANOVA, being a multivariate version of ANOVA, tests whether there is a difference between
two or more independent groups based on two or more dependent variables. In our example we
have two dependent variables, that is, Tumor Size and PSA, both of them depending on
Treatment A, Treatment B and Treatment C. The null hypothesis of the MANOVA in this case
states that the mean tumor size is equal for all treatment groups and that the mean PSA
concentration is also equal across the treatments. According to the null hypothesis, the three
treatments should have the same effect on the two variables.

Example

Factor (treatment group) DV1-Tumor Size DV2-PSA


1 4 1
1 2 4
1 1 3
1 4 1
2 5 4
2 6 5
2 5 4
2 4 6
3 6 8
3 8 7
3 8 8
3 6 6
Null Hypotheses
Ho: There is no significant difference among the treatment groups in mean tumor size
Ho: There is no significant difference among the treatment groups in mean PSA

Sum of Squares of the model: SS_model = Σ nⱼ(x̄ⱼ − x̄)², where x̄ⱼ is the group mean and x̄ the grand mean.

Factor DV1 DV2 x̄ⱼ for DV1 x̄ for DV1 x̄ⱼ for DV2 x̄ for DV2

1 4 1 2.75 4.917 2.25 4.75
1 2 4 2.75 4.917 2.25 4.75
1 1 3 2.75 4.917 2.25 4.75
1 4 1 2.75 4.917 2.25 4.75
2 5 4 5 4.917 4.75 4.75
2 6 5 5 4.917 4.75 4.75
2 5 4 5 4.917 4.75 4.75
2 4 6 5 4.917 4.75 4.75
3 6 8 7 4.917 7.25 4.75
3 8 7 7 4.917 7.25 4.75
3 8 8 7 4.917 7.25 4.75
3 6 6 7 4.917 7.25 4.75

SS_model for DV1 = 4(2.75-4.917)^2 + 4(5-4.917)^2 + 4(7-4.917)^2 = 18.7836 + 0.0276 + 17.3556 = 36.17

SS_model for DV2 = 4(2.25-4.75)^2 + 4(4.75-4.75)^2 + 4(7.25-4.75)^2 = 25 + 0 + 25 = 50

Sum of Squares of the error: SS_error = Σ(x − x̄ⱼ)²

Factor DV1 DV2 x̄ⱼ for DV1 (DV1 - x̄ⱼ)^2 x̄ⱼ for DV2 (DV2 - x̄ⱼ)^2
1 4 1 2.75 1.5625 2.25 1.5625
1 2 4 2.75 0.5625 2.25 3.0625
1 1 3 2.75 3.0625 2.25 0.5625
1 4 1 2.75 1.5625 2.25 1.5625
2 5 4 5 0 4.75 0.5625
2 6 5 5 1 4.75 0.0625
2 5 4 5 0 4.75 0.5625
2 4 6 5 1 4.75 1.5625
3 6 8 7 1 7.25 0.5625
3 8 7 7 1 7.25 0.0625
3 8 8 7 1 7.25 0.5625
3 6 6 7 1 7.25 1.5625
Total 12.75 12.25

SS_error for DV1 = 12.75
SS_error for DV2 = 12.25
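The two sums-of-squares tables above can be checked programmatically. This pure-Python sketch computes SS_model and SS_error for each dependent variable (the function name ss_model_error and the tuple layout are illustrative, not from the notes):

```python
# Model and error sums of squares for the MANOVA example.

# (treatment group, tumor size DV1, PSA DV2)
data = [(1, 4, 1), (1, 2, 4), (1, 1, 3), (1, 4, 1),
        (2, 5, 4), (2, 6, 5), (2, 5, 4), (2, 4, 6),
        (3, 6, 8), (3, 8, 7), (3, 8, 8), (3, 6, 6)]

def ss_model_error(data, dv):
    """Return (SS_model, SS_error) for dependent variable index dv (1 or 2)."""
    groups = sorted({row[0] for row in data})
    grand = sum(row[dv] for row in data) / len(data)
    ss_model = ss_error = 0.0
    for g in groups:
        vals = [row[dv] for row in data if row[0] == g]
        mean = sum(vals) / len(vals)
        ss_model += len(vals) * (mean - grand) ** 2      # between-group part
        ss_error += sum((v - mean) ** 2 for v in vals)   # within-group part
    return ss_model, ss_error

print([round(v, 2) for v in ss_model_error(data, 1)])  # DV1 → [36.17, 12.75]
print([round(v, 2) for v in ss_model_error(data, 2)])  # DV2 → [50.0, 12.25]
```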
Calculate the covariance to determine how the two dependent variables are related, both for the
model and for the error. In each case we calculate the cross product.

Cross product for the model:

CP_model = Σ nⱼ(x̄ⱼ for DV1 − x̄ for DV1)(x̄ⱼ for DV2 − x̄ for DV2)

Cross product for the error:

CP_error = Σ(DV1 − x̄ⱼ for DV1)(DV2 − x̄ⱼ for DV2)

Factor DV1 DV2 x̄ⱼ for DV1 x̄ for DV1 x̄ⱼ for DV2 x̄ for DV2 Model cross product Error cross product
1 4 1 2.75 4.917 2.25 4.75 5.4175 -1.5625
1 2 4 2.75 4.917 2.25 4.75 5.4175 -1.3125
1 1 3 2.75 4.917 2.25 4.75 5.4175 -1.3125
1 4 1 2.75 4.917 2.25 4.75 5.4175 -1.5625
2 5 4 5 4.917 4.75 4.75 0 0
2 6 5 5 4.917 4.75 4.75 0 0.25
2 5 4 5 4.917 4.75 4.75 0 0
2 4 6 5 4.917 4.75 4.75 0 -1.25
3 6 8 7 4.917 7.25 4.75 5.2075 -0.75
3 8 7 7 4.917 7.25 4.75 5.2075 -0.25
3 8 8 7 4.917 7.25 4.75 5.2075 0.75
3 6 6 7 4.917 7.25 4.75 5.2075 1.25
Total 42.5 -5.75
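The cross products can be checked the same way. This pure-Python sketch computes CP_model and CP_error for the two dependent variables (names are illustrative, not from the notes):

```python
# Model and error cross products for the MANOVA example.

data = [(1, 4, 1), (1, 2, 4), (1, 1, 3), (1, 4, 1),
        (2, 5, 4), (2, 6, 5), (2, 5, 4), (2, 4, 6),
        (3, 6, 8), (3, 8, 7), (3, 8, 8), (3, 6, 6)]

def cross_products(data):
    """Return (CP_model, CP_error) for the two dependent variables."""
    groups = sorted({row[0] for row in data})
    n = len(data)
    grand1 = sum(r[1] for r in data) / n   # grand mean, DV1
    grand2 = sum(r[2] for r in data) / n   # grand mean, DV2
    cp_model = cp_error = 0.0
    for g in groups:
        rows = [r for r in data if r[0] == g]
        m1 = sum(r[1] for r in rows) / len(rows)   # group mean, DV1
        m2 = sum(r[2] for r in rows) / len(rows)   # group mean, DV2
        cp_model += len(rows) * (m1 - grand1) * (m2 - grand2)
        cp_error += sum((r[1] - m1) * (r[2] - m2) for r in rows)
    return cp_model, cp_error

cp_m, cp_e = cross_products(data)
print(round(cp_m, 2), round(cp_e, 2))  # → 42.5 -5.75
```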

Make the cross product matrices for the model (H) and for the error (E). Each matrix has the
sums of squares on the diagonal and the cross products off the diagonal:

H = [36.17  42.5 ]
    [42.5   50   ]

E = [12.75  -5.75]
    [-5.75  12.25]

Calculate the F-value as the ratio between the model and the error (the ratio between matrix H
and matrix E). Since matrices cannot be divided, what we do is multiply H by the inverse of E.

Inverse of E (det E = 12.75 x 12.25 − (−5.75)² = 123.125):

E⁻¹ = [0.0995  0.0467]
      [0.0467  0.1036]

Multiply H with the inverse of E:

[36.17  42.5 ] x [0.0995  0.0467] = [5.58  6.09]
[42.5   50   ]   [0.0467  0.1036]   [6.56  7.16]

Get the eigenvalues of HE⁻¹. These eigenvalues form the basis of the MANOVA test statistics
(such as Wilks' lambda), which are then compared against critical values to decide whether to
reject the null hypotheses.
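As a check on the final step, this pure-Python sketch builds H and E, inverts E, multiplies, and extracts the eigenvalues of the 2×2 product via its characteristic polynomial. Wilks' lambda, Λ = Π 1/(1 + λᵢ), is shown as one common test statistic built from these eigenvalues (the notes stop at the eigenvalues, so this last step is an assumption; H uses the rounded 36.17 from the notes, so the results differ negligibly from exact arithmetic):

```python
# Eigenvalues of H·E⁻¹ for the 2x2 MANOVA matrices above.
import math

H = [[36.17, 42.5], [42.5, 50.0]]     # model SS (diagonal) and cross product
E = [[12.75, -5.75], [-5.75, 12.25]]  # error SS (diagonal) and cross product

# Inverse of E by the 2x2 formula
detE = E[0][0] * E[1][1] - E[0][1] * E[1][0]
Einv = [[E[1][1] / detE, -E[0][1] / detE],
        [-E[1][0] / detE, E[0][0] / detE]]

# M = H · E⁻¹
M = [[sum(H[i][k] * Einv[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]

# Eigenvalues of a 2x2 matrix from its trace and determinant
tr = M[0][0] + M[1][1]
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
disc = math.sqrt(tr * tr - 4 * det)
eigs = sorted([(tr + disc) / 2, (tr - disc) / 2], reverse=True)

# Wilks' lambda as one common summary of the eigenvalues (assumed next step)
wilks = 1.0
for lam in eigs:
    wilks *= 1.0 / (1.0 + lam)

print([round(l, 3) for l in eigs], round(wilks, 3))  # → [12.744, 0.001] 0.073
```

The small Wilks' lambda reflects that most of the variability is explained by the treatment groups; in practice it would be converted to an approximate F-statistic and compared against the F table.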
