Last Lecture 1

Design and Analysis of Experiments
Dr. Suzan Abdel Rahman
Lecturer, Faculty of Graduate Studies for Statistical Research

Hypothesis Testing: Comparing Means Between More Than Two Populations with One Test: Analysis of Variance (ANOVA)

Comparing Means Between More Than Two Populations: Analysis of Variance (ANOVA)
▪ One-way analysis of variance (ANOVA) is a
statistical method for testing for differences
in the means of three or more groups.

▪ One-way ANOVA can only be used when investigating a single factor (independent variable) and a single dependent variable, when comparing the means of three or more groups.
▪ ANOVA can tell us whether at least one pair of means is significantly different, but it cannot tell us which pair.
▪ ANOVA requires that the dependent variable be normally distributed in each of the groups and that the variability within groups be similar across groups.
One-Way ANOVA
The one-way ANOVA tests the null hypothesis (H0) that three or more population means are equal against the alternative hypothesis (Ha) that at least one of the k population means is different:

H0: μ1 = μ2 = ... = μk
Ha: at least one μi is different,

where μi is the population mean of the ith group (i = 1, 2, 3, ..., k).

This test is also known as:
• One-Factor ANOVA
• One-Way Analysis of Variance
• Between-Subjects ANOVA

▪ The variables used in this test are known as:
• Dependent variable
• Independent variable (also known as the grouping variable, or factor)
• This variable divides cases into two or more mutually exclusive levels, or groups.
▪ Both the One-Way ANOVA and the Independent Samples t Test can compare the means for two groups. However, only the One-Way ANOVA can compare the means across three or more groups.
▪ Note: If the grouping variable has only two groups, then the results of a one-way ANOVA and the independent samples t test will be equivalent. In fact, if you run both an independent samples t test and a one-way ANOVA in this situation, you should be able to confirm that t² = F, as the sketch below illustrates.
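
As a quick illustration, here is a minimal sketch (with hypothetical data, not from the lecture) confirming that t² = F when a one-way ANOVA is run on exactly two groups:

    # A minimal sketch (hypothetical data) confirming that t^2 = F
    # when one-way ANOVA is applied to exactly two groups.
    from scipy import stats

    group_a = [5.1, 4.8, 5.5, 5.0, 4.9]
    group_b = [5.9, 6.1, 5.7, 6.0, 5.8]

    t_stat, t_p = stats.ttest_ind(group_a, group_b)  # independent samples t test
    f_stat, f_p = stats.f_oneway(group_a, group_b)   # one-way ANOVA on 2 groups

    print(t_stat**2, f_stat)  # these two values agree: t^2 = F
    print(t_p, f_p)           # the p-values agree as well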

Assumptions of ANOVA
1. Dependent variable that is continuous (i.e., interval or ratio level)
2. Independent variable that is categorical (i.e., two or more groups; nominal or ordinal)
3. Cases that have values on both the dependent and independent variables
4. Independent samples/groups (i.e., independence of observations): there is no relationship between the subjects in each sample or across samples (a random sample of data from the population)
5. Normal distribution (approximately) of the dependent variable for each group (i.e., for each level of the factor). Non-normal population distributions, especially those that are thick-tailed or heavily skewed, considerably reduce the power of the test
6. Homogeneity of variances (i.e., variances approximately equal across groups)
7. No outliers

Important Note
▪ Note: When the normality, homogeneity of variances, or outliers assumptions for One-Way ANOVA are not met, you may want to run the nonparametric Kruskal-Wallis test instead (a sketch follows below).
• Balanced designs (i.e., the same number of subjects in each group) are ideal; extremely unbalanced designs increase the possibility that violating any of the requirements/assumptions will threaten the validity of the ANOVA F test.
• The One-Way ANOVA indicates whether the model is significant overall, i.e., whether there are any significant differences in the means between any of the groups. However, it does not indicate which mean is different. Determining which specific pairs of means are significantly different requires follow-up t tests.
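
A minimal sketch (with hypothetical data) of the nonparametric Kruskal-Wallis alternative mentioned above:

    # A minimal sketch (hypothetical data) of the Kruskal-Wallis test,
    # an alternative to one-way ANOVA when the normality,
    # equal-variance, or outlier assumptions are violated.
    from scipy import stats

    group_1 = [7.1, 8.3, 6.9, 9.5, 7.7]
    group_2 = [5.2, 6.0, 5.8, 6.4, 5.5]
    group_3 = [8.8, 9.1, 10.2, 8.5, 9.7]

    h_stat, p_value = stats.kruskal(group_1, group_2, group_3)
    print(h_stat, p_value)  # small p-value: the group distributions differ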

Example
(The example data table is not reproduced here.) Five jars are sampled from each of five production lines, giving N = 25 observations in total; we test whether the mean measurement differs across the five lines.
Analysis of Variance (ANOVA)
Notation:
nᵢ = Number of observations for treatment i (in our example, line i)
N = Total number of observations
Yᵢⱼ = The jth observation on the ith treatment (in our example, line i)
Ȳᵢ = The sample mean for the ith treatment (in our example, line i)
Ȳ (grand mean) = The mean of all observations

Sum of Squares
The sum of squares quantifies variability in a data set by focusing on the difference between each data point and the mean of all data points in that data set (SST). The overall variability (SST) can be divided into two parts: the variability due to the model or the factor levels (SSF), and the variability due to random error (SSE).
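
In symbols, using the notation defined above (this formula block is a standard reconstruction from the definitions; the original slide's formulas did not survive extraction):

    SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i}\bigl(Y_{ij}-\bar{Y}\bigr)^{2},\qquad
    SSF = \sum_{i=1}^{k} n_i\bigl(\bar{Y}_{i}-\bar{Y}\bigr)^{2},\qquad
    SSE = \sum_{i=1}^{k}\sum_{j=1}^{n_i}\bigl(Y_{ij}-\bar{Y}_{i}\bigr)^{2},

so that SST = SSF + SSE, where Ȳ is the grand mean and Ȳᵢ is the mean of group i.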

Analysis of Variance (ANOVA)
▪ Between-group variation (which corresponds to variation due to the factor or treatment).
▪ Within-group variation (which corresponds to variation due to chance or error).
▪ So our sum of squares formula is essentially calculating the sum of variation due to differences between the groups (the treatment effect) and variation due to differences within each group (unexplained differences due to chance).
Let's calculate the sums of squares.

(The sum of squares calculation table for the five lines is not reproduced here; a minimal sketch of the calculation follows.)
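
As a minimal sketch (with hypothetical measurements standing in for the missing table), the decomposition SST = SSF + SSE can be computed directly from the definitions:

    # A minimal sketch (hypothetical data): computing SST, SSF (between),
    # and SSE (within) for five lines with five observations each.
    import numpy as np

    data = {
        "A": [9.8, 10.2, 10.1, 9.9, 10.0],
        "B": [10.6, 10.8, 10.4, 10.7, 10.5],
        "C": [9.5, 9.7, 9.6, 9.4, 9.8],
        "D": [10.1, 10.0, 10.3, 9.9, 10.2],
        "E": [10.9, 11.0, 10.7, 10.8, 11.1],
    }

    all_obs = np.concatenate(list(data.values()))
    grand_mean = all_obs.mean()

    # SST: squared deviations of every observation from the grand mean
    sst = ((all_obs - grand_mean) ** 2).sum()

    # SSF: squared deviations of each group mean from the grand mean,
    # weighted by group size
    ssf = sum(len(y) * (np.mean(y) - grand_mean) ** 2 for y in data.values())

    # SSE: squared deviations of each observation from its own group mean
    sse = sum(((np.array(y) - np.mean(y)) ** 2).sum() for y in data.values())

    print(sst, ssf + sse)  # the two agree: SST = SSF + SSE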
Analysis of Variance (ANOVA)
Degrees of Freedom (DF)
▪ Associated with each sum of squares is a quantity called degrees of freedom (DF).
▪ The degrees of freedom indicate the number of independent pieces of information used to calculate each sum of squares.
▪ For a one-factor design with a factor at k levels (five lines in our example) and a total of N observations (five jars per line for a total of 25), the degrees of freedom are as follows:
• DF (Factor) = k − 1 (here 5 − 1 = 4)
• DF (Error) = N − k (here 25 − 5 = 20)
• DF (Total) = N − 1 (here 25 − 1 = 24)

Analysis of Variance (ANOVA)
Mean Squares (MS) and F Ratio
▪ We divide each sum of squares by the corresponding degrees of freedom to obtain mean squares: MS (Factor) and MS (Error).
▪ When the null hypothesis is true (i.e., the means are equal), MS (Factor) and MS (Error) will be about the same size, and their ratio, the F ratio, will be close to one.
▪ The test statistic for a One-Way ANOVA is denoted F. For an independent variable with k groups, the F statistic evaluates whether the group means are significantly different.
▪ When the null hypothesis is not true, MS (Factor) will be larger than MS (Error) and their ratio will be greater than 1.
▪ In our example, the computed F ratio, 6.90, presents significant evidence against the null hypothesis that the means are equal.
(The slides' mean squares and F ratio calculation is not reproduced here; a minimal sketch follows.)
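
Continuing the hypothetical sum of squares sketch above (reusing its data, all_obs, ssf, and sse variables), the mean squares and F ratio are computed as:

    # Continuing the sum of squares sketch above (same variables).
    k = len(data)            # number of groups (lines)
    N = len(all_obs)         # total number of observations

    df_factor = k - 1        # DF (Factor)
    df_error = N - k         # DF (Error)

    ms_factor = ssf / df_factor   # MS (Factor)
    ms_error = sse / df_error     # MS (Error)
    f_ratio = ms_factor / ms_error
    print(ms_factor, ms_error, f_ratio)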

Analysis of Variance (ANOVA)
▪ The ratio of MS (Factor) to MS (Error), the F ratio, has an F distribution.
▪ The F distribution is the distribution of F values that we'd expect to observe when the null hypothesis is true (i.e., the means are equal).
▪ F distributions have different shapes based on two parameters: for an ANOVA test, these are the degrees of freedom associated with MS (Factor) and the degrees of freedom associated with MS (Error).
▪ If your computed F ratio exceeds the critical value from the corresponding F distribution, then, given a sufficiently small p-value, you would reject the null hypothesis that the means are equal.
▪ The p-value in this case is the probability of observing a value greater than the F ratio from the F distribution when in fact the null hypothesis is true, as sketched below.
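
For the jar example, the p-value for the computed F ratio of 6.90 with 4 and 20 degrees of freedom can be found from the F distribution's upper tail; a minimal sketch:

    # p-value for the jar example: P(F > 6.90) under H0,
    # with df_factor = k - 1 = 4 and df_error = N - k = 20.
    from scipy import stats

    p_value = stats.f.sf(6.90, 4, 20)  # upper-tail (survival) probability
    print(p_value)  # well below 0.05, so we reject H0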

Analysis of Variance (ANOVA)
An ANOVA table includes:
1. Source: the sources of variation, including the factor being examined (in our case, lines), error, and total.
2. DF: degrees of freedom for each source of variation.
3. Sum of Squares: the sum of squares (SS) for each source of variation, along with the total from all sources.
4. Mean Square: the sum of squares divided by its associated degrees of freedom.
5. F Ratio: the mean square of the factor (line) divided by the mean square of the error.
6. Prob > F: the p-value.
A minimal sketch of producing such a table in Python appears below.
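
As a minimal sketch (hypothetical data; the lecture itself uses SPSS), the same table layout can be produced in Python with statsmodels:

    # A minimal sketch producing a full one-way ANOVA table
    # (df, sum_sq, mean_sq, F, PR(>F)) with statsmodels.
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    df = pd.DataFrame({
        "line": ["A"] * 5 + ["B"] * 5 + ["C"] * 5,  # hypothetical factor levels
        "y": [9.8, 10.2, 10.1, 9.9, 10.0,
              10.6, 10.8, 10.4, 10.7, 10.5,
              9.5, 9.7, 9.6, 9.4, 9.8],
    })

    model = ols("y ~ C(line)", data=df).fit()
    print(sm.stats.anova_lm(model, typ=1))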

Let's apply this in SPSS

Example
In the same dataset, the variable Sprint is the respondent's time (in seconds) to sprint a given distance, and Smoking is an indicator of whether or not the respondent smokes (0 = Nonsmoker, 1 = Past smoker, 2 = Current smoker). Test whether there is a statistically significant difference in sprint time across smoking status groups.

Note: Sprint time will serve as the dependent variable, and smoking status will act as the independent variable.

Output

ANOVA
Respondent's time (in seconds) to sprint a given distance

                 Sum of Squares    df     Mean Square    F        Sig.
Between Groups   22.710            2      11.355         9.049    .000
Within Groups    513.239           409    1.255
Total            535.949           411

Comment:
This test compares means between more than two populations with one test.

1. H0: μ nonsmoker = μ past smoker = μ current smoker (we assume that H0 is true)
   HA: at least one of the three population means is different

a) The total sum of squares is 535.9, which is the sum of SS Factor (between the groups of smokers) and SS Error (within the groups of smokers).
b) The degrees of freedom between the three groups of smokers is 2, and within the groups (i.e., inside the same group, such as nonsmokers) is 409.
c) The degrees of freedom for the total sum of squares is 411, which is the sum of df (between groups) and df (within groups).
d) The mean square between groups (MS Factor) is 11.355 and the mean square within groups (MS Error) is 1.255.
   *Mean Square: the sum of squares divided by its associated degrees of freedom.
e) The F ratio is 9.049.
   *F Ratio: the mean square of the factor divided by the mean square of the error.
f) The p-value is approximately zero, which is less than 0.05, so we reject the null hypothesis that the three means are equal in favor of the alternative that at least one mean differs. In other words, when the null hypothesis is not true, MS (Factor) is larger than MS (Error) and their ratio (F) is greater than 1.

A minimal sketch replicating this analysis in Python follows.
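
    # A minimal sketch replicating the SPSS one-way ANOVA. The file name
    # "sprint_data.csv" and the loading step are assumptions; the actual
    # dataset is not reproduced in the lecture.
    import pandas as pd
    from scipy import stats

    df = pd.read_csv("sprint_data.csv")
    groups = [g["Sprint"].values for _, g in df.groupby("Smoking")]

    f_stat, p_value = stats.f_oneway(*groups)
    print(f_stat, p_value)  # should match the SPSS output: F = 9.049, Sig. = .000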
