Analysis of Experiments
Dr. Suzan Abdel Rahman
Lecturer, Faculty of Graduate Studies for Statistical Research

Hypothesis Testing: Comparing Means Between More Than Two Populations with One Test: Analysis of Variance (ANOVA)
▪ One-way analysis of variance (ANOVA) is a
statistical method for testing for differences
in the means of three or more groups.
▪ One-way ANOVA can be used only when investigating a single factor (independent variable) and a single dependent variable, while comparing the means of three or more groups.
▪ ANOVA can tell us if at least one pair of means is significantly different, but it can’t tell us which
pair.
▪ ANOVA requires that the dependent variable be normally distributed in each of the groups and
that the variability within groups is similar across groups.
▪ ANOVA tests the null hypothesis (H0) that three or more population means are equal.
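To make this concrete, here is a minimal sketch of a one-way ANOVA in Python using scipy.stats.f_oneway (the three groups below are made-up illustration values, not data from the lecture):

```python
# A minimal one-way ANOVA sketch (illustrative data, not from the lecture).
from scipy import stats

# One factor with three levels (three independent groups).
group_a = [24.1, 25.3, 26.0, 24.8, 25.5]
group_b = [23.2, 22.9, 24.0, 23.5, 23.8]
group_c = [26.1, 27.0, 25.9, 26.5, 26.8]

# H0: the three population means are equal.
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
# A small p-value says at least one mean differs, but not which one.
```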
▪ The variables used in this test are known as:
• Dependent variable
• Independent variable (also known as the grouping variable, or factor)
• This variable divides cases into two or more mutually exclusive levels, or groups.
▪ Both the One-Way ANOVA and the Independent Samples t Test can compare the means for two groups. However, only the One-Way ANOVA can compare the means across three or more groups.
▪ Note: If the grouping variable has only two groups, then the results of a one-way ANOVA and the independent samples t test will be equivalent. In fact, if you run both an independent samples t test and a one-way ANOVA in this situation, you should be able to confirm that t² = F.
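This identity is easy to verify directly; a quick sketch in Python with made-up data (any two groups will do):

```python
# Verify that t^2 = F when comparing exactly two groups (illustrative data).
from scipy import stats

group_1 = [5.1, 4.8, 5.5, 5.0, 4.9]
group_2 = [5.9, 6.1, 5.7, 6.0, 6.2]

t_stat, _ = stats.ttest_ind(group_1, group_2)  # independent samples t test (pooled variances)
f_stat, _ = stats.f_oneway(group_1, group_2)   # one-way ANOVA on the same two groups

print(f"t^2 = {t_stat**2:.4f}, F = {f_stat:.4f}")  # the two values agree
```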
Assumptions of ANOVA
1. Dependent variable that is continuous (i.e., interval or ratio level)
2. Independent variable that is categorical (i.e., two or more groups; nominal or ordinal)
3. Cases that have values on both the dependent and independent variables
4. Independent samples/groups (i.e., independence of observations). There is no relationship between the subjects in each sample and across samples (random sample of data from the population)
5. Normal distribution (approximately) of the dependent variable for each group (i.e., for each level of the factor). Non-normal population distributions, especially those that are thick-tailed or heavily skewed, considerably reduce the power of the test
6. Homogeneity of variances (i.e., variances approximately equal across groups; see the quick checks sketched after this list)
7. No outliers
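Assumptions 5 and 6 can be checked quickly before running the ANOVA. A sketch of the usual checks, assuming scipy and illustrative data (Shapiro-Wilk and Levene are common choices, though the lecture does not prescribe specific tests):

```python
# Quick checks of the normality and equal-variance assumptions (illustrative data).
from scipy import stats

groups = {
    "A": [24.1, 25.3, 26.0, 24.8, 25.5],
    "B": [23.2, 22.9, 24.0, 23.5, 23.8],
    "C": [26.1, 27.0, 25.9, 26.5, 26.8],
}

# Assumption 5: approximate normality within each group (Shapiro-Wilk test).
for name, g in groups.items():
    stat, p = stats.shapiro(g)
    print(f"Group {name}: Shapiro-Wilk p = {p:.3f}")  # p > .05: no evidence against normality

# Assumption 6: homogeneity of variances across groups (Levene's test).
stat, p = stats.levene(*groups.values())
print(f"Levene's test p = {p:.3f}")  # p > .05: variances look similar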
Important Note
▪ Note: When the normality, homogeneity of variances, or outliers assumptions for One-Way ANOVA are not met, you may want to run the nonparametric Kruskal-Wallis test instead (see the sketch after this note).
• Balanced designs (i.e., the same number of subjects in each group) are ideal; extremely unbalanced designs increase the possibility that violating any of the requirements/assumptions will threaten the validity of the ANOVA F test.
• The One-Way ANOVA indicates whether the model is significant overall, i.e., whether there are any significant differences in the means between any of the groups. However, it does not indicate which mean is different. Determining which specific pairs of means are significantly different requires follow-up t tests.
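A minimal sketch of the Kruskal-Wallis fallback in Python, assuming scipy and illustrative data:

```python
# Nonparametric alternative when ANOVA assumptions are not met (illustrative data).
from scipy import stats

group_a = [12.0, 14.5, 13.1, 40.0, 12.8]  # note the extreme value in this group
group_b = [15.2, 16.1, 14.8, 15.5, 15.9]
group_c = [11.0, 10.5, 12.2, 11.8, 11.4]

# Kruskal-Wallis works on ranks, so it does not require normality
# or homogeneity of variances.
h_stat, p_value = stats.kruskal(group_a, group_b, group_c)
print(f"H = {h_stat:.3f}, p = {p_value:.4f}")
```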
Example
[Slide figure: the example data, measurements on five jars from each of five lines (25 observations in total).]
Analysis of Variance (ANOVA)
n_i = the number of observations for treatment i (in our example, line i)
N = the total number of observations
Y_ij = the jth observation on the ith treatment (in our example, line i)
Ȳ_i = the sample mean for the ith treatment (in our example, line i)
The total variability in the data, SS(Total), is partitioned into two components: the variability due to the factor, SS(Factor) or SSF, and the variability due to random error, SS(Error) or SSE.
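Written out with the notation above (Ȳ.. below denotes the grand mean of all N observations, a symbol the slides do not define explicitly), the partition is:

```latex
SS(\mathrm{Total}) = \sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(Y_{ij} - \bar{Y}_{..}\right)^2
= \underbrace{\sum_{i=1}^{k} n_i \left(\bar{Y}_{i} - \bar{Y}_{..}\right)^2}_{SS(\mathrm{Factor})}
+ \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(Y_{ij} - \bar{Y}_{i}\right)^2}_{SS(\mathrm{Error})}
```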
Analysis of Variance (ANOVA)
Degrees of Freedom (DF)
▪ Associated with each sum of squares is a quantity called degrees of freedom (DF).
▪ The degrees of freedom indicate the number of independent pieces of information used to calculate each sum of squares.
▪ For a one-factor design with a factor at k levels (five lines in our example) and a total of N observations (five jars per line for a total of 25), the degrees of freedom are as follows:
DF(Factor) = k - 1 = 4
DF(Error) = N - k = 20
DF(Total) = N - 1 = 24
Analysis of Variance (ANOVA)
Mean Squares (MS) and F Ratio
▪ We divide each sum of squares by the corresponding degrees of freedom to obtain the mean squares, MS(Factor) and MS(Error).
▪ When the null hypothesis is true (i.e., the means are equal), MS(Factor) and MS(Error) would be about the same size. Their ratio, or the F ratio, would be close to one.
▪ The test statistic for a One-Way ANOVA is denoted as F. For an independent variable with k groups, the F statistic evaluates
whether the group means are significantly different.
▪ When the null hypothesis is not true, MS(Factor) will be larger than MS(Error) and their ratio will be greater than 1.
▪ In our example, the computed F ratio, 6.90, presents significant evidence against the null hypothesis that the means are
equal.
Calculating the mean squares and F ratio
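The calculation the heading refers to, written out (these are the standard one-way ANOVA formulas, consistent with the description above):

```latex
MS(\mathrm{Factor}) = \frac{SS(\mathrm{Factor})}{k - 1}, \qquad
MS(\mathrm{Error}) = \frac{SS(\mathrm{Error})}{N - k}, \qquad
F = \frac{MS(\mathrm{Factor})}{MS(\mathrm{Error})}
```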
▪ The F distribution is the distribution of F values that we'd expect to observe when the null
hypothesis is true (i.e. the means are equal).
▪ If your computed F ratio exceeds the critical value from the corresponding F distribution (equivalently, if the p-value is sufficiently small), you would reject the null hypothesis that the means are equal.
▪ The p-value in this case is the probability of observing a value greater than
the F ratio from the F distribution when in fact the null hypothesis is true.
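For the example above (F = 6.90 with 4 and 20 degrees of freedom), this upper-tail probability can be sketched in Python as:

```python
# Upper-tail p-value for the example F ratio (F = 6.90, df = 4 and 20).
from scipy import stats

f_ratio = 6.90
df_factor, df_error = 4, 20  # k - 1 = 4 and N - k = 20 for k = 5 lines, N = 25 jars

p_value = stats.f.sf(f_ratio, df_factor, df_error)  # sf = 1 - cdf (upper tail)
print(f"p = {p_value:.5f}")  # well below 0.05, so reject H0
```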
Let's apply this in SPSS
Example
In the same dataset, the variable Sprint is the respondent's time (in seconds) to sprint a given distance, and Smoking is an indicator of whether or not the respondent smokes (0 = Nonsmoker, 1 = Past smoker, 2 = Current smoker). Test whether there is a statistically significant difference in sprint time across smoking status.
Note: Sprint time will serve as the dependent variable, and smoking status will act as the independent variable.
Output
ANOVA
Respondent's time (in seconds) to sprint a given distance

                 Sum of Squares   df    Mean Square   F       Sig.
Between Groups   22.710           2     11.355        9.049   .000
Within Groups    513.239          409   1.255
Total            535.949          411
Comment:
This test compares means across more than two populations with a single test. Since Sig. = .000 (p < .001), we reject the null hypothesis and conclude that mean sprint time differs significantly across at least one pair of smoking-status groups; the ANOVA alone does not tell us which pair.
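Since the ANOVA does not say which smoking-status groups differ, a follow-up is needed. A minimal sketch of pairwise t tests with a Bonferroni correction in Python (the group data are illustrative stand-ins, not the actual SPSS dataset; SPSS offers the same idea through its Post Hoc options):

```python
# Post hoc pairwise t tests with a Bonferroni correction (illustrative data
# standing in for the sprint-time groups; not the actual SPSS dataset).
from itertools import combinations
from scipy import stats

groups = {
    "Nonsmoker":      [6.2, 6.5, 5.9, 6.1, 6.4],
    "Past smoker":    [6.8, 7.0, 6.6, 6.9, 7.1],
    "Current smoker": [7.5, 7.8, 7.3, 7.6, 7.9],
}

pairs = list(combinations(groups.items(), 2))
for (name_a, a), (name_b, b) in pairs:
    t_stat, p_raw = stats.ttest_ind(a, b)
    p_adj = min(p_raw * len(pairs), 1.0)  # Bonferroni: multiply by number of pairs
    print(f"{name_a} vs {name_b}: t = {t_stat:.2f}, adjusted p = {p_adj:.4f}")
```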