Understanding The One-Way ANOVA
The One-way Analysis of Variance (ANOVA) is a procedure for testing the hypothesis that K population means are equal, where K > 2. The One-way ANOVA compares the means of the samples or groups in order to make inferences about the population means. It is also called a single-factor analysis of variance because there is only one independent variable, or factor. The independent variable has nominal levels or a few ordered levels.

PROBLEMS WITH USING MULTIPLE t TESTS

To test whether pairs of sample means differ by more than would be expected due to chance, we might initially conduct a series of t tests on the K sample means. However, this approach has a major problem: it inflates the Type I error rate. The number of t tests needed to compare all possible pairs of means is K(K − 1)/2, where K is the number of means. When more than one t test is run, each at a specified level of significance (such as α = .05), the probability of making one or more Type I errors in the series of t tests is greater than α. For c independent tests, this Type I error rate is

$$1 - (1 - \alpha)^c$$

where α is the level of significance for each separate t test and c is the number of independent t tests.
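The inflation is easy to see numerically. The Python sketch below counts the pairwise tests for K means and applies 1 − (1 − α)^c. The function name `familywise_error` is ours, for illustration only. The sketch treats the c tests as independent, which is only an approximation for pairwise comparisons among the same set of means.

```python
def familywise_error(k: int, alpha: float = 0.05) -> tuple[int, float]:
    """Return (c, rate): the number of pairwise t tests among k means and
    the familywise Type I error rate 1 - (1 - alpha)**c, assuming the
    c tests are independent (an approximation for pairwise tests)."""
    c = k * (k - 1) // 2          # number of pairwise comparisons
    rate = 1 - (1 - alpha) ** c   # P(at least one Type I error)
    return c, rate


for k in (3, 5):
    c, rate = familywise_error(k)
    print(f"K = {k}: c = {c} tests, familywise error rate = {rate:.3f}")
```

With α = .05, three means require 3 tests and give a familywise rate of about .14. Five means require 10 tests and give about .40.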
Sir Ronald A. Fisher developed the procedure known as analysis of variance (ANOVA), which allows us to test the hypothesis of equality of K population means while maintaining the Type I error rate at the pre-established (a priori) level for the entire set of comparisons.

THE VARIABLES IN THE ONE-WAY ANOVA

In an ANOVA, there are two kinds of variables: independent and dependent. The independent variable is controlled or manipulated by the researcher. It is a categorical (discrete) variable used to form the groupings of observations. However, do not confuse the independent variable with the levels of an independent variable. In the One-way ANOVA, only one independent variable is considered, but there are two or more (theoretically any finite number) levels of the independent variable. The independent variable (or factor) divides individuals into two or more groups or levels. The procedure is a One-way ANOVA because there is only one independent variable.

There are two types of independent variables: active and attribute. If the independent variable is an active variable, then we manipulate the values of the variable to study its effect on another variable. For example, anxiety level is an active independent variable if the researcher induces different levels of anxiety in different groups. An attribute independent variable is a variable that we do not alter during the study. For example, we might want to study the effect of age on weight. We cannot change a person's age, but we can study people of different ages and weights.
The (continuous) dependent variable is defined as the variable that is, or is presumed to be, the result of manipulating the independent variable. In the One-way ANOVA, there is only one dependent variable, and hypotheses are formulated about the means of the groups on that dependent variable. The dependent variable differentiates individuals on some quantitative (continuous) dimension. The ANOVA F test (named after Sir Ronald A. Fisher) evaluates whether the group means on the dependent variable differ significantly from each other. That is, an overall analysis-of-variance test is conducted to assess whether means on a dependent variable are significantly different among the groups.

MODELS IN THE ONE-WAY ANOVA

In an ANOVA, there are two specific types of models that describe how we choose the levels of our independent variable. We can obtain the levels of the treatment (independent) variable in at least two different ways: we could, and most often do, deliberately select them, or we could sample them at random. The way in which the levels are derived has important implications for the generalizations we might draw from our study. For a one-way analysis of variance, the distinction is not particularly critical, but it can become quite important when working with more complex designs such as the factorial analysis of variance.

If the levels of an independent variable (factor) were selected by the researcher because they were of particular interest and/or were all possible levels, it is a fixed model (fixed factor or fixed effect). In other words, the levels did not constitute random samples from some larger population of levels. The treatment levels are deliberately selected and will remain constant from one replication to another. Generalization of such a model can be made only to the levels tested. Although most designs used in behavioral science research are fixed, there is another model we can use in single-factor designs.
If the levels of an independent variable (factor) are randomly selected from a larger population of levels, that variable is called a random model (random factor or random effect). The treatment levels are randomly selected. If we replicated the study, we would again choose the levels randomly and would most likely have a whole new set of levels. Results can be generalized to the population of levels from which the levels of the independent variable were randomly selected.

HYPOTHESES FOR THE ONE-WAY ANOVA

The null hypothesis (H0) tested in the One-way ANOVA is that the population means from which the K samples are selected are equal:

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_K$$

where K is the number of levels of the independent variable. For example, if the independent variable has three levels we would write

$$H_0: \mu_1 = \mu_2 = \mu_3$$

If the independent variable has five levels we would write

$$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$$
The subscripts could be replaced with group indicators. For example:

$$H_0: \mu_{\text{Method 1}} = \mu_{\text{Method 2}} = \mu_{\text{Method 3}}$$

The alternative hypothesis (Ha) is that at least one population mean differs from the others; that is, at least two of the population means are different:

$$H_a: \mu_i \neq \mu_k \text{ for some } i, k$$
The assumption of independence is commonly known as the unforgiving assumption (i.e., the ANOVA is not robust to its violation). This simply means that if the K groups are not independent of each other, one cannot use the one-way analysis of variance. Independence is typically assessed through an examination of the research design: if the groups (categories) are independent of each other, the assumption of independence has been met. If this assumption is not met, the one-way ANOVA is an inappropriate statistic.
Tests of Normality (Shapiro-Wilk): df = 10 for each of the three groups.
In the above example, α = .001, and p = .445 for the Secure Group, p = .314 for the Anxious Group, and p = .876 for the Avoidant Group. None of the Shapiro-Wilk tests is significant, so we would conclude that the scores at each level of the independent variable (Attachment Style) do not depart significantly from a normal distribution. Therefore, the assumption of normality has been met for this sample. The a priori alpha level is based on sample size, and .05 and .01 are commonly used. Tabachnick and Fidell (2007) report that conventional but conservative alpha levels (.01 and .001) are commonly used to evaluate the assumption of normality.
NOTE: Since the Shapiro-Wilk test is rather conservative, most statisticians will agree that it should not be the sole determinant of normality. We typically supplement our assessment of normality with:

- an examination of skewness and kurtosis (a standardized value in excess of ±3.29 is a concern);
- an examination of the histogram (a non-normal shape is a concern);
- an examination of box plots (extreme outliers are a concern);
- an examination of the Normal Q-Q plots (a non-linear pattern is a concern).

These procedures are typically done during data screening. In examining skewness and kurtosis, we divide the skewness (kurtosis) statistic by its standard error. We want to know whether this standard score departs significantly from normality. Concern arises when the skewness (kurtosis) statistic divided by its standard error exceeds 3.29 in absolute value (z = ±3.29, p < .001, two-tailed test) (Tabachnick & Fidell, 2007).

We have several options for handling non-normal data, such as deletion and data transformation. The choice depends on the type and degree of violation as well as the randomness of the missing data points. Any adjustment to the data should be justified (i.e., referenced) based on solid resources (e.g., prior research or statistical references). As a first step, data should be thoroughly screened to ensure that any issues are not the result of missing data or data entry errors. Such errors should be resolved prior to any data analyses using acceptable procedures (see, for example, Howell, 2007, or Tabachnick & Fidell, 2007).
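The standardization of skewness and kurtosis can be sketched in Python as follows. This is a minimal sketch that assumes the bias-adjusted skewness (G1) and excess-kurtosis (G2) statistics, with the standard-error formulas commonly attributed to SPSS. That correspondence is our assumption, so check the values against your own output. The function name `skew_kurt_z` is hypothetical.

```python
import math


def skew_kurt_z(data: list[float]) -> tuple[float, float]:
    """Return (skewness / SE, kurtosis / SE).

    Assumes the bias-adjusted statistics G1 and G2 and their usual
    standard errors (believed, not verified, to match SPSS output).
    """
    n = len(data)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # central moments
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    skew = g1 * math.sqrt(n * (n - 1)) / (n - 2)                # G1
    kurt = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))   # G2
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return skew / se_skew, kurt / se_kurt


z_skew, z_kurt = skew_kurt_z([1, 2, 3, 4, 5])
for label, z in (("skewness", z_skew), ("kurtosis", z_kurt)):
    flag = "concern" if abs(z) > 3.29 else "no concern"
    print(f"{label}: z = {z:.2f} ({flag})")
```

Each standardized value is compared with ±3.29, the p < .001 two-tailed cutoff cited above.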
For Example: For the SCORE variable (shown above), the F value for Levene's test is 1.457 with a Sig. (p) value of .244. Because the Sig. value is greater than our alpha of .05 (p > .05), we retain the null hypothesis (no difference) for the assumption of homogeneity of variance. We conclude that there is not a significant difference between the three groups' variances. That is, the assumption of homogeneity of variance is met.
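Levene's statistic is itself a one-way ANOVA F ratio computed on the absolute deviations of each score from its group mean. The sketch below computes the statistic and its degrees of freedom using only the Python standard library. The helper names `one_way_f` and `levene` are hypothetical. It does not compute the Sig. value, which requires the F distribution (for example, `scipy.stats.levene` with `center='mean'`). Some software centres the deviations on the group median instead (the Brown-Forsythe variant), so values may differ from a given package's output.

```python
def one_way_f(groups: list[list[float]]) -> tuple[float, int, int]:
    """Ordinary one-way ANOVA F ratio with its (df1, df2)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_b = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_w = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df1, df2 = k - 1, n_total - k
    return (ss_b / df1) / (ss_w / df2), df1, df2


def levene(groups: list[list[float]]) -> tuple[float, int, int]:
    """Levene's statistic: a one-way ANOVA on |x - group mean|."""
    deviations = []
    for g in groups:
        m = sum(g) / len(g)
        deviations.append([abs(x - m) for x in g])
    return one_way_f(deviations)


w, df1, df2 = levene([[1, 2, 3], [2, 4, 6]])
print(f"Levene F({df1}, {df2}) = {w:.3f}")
```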
Test of Homogeneity of Variances
VISUAL
Levene Statistic    df1    df2    Sig.
17.570              1      498    .000
For Example: For the VISUAL variable (shown above), the F value for Levene's test is 17.570 with a Sig. (p) value of .000 (i.e., p < .001). Because the Sig. value is less than our alpha of .05 (p < .05), we reject the null hypothesis (no difference) for the assumption of homogeneity of variance. We conclude that there is a significant difference between the two groups' variances. That is, the assumption of homogeneity of variance is not met.

VIOLATION OF THE ASSUMPTIONS OF THE ONE-WAY ANALYSIS OF VARIANCE
If a statistical procedure is little affected by violating an assumption, the procedure is said to be robust with respect to that assumption. The One-way ANOVA is robust with respect to violations of the assumptions, except in the case of unequal variances with unequal sample sizes. That is, the ANOVA can be used when variances are only approximately equal if the number of subjects in each group is equal. Here "equal" can be defined as the larger group size being no more than 1½ times the size of the smaller group. ANOVA is also robust to departures from normality, provided the dependent variable data are at least approximately normally distributed. Thus, you may still use the One-way ANOVA if the assumption of homogeneity of variance is not fully met (the larger group variance being no more than 4 or 5 times the smaller group variance), or, even more so, if the assumption of normality is not fully met.

Generally, failure to meet these assumptions changes the Type I error rate. Instead of operating at the designated level of significance, the actual Type I error rate may be greater or less than α, depending on which assumptions are violated.

- When the populations sampled are not normal, the effect on the Type I error rate is minimal.
- If the population variances differ, there may be a serious problem when sample sizes are unequal. If the larger variance is associated with the larger sample, the F test will be too conservative. If the smaller variance is associated with the larger sample, the F test will be too liberal. (If the α level is .05, conservative means that the actual rate is less than .05, and liberal means that it is greater than .05.)
- If the sample sizes are equal, the effect of heterogeneity of variances (i.e., violating the assumption of homogeneity of variance) on the Type I error rate is minimal.

In other words, the effects of violating the assumptions vary somewhat with the specific assumptions violated.
If there are extreme violations of the assumptions of normality and homogeneity of variance, an alternative test such as the Kruskal-Wallis test should be used instead of the one-way analysis of variance. The Kruskal-Wallis test is a nonparametric test that is used with an independent-groups design employing K groups. It substitutes for the parametric one-way ANOVA when the assumptions of that test are seriously violated. Unlike the parametric ANOVA, the Kruskal-Wallis test assumes neither population normality nor homogeneity of variance, and it requires only ordinal scaling of the dependent variable. It is used when violations of population normality and/or homogeneity of variance are extreme, or when interval or ratio scaling is required but not met by the data.
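As a rough illustration of how the Kruskal-Wallis test works with ranks rather than raw scores, the following sketch computes the H statistic from the pooled ranks. It is a minimal version under stated assumptions:

- tied scores receive their average rank;
- no tie correction is applied;
- H is referred to a chi-square distribution with K − 1 degrees of freedom (the usual large-sample approximation).

The function name `kruskal_wallis_h` is hypothetical. `scipy.stats.kruskal` provides a tie-corrected version with a p value.

```python
def kruskal_wallis_h(groups: list[list[float]]) -> float:
    """Kruskal-Wallis H from pooled ranks (ties get the average rank).

    No tie correction is applied, so H is slightly conservative when
    ties are present.
    """
    pooled = sorted((x, gi) for gi, g in enumerate(groups) for x in g)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # ranks are 1-based
        for _, gi in pooled[i:j + 1]:
            rank_sums[gi] += avg_rank
        i = j + 1
    total = sum(r * r / len(g) for r, g in zip(rank_sums, groups))
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"H = {h:.2f} (compare with chi-square on K - 1 = 2 df)")
```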
Robust Tests of Equality of Means (df1 = 4 for each statistic)
a. Asymptotically F distributed.
The output from the above table should be used only if the equal variance assumption has been violated. From this example, using the Welch statistic, we find that F(4, 21.814) = 9.037, p < .001. If, for example, our a priori alpha level were set at .05, we would conclude that the adjusted F ratio is significant. Since the p value is smaller than α, we would reject the null hypothesis, and we would have permission to proceed and compare the group means. The adjusted F ratios (devised by Welch and by Brown and Forsythe) differ from the ordinary F ratio in much the same way that the adjusted t differs from the ordinary t in the independent-samples t test. In both cases, the adjustment replaces the pooled error term with one based on the separate group variances. The denominator degrees of freedom are adjusted accordingly, which is why df2 here is the fractional value 21.814.
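Welch's adjusted F ratio can be sketched as follows. This is a minimal sketch of the commonly published Welch formula, which weights each group mean by n/s². The function name `welch_anova` and the example data are ours. The p value, which requires the F distribution with the fractional df2, is not computed here. A useful check is that with two groups, Welch's F equals the square of the unequal-variance (Welch) t statistic.

```python
def welch_anova(groups: list[list[float]]) -> tuple[float, int, float]:
    """Welch's adjusted F ratio; returns (F, df1, df2)."""
    k = len(groups)
    ns = [len(g) for g in groups]
    means = [sum(g) / n for g, n in zip(groups, ns)]
    # Separate (unpooled) sample variances for each group.
    variances = [sum((x - m) ** 2 for x in g) / (n - 1)
                 for g, m, n in zip(groups, means, ns)]
    w = [n / v for n, v in zip(ns, variances)]        # weights n / s^2
    w_sum = sum(w)
    mean_w = sum(wi * m for wi, m in zip(w, means)) / w_sum
    a = sum(wi * (m - mean_w) ** 2 for wi, m in zip(w, means)) / (k - 1)
    lam = sum((1 - wi / w_sum) ** 2 / (n - 1) for wi, n in zip(w, ns))
    b = 1 + 2 * (k - 2) / (k * k - 1) * lam
    df2 = (k * k - 1) / (3 * lam)                     # fractional df
    return a / b, k - 1, df2


g1 = [1.0, 2.0, 3.0, 4.0]
g2 = [2.0, 4.0, 6.0, 8.0, 10.0]
f, df1, df2 = welch_anova([g1, g2])
print(f"Welch F({df1}, {df2:.3f}) = {f:.3f}")
```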
SUMMARY TABLE FOR THE ONE-WAY ANOVA

Summary ANOVA
Source     Sum of Squares   Degrees of Freedom   Variance Estimate (Mean Square)   F Ratio
Between    SSB              K − 1                MSB = SSB / (K − 1)               MSB / MSW
Within     SSW              N − K                MSW = SSW / (N − K)
Total      SST              N − 1

$$F_{obt} = \frac{MS_B}{MS_W}$$
This first component (MSW) reflects the differences observed among subjects exposed to the same treatment. It is assumed that within-groups variation of a similar magnitude exists in each of the groups. This variation within any one group is a function of the specific subjects selected at random for the group or allocated at random to the group. Therefore, we can attribute variation within a group to random sampling fluctuation (i.e., sampling error, which is why MSW is also referred to as ERROR).

The second component (MSB) has to do with the differences among group means. Even if there were absolutely no treatment effects, it would be unlikely that the sample means for the groups would be identical. A more reasonable expectation is that the group means will differ, even without treatment, simply as a function of individual differences among the subjects. Thus, we would expect the group means to vary somewhat due to the random selection (assignment) process in the formation of the groups. If, in addition, different treatments that do have an effect on the dependent variable are applied to the different groups, we can expect even larger differences among the group means. The between-groups variation therefore reflects variation due to the treatment plus variation attributable to the random process by which subjects are selected and assigned to groups. That is, it reflects the treatment effect of the independent variable plus error. This means that

$$F_{obt} = \frac{\text{treatment effect} + \text{error}}{\text{error}}$$
When the null hypothesis is true (no difference between the group means, or no treatment effect), we would expect F to be approximately equal to 1. Note that the observed mean squares merely estimate the parameters, and these estimates may be larger or smaller than the corresponding parameters. Therefore, it is possible to have an observed F ratio less than 1.00, even though conceptually the ratio cannot be less than 1.00. F increases with the effect of the independent variable. Thus, the larger the F ratio is, the more reasonable it is that the independent variable has had a real effect. If the F ratio is less than 1.00, we don't even need to compare it with the critical value of F (Fcrit). The treatment clearly has not had a significant effect, and we can immediately retain H0.
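The entries of the summary table can be computed directly from raw scores. The sketch below is a minimal illustration; the function name `anova_table` and the toy data are hypothetical. The resulting Fobt would then be compared with Fcrit from an F table with (K − 1, N − K) degrees of freedom.

```python
def anova_table(groups: list[list[float]]) -> dict[str, float]:
    """Sums of squares, df, mean squares, and F for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between: spread of group means around the grand mean.
    ss_b = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    # Within: spread of scores around their own group mean (error).
    ss_w = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_b = ss_b / (k - 1)
    ms_w = ss_w / (n - k)
    return {"SSB": ss_b, "SSW": ss_w, "SST": ss_b + ss_w,
            "dfB": k - 1, "dfW": n - k, "dfT": n - 1,
            "MSB": ms_b, "MSW": ms_w, "F": ms_b / ms_w}


table = anova_table([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
for key, value in table.items():
    print(f"{key}: {value:g}")
```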
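The proportion of variance in the dependent variable accounted for by the independent variable can be estimated with omega squared (ω²):

$$\omega^2 = \frac{SS_B - (K - 1)MS_W}{SS_T + MS_W}$$

For example, if we calculated ω² = .3928, this means that the independent variable in the ANOVA accounts for approximately 39.28% of the total variance in the dependent variable.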
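Applying the omega-squared formula to the teaching-method example reported later in this document reproduces the reported ω² = .162. That example's Table 2 gives SSB = 736.53, SST = 3632.00 and MSW = 68.94, with K = 3. The function name below is ours.

```python
def omega_squared(ss_b: float, ss_t: float, ms_w: float, k: int) -> float:
    """Omega squared: (SSB - (K - 1) * MSW) / (SST + MSW)."""
    return (ss_b - (k - 1) * ms_w) / (ss_t + ms_w)


# Values from the teaching-method example (Table 2), K = 3 groups.
w2 = omega_squared(ss_b=736.53, ss_t=3632.00, ms_w=68.94, k=3)
print(f"omega squared = {w2:.3f}")
```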
EFFECT SIZE
Effect size, broadly, is any of several measures of association or of the strength of a relation, such as Pearson's r or eta (η). Effect size is thought of as a measure of practical significance. It is defined as the degree to which a phenomenon exists. Keep in mind that there are several acceptable measures of effect size. The choice should be based on solid references for the specific analysis being conducted.

So why bother with effect size at all? Any observed difference between two sample means can be found to be statistically significant when the sample sizes are sufficiently large. In such a case, a small difference with little practical importance can be statistically significant. On the other hand, a large difference with apparent practical importance can be nonsignificant when the sample sizes are small. Effect sizes provide another measure of the magnitude of the difference, expressed in standard deviation units of the original measurement. Thus, with the test of statistical significance (e.g., the F statistic) and the interpretation of the effect size (ES), the researcher can address issues of both statistical significance and practical importance.

When we find significant pairwise differences, we will need to calculate an effect size by hand for each of the significant pairs. An examination of the group means will tell us which group in each pair performed significantly higher than the other. For example, we can use the following formula:
$$ES = \frac{\bar{X}_i - \bar{X}_j}{\sqrt{MS_W}}$$
Note that X̄i − X̄j (which can also be written as X̄i − X̄k) is the mean difference of the two groups (pairs) under consideration. This value can be calculated by hand or found in the Mean Difference (I-J) column of the Multiple Comparisons table in SPSS. MSW is the Within Groups Mean Square value (a.k.a. Mean Square Within or ERROR), which is found in the ANOVA Summary Table.

Suppose that the mean for the Red Group = 16.60, the mean for the Green Group = 11.10, and the Mean Square Within (MSW) found in the ANOVA table = 16.136. We would find that ES = 1.37. That is, the mean difference of 5.50 is 1.37 standard deviation units away from the hypothesized mean difference of 0 (recall that H0: μ1 − μ2 = 0). For Red / Green, we find

$$ES = \frac{16.60 - 11.10}{\sqrt{16.136}} = \frac{5.50}{4.017} = 1.37$$
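The pairwise effect size is a one-line calculation. The sketch below applies it to the Red/Green example and can be reused for any significant pair; the function name `pairwise_es` is hypothetical.

```python
import math


def pairwise_es(mean_i: float, mean_j: float, ms_w: float) -> float:
    """Standardized mean difference (Xbar_i - Xbar_j) / sqrt(MSW)."""
    return (mean_i - mean_j) / math.sqrt(ms_w)


print(f"Red vs Green: ES = {pairwise_es(16.60, 11.10, 16.136):.2f}")
```

Applied to the teaching-method example later in this document (92.67 vs. 82.80 with MSW = 68.94), the same function gives about 1.19. This matches the effect size reported for that comparison.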
2. SET THE CRITERION FOR REJECTING H0
3. TEST THE ASSUMPTIONS FOR THE ONE-WAY ANOVA

Using the obtained statistic value (compared to the critical value):
If Fobt < Fcrit, retain H0.
If Fobt > Fcrit, reject H0.
7. DECIDE WHETHER TO RETAIN OR REJECT H0 FOR EACH OF THE PAIRWISE COMPARISONS (I.E., CONDUCT POST HOC PROCEDURES)
If the null hypothesis is rejected, use the appropriate post hoc procedure to determine whether unique pairwise comparisons are significant. The choice of post hoc procedure is based on whether the assumption of homogeneity of variance was met (e.g., Tukey HSD) or not (e.g., Games-Howell). Calculate an effect size for each significant pairwise comparison.

9. INTERPRET THE RESULTS
10. WRITE A RESULTS SECTION BASED ON THE FINDINGS
Group                   n     Mean     SD
Lecture Only            15    82.80    9.59
Hands-On Only           15    88.53    8.73
Lecture and Hands-On    15    92.67    6.22
Total                   45    88.00    9.09
An alpha level of .05 was used for all analyses. The test for homogeneity of variance was not significant [Levene F(2, 42) = 1.46, p > .05], indicating that this assumption underlying the application of ANOVA was met. The one-way ANOVA of standardized test scores (see Table 2) revealed a statistically significant main effect [F(2, 42) = 5.34, p < .01]. This indicates that the mean standardized test scores were not the same for all three teaching methods. The ω² = .162 indicated that approximately 16% of the variation in standardized test scores is attributable to differences between the three teaching methods.

Table 2
Source            SS         df    MS        F       p
Between Groups    736.53     2     368.27    5.34    .009
Within Groups     2895.47    42    68.94
Total             3632.00    44
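As a consistency check, the entries of Table 2 can be approximately reproduced from the group sizes, means and standard deviations in the descriptive table above. The check uses SSB = Σ nᵢ(X̄ᵢ − X̄)² and SSW = Σ (nᵢ − 1)sᵢ². The sketch below does this; the function name `anova_from_summary` is hypothetical. Because the means and SDs are rounded to two decimals, SSB and SSW come out slightly different (about 736.9 and 2896.2), while F still rounds to 5.34.

```python
def anova_from_summary(ns: list[int], means: list[float],
                       sds: list[float]) -> tuple[float, float, float]:
    """Return (SSB, SSW, F) from group sizes, means, and SDs."""
    k, n_total = len(ns), sum(ns)
    grand = sum(n * m for n, m in zip(ns, means)) / n_total
    ss_b = sum(n * (m - grand) ** 2 for n, m in zip(ns, means))
    ss_w = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))  # pooled SS
    f = (ss_b / (k - 1)) / (ss_w / (n_total - k))
    return ss_b, ss_w, f


# Lecture Only, Hands-On Only, Lecture and Hands-On (rounded summaries).
ss_b, ss_w, f = anova_from_summary([15, 15, 15],
                                   [82.80, 88.53, 92.67],
                                   [9.59, 8.73, 6.22])
print(f"SSB = {ss_b:.2f}, SSW = {ss_w:.2f}, F(2, 42) = {f:.2f}")
```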
Post hoc comparisons using Tukey procedures were used to determine which pairs of the three group means differed. These results are given in Table 3. They indicate that students who had received the lecture and hands-on teaching method (M = 92.67) scored significantly higher on the standardized test than did students who had received the lecture-only teaching method (M = 82.80). The effect size for this significant pairwise difference was 1.19.

Table 3
Tukey Post Hoc Results and Effect Size of Standardized Test Scores by Teaching Method
Mean Differences (X̄i − X̄k) (Effect Size is indicated in parentheses)
Groups: 1. Lecture Only, 2. Hands-On Only, 3. Lecture and Hands-On
** p < .01
REFERENCES
Green, S. B., & Salkind, N. J. (2003). Using SPSS for Windows and Macintosh: Analyzing and Understanding Data (3rd ed.). Upper Saddle River, NJ: Prentice Hall.
Hinkle, D. E., Wiersma, W., & Jurs, S. G. (2003). Applied Statistics for the Behavioral Sciences (5th ed.). New York: Houghton Mifflin Company.
Howell, D. C. (2002). Statistical Methods for Psychology (5th ed.). Pacific Grove, CA: Duxbury.
Morgan, G. A., Leech, N. L., Gloeckner, G. W., & Barrett, K. C. (2004). SPSS for Introductory Statistics: Use and Interpretation (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Pagano, R. R. (2004). Understanding Statistics in the Behavioral Sciences (7th ed.). Belmont, CA: Thomson/Wadsworth.
Tabachnick, B. G., & Fidell, L. S. (2007). Using Multivariate Statistics (5th ed.). Needham Heights, MA: Allyn & Bacon.