What Is Analysis of Variance
What Is ANOVA?
ANOVA stands for Analysis of Variance. It is a statistical method used to analyze the
differences between the means of two or more groups or treatments. It is often
used to determine whether there are any statistically significant differences
between the means of different groups.
An ANOVA test is a way to find out whether survey or experiment results are significant. In
other words, it helps you decide whether to reject the null hypothesis in favor of the
alternative hypothesis.
One-Way ANOVA
This is the simplest type of ANOVA. It is used when you have one independent variable
with two or more levels or groups (typically three or more, since two groups can be
compared with a t-test). It tests whether there are significant differences in the
means of these groups.
For example, we took three different groups of ten randomly selected students (all of the
same age) from three different classrooms. Each classroom was provided with a different
environment for students to study. Classroom A had constant music being played in the
background, classroom B had variable music being played, and classroom C was a regular
class with no music playing. After one month, we collected their test scores. The test scores
that we obtained were as follows:
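Looking at the second table, we might assume that the mean score of students from Group A
is clearly greater than that of the other two groups, so the treatment must be helpful.
However, a difference in sample means alone does not show that the difference is
statistically significant, which is what ANOVA tests.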
A one-way ANOVA tells us whether at least two groups are different from each other. But it
won’t tell us which groups are different. If our test returns a significant F-statistic, we may
need to run a post-hoc test to tell us exactly which groups differ in means.
Now, let us perform one-way ANOVA in Microsoft Excel along with a post-hoc test.
Step 1: Input your data into columns or rows in Excel. For example, if three groups of
students for music treatment are being tested, spread the data into three columns.
Step 2: Click the “Data” tab and then click “Data Analysis.” If you don’t see Data
Analysis, load the ‘Data Analysis Toolpak’ add-in.
Step 3: Click “ANOVA Single Factor” and then click “OK.”
Step 4: Type an input range into the Input Range box. For example, if the data is in cells A1
to C10, type “A1:C10” into the box. Check the “Labels in first row” box if the data has
column headers, and select the Rows radio button if the data is grouped by rows.
Step 5: Select an output range. For example, click the “New Worksheet” radio button.
Step 6: Choose an alpha level. For most hypothesis tests, 0.05 is standard.
Step 7: Click “OK.” The results from ANOVA will appear in the worksheet.
The results for our example look like this:
Here, we can see that the F-value is greater than the F-critical value for the alpha level
selected (0.05). Therefore, we have evidence to reject the null hypothesis and say that at least
one of the three samples has a significantly different mean and thus likely comes from a
different population.
Another measure for ANOVA is the p-value. If the p-value is less than the alpha level
selected (which it is, in our case), we reject the null hypothesis.
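The F-value that Excel reports can also be reproduced by hand. The following is a minimal Python sketch, not the Excel implementation. The scores are hypothetical stand-ins for the three classrooms (the article's own table is not reproduced here), the function name one_way_anova is ours, and the critical value 3.35 is read from a standard F-table for alpha = 0.05 with (2, 27) degrees of freedom.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Between-group sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: how far each score is from its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical test scores, ten students per classroom (illustration only).
class_a = [8, 9, 6, 8, 7, 9, 6, 10, 7, 5]
class_b = [5, 3, 6, 2, 7, 5, 4, 4, 2, 3]
class_c = [6, 2, 3, 5, 3, 4, 6, 5, 6, 3]

f_stat, df_b, df_w = one_way_anova([class_a, class_b, class_c])
f_crit = 3.35  # assumption: F-table value for alpha = 0.05, df = (2, 27)
print(f"F = {f_stat:.2f} with df = ({df_b}, {df_w})")
print("Reject the null hypothesis" if f_stat > f_crit else "Fail to reject the null hypothesis")
```

The decision rule in the last line is the same comparison described above: reject the null hypothesis only when the F-value exceeds the F-critical value.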
There are several methods for finding out which samples come from different populations:
1. Bonferroni approach
2. Least significant difference test
3. Tukey’s HSD
We won’t be covering all of these in this article, but I suggest you go through them.
Now to check which samples had different means, we will take the Bonferroni approach and
perform the post hoc test in Excel.
Step 8: Again, click on “Data Analysis” in the “Data” tab and select “t-Test: Two-Sample
Assuming Equal Variances,” and click “OK.”
Step 9: Input the range of the Class A column in the Variable 1 Range box and the range of
the Class B column in the Variable 2 Range box. Check the “Labels” box if you have column
headers in the first row.
Step 10: Select an output range. For example, click the “New Worksheet” radio button.
Step 11: Repeat Steps 8 to 10 for the Class B and Class C columns, and again for the
Class A and Class C columns.
Here, we can see that the p-values for (A vs B) and (A vs C) are less than the alpha level
selected (alpha = 0.05). This means that, if the two groups in each pair came from the same
population, a difference as large as the one observed would be expected less than 5% of the
time. For (B vs C), the p-value is much greater than the significance level, so we have no
evidence that B and C come from different populations. Note that a strict Bonferroni
correction compares each p-value with alpha divided by the number of comparisons
(0.05 / 3 ≈ 0.0167) rather than with 0.05 itself. So, the results suggest that A (the
constant music group) comes from a different population. Or we can say that the constant
music had a significant effect on students’ performance.
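The pairwise comparison can be sketched in Python as well. This is a rough illustration under stated assumptions: the scores are hypothetical, the helper names pooled_t_statistic and bonferroni_alpha are ours, and only the t-statistic is computed (the p-value itself would come from Excel or a statistics library). It shows the equal-variance t-statistic for each pair and the Bonferroni-adjusted alpha that each p-value would be compared with.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(x, y):
    """t-statistic for two independent samples, assuming equal variances."""
    nx, ny = len(x), len(y)
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var * (1 / nx + 1 / ny))


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level under the Bonferroni correction."""
    return alpha / n_comparisons


# Hypothetical test scores (illustration only).
groups = {
    "A": [8, 9, 6, 8, 7, 9, 6, 10, 7, 5],
    "B": [5, 3, 6, 2, 7, 5, 4, 4, 2, 3],
    "C": [6, 2, 3, 5, 3, 4, 6, 5, 6, 3],
}

pairs = list(combinations(groups, 2))  # A-B, A-C, B-C
adjusted = bonferroni_alpha(0.05, len(pairs))
for first, second in pairs:
    t = pooled_t_statistic(groups[first], groups[second])
    print(f"{first} vs {second}: t = {t:.2f}, compare its p-value with {adjusted:.4f}")
```

With three groups there are three pairwise comparisons, so each p-value is judged against 0.05 / 3 instead of 0.05.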
An effect size measure for one-way ANOVA is called Eta squared. It works in the
same way as R2 for t-tests. It measures what proportion of the total variability in the
scores is due to the between-group difference. Eta squared is calculated as:
Eta squared = SS (between) / SS (total)
Hence, 60% of the variability in the scores is explained by the approach that was used.
The remaining 40% is unexplained. Eta squared therefore helps us judge how strongly the
independent variable affects the dependent variable, as opposed to chance or any other
factor.
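Eta squared is the ratio of the between-group sum of squares to the total sum of squares, which is straightforward to compute directly. The sketch below uses hypothetical scores and a function name of our own choosing, so its result will not match the 60% from the article's data.

```python
from statistics import mean


def eta_squared(groups):
    """Proportion of total variability explained by group membership."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    return ss_between / ss_total


# Hypothetical test scores (illustration only).
class_a = [8, 9, 6, 8, 7, 9, 6, 10, 7, 5]
class_b = [5, 3, 6, 2, 7, 5, 4, 4, 2, 3]
class_c = [6, 2, 3, 5, 3, 4, 6, 5, 6, 3]

print(f"Eta squared = {eta_squared([class_a, class_b, class_c]):.2f}")
```

A value close to 1 means that most of the variability is between groups, while a value close to 0 means that group membership explains very little.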
There are commonly two types of ANOVA tests for univariate analysis – One-Way ANOVA
and Two-Way ANOVA. One-way ANOVA is used when we are interested in studying the
effect of one independent variable (IDV)/factor on a population. In contrast, Two-way
ANOVA is used for studying the effects of two factors on a population simultaneously.
Two-Way ANOVA
In the one-way ANOVA example, the treatment was applied to students of the same age. What
if the treatment affects different age groups of students in different ways?
For such cases, when the outcome or dependent variable (in our case, the test scores) is
affected by two independent variables/factors, we use a slightly modified technique called
two-way ANOVA.
In the one-way ANOVA test, we found that the groups subjected to ‘variable music’ and ‘no
music at all’ performed more or less equally. This suggests that the variable music treatment
did not have any significant effect on the students.
So, while performing two-way ANOVA, we will not consider the “variable music” treatment
for simplicity of calculation. Rather a new factor, age, will be introduced to find out how the
treatment performs when applied to students of different age groups. This time our dataset
looks like this:
Here, there are two factors, class and age group, with two and three levels, respectively. So
we now have six different groups of students based on the different combinations of class
and age group, and each group has a sample size of 5 students.
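The group count follows from multiplying the number of levels of each factor. A small Python sketch makes the arithmetic explicit. The level labels are placeholders of our own, since the article's dataset table is not reproduced here.

```python
from itertools import product

music_levels = ["constant music", "no music"]
age_levels = ["age group 1", "age group 2", "age group 3"]  # placeholder labels
students_per_group = 5

# Every combination of one music level with one age level is a separate group.
groups = list(product(music_levels, age_levels))
total_students = len(groups) * students_per_group
# Within-group degrees of freedom: one is lost per group mean.
df_within = len(groups) * (students_per_group - 1)

print(len(groups), total_students, df_within)  # 6 30 24
```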
Two-way ANOVA tells us about the main effects and the interaction effect. A main effect is
similar to a one-way ANOVA, where the effects of music and age are measured
separately. In comparison, the interaction effect is the one where both music and age are
considered at the same time, i.e., whether the effect of one factor depends on the level of
the other.
That’s why a two-way ANOVA can have up to three hypotheses.
Two null hypotheses will be tested if we have placed only one observation in each cell. For
this example, those hypotheses will be:
H1: All the music treatment groups have an equal mean score.
H2: All the age groups have an equal mean score.
With more than one observation in each cell, as in our example, a third null hypothesis is
also tested:
H3: There is no interaction between the music treatment and the age group.
The table shown is known as a contingency table. Here, one set of marginal totals represents
the total of the samples based only on factor 1, and the other represents the total of the
samples based only on factor 2. We will see in some time that these two are responsible for
the main effects produced. Also, a subtotal is introduced for each combination of factor 1
and factor 2. This term will be responsible for the interaction effect produced when both
the factors are considered simultaneously. And we are already familiar with the grand total,
which is the sum of all the observations (test scores), irrespective of the factors.
We have calculated all the means – sound class mean, age group mean, and mean of every
group combination in the above table.
Now, calculate the sum of squares (SS) and degrees of freedom (df) for sound class, age
group, and the interaction between the two factors.
We already know how to calculate SS (within)/df (within) in our one-way ANOVA section,
but in two-way ANOVA, the formula is different. Let’s look at the calculation of two-way
ANOVA:
In two-way ANOVA, we also calculate SS (interaction) and df (interaction), which capture the
combined effect of the two factors.
Since we have more than one source of variation (main effects and interaction effect), we
will also have more than one F-statistic.
Using these variances, we compute the F-statistic for each main effect and for the
interaction effect. So, the values of the F-statistic are:
F1 = 12.16
F2 = 15.98
F12 = 0.36
The critical values from the F-table are:
Fcrit1 = 4.25
Fcrit2 = 3.40
Fcrit12 = 3.40
If, for a particular effect, the F-value is greater than its respective F-critical value
(taken from the F-table), we reject the null hypothesis for that particular effect.
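The calculation of the three F-statistics can be sketched in Python for a balanced design (the same number of observations in every cell). This is an illustrative sketch only: the scores and the age labels are hypothetical, the function name two_way_anova is ours, and the F values it prints will not match the ones above, which come from the article's own data. The critical values are the ones quoted above.

```python
from statistics import mean


def two_way_anova(cells):
    """cells maps (factor-1 level, factor-2 level) to a list of scores.

    Assumes a balanced design. Returns the three F-statistics and the
    degrees of freedom (factor 1, factor 2, interaction, within).
    """
    levels_1 = sorted({a for a, _ in cells})
    levels_2 = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))  # observations per cell
    everything = [x for obs in cells.values() for x in obs]
    grand = mean(everything)

    # Marginal means for each level of each factor.
    mean_1 = {a: mean([x for b in levels_2 for x in cells[(a, b)]]) for a in levels_1}
    mean_2 = {b: mean([x for a in levels_1 for x in cells[(a, b)]]) for b in levels_2}

    ss_1 = n * len(levels_2) * sum((m - grand) ** 2 for m in mean_1.values())
    ss_2 = n * len(levels_1) * sum((m - grand) ** 2 for m in mean_2.values())
    ss_within = sum((x - mean(obs)) ** 2 for obs in cells.values() for x in obs)
    ss_total = sum((x - grand) ** 2 for x in everything)
    # In a balanced design the interaction is what remains of the total.
    ss_inter = ss_total - ss_1 - ss_2 - ss_within

    df_1, df_2 = len(levels_1) - 1, len(levels_2) - 1
    df_inter = df_1 * df_2
    df_within = len(levels_1) * len(levels_2) * (n - 1)
    ms_within = ss_within / df_within
    return {
        "F1": (ss_1 / df_1) / ms_within,
        "F2": (ss_2 / df_2) / ms_within,
        "F12": (ss_inter / df_inter) / ms_within,
        "df": (df_1, df_2, df_inter, df_within),
    }


# Hypothetical scores: 2 music levels x 3 age groups x 5 students.
scores = {
    ("constant", "age 1"): [7, 9, 5, 8, 6],
    ("constant", "age 2"): [8, 10, 9, 7, 9],
    ("constant", "age 3"): [6, 8, 7, 9, 8],
    ("none", "age 1"): [4, 3, 6, 2, 5],
    ("none", "age 2"): [6, 7, 5, 8, 6],
    ("none", "age 3"): [5, 4, 6, 5, 7],
}

result = two_way_anova(scores)
critical = {"F1": 4.25, "F2": 3.40, "F12": 3.40}  # values quoted in the article
for name, f_crit in critical.items():
    verdict = "reject" if result[name] > f_crit else "fail to reject"
    print(f"{name} = {result[name]:.2f} (F crit {f_crit}): {verdict} the null hypothesis")
```

Obtaining the interaction sum of squares by subtraction is a shortcut that holds only for balanced data. Unbalanced designs need a different approach.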
Now, let us perform two-way ANOVA in Excel.
Step 1: Click the “Data” tab and then click “Data Analysis.” If you don’t see the Data
Analysis option, load the Data Analysis Toolpak add-in.
Step 2: Click “Anova: Two-Factor With Replication” and then click “OK.” The two-way
ANOVA window will open.
Step 3: Type an Input Range into the Input Range box. For example, if your data is in cells
A1 to A25, type “A1:A25” into the Input Range box. Ensure you include all of your data,
including headers and group names.
Step 4: Type a number in the “Rows per sample” box. The name is a bit misleading: it is
asking how many observations are in each group. For example, if you have 5 individuals in
each group, as in our example, you would type “5” into the Rows per Sample box.
Step 5: Select an Output Range. For example, click the “new worksheet” radio button to
display the data in a new worksheet.
Step 6: Select an alpha level. An alpha level of 0.05 (5 percent) works for most tests.
Step 7: Click “OK” to run the two-way ANOVA. The data will be returned in your specified
output range.
Step 8: Read the results. To figure out whether you are going to reject the null hypothesis,
you’ll be looking at two things:
1. Whether the F-value (F) is larger than the F-critical value (F crit)
2. Whether the p-value is smaller than your chosen alpha level.
The results for two-way ANOVA test on our example look like this:
As you can see in the highlighted cells in the image, the F-values for sample and columns,
i.e., factor 1 (music) and factor 2 (age), respectively, are higher than their F-critical
values. This means that both factors significantly affect the students’ results, and thus we
can reject the null hypothesis for each factor.
Also, the F-value for the interaction effect is well below its F-critical value, so we can
conclude that there is no significant interaction between music and age.
REFERENCE:
https://www.investopedia.com/terms/a/anova.asp#citation-2
https://www.analyticsvidhya.com/blog/2018/01/anova-analysis-of-variance/