Analysis of Variance (ANOVA)
Instructor: Weikang Kao, Ph.D.
When to use ANOVA?

 While simple tests (like t-tests) are useful (and the best way to go in many
situations), most systems we want to examine in the real world have more than
two levels and likely contain more than two factors.
 ANOVA is one type of statistical model that can handle predictors with multiple
levels, multiple predictors, and interactions between predictors, and as we will see
later in this course it can also deal with designs containing random and nested
factors (repeated measurement).
 We will review the basics of ANOVA and how to use it for data containing fixed
categorical effects with 3 or more levels (with just two levels, in most cases a t-test
will get the job done more easily).
Preview of ANOVA

1. An ANOVA test assesses whether some quantitative variable differs
between two or more independent group means.
2. State the hypotheses.
3. Find the critical value (F) in the F-table.
4. Compute the observed test statistic.
5. Make a conclusion based on the finding.
** It should be noted that the F-test does NOT specify which group(s) is/are
different from the others.
**Question: why not do multiple t-tests?
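One way to see the answer to the last question: each t-test run at α = .05 carries its own Type I error risk, and that risk compounds across tests. A minimal Python sketch (assuming independent comparisons, which is a simplification, but it shows the trend; the deck's own analyses use R):

```python
# Familywise error rate: probability of at least one false positive
# across m independent tests, each run at significance level alpha.
def familywise_error_rate(m, alpha=0.05):
    """P(at least one Type I error) = 1 - P(no error in any of m tests)."""
    return 1 - (1 - alpha) ** m

# Comparing 3 groups pairwise takes 3 t-tests; 5 groups take 10.
print(familywise_error_rate(3))   # ~0.14, already well above .05
print(familywise_error_rate(10))  # ~0.40
```

A single ANOVA F-test avoids this inflation by testing all group means at once.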
Introduction of ANOVA

 Hypotheses of ANOVA:
 H0: μ1 = μ2 = … = μp (all group means are equal)
 H1: at least one group mean differs from the others
 Degrees of freedom:
 p = number of groups
 n = observations in each group
 dfT: total degrees of freedom: n*p - 1
 dfB: degrees of freedom between groups, numerator df: p - 1
 dfW: degrees of freedom within groups, denominator df: (n1 - 1) + (n2 - 1) + … + (np - 1)
 Critical value, degrees of freedom and the F-table
 http://www.statisticslectures.com/tables/ftable/
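The degrees-of-freedom bookkeeping above fits together as dfT = dfB + dfW. A minimal Python sketch (assuming a balanced design with p = 3 groups of n = 10; the numbers are chosen for illustration only):

```python
# Degrees of freedom for a balanced one-way ANOVA.
p = 3       # number of groups
n = 10      # observations per group
N = n * p   # total observations

df_total = N - 1                          # dfT = n*p - 1
df_between = p - 1                        # dfB, numerator df
df_within = sum(n - 1 for _ in range(p))  # dfW = (n1-1) + (n2-1) + ...

print(df_total, df_between, df_within)  # 29 2 27
# The critical F would then be read from the F-table at (2, 27) df.
assert df_total == df_between + df_within
```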
Introduction of ANOVA

 Test Statistic: F
 ANOVA = Analysis of variance, which means that the variance of the data will be investigated.
 Recall: what is variance?
 Sum of squares (SS) / df
 Sums of squares in ANOVA:
 Sum of squares total: A measure of the variability between each observation and the
grand mean of all observations.
 Sum of squares between: A measure of the variability between each group mean and the
grand mean of all observations.
 Sum of squares within: A measure of the variability between each observation and its
respective group mean.
Introduction of ANOVA

So…
 Sum of squares total = Sum of squares between + Sum of squares within.
 SST = SSB + SSW
 SSB: the effect which can be explained by our treatment/observation.
 SSW: the effect which can NOT be explained by our treatment/observation (the error term).
 Mean square of the F-test:
 What is a mean square? An estimate of the average between-group or within-group variation: MS = SS / df.
 It helps to formulate the F-test:
 F = variability (between) / variability (within) = MSB / MSW
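The identity SST = SSB + SSW and the F-ratio above can be verified on a toy dataset. A minimal Python sketch (the three groups and their values are made up purely for illustration; the deck's worked examples use R):

```python
from statistics import mean

# Toy data: three groups (made-up values).
groups = [[2, 3, 4], [5, 6, 7], [8, 9, 10]]
all_obs = [x for g in groups for x in g]
grand_mean = mean(all_obs)

# SST: each observation vs the grand mean.
sst = sum((x - grand_mean) ** 2 for x in all_obs)
# SSB: each group mean vs the grand mean, weighted by group size.
ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
# SSW: each observation vs its own group mean.
ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)

assert abs(sst - (ssb + ssw)) < 1e-9  # SST = SSB + SSW

# Mean squares and the F-ratio.
p, n_total = len(groups), len(all_obs)
msb = ssb / (p - 1)        # MSB = SSB / dfB
msw = ssw / (n_total - p)  # MSW = SSW / dfW
f_stat = msb / msw
print(sst, ssb, ssw, f_stat)
```

Here most of the total variability (54 of 60) sits between groups, so F is large and well above one.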
Introduction of ANOVA

F-Statistic
 When checking the F-value, we examine the ratio of MSB to MSW,
which is the variability explained by the differences between the groups relative to the
variability left unexplained within the groups.
 Remember the treatment and error terms we just mentioned: how do those
concepts influence the F-value, and what is the consequence?
 If the variability between is greater than the variability within, the statistic
will be greater than one: what does that mean?
 If the variability between is smaller than the variability within, the result will be less
than one: what does that mean?
Introduction of ANOVA

After the previous four steps, now we can try to make some conclusions based on
them.
Two different conclusions:
 Statistical Conclusion: We check the observed (F) to see if it falls in the rejection
area, and make the conclusion. The conclusion can be either “reject the null
hypothesis” or “fail to reject the null hypothesis”.
 Research conclusion: based on our original research question, we make a
conclusion stated in terms relating to it. For example: when consuming
more sugar, individuals tend to gain weight.
One-way ANOVA

With only one “Predictor” and more than two groups

Some important steps which should be performed:
1. A statement of the research/study purpose
2. The type of analysis conducted, e.g. D’Agostino test, scatterplot of residuals, Bartlett test,
etc.
3. Descriptive statistics: basic information about the data, e.g. age and gender of the participants.
4. The ANOVA test
5. Post-hoc analysis
6. Effect size
7. Conclusions
ANOVA: Assumptions
ANOVA Assumptions

There are several assumptions underlying ANOVA that should be considered (some
violations are less serious than others).
1. “Normal” data – mainly we are concerned with skew and kurtosis.
2. Independence of observations – we assume that each observation is independent
of the others (e.g., no repeated measurement). If this assumption is
violated we have a few options, but a standard ANOVA is not one of them (there is
no formal check; we just have to know our data).
3. Equality of variances – our factor levels are associated with the same level of
variance in our dependent variable (most important if we have unequal
observations in each level).
Normality

 Observations are drawn from a normally distributed population.

There are several different tests that help us check the normality of the sample.
 Second, we also have to make sure that there are no OUTLIERS in the data:
standardized residuals should be within the range of positive 2.5 to negative 2.5.
 What if the normality assumption is violated?
It’s okay, the F is only slightly affected.
The F-test is mostly very robust to this assumption if the following condition is met:
the data are identically distributed and the sizes of the groups are EQUAL.
If our n is small, then the power of the test might be influenced.
 A log transformation can be used as a correction.
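For intuition, the skewness that a test like D’Agostino’s examines is just the third standardized moment, which is easy to compute by hand. A small pure-Python sketch on made-up data (the course itself uses R’s agostino.test for the formal test):

```python
from statistics import mean

def sample_skewness(xs):
    """Moment-based skewness g1 = m3 / m2^(3/2); ~0 for symmetric data."""
    mu = mean(xs)
    n = len(xs)
    m2 = sum((x - mu) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mu) ** 3 for x in xs) / n  # third central moment
    return m3 / m2 ** 1.5

print(sample_skewness([1, 2, 3, 4, 5]))      # 0.0 (symmetric)
print(sample_skewness([1, 1, 1, 2, 10]) > 0) # True (right-skewed)
```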
Independence of observations

 Observations are randomly sampled from the population, or subjects are
randomly assigned to treatment groups, so that all the observations within
groups and between groups are independent.
 Checking:
 Make sure that the data are randomly selected from a population or are randomly
assigned to the groups.
 Scatterplot of residuals
 It’s very important to know that the F-test is NOT robust to this violation; both
Type I and Type II errors are affected.
Variance Equality

 The observations have equal variances across groups.

 This assumption is also referred to as homogeneity of variance.
 Levene’s test, Bartlett’s test, and a scatterplot can be used to check variance
equality.
 What if this assumption is not met?
 If, first, the sample sizes are equal; second, the population is normally distributed; and
third, the ratio of the largest variance to the smallest variance is less than three,
then we have confidence that the F-test is robust.
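The variance-ratio rule of thumb in the third condition can be checked directly from the group variances. A minimal Python sketch with made-up group data (the deck’s worked example does this in R with tapply):

```python
from statistics import variance

# Toy samples for three groups (made-up values).
groups = {
    "group0": [2.1, 2.4, 1.9, 2.2, 2.0],
    "group1": [3.0, 2.5, 2.8, 3.2, 2.6],
    "group2": [1.8, 2.2, 2.1, 1.7, 2.3],
}

# Sample variance of each group, then the largest-to-smallest ratio.
variances = {name: variance(vals) for name, vals in groups.items()}
ratio = max(variances.values()) / min(variances.values())
print("max/min variance ratio:", ratio)

# Rule of thumb: with equal group sizes and roughly normal data,
# the F-test is considered robust when this ratio is below 3.
print("rule of thumb satisfied:", ratio < 3)
```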
Post-Hoc and Effect Size
Post-Hoc Test

Now we know there is a significant difference in our target sample; what’s
next?
We would like to make comparisons between the groups to understand what drives
the difference.
 The advantages of a post hoc test:
 Controlling Type I error: think of what happens if we do five t-tests.
 Single-step post hoc tests:
 We apply single-step post hoc tests for all pairwise comparisons when equal
variances are assumed.
 What if the variances of the groups are not equal?
Post-Hoc Test

Single-step post hoc tests:


 Bonferroni
 Tests all pairwise contrasts; conservative, lacks power.
 Tukey’s Test
 The purpose of Tukey’s test is to figure out which groups in your sample differ. It uses the “Honest Significant
Difference,” a number that represents the distance between groups, to compare every mean with every other
mean (Statistics How To, 2019).
 Kruskal-Wallis
 One-way ANOVA on ranks: a non-parametric method for testing whether samples originate from the same
distribution.
 Dunnett
 Like Tukey’s, this post-hoc test is used to compare means. Unlike Tukey’s, it compares every mean against a
control mean (Statistics How To, 2019).
Effect Size

As we discussed p-values tell us there is a difference but don’t tell us the size of the
difference (i.e., the effect).
Many researchers focus on effect sizes since large effects are more likely to be
robust (replicable).
 The effect size is a standardized difference in the means across groups.
 What is the importance of effect size?
It is a measure of the PRACTICAL SIGNIFICANCE of the effect of the treatment.
 Guidelines (Cohen, 1988): .1 is small, .25 is medium, .4 is large.
Application
A working example

To make things easier to follow we will use a real data set where individuals provided
valuations of an item’s worth either as buyers, sellers, or choosers (someone who
states the price that would make them indifferent between that amount and the
item).
In this example our IV is perspective and it has three levels, making our analysis a
one-way ANOVA: Buyer (1), Chooser (0), Seller (2).
We want to test whether there is variation in valuations across our
three levels of perspective, and we might have some predictions about which levels
will differ (e.g., sellers > buyers, choosers = sellers, etc.)
A working example

An ANOVA tests the following hypothesis in our example:


H0: Seller = Chooser = Buyer
H1: at least one pair of levels differ from one another.
Assumption: Normality

library("moments")
First look at the overall distribution:
plot(density(ANOVAExample$Valuation))
qqnorm(ANOVAExample$Valuation)
D'Agostino skewness test:
agostino.test(ANOVAExample$Valuation)
Normality test (Shapiro-Wilk):
shapiro.test(ANOVAExample$Valuation)
Assumption: Independence of observations

Observations are randomly sampled from the population, or subjects are randomly
assigned to treatment groups, so that all the observations within groups and
between groups are independent.
Scatterplot of residuals
eruption.lm = lm(Variable ~ Group, data = data)
eruption.res = resid(eruption.lm)
plot(data$Variable, eruption.res, ylab = "Residuals", xlab = "Group", main = "Title")
abline(0, 0)
It’s very important to know that the F-test is NOT robust to this violation; both type I and
type II errors are affected.
Assumption: Variance Equality

Equality of variance (Bartlett test):


bartlett.test(ANOVAExample$V2, ANOVAExample$Condition)
A general rule of thumb is that if we have balanced data (the # of observations in
each level is equal), then with small violations (largest variance / smallest variance < 3)
we are OK with ANOVA.
tapply(ANOVAExample$V2, ANOVAExample$Condition, var)
Do the Analysis

Perform the model:


summary(aov(Valuation ~ Condition, data = ANOVAExample))
We can also save the fitted model:
model <- aov(Valuation ~ Condition, data = ANOVAExample)
Doing so we find a significant effect of Condition, F(2, 237) = 19.61, p < .001. But
where is the difference?
Post-Hoc Tests

Bonferroni:
pairwise.t.test(ANOVAExample$Valuation, ANOVAExample$Condition, paired = FALSE,
p.adjust.method = "method")
**method can be "none", "bonferroni", "holm", "hochberg", "hommel", "BH", or
"BY"
Kruskal-Wallis (kruskalmc from the pgirmess package):
kruskalmc(V2 ~ factor(Condition), data = ANOVAExample)
Tukey’s Test:
TukeyHSD(model)
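What the Bonferroni option of p.adjust.method does is simple enough to sketch by hand: each raw p-value is multiplied by the number of comparisons and capped at 1. A minimal Python illustration with made-up p-values:

```python
# Bonferroni adjustment: multiply each raw p-value by the number of
# comparisons m, capping at 1. This controls the familywise error rate.
def bonferroni(p_values):
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Made-up raw p-values from three pairwise comparisons.
raw = [0.010, 0.030, 0.400]
adjusted = bonferroni(raw)
print(adjusted)  # approximately [0.03, 0.09, 1.0]
```

The cost of this simplicity is the conservatism noted on the previous slide: every comparison must clear a much stricter bar.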
Effect Size

The effect size is a standardized difference in the means across groups.


Install two packages: "pastecs" and "compute.es".
First get the relevant stats for each factor level (only relevant output pasted below):
by(ANOVAExample$V2, ANOVAExample$Condition, stat.desc)

              n     M    SD
Condition 0  80  2.03   .52
Condition 1  80  1.71   .56
Condition 2  80  2.19   .43

Use these values to compute a few standard measures of effect size for any pair of interest (we will compare
1 and 2 ~ Buyers vs Sellers):
mes(2.19, 1.71, .43, .56, 80, 80)
Cohen’s d = .96, Hedge’s g = .96, r = .43
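The Cohen’s d that mes() reports can be reproduced by hand from the summary stats above: d is the mean difference divided by the pooled standard deviation. A short Python check (using the slide’s own numbers):

```python
import math

# Summary stats from the slide: Condition 2 (sellers) vs Condition 1 (buyers).
m1, sd1, n1 = 2.19, 0.43, 80
m2, sd2, n2 = 1.71, 0.56, 80

# Pooled SD; with equal n this reduces to the mean of the two variances.
sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
d = (m1 - m2) / sd_pooled
print(round(d, 2))  # 0.96, matching the mes() output
```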
Results

What do we find?
1. Research Question and hypothesis?
2. Test of the assumptions?
3. Result of the ANOVA test?
4. Result of the post hoc tests?
5. Effect size?
6. Conclusion?
Summary write up

Observations from the study were analyzed by conducting a one-way
analysis of variance using R version 3.6.1. All assumptions were met and no
adjustment was made. Results suggest that the valuation of an item’s worth was
affected by the status (buyer, seller, or chooser) of the participants, F(2,
237) = 19.61, p < .001.
To determine specifically which groups differed, a Tukey’s post
hoc test was conducted. The results suggested that there is a significant difference
between buyers and choosers (p < .001) and between buyers and sellers (p < .001) in terms
of the valuation of an item’s worth. The effect was large, Cohen’s d = .96.
In class Practice
Practice

 We are now dealing with a dataset (an XXX bank customer info). The bank wants
to know how different jobs influence individuals’ annual income, purchasing
ability, and how many credit cards they have.
Download the Bankinfo data.
 In the Group variable, 0 represents students, 1 represents teachers, and 2
represents taxi drivers.
 What are the research questions?
 What are the hypotheses?
Practice

 First, we use annual income as our first DV.

 The research question is: “Is there a significant difference among the following
three jobs: students, teachers, and taxi drivers, in terms of their annual income?”
 Hypotheses:
 H0: there is no difference in annual income among the three jobs.
 H1: at least one group differs from the others in annual income.
 What if we want to check their purchasing ability and/or numbers of credit cards?
Practice: Results

 Check the ANOVA assumptions:


 Normality: Bad
 Independence of observations: Not good
 Variance Equality: Good

 Is there any significant result? NO.


Practice: Summary

Observations from the study were analyzed by conducting a one-way
analysis of variance using R version 3.6.1. Although the assumption of normality
is not met, the F-test is robust here because the sample size is large (n = 292)
and the groups have equal numbers of observations. Results suggest that there is
no statistically significant difference in annual income across the different
jobs, F(2, 290) = .905, p = .406.

 Do we need any post hoc tests?


Practice

 Now, we use number of credit cards as our DV.

 The research question is: “Is there a significant difference among the following
three jobs: students, teachers, and taxi drivers, in terms of how many credit cards
they hold?”
 Hypotheses:
 H0: there is no difference in the number of credit cards among the three
jobs.
 H1: at least one group differs from the others in the number of credit
cards.
Practice: Results

 Normality: fair
 Independence of observations: fair
 Variance Equality: Good
 Is there any significant result? Yes.
 Do we need a post hoc? Yes.
 The result of the post hoc? There is a difference between group 0 (students) and
group 1 (teachers)
 Effect size? Medium.
Practice: Summary

Observations from the study were analyzed by conducting a one-way
analysis of variance using R version 3.6.1. All assumptions were met and no
adjustment was made. Results suggest that an individual’s job has an influence
on how many credit cards the individual has, F(2, 290) = 4.79, p < .01.
To determine specifically which groups differed, a Tukey’s post
hoc test was conducted. The results suggested that there is a significant difference
between students and teachers in the number of credit cards they hold.
The effect was medium, Cohen’s d = .49.
Practice

 Lastly, we use purchasing ability as our DV.


 The research question is “ ____________________ ?”
 Hypothesis:
 H0:
 H1:
Practice: Results

 Normality: bad
 Independence of observations: fair
 Variance Equality: fair
 Is there any significant result? Yes.
 Do we need a post hoc? Yes.
 The result of the post hoc? There is a difference between group 0 (students) and
group 1 (teachers), and between group 1 (teachers) and group 2 (taxi drivers).
 Effect size? Large.
Weekly Lab

 Three brew methods with a measure
of the crema on top. Find the
method producing the most crema.

 Use the EspressoData.