
Testing the assumptions of ANOVA

Introduction
In this section, we will discuss the theory of the assumptions necessary for an ANOVA and look at methods
to test these assumptions in R. We will also consider the general steps in R to perform an ANOVA.

Overview of previous work


ANOVA is a linear model that decomposes each observation into a global mean, an effect on the mean of some
group, and a random error. Formally, each observation is a realisation of the random variable $X_{ij}$, where

$$X_{ij} = \mu + \alpha_j + \epsilon_{ij}.$$

Our goal is to determine whether all $\alpha_j$, $j = 1, 2, \dots, k$, are equal to zero or whether at least one of them differs significantly
from zero. This led to decomposing the total variation (SST) into the sum of the variation within each group
(SSW or SSE) and the variation of the group means around the global mean (SSB). Finally, we computed a test
statistic given by

$$F_{calc} = \frac{MSB}{MSE}.$$

However, which assumptions are required for this test statistic to follow an $F_{k-1,\,n-k}$ distribution?
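
As a brief reminder of the computation, the following sketch rebuilds this F statistic from the sums of squares for a generic response vector y and grouping factor g; the function and variable names are illustrative only, and R's built-in aov (used later) gives the same result.

# Sketch: the ANOVA decomposition computed by hand ('y' and 'g' are illustrative names)
anova_by_hand = function(y, g) {
  g = factor(g)
  k = nlevels(g)
  n = length(y)
  group_means = ave(y, g, FUN = mean)        # each observation's group mean
  SSB = sum((group_means - mean(y))^2)       # variation of the group means around the global mean
  SSW = sum((y - group_means)^2)             # variation within the groups (SSE)
  MSB = SSB / (k - 1)
  MSE = SSW / (n - k)
  Fcalc = MSB / MSE
  c(F = Fcalc, p = pf(Fcalc, k - 1, n - k, lower.tail = FALSE))
}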

Assumptions necessary for ANOVA


The ANOVA test relies on the following assumptions:

1. Normality. The response variable in each group is normally distributed, i.e. $X \mid G$ is normally
   distributed for every group $G$.

2. Homoscedasticity. The population variances of the groups are equal, i.e. $\sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2 = \sigma^2$.

3. Independence. The observations are random and independent: an observation of $X$ should not depend
   on any other observation in the sample. This assumption is difficult to test directly.

Naturally, we would like to test these assumptions, and methods for doing so are shown shortly. However, at this
stage it is worthwhile to note that not all assumptions are equally important. It is well known that the
F-test is robust with respect to violations of the normality assumption. This implies that even if there are
(small) departures from normality, the ANOVA will still give credible results. However, if there are large
deviations from normality, especially when the group sample sizes are small, some remedial measures can
be taken. One may try a transformation (to make the data normally distributed) or an appropriate
non-parametric test. Non-parametric tests are roughly equivalent tests that do not rely on any
distributional assumptions. The non-parametric equivalent of the one-factor ANOVA is the Kruskal-Wallis
test and will be covered in later sections. We will test the normality assumption using any of the
previously developed normality tests (e.g. Lilliefors), except that each group has to be tested for normality
separately and all groups have to be normally distributed for the assumption to hold.
The F-test is, however, much more sensitive to departures from homogeneity of variances. This is usually not a
problem when the group sizes are equal, but where group sizes are unequal, a violation of the equal-variances
assumption may distort the results significantly. This assumption is usually tested using Bartlett's
test for homogeneity and will be illustrated later on. If the assumption is violated, certain remedial measures,
such as a transformation of the variables, can again be applied in order to rectify the situation.
The last assumption, independence, is not easy to verify with a single test. Dependence typically arises in
experiments where repeated measures are taken on the same experimental unit over time. In that case,
one would perform a repeated measures ANOVA, but this is beyond the scope of this course. However, if the
data are collected randomly and with an appropriate sampling strategy, it is generally acceptable to assume
that the observations are independent.

Testing the assumption of normality


The general test for normality is the Lilliefors (Kolmogorov-Smirnov) test. Although we will only use R to
perform this test, a quick description of the test is given below.
Consider a sample $\{x_1, x_2, \dots, x_n\}$ from an unknown distribution. Let $F(x)$ be the cumulative distribution
function (CDF) of the normal distribution and let $F(z)$ be the CDF of the standard normal distribution.
We test the following hypothesis.

$$H_0: X \sim F(x)$$

$$H_1: X \not\sim F(x)$$

We can also write the hypothesis test in words:

H0 : The random variable X is normally distributed.


H1 : The random variable X is not normally distributed.

In this case, the test statistic is the maximum difference between the standard normal distribution and the
empirical distribution function of the (standardised) data.
Firstly, compute the sample mean x̄ and sample standard deviation s. Then, compute all the z-values with
$$z_i = \frac{x_i - \bar{x}}{s}, \quad i = 1, \dots, n.$$

Next, we can compute the empirical distribution function of the $z_i$ values with

$$\hat{F}(z) = \sum_{i=1}^{n} \frac{I(z_i \leq z)}{n}.$$

Sometimes the denominator $n + 1$ is used instead, to avoid probabilities equal to 1.


Finally, the test statistic is computed as the maximum distance from $\hat{F}(z)$ to $F(z)$. Hence, the test statistic
is

$$D = \sup_{z_i} \, | \hat{F}(z) - F(z) |.$$

This test statistic follows a Lilliefors (Kolmogorov-Smirnov) distribution and can only be computed with
software or using an appropriate table. Note that the Lilliefors test is the extension of the Kolmogorov-
Smirnov test when the population mean and variance are unknown. Although the same test statistic is
used in both tests, the sampling distributions differ. The sampling distributions of both tests can only be
computed numerically. Therefore, we rely on statistical software, like R, to compute the critical value or
p-value of the test. This will be demonstrated in later sections.
It is known that the power of this test is quite low compared to similar tests. Notice that, since the maximum
deviation between the expected and observed distributions is used, the test statistic is severely affected
by outliers. A single outlier can therefore cause the test to reject normality.
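
To make this computation concrete, the sketch below calculates the D statistic by hand for a small simulated sample; the data and variable names are illustrative only, and the p-value still requires the Lilliefors sampling distribution, which lillie.test from the nortest package (used later) provides.

# Sketch: the Lilliefors D statistic computed by hand (simulated data for illustration only)
set.seed(1)
x = rnorm(30, mean = 50, sd = 10)              # hypothetical sample
z = sort((x - mean(x)) / sd(x))                # standardised, ordered values
n = length(z)
F0 = pnorm(z)                                  # standard normal CDF at each ordered value
D = max((1:n) / n - F0, F0 - (0:(n - 1)) / n)  # largest gap on either side of each jump of F-hat
D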

Testing the assumption of equal variances


Each group has some unknown population variance. The test statistic of an ANOVA follows an F-distribution
only if all the population group variances are equal. Hence, we want to test the following hypothesis.

$$H_0: \sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2 = \sigma^2$$

$$H_1: \text{At least one } \sigma_j^2 \neq \sigma^2 \text{ for some } j = 1, 2, \dots, k.$$

To test this hypothesis, we use Bartlett’s test for homogeneity. The test statistic of this test is given by
$$K = \frac{(n-k)\ln(MSE) - \sum_{j=1}^{k} (n_j - 1)\ln(s_j^2)}{1 + \frac{1}{3(k-1)}\left( \sum_{j=1}^{k} \frac{1}{n_j - 1} - \frac{1}{n-k} \right)}.$$

It can be shown that this test statistic follows a $\chi^2(k - 1)$ distribution. Furthermore, we reject the null
hypothesis for large values of the test statistic. The computed statistic can be compared to the critical value $\chi^2_{\alpha,\,k-1}$.
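
A minimal sketch of this calculation, assuming a response vector y and grouping factor g (illustrative names only), is shown below; R's built-in bartlett.test, used later, performs the same computation.

# Sketch: Bartlett's K statistic computed by hand ('y' and 'g' are illustrative names)
bartlett_K = function(y, g) {
  g = factor(g)
  nj = tapply(y, g, length)                   # group sample sizes
  s2 = tapply(y, g, var)                      # group sample variances
  k = nlevels(g)
  n = sum(nj)
  MSE = sum((nj - 1) * s2) / (n - k)          # pooled variance
  num = (n - k) * log(MSE) - sum((nj - 1) * log(s2))
  den = 1 + (1 / (3 * (k - 1))) * (sum(1 / (nj - 1)) - 1 / (n - k))
  num / den                                   # compare with qchisq(1 - alpha, df = k - 1)
}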

Testing the assumption of independence


This assumption cannot be tested easily. However, if our samples are random and the groups are homoge-
neous, it is valid to assume that the observations are independent.
The only case where we can really assess this assumption is when we have time-series data: the order in
which the observations were collected is then known, and we can check whether observations depend on
previous ones.
We will not explicitly test for independence. However, if this assumption is clearly violated, alternative tests
that take this dependence into account must be used.
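
As a purely informal illustration (not part of the formal tests above), when the collection order of the observations is known one can plot the model residuals in that order and look for systematic patterns. The simulated values below are only a stand-in for real residuals.

# Informal sketch: residuals plotted in collection order (simulated stand-in values)
# With real data, replace sim_resid with residuals(fit) from the ANOVA fitted below.
set.seed(2)
sim_resid = rnorm(60)
plot(sim_resid, type = 'b', xlab = 'Collection order', ylab = 'Residual')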

ANOVA in R
In this section we use R libraries to perform ANOVA and to test the assumptions. We will use the same
dataset as before considering the number of weekly sales of a juice for different advertising strategies.
Please download the correct dataset from SUNLearn and import the data with the code below.

# Check if data is stored in current WD
any(list.files() == 'ExampleDataNarrow.txt')

## [1] TRUE

# Import data
dat = read.table('ExampleDataNarrow.txt', header = TRUE)
str(dat)

## 'data.frame': 60 obs. of 2 variables:
## $ Population: chr "Convenience" "Convenience" "Convenience" "Convenience" ...
## $ Sales : int 529 658 793 514 663 719 711 606 461 529 ...

Next, we can visualise the three groups’ sales data.

library(ggplot2)

## Warning: package ’ggplot2’ was built under R version 4.0.5

library(ggpubr)

plt1 = ggplot(data = dat, aes(x = Population, y = Sales)) +
  geom_boxplot(width = 0.5) +
  theme_pubr()
plt1

[Figure: boxplots of Sales by Population (Convenience, Price, Quality).]

plt2 = ggplot(data = dat, aes(x = Population, y = Sales)) +
  geom_violin(trim = F) +
  geom_boxplot(width = 0.1) +
  theme_pubr()
plt2

[Figure: violin plots with inset boxplots of Sales by Population (Convenience, Price, Quality).]

# Helper for stat_summary: returns the mean and the mean plus/minus one standard deviation
data_summary = function(x){
  m = mean(x)
  ymin = m - sd(x)
  ymax = m + sd(x)
  return(c(y = m, ymin = ymin, ymax = ymax))
}

plt3 = ggplot(data = dat, aes(x = Population, y = Sales)) +
  geom_violin(trim = F, fill = 'lightgrey') +
  stat_summary(fun.data = data_summary) +
  theme_pubr()
plt3

[Figure: violin plots of Sales by Population (Convenience, Price, Quality) with the group mean and mean ± one standard deviation marked.]

We want to test the following hypothesis.

$$H_0: \mu_1 = \mu_2 = \mu_3$$
$$H_1: \text{At least one } \mu_i \neq \mu_j \text{ for some } i \neq j.$$

This can be achieved in R as follows.

# Method 1
fit = aov(Sales~Population, data = dat)
summary(fit)

## Df Sum Sq Mean Sq F value Pr(>F)


## Population 2 57512 28756 3.233 0.0468 *
## Residuals 57 506984 8894
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

# Method 2
linModel = lm(Sales~Population, data = dat)
anova(linModel)

## Analysis of Variance Table


##
## Response: Sales
## Df Sum Sq Mean Sq F value Pr(>F)
## Population 2 57512 28756.1 3.233 0.04677 *
## Residuals 57 506984 8894.4
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

As before, we obtain a test statistic of $F_{calc} = 3.233$ and a p-value of $P(F_{2,57} > 3.233) = 0.0468$. Therefore,
there is sufficient evidence to reject the null hypothesis. Hence, at a 5% level of significance, the population
average sales of the three different advertising strategies are not all equal.
Next, we want to test our assumptions. The above result is valid only if the assumptions are satisfied.

Testing normality

Notice that each group must be normally distributed. Hence, if we have k groups, we must perform k tests.
Although this would inflate the type I error, we do not apply a multiple-testing correction for these checks.
The following code performs the Lilliefors test for normality.

conv = dat$Sales[dat$Population == 'Convenience']
qual = dat$Sales[dat$Population == 'Quality']
price = dat$Sales[dat$Population == 'Price']

library(nortest)

# Convenience
lillie.test(conv)

##
## Lilliefors (Kolmogorov-Smirnov) normality test
##
## data: conv
## D = 0.12847, p-value = 0.5235

# Quality
lillie.test(qual)

##
## Lilliefors (Kolmogorov-Smirnov) normality test
##
## data: qual
## D = 0.13836, p-value = 0.4027

# Price
lillie.test(price)

##
## Lilliefors (Kolmogorov-Smirnov) normality test
##
## data: price
## D = 0.15565, p-value = 0.2309
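
As a small convenience (assuming dat is the data frame imported earlier), the three separate calls above can be collapsed into a single call with base R's by function, which applies lillie.test to each group in turn.

# Run the Lilliefors test for all three groups in one call
by(dat$Sales, dat$Population, lillie.test)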

Testing homoscedasticity
Here we are testing if all the group variances are equal. Hence, we only perform one test.
The following code performs Bartlett’s test for equal variances.
# Method 1
bartlett.test(x = dat$Sales, g = dat$Population)

##
## Bartlett test of homogeneity of variances
##
## data: dat$Sales and dat$Population
## Bartlett’s K-squared = 0.73887, df = 2, p-value = 0.6911

# Method 2
bartlett.test(formula = Sales~Population, data = dat)

##
## Bartlett test of homogeneity of variances
##
## data: Sales by Population
## Bartlett’s K-squared = 0.73887, df = 2, p-value = 0.6911

Next steps: multiple comparisons


If we conclude that there is a significant difference between some of the population group means, the next
step is to determine which groups have different means. This is done with multiple comparison t-tests, which
we will discuss in the next section.
