Post-Hoc ANOVA Test

Uploaded by Gabriel Lagrana

Statistical Analysis

Group 6 – Written Report

Post-hoc ANOVA Test:


Tukey’s, Tukey-Kramer’s, and Bonferroni’s

Submitted to:

Professor Ramse C. Osano Jr.

Submitted by:

Ico, Rhea Mae G.

Jamon, Arabella G.

Magat, Nadine S.

Manalo, Al-Yesha T.

April 2024
A. Test/Statistic Name
Post-hoc ANOVA Test: Tukey’s, Tukey-Kramer’s, and Bonferroni’s

B. Etymology

In Latin, post hoc means “after this.” In the context of statistical analysis, a post
hoc test is conducted after an initial analysis, often to further examine specific
differences or relationships that were not initially addressed. ANOVA stands for
Analysis of Variance, a statistical method used to compare means among multiple
groups. When ANOVA indicates that there are significant differences between groups,
post hoc tests are typically employed to determine which specific groups differ from
each other.

C. Purpose of the Test

When a one-way ANOVA test leads to a significant result, it is common to then
follow up with post-hoc tests to see which particular groups are significantly different
from each other. Three post hoc methods are covered here:

First is Tukey’s method, or Tukey's Honest Significant Difference (HSD) test,
a post hoc test commonly used to assess the significance of differences
between pairs of group means. Tukey’s HSD is often a follow-up to a one-way ANOVA,
once the F-test has revealed the existence of a significant difference between some of
the tested groups.

Next is the Tukey-Kramer method, an extension of Tukey’s test that is
performed to find the specific pair(s) of groups that cause the difference after an
ANOVA test has shown that there is a significant difference in the means of the groups
tested.

Lastly, Bonferroni’s correction takes a more conservative approach to
controlling the family-wise error rate when conducting multiple comparisons. It is
best used when you have a set of planned comparisons you would like to make
beforehand.

D. Null and Alternative Hypothesis to the Test

Null Hypothesis:

The null hypothesis states that there are no significant differences between
the means of the groups being compared. In the context of post-hoc tests, it asserts
that any observed differences in means between specific pairs of groups are due to
random variability or chance.
Alternative Hypothesis:

The alternative hypothesis suggests that there is at least one pair of group means
that is significantly different from each other. It implies that the differences observed
between specific pairs of groups are not due to random chance but are instead
indicative of true differences in the population means.

E. Test Formula and Calculations

Tukey Test

According to Glen (2021), to test all pairwise comparisons among means using
the Tukey HSD, calculate the HSD statistic for each pair of means using the following
formula (reconstructed here from the definitions below):

HSD = (Mi − Mj) / √(MSW / n)

Where:

Mi − Mj is the difference between the pair of means; Mi should be the larger of the
two, so the difference is positive. MSW is the Mean Square Within, and n is the number
of observations in the group or treatment. The HSD statistic is then compared against
the critical value of the studentized range distribution.
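The HSD computation just described can be sketched as a minimal stdlib example. Equivalently to comparing the HSD statistic against the tabled critical value q, one can compare the mean difference against q·√(MSW/n). Names and numbers here are illustrative, and q_crit is assumed to have been read from a studentized-range table:

```python
import math

def tukey_hsd_threshold(q_crit, ms_within, n_per_group):
    """Smallest pairwise mean difference that is honestly significant:
    q_crit * sqrt(MSW / n)."""
    return q_crit * math.sqrt(ms_within / n_per_group)

def pair_is_significant(m_i, m_j, q_crit, ms_within, n_per_group):
    """Compare |Mi - Mj| against the HSD threshold (taking the absolute
    value plays the role of choosing Mi as the larger mean)."""
    return abs(m_i - m_j) > tukey_hsd_threshold(q_crit, ms_within, n_per_group)
```

For example, with q = 3.5, MSW = 16, and n = 4 per group, the threshold is 3.5 × √4 = 7, so only pairs whose means differ by more than 7 are declared significantly different.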
Bonferroni’s Procedure

According to Glen (2018), the formula for the Holm-Bonferroni (sequentially
rejective Bonferroni) correction is:

HB = α / (n − rank + 1)

where α is the target significance level, n is the number of tests, and rank is the
position of a test’s p-value when all the p-values are ordered from smallest to largest.

F. Test execution steps in SPSS – R – Python

SPSS:

Run your ANOVA analysis in SPSS.

After obtaining your ANOVA results, go to "Post Hoc" options. Select the desired post-
hoc test from the available options (such as Tukey's, Tukey-Kramer's, or
Bonferroni's). Input your significance level (usually 0.05). Execute the post-hoc test,
and interpret the results.

R: Install and load the necessary packages, such as "stats" for ANOVA and "agricolae"
for post-hoc tests. Run your ANOVA analysis using the aov() function.

After obtaining your ANOVA results, perform the post-hoc test using functions like
TukeyHSD() for Tukey's test or HSD.test() from the "agricolae" package for other tests.

Interpret the post-hoc test results.

Python (using statsmodels): Install and import the necessary libraries, such as
"statsmodels" for ANOVA and post-hoc tests.

Run your ANOVA analysis using the ols() function from "statsmodels".
After obtaining your ANOVA results, perform the post-hoc test using the
pairwise_tukeyhsd() function from statsmodels.stats.multicomp.

Interpret the post-hoc test results.
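The Python steps above can be sketched as follows, assuming statsmodels and numpy are installed; the data values are made up purely for illustration:

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Illustrative data: 5 observations in each of three groups.
scores = np.array([85, 86, 88, 75, 78,
                   94, 98, 79, 71, 80,
                   91, 92, 93, 90, 97], dtype=float)
groups = np.array(['A'] * 5 + ['B'] * 5 + ['C'] * 5)

# Tukey's HSD on all pairwise group comparisons at alpha = 0.05.
result = pairwise_tukeyhsd(endog=scores, groups=groups, alpha=0.05)
print(result.summary())  # one row per pair: meandiff, p-adj, lower, upper, reject
```

The summary table’s reject column answers directly which pairs differ significantly.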

G. Decision rules of the test (computed vs critical value of the test; p-value vs
alpha)
Tukey

Post hoc tests show which group means differ significantly. They also control the
experiment-wise error rate. When you perform a single hypothesis test, there is a
Type I error rate, defined by the significance level or alpha: the possibility of
rejecting a null hypothesis that is actually true, in simple words a false positive.
When you perform only one test, the Type I error rate equals your significance level,
which is often 5%. However, as you conduct more tests, your chance of a false positive
increases; if you perform enough tests, you are practically guaranteed a false
positive. The error rate for a family of tests is always higher than for an individual
test.

If your ANOVA test shows that the means are not all equal, your next step is to
determine which means differ, at your level of significance. Performing a series of
t tests instead is inadvisable, as that would greatly inflate your likelihood of a
Type I error. Moreover, you should only perform a post-hoc analysis if the ANOVA test
shows a p-value less than your alpha. If p > α, you do not know whether the means are
all the same or not, and you cannot go fishing for unequal means.

The size of the difference between significantly different means (the effect size) is
also important. To assess it, first compute a confidence interval, then interpret
whether there is a significant difference in the means. If the endpoints of the CI
have the same sign (both positive or both negative), then 0 is not in the interval and
you can conclude that the means are different. If the endpoints of the CI have
opposite signs, then 0 is in the interval and you cannot determine whether the means
are equal or different. Compute that confidence interval similarly to the confidence
interval for the difference of two means, but using the q distribution, which avoids
the problem of inflating α:

(x̄i − x̄j) ± q(α; r, dfW) · √((MSW / 2)(1/ni + 1/nj))

where x̄i and x̄j are the two sample means, ni and nj are the two sample sizes, MSW is
the within-groups mean square from the ANOVA table, and q is the critical value of
the studentized range for α, the number of treatments or samples r, and the within-
groups degrees of freedom dfW. The square-root term is called the standardized
error.
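The interval and the endpoint sign check just described can be sketched with the standard library alone. Names and numbers are illustrative, and q_crit is assumed to come from a studentized-range table:

```python
import math

def tukey_ci(xbar_i, xbar_j, n_i, n_j, ms_within, q_crit):
    """Confidence interval for mu_i - mu_j using the studentized range.
    The square-root factor is the standardized error; with n_i != n_j
    this is the Tukey-Kramer form."""
    std_err = math.sqrt((ms_within / 2.0) * (1.0 / n_i + 1.0 / n_j))
    diff = xbar_i - xbar_j
    return diff - q_crit * std_err, diff + q_crit * std_err

def means_differ(lower, upper):
    """Endpoints share a sign -> 0 is outside the CI -> means differ."""
    return lower > 0 or upper < 0
```

If the interval comes out as, say, (1.1, 10.9), the means differ; an interval like (−0.5, 2.0) straddles 0 and is inconclusive.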

Tukey-Kramer

The Tukey-Kramer method is similar to the Tukey analysis, but it is used when you
have unequal sample sizes. You calculate the estimated standard error separately for
each pairwise comparison.

Bonferroni

In a Bonferroni analysis, the p-value for each test is compared against the overall
alpha divided by the number of tests performed. The test is performed by taking a
random sample of a population or group. While the null hypothesis is tested, the
alternative hypothesis is also tested, the two results being mutually exclusive.
However, with any testing of a null hypothesis, there is the expectation that a false
positive or Type I error could occur. (Hayes, 2020)

For example, an error rate of 5% might typically be assigned to a statistical test,
meaning that 5% of the time there will likely be a false positive. This 5% error rate
is called the alpha level. However, when many comparisons are being made in an
analysis, the error rates of the individual comparisons compound, creating multiple
false positives. To address this, Bonferroni designed a method of correcting for the
increased error rates in hypothesis testing with multiple comparisons.
Bonferroni's adjustment is calculated by dividing the alpha value by the number of
tests. Using the 5% error rate from our example, two tests would each be run at a
level of 0.025 (.05/2), while four tests would each be run at .0125 (.05/4).
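The arithmetic above can be written directly as a one-line helper (names are illustrative):

```python
def bonferroni_per_test_alpha(alpha, num_tests):
    """Per-comparison significance level under Bonferroni's adjustment."""
    return alpha / num_tests

print(bonferroni_per_test_alpha(0.05, 2))  # 0.025
print(bonferroni_per_test_alpha(0.05, 4))  # 0.0125
```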

H. Possible outcomes and interpretation

Tukey

The studentized range developed by Tukey overcomes the problem of an inflated
significance level. If sample sizes are equal, the risk of a Type I error is exactly α,
and if sample sizes are unequal, it is less than α: the procedure is conservative. In
terms of confidence intervals, if the sample sizes are equal then the confidence level
is the stated 1 − α, but if the sample sizes are unequal then the actual confidence
level is greater than 1 − α.
Assumptions for the test include: observations are independent within and among
groups; the groups for each mean in the test are normally distributed; and there is
equal within-group variance across the groups associated with each mean in the test
(homogeneity of variance).

The first step in interpreting a Tukey post-hoc analysis is to present a table of
means and standard deviations, followed by an overall report of the ANOVA; next,
report which pairs were significantly different at the given alpha level, and lastly
report which pairs were not significantly different. You could also use a graph of
means rather than a table, or you could incorporate post-hoc test results into a
table using letter-grouping notation (marking means a, b, c, where means sharing a
letter do not differ significantly).

Bonferroni

Suppose that a post hoc analysis consists of m separate tests. To ensure that the
total probability of making any Type I errors at all is at most α, the Bonferroni
correction just says “multiply all your raw p-values by m.” If we let p denote the
original p-value and p′ the corrected value, then the Bonferroni correction tells us
that:

p′ = m × p

And therefore, if you are using the Bonferroni correction, you would reject the null
hypothesis if p′ < α. The logic behind this correction is very straightforward. We're
doing m different tests, so if we arrange it so that each test has a Type I error rate
of at most α/m, then the total Type I error rate across these tests cannot be larger
than α.
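The correction as described can be sketched in a few lines, capping each adjusted value at 1 since a p-value cannot exceed 1 (names and the example p-values are illustrative):

```python
def bonferroni_adjust(p_values):
    """Multiply each raw p-value by the number of tests m (capped at 1);
    reject H0 for a comparison when the adjusted value is below alpha."""
    m = len(p_values)
    return [min(m * p, 1.0) for p in p_values]

adjusted = bonferroni_adjust([0.004, 0.020, 0.400])
# With alpha = 0.05, only the first comparison survives the correction.
rejections = [p_adj < 0.05 for p_adj in adjusted]
```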

I. Type of questions the test answer

Tukey’s HSD (Honestly Significant Difference)

Which specific pairs of means are significantly different from each other?

Tukey-Kramer

Can researchers still perform post-hoc tests to compare means when groups
have different sample sizes?

Bonferroni’s

How can researchers adjust the p-values to control for type I errors due to
multiple comparisons?

J. Common errors and misconceptions in using the test


1. Conducting post-hoc tests without first establishing results in ANOVA.

2. Assuming post-hoc tests control all possible comparisons, which is not entirely
accurate.

3. Incorrectly interpreting significance levels.

K. Limitations of the test (when NOT to use the test)

1. Assumption of homogeneity of variances

2. Sensitivity to outliers

3. Increased type I error rate with multiple comparisons

4. Sample size considerations

5. Interpretation of Multiple Tests

L. Complementary tests or post hoc procedures (if applicable)

Not Applicable

M. Case Problem

Tukey’s Method

The Tukey confidence limits for all pairwise comparisons with confidence coefficient
of at least 1 − α are (formula reconstructed here in the same studentized-range form
used earlier):

ȳi − ȳj ± q(α; r, ν) · √((MSW / 2)(1/ni + 1/nj))

Notice that the point estimator and the estimated variance are the same as those for a
single pairwise comparison that was illustrated previously. The only difference between
the confidence limits for simultaneous comparisons and those for a single comparison is
the multiple of the estimated standard deviation.

Also note that the sample sizes must be equal when using the studentized range
approach.
Step 1: Set of all pairwise comparisons

With r = 4 groups, the set of all pairwise comparisons consists of the six differences:

μ₁ − μ₂, μ₁ − μ₃, μ₁ − μ₄, μ₂ − μ₃, μ₂ − μ₄, μ₃ − μ₄

Step 2: Confidence intervals for each pair

Assume we want a confidence coefficient of 95 percent, or 0.95. Since r = 4 and
n = 20, the required percentile of the studentized range distribution is
q(0.05; 4, 16). Using the Tukey method for each of the six comparisons yields:

The simultaneous pairwise comparisons indicate that two of the differences are not
significantly different from 0 (their confidence intervals include 0), and all the
other pairs are significantly different.
Tukey-Kramer Test

Suppose we perform a one-way ANOVA on three groups: A, B, and C. It will be done in
Excel.

The p-value from the ANOVA table is 0.000588. Since this p-value is less than .05, we can
reject the null hypothesis and conclude that the means between the three groups are not
equal.

To determine exactly which group means are different, we can perform a Tukey-Kramer
post hoc test using the following steps:

Step 1: Find the absolute mean difference between each group.

First, we’ll find the absolute mean difference between each group using the averages
listed in the first table of the ANOVA output:
Step 2: Find the Q critical value.

Next, we need to find the Q critical value using the following formula:

Q critical value = Q × √(s²pooled / n)

Where:
• Q = Value from Studentized Range Q Table
• S²pooled = Pooled variance across all groups
• n = Sample size for a given group

To find the Q value, you can refer to the Studentized Range Q Table which looks like this:
In our example, k = the number of groups, which is k = 3. The degrees of freedom is calculated
as n-k = 30 – 3 = 27. Since 27 is not shown in the table above, we can use a conservative
estimate of 24. Based on k = 3 and df = 24, we find that Q = 3.53.

The pooled variance can be calculated as the average of the variances for the groups, which
turns out to be 19.056.
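Plugging the example’s numbers into the Step 2 formula takes only the standard library:

```python
import math

q = 3.53             # studentized-range value for k = 3, df ~ 24
pooled_var = 19.056  # average of the three group variances
n = 10               # observations per group (30 total / 3 groups)

q_critical = q * math.sqrt(pooled_var / n)
print(round(q_critical, 3))  # 4.873
```

Any absolute mean difference larger than about 4.87 is then declared significant in Step 3.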
Step 3: Determine which group means are different.

Lastly, we can compare the absolute mean difference between each group to the Q critical
value. If the absolute mean difference is larger than the Q critical value, then the difference
between the group means is statistically significant:
Based on the Tukey-Kramer post hoc test, we found the following:
• The difference in means between group A and group B is statistically significant.
• The difference in means between group B and group C is not statistically significant.
• The difference in means between group A and group C is statistically significant.

Bonferroni Method

We wish to estimate, as we did using the Scheffe method, the following linear combinations
(contrasts):

And construct 95% confidence intervals around the estimates.

Step 1: Compute the point estimates of the individual contrasts

Step 2: Compute the point estimate and variance of C

As before, for both contrasts, we have a variance of 0.2661, where σ̂² = 1.331 was
computed in our previous example. The standard error is 0.5158 (the square root of
0.2661).
Step 3: Compute the Bonferroni simultaneous confidence interval

For a 95% overall confidence coefficient using the Bonferroni method, the t value is
t(1 − 0.05/(2·2); 16) = t(0.9875; 16) = 2.473 (from the t table in Chapter 1). Now we
can calculate the confidence intervals for the two contrasts. For C₁ we have
confidence limits −0.5 ± 2.473(0.5158), and for C₂ we have confidence limits
0.34 ± 2.473(0.5158).

Thus, the confidence intervals are C₁: (−1.776, 0.776) and C₂: (−0.936, 1.616); both
intervals contain 0.
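Carrying the arithmetic of Step 3 through numerically (values taken from the steps above):

```python
t_crit = 2.473    # t(0.9875; 16) from the t table
std_err = 0.5158  # sqrt(0.2661), computed in Step 2
half_width = t_crit * std_err

c1, c2 = -0.5, 0.34  # point estimates of the two contrasts
ci_1 = (round(c1 - half_width, 3), round(c1 + half_width, 3))
ci_2 = (round(c2 - half_width, 3), round(c2 + half_width, 3))
print(ci_1)  # (-1.776, 0.776)
print(ci_2)  # (-0.936, 1.616)
```

Both intervals contain 0, so at the Bonferroni-adjusted 95% overall level neither contrast is significantly different from 0.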

N. References:

Frost, J. (2023, October 26). Using post hoc tests with ANOVA. Statistics by Jim.
https://statisticsbyjim.com/anova/post-hoc-tests-anova/

L. (2023, January 2). 2.3: Tukey test for pairwise mean comparisons. Statistics
LibreTexts.
https://stats.libretexts.org/Bookshelves/Advanced_Statistics/Analysis_of_Variance_and_Design_of_Experiments/02%3A_ANOVA_Foundations/2.03%3A_Tukey_Test_for_Pairwise_Mean_Comparisons

Post hoc tests – Tukey HSD. (n.d.). bioST@TS.
https://biostats.w.uib.no/post-hoc-tests-tukey-hsd/

Rynearson. (2023, October 30). Tukey-Kramer test – Excel and Google Sheets. Automate
Excel. https://www.automateexcel.com/stats/tukey-kramer-test/

Z. (2020, December 24). Tukey vs. Bonferroni vs. Scheffe: Which test should you use?
Statology. https://www.statology.org/tukey-vs-bonferroni-vs-scheffe/

Post-hoc tests. (n.d.). Stat Acumen.
https://statacumen.com/teach/S4R/PDS_book/post-hoc-tests.html
