Analysis of Variance Tutorials (Key)
Task 1: T-Test
This tutorial can be found in the Tutorial folder. It will prepare you to perform the
appropriate analyses for an empirical controlled experiment.
All data files can be found in Course GDrive > Downloads > ps4hci-Wobbrock.
The file contains a hypothetical study of 40 college students’ Facebook posting behavior using
one of two platforms: Apple’s iOS or Google’s Android OS. The data show the number of
Facebook posts subjects made during a particular week using their mobile platform.
Here, this is a one-way design with two levels, and thus a t-test is appropriate.
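The t-test call itself is not shown above; a minimal sketch, assuming the data file has already been read into a data frame named posts with columns Posts and Platform (as used in the descriptives below):

```r
# Assumes posts is loaded, with Posts (numeric) and Platform (factor):
t.test(posts$Posts ~ posts$Platform)                    # Welch's t-test (R's default)
t.test(posts$Posts ~ posts$Platform, var.equal = TRUE)  # Student's t-test (equal variances)
```

Note that R defaults to Welch's t-test, which does not assume equal variances; pass var.equal = TRUE for the classic Student's t-test.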
For the Mann-Whitney U test, do:
wilcox.test(posts$Posts ~ posts$Platform)
To show descriptives, do
summary(posts$Posts)
mean(posts$Posts)
sd(posts$Posts)
sd(posts$Posts) / sqrt(length(posts$Posts))
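The calls above give overall descriptives; Question 6 below asks for descriptives per level. A sketch, assuming the same posts data frame:

```r
# Per-level descriptives (mean and SD for each Platform level):
tapply(posts$Posts, posts$Platform, mean)
tapply(posts$Posts, posts$Platform, sd)
```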
Answer the following:
1. Is this a between-subjects or within-subjects experiment? Why?
a. Between-subjects, because each subject used only one of the two platforms
2. What is the independent variable named?
a. Platform
3. How many levels does the independent variable have? What are they?
a. 2: iOS and Android
4. How many subjects were in this experiment?
a. 40
5. How many subjects were exposed to each level of the independent variable? Is the
design balanced (i.e., are the numbers equal)?
a. 20 per level; yes, the design is balanced
6. What are the mean and standard deviation number of posts for each level of the
independent variable?
a. iOS - M = 24.950, SD = 7.045
b. Android - M = 30.100, SD = 8.795
7. Assuming equal variances, what is the t statistic for this t-test? (Hint: this is also called
the t Ratio.)
a. -2.044
8. How many degrees of freedom are there (dfs)?
a. 36.271
9. What is the two-tailed p-value resulting from this t-test? Is it significant at the α = .05
level?
a. Yes, p < .05
10. The formulation for expressing a significant t-test result is: t(dfs) = test
statistic, p < p-value threshold, d = Cohen's d. For a non-significant
result, it is: t(dfs) = t-statistic, n.s. Write your result just as you would in
a research paper. Read
https://shengdongzhao.com/newSite/how-to-report-statistics-in-apa-format/ for more
detail on how to report.
We found a significant effect of Platform (t(36.271) = -2.044, p < .05, d = -0.646) on the
number of Facebook posts between iOS (M = 24.950, SD = 7.045) and Android (M =
30.100, SD = 8.795) (see Figure X).
11. The equivalent of a between-subjects (independent-samples) t-test in nonparametric
statistics is the Mann-Whitney U test, which applies to an experiment containing one
between-subjects factor with two levels. The formulation for expressing a
Mann-Whitney U test is: U = test statistic, p < p-value threshold, r =
rank-biserial correlation. Note that when you use a non-parametric test,
report the median instead of the mean. Write your result just as you would in a
research paper.
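R's wilcox.test reports the U statistic (labeled W) but not the rank-biserial correlation. A sketch computing it via the conversion r = 1 - 2U / (n1·n2), assuming the same posts data frame:

```r
# Assumes posts is loaded; R labels the Mann-Whitney U statistic as W.
mw <- wilcox.test(posts$Posts ~ posts$Platform)
U  <- unname(mw$statistic)
n  <- table(posts$Platform)                   # group sizes
r  <- 1 - (2 * U) / (n[[1]] * n[[2]])         # rank-biserial correlation
tapply(posts$Posts, posts$Platform, median)   # report medians for nonparametric tests
```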
The F-test (another name for analysis of variance, or ANOVA) can do everything a t-test can
do, and more. An F-test, the most common analysis of variance, can handle multiple
independent variables, or factors, and these factors can have more than two levels. By
comparison, a t-test can only handle one factor with two levels, which is not enough for many
experiment designs.
R: Do:
For ANOVA:
aov <- aov(posts$Posts ~ posts$Platform)
summary(aov)
We found a main effect of Platform (F(1, 38) = 4.177, p < .05, η𝑝² = 0.099) between iOS
(M = 24.950, SD = 7.045) and Android (M = 30.100, SD = 8.795) (see Figure X).
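summary(aov) does not print partial eta squared; a minimal sketch computing it from the sums of squares in the ANOVA table, assuming the aov fit above:

```r
# Partial eta squared = SS_effect / (SS_effect + SS_error):
tab    <- summary(aov)[[1]]        # the one-way ANOVA table
ss     <- tab[["Sum Sq"]]
eta_p2 <- ss[1] / (ss[1] + ss[2])
eta_p2
```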
7. If your data do not meet the assumptions of ANOVA, you can use the Kruskal-Wallis
test, where the formulation is H(df) = statistic, p < threshold. Report just
like you would in a research paper.
We fail to find a significant effect of Platform (H(1) = 3.053, n.s.) on Posts between iOS (Md =
24.500) and Android (Md = 27.500) (see Figure X).
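The Kruskal-Wallis call is not shown above; a sketch, assuming the same posts data frame as before:

```r
# Assumes posts is loaded as before:
kruskal.test(posts$Posts ~ posts$Platform)
tapply(posts$Posts, posts$Platform, median)   # report medians with Kruskal-Wallis
```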
- Step 1: Determine whether there is a significant difference on some factor (IV) between
some levels using ANOVA
- Step 1.1a: If no, do nothing
- Step 1.1b: If yes, and if you have more than 2 levels, you have to do a post hoc test.
Why? → Because you need to know exactly which levels really differ from
one another. → Tukey or Bonferroni (highly recommended)
As noted, the F-test can handle more than one factor, and also, more than two levels per factor.
A one-way ANOVA refers to a single factor design. Similarly, a two-way ANOVA refers to a
two-factor design, i.e., two independent variables. In this part, we will still conduct a one-way
ANOVA, but this time, our factor will have three levels. Thus, it cannot be analyzed with a t-test,
which can only handle two levels of a single factor.
Open postsctrl.sav. This data set is the same for the iOS and Android levels, but now has added
20 new college students as a control group who did not use a mobile device for posting on
Facebook but were told to use their desktop computer instead. Thus, the Platform factor now
has three levels: iOS, Android, and desktop.
JASP: Do everything as Task 2. In addition, since Platform has three levels, if the ANOVA
has significance in Platform (i.e., p < 0.05), then inside ANOVA, under Post Hoc tests,
choose Platform, and tick Tukey and Bonferroni.
R: Do:
Set the levels (unfortunately, R does not automatically recognize the levels of Platform):
postsctrl$Platform <- ordered(postsctrl$Platform, levels = c("1", "2", "3"))
Perform the ANOVA:
aov <- aov(Posts ~ Platform, data = postsctrl)
summary(aov)
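The post hoc tests referenced in the result below are not shown in code; a sketch of both options, assuming the aov fit above and the postsctrl data frame:

```r
# Tukey HSD on the aov fit:
TukeyHSD(aov)
# Or pairwise t-tests with Bonferroni correction:
pairwise.t.test(postsctrl$Posts, postsctrl$Platform, p.adjust.method = "bonferroni")
```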
A post hoc analysis with Bonferroni corrections confirms the difference between iOS and
Desktop (p < .05) and between iOS and Android (p < .05), but not between Android and Desktop.
It is often the case that we wish to examine the effects of more than one factor, and we also
care about the interaction among factors. Because multiple factors are involved, this is called a
factorial design, expressed as N1 × N2 × … × Nn for an arbitrary number n of factors, and
where each Ni is an integer indicating the number of levels of that factor. In practice, it is difficult
to interpret experiments with more than three factors, especially if those factors each have more
than two levels. For this part, we will examine an augmented version of our current study that
adds another factor. Open postsbtwn.sav. You will see another column labeled Day with
values “weekday” and “weekend.” These values correspond to the days of the week the subject
was allowed to post to Facebook.
JASP: Very similar to the previous homework on ANOVA. You simply have to add one more
factor "Day" under Fixed Factors.
R: Do:
library(car)
Anova(lm(Posts ~ Platform * Day, data = postsbtwn), type = "III")
FYI: the car library explicitly requires us to build a linear model (lm) in order to use a Type III
analysis. Type III is used here because it does not depend on the order of the factors (Type I does!).
By default, aov is Type I, while Anova() on an lm fit lets us specify the type. In the previous
homework, we could use aov since we only had one factor.
We fail to find a significant effect of either Platform (H(1) = 3.053, n.s.) or Day (H(1) =
0.535, n.s.) on the number of Facebook posts between iOS (Md = 24.500) and Android
(Md = 27.500) and between weekday (Md = 27.000) and weekend (Md = 28.000) (see
Figure X).
11. Interpret these results and craft three sentences describing the results of this
experiment, one for each factor and one for the interaction. What can we say about the
findings from this study? (Hint: p-values between .05 and .10 are often called “trends” or
“marginal results,” and are often reported, although they cannot be considered strong
evidence. Be wary of ever calling such results “marginally significant.” A result is either
significant or it is not; there is no “marginal significance.”)
We found a marginal effect (a trend) of Platform (F(1, 36) = 4.058, p = .051, η𝑝² = 0.101)
between iOS (M = 24.950, SD = 7.045) and Android (M = 30.100, SD = 8.795). We
failed to find a significant effect of Day (F(1, 36) = 0.918, n.s.) between weekday (M =
26.300, SD = 8.430) and weekend (M = 28.750, SD = 8.168), nor an interaction effect of
Day × Platform (F(1, 36) = 0.0004, n.s.), on the number of Facebook posts.
Thus far, we have only considered experiments where one subject was measured once on only
one level of each factor. But often we wish to measure a subject more than once, perhaps for
different levels of our factor(s), or over time, in which case time itself becomes a factor. Such
designs are called “repeated measures” designs, and the factors on which we obtain repeated
measures are called within-subjects factors (as opposed to between-subjects factors). For
repeated measures studies, we can still use an ANOVA, but now we use a “repeated measures
ANOVA,” and our data table inevitably looks different: for a wide-format table, there are now
multiple measures per row (each row still corresponds to just one subject, as it has thus far).
Our current hypothetical study on Facebook posts has been modified to be a purely
within-subjects study. Imagine that each college student was issued either an iOS or Android
device for one week, and then the other device for the next week. Also, each college student’s
posts were counted separately on weekdays and weekends. Instead of needing 40 college
students as before, we now only need 10 students for the same data, which is shown in
postswthn.sav. Open that file and examine the wide-format data table.
R:
First, let R understand your table structure:
Platform <- factor(c("iOS", "iOS", "Android", "Android"))
Day <- factor(c("weekday", "weekend", "weekday", "weekend"))
factor <- data.frame(Platform, Day)
factor
The first row will be iOS weekday, the second row iOS weekend, the third row Android
weekday, and so on. Each row must match the corresponding measure column of the table; for
example, the first column is iOS_weekday, which matches the first row of the factor data frame.
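The second step, fitting the multivariate linear model that the Anova() call below operates on, is not shown. A sketch, assuming the wide-format columns are named iOS_weekday, iOS_weekend, Android_weekday, and Android_weekend (only iOS_weekday is confirmed in the text):

```r
library(car)  # provides Anova() for the next step
# Second, fit a multivariate linear model over the four measure columns.
# Column names other than iOS_weekday are assumptions:
model <- lm(cbind(iOS_weekday, iOS_weekend, Android_weekday, Android_weekend) ~ 1,
            data = postswthn)
```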
Third, run the ANOVA (idata describes the within-subjects structure, while idesign is the
within-subjects design formula):
aov <- Anova(model, idata=factor, idesign=~Platform*Day,
type=3)
summary(aov, multivariate = FALSE)
Fourth, run the Mauchly test to confirm that the ANOVA does not violate the sphericity
assumption. We have to do this for Platform, Day, and Platform:Day separately:
mauchly.test(model, M = ~Day, X = ~1, idata=factor)
mauchly.test(model, M = ~Platform, X = ~1, idata=factor)
mauchly.test(model, M = ~Platform:Day, X = ~Platform+Day, idata=factor)
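The normality check referenced in Question 1 below is not shown in the R steps; a sketch applying the Shapiro-Wilk test to each repeated-measures column (column names assumed to match the factor rows above):

```r
# Shapiro-Wilk normality test for each of the four wide-format columns:
cols <- c("iOS_weekday", "iOS_weekend", "Android_weekday", "Android_weekend")
lapply(postswthn[cols], shapiro.test)
```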
1. What is the output of the normality test? Is the data normal? How about the test of
sphericity?
a. Normality: All four combinations are normal, as shown by the Shapiro-Wilk test
(p > .05).
b. Sphericity: since we have only two levels, sphericity is not a concern.
Sphericity is a test of differences among variances, which cannot be done when
there are only two levels. Similarly, the homogeneity test cannot be done since
there are no between-subjects factors.
2. Was this a one-way, two-way, or three-way analysis of variance? What is/are the
factor(s)? What are each factor’s levels? Express the design using N1 × N2 × … × Nn
notation.
a. A two-way repeated measures ANOVA, because there are two IVs
b. A 2 (Day) × 2 (Platform) within-subjects design was conducted.
3. For each identified factor, was it between-subjects or within-subjects? How do you
know?
a. Both Day and Platform are within-subjects factors. This can easily be seen from
the format of the table (wide table instead of long table): each subject's repeated
measures appear in a single row.
4. Write the statistical result for the Platform factor
a. F(1, 9) = 22.097, p < .01, η𝑝² = 0.711
5. Write the statistical result for the Day factor.
a. F(1, 9) = 0.368, n.s.
6. Write the statistical result for the Platform*Day interaction.
a. F(1, 9) = 0.004, n.s.
7. What is the effect size in terms of partial eta squared? What is the interpretation?
a. For Platform, the partial eta squared is 0.711, which is considered large,
implying practical significance.
8. Write the result in APA format.
a. A Shapiro-Wilk test confirms the normality of the data (p > .05). A two-way
repeated measures ANOVA shows a main effect of Platform (F(1, 9) = 22.097,
p < .01, η𝑝² = 0.711) on Posts between iOS (M = 49.900, SD = 8.863) and
Android (M = 60.200, SD = 10.347). However, the test fails to find any
significant effect of Day (F(1, 9) = 0.368, n.s.) on Posts between weekday (M =
52.600, SD = 15.918) and weekend (M = 57.500, SD = 15.299), nor an
interaction effect of Platform × Day (F(1, 9) = 0.004, n.s.).
Here, Platform is a between-subjects factor. Use a similar approach as in Task 5, but treat
Platform as a between-subjects factor.
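The R code for this mixed design is not shown; a hypothetical sketch in the style of Task 5, where the data frame name (postsmixed) and the wide columns weekday and weekend alongside a Platform column are all assumptions:

```r
library(car)
# Hypothetical: postsmixed has one row per subject, with a between-subjects
# Platform column and one measure column per Day level (names assumed).
Day   <- factor(c("weekday", "weekend"))
idata <- data.frame(Day)
model <- lm(cbind(weekday, weekend) ~ Platform, data = postsmixed)
aov   <- Anova(model, idata = idata, idesign = ~Day, type = 3)
summary(aov, multivariate = FALSE)
```

The only structural changes from Task 5 are that Platform moves to the right-hand side of the lm formula (making it between-subjects) and the idata/idesign now describe only Day.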
1. What is the output of the normality test? Is the data normal? How about the test of
sphericity?
a. Normality: Both combinations are normal, as shown by the Shapiro-Wilk test (p >
.05).
2. Was this a one-way, two-way, or three-way analysis of variance? What is/are the
factor(s)? What are each factor’s levels? Express the design using N1 × N2 × … × Nn
notation.
a. A two-way mixed-design ANOVA (repeated measures on one factor)
b. A 2 (Platform) x 2 (Day) mixed experimental design is conducted with Platform as
between-subject factor and Day as within-subject factor.
3. For each identified factor, was it between-subjects or within-subjects? How do you
know?
a. See 2
4. Write the statistical result for the Platform factor
a. F(1, 18) = 5.716, p < .05, η𝑝² = 0.241
5. Write the statistical result for the Day factor.
a. F(1, 18) = 0.712, n.s.
6. Write the statistical result for the Platform*Day interaction.
a. F(1, 18) = 0.00003, n.s. (n.s. ⇒ not significant)
7. What is the effect size in terms of partial eta squared? What is the interpretation?
a. For Platform, the partial eta squared is 0.241, which is considered large,
implying practical significance.
8. Write the result in APA format.
a. A Shapiro-Wilk test confirms the normality of the data. *** A two-way repeated
measures ANOVA with Platform as the between-subjects factor shows a main
effect of Platform (F(1, 18) = 5.716, p < .05, η𝑝² = 0.241) on Posts between iOS
(M = 49.900, SD = 8.863) and Android (M = 60.200, SD = 10.347). However,
the test fails to find any significant effect of Day (F(1, 18) = 0.712, n.s.) on Posts
between weekday (M = 26.300, SD = 8.430) and weekend (M = 28.750, SD =
8.168), nor an interaction effect of Platform × Day (F(1, 18) = 0.00003, n.s.).
b. What if the sphericity test says that we violate the assumption?
i. A two-way repeated measures ANOVA with Greenhouse-Geisser
correction and Platform as the between-subjects factor shows…
c. How to write to confirm that your ANOVA passes the sphericity
check/homogeneity test (note: a non-significant test, p > .05, is what confirms
the assumption):
i. *** A sphericity test confirms the sphericity assumption (p > .05) -> RM ANOVA
ii. *** A homogeneity test confirms the assumption of homogeneity of
variances among groups (p > .05) -> between-subjects ANOVA
d. How to report the post hocs, if a factor has three or more levels:
i. A two-way repeated measures ANOVA with Platform as the
between-subjects factor shows a main effect of Platform (F(1, 18) =
5.716, p < .05, η𝑝² = 0.241) on Posts between iOS (M = 49.900, SD =
8.863), Android (M = 60.200, SD = 10.347) and ChakyOS (M = , SD = ).
A post hoc test with Bonferroni correction confirms the difference between
ChakyOS and Android (p < .05) and between ChakyOS and iOS (p < .05), but not
between iOS and Android.
ii. A post hoc test with Bonferroni correction > a Tukey post hoc test (i.e., prefer
Bonferroni over Tukey here)
e. What if your data are not normal?
i. A Shapiro-Wilk test found that our data are not normal (p < .05); thus a
Kruskal-Wallis test is used. The test found …...