Carlos - Willis - Problem Set 2, Spring 2023
Carlos Willis
# Load any packages, if any, that you use as part of your answers here
# For example:
library(ggplot2)
library(tidyverse)
Question 1 - 5 points
You may need to process your data before you begin your analysis. Specifically, you will need to make sure
that the variable type is set to ‘factor’ for both of your grouping variables and ‘num’ for your outcome
variable.
doughnuts.factorial <- read.csv("doughnutsfactorial.csv", header=TRUE, sep=",") # Loads the CSV file int
Like in Problem Set 1, please create two new variables in the doughnuts.factorial data set. The first new
variable will be called fat_type_factor and will contain the same values as in the fat_type variable but will
have a variable type of factor. The second new variable will be called flour_type_factor and will contain the
same values as in the flour_type variable but will also have a variable type of factor.
# Complete the lines to properly create the two new variables
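The lines that create the two factor variables do not appear in this document. A minimal sketch of what they could look like, assuming fat_type and flour_type hold text labels. The small data frame `toy` and its values are made up so the example runs without the CSV file:

```r
# Hypothetical stand-in for doughnuts.factorial (values invented for illustration)
toy <- data.frame(
  fat_type    = c("Canola", "Peanut", "Shortening", "Sunflower"),
  flour_type  = c("ap", "gf", "ww", "ap"),
  sim_tot_fat = c(60.1, 72.4, 55.8, 48.3),
  stringsAsFactors = FALSE
)

# Copy each grouping variable into a new factor-type column
toy$fat_type_factor   <- as.factor(toy$fat_type)
toy$flour_type_factor <- as.factor(toy$flour_type)

# Confirm the new columns are factors and the outcome is numeric
str(toy)
```

With the real data, the same two assignments would be applied to doughnuts.factorial instead of `toy`.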
Check your work by running the following code chunk. Be sure that fat_type_factor and flour_type_factor
are factor-type variables before you complete the rest of the problem set.
str(doughnuts.factorial)
Question 2 - 5 points
Provide a visual assessment and a quantitative assessment for the assumption of normality for each cell.
Hint: Remember that a cell contains the observations that make up a particular combination of two factors.
Therefore, there will be as many graphs/quantitative tests as are unique combinations of flour and fat types.
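The hint above can be checked by counting the cells. A minimal sketch, assuming the four fat labels and three flour labels that appear in this problem set's output; the data frame `cells` is a hypothetical helper, not part of the assignment:

```r
# Every combination of the two factors is one cell
cells <- expand.grid(
  fat_type_factor   = c("Canola", "Peanut", "Shortening", "Sunflower"),
  flour_type_factor = c("ap", "gf", "ww")
)

# Number of cells = number of fat types x number of flour types
nrow(cells)
```

With the real data, `table(doughnuts.factorial$fat_type_factor, doughnuts.factorial$flour_type_factor)` would show the same grid along with the number of observations in each cell.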
Code for your visual assessment of normality
# Code for visual assessment of normality for each cell
doughnuts.factorial %>%
ggplot(aes(x = sim_tot_fat)) +
geom_density() +
theme_bw() +
facet_grid(fat_type_factor ~ flour_type_factor)
[Figure: density plots of sim_tot_fat (x-axis, roughly 40 to 80; y-axis, density from 0 to about 0.1), faceted by fat type in rows (Canola, Peanut, Shortening, Sunflower) and flour type in columns (ap, gf, ww).]
doughnuts.factorial %>%
ggplot(aes(sample = sim_tot_fat)) +
stat_qq() +
stat_qq_line() +
theme_bw() +
facet_grid(fat_type_factor ~ flour_type_factor)
[Figure: normal Q-Q plots of sim_tot_fat (x-axis, theoretical quantiles from about -1 to 1; y-axis, sample values from about 40 to 80), faceted by fat type in rows (Canola, Peanut, Shortening, Sunflower) and flour type in columns (ap, gf, ww).]
# Be sure to display your visual assessment in your knitted document!
library(rstatix)
##
## Attaching package: 'rstatix'
## The following object is masked from 'package:stats':
##
## filter
library(flextable)
##
## Attaching package: 'flextable'
## The following object is masked from 'package:purrr':
##
## compose
doughnuts.factorial %>%
group_by(fat_type_factor, flour_type_factor) %>%
shapiro_test(sim_tot_fat) %>%
select(-variable) %>%
mutate(across(where(is.numeric), ~ round(.x, 4))) %>% # round every numeric column to 4 decimals
flextable()
fat_type_factor  flour_type_factor  statistic  p
Canola           ap                 0.9852     0.9745
Canola           gf                 0.9346     0.6161
Canola           ww                 0.7822     0.0404
Peanut           ap                 0.9420     0.6754
Peanut           gf                 0.8776     0.2583
Peanut           ww                 0.8773     0.2570
Shortening       ap                 0.9097     0.4341
Shortening       gf                 0.9617     0.8325
Shortening       ww                 0.9001     0.3745
Sunflower        ap                 0.8876     0.3059
Sunflower        gf                 0.8535     0.1681
Sunflower        ww                 0.9890     0.9866
# Be sure to display your quantitative assessment in your knitted document!
Question 3 - 5 points
Provide a visual assessment and a quantitative assessment for the assumption of equality of variances across
cells.
Code for your visual assessment of equality of variances
# Code for visual assessment of equality of variances for each cell
doughnuts.factorial %>%
unite("Combination", fat_type_factor, flour_type_factor, sep = "_") %>%
ggplot(aes(x = reorder(Combination, sim_tot_fat, FUN = median), y = sim_tot_fat)) +
geom_boxplot() +
theme_bw() +
coord_flip()
[Figure: horizontal boxplots of sim_tot_fat (x-axis, roughly 40 to 80) for the twelve fat/flour combinations (Canola_ap through Sunflower_ww), ordered by median.]
doughnuts.factorial %>%
unite("Combination", fat_type_factor, flour_type_factor, sep = "_") %>%
ggplot(aes(x = sim_tot_fat, fill = Combination)) +
geom_density(alpha = 0.5) +
theme_minimal() +
labs(fill = NULL) +
theme(legend.position = "top")
[Figure: overlaid density curves of sim_tot_fat (x-axis, roughly 40 to 80; y-axis, density from 0 to about 0.1), one curve per fat/flour combination (Canola_ap through Sunflower_ww), with the legend at the top.]
# Be sure to display your visual assessment in your knitted document!
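Question 3 also asks for a quantitative assessment, which does not appear in this document. One common choice is Levene's test across the twelve cells. A minimal sketch in base R on simulated data; the data frame `sim`, its cell size of 6, and its values are assumptions, not the doughnut data:

```r
set.seed(1)  # reproducible simulated data

# Hypothetical stand-in for doughnuts.factorial: 12 cells, 6 observations each
sim <- expand.grid(
  rep               = 1:6,
  fat_type_factor   = c("Canola", "Peanut", "Shortening", "Sunflower"),
  flour_type_factor = c("ap", "gf", "ww")
)
sim$sim_tot_fat <- rnorm(nrow(sim), mean = 60, sd = 8)

# One grouping factor with a level for every fat/flour cell
sim$cell <- interaction(sim$fat_type_factor, sim$flour_type_factor)

# Median-centered Levene test: a one-way ANOVA on the absolute deviation
# of each observation from its own cell median
sim$abs_dev <- abs(sim$sim_tot_fat - ave(sim$sim_tot_fat, sim$cell, FUN = median))
levene.aov <- aov(abs_dev ~ cell, data = sim)
summary(levene.aov)
```

A large p-value on the cell line gives no evidence against equal variances. With rstatix loaded, `levene_test(doughnuts.factorial, sim_tot_fat ~ fat_type_factor * flour_type_factor)` should run the same kind of test on the real data.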
Question 4 - 10 points
Before conducting your two-way ANOVA with an interaction, start by conducting one-way ANOVAs for each
of your factors. You wouldn’t do this in practice - you would just conduct the two-way ANOVA - but you’ll
do it here to allow you to make some comparisons between one-way ANOVA and two-way ANOVA with an
interaction in Question 7. You do not need to interpret these ANOVAs, but be sure to display the output in
your knitted document.
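Your one-way ANOVA for testing if the means in total fat (sim_tot_fat) are the same across fat types:
fat.aov <- aov(sim_tot_fat ~ fat_type_factor, data = doughnuts.factorial) # Complete this line
summary(fat.aov)
Your one-way ANOVA for testing if the means in total fat (sim_tot_fat) are the same across flour types:
flour.aov <- aov(sim_tot_fat ~ flour_type_factor, data = doughnuts.factorial) # Complete this line
summary(flour.aov)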
Question 5 - 10 points
Conduct a two-way ANOVA with an interaction between fat type and flour type. Use sim_tot_fat as the
outcome and fat_type_factor and flour_type_factor as the grouping variables. Please be sure to display
your ANOVA results using the summary() function.
fat_flour_int.aov <- aov(sim_tot_fat ~ fat_type_factor*flour_type_factor, data = doughnuts.factorial)# C
summary(fat_flour_int.aov)
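In an R model formula, `a * b` is shorthand for both main effects plus their interaction, `a + b + a:b`. A minimal sketch on simulated data showing that the two spellings fit the same model; the data frame `sim`, the short column names fat, flour, and y, and the cell size of 5 are assumptions for illustration:

```r
set.seed(2)  # reproducible simulated data

# Hypothetical balanced design: 4 fat types x 3 flour types x 5 replicates
sim <- expand.grid(
  rep   = 1:5,
  fat   = c("Canola", "Peanut", "Shortening", "Sunflower"),
  flour = c("ap", "gf", "ww")
)
sim$y <- rnorm(nrow(sim), mean = 60, sd = 8)

# Shorthand and expanded forms of the two-way model with interaction
short.aov <- aov(y ~ fat * flour, data = sim)
long.aov  <- aov(y ~ fat + flour + fat:flour, data = sim)

# The table has one line each for fat, flour, fat:flour, and Residuals
summary(short.aov)

# Both spellings give the same sums of squares
all.equal(summary(short.aov)[[1]][["Sum Sq"]],
          summary(long.aov)[[1]][["Sum Sq"]])
```

The interaction line is the one that tests whether the effect of fat type depends on flour type.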
Question 6 - 10 points
Be sure to have completed the two-way ANOVA with an interaction analysis before answering the following
four questions.
Question 7 - 5 points
You conducted 2 one-way ANOVAs in Question 4 and 1 two-way ANOVA with an interaction in Question 5.
In this question, you will answer four questions comparing the results of these analyses.
A) Look at the lines for fat_type_factor in both the one-way ANOVA with fat_type_factor (fat.aov in
Question 4) used as the grouping variable and the two-way ANOVA with an interaction (fat_flour_int.aov
in Question 5). Is there any difference in the degrees of freedom or the sums of squares between these
lines?
Your answer here (yes/no): No
B) Looking at the same lines as the previous question, is there a difference between the F test statistic or
the p-values?
Your answer here (yes/no): Yes
C) Look at the lines for flour_type_factor in both the one-way ANOVA with flour_type_factor
(flour.aov in Question 4) used as the grouping variable and the two-way ANOVA with an interaction
(fat_flour_int.aov in Question 5). Is there any difference in the degrees of freedom and the sums of
squares between these lines?
Your answer here (yes/no): No
D) Looking at the same lines as the previous question, is there a difference between the F test statistic or
the p-values?
Your answer here (yes/no): Yes
summary(flour.aov)
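The pattern in the Question 7 answers (same degrees of freedom and sums of squares, different F and p) can be reproduced on simulated data. A minimal sketch, assuming a balanced design with 5 observations per cell; the data frame `sim` and the column names fat, flour, and y are hypothetical stand-ins, not the doughnut data:

```r
set.seed(3)  # reproducible simulated data

# Hypothetical balanced design: 4 fat types x 3 flour types x 5 replicates
sim <- expand.grid(
  rep   = 1:5,
  fat   = c("Canola", "Peanut", "Shortening", "Sunflower"),
  flour = c("ap", "gf", "ww")
)
sim$y <- rnorm(nrow(sim), mean = 60, sd = 8)

fat.tab   <- summary(aov(y ~ fat, data = sim))[[1]]          # one-way, fat
flour.tab <- summary(aov(y ~ flour, data = sim))[[1]]        # one-way, flour
two.tab   <- summary(aov(y ~ fat * flour, data = sim))[[1]]  # two-way with interaction

# The factor lines carry the same Df and Sum Sq in the one-way and two-way tables
all.equal(fat.tab[1, "Sum Sq"], two.tab[1, "Sum Sq"])
all.equal(flour.tab[1, "Sum Sq"], two.tab[2, "Sum Sq"])

# The residual mean square is what changes, because the two-way model moves the
# other factor and the interaction out of the residuals
fat.tab[2, "Mean Sq"]
two.tab[4, "Mean Sq"]

# F = Mean Sq(factor) / Mean Sq(Residuals), so F and p change with it
fat.tab[1, "F value"]
two.tab[1, "F value"]
```

The match for the flour line relies on the design being balanced. With unequal cell sizes, the sequential sum of squares for the second term in the two-way model would generally differ from its one-way value.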