Carlos - Willis - Problem Set 2, Spring 2023

The document outlines a problem set for a statistics course, focusing on a factorial experiment conducted by a doughnut shop owner to minimize fat absorption in fried doughnuts. It includes data processing steps, visual and quantitative assessments for normality and equality of variances, and instructions for conducting one-way and two-way ANOVAs. The document emphasizes the importance of statistical analysis in understanding the effects of different fat and flour types on fat absorption in doughnuts.

Uploaded by

williscarl79
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
10 views9 pages

Carlos - Willis - Problem Set 2, Spring 2023

The document outlines a problem set for a statistics course, focusing on a factorial experiment conducted by a doughnut shop owner to minimize fat absorption in fried doughnuts. It includes data processing steps, visual and quantitative assessments for normality and equality of variances, and instructions for conducting one-way and two-way ANOVAs. The document emphasizes the importance of statistical analysis in understanding the effects of different fat and flour types on fat absorption in doughnuts.

Uploaded by

williscarl79
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 9

Problem Set 2, Spring 2023

Carlos Willis

# Load any packages, if any, that you use as part of your answers here
# For example:

library(ggplot2)
library(tidyverse)

## -- Attaching packages --------------------------------------- tidyverse 1.3.2 --
## v tibble  3.1.8     v dplyr   1.1.0
## v tidyr   1.3.0     v stringr 1.5.0
## v readr   2.1.3     v forcats 0.5.2
## v purrr   1.0.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

Donna is the owner of a boutique doughnut shop. Because many of her customers are conscious of their fat
intake but want the flavor of fried doughnuts, she decided to develop a doughnut recipe that minimizes the
amount of fat that the doughnuts absorb from the fat in which the doughnuts are fried.
She conducted a factorial experiment that followed a procedure similar to that of Lowe (1935). Like Lowe, she used four
types of fats (fat_type). She also used three types of flour (flour_type): all-purpose flour, whole wheat flour,
and gluten-free flour. For each combination of fat type and flour type, she cooked six identical batches of
doughnuts. Each batch contained 24 doughnuts, and the total fat (in grams) absorbed by the doughnuts in
each batch was recorded (sim_tot_fat).
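
The design described above can be laid out as a quick sanity check on the expected number of rows. This is only a sketch: the object name `design` is hypothetical, and the level labels are taken from the `str()` output shown later in this document.

```r
# Hypothetical layout of the 4 x 3 factorial design with 6 batches per cell
design <- expand.grid(
  batch = 1:6,
  fat_type = c("Canola", "Peanut", "Shortening", "Sunflower"),
  flour_type = c("ap", "gf", "ww")
)

nrow(design)                               # total number of batches
table(design$fat_type, design$flour_type)  # batches in each fat x flour cell
```

The row count should agree with the number of observations reported by `str()` once the real data are loaded.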

Question 1 - 5 points
You may need to process your data before you begin your analysis. Specifically, you will need to make sure
that the variable type is set to ‘factor’ for both of your grouping variables and ‘num’ for your outcome
variable.
doughnuts.factorial <- read.csv("doughnutsfactorial.csv", header=TRUE, sep=",") # Loads the CSV file int

Like in Problem Set 1, please create two new variables in the doughnuts.factorial data set. The first new
variable will be called fat_type_factor and will contain the same values as in the fat_type variable but will
have a variable type of factor. The second new variable will be called flour_type_factor and will contain the
same values as in the flour_type variable but will also have a variable type of factor.
# Complete the lines to properly create the two new variables

doughnuts.factorial$fat_type_factor <- as.factor(doughnuts.factorial$fat_type)

doughnuts.factorial$flour_type_factor <- as.factor(doughnuts.factorial$flour_type)

Check your work by running the following code chunk. Be sure that fat_type_factor and flour_type_factor
are factor-type variables before you complete the rest of the problem set.

str(doughnuts.factorial)

## 'data.frame':    72 obs. of  5 variables:
##  $ fat_type         : chr  "Canola" "Canola" "Canola" "Canola" ...
##  $ flour_type       : chr  "ap" "ap" "ap" "ap" ...
##  $ sim_tot_fat      : int  78 71 80 88 62 72 78 75 89 74 ...
##  $ fat_type_factor  : Factor w/ 4 levels "Canola","Peanut",..: 1 1 1 1 1 1 3 3 3 3 ...
##  $ flour_type_factor: Factor w/ 3 levels "ap","gf","ww": 1 1 1 1 1 1 1 1 1 1 ...
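
Instead of reading the `str()` output by eye, the same check can be written as assertions that stop the script if a type is wrong. The sketch below uses a tiny made-up data frame named `toy` (hypothetical, not the real data) so that it runs on its own. Note that `is.numeric()` is `TRUE` for integer columns, which is why an `int` outcome still counts as numeric.

```r
# Hypothetical miniature stand-in for doughnuts.factorial (values are made up)
toy <- data.frame(
  fat_type = c("Canola", "Peanut", "Shortening", "Sunflower"),
  flour_type = c("ap", "gf", "ww", "ap"),
  sim_tot_fat = c(78L, 71L, 80L, 88L)
)

# Same conversion as in the problem set
toy$fat_type_factor <- as.factor(toy$fat_type)
toy$flour_type_factor <- as.factor(toy$flour_type)

# Stop with an error if any variable has the wrong type
stopifnot(
  is.factor(toy$fat_type_factor),
  is.factor(toy$flour_type_factor),
  is.numeric(toy$sim_tot_fat)
)

sapply(toy, class)  # one class per column
```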

Question 2 - 5 points
Provide a visual assessment and a quantitative assessment for the assumption of normality for each cell.
Hint: Remember that a cell contains the observations that make up a particular combination of two factors.
Therefore, there will be as many graphs/quantitative tests as are unique combinations of flour and fat types.
Code for your visual assessment of normality
# Code for visual assessment of normality for each cell
doughnuts.factorial %>%
  ggplot(aes(x = sim_tot_fat)) +
  geom_density() +
  theme_bw() +
  facet_grid(fat_type_factor ~ flour_type_factor)

[Figure: density plots of sim_tot_fat, one panel per cell. Rows: fat type (Canola, Peanut, Shortening, Sunflower). Columns: flour type (ap, gf, ww). x-axis: sim_tot_fat; y-axis: density.]

doughnuts.factorial %>%
  ggplot(aes(sample = sim_tot_fat)) +
  stat_qq() +
  stat_qq_line() +
  theme_bw() +
  facet_grid(fat_type_factor ~ flour_type_factor)

[Figure: normal Q-Q plots of sim_tot_fat with reference lines, one panel per cell. Rows: fat type (Canola, Peanut, Shortening, Sunflower). Columns: flour type (ap, gf, ww). x-axis: theoretical quantiles (x); y-axis: sample quantiles (y).]

# Be sure to display your visual assessment in your knitted document!

Code for your quantitative assessment of normality


# Code for quantitative assessment of normality for each cell
norm_test <- tapply(
  doughnuts.factorial$sim_tot_fat,
  list(
    doughnuts.factorial$fat_type_factor,
    doughnuts.factorial$flour_type_factor
  ),
  shapiro.test
)

p_values <- sapply(norm_test, function(x) x$p.value)


p_values

##  [1] 0.97446970 0.67535809 0.43412376 0.30586021 0.61605794 0.25834140
##  [7] 0.83245247 0.16811082 0.04037248 0.25695906 0.37454689 0.98661062
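
The vector above carries no labels. Because `tapply()` returns a 4 x 3 array here and `sapply()` walks it in column-major order, the first four values belong to the `ap` column (Canola, Peanut, Shortening, Sunflower), the next four to `gf`, and the last four to `ww`. A minimal sketch of one way to keep the labels attached is shown below. It runs on simulated data, so the numbers it prints are not the ones from this problem set, and the names `sim` and `p_matrix` are hypothetical.

```r
# Simulated stand-in for doughnuts.factorial (made-up values, 6 batches per cell)
set.seed(1)
sim <- expand.grid(
  batch = 1:6,
  fat_type_factor = c("Canola", "Peanut", "Shortening", "Sunflower"),
  flour_type_factor = c("ap", "gf", "ww")
)
sim$sim_tot_fat <- rnorm(nrow(sim), mean = 65, sd = 10)

# Shapiro-Wilk test in every cell, as in the problem set
norm_test <- tapply(
  sim$sim_tot_fat,
  list(sim$fat_type_factor, sim$flour_type_factor),
  shapiro.test
)

# Rebuild a labelled matrix: rows are fat types, columns are flour types
p_matrix <- array(
  sapply(norm_test, function(x) x$p.value),
  dim = dim(norm_test),
  dimnames = dimnames(norm_test)
)
round(p_matrix, 4)
```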
library(rstatix)

##
## Attaching package: 'rstatix'
## The following object is masked from 'package:stats':
##
##     filter
library(flextable)

##
## Attaching package: 'flextable'
## The following object is masked from 'package:purrr':
##
## compose
doughnuts.factorial %>%
  group_by(fat_type_factor, flour_type_factor) %>%
  shapiro_test(sim_tot_fat) %>%
  select(-variable) %>%
  # Anonymous function avoids the across(..., round, 4) form deprecated in dplyr 1.1.0
  mutate(across(where(is.numeric), \(x) round(x, 4))) %>%
  flextable()

## Warning: fonts used in `flextable` are ignored because the `pdflatex` engine
## is used and not `xelatex` or `lualatex`. You can avoid this warning by using
## the `set_flextable_defaults(fonts_ignore=TRUE)` command or use a compatible
## engine by defining `latex_engine: xelatex` in the YAML header of the R Markdown
## document.

fat_type_factor  flour_type_factor  statistic  p
Canola           ap                 0.9852     0.9745
Canola           gf                 0.9346     0.6161
Canola           ww                 0.7822     0.0404
Peanut           ap                 0.9420     0.6754
Peanut           gf                 0.8776     0.2583
Peanut           ww                 0.8773     0.2570
Shortening       ap                 0.9097     0.4341
Shortening       gf                 0.9617     0.8325
Shortening       ww                 0.9001     0.3745
Sunflower        ap                 0.8876     0.3059
Sunflower        gf                 0.8535     0.1681
Sunflower        ww                 0.9890     0.9866

# Be sure to display your quantitative assessment in your knitted document!

Question 3 - 5 points
Provide a visual assessment and a quantitative assessment for the assumption of equality of variances for
each cell.
Code for your visual assessment of equality of variances
# Code for visual assessment of equality of variances for each cell
doughnuts.factorial %>%
  unite("Combination", fat_type_factor, flour_type_factor, sep = "_") %>%
  ggplot(aes(x = reorder(Combination, sim_tot_fat, FUN = median), y = sim_tot_fat)) +
  geom_boxplot() +
  theme_bw() +
  coord_flip()

[Figure: horizontal boxplots of sim_tot_fat for each fat_flour combination, ordered by median. Combinations from top to bottom: Peanut_ap, Shortening_ap, Shortening_ww, Shortening_gf, Canola_ap, Peanut_ww, Peanut_gf, Canola_ww, Canola_gf, Sunflower_ap, Sunflower_ww, Sunflower_gf. Horizontal axis: sim_tot_fat.]

doughnuts.factorial %>%
  unite("Combination", fat_type_factor, flour_type_factor, sep = "_") %>%
  ggplot(aes(x = sim_tot_fat, fill = Combination)) +
  geom_density(alpha = 0.5) +
  theme_minimal() +
  labs(fill = NULL) +
  theme(legend.position = "top")

[Figure: overlaid density curves of sim_tot_fat, one per fat_flour combination (12 in total, named in the legend at the top). x-axis: sim_tot_fat; y-axis: density.]

# Be sure to display your visual assessment in your knitted document!

Code for your quantitative assessment of equality of variances


library(car)

## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
##     recode
## The following object is masked from 'package:purrr':
##
##     some
# Code for quantitative assessment of equality of variances for each cell
lev.test <- leveneTest(sim_tot_fat ~ interaction(fat_type_factor, flour_type_factor), data = doughnuts.f
lev.test

## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group 11  0.4064 0.9478
##       60
# Be sure to display your quantitative assessment in your knitted document!
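
To see what the median-centered Levene test computes, it can be reproduced with base R as a one-way ANOVA on the absolute deviations of each observation from its cell median. The sketch below does this on simulated data, so its F value and p-value will differ from those above. The names `sim`, `cell`, `abs_dev`, and `lev_by_hand` are hypothetical.

```r
# Simulated stand-in for doughnuts.factorial (made-up values, 6 batches per cell)
set.seed(2)
sim <- expand.grid(
  batch = 1:6,
  fat_type_factor = c("Canola", "Peanut", "Shortening", "Sunflower"),
  flour_type_factor = c("ap", "gf", "ww")
)
sim$sim_tot_fat <- rnorm(nrow(sim), mean = 65, sd = 10)

# One grouping factor with 12 levels, one per fat x flour cell
sim$cell <- interaction(sim$fat_type_factor, sim$flour_type_factor)

# Absolute deviation of each batch from the median of its own cell
cell_median <- ave(sim$sim_tot_fat, sim$cell, FUN = median)
sim$abs_dev <- abs(sim$sim_tot_fat - cell_median)

# One-way ANOVA on the absolute deviations
lev_by_hand <- anova(lm(abs_dev ~ cell, data = sim))
lev_by_hand
```

With 12 cells and 72 batches, the degrees of freedom are 11 and 60, matching the table above.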

Question 4 - 10 points
Before conducting your two-way ANOVA with an interaction, start by conducting one-way ANOVAs for each
of your factors. You wouldn’t do this in practice - you would just conduct the two-way ANOVA - but you’ll
do it here to allow you to make some comparisons between one-way ANOVA and two-way ANOVA with an
interaction in Question 7. You do not need to interpret these ANOVAs, but be sure to display the output in
your knitted document.
Your one-way ANOVA for testing if the means in total fat (sim_tot_fat) are the same across fat types:
fat.aov <- aov(sim_tot_fat ~ fat_type_factor, data = doughnuts.factorial)# Complete this line

# Don't forget to display your results!

summary(fat.aov)

##                 Df Sum Sq Mean Sq F value   Pr(>F)
## fat_type_factor  3   6967  2322.5   20.17 1.86e-09 ***
## Residuals       68   7831   115.2
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Your one-way ANOVA for testing if the means in total fat (sim_tot_fat) are the same across flour types:
flour.aov <- aov(sim_tot_fat ~ flour_type_factor, data = doughnuts.factorial)

# Don't forget to display your results!

summary(flour.aov)

##                   Df Sum Sq Mean Sq F value Pr(>F)
## flour_type_factor  2   1063   531.3   2.669 0.0765 .
## Residuals         69  13736   199.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Question 5 - 10 points
Conduct a two-way ANOVA with an interaction between fat type and flour type. Use sim_tot_fat as the
outcome and fat_type_factor and flour_type_factor as the grouping variables. Please be sure to display
your ANOVA results using the summary() function.
fat_flour_int.aov <- aov(sim_tot_fat ~ fat_type_factor*flour_type_factor, data = doughnuts.factorial)# C

# Don't forget to display your results!

summary(fat_flour_int.aov)

##                                   Df Sum Sq Mean Sq F value   Pr(>F)
## fat_type_factor                    3   6967  2322.5  21.976 1.01e-09 ***
## flour_type_factor                  2   1063   531.3   5.028  0.00958 **
## fat_type_factor:flour_type_factor  6    427    71.2   0.674  0.67095
## Residuals                         60   6341   105.7
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Question 6 - 10 points
Be sure to have completed the two-way ANOVA with an interaction analysis before answering the following
four questions.

Main effects hypotheses - two questions to answer


A) Please select the statement that is the best interpretation of the p-value associated with the main effect
of fat type.
Statement 1: I reject the null hypothesis and conclude that at least one fat type has a statistically significantly
different mean fat absorption than the other groups.
Statement 2: I fail to reject the null hypothesis and conclude that there is no statistically significant difference
in the mean amount of fat absorbed among fat types.
Your answer here: Statement 1
B) Please select the statement that is the best interpretation of the p-value associated with the main effect
of flour type.
Statement 1: I reject the null hypothesis and conclude that at least one flour type has a statistically
significantly different mean fat absorption than the other groups.
Statement 2: I fail to reject the null hypothesis and conclude that there is no statistically significant difference
in the mean amount of fat absorbed among flour types.
Your answer here: Statement 1

Interaction hypothesis - 2 questions to answer


C) Please select the statement that is the best interpretation of the p-value associated with the interaction
between fat type and flour type.
Statement 1: The interaction between fat type and flour type is statistically significant.
Statement 2: The interaction between fat type and flour type is not statistically significant.
Your answer here: Statement 2
D) Based on your response to the previous question about the interaction, can you interpret the main
effects in a straightforward fashion? Put differently, is it justifiable to make a conclusion about the
effect of fat type while ignoring the effect of flour type (and vice versa)?
Your answer here (yes or no): Yes
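
A table of cell means, or the matching interaction plot, is a common way to see what a non-significant interaction looks like: the profiles for the fat types run roughly parallel across flour types. The sketch below uses simulated data built with purely additive effects. The shift values and the names `sim`, `fat_shift`, `flour_shift`, and `cell_means` are made up for illustration and are not estimates from the doughnut data.

```r
# Simulated additive (no-interaction) data, 6 batches per cell
set.seed(3)
sim <- expand.grid(
  batch = 1:6,
  fat = c("Canola", "Peanut", "Shortening", "Sunflower"),
  flour = c("ap", "gf", "ww")
)
fat_shift <- c(Canola = 0, Peanut = 5, Shortening = 10, Sunflower = -10)  # made up
flour_shift <- c(ap = 0, gf = -5, ww = -3)                                # made up
sim$y <- 65 +
  fat_shift[as.character(sim$fat)] +
  flour_shift[as.character(sim$flour)] +
  rnorm(nrow(sim), sd = 5)

# Mean response in each fat x flour cell
cell_means <- tapply(sim$y, list(sim$fat, sim$flour), mean)
round(cell_means, 1)

# One line per fat type across flour types
interaction.plot(
  x.factor = sim$flour,
  trace.factor = sim$fat,
  response = sim$y,
  xlab = "flour type",
  ylab = "mean fat absorbed (simulated)",
  trace.label = "fat type"
)
```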

Question 7 - 5 points
You conducted 2 one-way ANOVAs in Question 4 and 1 two-way ANOVA with an interaction in Question 5.
In this question, you will answer four questions comparing the results of these analyses.
A) Look at the lines for fat_type_factor in both the one-way ANOVA with fat_type_factor (fat.aov in Ques-
tion 4) used as the grouping variable and the two-way ANOVA with an interaction (fat_flour_int.aov
in Question 5). Is there any difference in the degrees of freedom or the sums of squares between these
lines?
Your answer here (yes/no): No
B) Looking at the same lines as the previous question, is there a difference between the F test statistic or
the p-values?
Your answer here (yes/no): Yes
C) Look at the lines for flour_type_factor in both the one-way ANOVA with flour_type_factor
(flour.aov in Question 4) used as the grouping variable and the two-way ANOVA with an interaction
(fat_flour_int.aov in Question 5). Is there any difference in the degrees of freedom and the sums of
squares between these lines?
Your answer here (yes/no): No
D) Looking at the same lines as the previous question, is there a difference between the F test statistic or
the p-values?
Your answer here (yes/no): Yes
summary(flour.aov)

##                   Df Sum Sq Mean Sq F value Pr(>F)
## flour_type_factor  2   1063   531.3   2.669 0.0765 .
## Residuals         69  13736   199.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(fat_flour_int.aov)

##                                   Df Sum Sq Mean Sq F value   Pr(>F)
## fat_type_factor                    3   6967  2322.5  21.976 1.01e-09 ***
## flour_type_factor                  2   1063   531.3   5.028  0.00958 **
## fat_type_factor:flour_type_factor  6    427    71.2   0.674  0.67095
## Residuals                         60   6341   105.7
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
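
The reason the F statistics differ while the degrees of freedom and sums of squares for each factor do not is that F is the factor's mean square divided by the residual mean square, and the two-way model has a smaller residual mean square. The arithmetic below is a sketch using the rounded values printed in the tables above, so the results match the reported F values only up to rounding. The variable names are hypothetical.

```r
# Values copied from the summary() tables above (rounded as printed)
ms_flour     <- 531.3  # Mean Sq for flour_type_factor, identical in both models
ms_resid_one <- 199.1  # residual Mean Sq, one-way flour model (69 df)
ms_resid_two <- 105.7  # residual Mean Sq, two-way model (60 df)

# F = factor mean square / residual mean square
f_one <- ms_flour / ms_resid_one
f_two <- ms_flour / ms_resid_two

# Upper-tail p-values from the F distribution
p_one <- pf(f_one, df1 = 2, df2 = 69, lower.tail = FALSE)
p_two <- pf(f_two, df1 = 2, df2 = 60, lower.tail = FALSE)
round(c(f_one = f_one, f_two = f_two, p_one = p_one, p_two = p_two), 4)

# The one-way fat residual SS equals the flour, interaction, and residual SS
# of the two-way model added together
7831 - (1063 + 427 + 6341)
```

In this balanced design, the variation that the one-way models leave in the residual is split out by the two-way model into the other factor and the interaction, which shrinks the residual mean square and raises both F statistics.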
