Homework 9: Independent and Paired Samples T-Tests: Information 1
Information 1
(1) On Canvas, download the zip file “HW9.zip” under “Files > Homeworks”. This
compressed folder includes everything you need for this homework.
(2) Unzip this file in a folder of your choice on your computer. This unzipped folder contains
the R markdown file “HW9.Rmd” that you will work with. Open that file in RStudio to start
working on your homework. (You can browse the files within that folder, including the data
files, by making use of the “Files” tab on the bottom-right panel of your RStudio.)
INSTRUCTIONS: You must edit only this Markdown document (“HW9.Rmd”), entering
your responses within the designated blocks (within the regions that start with “# Write
your answer within this block.”). Then, generate an MS Word document using the “Knit”
function in RStudio. Then, export a PDF from that MS Word document, without editing the
Word document itself. Submit that PDF document as your homework.
(3) In the following questions, you will work with (simulated) data underlying an example we
discussed very early on in the course, asking whether hunger impacts the productivity of
professors. Each row in this dataset consists of the number of words written by a professor
on a morning. Half of the professors (N=30) made breakfast (breakfast = Yes) on the
morning when they recorded the number of words they wrote. The other, non-overlapping,
half of professors (N=30) skipped breakfast (breakfast = No) on the morning when they
recorded the number of words they wrote. We predict that making breakfast will be
associated with writing more words on that morning.
Load this dataset into a dataframe using the following code block.
# load your data
writing.profs <- read.csv("writing_profs.csv")
# take a peek at your data
head(writing.profs)
## IDs words breakfast
## 1 1 505 Yes
## 2 2 393 Yes
## 3 3 314 Yes
## 4 4 364 Yes
## 5 5 389 Yes
## 6 6 489 Yes
Question 1
1a
[0.25pt]
Let’s start with visualizing our data using a bar chart with error bars. We use a command from the
Hmisc package (mean_cl_boot) to create these error bars, which show 95% CI of the mean for each
group. (More info: These error bars show bootstrapped confidence intervals of the mean – these are
confidence intervals determined using a simulation procedure instead of the analytical formula we
learned in class. It has certain advantages, which are outside the scope of the HW/PSYC200 to
discuss. You can treat them as your usual 95% CIs for all practical purposes.)
# Uncomment and complete
# (ggplot2 must be loaded; mean_cl_boot also needs the Hmisc package to be installed)
#ggplot(writing.profs, aes(x = ___, y = ___)) +
# stat_summary(fun = mean, geom = "bar") +
# stat_summary(fun.data = mean_cl_boot, geom = "pointrange")
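(More info, continued: The idea behind a bootstrapped CI can be sketched in a few lines of base R. This is only an illustration on made-up numbers, assuming a simple percentile bootstrap. It is in the same spirit as mean_cl_boot, but not necessarily the exact procedure that function uses. The names toy_words, n_boot, boot_means, and boot_ci are hypothetical.)

```r
# Toy data: made-up word counts (NOT taken from writing_profs.csv)
toy_words <- c(410, 385, 452, 398, 367, 430, 441, 376, 405, 419)

set.seed(200)   # make the resampling reproducible
n_boot <- 1000  # number of bootstrap resamples (an arbitrary choice)

# Resample the data with replacement many times and record each resample's mean
boot_means <- replicate(n_boot, mean(sample(toy_words, replace = TRUE)))

# The middle 95% of those resampled means is the percentile bootstrap CI
boot_ci <- quantile(boot_means, probs = c(0.025, 0.975))
boot_ci
```

The interval comes from the spread of the resampled means, rather than from the analytical SE formula.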
1b
[0.25pt]
Just by looking at the confidence intervals in your visualization above, what can you conclude about
the null hypothesis that there is no difference between these two groups? Write your reasoning.
(HINT: Judge whether the CIs are touching and what that’d mean for the null hypothesis that there
is no difference.)
Answer:
Question 2
Using linear regression, examine whether there is a statistically significant difference between the
two breakfast conditions. That is, we would like to build a linear model in which we will use the
breakfast condition (the independent variable; X) to predict the number of words written (the
dependent variable; Y).
2a
[0.5pt]
Create a linear model using the lm function; then use the summary function to have R report the
results of this regression model.
# Uncomment and complete
#newModel <- ___(___ ~ ___, data = ___)
#summary(___)
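(Reminder: When the predictor has only two categories, the slope reported by lm is simply the difference between the two group means. The sketch below shows this on made-up data; toy, score, group, and toy_model are hypothetical names and are not part of your homework dataset.)

```r
# Toy data: two groups of five made-up scores
toy <- data.frame(
  score = c(12, 15, 11, 14, 13, 18, 20, 17, 19, 21),
  group = rep(c("A", "B"), each = 5)
)

# Regress the outcome on the two-level grouping variable
toy_model <- lm(score ~ group, data = toy)
coef(toy_model)  # intercept = 13 (mean of group A), groupB = 6

# The slope matches the difference between the group means (19 - 13 = 6)
group_means <- tapply(toy$score, toy$group, mean)
group_means[["B"]] - group_means[["A"]]
```

R treats the alphabetically first category as the reference group, so the sign of the slope depends on which group that is.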
2b
[1pt]
Report your results including significance, direction of the effect, the slope coefficient, its SE,
t-score, df, and p-value. (1 sentence)
Also report the variance explained. (You can report either the regular R-square or the adjusted
R-square from your output – both are fine.) (1 sentence)
Answer:
Question 3
3a
[0.25pt]
Using dplyr’s summarise and group_by functions, provide the descriptive statistics for each group
(mean and SD of the number of words written).
# Uncomment and complete
#writing.profs %>%
# group_by(___) %>%
# summarise(mean = mean(___), sd = sd(___))
3b
[0.25pt]
Which of the following is the appropriate variant of the two-sample t-test to examine whether
professors wrote more words when they had breakfast?
i. Independent samples t-test
ii. Paired samples t-test
Answer:
3c
[0.75pt]
Perform this appropriate test in R. Notice that as in Q2, we wish to predict the number of words
based on the breakfast behavior.
# Uncomment and complete
#myBreakfastModel <- ___(___, data = ___)
#myBreakfastModel
3d
[1pt]
Write up the results in one sentence. Make sure to indicate the test you used, whether the result is
significant, the direction of any significant effect, the t-score (including df), and the p-value. Also
include the means and SDs for both groups (using your answer to 3a).
Answer:
3e
[0.5pt]
What is the effect size of our manipulation in terms of r-value and R-square? You will need t and df
to calculate the r-value; use myBreakfastModel to obtain those statistics as shown in the codeblock
below.
# First, extract t and df from myBreakfastModel
# Uncomment
#t <- myBreakfastModel$statistic[[1]]
#df <- myBreakfastModel$parameter[[1]]
# Uncomment and complete. (Enter t and df into the effect size calculation formula.)
#r <- ___
(Notice that your R-square value should be comparable to the regular R-square value from Q2.)
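(Why "comparable"? The sketch below uses made-up data and assumes the effect size formula meant here is the common conversion r = sqrt(t^2 / (t^2 + df)). Under that assumption, r^2 matches the regression R-square exactly when the t-test assumes equal variances (var.equal = TRUE). By default, t.test runs Welch's test, whose t and df can differ slightly. All names below are hypothetical.)

```r
# Toy data: two groups of five made-up scores
toy <- data.frame(
  score = c(12, 15, 11, 14, 13, 18, 20, 17, 19, 21),
  group = rep(c("A", "B"), each = 5)
)

# Independent samples t-test assuming equal variances (not the t.test default)
toy_t  <- t.test(score ~ group, data = toy, var.equal = TRUE)
t_val  <- toy_t$statistic[[1]]
df_val <- toy_t$parameter[[1]]

# Assumed conversion from t and df to an r-value
r_val <- sqrt(t_val^2 / (t_val^2 + df_val))

# r squared agrees with the regular R-square from the matching regression
r_val^2
summary(lm(score ~ group, data = toy))$r.squared
```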
Information 2
In the following questions, you will look at a dataset of average, per-minute blink-rates of 12
subjects (i.e., how often they blinked per minute) as they performed two different visuomotor tasks
[Drew, G.C. (1951) Variations in blink-rate during visual-motor tasks. Quarterly Journal of
Experimental Psychology, 3, 73-88].
We will ask whether the blink rates of individuals varied as a function of task difficulty. In a within-
subject design, Drew (1951) measured how often each subject blinked when they had to steer a
pencil along a moving track. The track could be moving linearly in a straight line (easier task) or it
could be moving with an oscillatory pattern (harder task). Drew (1951) predicted that people would
blink less often when they are performing a harder task.
In your dataset:
• The “Subject” column shows the ID of each subject who participated in this study. Note that
this is a within-subject design, meaning that each subject participated in each of the two
conditions.
• The “task” column shows which track they were being tested on: “Straight” (easier task) or
“Oscillating” (harder task).
• The “blinks.per.minute” column shows the average blink rate.
Load this data into a dataframe using the following code block.
# load your data
tracking.blinks <- read.csv("tracking_blinks.csv")
# take a peek at your data
head(tracking.blinks)
## Subject task blinks.per.minute
## 1 S01 Straight 19.0
## 2 S01 Oscillating 19.3
## 3 S02 Straight 16.7
## 4 S02 Oscillating 9.0
## 5 S03 Straight 2.7
## 6 S03 Oscillating 1.1
Question 4
4a
[0.25pt]
Use dplyr’s summarise and group_by functions to obtain the descriptive statistics (mean and SD) of
the blink rates in each task condition.
#tracking.blinks %>%
# group_by(___) %>%
# summarise(mean = mean(___), sd = sd(___))
4b
[0.5pt]
Using the appropriate variant of the t-test, which is a paired samples t-test, examine whether subjects
blinked at a significantly different rate between the easier Straight and harder Oscillating tasks.
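Unfortunately, the syntax of the paired test has recently become a little more cumbersome: recent
versions of R no longer accept paired = TRUE together with the formula syntax, so the nice line of
code we learned in the lecture does not work anymore. The replacement is a tiny bit more involved,
but nothing we haven’t learned before.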
The following is the pattern of code we will need to write to perform a paired samples t-test (notice
the lack of formula syntax with ~).
t.test(x, y, paired = TRUE)
where x will be the blinking rates in the Straight condition and y will be the blinking rates in the
Oscillating condition. We will first use dplyr’s filter and select commands to get the data for
each condition, storing them in the variables straight and oscillating, respectively.
We will then implement our t-test using these newly created variables.
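# Uncomment and complete
#straight <-
# tracking.blinks %>%
# filter(task == '____') %>%
# select(blinks.per.minute)
#oscillating <-
# tracking.blinks %>%
# filter(task == '____') %>%
# select(blinks.per.minute)
# select() returns a one-column data frame, so pass the column itself to t.test
# (e.g., straight$blinks.per.minute)
#myDepModel <- t.test(___, ___, paired = TRUE)
#myDepModel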
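(Illustration: The t.test(x, y, paired = TRUE) pattern expects two plain numeric vectors of equal length, with the rows in the same subject order in both. The base-R sketch below shows this on made-up data in the same long layout as tracking.blinks. The names toy_long, id, cond, rate, easy, hard, and toy_paired are hypothetical.)

```r
# Toy data in long format: 5 made-up subjects, each measured in two conditions
toy_long <- data.frame(
  id   = rep(c("P1", "P2", "P3", "P4", "P5"), each = 2),
  cond = rep(c("easy", "hard"), times = 5),
  rate = c(20.1, 18.0, 15.4, 11.9, 8.2, 7.5, 12.0, 9.1, 17.5, 16.8)
)

# Pull out one numeric vector per condition (subjects stay in the same order)
easy <- toy_long$rate[toy_long$cond == "easy"]
hard <- toy_long$rate[toy_long$cond == "hard"]

# Paired samples t-test: x and y are matched element by element
toy_paired <- t.test(easy, hard, paired = TRUE)
toy_paired$statistic  # t-score
toy_paired$parameter  # df = number of pairs - 1 = 4
```

If the two vectors were sorted differently, the pairing (and therefore the test) would be wrong, so it is worth checking the subject order before running the test.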
4c
[1pt]
Write up the results in one sentence. Make sure to indicate the test you used, whether the result is
significant, the direction of any significant effect, the t-score (including df), and the p-value. Also
include the means and SDs for both conditions (using your answer to 4a).
Answer:
4d
[0.5pt]
Calculate the effect size in terms of the r-value and R-square. You will need t and df to calculate the
r-value; use myDepModel to obtain those statistics as shown in the codeblock below.
# You can obtain t and df using myDepModel.
# Uncomment
#t <- myDepModel$statistic[[1]]
#df <- myDepModel$parameter[[1]]
# Uncomment and complete. (Enter t and df into the effect size calculation formula.)
#r <- ___
# Uncomment to report r
#r
4e
[1pt]
List one advantage of using a within-subject design in this study (1-3 sentences).
Answer:
Question 5
Read this post by Prof. Andrew Gelman (of Columbia University):
https://bankunderground.co.uk/2016/08/24/balancing-bias-and-variance-in-the-design-of-behavioral-studies-the-importance-of-careful-measurement-in-randomized-experiments/.
5a
[0.5pt]
Was Drew (1951) studying a within-subject phenomenon? Yes/No
Answer:
5b
[0.5pt]
If Drew (1951) had performed a between-subjects design, he would have probably needed to
increase his number of subjects to find the effect he documented. Yes/No
Answer:
Question 6
[1pt]
You could also analyze the results of the blink study using a one-sample t-test. You do not need to do
the actual test, but what additional variable would you need to create to do a one-sample t-test instead?
Answer: