Week 7 Slides
2022 SUMMER TERM
SESSION 7
JUNE 17TH
Agenda
Any questions/remarks about last meeting?
Worksheet 7
df <- fread("SpiderLongdat.sec")
fread() is a new command from the data.table package → install the package, then load it with
library(data.table)
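A minimal sketch of the install-then-load sequence. The worksheet file is not reproduced here, so the example writes a small made-up table to a temporary file and reads it back; `toy` and `tmp` are illustrative names, not part of the worksheet.

```r
# install.packages("data.table")   # only needed once per machine
library(data.table)

# Made-up two-row table written to a temporary file (NOT the worksheet data)
tmp <- tempfile(fileext = ".csv")
fwrite(data.frame(Group = c("Picture", "Real Spider"), Anxiety = c(30, 40)), tmp)

# fread() detects the separator and the column types on its own
toy <- fread(tmp)
str(toy)
```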
Exercise 1b)
How many participants held a spider and how many were shown a picture of a spider?
Using the length() command and the square brackets to specify the
characteristics we want to analyze
length(df$Group[df$Group=="Picture"])
length(df$Group[df$Group=="Real Spider"])
The output is 12 for both lines → both groups are the same size
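The counting approach can also be sketched on a tiny made-up data frame (`toy` is a hypothetical stand-in for df, with 3 and 2 rows per group). It shows the slide's length() pattern next to two shortcuts, sum() on the logical vector and table().

```r
# Made-up stand-in for df (NOT the worksheet data)
toy <- data.frame(
  Group   = c("Picture", "Picture", "Picture", "Real Spider", "Real Spider"),
  Anxiety = c(30, 35, 45, 40, 50)
)

# Same idea as on the slide: subset with [ ], then count with length()
n_picture <- length(toy$Group[toy$Group == "Picture"])

# Shortcuts: sum() counts the TRUEs, table() counts every group at once
n_real <- sum(toy$Group == "Real Spider")
counts <- table(toy$Group)
counts
```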
Exercise 1c)
Plot a histogram (ideally using ggplot) of anxiety values for each group.
In ggplot, different fill colors can be used to tell the groups apart
Initial command to create a basic histogram for two subgroups:
ggplot(mapping = aes(x = Anxiety)) +
  geom_histogram(data = filter(df, Group == "Picture"), binwidth = 5,
                 fill = 3, alpha = 0.7) +
  geom_histogram(data = filter(df, Group == "Real Spider"), binwidth = 5,
                 fill = 6, alpha = 0.7)
This uses the normal additive structure of ggplot, except that this time the x-aesthetic is
specified at the beginning and then two histograms are added, each filtered by “Group”
fill specifies the color; alpha specifies how opaque the shading is (0 = fully transparent, 1 = fully opaque)
[Figure: output of this code, using fill colors 3 and 6 and an alpha of 0.7]
We can make the diagram a little fancier by mapping fill inside an aesthetic of its own, which
also lets us label the colors in a legend
ggplot(mapping = aes(x = Anxiety)) +
  geom_histogram(data = filter(df, Group == "Picture"),
                 mapping = aes(fill = "Picture"), binwidth = 5, alpha = 0.7) +
  geom_histogram(data = filter(df, Group == "Real Spider"),
                 mapping = aes(fill = "Real Spider"), binwidth = 5, alpha = 0.7) +
  scale_fill_manual(values = c("Picture" = 4, "Real Spider" = 7)) +
  labs(fill = "Group")
scale_fill_manual(values = c("Picture"=4,"Real Spider"=7)) +
labs(fill ="Group")
All the information on fill colors is now contained in this part, and a small legend box
explaining what the colors mean is added to the histogram
[Figure: the diagram produced by running this command with the new colors 4 and 7]
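A more compact variant of the same plot, offered as a sketch rather than as the worksheet's solution: fill is mapped to Group once, so a single geom_histogram() draws both groups and the legend comes for free. `toy` is made-up data, and position = "identity" is an assumption chosen so that the bars overlap as in the two-layer version instead of stacking.

```r
library(ggplot2)

# Made-up stand-in for df (NOT the worksheet data)
toy <- data.frame(
  Group   = rep(c("Picture", "Real Spider"), each = 4),
  Anxiety = c(30, 35, 45, 40, 40, 50, 55, 65)
)

# One layer: fill = Group splits the data and creates the legend
p <- ggplot(toy, aes(x = Anxiety, fill = Group)) +
  geom_histogram(binwidth = 5, alpha = 0.7, position = "identity") +
  scale_fill_manual(values = c("Picture" = 4, "Real Spider" = 7)) +
  labs(fill = "Group")
p
```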
Exercise 1d)
What are the minimum and maximum anxiety values for each group?
These results can be obtained quickly using the summarise() and group_by()
commands (obviously, using the simple min()/ max() on the subgroups would
work as well)
df %>%
  group_by(Group) %>%
  summarise(Min = min(Anxiety), Max = max(Anxiety))
Both the minimum and the maximum are higher for the people who held a
real spider than for those who were only shown a picture of one
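The plain min()/max() route mentioned above can be sketched in base R. `toy` is made-up data standing in for df; tapply() is one way to apply the same function to every group in a single call.

```r
# Made-up stand-in for df (NOT the worksheet data)
toy <- data.frame(
  Group   = rep(c("Picture", "Real Spider"), each = 3),
  Anxiety = c(30, 35, 45, 40, 50, 65)
)

# Plain min()/max() on one subgroup, selected with square brackets
min_picture <- min(toy$Anxiety[toy$Group == "Picture"])
max_picture <- max(toy$Anxiety[toy$Group == "Picture"])

# tapply() applies a function to Anxiety separately for each Group
mins <- tapply(toy$Anxiety, toy$Group, min)
maxs <- tapply(toy$Anxiety, toy$Group, max)
```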
Exercise 2
We want to perform a two-sample t-test
Recall the worksheet on t-tests from two weeks ago: two-sample tests split the data
into two groups based on a certain characteristic and measure a variable of
interest in both of those groups
The t.test command tells R to split the data by the Group each individual was part of and to
test for a difference in the mean value of Anxiety
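The command itself is not preserved on this slide. A call of the following shape would match the description; the formula interface `Anxiety ~ Group` and the `var.equal = TRUE` variant are assumptions, and `toy` is made-up data.

```r
# Made-up stand-in for df (NOT the worksheet data)
toy <- data.frame(
  Group   = rep(c("Picture", "Real Spider"), each = 4),
  Anxiety = c(30, 35, 45, 40, 40, 50, 55, 65)
)

# Formula interface: outcome ~ grouping variable
res <- t.test(Anxiety ~ Group, data = toy)

# t.test() defaults to the Welch test; var.equal = TRUE gives the classic
# pooled-variance test with n1 + n2 - 2 degrees of freedom
res_pooled <- t.test(Anxiety ~ Group, data = toy, var.equal = TRUE)
res_pooled$parameter
```

The sign of t depends on the group order: groups are taken in alphabetical order, so the statistic is based on Picture minus Real Spider.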
[Figure: code output after running the command]
People who held the real spider experienced higher average anxiety values than
those who were only given a photo of a spider
The difference wasn’t large enough to reach statistical significance, though
This is likely also a result of the small sample size → only 24 observations were
made; with a larger n, a difference in means of 7 units might very well have
reached statistical significance
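The role of sample size can be illustrated with a short sketch. The mean difference of 7 comes from the slide; the standard deviation of 10 and the group sizes are purely hypothetical, and `p_for_n` is an illustrative helper, not a worksheet function.

```r
# Two-sided p-value for a fixed mean difference and SD, with n people per group
p_for_n <- function(n, diff = 7, sd = 10) {   # sd = 10 is an assumed value
  se <- sqrt(sd^2 * (2 / n))   # standard error of the difference (equal n, equal SD)
  t  <- diff / se
  2 * pt(-abs(t), df = 2 * n - 2)
}

p_small <- p_for_n(12)   # 12 per group, as in the worksheet
p_large <- p_for_n(50)   # hypothetical larger study
c(p_small, p_large)
```

Under these assumptions the same 7-unit difference is not significant at α = 0.05 with 12 people per group but is with 50, because the standard error shrinks as n grows.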
Exercise 3
Creating boxplots in R with the tidyverse package → ggplot() + geom_boxplot()
as the root of the command
Example:
ggplot(data = df) +
  geom_boxplot(aes(x = Group, y = Anxiety, color = Group))
ggplot commands can be structured in various ways; they can also start with
df %>% ggplot(…) + …
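One possible way to write the piped form, as a sketch on made-up data (`toy` is hypothetical). Here the aesthetics are placed in ggplot() instead of geom_boxplot(); either placement draws the same boxplots.

```r
library(ggplot2)
library(dplyr)   # provides the %>% pipe

# Made-up stand-in for df (NOT the worksheet data)
toy <- data.frame(
  Group   = rep(c("Picture", "Real Spider"), each = 4),
  Anxiety = c(30, 35, 45, 40, 40, 50, 55, 65)
)

# The data frame is piped in as the first argument of ggplot()
p <- toy %>%
  ggplot(aes(x = Group, y = Anxiety, color = Group)) +
  geom_boxplot()
p
```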
[Figure: boxplots of Anxiety by Group; the result should look something like this regardless of the structure used]
Exercise 4
First time working with functions in R!
The function() command tells R that you want to program your own function
◦ In the parentheses () directly after function, you specify which and how many arguments
the function takes
◦ To the left of the <- arrow (or the equals sign), you enter a name for your function
→ here: ttestfromMeans
◦ In the curly braces {} you specify what exactly the function should do with its
arguments, so that once it is programmed, you can pass in your own numbers and it
will run the specified operations on them
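The three parts described above (name, arguments, body) can be seen in a tiny example. `mean_difference` is a made-up function for illustration only.

```r
# name <- function(arguments) { body }
mean_difference <- function(x1, x2) {
  x1 - x2   # the last evaluated expression is what the function returns
}

# Calling the function with our own numbers
mean_difference(10, 4)
```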
ttestfromMeans <- function(x1, x2, sd1, sd2, n1, n2){
df <- n1 + n2 - 2
Annotations on the function:
◦ Degrees of freedom: n-2, i.e. (n1+n2)-2
◦ Cumulative (pooled) variance of all the data in the set
◦ Calculation of the t-value: measure (mean difference) divided by the standard error
◦ Significance level
◦ Code output to print: t(df = dfval) = tval, p = sigval
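Only the first two lines of the function are preserved on the slide. The following is a sketch of how the complete function might look, based on the annotations (pooled variance, t-value as mean difference divided by standard error, two-sided significance, printed string). The names `poolvar`, `t` and `sig`, the rounding to three digits, and the input numbers in the example call are assumptions.

```r
ttestfromMeans <- function(x1, x2, sd1, sd2, n1, n2) {
  # Degrees of freedom: (n1 + n2) - 2
  df <- n1 + n2 - 2
  # Pooled variance of both groups
  poolvar <- ((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / df
  # t-value: mean difference divided by its standard error
  t <- (x1 - x2) / sqrt(poolvar * (1 / n1 + 1 / n2))
  # Two-sided significance level from the t distribution
  sig <- 2 * (1 - pt(abs(t), df))
  # Output string in the form t(df = dfval) = tval, p = sigval
  paste0("t(df = ", df, ") = ", round(t, 3), ", p = ", round(sig, 3))
}

# Made-up inputs: means 10 and 12, both SDs 2, 8 people per group
out <- ttestfromMeans(10, 12, 2, 2, 8, 8)
out
```

Because the function pools the variances, it corresponds to t.test() with var.equal = TRUE rather than to the default Welch test.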
To run the function and check whether its results match those of the built-in
t.test command, we first need to calculate our values
for x1, x2, sd1, sd2, n1, n2
These can be saved as variables with any name or entered directly as
numbers; the only important thing is that they are entered in the specified
order
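Why the order matters can be shown with a small made-up function (`difference` is hypothetical). Unnamed values are matched to the arguments by position; naming the arguments in the call is the standard R way to make the order irrelevant.

```r
# Made-up two-argument function: first value minus second value
difference <- function(x1, x2) {
  x1 - x2
}

by_position <- difference(10, 4)            # x1 = 10, x2 = 4
swapped     <- difference(4, 10)            # same numbers, wrong order
by_name     <- difference(x2 = 4, x1 = 10)  # names override the position
c(by_position, swapped, by_name)
```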
How to get those values? Once again, there are several ways → the easiest is probably the
group_by() %>% summarise() command
Another approach is to split the data into two subsets using the filter()
command and then use mean(), sd() and nrow() on each subset (n() only works inside dplyr verbs such as summarise())
group.pic <- filter(df, Group=="Picture")
group.real <- filter(df, Group=="Real Spider")
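Both routes to the six input values can be sketched as follows. `toy` is made-up data standing in for df, and the column names Mean, SD and N are illustrative choices.

```r
library(dplyr)

# Made-up stand-in for df (NOT the worksheet data)
toy <- data.frame(
  Group   = rep(c("Picture", "Real Spider"), each = 4),
  Anxiety = c(30, 35, 45, 40, 40, 50, 55, 65)
)

# Route 1: subset with filter(), then plain mean(), sd(), nrow()
group.pic <- filter(toy, Group == "Picture")
x1  <- mean(group.pic$Anxiety)
sd1 <- sd(group.pic$Anxiety)
n1  <- nrow(group.pic)

# Route 2: all values for both groups in one summarise() call
stats <- toy %>%
  group_by(Group) %>%
  summarise(Mean = mean(Anxiety), SD = sd(Anxiety), N = n())
stats
```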
These results are equal to what we were given by the t.test command
t ≈ -1.681, p ≈ 0.107 → no statistical significance given α = 0.05