Week 7 Slides

This document summarizes an R tutorial session on conducting statistical tests and creating visualizations with the tidyverse package in R. The exercises covered loading data, summarizing groups, plotting histograms, performing two-sample t-tests, creating boxplots, and writing functions. The key result was no significant difference in anxiety between people who saw a picture of a spider and those who held a real spider, although mean anxiety was higher in the latter group. A custom function (ttestfromMeans) that computes a t-test from group summary statistics reproduced the t-value from R's built-in t.test function.

Uploaded by

Shahad Hussain Kavassery Sakeer

DNI Tutorial

2022 Summer Term
Session 7
June 17th
Agenda
Any questions/remarks about last meeting?

Worksheet 7

Time for working on group projects & asking questions

Exercise 1
Exercise 1a)
Preparation: Load the data into your R working environment

Once again, there are several possible commands:

df <- read_tsv("SpiderLongdat.sec")
df <- read.delim("SpiderLongdat.sec")
Using the tidyverse command (read_tsv(), from readr) or the base-R default (read.delim())

df <- fread("SpiderLongdat.sec")
New command from the data.table package → install the package, then load it with library(data.table)
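A quick way to check that a file was read correctly is to inspect the result with str(). The sketch below is self-contained: it writes a hypothetical two-row toy file to a temporary path, assuming the same tab-separated layout with columns Group and Anxiety, and reads it back with base R's read.delim(), whose defaults are header = TRUE and sep = "\t". The values are toy numbers, not the spider data.

```r
# Hypothetical toy file standing in for SpiderLongdat.sec
# (assumed layout: tab-separated, header row, columns Group and Anxiety)
path <- tempfile(fileext = ".sec")
writeLines(c("Group\tAnxiety",
             "Picture\t30",
             "Real Spider\t40"), path)

# read.delim() defaults: header = TRUE, sep = "\t",
# so "Real Spider" (with a space) stays one value
toy <- read.delim(path)
str(toy)   # check column names and types right after loading
```

The same str(df) check works on whatever read_tsv(), read.delim() or fread() returns.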
Exercise 1b)
How many participants held a spider and how many were shown a picture of a spider?

There are different approaches for this one too:

group_by command: df %>% group_by(Group) %>% summarise(n())

The output is a small table giving the number of people, n(), for each of the two possible values of the Group variable
Exercise 1b)
Second approach:

Using the length() command, with square brackets to select only the entries that have the Group value we want to count

length(df$Group[df$Group=="Picture"])
length(df$Group[df$Group=="Real Spider"])
The output is 12 for both lines → same result as with group_by()
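Base R also has one-line alternatives for counting group sizes. In this sketch, df is a hypothetical stand-in that contains only a Group column with the 12/12 split reported above; table() counts every value at once, and sum() over a logical comparison counts a single group.

```r
# Hypothetical stand-in for df: only the Group column, with 12 people per group
df <- data.frame(Group = rep(c("Picture", "Real Spider"), each = 12))

# Frequency table of all Group values in one call
counts <- table(df$Group)
print(counts)

# The comparison yields TRUE/FALSE per row; sum() counts the TRUEs
n_picture <- sum(df$Group == "Picture")
n_picture
```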
Exercise 1c)
Plot a histogram (ideally using ggplot) of anxiety values for each group.

Creating a histogram that distinguishes between the subgroups takes a fairly long command

ggplot uses a different fill color for each group so the two distributions can be told apart
Exercise 1c)
Initial command to create a basic histogram for two subgroups:

ggplot(mapping = aes(x=Anxiety)) +
geom_histogram(data = filter(df, Group=="Picture"), binwidth =5,
fill =3, alpha =0.7) +
geom_histogram(data = filter(df, Group=="Real Spider"), binwidth =5,
fill =6, alpha =0.7)
This uses ggplot's usual additive structure, except that this time the x-aesthetic is specified at the beginning and two histograms are then added, each filtered by “Group”
fill specifies the color; alpha specifies how opaque or translucent the shading is
Exercise 1c)
Figure: output of this code, using fill colors 3 and 6 and an alpha of 0.7
Exercise 1c)
We can make the diagram a little fancier by specifying our own fill aesthetic, which also lets us label the colors

ggplot(mapping = aes(x=Anxiety)) +
geom_histogram(data = filter(df, Group=="Picture"), mapping =
aes(fill ="Picture"), binwidth =5, alpha =0.7) +
geom_histogram(data =filter(df, Group=="Real Spider"), mapping =
aes(fill ="Real Spider"), binwidth =5, alpha =0.7) +
scale_fill_manual(values = c("Picture"=4,"Real Spider"=7)) +
labs(fill ="Group")
Exercise 1c)
scale_fill_manual(values = c("Picture"=4,"Real Spider"=7)) +
labs(fill ="Group")

All the information on the fill colors is now contained in this part, and a small legend explaining what the colors mean is added to the histogram

Running this command (with new colors 4 and 7) yields a histogram with a color legend for the two groups
Exercise 1d)
What are the minimum and maximum anxiety values for each group?

These results can be obtained quickly with the group_by() and summarise() commands (simply applying min()/max() to each subgroup works as well)
df %>% group_by(Group) %>% summarise(Min = min(Anxiety), Max =
max(Anxiety))

The output is a small table containing all the results needed
Exercise 1d)
The minimum and maximum anxiety values confirm our observations from the histogram

Both the minimum and the maximum are higher for the people who held a real spider than for those who only saw a picture of one
Exercise 2
We want to perform a two-sample t-test

Recall the worksheet on t-tests from two weeks ago: two-sample tests split the data into two groups based on a certain characteristic and measure a variable of interest in both groups

The difference between the group means is then tested for statistical significance

Exercise 2
R command (given in the hint):
t.test(Anxiety~Group, alternative = "two.sided", var.equal =
FALSE, data = df)

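This tells R to split the data by the Group each individual belonged to and to test whether mean Anxiety differs between the two groups; var.equal = FALSE requests Welch's t-test, which does not assume equal variances in the two groups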
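Because var.equal = FALSE selects Welch's test, the t-value uses the unpooled standard error and the degrees of freedom come from the Welch-Satterthwaite approximation. The helper welch_from_means() below is hypothetical (not part of the worksheet) and reproduces that calculation from the group summaries reported in Exercise 4 (means 40 and 47, SDs 9.29 and 11.03, n = 12 each). It is a sketch for understanding the output, not a replacement for t.test().

```r
# Hypothetical helper: Welch's t-test from summary statistics,
# i.e. the calculation t.test(..., var.equal = FALSE) performs on raw data
welch_from_means <- function(x1, x2, sd1, sd2, n1, n2) {
  v1 <- sd1^2 / n1                  # squared standard error of mean 1
  v2 <- sd2^2 / n2                  # squared standard error of mean 2
  t <- (x1 - x2) / sqrt(v1 + v2)    # mean difference / unpooled standard error
  # Welch-Satterthwaite approximation for the degrees of freedom
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  p <- 2 * pt(-abs(t), df)          # two-sided p-value
  c(t = t, df = df, p = p)
}

# Group summaries (Picture vs Real Spider) as reported on these slides
res <- welch_from_means(40, 47, 9.29, 11.03, 12, 12)
round(res, 3)
```

The non-integer degrees of freedom (about 21.4 here) are the signature of the Welch correction.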
Exercise 2
Figure: code output after running the command

The mean values do differ; however, the difference is not significant at the α = 0.05 level (p would have to be ≤ 0.05)
Exercise 2
Interpretation of the results:

People who held the real spider showed higher average anxiety than those who were only shown a picture of a spider
However, the difference was not large enough to reach statistical significance
This is likely also a consequence of the small sample size → only 24 observations were made; with a larger n, the same difference in means of 7 units might well have been statistically significant
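The sample-size argument can be checked numerically. This sketch keeps the observed means (40, 47) and standard deviations (9.29, 11.03) fixed and only varies a hypothetical per-group size n, using the pooled-variance t-test formula from Exercise 4. The function name p_for_n() and the sizes tried are illustrative assumptions.

```r
# Hypothetical helper: two-sided p-value of a pooled t-test for equal group
# sizes n1 = n2 = n, holding the observed means and SDs fixed
p_for_n <- function(n, x1 = 40, x2 = 47, sd1 = 9.29, sd2 = 11.03) {
  df <- 2 * n - 2                                        # (n1 + n2) - 2
  poolvar <- ((n - 1) * sd1^2 + (n - 1) * sd2^2) / df    # pooled variance
  t <- (x1 - x2) / sqrt(poolvar * (2 / n))               # 1/n1 + 1/n2 = 2/n
  2 * pt(-abs(t), df)                                    # two-sided p-value
}

sizes <- c(12, 20, 30, 50)            # hypothetical group sizes
ps <- sapply(sizes, p_for_n)
print(round(ps, 4))
```

With n = 12 the p-value matches the observed one (about 0.107); at the larger sizes tried here it falls below 0.05 although the 7-unit difference is unchanged.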
Exercise 3
Creating boxplots in R with the tidyverse package → ggplot + geom_boxplot
as the root of the command
Example:
ggplot(data = df) + geom_boxplot(aes(x=Group, y=Anxiety,
color=Group))

ggplot commands can be structured in various ways; they can also start with
df %>% ggplot(…) + …
Exercise 3
Figure: resulting boxplots of Anxiety by Group (the result should look the same regardless of the structure used)
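Each box is drawn from the group's quartiles: it spans the 25th to the 75th percentile, the middle line is the median, and by default the whiskers reach the most extreme points within 1.5 × IQR of the box. The sketch below computes the quartiles in base R with quantile(), which, as far as we know, is also how ggplot2 computes the box edges. The 24 scores are assumed values that reproduce the group means (40, 47) and SDs (9.29, 11.03) reported in Exercise 4; replace them with the Anxiety column of your own df.

```r
# ASSUMED data: 24 anxiety scores reproducing the reported group means and SDs;
# substitute your own df loaded from SpiderLongdat.sec
anxiety <- list(
  Picture       = c(30, 35, 45, 40, 50, 35, 55, 25, 30, 45, 40, 50),
  `Real Spider` = c(40, 35, 50, 55, 65, 55, 50, 35, 30, 50, 60, 39)
)

# Min, quartiles and max per group: box = 25% to 75%, middle line = 50% (median)
box_stats <- sapply(anxiety, quantile, probs = c(0, 0.25, 0.5, 0.75, 1))
print(box_stats)
```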
Exercise 4
First time working with functions in R!
The function() command tells R you want to program your own function
◦ In the parentheses, you specify which arguments (and how many) the function takes
◦ To the left of the <- arrow or the equals sign, you enter a name for your function → here: ttestfromMeans
◦ In the curly braces {} you specify what exactly the function is supposed to do with its arguments, so that once it is programmed, you can enter your own numeric values and it will run the specified operations on them
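A minimal example of the three parts, using a hypothetical function mean_difference() that is unrelated to the worksheet: the name to the left of <-, the arguments in parentheses (here with a default value for digits), and the body in curly braces. R returns the value of the last expression in the body. Arguments can also be passed by name, in which case their order no longer matters.

```r
# Hypothetical example function: name <- function(arguments) { body }
mean_difference <- function(x1, x2, digits = 2) {
  d <- round(x1 - x2, digits)   # digits has a default, so it may be omitted
  d                             # the last evaluated expression is returned
}

mean_difference(47, 40)                 # positional arguments
mean_difference(x2 = 40.123, x1 = 47)   # named arguments, any order
```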
Exercise 4
The t-value is the measure (the mean difference) divided by its standard error:
ttestfromMeans <- function(x1, x2, sd1, sd2, n1, n2){
  df <- n1 + n2 - 2   # degrees of freedom: (n1 + n2) - 2
  # pooled variance of the two groups
  poolvar <- (((n1 - 1)*sd1^2) + ((n2 - 1)*sd2^2)) / df
  # t-value: mean difference divided by its standard error
  t <- (x1 - x2) / sqrt(poolvar*((1 / n1) + (1 / n2)))
  # two-sided p-value from the t distribution
  sig <- 2*(1 - (pt(abs(t), df)))
  paste("t(df = ", df, ") = ", t, ", p = ", sig, sep = "")}
Code output to print:
t(df = dfval) = tval, p = sigval
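Written out as formulas, the function computes the pooled variance, the t-value and the two-sided p-value below, where F_{t,ν} is the cumulative distribution function of the t distribution with ν = n1 + n2 - 2 degrees of freedom (what pt() evaluates). Plugging in the group summaries used in this exercise (means 40 and 47, SDs 9.29 and 11.03, n = 12 each) gives the worked numbers in the last two lines.

```latex
\begin{aligned}
s_p^2 &= \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2} \\
t &= \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \\
p &= 2\left(1 - F_{t,\,n_1 + n_2 - 2}\left(|t|\right)\right) \\[4pt]
s_p^2 &= \frac{11 \cdot 9.29^2 + 11 \cdot 11.03^2}{22} \approx 103.98 \\
t &= \frac{40 - 47}{\sqrt{103.98 \cdot \frac{2}{12}}} \approx \frac{-7}{4.163} \approx -1.681
\end{aligned}
```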
Exercise 4
To run the function and check whether its results match those of the built-in t.test command, we first need to calculate the values for x1, x2, sd1, sd2, n1, n2

These can be saved as variables with any name or entered directly as numbers; the only important thing is that they're entered in the specified order
Exercise 4
How to get those values? Once again, there are several ways → the easiest is probably the group_by() %>% summarise() command

For x1, x2 (the group means)

df %>% group_by(Group) %>% summarise(mean(Anxiety)) → 40, 47

For sd1, sd2 (group standard deviations)

df %>% group_by(Group) %>% summarise(sd(Anxiety)) → 9.29, 11.03
Exercise 4
For n1, n2 (number of observations by group)
df %>% group_by(Group) %>% summarise(n()) → 12, 12

Another approach is to split the data into two subsets using the filter() command and then apply mean(), sd() and nrow() to each subset (n() only works inside dplyr verbs such as summarise())
group.pic <- filter(df, Group=="Picture")
group.real <- filter(df, Group=="Real Spider")

These results can then be entered into our function ttestfromMeans()
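In base R, all three summaries can be collected in one call: split() creates one Anxiety vector per group and sapply() applies a small summary function to each. The 24 scores below are assumed values that reproduce the group means (40, 47) and SDs (9.29, 11.03) reported above; replace them with your own df loaded from SpiderLongdat.sec. Inside the summary function, length() plays the role of n().

```r
# ASSUMED data: 24 anxiety scores reproducing the reported group means and SDs
df <- data.frame(
  Group   = rep(c("Picture", "Real Spider"), each = 12),
  Anxiety = c(30, 35, 45, 40, 50, 35, 55, 25, 30, 45, 40, 50,
              40, 35, 50, 55, 65, 55, 50, 35, 30, 50, 60, 39)
)

# split() -> one Anxiety vector per group; sapply() -> one column per group
group_stats <- sapply(split(df$Anxiety, df$Group),
                      function(x) c(mean = mean(x), sd = sd(x), n = length(x)))
print(round(group_stats, 2))
```

The columns can then be passed on in the required order, e.g. ttestfromMeans(group_stats["mean", 1], group_stats["mean", 2], group_stats["sd", 1], group_stats["sd", 2], group_stats["n", 1], group_stats["n", 2]).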

Exercise 4
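Running ttestfromMeans() with these values yields the following result

t ≈ -1.681, p ≈ 0.107 → no statistical significance given α = 0.05
The t-value is identical to the one from the t.test command: with equal group sizes, the pooled and Welch standard errors coincide. The p-value agrees up to rounding, because t.test with var.equal = FALSE uses about 21.4 degrees of freedom instead of 22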
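The pooled test implemented by ttestfromMeans() and the Welch test used in Exercise 2 can be compared directly on raw data with t.test(): var.equal = TRUE gives the pooled (Student) test, var.equal = FALSE the Welch test. The 24 scores are assumed values that reproduce the reported group means and SDs; substitute your own data. With equal group sizes the two t-values coincide, while the degrees of freedom, and therefore p, differ slightly.

```r
# ASSUMED data: anxiety scores reproducing the reported group means and SDs
picture <- c(30, 35, 45, 40, 50, 35, 55, 25, 30, 45, 40, 50)
real    <- c(40, 35, 50, 55, 65, 55, 50, 35, 30, 50, 60, 39)

pooled <- t.test(picture, real, var.equal = TRUE)    # Student's t-test (pooled variance)
welch  <- t.test(picture, real, var.equal = FALSE)   # Welch's t-test (R's default)

# Side-by-side comparison of t, degrees of freedom and p
round(rbind(
  pooled = c(t = unname(pooled$statistic), df = unname(pooled$parameter), p = pooled$p.value),
  welch  = c(t = unname(welch$statistic),  df = unname(welch$parameter),  p = welch$p.value)
), 3)
```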
