Week 4 Materials: Lecture
Beginner R for Public Health Practitioners and Researchers in Liberia
R Coding and Biostatistics Short Course Series
Laura Skrip, PhD, MPH
Today’s Plan
Content Topics                    Time (Minutes)   Skills
Practice Activity (30 minutes)
Defining new variables (cont.)    60               • Making space for new objects
                                                   • Using the which function to subset and redefine existing variables
                                                   • Creating embedded for and if loops to define variables based on conditional statements
Linking datasets                  5                • Linking two datasets based on a common variable using the cbind() and merge() functions
Introducing the ggplot2 package   20               • Installing new packages
Preparing for Course Conclusion (5 minutes)
Preparing for Course Conclusion (5 minutes)
Beginner R Course - Developed by Laura Skrip
Activity: Cleaning data with the which
function
• Read in the diabetes.csv dataset (Source:
https://www.kaggle.com/uciml/pima-indians-diabetes-database).
• Use the which function to reformat the BMI variable into a
categorical variable based on the following classifications from the
US CDC:
BMI Classification
< 18.5 Underweight
18.5 to <25 Healthy
25 to <30 Overweight
30+ Obesity
Hints:
• Create ‘space’ for a new variable.
• Identify the row numbers which satisfy each category.
• Assign the classification (i.e., healthy, overweight, etc.) to
the appropriate row numbers of the new variable.
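The three hints above can be sketched as follows. This is a minimal illustration, not the official solution: it uses a small made-up data frame in place of the real file, and the name BMI_Category is an assumption chosen for this example. With the real data you would start from something like diabetes <- read.csv("diabetes.csv"), assuming the file is in your working directory.

```r
# Toy stand-in for diabetes.csv; the BMI values are invented for illustration.
diabetes <- data.frame(BMI = c(17.2, 22.0, 27.5, 33.6, 25.0))

# Hint 1: create 'space' for the new variable (NA until it is filled in).
# BMI_Category is a hypothetical name chosen for this sketch.
diabetes$BMI_Category <- rep(NA, nrow(diabetes))

# Hint 2: identify the row numbers that satisfy each category.
underweight_rows <- which(diabetes$BMI < 18.5)
healthy_rows     <- which(diabetes$BMI >= 18.5 & diabetes$BMI < 25)
overweight_rows  <- which(diabetes$BMI >= 25 & diabetes$BMI < 30)
obesity_rows     <- which(diabetes$BMI >= 30)

# Hint 3: assign the classification to those rows of the new variable.
diabetes$BMI_Category[underweight_rows] <- "Underweight"
diabetes$BMI_Category[healthy_rows]     <- "Healthy"
diabetes$BMI_Category[overweight_rows]  <- "Overweight"
diabetes$BMI_Category[obesity_rows]     <- "Obesity"

print(diabetes)
```

Note that a BMI of exactly 25 falls in the Overweight group, because the Healthy range is "18.5 to <25".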
Defining new variables (cont.)
Creating variables that allow us to investigate
our hypotheses
“If loops” to create variables
• An “if loop” (formally an if statement, since it is checked once rather than repeated) in R allows you to apply an action only when a condition is met.
• Example 1: Suppose you had a dataset with an Age variable. Age is often collected
as a numeric variable, but sometimes it is helpful to categorize age. You may want
to create a variable that reflects age groups known to be at differential risk of a
disease or condition.
Action: if age is less than 45, use the category ‘<45’ in the Age_Group variable and if
age is 45 or above, use the category ‘≥45’.
• Example 2: Suppose you are interested in creating a variable that indicates
whether females in the dataset are of childbearing age, as a way to calculate
population at risk of a particular condition affecting this subgroup.
• Action: if age is between 15 and 44 and sex is female, use the category ‘Yes’ in a
variable ChildbearingAge; otherwise (age is < 15 or > 44, or sex is male), use
the category ‘No’.
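Both examples can be sketched with an if statement inside a for loop. The data frame below is made up for illustration, and the column names Age and Sex are assumptions; your own dataset may name them differently. The label ">=45" is written with plain characters in place of ‘≥45’ to avoid encoding problems.

```r
# Toy data; ages and sexes are invented for illustration.
people <- data.frame(
  Age = c(12, 30, 44, 45, 60),
  Sex = c("Female", "Female", "Male", "Female", "Male")
)

# Make space for the two new variables.
people$Age_Group       <- rep(NA, nrow(people))
people$ChildbearingAge <- rep(NA, nrow(people))

for (i in 1:nrow(people)) {
  # Example 1: two age groups split at 45.
  if (people$Age[i] < 45) {
    people$Age_Group[i] <- "<45"
  } else {
    people$Age_Group[i] <- ">=45"
  }

  # Example 2: 'Yes' only for females aged 15 to 44; everyone else is 'No'.
  if (people$Age[i] >= 15 && people$Age[i] <= 44 && people$Sex[i] == "Female") {
    people$ChildbearingAge[i] <- "Yes"
  } else {
    people$ChildbearingAge[i] <- "No"
  }
}

print(people)
```

In this toy data only the second row (a 30-year-old female) receives ‘Yes’.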
“If Loop” syntax in R
‘If loops’ with assignment of answer:
Results_Vector_Two <- rep(NA, 10)
for (i in 1:10) {
  if (i > 5) {
    Results_Vector_Two[i] <- i^2
  }
}
• Note that we often embed an ‘if loop’ within a ‘for loop’
• What do we expect to get for indices > 5 in Results_Vector_Two? What
about for indices ≤ 5?
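One way to check your answer is to run the loop and print the result. The sketch below also shows, as an optional comparison, how the same vector could be built with the which function from earlier in the course; the names idx and Alt_Vector are made up for this illustration.

```r
# The loop from the slide: squares are stored only when i > 5.
Results_Vector_Two <- rep(NA, 10)
for (i in 1:10) {
  if (i > 5) {
    Results_Vector_Two[i] <- i^2
  }
}
print(Results_Vector_Two)

# Same result without a loop, using which() (hypothetical names idx, Alt_Vector).
idx <- 1:10
Alt_Vector <- rep(NA, 10)
rows_to_fill <- which(idx > 5)
Alt_Vector[rows_to_fill] <- idx[rows_to_fill]^2

# Both approaches should agree.
print(identical(Results_Vector_Two, Alt_Vector))
```

Indices 1 to 5 never satisfy the condition, so they keep the NA placeholder they were given when the space was created.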
• The expression in parentheses after if, here (i > 5), is a conditional statement that must be met for the action to occur. The result of this must be TRUE or FALSE.
“If Loop” syntax in R
# Results_Vector is assumed to be an existing numeric vector of length 10
Results_Vector_Two <- rep(NA, 10)
for (i in 1:10) {
  if (Results_Vector[i] == 8) {
    Results_Vector_Two[i] <- i^2
  } else if (Results_Vector[i] != 8) {
    Results_Vector_Two[i] <- Results_Vector[i]
  }
}
• Both the if and the else if conditions must evaluate to TRUE or FALSE; the action that occurs is the one under whichever condition is TRUE.
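To run this example you need a Results_Vector to start from. The sketch below supplies a made-up one (its values are an assumption for illustration only) and compares the loop with the built-in ifelse() function, which applies the same rule to the whole vector at once.

```r
# Hypothetical starting vector; the values are invented for illustration.
Results_Vector <- c(2, 8, 5, 8, 1, 9, 8, 3, 7, 4)

# Loop version: where the value is 8, store i squared; otherwise copy the value.
Results_Vector_Two <- rep(NA, 10)
for (i in 1:10) {
  if (Results_Vector[i] == 8) {
    Results_Vector_Two[i] <- i^2
  } else {
    Results_Vector_Two[i] <- Results_Vector[i]
  }
}
print(Results_Vector_Two)

# Vectorized version of the same rule using ifelse(test, yes, no).
Results_Vector_Three <- ifelse(Results_Vector == 8, (1:10)^2, Results_Vector)
print(Results_Vector_Three)
```

Because "not equal to 8" is the only alternative to "equal to 8", a plain else is enough here; else if is needed only when there are more than two cases to separate.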
Activity: Defining new variables with an
if loop
• Create a new variable that categorizes risk level, as
defined (arbitrarily and for purposes of demonstration):
Risk level is high if the person is ≥ 55 years old or has
diabetes.
Risk level is low if the person is < 55 years old and has no diabetes.
Risk level is moderate if the person is < 55 years old and has diabetes.
• You may need to look up syntax for ‘and’ and ‘or’, specifically when
trying to generate a logical answer of TRUE or FALSE.
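As a starting point for the syntax, R writes ‘and’ as & and ‘or’ as |, and ! means ‘not’. The sketch below only demonstrates these operators on made-up values; the names age and has_diabetes are assumptions for illustration, and it does not complete the activity for you.

```r
# Toy values invented for illustration.
age          <- c(30, 60, 50, 70)
has_diabetes <- c(FALSE, FALSE, TRUE, TRUE)

# 'or': TRUE when at least one of the two conditions is TRUE.
older_or_diabetes <- age >= 55 | has_diabetes
print(older_or_diabetes)

# 'and': TRUE only when both conditions are TRUE.
younger_and_diabetes <- age < 55 & has_diabetes
print(younger_and_diabetes)

# '!' flips TRUE and FALSE, which gives 'no diabetes'.
younger_no_diabetes <- age < 55 & !has_diabetes
print(younger_no_diabetes)

# Inside an if statement, a single TRUE or FALSE is needed,
# so pick out one element with [i].
i <- 3
if (age[i] < 55 & has_diabetes[i]) {
  print("Person 3 is under 55 and has diabetes")
}
```

The single forms & and | compare vectors element by element; R also has && and ||, which are meant for single TRUE or FALSE values such as those inside an if statement.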
Linking datasets
Allowing for creation of a single data frame
when two datasets have a common, identifying
variable
Merging two datasets requires a
common, identifying variable
• In the VAERS datasets, we have the VAERS_ID variable.
• Let’s merge the data from the 2021VAERSDATA.csv and 2021VAERSVAX.csv
spreadsheets into a single data frame.
• What search terms could we use to identify the function for accomplishing this task?
merge(data1, data2, by = "ID1")
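A minimal sketch of the merge, using two tiny made-up data frames in place of the VAERS files. Only VAERS_ID comes from the slide; the other column names and all values are assumptions for illustration. With the real files you would first read each one in with read.csv().

```r
# Toy stand-ins for the two spreadsheets; values are invented.
vaers_data <- data.frame(
  VAERS_ID = c(1, 2, 3),
  AGE_YRS  = c(34, 61, 47)
)
vaers_vax <- data.frame(
  VAERS_ID = c(3, 1, 2),          # same IDs, but in a different row order
  VAX_TYPE = c("A", "B", "C")
)

# merge() matches rows by the common identifying variable.
merged <- merge(vaers_data, vaers_vax, by = "VAERS_ID")
print(merged)

# cbind() only places the data frames side by side, row 1 next to row 1,
# without checking that the IDs match.
side_by_side <- cbind(vaers_data, vaers_vax)
print(side_by_side)
```

By default merge() keeps only the IDs found in both data frames; adding all = TRUE keeps unmatched rows as well and fills the gaps with NA. cbind() is safe only when both data frames have the same rows in the same order.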
Visualizing data
Introducing the ggplot2 package
R-Graph Gallery provides many graph types with sample code that can be modified
according to your own dataset.
https://www.r-graph-gallery.com/ggplot2-package.html
Example: Multiple group histogram
• https://www.r-graph-gallery.com/histogram_several_group.html
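A minimal sketch of installing ggplot2 and drawing a histogram for two groups. This is not the gallery's code: the data are simulated, and the names plot_data, group and value are assumptions for illustration. The install line is commented out because it only needs to be run once and requires an internet connection.

```r
# Run once to install the package (needs internet), then leave commented out:
# install.packages("ggplot2")

# Simulated data: two groups of 100 values each, invented for illustration.
set.seed(1)
plot_data <- data.frame(
  group = rep(c("A", "B"), each = 100),
  value = c(rnorm(100, mean = 0), rnorm(100, mean = 2))
)

# Only draw the plot if ggplot2 is installed on this computer.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # One histogram per group, overlaid and semi-transparent.
  p <- ggplot(plot_data, aes(x = value, fill = group)) +
    geom_histogram(alpha = 0.6, position = "identity", bins = 30)
  print(p)
}
```

To adapt this to your own dataset, replace plot_data with your data frame and put your own column names inside aes().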
Final Steps for Beginner R
Demonstrating what you have learned and
where we are going
Outstanding Course Components
• Short post-test to assess what you have learned and how we did
• Data visualization practice!
• Assignment of final project/exercise
• Scheduling…
See you in lab next time!