
Beginner R - Q4D - ULSOPH - Lecture 4

The document summarizes the topics and activities covered in Week 4 of a Beginner R course for public health practitioners and researchers in Liberia. The plan includes practicing data cleaning using the which function (30 minutes), defining new variables using for and if loops (60 minutes), linking datasets using cbind() and merge() (5 minutes), and introducing the ggplot2 package for data visualization (20 minutes). Example activities are provided for practicing these skills, including recoding a BMI variable and creating a new risk level variable. The document concludes by discussing outstanding course components like a post-test, data visualization practice, assigning a final project, and scheduling.

Uploaded by

Jeremiah Wleh

Week 4 Materials:

Lecture

Beginner R for Public Health Practitioners and Researchers in Liberia
R Coding and Biostatistics Short Course Series
Laura Skrip, PhD, MPH
Today’s Plan

Content Topics                    Time (Minutes)   Skills
Practice Activity                 30
Defining new variables (cont.)    60               • Making space for new objects
                                                   • Using the which function to subset and redefine existing variables
                                                   • Creating embedded for and if loops to define variables based on conditional statements
Linking datasets                  5                • Linking two datasets based on a common variable using the cbind() and merge() functions
Introducing the ggplot2 package   20               • Installing new packages
Preparing for Course Conclusion   5

Beginner R Course - Developed by Laura Skrip


Activity: Cleaning data with the which function
• Read in the diabetes.csv dataset (Source: https://www.kaggle.com/uciml/pima-indians-diabetes-database).
• Use the which function to reformat the BMI variable into a
categorical variable based on the following classifications from the
US CDC:

BMI Classification
< 18.5 Underweight
18.5 to <25 Healthy
25 to <30 Overweight
30+ Obesity
Activity: Cleaning data with the which function
Hints:
• Create ‘space’ for a new variable.
• Identify the row numbers which satisfy each category.
• Assign the classification (e.g., healthy, overweight, etc.) to the appropriate row numbers of the new variable.
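
The three hints above can be sketched as follows. This is one possible approach, not the official activity solution. The small `diabetes` data frame is a hypothetical stand-in for the contents of diabetes.csv, and the new column name `BMI_Category` is an assumption made for illustration.

```r
# Hypothetical stand-in for read.csv("diabetes.csv"); only a BMI column is needed here
diabetes <- data.frame(BMI = c(17.2, 22.0, 24.9, 25.0, 29.9, 30.0, 41.3))

# Hint 1: create 'space' for the new variable (one NA per row)
diabetes$BMI_Category <- rep(NA, nrow(diabetes))

# Hint 2: identify the row numbers that satisfy each category
underweight_rows <- which(diabetes$BMI < 18.5)
healthy_rows     <- which(diabetes$BMI >= 18.5 & diabetes$BMI < 25)
overweight_rows  <- which(diabetes$BMI >= 25 & diabetes$BMI < 30)
obesity_rows     <- which(diabetes$BMI >= 30)

# Hint 3: assign the classification to those rows of the new variable
diabetes$BMI_Category[underweight_rows] <- "Underweight"
diabetes$BMI_Category[healthy_rows]     <- "Healthy"
diabetes$BMI_Category[overweight_rows]  <- "Overweight"
diabetes$BMI_Category[obesity_rows]     <- "Obesity"

# Check how many rows fell into each category
table(diabetes$BMI_Category, useNA = "ifany")
```

Any row that matches none of the conditions (for example, a missing BMI) keeps its NA, which makes unclassified records easy to spot.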



Defining new variables (cont.)
Creating variables that allow us to investigate our hypotheses



“If loops” to create variables
• An “if loop” (formally, an if statement) in R allows you to apply an action only when a condition is met.

• Example 1: Suppose you had a dataset with an Age variable. Age is often collected
as a numeric variable, but sometimes it is helpful to categorize age. You may want
to create a variable that reflects age groups known to be at differential risk of a
disease or condition.
– Action: if age is less than 45, use the category ‘<45’ in the Age_Group variable and if
age is 45 or above, use the category ‘≥45’.

• Example 2: Suppose you are interested in creating a variable that indicates
whether females in the dataset are of childbearing age, as a way to calculate
population at risk of a particular condition affecting this subgroup.
– Action: if age is between 15 and 44 and sex is female, use the category ‘Yes’ in a
variable ChildbearingAge; otherwise (age is < 15 or > 44, or sex is male), use
the category ‘No’.
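
Both examples can be sketched with an if statement inside a for loop. The `people` data frame and its values are hypothetical, and the label ">=45" stands in for ‘≥45’ to keep the code plain ASCII. Males aged 15 to 44 are assigned ‘No’, which is an assumption about the intended rule.

```r
# Hypothetical data for illustration only
people <- data.frame(
  Age = c(12, 30, 44, 45, 60, 28),
  Sex = c("Female", "Female", "Female", "Female", "Male", "Male")
)

# Make space for the two new variables
people$Age_Group       <- rep(NA, nrow(people))
people$ChildbearingAge <- rep(NA, nrow(people))

for (i in seq_len(nrow(people))) {
  # Example 1: categorize age
  if (people$Age[i] < 45) {
    people$Age_Group[i] <- "<45"
  } else {
    people$Age_Group[i] <- ">=45"
  }

  # Example 2: females aged 15 to 44 are 'Yes'; everyone else is 'No'
  if (people$Age[i] >= 15 && people$Age[i] <= 44 && people$Sex[i] == "Female") {
    people$ChildbearingAge[i] <- "Yes"
  } else {
    people$ChildbearingAge[i] <- "No"
  }
}

people
```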



“If Loop” syntax in R
‘If loops’ with assignment of answer:
Results_Vector_Two <- rep(NA, 10)
for (i in 1:10) {
  if (i > 5) {
    Results_Vector_Two[i] <- i^2
  }
}

• Note that we often embed an ‘if loop’ within a ‘for loop’


• What do we expect to get for indices > 5 in Results_Vector_Two? What
about for indices ≤ 5?
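
One way to check the answer is to run the loop and print the result. The second half of this sketch is an addition that is not part of the slide: a vectorized `ifelse()` call that should give the same values without a loop.

```r
# Run the loop from the slide
Results_Vector_Two <- rep(NA, 10)
for (i in 1:10) {
  if (i > 5) {
    Results_Vector_Two[i] <- i^2
  }
}
print(Results_Vector_Two)
# Indices 1 to 5 keep their NA; indices 6 to 10 hold 36, 49, 64, 81, 100

# Loop-free comparison (not from the slide): ifelse() tests every element at once
idx <- 1:10
Vectorized_Result <- ifelse(idx > 5, idx^2, NA)
print(Vectorized_Result)
```

Because the condition is FALSE for indices 1 to 5, the action never runs for them and the NA placeholders created by `rep(NA, 10)` remain.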
“If Loop” syntax in R
‘If loops’ with assignment of answer:
Results_Vector_Two <- rep(NA, 10)
for (i in 1:10) {
  if (i > 5) {
    Results_Vector_Two[i] <- i^2
  }
}

Annotation on (i > 5): A conditional statement that must be met for the action to occur. The result of this must be TRUE or FALSE.

“If Loop” syntax in R
# Results_Vector is assumed to already exist (it is not defined on this slide)
Results_Vector_Two <- rep(NA, 10)
for (i in 1:10) {
  if (Results_Vector[i] == 8) {
    Results_Vector_Two[i] <- i^2
  } else if (Results_Vector[i] != 8) {
    Results_Vector_Two[i] <- Results_Vector[i]
  }
}

Annotation on (Results_Vector[i] == 8): A conditional statement that must be met for the action to occur. The result of this must be TRUE or FALSE.
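
To run this if / else if example on its own, a `Results_Vector` is needed. The values below are made up purely for illustration, since the course's original vector is not shown here.

```r
# Hypothetical input vector (the course's own Results_Vector is not shown)
Results_Vector <- c(8, 3, 8, 1, 5, 8, 2, 9, 8, 4)

Results_Vector_Two <- rep(NA, 10)
for (i in 1:10) {
  if (Results_Vector[i] == 8) {
    # Condition met: store the squared index
    Results_Vector_Two[i] <- i^2
  } else if (Results_Vector[i] != 8) {
    # Condition not met: copy the original value
    Results_Vector_Two[i] <- Results_Vector[i]
  }
}

print(Results_Vector_Two)
# With this input: 1 3 9 1 5 36 2 9 81 4
```

If an element of `Results_Vector` were NA, the comparison would return NA rather than TRUE or FALSE and `if` would stop with an error, so missing values usually need to be handled before a loop like this.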
Activity: Defining new variables with an if loop
• Create a new variable that categorizes risk level, as
defined (arbitrarily and for purposes of demonstration):
– Risk level is high if person is ≥ 55 years old or person has
diabetes.
– Risk level is low if person is < 55 years old and has no diabetes.
– Risk level is moderate if person is < 55 years old and has diabetes.
• You may need to look up syntax for ‘and’ and ‘or’, specifically when
trying to generate a logical answer of TRUE or FALSE.
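
A possible sketch of this activity follows; it is not the official solution. As listed, the rules overlap: a person under 55 with diabetes matches both ‘high’ and ‘moderate’. The code assumes the more specific ‘low’ and ‘moderate’ rules are checked first, which is an assumption and not something the slide states. The `patients` data frame, its column names, and the 0/1 coding of `Diabetes` are hypothetical.

```r
# Hypothetical data: Diabetes coded 1 = yes, 0 = no
patients <- data.frame(
  Age      = c(30, 40, 55, 70, 54),
  Diabetes = c(0, 1, 0, 1, 0)
)

# Make space for the new variable
patients$Risk_Level <- rep(NA, nrow(patients))

for (i in seq_len(nrow(patients))) {
  under_55     <- patients$Age[i] < 55
  has_diabetes <- patients$Diabetes[i] == 1

  if (under_55 && !has_diabetes) {          # 'and' with 'not'
    patients$Risk_Level[i] <- "Low"
  } else if (under_55 && has_diabetes) {    # 'and'
    patients$Risk_Level[i] <- "Moderate"
  } else if (!under_55 || has_diabetes) {   # 'or', reached only for age >= 55
    patients$Risk_Level[i] <- "High"
  }
}

patients
```

In R, `&&` and `||` compare single TRUE/FALSE values, which suits an `if` condition inside a loop, while `&` and `|` work element by element on whole vectors, as in a `which()` call.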



Linking datasets
Allowing for creation of a single data frame when two datasets have a common, identifying variable



Merging two datasets requires a common, identifying variable
• In the VAERS datasets, we have the VAERS_ID variable.

• Let’s merge the data from the 2021VAERSDATA.csv and 2021VAERSVAX.csv spreadsheets into a single data frame.

• What search terms could we use to identify the function for accomplishing this task?

merge(data1, data2, by = "ID1")
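
Applied to the VAERS example, the call might look like the sketch below. The two small data frames are hypothetical stand-ins for 2021VAERSDATA.csv and 2021VAERSVAX.csv, and every column other than VAERS_ID is included only for illustration.

```r
# Hypothetical stand-ins for the two VAERS spreadsheets
vaers_data <- data.frame(
  VAERS_ID = c(101, 102, 103),
  AGE_YRS  = c(34, 61, 47)
)
vaers_vax <- data.frame(
  VAERS_ID = c(101, 101, 103, 104),
  VAX_TYPE = c("COVID19", "FLU", "COVID19", "COVID19")
)

# Default merge: keeps only IDs found in BOTH data frames
merged <- merge(vaers_data, vaers_vax, by = "VAERS_ID")
print(merged)

# all = TRUE keeps every ID from either data frame, filling gaps with NA
merged_all <- merge(vaers_data, vaers_vax, by = "VAERS_ID", all = TRUE)
print(merged_all)

# cbind() simply glues columns side by side in row order; it does not match on an ID
side_by_side <- cbind(vaers_data, Reviewed = c(TRUE, FALSE, TRUE))
print(side_by_side)
```

Because `cbind()` pairs rows by position, it is only safe when both objects have the same number of rows in the same order; `merge()` is usually the better fit when rows must be matched on a shared identifier.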



Visualizing data
Introducing the ggplot2 package



R-Graph Gallery provides many graph types with sample code that can be modified according to your own dataset.
https://www.r-graph-gallery.com/ggplot2-package.html



Example: Multiple group histogram

• https://www.r-graph-gallery.com/histogram_several_group.html
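
A minimal sketch of a multiple-group histogram is shown here. It is written for this summary and is not copied from the gallery page. The simulated data frame and its column names `group` and `value` are assumptions, and the code assumes ggplot2 has already been installed.

```r
# Run once if ggplot2 is not yet installed:
# install.packages("ggplot2")
library(ggplot2)

# Simulated data: two hypothetical groups with different distributions
set.seed(1)
df <- data.frame(
  group = rep(c("Group A", "Group B"), each = 100),
  value = c(rnorm(100, mean = 25, sd = 4), rnorm(100, mean = 32, sd = 5))
)

# One semi-transparent histogram per group, drawn on the same axes
p <- ggplot(df, aes(x = value, fill = group)) +
  geom_histogram(alpha = 0.6, position = "identity", bins = 30) +
  labs(x = "Value", y = "Count", fill = "Group")

print(p)
```

Setting `position = "identity"` overlays the groups instead of stacking them, and `alpha` makes the overlap visible.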
Final Steps for Beginner R
Demonstrating what you have learned and where we are going



Outstanding Course Components
• Short post-test to assess what you have learned and how we did

• Data visualization practice!

• Assignment of final project/exercise

• Scheduling…



See you in lab next time!
