Lab 3B Confound It All

This lab focuses on analyzing lung capacity data obtained from an online observational study. Students will learn to import, clean, and analyze the dataset, exploring relationships between variables such as age, smoking status, and lung capacity. The lab emphasizes the ethical considerations of observational studies and encourages students to visualize data through plots and boxplots.

Lab 3B - Confound it all!

Directions: Record your responses to the lab questions in the spaces provided.
IF A QUESTION ASKS YOU TO MAKE A CODE OR GRAPH, YOU MUST INCLUDE THE CODE &
GRAPH!
Finding data in new places

Previously…
Since your first forays into doing data science, you've used data from two sources:
● Built-in datasets from RStudio.
● Campaign data from IDS Campaign Manager.

In this lab…
Data can be found in many other places though, especially online.

In this lab, we'll read an observational study dataset from a website.


●​ We'll use this data to then explore what factors are associated with a person's lung
capacity.

Our new data

You can find the data online here: (Highlight and copy the URL. It will be used later.)
https://raw.githubusercontent.com/IDSUCLA/dataset/main/fev.csv

Variables that were measured include:


● Age, in years.
● Lung capacity, measured in liters.
● Height, in inches.
● Gender: "1" for males, "0" for females.
● Whether the participant was a smoker ("1") or a non-smoker ("0").

Importing our data

Rather than exporting the data and then uploading and importing it, we'll pull the data
straight from the webpage into R.

To use our new dataset follow the instructions below:


1.​ Click on the Import Dataset button under the Environment tab.
2.​ Then click on the From Text (readr) option.
3.​ Type or copy/paste the URL into the box and then hit Update.

BEFORE IMPORTING, CHANGE THE FOLLOWING IMPORT OPTIONS


1.​ Change the name to: lungs
2.​ Uncheck the “First Row as Names” box
3.​ Change Delimiter to “Whitespace”
4.​ Click “Import”
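
The same import can also be sketched in code rather than through the dialog. This is a minimal sketch, not the lab's required method. It assumes the readr package is installed and that the dataset URL listed above is reachable in its plain raw.githubusercontent.com form. The name fev_url is made up for this example.

```r
library(readr)

# Assumption: the plain form of the dataset URL given earlier in the lab.
fev_url <- "https://raw.githubusercontent.com/IDSUCLA/dataset/main/fev.csv"

# read_table() splits columns on whitespace (the "Whitespace" delimiter),
# and col_names = FALSE matches unchecking "First Row as Names".
lungs <- read_table(fev_url, col_names = FALSE)

# Without a header row, readr names the columns X1, X2, ... by default.
head(lungs)
```

Keeping these lines in a script makes it easy to re-import the data if a later cleaning step goes wrong.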

Cleaning your data

Now that we've got the data loaded, we need to clean it to get it ready for use. (You can
use the code below to help you, or you can refer to the IDS Notebook for examples.
I recommend creating a script to organize your code, just in case you make an error
and need to start over.)

Specifically:​

1.​ We want to name the variables: "age", "lung_cap", "height", "gender","smoker", in that
order. (Adjust/continue the code below to rename the variables)

lungs <- rename(lungs, age = X1, lung_cap = X2, height = X3, gender = X4, smoker = X5)

2.​ Change the type of variable for gender and smoker from numeric to character.
(Adjust the code below.)

Code to fix Gender variable:

lungs <- mutate(lungs, gender = as.character(gender))

Code to fix Smoker variable:

lungs <- mutate(lungs, smoker = as.character(smoker))

3. After changing the variable types for gender and smoker, recode their values. (Adjust the code below.)

For gender, use recode to change "1" to "Male" and "0" to "Female".

lungs <- mutate(lungs, gender = recode(gender, "1" = "Male" , "0" = "Female"))

For smoker, use recode to change "1" to "Yes" and "0" to "No".

lungs <- mutate(lungs, smoker = recode(smoker, "1" = "Yes" , "0" = "No"))
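
The cleaning steps above can also be chained together and checked on a small example. The sketch below assumes the dplyr package. The two rows in toy are made up for illustration and are not real measurements, and the names toy and toy_clean are hypothetical.

```r
library(dplyr)

# Made-up rows standing in for the imported data (not real measurements).
toy <- data.frame(
  X1 = c(9, 14),      # age
  X2 = c(1.7, 3.2),   # lung capacity
  X3 = c(57, 66),     # height
  X4 = c(0, 1),       # gender code
  X5 = c(0, 1)        # smoker code
)

toy_clean <- toy %>%
  # Step 1: give the columns meaningful names.
  rename(age = X1, lung_cap = X2, height = X3, gender = X4, smoker = X5) %>%
  # Steps 2 and 3: convert the codes to character, then recode them to labels.
  mutate(
    gender = recode(as.character(gender), "1" = "Male", "0" = "Female"),
    smoker = recode(as.character(smoker), "1" = "Yes", "0" = "No")
  )

# Quick check that the types and labels look as expected.
str(toy_clean)
```

Running str() on the real lungs data after cleaning is a quick way to confirm that gender and smoker are now character columns.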

Analyzing our data

Our lungs data is from an observational study.

Write down a reason the researchers couldn't use an experiment to test the effects of
smoking on children's lungs.

The researchers couldn’t use an experiment because it would be unethical and unsafe to
force children to start smoking, given smoking’s long-term consequences.

Observational studies are often helpful for analyzing how variables are related:

1.​ Do you think that a person's age affects their lung capacity? Explain.

Yes, naturally lung capacity declines with an increase in age.

2.​ Use the lungs data to create a xyplot of age and lung_cap. (Include it below)

xyplot(lung_cap ~ age, data = lungs)

3.​ Interpret the plot and describe why the relationship between the two variables makes
sense.

Apparently, I was incorrect. The plot shows lung capacity increasing with age. After some
research I realized that younger people have smaller bodies and therefore smaller lungs, so
younger children have a smaller lung capacity than older ones.
Smoking and lung capacity

1.​ Make a boxplot that can be used to answer the statistical question (Plot it below):

“Do people who smoke tend to have lower lung capacity than those who do not smoke?”

​ bwplot(~lung_cap | smoker, data = lungs)

2.​ Use your boxplot to answer the question.


“Do people who smoke tend to have lower lung capacity than those who do not smoke?”

​ According to the results, it seems that people who smoke have a larger lung
capacity.

3.​ Were you surprised by the answer? Why?



Yes, I was surprised by this answer. I would think that people who smoke have worse
lungs than people who do not smoke because of the pollution and chemicals left in the lungs.
However, this data says otherwise.

4.​ Can you suggest a possible confounding factor that might be affecting the result?

I think that age could be a confounding factor. The smokers in this data may tend to be
older than the non-smokers, and older children have larger bodies and therefore larger lungs,
which could make smokers appear to have a larger lung capacity.
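
One way to probe a candidate confounder such as age is to compare it across the smoker and non-smoker groups. The sketch below assumes the dplyr package. The helper summarise_by_smoker is hypothetical (not part of the lab), and the toy rows are made up purely to show the call.

```r
library(dplyr)

# Hypothetical helper (not part of the lab): summarise each smoker group.
summarise_by_smoker <- function(data) {
  data %>%
    group_by(smoker) %>%
    summarise(
      n = n(),
      median_age = median(age),
      median_lung_cap = median(lung_cap)
    )
}

# Made-up rows for illustration only; they are not real measurements.
toy <- data.frame(
  age = c(8, 9, 10, 14, 15, 16),
  lung_cap = c(1.8, 2.0, 2.3, 3.1, 3.3, 3.4),
  smoker = c("No", "No", "No", "Yes", "Yes", "Yes")
)

summarise_by_smoker(toy)
# With the lab data, run summarise_by_smoker(lungs) instead.
```

If the two groups differ noticeably in median age, then age is tangled up with smoking status, and comparing lung capacity within a single age group is a fairer comparison.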
Let’s compare

1.​ Create three subsets of the data using the following code:
newFileName <- filter(originalFileName, newRuleOrRules)

a.​ One that includes only 13 year olds ...


lungs13 <- filter( lungs, age == 13)

b.​ One that includes only 15 year olds ...


lungs15 <- filter( lungs, age == 15)

c.​ One that includes only 17 year olds.


lungs17 <- filter( lungs, age == 17)

2. Make a boxplot that compares the lung capacity of smokers and non-smokers for each
subset.

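Group: only 13-year-olds
[Plot: boxplot of lung capacity for smokers and non-smokers, 13-year-olds]

Group: only 15-year-olds
[Plot: boxplot of lung capacity for smokers and non-smokers, 15-year-olds]

Group: only 17-year-olds
[Plot: boxplot of lung capacity for smokers and non-smokers, 17-year-olds]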
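
The three plots could be produced with a small helper like the one sketched here. It assumes the dplyr and lattice packages. The lab's own bwplot may come from a different package with a similar interface, and plot_age_group is a hypothetical name used only for this example.

```r
library(dplyr)
library(lattice)

# Hypothetical helper: boxplot of lung capacity by smoker status for one age.
plot_age_group <- function(data, target_age) {
  # Keep only the rows for the requested age, as in the filter() step above.
  subset_data <- filter(data, age == target_age)

  # factor(smoker) puts one box per smoker group on the x-axis.
  bwplot(lung_cap ~ factor(smoker), data = subset_data,
         main = paste0(target_age, "-year-olds"),
         xlab = "Smoker", ylab = "Lung capacity (liters)")
}

# In the lab, with the cleaned data:
# plot_age_group(lungs, 13)
# plot_age_group(lungs, 15)
# plot_age_group(lungs, 17)
```

Because every plot compares smokers and non-smokers of the same age, age can no longer explain a difference between the two boxes.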

3. How does the relationship between smoking and lung capacity change as we increase
the age from 13 to 15 to 17?

As age increased, lung capacity decreased.

Let’s compare

Does smoking affect lung capacity? If so, how? Support your answers with the
previously created plots.
Smoking does affect lung capacity: if someone smokes, their lung capacity decreases. As
shown in the lungs13, lungs15, and lungs17 plots, as age increases (meaning more time spent
smoking) the lung capacity of that person decreases due to the chemicals in his or her lungs.
We can see the decrease because the median, Q1, and Q3 values all decrease as age increases.
