Lab 3B Confound It All
Directions: Record your responses to the lab questions in the spaces provided.
IF A QUESTION ASKS YOU TO WRITE CODE OR MAKE A GRAPH, YOU MUST INCLUDE THE CODE & GRAPH!
Finding data in new places
Previously…
Since your first forays into doing data science, you've used data from two sources:
● Built-in datasets from RStudio.
● Campaign data from IDS Campaign Manager.
In this lab…
Data can be found in many other places, though, especially online.
You can find the data for this lab online here (highlight and copy the URL; you will use it later):
https://raw.githubusercontent.com/IDSUCLA/dataset/main/fev.csv
Rather than exporting the data and then uploading and importing it, we'll pull the data
straight from the webpage into R.
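A minimal sketch of that step, assuming base R's read.csv() (which accepts a URL as well as a local file path) and assuming the file's first row names the columns X1 through X5, which is what the renaming code in step 1 below expects. The helper name load_lungs is hypothetical.

```r
# URL of the lab data (copied from above)
lungs_url <- "https://raw.githubusercontent.com/IDSUCLA/dataset/main/fev.csv"

# Hypothetical helper: read the CSV straight from a URL (or a local path).
# header = TRUE assumes the first row holds the column names (X1 ... X5).
load_lungs <- function(source = lungs_url) {
  read.csv(source, header = TRUE)
}
```

Running lungs <- load_lungs() (which needs an internet connection) stores the data as lungs; checking names(lungs) afterwards is a quick way to confirm the column names before renaming them.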
Now that we've got the data loaded, we need to clean it to get it ready for use. (You can
use the code below to help you, or you can refer to the IDS Notebook for examples.
I recommend creating a script to organize your code, in case you make an error
and need to start over.)
Specifically:
1. We want to name the variables "age", "lung_cap", "height", "gender", and "smoker", in that
order. (Adjust or continue the code below to rename the variables.)
lungs <- rename(lungs, age = X1, lung_cap = X2, height = X3, gender = X4, smoker = X5)
2. Change the gender and smoker variables from numeric to character.
(Adjust the code below.)
For gender, use recode to change "1" to "Male" and "0" to "Female".
For smoker, use recode to change "1" to "Yes" and "0" to "No".
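One way to do step 2, sketched with dplyr's mutate() and recode(). Converting with as.character() first makes the numeric-to-character change explicit, so recode() only has to match the strings "1" and "0". The helper name label_lungs is hypothetical, and it assumes the columns have already been renamed as in step 1.

```r
library(dplyr)

# Hypothetical helper: turn the numeric 1/0 codes into character labels.
# Assumes the columns are already named gender and smoker.
label_lungs <- function(df) {
  mutate(
    df,
    # as.character() changes the type; recode() then swaps the labels
    gender = recode(as.character(gender), "1" = "Male", "0" = "Female"),
    smoker = recode(as.character(smoker), "1" = "Yes", "0" = "No")
  )
}
```

Usage: lungs <- label_lungs(lungs) replaces both columns in one step.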
Write down a reason the researchers couldn't use an experiment to test the effects of
smoking on children's lungs.
The researchers couldn't use an experiment because it would be unethical and
unsafe to make children start smoking, given smoking's long-term health consequences.
Observational studies are often helpful for analyzing how variables are related:
1. Do you think that a person's age affects their lung capacity? Explain.
2. Use the lungs data to create an xyplot of age and lung_cap. (Include it below.)
3. Interpret the plot and describe why the relationship between the two variables makes
sense.
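The scatterplot in question 2 can be sketched with the lattice package's xyplot(), using the formula y ~ x so that lung_cap goes on the vertical axis and age on the horizontal axis. The helper name plot_age_vs_lung_cap is hypothetical; it assumes a data frame with numeric age and lung_cap columns.

```r
library(lattice)

# Hypothetical helper: scatterplot of lung capacity (y) against age (x).
plot_age_vs_lung_cap <- function(df) {
  xyplot(lung_cap ~ age, data = df,
         xlab = "age", ylab = "lung_cap")
}
```

Usage: plot_age_vs_lung_cap(lungs) at the console draws the plot; lattice builds a plot object and only draws it when it is printed, which the console does automatically.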
Apparently, I was incorrect. After some research, I realized that younger people have smaller
bodies and therefore smaller lungs. Thus, younger children have a smaller lung
capacity than older people.
Smoking and lung capacity
1. Make a boxplot that can be used to answer the statistical question (Plot it below):
“Do people who smoke tend to have lower lung capacity than those who do not smoke?”
According to the results, it seems that people who smoke have a larger lung
capacity.
4. Can you suggest a possible confounding factor that might be affecting the result?
I think that because these people have been smoking for so long, their bodies have
adjusted by increasing their lung capacity.
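The boxplot in question 1 can be sketched with lattice's bwplot(), putting lung_cap on the vertical axis and one box per smoker group. The helper name plot_lung_cap_by_smoker is hypothetical, and it assumes smoker has already been recoded to "Yes"/"No".

```r
library(lattice)

# Hypothetical helper: side-by-side boxplots of lung capacity per smoker group.
plot_lung_cap_by_smoker <- function(df) {
  df$smoker <- factor(df$smoker)  # treat the group labels as categories
  bwplot(lung_cap ~ smoker, data = df,
         xlab = "smoker", ylab = "lung_cap")
}
```

Usage: plot_lung_cap_by_smoker(lungs) at the console draws the two boxplots.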
Let’s compare
1. Create three subsets of the data using the following code:
newFileName <- filter(originalFileName, newRuleOrRules)
2. Make a boxplot that compares the lung capacity of smokers and non-smokers for each
subset.
Group Plot
3. How does the relationship between smoking and lung capacity change as we increase
the age from 13 to 15 to 17?
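Steps 1 and 2 can be sketched as two small helpers: one that follows the filter() template above to keep a single age, and one that draws the smoker vs. non-smoker boxplot for a subset. This assumes each subset holds one exact age (13, 15, or 17, as question 3 suggests); the helper names subset_by_age and smoker_boxplot are hypothetical.

```r
library(dplyr)
library(lattice)

# Hypothetical helper: keep only the rows for one age,
# following the template newData <- filter(originalData, rule).
subset_by_age <- function(df, target_age) {
  filter(df, age == target_age)
}

# Hypothetical helper: smoker vs. non-smoker boxplot for one subset.
smoker_boxplot <- function(subset_df, title) {
  subset_df$smoker <- factor(subset_df$smoker)
  bwplot(lung_cap ~ smoker, data = subset_df, main = title)
}
```

Usage: lungs13 <- subset_by_age(lungs, 13) followed by smoker_boxplot(lungs13, "Age 13"), repeated for ages 15 and 17. Writing filter(lungs, age == 13) directly gives the same subset.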
Let’s compare
Does smoking affect lung capacity? If so, how? Support your answers with the
previously created plots.
Smoking does affect lung capacity: if someone smokes, their lung capacity decreases. As
shown in the lungs13, lungs15, and lungs17 data, as age increases (meaning more time spent
smoking), that person's lung capacity decreases due to the chemicals in his or her lungs. We
can see this decrease because the median, the Q1 value, and the Q3 value all
decrease as age increases.
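The median, Q1, and Q3 comparison can also be checked numerically instead of read off the boxplots. A sketch using base R's aggregate() with quantile(); the helper name quartiles_by_smoker is hypothetical. Note that quantile()'s default method can give values slightly different from the box edges a boxplot draws.

```r
# Hypothetical helper: Q1, median, and Q3 of lung_cap for each smoker group.
quartiles_by_smoker <- function(df) {
  aggregate(lung_cap ~ smoker, data = df,
            FUN = function(x) quantile(x, probs = c(0.25, 0.5, 0.75)))
}
```

Usage: quartiles_by_smoker(lungs13), quartiles_by_smoker(lungs15), and quartiles_by_smoker(lungs17) print one row per smoker group, which makes the three age groups easy to compare side by side.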