Lab5 R
To keep you on your toes (and, honestly, because I struggled to find a normally distributed variable in
the GSS), I have decided to switch datasets for this lab. These data come from the 2020 Behavioral Risk
Factor Surveillance System (BRFSS), described as “the nation’s premier system of health-related
telephone surveys that collect state data about U.S. residents regarding their health-related risk
behaviors, chronic health conditions, and use of preventive services.”
I first imported the SAS file (available online) into SPSS. Then, to reduce the size of the set and make it
more applicable to us, I limited the dataset to include only respondents from the fine state of Alabama.
Ask me how if you are interested in limiting datasets like this. This would be the syntax code, and
the codebook tells us that for the variable @_STATE, 1 = Alabama.
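For anyone curious what "limiting the dataset" amounts to outside SPSS, here is a minimal Python sketch of the same idea: keep only the rows whose state code is 1. The rows and the `keep_state` helper are made up for illustration; only the variable name @_STATE and the code 1 = Alabama come from the codebook.

```python
# Hypothetical rows standing in for BRFSS records (not real data).
records = [
    {"@_STATE": 1, "weight2": 180},
    {"@_STATE": 2, "weight2": 165},
    {"@_STATE": 1, "weight2": 210},
    {"@_STATE": 6, "weight2": 150},
]

ALABAMA = 1  # per the codebook, @_STATE = 1 is Alabama


def keep_state(rows, state_code):
    """Return only the rows whose @_STATE equals state_code."""
    return [row for row in rows if row["@_STATE"] == state_code]


alabama = keep_state(records, ALABAMA)
print(len(alabama), "of", len(records), "rows kept")
```

The original list is left untouched, so you can always go back to the full set of states.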
1. Run a frequency distribution on the variable weight2. When you run it, the minimum value is
______ and the maximum value is _______ (don’t worry about whether it is a missing-data code or not).
Recode those values as missing data with the “Recode into Different Variables” command, and call this
new variable weight_r.
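If it helps to see the logic of these two steps written out, here is a rough Python sketch: tally each value, look at the minimum and maximum, then copy the variable while turning the non-weight codes into missing. The sample values are invented, and the codes 7777 and 9999 are assumptions used only for illustration; check the codebook for the actual codes in weight2.

```python
from collections import Counter

# Hypothetical weight2 values (not real data).
weight2 = [150, 180, 7777, 210, 165, 9999, 180]

# Assumption for illustration only: confirm the real codes in the codebook.
MISSING_CODES = {7777, 9999}

# Frequency distribution: each distinct value with its count, sorted by value.
freq = sorted(Counter(weight2).items())
for value, count in freq:
    print(value, count)
print("min:", min(weight2), "max:", max(weight2))

# "Recode into a different variable": weight2 is left alone, and the new
# variable weight_r holds None wherever weight2 held a missing-data code.
weight_r = [None if w in MISSING_CODES else w for w in weight2]
valid = [w for w in weight_r if w is not None]
print("valid min:", min(valid), "valid max:", max(valid))
```

Notice that the minimum and maximum change once the coded values are set to missing, which is the whole point of the recode.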
2. Run a histogram (with normal curve) on the new variable you created, weight_r. Copy and paste
the graph below.
3. With an understanding of the properties of the normal curve and the assumptions you can make
if you know the mean and standard deviation of a distribution, you could claim that
roughly 68% of the distribution falls within ______ standard deviation(s) above and below the
mean. In the case of the weight of Alabama adults, this means that roughly 68% of adults would weigh
between ________ and _________.
4. You also remember that you can use a more objective measure (in other words, a statistic)
instead of the visual representation (the histogram) to determine whether or not a distribution
is skewed. Run the skewness statistic for weight_r. What is this value? _______. Based on
this value, would you say the variable is normally distributed or not?
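The arithmetic behind questions 3 and 4 can be sketched in a few lines of Python. The weights below are invented, and the `skewness` helper is my own illustration using the adjusted Fisher-Pearson sample formula, which is one common definition; the value your software reports may differ slightly depending on the formula it uses.

```python
import statistics

# Hypothetical weights in pounds; None marks a missing value (not real data).
weight_r = [150, 180, None, 210, 165, None, 180, 240, 135]
valid = [w for w in weight_r if w is not None]

mean = statistics.mean(valid)
sd = statistics.stdev(valid)  # sample standard deviation (divides by n - 1)

# Under a normal curve, roughly 68% of cases fall within 1 SD of the mean.
low, high = mean - sd, mean + sd


def skewness(values):
    """Adjusted Fisher-Pearson sample skewness (needs at least 3 values)."""
    n = len(values)
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)


print(f"mean = {mean:.1f}, sd = {sd:.1f}")
print(f"roughly 68% between {low:.1f} and {high:.1f}")
print(f"skewness = {skewness(valid):.3f}")
```

A skewness near 0 suggests a roughly symmetric distribution; positive values point to a longer right tail and negative values to a longer left tail.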
You would like to present the data on weight to a paying stakeholder, but realize it makes more sense to
present it in fewer categories because that will be easier to read. So, recode weight_r into
weight_cat, with the following categories:
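As a rough picture of what collapsing a continuous variable into categories looks like in code, here is a small Python sketch. The cut points (150 and 200 pounds), the labels, and the `weight_category` helper are placeholders I made up for illustration, not the categories for this lab; swap in the lab's categories before using it.

```python
from collections import Counter

# Placeholder labels for illustration only, NOT the lab's categories.
LABELS = {1: "under 150 lb", 2: "150 to 199 lb", 3: "200 lb or more"}


def weight_category(w):
    """Map a weight in pounds to a category code; missing stays missing."""
    if w is None:
        return None
    if w < 150:
        return 1
    if w < 200:
        return 2
    return 3


# Hypothetical weights; None marks a missing value (not real data).
weight_r = [135, 150, 180, None, 210, 199, 240]
weight_cat = [weight_category(w) for w in weight_r]

# Frequency table of the new categorical variable, skipping missing values.
counts = Counter(c for c in weight_cat if c is not None)
for code in sorted(counts):
    print(code, LABELS[code], counts[code])
```

Each weight lands in exactly one category, and missing values are carried over as missing rather than being dropped into a bin.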
The housing commission of Alabama has asked you about the number of adults in Alabama households.
You quickly identify the variable ‘hhadult_r’ (I already made the 99s missing), which asks respondents
“How many members of your household, including yourself, are 18 years of age or older?”