Lab5 R


Lab 5: recoding, professional tables, and the normal distribution (Alabama edition)

*Note: you are using a different dataset for this lab.

To keep you on your toes (more so honestly because I struggled to find a normally distributed variable in
the GSS), I have decided to switch datasets for this lab. These data come from the 2020 Behavioral Risk
Factor Surveillance System (BRFSS), described as “the nation’s premier system of health-related
telephone surveys that collect state data about U.S. residents regarding their health-related risk
behaviors, chronic health conditions, and use of preventive services.”

I first imported the SAS file (available online) into SPSS. Then, to reduce the size of the set and make it
more applicable to us, I limited the dataset to include only respondents from the fine state of Alabama.
Ask me how if you have any interest in limiting datasets like this. This is the syntax I used; the
codebook tells us that a value of 1 on the variable @_STATE means Alabama.

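* Name the active dataset DataSet2 and bring its window to the front.
DATASET NAME DataSet2 WINDOW=FRONT.

* Turn off any active filter, then keep only Alabama respondents (@_STATE = 1).
* Note that SELECT IF permanently drops all other cases from the working file.
FILTER OFF.
USE ALL.
SELECT IF (@_STATE = 1).
EXECUTE.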

1. Run a frequency distribution on the variable weight2. When you run it, the minimum value is
______ and the maximum value is _______ (don’t worry for now about whether these values are missing-data codes).
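
If you prefer syntax to the menus, a minimal sketch of the frequency run for question 1 might look like this. It assumes the variable is named weight2 in your active dataset, as in the text above. The /STATISTICS line is optional and just prints the smallest and largest values under the table.

```
* Frequency table for weight2, with the minimum and maximum printed.
FREQUENCIES VARIABLES=weight2
  /STATISTICS=MINIMUM MAXIMUM.
```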

If you were to look at the codebook, you would see this:

[Codebook entry for weight2; image not reproduced here]

Based on this, and on the frequency distribution that you ran, which values do you think make the
most sense to designate as ‘missing’? (Hint: I would classify three of them as missing.)

Recode those values as missing data with the “Recode into Different Variables” command, and call the
new variable weight_r.
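
The same recode can be written as syntax. This is only a sketch: the codes 7777 and 9999 are illustrative assumptions, not values confirmed by this handout, so replace them with the three values you picked from the codebook and your frequency table.

```
* ASSUMPTION: 7777 and 9999 are illustrative codes, not confirmed values.
* Replace them with your own three values from the codebook.
RECODE weight2 (7777, 9999 = SYSMIS) (ELSE = COPY) INTO weight_r.
EXECUTE.
```

ELSE = COPY carries every other value over unchanged, so weight_r matches weight2 except where you declared a value missing.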

2. Run a histogram (with normal curve) on the new variable you created, weight_r. Copy and paste
the graph below.
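
One way to request that histogram in syntax, assuming weight_r already exists from the recode step:

```
* Histogram of weight_r with a normal curve overlaid.
GRAPH
  /HISTOGRAM(NORMAL)=weight_r.
```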

3. With an understanding of the properties of the normal curve and the assumptions you can make
if you know the mean and standard deviation of a distribution, you could make a claim that
roughly 68% of the distribution falls between ______ standard deviation above and below the
mean. In the case of weight of Alabama adults, this means that 68% of the adults would fall
between ________ and _________.
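
To fill in the two weight blanks you need the mean and standard deviation of weight_r. A minimal sketch, assuming weight_r has its missing codes removed:

```
* Mean and standard deviation of weight_r.
DESCRIPTIVES VARIABLES=weight_r
  /STATISTICS=MEAN STDDEV.
```

The lower bound is the mean minus the number of standard deviations you wrote in the first blank, and the upper bound is the mean plus that same amount.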

4. You also remember that you can use a more objective measure (in other words, a statistic)
instead of the visual representation (the histogram) to determine whether or not a distribution
is skewed. Run the skewness statistic for weight_r. What is this value? _______. Based on
this value, would you say the variable is normally distributed or not?
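
A sketch of one way to get the skewness statistic without printing the whole frequency table (the NOTABLE option is my choice here, not a requirement of the lab):

```
* Skewness and its standard error for weight_r, table suppressed.
FREQUENCIES VARIABLES=weight_r
  /FORMAT=NOTABLE
  /STATISTICS=SKEWNESS SESKEW.
```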

You would like to present the data on weight to a paying stakeholder, but realize that it will be easier
to read with fewer categories. So, recode weight_r into weight_cat, with the following categories:

  1 = lowest weight to 150 pounds
  2 = 151 pounds to 250 pounds
  3 = 251 pounds to 350 pounds
  4 = 351 pounds to highest weight

5. Use that new variable, weight_cat, to fill in the table below:

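Table 1: Frequency distribution of respondent weight; 2020 BRFSS (Alabama sample)

Weight category          Frequency    Percent
77 to 150 pounds         ________     ________
151 to 250 pounds        ________     ________
251 to 350 pounds        ________     ________
351 to highest weight    ________     ________
Total                    ________     ________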

The housing commission of Alabama has asked you about the number of adults in Alabama households.
You quickly identify the variable ‘hhadult_r’ (I already made the 99s missing), which asks respondents
“How many members of your household, including yourself, are 18 years of age or older?”

6. Run a frequency on hhadult. What is the level of measurement?


7. Run both a histogram AND the skewness statistics. Is hhadult_r normally distributed, right
skewed, or left skewed?
8. Lastly, how many adults, on average, occupy Alabama homes?
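
Questions 6 through 8 can be covered in a single run if you like. A sketch, assuming the variable is named hhadult_r as described above:

```
* Frequency table, histogram, mean, and skewness for hhadult_r.
FREQUENCIES VARIABLES=hhadult_r
  /STATISTICS=MEAN SKEWNESS SESKEW
  /HISTOGRAM NORMAL.
```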
