Unit 531 Describing and Assessing The Linear Relationship Between Two Scale Variables Without Answers

Assignment for unit 531

Describing and assessing the linear relationship between income and BMI
Henk van der Kolk

03/08/2022

Goals of this assignment and preparing the data


In this assignment, you will (again) run a simple linear regression analysis. The data
you will use are identical to the data used in unit 530. We use the smaller version of
the dataset, which includes the variable BMI. The data can be treated as a random
sample from the Dutch population. The dataset is called
Health_LISS_Core_Study_Wave_12_2020_data_plus_background_small.sav. The pdf with the
codebook has a similar name. Download the data file and put it in your working
directory. (Install and) load the packages tidyverse (for handling the data and making
plots), haven (for importing SPSS files), and broom (for inspecting regression output in a
tidy format using the ‘tidy’ function).
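A minimal sketch of this setup step. The file name is the one given above; the object name health is only an assumption for illustration, and the guard imports the file only when it is actually present in the working directory.

```r
# Install once if needed: install.packages(c("tidyverse", "haven", "broom"))
library(tidyverse)  # data handling and plots
library(haven)      # read_sav() imports SPSS .sav files
library(broom)      # tidy() for regression output

data_file <- "Health_LISS_Core_Study_Wave_12_2020_data_plus_background_small.sav"

# 'health' is a hypothetical object name; import only if the file is present
if (file.exists(data_file)) {
  health <- read_sav(data_file)
  glimpse(health)   # quick overview of the variables
}
```

If nothing is printed, check with getwd() that the file sits in the current working directory.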
With “simple linear regression models” we describe the relationship between one
dependent scale variable (we will sometimes treat variables with 5 values as ratio/scale
variables) and one independent scale variable. You will first briefly look at two variables:
• ch19l021 (To what extent did your physical health or emotional problems hinder
your social activities over the past month?) and

• ch19l022 (To what extent did your physical health or emotional problems hinder
your work over the past month, for instance in your job, the housekeeping, taking
care of the children, doing volunteer work, or in school?)

1. Is it a good choice to take the first as independent and the second as dependent
variable?
**

Studying the relationship between BMI and income: expectations


After inspecting the data, you become interested in the relationship between BMI and
income (we will use the variable nettoink in the data file). Following conventional wisdom,
we expect income to have a negative effect on BMI (poor people more often have a high
BMI). Normally we would theorize about this extensively. That is the ‘story version’ of a
theory: in the story you also discuss why this relationship is expected. For now, ‘keep it
simple’.

2. Draw the theoretical model (a graph) for this ‘conventional wisdom’ prediction (pen
and paper). Use boxes and arrows. Include a positive or a negative sign to indicate
the sign of the effect.

3. Also give the linear equation for the expectation.

4. Do you think the residuals should be part of the equation? Why (not)?

**

Studying the relationship between BMI and income: data inspection and cleaning
5. Since we will use the income variable (nettoink), you inspect that variable. Create a
histogram of the variable nettoink.
You will notice there is at least one person with a net income of at least 1.5 million a month.
This is either a rich person OR a coding error :-). Normally, you should NOT simply remove
outliers: if this is the sample, that is what it is. However, since we do not have time to
check this data point extensively, we focus on people with a somewhat more reasonable income.
6. Filter out extreme outliers (take 15,000 a month as the cut-off point).

7. After filtering the data, make a scatter plot of the relationship between income and
BMI (outliers with extreme values have also already been filtered out of the data file
itself). Think about which variable you put on the x-axis and which on the y-axis.
Include a regression line through the cloud of data points by adding
geom_smooth(method = "lm", se = FALSE).

8. Based on this graph, do you think the relationship is as hypothesized?

**
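The filtering and plotting steps above can be sketched on simulated data. Everything here is made up for illustration: the names toy, income and bmi are assumptions, and the numbers say nothing about the real dataset.

```r
library(tidyverse)

set.seed(531)
# Simulated stand-in for the real data; 'toy', 'income' and 'bmi' are made-up names
toy <- tibble(
  income = c(runif(200, min = 0, max = 6000), 1500000),  # one extreme value
  bmi    = rnorm(201, mean = 25, sd = 4)
)

# Keep only the cases at or below the cut-off
toy_filtered <- toy %>%
  filter(income <= 15000)

# Scatter plot with a fitted straight line and no confidence band
p <- ggplot(toy_filtered, aes(x = income, y = bmi)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
p
```

Comparing nrow(toy) with nrow(toy_filtered) shows how many cases the filter dropped.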

Studying the relationship between BMI and income: data analysis


Run a simple linear regression using R: change the following syntax and add the
independent and dependent variables from the dataset. Store the output under the name
‘model’.
model <- name_of_the_dataset %>%
  lm(dependentvariable ~ independentvariable, data = .)

9. Inspect the output using one of the two following commands:


# this requires the broom package
model %>%
  tidy()

# or use
summary(model)
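To see what the template and the two inspection commands look like when filled in, here is a small sketch on simulated data. The names toy, x, y and toy_model are hypothetical and are not variables from the dataset.

```r
library(tidyverse)
library(broom)

set.seed(531)
# Simulated data; 'toy', 'x' and 'y' are hypothetical names
toy <- tibble(x = runif(100, 0, 10)) %>%
  mutate(y = 2 + 0.5 * x + rnorm(100))

# Same pattern as the template: dependent ~ independent, data passed in via the pipe
toy_model <- toy %>%
  lm(y ~ x, data = .)

# One row per coefficient, with columns estimate, std.error, statistic and p.value
toy_model %>%
  tidy()

# Base R alternative with the same numbers in a different layout
summary(toy_model)
```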

10. What is the intercept? What does this number tell you?

The intercept is: **
11. What is the slope?
The slope is: **
12. What is the sign of the slope? Is that what you expected (theoretically)?
The sign of the slope is: **
13. The effect of the income variable (the slope) seems extremely small (the estimate ends
in -04, meaning you have to move the comma/dot four places to the left). Why is it
so small?
**
NOTE: Reading the scientific notation (with the “-04” after a number) may be
difficult. The following commands will often simplify things, but make sure you
are able to interpret scientific notation! Check the internet to find out how
scientific notation works.
# round every numeric column of the output to 5 decimals
model %>%
  tidy() %>%
  mutate_if(is.numeric, round, 5)
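As a quick illustration of how scientific notation reads in R, the following sketch uses an arbitrary number (1.5e-04 is made up and is not taken from the regression output).

```r
x <- 1.5e-04   # scientific notation for 1.5 * 10^-4

# The same number written out in full: the dot moves four places to the left
x == 0.00015

# Print the value without scientific notation
format(x, scientific = FALSE)

# Optional: make R prefer fixed notation when printing for the rest of the session
# options(scipen = 999)
```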

Assessing the relationship: inference


Let us check whether we can say something about the population. We can use the
‘confidence interval’ approach or the ‘test’ approach. These are just different ways of doing
basically the same thing.
14. What is the meaning of the “std.error” in the output?
**
We now first focus on the confidence interval.
15. The confidence interval of the slope and the intercept can be calculated “by hand” (with
a calculator), using the standard error from the output presented above. What is the CI
of the slope?
**
Check your answer, using the following command lines:
confint(model, 'nettoink', level=0.95) %>%
as.data.frame() %>%
mutate_if(is.numeric, round, 6)
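The “by hand” calculation and the confint() check can be compared on simulated data. This is a sketch under assumptions: x, y and toy_model are hypothetical names, and the interval is built as the estimate plus or minus a critical t-value times the standard error, which is how confint() computes it for lm models. With a large sample the critical t-value is close to 1.96.

```r
set.seed(531)
# Simulated data; 'x' and 'y' are hypothetical names
x <- runif(100, 0, 10)
y <- 2 + 0.5 * x + rnorm(100)
toy_model <- lm(y ~ x)

# Pull the slope estimate and its standard error from the coefficient table
coefs    <- summary(toy_model)$coefficients
estimate <- coefs["x", "Estimate"]
se       <- coefs["x", "Std. Error"]

# Critical t-value for a 95% interval, using the residual degrees of freedom
t_crit <- qt(0.975, df = df.residual(toy_model))

ci_by_hand <- c(estimate - t_crit * se, estimate + t_crit * se)
ci_by_hand

# Should match the interval reported by confint()
confint(toy_model, "x", level = 0.95)
```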

16. Does the 95 percent confidence interval in this case include zero? What does this
mean?

**
A second, similar way to approach this is by using a ‘testing’ approach (not the confidence
interval approach).
17. The effect (the slope) by itself does not reveal much. Its size depends on the scale
(are we measuring in euros, dollars, or thousands of euros?). We need to ‘standardize’
that effect, so we can check whether it is very ‘different’ from zero. How do we do that?
In other words, how are the estimate, the standard error and the t-value (here called
‘the statistic’) related?
**
18. What is the t-value in this case?

19. What is the p-value? And what does it mean?

**
<< END OF THE ASSIGNMENT>>