Unit 531 Describing and Assessing The Linear Relationship Between Two Scale Variables Without Answers
Describing and assessing the linear relationship between income and BMI
Henk van der Kolk
03/08/2022
• ch19l022 (To what extent did your physical health or emotional problems hinder
your work over the past month, for instance in your job, the housekeeping, taking
care of the children, doing volunteer work, or in school?)
1. Is it a good choice to take the first as the independent and the second as the
dependent variable?
**
2. Draw the theoretical model (a graph) for this ‘conventional wisdom’ prediction (pen
and paper). Use boxes and arrows. Include a positive or a negative sign to indicate
the sign of the effect.
4. Do you think the residuals should be part of the equation? Why (not)?
**
Studying the relationship between BMI and income: data inspection and
cleaning
5. Since we will use the income variable (nettoink), you inspect that variable. Create a
histogram of the variable nettoink.
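A minimal sketch of such a histogram, assuming ggplot2 is used as elsewhere in this unit. The data frame demo_data is a simulated stand-in (hypothetical, not the course data); only the column name nettoink is taken from the assignment.

```r
library(ggplot2)

# Hypothetical stand-in for the course data: 199 simulated monthly net incomes
# plus one extreme value, so the histogram has a visible outlier.
set.seed(531)
demo_data <- data.frame(
  nettoink = c(round(rlnorm(199, meanlog = 7.5, sdlog = 0.5)), 1500000)
)

# Histogram of net income; bins = 30 is an arbitrary choice.
income_hist <- ggplot(demo_data, aes(x = nettoink)) +
  geom_histogram(bins = 30) +
  labs(x = "Net monthly income (nettoink)", y = "Number of respondents")
income_hist
```

With a single extreme value, almost all observations end up in the first bar, which is typically how such an outlier shows up.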
You will notice there is at least one person with a net income of at least 1.5 million a month.
This is either a rich person OR a coding error :-). Normally you should NOT simply remove
outliers: if this is the sample, that is what it is. However, since we do not have time to check
this data point extensively, we focus on people with a somewhat more reasonable income.
6. Filter out extreme outliers (take a net income of 15,000 a month as the cut-off point).
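One possible way to apply such a cut-off, sketched with dplyr. The data frame demo_data and the column name bmi are hypothetical; only nettoink and the cut-off value come from the assignment. Whether the cut-off value itself is kept is a choice; this sketch excludes it.

```r
library(dplyr)

# Hypothetical stand-in for the course data (the real file is not reproduced here).
demo_data <- data.frame(
  nettoink = c(1200, 2500, 3100, 14999, 15000, 1500000),
  bmi      = c(27.1, 24.3, 22.8, 25.0, 23.5, 26.2)
)

# Keep respondents whose net monthly income is below the cut-off.
# Note that filter() also drops rows where nettoink is missing (NA).
filtered_data <- demo_data %>%
  filter(nettoink < 15000)

nrow(filtered_data)  # number of respondents left after filtering
```

Storing the result under a new name keeps the unfiltered data available for comparison.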
7. After filtering the data, make a scatter plot of the relationship between income and
BMI (in the data file, outliers with extreme values have also been filtered out). Think
about which variable you put on the x-axis and which on the y-axis. Include a regression
line through the cloud of data points by adding
geom_smooth(method = "lm", se = FALSE).
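The general shape of such a plot can be sketched as follows. The data are simulated and the column names x_var and y_var are deliberately generic placeholders (hypothetical), so deciding which of your own variables goes on which axis remains part of the exercise.

```r
library(ggplot2)

# Simulated stand-in data with a known linear relationship plus noise.
set.seed(531)
demo_data <- data.frame(x_var = runif(100, min = 0, max = 10))
demo_data$y_var <- 2 + 0.5 * demo_data$x_var + rnorm(100)

# Scatter plot with a fitted straight line; se = FALSE hides the confidence band.
scatter <- ggplot(demo_data, aes(x = x_var, y = y_var)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE)
scatter
```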
**
# or use
summary(model)
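The object model has to exist before it can be summarised. A minimal sketch of fitting a simple linear regression with lm(), using simulated data and placeholder column names (x_var, y_var, and demo_data are hypothetical; replace them with your own filtered data and variables):

```r
# Simulated stand-in data with a known linear relationship plus noise.
set.seed(531)
demo_data <- data.frame(x_var = runif(100, min = 0, max = 10))
demo_data$y_var <- 2 + 0.5 * demo_data$x_var + rnorm(100)

# Dependent variable on the left of ~, independent variable on the right.
model <- lm(y_var ~ x_var, data = demo_data)

coef(model)     # intercept and slope only
summary(model)  # estimates, standard errors, t values, and R-squared
```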
10. What is the intercept? What does this number tell you?
The intercept is: **
11. What is the slope?
The slope is: **
12. What is the sign of the slope? Is that what you expected (theoretically)?
The slope is: **
13. The effect of the income variable (the slope) seems extremely small: the number ends
in -04, which means you have to move the decimal point four places to the left. Why is
it so small?
**
NOTE: Reading the scientific notation (with the “-04” after a number) may be
difficult. The following commands will often simplify things, but make sure you
are able to interpret scientific notation! Check the internet to find out how
scientific notation works.
library(dplyr) # provides %>% and mutate_if()
library(broom) # provides tidy()
model %>%
  tidy() %>%
  mutate_if(is.numeric, round, 5) # round every numeric column to 5 decimals
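To practise reading scientific notation, base R can convert a number in both directions. The value below is a made-up example, not output from the assignment.

```r
# Hypothetical value: 1.5e-04 means 1.5 * 10^-4, i.e. the decimal point
# moves four places to the left.
small_slope <- 1.5e-04

format(small_slope, scientific = FALSE)  # "0.00015"
format(0.00015, scientific = TRUE)       # "1.5e-04"
```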
16. Does the 95 percent confidence interval in this case include zero? What does this
mean?
**
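One way to obtain the confidence intervals in R is confint(), sketched here on simulated data. The names demo_data, x_var, and y_var are hypothetical; with your own model only the confint() call is needed.

```r
# Simulated stand-in data and model (replace with your own fitted model).
set.seed(531)
demo_data <- data.frame(x_var = runif(100, min = 0, max = 10))
demo_data$y_var <- 2 + 0.5 * demo_data$x_var + rnorm(100)
model <- lm(y_var ~ x_var, data = demo_data)

# 95 percent confidence intervals for the intercept and the slope.
ci <- confint(model, level = 0.95)
ci

# TRUE when zero lies between the lower and upper bound for the slope.
ci["x_var", 1] < 0 & ci["x_var", 2] > 0
```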
A second, similar way to approach this is by using a ‘testing’ approach (not the confidence
interval approach).
17. The effect (the slope) itself does not reveal much. That number depends on the scale
(are we measuring in euros, dollars, or thousands of euros?). We need to ‘standardize’
that effect, so we can check whether it is very ‘different’ from zero. How do we do that?
In other words, how are the estimate, the standard error and the t-value (here called
‘the statistic’) related?
**
18. What is the t-value in this case?
**
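The three numbers mentioned in questions 17 and 18 can be read from the coefficient table of the fitted model. A sketch on simulated data (demo_data, x_var, and y_var are hypothetical names), using base R only:

```r
# Simulated stand-in data and model (replace with your own fitted model).
set.seed(531)
demo_data <- data.frame(x_var = runif(100, min = 0, max = 10))
demo_data$y_var <- 2 + 0.5 * demo_data$x_var + rnorm(100)
model <- lm(y_var ~ x_var, data = demo_data)

# Coefficient table: one row per term, with the columns
# "Estimate", "Std. Error", "t value" and "Pr(>|t|)".
coef_table <- coef(summary(model))
coef_table

# The row for the slope only.
coef_table["x_var", c("Estimate", "Std. Error", "t value")]
```

In the output of tidy() from broom, the same quantities appear in the columns estimate, std.error, and statistic.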
<< END OF THE ASSIGNMENT>>