SOCS0075 Lecture 5
Tobias Rüttenauer
Social Research Institute
UCL Institute of Education
January 8, 2024
Happy new year!
Any questions? Anything to discuss?
Topics today
Today:
Set-up
Reality checks
Missing values
Results
Reproducibility
Project set-up
Reality checks
Dubious Values and Incomplete Data
Consistency in Conceptualization and Measurement
Correct classification of your variables
Consistency in Conceptualization and Measurement
Consistency with common sense
External reality checks - a voting example
External reality checks
First estimate: 10,000 lost votes for Bush within the last ten minutes!
Is this plausible?
There are 300,000 panhandle voters overall.
About 1/12 of votes are usually cast in the last hour. Then, approximately
1/6 × 1/12 = 1/72 of voters usually vote within the last 10 minutes.
1/72 × 300,000 ≈ 4,200 votes overall within the relevant period, far fewer than the estimated 10,000.
Brady, H. E. (2010). Data-set versus causal-process observations: The 2000 U.S. Presidential
election. In H. E. Brady & D. Collier (Eds.), Rethinking social inquiry: Diverse tools, shared
standards (2nd ed., pp. 267–271). Rowman & Littlefield.
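The back-of-the-envelope check above can be written out as a short calculation. This is only a sketch in Python: the figures are the ones given on the slide, while the variable names are illustrative and not taken from Brady (2010).

```python
# Plausibility check for the "10,000 lost votes" estimate.
# All inputs are the figures from the slide; variable names are illustrative.
from fractions import Fraction

panhandle_voters = 300_000          # total voters in the panhandle
share_last_hour = Fraction(1, 12)   # share of votes usually cast in the last hour
share_of_hour = Fraction(1, 6)      # ten minutes as a share of one hour

# Share of all voters who usually vote within the last ten minutes.
share_last_10_min = share_of_hour * share_last_hour

# Expected number of votes cast in that window.
expected_votes = float(share_last_10_min * panhandle_voters)

# The claimed loss cannot exceed the number of people voting in the window.
claimed_lost_votes = 10_000
is_plausible = claimed_lost_votes <= expected_votes

print(f"Share voting in the last 10 minutes: {share_last_10_min}")
print(f"Expected votes in that window: {expected_votes:,.0f}")
print(f"Claimed lost votes plausible? {is_plausible}")
```

The comparison makes the point of the slide explicit: an estimate of lost votes that exceeds the total number of people who usually vote in the window cannot be right.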
Missing values
How to handle missing values
Suggestion on missing values
My suggestion:
Use Listwise Deletion
Most common method
Unbiased under standard assumptions
Keep a cautious eye on your number of observations (N)
If you compare across multiple models, your N should be constant
Discuss potential limitations if you strongly believe that values
are missing systematically
If you’re losing a large proportion of your observations, check
where this is coming from!
Could this be a coding mistake?
Do you really need the variables that have a large number of
missing values? There is a trade-off between losing observations and
losing a control variable
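A minimal sketch of these suggestions in plain Python, assuming a toy data set in which None marks a missing value. The variables (income, age, education), the two models, and the helper functions are hypothetical and serve only to illustrate the workflow: count missing values per variable, apply listwise deletion once for all variables, and confirm that N is the same across models.

```python
# Hypothetical toy data: None marks a missing value.
rows = [
    {"income": 32000, "age": 41, "education": 3},
    {"income": None, "age": 29, "education": 2},
    {"income": 45000, "age": None, "education": 4},
    {"income": 28000, "age": 55, "education": None},
    {"income": 51000, "age": 38, "education": 4},
]


def missing_counts(rows, variables):
    """Count missing values per variable to see where observations are lost."""
    return {v: sum(row[v] is None for row in rows) for v in variables}


def listwise_delete(rows, variables):
    """Keep only rows that are complete on every listed variable."""
    return [row for row in rows if all(row[v] is not None for v in variables)]


# Variables used by two nested (hypothetical) models.
model_1_vars = ["income", "age"]
model_2_vars = ["income", "age", "education"]

# Define the estimation sample once, using the variables of all models,
# so that every model is fitted on the same observations.
all_vars = sorted(set(model_1_vars) | set(model_2_vars))
sample = listwise_delete(rows, all_vars)

n_total = len(rows)
n_model_1_alone = len(listwise_delete(rows, model_1_vars))
n_common = len(sample)
share_lost = 1 - n_common / n_total

print("Missing per variable:", missing_counts(rows, all_vars))
print("N if model 1 is estimated on its own:", n_model_1_alone)
print("N of the common sample used for both models:", n_common)
print(f"Share of observations lost: {share_lost:.0%}")
```

If model 1 were estimated on its own complete cases, its N would differ from that of model 2, and differences between the models could reflect the changing sample rather than the added variable. The per-variable counts show which variable drives the loss of observations.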
Always check your N!
The proposition that low-dose alcohol use protects against all-cause
mortality in general populations continues to be controversial.
Observational studies tend to show that people classified as "moderate
drinkers" have longer life expectancy and are less likely to die from heart
disease than those classified as abstainers.
Why is this simply wrong?
See Zhao et al. (2023).
Example code