Lab 2
Keri Hu
Today: Linear regression in R
Review of regression basics
• Univariate model: $Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$
• Interpretation in a model with regressors $X_1, \dots, X_K$:
• $\hat{\beta}_1$: a one-unit increase in $X_1$ is associated with a change of $\hat{\beta}_1$ units
in $Y$ on average, holding $X_2, \dots, X_K$ constant.
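The interpretation above can be checked numerically. The sketch below uses simulated data (the variable names `x1`, `x2`, `y` and the true coefficients are illustrative assumptions, not part of the lab) and shows that raising `x1` by one unit while holding `x2` fixed changes the fitted value by exactly the estimated coefficient.

```r
# Simulated data: all values here are made up for illustration.
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 3 * x2 + rnorm(n)

fit <- lm(y ~ x1 + x2)
b   <- coef(fit)

# Fitted values at x1 = 0 and x1 = 1, with x2 held at the same value
p0 <- predict(fit, data.frame(x1 = 0, x2 = 1))
p1 <- predict(fit, data.frame(x1 = 1, x2 = 1))
diff_pred <- unname(p1 - p0)

print(b)
print(diff_pred)  # equals the estimated coefficient on x1
```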
Correlation is not causation
Example: Covid infection
• “Of the 297 people in San Diego County with positive diagnoses,
cases in patients between 20 and 59 formed the bulk of the total,
236 overall or 79% of cases.”
“Dr. Eric McDonald said that statistic probably represented a testing bias,
as members of the military, first-responders and healthcare workers fall
most frequently into that age group and these people are tested at rates
much higher than the general population.”
Variables in the dataset
Plot Price versus Age, AGST, HarvestRain, WinterRain
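One possible way to draw these four scatterplots is sketched below. The lab's data file is not named in these slides, so the data frame `wine` here is a simulated stand-in: only the column names follow the slides, and every value is made up.

```r
# Simulated stand-in for the lab's data (assumption: values are invented;
# only the column names come from the slides).
set.seed(1)
n <- 25
wine <- data.frame(
  Age         = (n:1) + 5,
  AGST        = rnorm(n, 16.5, 0.7),
  WinterRain  = rnorm(n, 600, 130),
  HarvestRain = rnorm(n, 150, 75)
)
wine$Price <- -3.4 + 0.6 * wine$AGST + 0.001 * wine$WinterRain -
  0.004 * wine$HarvestRain + 0.02 * wine$Age + rnorm(n, sd = 0.3)

# One scatterplot of Price against each predictor, in a 2 x 2 grid
par(mfrow = c(2, 2))
for (v in c("Age", "AGST", "HarvestRain", "WinterRain")) {
  plot(wine[[v]], wine$Price, xlab = v, ylab = "Price",
       main = paste("Price vs", v))
}
par(mfrow = c(1, 1))  # restore the default single-panel layout
```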
Estimate a linear model: lm()
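A minimal sketch of a call to `lm()` with the five predictors listed later in the regression table. The data frame `wine` is simulated here (an assumption, since the lab's file is not shown), so the printed numbers will not match the lab's output.

```r
# Simulated stand-in for the lab's data (values are invented).
set.seed(1)
n <- 25
wine <- data.frame(
  Age         = (n:1) + 5,
  AGST        = rnorm(n, 16.5, 0.7),
  WinterRain  = rnorm(n, 600, 130),
  HarvestRain = rnorm(n, 150, 75)
)
wine$FrancePop <- 62000 - 500 * wine$Age + rnorm(n, sd = 300)
wine$Price <- -3.4 + 0.6 * wine$AGST + 0.001 * wine$WinterRain -
  0.004 * wine$HarvestRain + 0.02 * wine$Age + rnorm(n, sd = 0.3)

# Formula syntax: response ~ predictor1 + predictor2 + ...
model <- lm(Price ~ WinterRain + AGST + HarvestRain + Age + FrancePop,
            data = wine)
summary(model)  # prints the regression table
```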
Regression result
Description of the table
• Residual: $e_i = Y_i - \hat{Y}_i$
• Estimate: β̂0 (Intercept), β̂1 (WinterRain), β̂2 (AGST), β̂3
(HarvestRain), β̂4 (Age), β̂5 (FrancePop)
• The other three columns (Std. Error, t value, and Pr(>|t|))
help us determine if a variable should be included in the model,
specifically if its coefficient is significantly different from zero.
• “***”, “**”, “*”, “.”, “ ” (most significant → least significant): significance
codes marking p-values below 0.001, 0.01, 0.05, and 0.1, and 0.1 or above
• Adjusted $R^2$: $R^2$ adjusted for the number of independent variables
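The pieces of the table described above can also be pulled out of the fitted object and recomputed by hand. This is a sketch on simulated data (the data frame `wine` is an invented stand-in), using the standard adjustment $1 - (1 - R^2)(n - 1)/(n - k - 1)$ with $k$ independent variables.

```r
# Simulated stand-in for the lab's data (values are invented).
set.seed(1)
n <- 25
wine <- data.frame(
  Age         = (n:1) + 5,
  AGST        = rnorm(n, 16.5, 0.7),
  WinterRain  = rnorm(n, 600, 130),
  HarvestRain = rnorm(n, 150, 75)
)
wine$FrancePop <- 62000 - 500 * wine$Age + rnorm(n, sd = 300)
wine$Price <- -3.4 + 0.6 * wine$AGST + 0.001 * wine$WinterRain -
  0.004 * wine$HarvestRain + 0.02 * wine$Age + rnorm(n, sd = 0.3)

model <- lm(Price ~ WinterRain + AGST + HarvestRain + Age + FrancePop,
            data = wine)
s <- summary(model)

e          <- wine$Price - fitted(model)  # residuals: actual minus fitted
coef_table <- coef(s)                     # Estimate, Std. Error, t value, Pr(>|t|)
k          <- length(coef(model)) - 1     # number of independent variables
adj_r2     <- 1 - (1 - s$r.squared) * (n - 1) / (n - k - 1)

print(coef_table)
print(c(manual = adj_r2, reported = s$adj.r.squared))
```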
Hypothesis testing
Hypothesis testing in regression
Null hypothesis $H_0: \beta_k = 0$
If our sample statistic (e.g. the t value) is far from the hypothesized value 0,
the data would be unusual under the null, so we reject $H_0: \beta_k = 0$.
2 https://analystprep.com/cfa-level-1-exam/quantitative-methods/one-tailed-vs-two-tailed-hypothesis-testing/
t value, Std. Error, and Pr(>|t|)
Level of significance α
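The three columns and the role of α can be reproduced by hand: the t value is the estimate divided by its standard error, and Pr(>|t|) is the two-sided tail probability of a t distribution with the residual degrees of freedom. The sketch below runs on simulated data (the data frame `wine` is an invented stand-in) and takes α = 0.05 as an assumed, conventional choice.

```r
# Simulated stand-in for the lab's data (values are invented).
set.seed(1)
n <- 25
wine <- data.frame(
  Age         = (n:1) + 5,
  AGST        = rnorm(n, 16.5, 0.7),
  WinterRain  = rnorm(n, 600, 130),
  HarvestRain = rnorm(n, 150, 75)
)
wine$FrancePop <- 62000 - 500 * wine$Age + rnorm(n, sd = 300)
wine$Price <- -3.4 + 0.6 * wine$AGST + 0.001 * wine$WinterRain -
  0.004 * wine$HarvestRain + 0.02 * wine$Age + rnorm(n, sd = 0.3)

model <- lm(Price ~ WinterRain + AGST + HarvestRain + Age + FrancePop,
            data = wine)
tab <- coef(summary(model))

# t value = Estimate / Std. Error
t_manual <- tab[, "Estimate"] / tab[, "Std. Error"]

# Two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_manual), df = df.residual(model))

# Reject H0: beta_k = 0 when the p-value falls below alpha
alpha       <- 0.05  # assumed significance level
significant <- p_manual < alpha

print(cbind(t = t_manual, p = p_manual, reject = significant))
```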
Refine the model
Re-run the model by leaving out FrancePop
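One way to refit without FrancePop is `update()`, which reuses the original call and drops the named term. Retyping the `lm()` formula without FrancePop would be equivalent. The data frame `wine` below is a simulated stand-in, so the numbers are illustrative only.

```r
# Simulated stand-in for the lab's data (values are invented).
set.seed(1)
n <- 25
wine <- data.frame(
  Age         = (n:1) + 5,
  AGST        = rnorm(n, 16.5, 0.7),
  WinterRain  = rnorm(n, 600, 130),
  HarvestRain = rnorm(n, 150, 75)
)
wine$FrancePop <- 62000 - 500 * wine$Age + rnorm(n, sd = 300)
wine$Price <- -3.4 + 0.6 * wine$AGST + 0.001 * wine$WinterRain -
  0.004 * wine$HarvestRain + 0.02 * wine$Age + rnorm(n, sd = 0.3)

model  <- lm(Price ~ WinterRain + AGST + HarvestRain + Age + FrancePop,
             data = wine)
model2 <- update(model, . ~ . - FrancePop)  # same model minus FrancePop

summary(model2)

# Side-by-side comparison of adjusted R^2
print(c(full    = summary(model)$adj.r.squared,
        reduced = summary(model2)$adj.r.squared))
```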
What has changed?
Multicollinearity
[1] -0.9945
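A single number printed with a `[1]` prefix is what `cor()` returns, so the value above is presumably the correlation between two predictors such as Age and FrancePop. That pairing is an assumption, since the call itself is not shown. The sketch below computes such a correlation on simulated data that was built to be strongly collinear, so its output will differ from the slide's.

```r
# Simulated data (values are invented): FrancePop is constructed to fall
# almost linearly with Age, mimicking two highly collinear predictors.
set.seed(1)
n   <- 25
Age <- (n:1) + 5
FrancePop <- 62000 - 500 * Age + rnorm(n, sd = 300)

cor_age_pop <- cor(Age, FrancePop)
print(cor_age_pop)  # close to -1: the two variables carry nearly the same information
```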
Add best fit line to plot
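A sketch of overlaying a fitted line on a scatterplot with `abline()`. Which predictor the lab plotted is not shown, so AGST is an assumed choice, and the data are simulated.

```r
# Simulated data (values are invented).
set.seed(1)
n    <- 25
wine <- data.frame(AGST = rnorm(n, 16.5, 0.7))
wine$Price <- -3.4 + 0.6 * wine$AGST + rnorm(n, sd = 0.4)

# Scatterplot first, then the least-squares line from a one-variable fit
plot(wine$AGST, wine$Price, xlab = "AGST", ylab = "Price")
fit_agst <- lm(Price ~ AGST, data = wine)
abline(fit_agst, col = "red")  # abline() reads intercept and slope from the lm object
```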
Make predictions
Compare to the actual values
Out-of-sample $R^2$
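The prediction and comparison steps can be sketched as follows. The data are simulated, and the 20/5 train/test split is an arbitrary assumption made for illustration. Out-of-sample $R^2$ is computed here as $1 - SSE/SST$, with $SST$ measured against the mean of the training response, which is a common convention.

```r
# Simulated stand-in for the lab's data (values are invented).
set.seed(1)
n <- 25
wine <- data.frame(
  Age         = (n:1) + 5,
  AGST        = rnorm(n, 16.5, 0.7),
  WinterRain  = rnorm(n, 600, 130),
  HarvestRain = rnorm(n, 150, 75)
)
wine$Price <- -3.4 + 0.6 * wine$AGST + 0.001 * wine$WinterRain -
  0.004 * wine$HarvestRain + 0.02 * wine$Age + rnorm(n, sd = 0.3)

train <- wine[1:20, ]   # rows used to fit the model
test  <- wine[21:25, ]  # held-out rows used only for evaluation

model <- lm(Price ~ WinterRain + AGST + HarvestRain + Age, data = train)
pred  <- predict(model, newdata = test)

# Compare predictions to the actual held-out prices
print(cbind(actual = test$Price, predicted = pred))

sse    <- sum((test$Price - pred)^2)               # prediction error
sst    <- sum((test$Price - mean(train$Price))^2)  # baseline: training mean
r2_out <- 1 - sse / sst
print(r2_out)
```

Unlike in-sample $R^2$, this quantity can be negative when the model predicts the held-out data worse than the training mean does.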