FIT2086 Lecture 6
Linear Regression
Daniel F. Schmidt
September 4, 2017
Outline
Linear Regression Models
Model Selection for Linear Regression

So far we have developed tools for statistical inference, such as hypothesis testing:
H0 : null hypothesis
vs
HA : alternative hypothesis
Now we will start to see how these tools can be used to build
more complex models
Over the next three weeks we will look at supervised learning
In particular, we will look at linear regression
But first, what is supervised learning?
In supervised learning, we model how a target yi depends on a set of predictors:
yi = f(xi,1, . . . , xi,p)
Linear Regression
[Figure: blood pressure measurements (mmHg) plotted against Patient ID]
ei = yi − ŷi
are the errors between our model predictions ŷi and the observed data yi
⇒ often called residual errors, or just residuals
A good fit would lead to overall small errors
[Figure: blood pressure (mmHg) against Patient ID, with the fitted model and its residuals]
[Figure: scatter plot of Blood Pressure (mmHg) against Weight (kg)]
Our simple model for blood pressure was the mean model
E[BPi] = µ
A linear regression model instead specifies the conditional mean
E[BPi | Weighti] = β0 + β1 Weighti
This says that the conditional mean of blood pressure BPi for individual i, given the individual’s weight Weighti, is equal to β0 plus β1 times the weight Weighti
Note our simple mean model is a linear model with β1 = 0
[Figure: Blood Pressure (mmHg) against Weight (kg), with the fitted least-squares regression line]
E[Yi | xi] = ŷi = β0 + β1 xi
so:
For every additional kilogram a person weighs, their blood pressure increases by 2.2053 mmHg
For a person who weighs zero kilograms, the predicted blood pressure is 1.2009 mmHg
The predictions might not make sense outside of sensible ranges of the predictors!
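To make this concrete, here is a minimal Python sketch of fitting such a simple linear regression by least squares. The weight and blood pressure values are made up for illustration (not the lecture’s dataset), so the fitted coefficients will differ from those above.

```python
# Minimal sketch: least-squares fit of E[BP | Weight] = beta0 + beta1 * Weight.
# The data below is hypothetical, invented purely for this example.
import numpy as np

weight = np.array([86.0, 89.5, 91.0, 93.2, 95.8, 98.1, 100.4])   # kg
bp = np.array([106.0, 110.5, 112.0, 115.1, 118.9, 121.7, 124.3]) # mmHg

# np.polyfit with deg=1 returns (slope, intercept)
beta1, beta0 = np.polyfit(weight, bp, deg=1)
print(f"beta0 = {beta0:.4f}, beta1 = {beta1:.4f}")

# Prediction at a new weight; only sensible within the observed range
print(f"predicted BP at 95 kg: {beta0 + beta1 * 95:.1f} mmHg")
```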
Given LS estimates β̂0, β̂1 we can find the predictions for our data
ŷi = β̂0 + β̂1 xi
and residuals
ei = yi − ŷi
The vector of residuals e = (e1, . . . , en) has the properties
∑_{i=1}^{n} ei = 0 and corr(x, e) = 0
In matrix form, ŷ = Xβ + β0 1n and e = y − ŷ, and the residual sum of squares is
RSS(β0, β) = eᵀe
That is, least-squares finds the plane such that the residuals (errors) are uncorrelated with all predictors in the model
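These two properties are easy to check numerically. A small sketch using simulated data (the properties hold for any dataset once an intercept is included):

```python
# Check that LS residuals sum to zero and are uncorrelated with the predictor.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=50)
y = 2.0 + 3.0 * x + rng.normal(0, 0.5, size=50)   # simulated data

X = np.column_stack([np.ones_like(x), x])         # design matrix [1n, x]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)      # LS estimates (beta0, beta1)
e = y - X @ beta                                  # residuals e = y - yhat

print(f"sum of residuals: {e.sum():+.2e}")                   # ~ 0
print(f"corr(x, e):       {np.corrcoef(x, e)[0, 1]:+.2e}")   # ~ 0
```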
[Figure: data y against x with a nonlinear least-squares fit, and the residuals of the fit]
We can capture nonlinear relationships by adding polynomial terms x, x², . . . , x^q of a predictor to the model.
The higher q is, the more nonlinear the fit can become, but at greater risk of overfitting
Connecting LS to ML (1)
The LS estimates turn out to be maximum likelihood (ML) estimates under normally distributed errors.
To show this, let our targets Y1, . . . , Yn be RVs
Write the linear regression model as
Yi = β0 + ∑_{j=1}^{p} βj xi,j + εi
Connecting LS to ML (2)
Each Yi is independent
Given target data y the likelihood function can be written
p(y | β0, β, σ²) = ∏_{i=1}^{n} (2πσ²)^{−1/2} exp( −(yi − β0 − ∑_{j=1}^{p} βj xi,j)² / (2σ²) )
Taking the negative logarithm gives
L(y | β0, β, σ²) = (n/2) log(2πσ²) + RSS(β0, β) / (2σ²)
As the value of σ² scales the RSS term, it is easy to see that the values of β0 and β that minimise the negative log-likelihood are the least-squares estimates β̂0 and β̂
⇒ the LS estimates are the same as the maximum likelihood estimates, assuming the random “errors” εi are normally distributed
Our residuals
ei = yi − ŷi
can be viewed as our estimates of the errors εi .
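This equivalence can be checked numerically: minimising the negative log-likelihood above with a general-purpose optimiser recovers the least-squares estimates. A sketch with simulated data (scipy is assumed to be available):

```python
# Check numerically that ML estimation under normal errors equals LS.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, size=40)
y = 1.0 - 2.0 * x + rng.normal(0, 0.3, size=40)
X = np.column_stack([np.ones_like(x), x])

def neg_log_lik(params):
    beta, log_sigma2 = params[:2], params[2]
    sigma2 = np.exp(log_sigma2)        # optimise log-variance for stability
    rss = np.sum((y - X @ beta) ** 2)
    return 0.5 * len(y) * np.log(2 * np.pi * sigma2) + rss / (2 * sigma2)

beta_ml = minimize(neg_log_lik, x0=np.zeros(3)).x[:2]
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print("ML:", np.round(beta_ml, 4))   # agrees with LS up to optimiser tolerance
print("LS:", np.round(beta_ls, 4))
```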
Connecting LS to ML (3)
The ML estimate of the noise variance is
σ̂²ML = RSS(β̂0, β̂) / n
but this tends to underestimate the actual variance.
A better estimate is the unbiased estimate
σ̂²u = RSS(β̂0, β̂) / (n − p − 1)
where p is the number of predictors used to fit the model.
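A short sketch contrasting the two estimates on simulated data (here p = 1 predictor, so the unbiased estimate divides by n − 2):

```python
# The two variance estimates: RSS/n (ML) versus RSS/(n - p - 1) (unbiased).
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 1
x = rng.uniform(0, 1, size=n)
y = 0.5 + 1.5 * x + rng.normal(0, 1.0, size=n)    # true variance is 1.0

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = np.sum((y - X @ beta) ** 2)

print(f"sigma2_ML = {rss / n:.3f}")               # biased downwards
print(f"sigma2_u  = {rss / (n - p - 1):.3f}")     # unbiased
```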
Outline
Underfitting/Overfitting (1)
Imagine we must choose between fitting the model
y = β0 + β1 x + β2 x² + ε
or
y = β0 + β1 x + β2 x² + β3 x³ + β4 x⁴ + β5 x⁵ + ε
or another model with some other number of polynomial terms (a small sketch follows).
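The sketch below fits polynomials of degree 1, 2 and 5 to data simulated from a quadratic; note the RSS always shrinks as the degree grows, even past the true model:

```python
# Fit polynomial models of increasing degree to the same simulated data.
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, size=25)
y = 1.0 + 2.0 * x + 3.0 * x**2 + rng.normal(0, 0.5, size=25)  # true degree is 2

for q in (1, 2, 5):
    coefs = np.polyfit(x, y, deg=q)
    rss = np.sum((y - np.polyval(coefs, x)) ** 2)
    print(f"degree {q}: RSS = {rss:.3f}")   # RSS decreases with degree
```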
[Figure: polynomial models of increasing degree fitted to the same data (y against x), illustrating underfitting and overfitting]
H0 : β = 0
vs
HA : β ≠ 0
The minimised negative log-likelihood is
L(y | β̂0, β̂, σ̂²ML) = (n/2) log(2π RSS(β̂0, β̂)/n) + n/2
This always decreases as we add more predictors to our model
⇒ cannot be used to select models, only parameters (demonstrated in the sketch below)
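A quick numerical demonstration of this, again with simulated polynomial models: the minimised negative log-likelihood falls monotonically as irrelevant predictors are added.

```python
# The minimised negative log-likelihood always decreases with more predictors.
import numpy as np

rng = np.random.default_rng(4)
n = 50
x = rng.uniform(-1, 1, size=n)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, size=n)    # truth uses only x

for q in range(1, 7):
    X = np.column_stack([x**j for j in range(q + 1)])  # [1, x, ..., x^q]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    nll = 0.5 * n * np.log(2 * np.pi * rss / n) + 0.5 * n
    print(f"q = {q}: minimised NLL = {nll:.3f}")   # monotonically decreasing
```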
Instead we can choose the model M that minimises a penalized score of the form
L(y | β̂0, β̂, σ̂²ML) + α(n, kM)
where
α(·) is a model complexity penalty;
kM is the number of predictors in model M;
n is the size of our data sample.
This is a form of penalized likelihood estimation
⇒ a model is penalized by its complexity (ability to fit data)
AIC (Akaike’s Information Criterion) uses the penalty
α(n, kM) = kM
BIC (Bayesian Information Criterion) uses the penalty
α(n, kM) = (kM / 2) log n
AIC penalty smaller than BIC; increased chance of overfitting
BIC penalty bigger than AIC; increased chance of underfitting
Differences in scores of 3 or more are considered significant (see the sketch below)
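To tie the pieces together, a sketch scoring the polynomial models from before with these two penalties, using the slide’s convention that kM counts predictors (excluding the intercept); both criteria should bottom out near the true degree.

```python
# AIC/BIC model selection over polynomial degrees: score = NLL + alpha(n, k).
import numpy as np

rng = np.random.default_rng(5)
n = 60
x = rng.uniform(-1, 1, size=n)
y = 1.0 + 2.0 * x + 3.0 * x**2 + rng.normal(0, 0.5, size=n)  # true degree is 2

for q in range(1, 7):
    X = np.column_stack([x**j for j in range(q + 1)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    nll = 0.5 * n * np.log(2 * np.pi * rss / n) + 0.5 * n
    k = q                                  # predictors, excluding intercept
    print(f"q = {q}: AIC = {nll + k:.2f}, BIC = {nll + 0.5 * k * np.log(n):.2f}")
```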
Reading/Terms to Revise