Linear Regression

24 704: Prob Est Eng Sys, Lec. 21

Regression in practice

In MS Excel: just "add trendline" to a scatter graph, reporting the equation and the R-squared value.

[Figure: scatter of US GDP (in T$) vs. year, 2000-2025, with the fitted trendline $y = 0.5781x - 1146.5$, $R^2 = 0.9751$; i.e. $y \approx \theta_1 x + \theta_0$.]

https://www.datacamp.com/community/tutorials/linear-regression-R

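A minimal sketch (not part of the lecture) of the same "add trendline" fit outside Excel, in Python with numpy; the GDP figures below are illustrative stand-ins, not an official series.

```python
# Least-squares trendline plus R-squared, as Excel's "add trendline" reports.
import numpy as np

year = np.array([2000, 2005, 2010, 2015, 2020])        # illustrative inputs
gdp = np.array([10.25, 13.04, 15.05, 18.21, 21.06])    # illustrative US GDP in T$

theta1, theta0 = np.polyfit(year, gdp, deg=1)          # slope and intercept
gdp_hat = theta1 * year + theta0                       # fitted values

# R^2 = 1 - RSS / TSS
rss = np.sum((gdp - gdp_hat) ** 2)
tss = np.sum((gdp - gdp.mean()) ** 2)
r2 = 1.0 - rss / tss

print(f"y = {theta1:.4f} x + {theta0:.1f},  R^2 = {r2:.4f}")
```
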
Regression

$X$: "input", $Y$: "output".
How to predict the output as a function of the input?

Task: from the dataset $\{(x_i, y_i)\}_{i=1}^{n}$, estimate $\hat{f}(x) \approx f(x)$ and assess uncertainty in the estimation.

[Figure: dataset (scatter of $y$ vs. $x$) $\to$ analysis $\to$ regression function.]

Intuitive methods for regression

For a given $\tilde{x}$, how to estimate $\hat{f}(\tilde{x}) \approx f(\tilde{x}) = \mathbb{E}[Y \mid X = \tilde{x}]$?

We could estimate $\hat{f}$ as the arithmetic average of the $y_i$ in the subset of pairs $(x_i, y_i)$ where $x_i = \tilde{x}$. But it is unlikely we can find any sample exactly at $\tilde{x}$. Hence, we can define an interval $I = [\tilde{x} - \Delta,\ \tilde{x} + \Delta]$ of width $2\Delta$ around $\tilde{x}$, and average the $y_i$ of the samples whose corresponding $x_i$ is inside the interval $I$: this is a local-averaging (nonparametric) estimate; see the sketch after this slide.

[Figure: scatter of $y$ vs. $x$, with the interval $I$ of width $2\Delta$ around $\tilde{x}$ and the local average $\hat{f}$.]

In parametric regression, instead, we use all the data, and assume a parametric form for $f$, e.g. $f(x, \boldsymbol{\theta}) = \theta_0 + \theta_1 x$, with $\boldsymbol{\theta} = (\theta_0, \theta_1)^{\mathrm{T}}$.

To estimate $\hat{f} \approx f$ is to estimate the parameters, $\hat{\boldsymbol{\theta}} \approx \boldsymbol{\theta}$, so that $\forall x$: $\hat{f}(x) \approx f(x, \hat{\boldsymbol{\theta}}) = \hat{\theta}_0 + \hat{\theta}_1 x$.

In linear regression, the relation of $\boldsymbol{\theta}$ and $f$ is linear: $f = \mathbf{v}^{\mathrm{T}} \boldsymbol{\theta}$, for some $\mathbf{v}$ depending on $x$.

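A sketch of the local-averaging idea above, assuming synthetic data and a hand-picked window half-width $\Delta$:

```python
# Local-averaging estimate of f(x~) = E[Y | X = x~]: average the y_i whose
# x_i falls inside the interval I = [x~ - Delta, x~ + Delta].
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 3.0, size=200)
y = 0.5 + 0.4 * x + rng.normal(0.0, 0.3, size=200)   # synthetic data

def local_average(x_tilde: float, x: np.ndarray, y: np.ndarray, delta: float) -> float:
    """Average of the y_i with |x_i - x_tilde| <= delta."""
    in_interval = np.abs(x - x_tilde) <= delta
    if not np.any(in_interval):
        raise ValueError("no samples inside the interval I; widen delta")
    return float(np.mean(y[in_interval]))

print(local_average(1.0, x, y, delta=0.2))   # estimate of f(1.0)
```
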
Density estimation vs regression

Univariate density estimation: dataset $\{x_i\}_{i=1}^{n} \sim \text{IID } p_X$ $\Rightarrow$ estimate $\hat{p}_X \approx p_X$;
parametric estimation: $\{x_i\}_{i=1}^{n} \mid \boldsymbol{\theta} \sim \text{IID } p_{X \mid \boldsymbol{\theta}}$ $\Rightarrow$ estimate $\hat{\boldsymbol{\theta}} \approx \boldsymbol{\theta}$.

Bivariate density estimation: dataset $\{(x_i, y_i)\}_{i=1}^{n} \sim \text{IID } p_{X,Y}$ $\Rightarrow$ estimate $\hat{p}_{X,Y} \approx p_{X,Y}$;
parametric estimation: $\{(x_i, y_i)\}_{i=1}^{n} \mid \boldsymbol{\theta} \sim \text{IID } p_{X,Y \mid \boldsymbol{\theta}}$ $\Rightarrow$ estimate $\hat{\boldsymbol{\theta}} \approx \boldsymbol{\theta}$.

Regression: dataset $\{(x_i, y_i)\}_{i=1}^{n}$, with $Y \mid X \sim p_{Y \mid X}$ (conditional distribution).
Parametric regression: $Y \mid X, \boldsymbol{\theta} \sim p_{Y \mid X, \boldsymbol{\theta}}$, with $Y_i \mid x_i, \boldsymbol{\theta} \sim \text{IID } p_{Y \mid X = x_i, \boldsymbol{\theta}}$;
$f(x, \boldsymbol{\theta}) = \mathbb{E}[Y \mid X = x, \boldsymbol{\theta}]$ is the conditional mean (regression function).

Chain rule: $p_{X,Y} = p_X\, p_{Y \mid X}$. (Also) the marginal distribution $p_X$ can be estimated from $\{x_i\}_{i=1}^{n}$, but regression is about estimating only $p_{Y \mid X}$ from $\{(x_i, y_i)\}_{i=1}^{n}$.

Basis of linear regression

From MVN to linear regression

If the estimated joint density is Multi-Variate Normal, then the regression function is linear.

[Figure: dataset $\{(x_i, y_i)\}_{i=1}^{n}$ $\to$ estimated density $\hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\Sigma}}$ (contours in the $(x, y)$ plane) $\to$ regression function $\hat{f}(x) \approx f(x)$.]

Bivariate Normal RVs, recap

$\mathbf{Z} = \begin{pmatrix} X \\ Y \end{pmatrix} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma}) \;\Leftrightarrow\; p_{\mathbf{Z}}(\mathbf{z}) = \frac{1}{2\pi \sqrt{|\boldsymbol{\Sigma}|}} \exp\!\left( -\frac{1}{2} (\mathbf{z} - \boldsymbol{\mu})^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} (\mathbf{z} - \boldsymbol{\mu}) \right)$

[Figure: contour plot of a bivariate normal density with $\mu_X = 0.55$, $\mu_Y = 0.55$, $\sigma_X = 0.13$, $\sigma_Y = 0.10$, $\rho = 0.87$.]

Dataset $\{(x_i, y_i)\}_{i=1}^{n} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma} \sim \text{IID } \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$; vector notation: $\mathbf{z}_i = \begin{pmatrix} x_i \\ y_i \end{pmatrix}$.

Log-likelihood function:

$l_n(\boldsymbol{\mu}, \boldsymbol{\Sigma}) = -\frac{n}{2} (\bar{\mathbf{z}}_n - \boldsymbol{\mu})^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} (\bar{\mathbf{z}}_n - \boldsymbol{\mu}) - \frac{n-1}{2} \operatorname{tr}\!\left(\boldsymbol{\Sigma}^{-1} \mathbf{S}_n\right) - \frac{n}{2} \log |\boldsymbol{\Sigma}|$

with sample mean $\bar{\mathbf{z}}_n$ and sample covariance matrix

$\mathbf{S}_n = \frac{1}{n-1} \sum_{i=1}^{n} (\mathbf{z}_i - \bar{\mathbf{z}}_n)(\mathbf{z}_i - \bar{\mathbf{z}}_n)^{\mathrm{T}} = \begin{pmatrix} \hat{V}_{X,n} & \hat{C}_{X,Y,n} \\ \hat{C}_{X,Y,n} & \hat{V}_{Y,n} \end{pmatrix}$

(sample variances on the diagonal, sample covariance off the diagonal).

MLE: $\hat{\boldsymbol{\mu}} = \bar{\mathbf{z}}_n$; $\hat{\boldsymbol{\Sigma}} = \frac{n-1}{n}\, \mathbf{S}_n$.

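A minimal sketch of these MLE formulas on synthetic data; note that `np.cov` uses the $1/(n-1)$ convention, so the MLE rescales it by $(n-1)/n$:

```python
# MLE for a bivariate normal: sample mean, and (n-1)/n times the sample covariance.
import numpy as np

rng = np.random.default_rng(1)
mu_true = np.array([0.55, 0.55])
Sigma_true = np.array([[0.13**2, 0.87 * 0.13 * 0.10],
                       [0.87 * 0.13 * 0.10, 0.10**2]])
z = rng.multivariate_normal(mu_true, Sigma_true, size=500)   # rows are z_i = (x_i, y_i)

n = z.shape[0]
mu_hat = z.mean(axis=0)                 # MLE of the mean: the sample mean
S_n = np.cov(z, rowvar=False)           # sample covariance, 1/(n-1) convention
Sigma_hat = (n - 1) / n * S_n           # MLE of the covariance: (n-1)/n * S_n

print(mu_hat)
print(Sigma_hat)
```
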
MVN: Conditional distributions

$Y \mid (X = x) \sim \mathcal{N}\!\left(\mu_{Y|x},\, \sigma^2_{Y|X}\right)$

The conditional distribution is normal, with
- conditional mean $f(x) = \mu_{Y|x}$, linear in $x$;
- conditional variance $\sigma^2_{Y|X}$, invariant with respect to $x$.

$\mathbb{E}[Y \mid x] = \mu_{Y|x} = \mu_Y + \rho\, \frac{\sigma_Y}{\sigma_X}\, (x - \mu_X)$

$\mathbb{V}[Y \mid x] = \sigma^2_{Y|X} = (1 - \rho^2)\, \sigma_Y^2$

The straight line $\mu_{Y|x} = \operatorname{argmax}_y\, p_{X,Y}(x, y)$ passes by all the maxima of the joint density.

[Figure: contour plot of the bivariate normal density with the conditional-mean line through the maxima.]

Sample variances: $\hat{V}_{X,n} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{X}_n)^2 \approx \mathbb{V}[X] = \sigma_X^2$; $\quad \hat{V}_{Y,n} = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{Y}_n)^2 \approx \mathbb{V}[Y] = \sigma_Y^2$.

Sample covariance: $\hat{C}_{X,Y,n} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{X}_n)(y_i - \bar{Y}_n) \approx \mathbb{C}[X, Y] = \rho\, \sigma_X \sigma_Y$.

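A small numeric sketch of the conditional mean and variance, reusing the illustrative parameters from the recap figure:

```python
# MVN conditional distribution: mean linear in x, variance constant in x.
import numpy as np

mu_x, mu_y = 0.55, 0.55
sigma_x, sigma_y = 0.13, 0.10
rho = 0.87

def conditional_mean(x: np.ndarray) -> np.ndarray:
    """E[Y | X = x] = mu_Y + rho * (sigma_Y / sigma_X) * (x - mu_X): a straight line."""
    return mu_y + rho * (sigma_y / sigma_x) * (x - mu_x)

conditional_var = (1.0 - rho**2) * sigma_y**2   # does not depend on x

x = np.linspace(0.2, 0.9, 5)
print(conditional_mean(x))
print(conditional_var)
```
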
Understanding the estimated coefficients

The regression line passes by the average point $\boldsymbol{\mu} = (\mu_X, \mu_Y)$.

Exact regression function: $f(x) = \mu_Y + \rho\, \frac{\sigma_Y}{\sigma_X}\, (x - \mu_X)$

$\Rightarrow\; \frac{f - \mu_Y}{\sigma_Y} = \rho\, \frac{x - \mu_X}{\sigma_X}$

$\Rightarrow\; g = \rho\, u_X$, with $u_X = \frac{x - \mu_X}{\sigma_X}$, $g = \frac{f - \mu_Y}{\sigma_Y}$:

in standardized coordinates $(u_X, g)$, the regression line has slope $\rho$.

[Figure: the regression line through $\boldsymbol{\mu}$ in the $(x, f)$ plane, and the same line with slope $\rho$ in the standardized $(u_X, g)$ plane.]

Linear regression and MLE

The joint probability of $X$ and $Y$ depends on parameters:
- $\boldsymbol{\theta}$: parameters of the regression function;
- $\sigma_\varepsilon^2$: noise variance;
- $\boldsymbol{\eta}_X$: parameters of the marginal distribution $p_X$ (but we do not care about it, in regression).

Chain rule: $p(X, Y \mid \boldsymbol{\theta}, \sigma_\varepsilon^2, \boldsymbol{\eta}_X) = \underbrace{p(Y \mid X, \boldsymbol{\theta}, \sigma_\varepsilon^2)}_{\text{conditional prob.}}\; \underbrace{p(X \mid \boldsymbol{\eta}_X)}_{\text{marginal prob.}}$

[Figure: graphical model: the inputs $x_i, x_j$ and the shared parameters $\boldsymbol{\theta}, \sigma_\varepsilon^2$ generate the outputs $Y_i, Y_j$.]

LH for observation $i$, $(x_i, y_i)$: $\mathcal{L}_i(\boldsymbol{\theta}, \sigma_\varepsilon^2) = p_{Y \mid X, \boldsymbol{\theta}, \sigma_\varepsilon^2}(y_i \mid x_i, \boldsymbol{\theta}, \sigma_\varepsilon^2)$

Conditional independence: $i \neq j \;\Rightarrow\; Y_i \perp Y_j \mid x_i, x_j, \boldsymbol{\theta}, \sigma_\varepsilon^2$

Global LH for the dataset $\{(x_i, y_i)\}_{i=1}^{n}$: $\mathcal{L}_n(\boldsymbol{\theta}, \sigma_\varepsilon^2) = \prod_{i=1}^{n} \mathcal{L}_i(\boldsymbol{\theta}, \sigma_\varepsilon^2)$

Linear regression and MLE cont.

Log LH for observation $i$, $(x_i, y_i)$: $l_i(\boldsymbol{\theta}, \sigma_\varepsilon^2) = \log \mathcal{L}_i(\boldsymbol{\theta}, \sigma_\varepsilon^2)$

Log global LH for the dataset $\{(x_i, y_i)\}_{i=1}^{n}$: $l_n(\boldsymbol{\theta}, \sigma_\varepsilon^2) = \sum_{i=1}^{n} l_i(\boldsymbol{\theta}, \sigma_\varepsilon^2)$

MLE for linear regression under Normal errors, I-II

LH for normal error: $Y \mid X, \boldsymbol{\theta}, \sigma_\varepsilon^2 \sim \mathcal{N}\!\left(f(X, \boldsymbol{\theta}),\, \sigma_\varepsilon^2\right)$:

$p_{Y \mid X, \boldsymbol{\theta}, \sigma_\varepsilon^2}(y_i \mid x_i, \boldsymbol{\theta}, \sigma_\varepsilon^2) = \mathcal{N}\!\left(y_i;\, f(x_i, \boldsymbol{\theta}),\, \sigma_\varepsilon^2\right) = \frac{1}{\sigma_\varepsilon}\, \varphi\!\left( \frac{y_i - f(x_i, \boldsymbol{\theta})}{\sigma_\varepsilon} \right)$

Log-LH, individual observation:

$l_i(\boldsymbol{\theta}, \sigma_\varepsilon^2) = \log p_{Y \mid X, \boldsymbol{\theta}, \sigma_\varepsilon^2}(y_i \mid x_i, \boldsymbol{\theta}, \sigma_\varepsilon^2) = -\log \sigma_\varepsilon - \frac{\left(y_i - f(x_i, \boldsymbol{\theta})\right)^2}{2 \sigma_\varepsilon^2} + \text{const.}$

Global log-LH:

$l_n(\boldsymbol{\theta}, \sigma_\varepsilon^2) = \sum_{i=1}^{n} l_i(\boldsymbol{\theta}, \sigma_\varepsilon^2) = -n \log \sigma_\varepsilon - \frac{1}{2 \sigma_\varepsilon^2} \sum_{i=1}^{n} \left(y_i - f(x_i, \boldsymbol{\theta})\right)^2 + \text{const.}$

Residual Sum of Squares: $\mathrm{rss}_n(\boldsymbol{\theta}) \triangleq \sum_{i=1}^{n} \left(y_i - f(x_i, \boldsymbol{\theta})\right)^2$, so that

$l_n(\boldsymbol{\theta}, \sigma_\varepsilon^2) = -n \log \sigma_\varepsilon - \frac{1}{2 \sigma_\varepsilon^2}\, \mathrm{rss}_n(\boldsymbol{\theta})$

MLE: first find $\hat{\boldsymbol{\theta}} = \operatorname{argmax}_{\boldsymbol{\theta}} l_n = \operatorname{argmin}_{\boldsymbol{\theta}} \mathrm{rss}_n$ ($\sigma_\varepsilon^2$ is irrelevant in this step);
then find $\hat{\sigma}_\varepsilon^2 = \operatorname{argmax}_{\sigma_\varepsilon^2} l_n(\hat{\boldsymbol{\theta}}, \sigma_\varepsilon^2)$.

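A short sketch (synthetic data) illustrating that, for fixed $\sigma_\varepsilon$, a smaller RSS means a larger log-likelihood, which is why maximizing in $\boldsymbol{\theta}$ reduces to least squares:

```python
# Under normal errors, log-LH in theta = -n log(sigma) - RSS / (2 sigma^2) + const.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 3.0, size=50)
y = 0.5 + 0.4 * x + rng.normal(0.0, 0.3, size=50)

def rss(theta0: float, theta1: float) -> float:
    """Residual sum of squares for the straight-line model f(x) = theta0 + theta1 x."""
    return float(np.sum((y - (theta0 + theta1 * x)) ** 2))

def log_lh(theta0: float, theta1: float, sigma_eps: float) -> float:
    """Global log-likelihood, up to an additive constant."""
    return -len(x) * np.log(sigma_eps) - rss(theta0, theta1) / (2.0 * sigma_eps**2)

# For any fixed sigma_eps, the smaller RSS gives the larger log-likelihood:
print(rss(0.5, 0.4), log_lh(0.5, 0.4, 0.3))
print(rss(0.0, 0.0), log_lh(0.0, 0.0, 0.3))
```
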
MLE for linear regression under Normal errors, III

RSS is a quadratic form of $\boldsymbol{\theta}$: $\mathrm{rss}_n(\boldsymbol{\theta}) = \sum_{i=1}^{n} \left(y_i - f(x_i, \boldsymbol{\theta})\right)^2 = \sum_{i=1}^{n} \left(y_i - \theta_0 - \theta_1 x_i\right)^2$

To minimize the rss, we set the gradient to zero, $\nabla \mathrm{rss}(\boldsymbol{\theta}) = \mathbf{0}$, obtaining a linear system of 2 equations in 2 variables:

$\frac{\partial\, \mathrm{rss}_n}{\partial \theta_j} = -\sum_{i=1}^{n} 2\left(y_i - f(x_i, \boldsymbol{\theta})\right) \frac{\partial f(x_i, \boldsymbol{\theta})}{\partial \theta_j}$, where $\frac{\partial f(x, \boldsymbol{\theta})}{\partial \theta_j} = x^j$, i.e. $\frac{\partial f}{\partial \theta_0} = 1$, $\frac{\partial f}{\partial \theta_1} = x$,

$\qquad\;\; = -\sum_{i=1}^{n} 2\left(y_i - (\theta_0 + \theta_1 x_i)\right) x_i^j$

First equation ($j = 0$):

$\frac{\partial\, \mathrm{rss}_n}{\partial \theta_0} = -\sum_{i=1}^{n} 2\left(y_i - (\theta_0 + \theta_1 x_i)\right) = 0 \;\Leftrightarrow\; \sum_{i=1}^{n} y_i - n \hat{\theta}_0 - \hat{\theta}_1 \sum_{i=1}^{n} x_i = 0$

MLE for linear regression under Normal errors, IV

Second equation ($j = 1$), after substituting $\hat{\theta}_0 = \bar{Y}_n - \hat{\theta}_1 \bar{X}_n$ from the first:

$\sum_{i=1}^{n} x_i y_i - \hat{\theta}_0 \sum_{i=1}^{n} x_i - \hat{\theta}_1 \sum_{i=1}^{n} x_i^2 = 0 \;\Leftrightarrow\; \overline{XY}_n - \bar{X}_n \bar{Y}_n + \hat{\theta}_1 \bar{X}_n^2 - \hat{\theta}_1 \overline{X^2}_n = 0$

$\Rightarrow\; \hat{\theta}_1 = \frac{\overline{XY}_n - \bar{X}_n \bar{Y}_n}{\overline{X^2}_n - \bar{X}_n^2} = \frac{\hat{C}_{X,Y,n}}{\hat{V}_{X,n}}, \qquad \hat{\theta}_0 = \bar{Y}_n - \hat{\theta}_1 \bar{X}_n$

MLE for $\sigma_\varepsilon^2$: $l_n(\hat{\boldsymbol{\theta}}, \sigma_\varepsilon^2) = -n \log \sigma_\varepsilon - \frac{1}{2\sigma_\varepsilon^2}\, \widehat{\mathrm{rss}}_n$, with $\widehat{\mathrm{rss}}_n \triangleq \mathrm{rss}_n(\hat{\boldsymbol{\theta}}) = \sum_{i=1}^{n} \left(y_i - \hat{f}_i\right)^2$

$\frac{\partial l_n}{\partial \sigma_\varepsilon^2} = -\frac{n}{2 \sigma_\varepsilon^2} + \frac{1}{2 \sigma_\varepsilon^4}\, \widehat{\mathrm{rss}}_n = 0 \;\Leftrightarrow\; \hat{\sigma}_\varepsilon^2 = \frac{\widehat{\mathrm{rss}}_n}{n}$

Noise level estimate: biased (with $1/(n-2)$ instead of $1/n$: unbiased).

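A minimal sketch of the closed-form estimates just derived, on synthetic data; the moment formulas follow the $1/n$ convention used above:

```python
# Closed-form MLE for the straight-line fit via sample moments.
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-1.0, 3.0, size=200)
y = 0.5 + 0.4 * x + rng.normal(0.0, 0.3, size=200)
n = len(x)

# theta1_hat = C_hat / V_hat; theta0_hat = Ybar - theta1_hat * Xbar
c_hat = np.mean(x * y) - x.mean() * y.mean()     # C_hat_{X,Y,n} (1/n convention)
v_hat = np.mean(x**2) - x.mean() ** 2            # V_hat_{X,n}   (1/n convention)
theta1_hat = c_hat / v_hat
theta0_hat = y.mean() - theta1_hat * x.mean()

# Noise level: rss_hat / n is the biased MLE; 1/(n-2) gives the unbiased version.
rss_hat = np.sum((y - (theta0_hat + theta1_hat * x)) ** 2)
sigma2_mle = rss_hat / n
sigma2_unbiased = rss_hat / (n - 2)

print(theta0_hat, theta1_hat, sigma2_mle, sigma2_unbiased)
```
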
Example: graph of regression

[Figure, two cases side by side: few samples ($n = 20$) and many samples ($n = 200$).
Top panels: data $(x_i, y_i)$ with the true $f$ and the estimate $\hat{f}$; with many samples, $\hat{f} \approx f$.
Middle panels: absolute residuals $|r_i|$ vs. $x_i$ with the noise level $\hat{\sigma}_\varepsilon$; with many samples, $\hat{\sigma}_\varepsilon \approx \sigma_\varepsilon$.
Bottom panels: contours of $\mathrm{rss}_n$ in the $(\theta_0, \theta_1)$ plane; with many samples, $\hat{\boldsymbol{\theta}} \approx \boldsymbol{\theta}$.]

Uncertainty of the estimator

Uncertainty of the estimator II
𝐱𝐱 𝜎𝜎�𝜀𝜀 𝑋𝑋𝑛𝑛2
𝛉𝛉 � Stand. err.: se
� 0 ≜ se �0 =
� 𝑛𝑛 Θ � 0 �𝜃𝜃0 , 𝐱𝐱, 𝜎𝜎𝜀𝜀2 ~𝒩𝒩 𝜃𝜃0 , se20
Θ
𝛉𝛉 𝑛𝑛 𝜎𝜎�𝑋𝑋,𝑛𝑛
𝐘𝐘
𝜎𝜎𝜀𝜀2 𝜎𝜎�𝜀𝜀2 𝜎𝜎�𝜀𝜀
se
� 1 ≜ se �1 =
� 𝑛𝑛 Θ �1 �𝜃𝜃1 , 𝐱𝐱, 𝜎𝜎𝜀𝜀2 ~𝒩𝒩 𝜃𝜃1 , se12
Θ
𝑛𝑛 𝜎𝜎�𝑋𝑋,𝑛𝑛
𝐱𝐱 �𝑗𝑗 − 𝜃𝜃𝑗𝑗
Θ
𝛉𝛉, 𝜎𝜎𝜀𝜀2 Estimated 𝜎𝜎𝜀𝜀2 : ~𝑡𝑡𝑛𝑛−2 ≅ 𝜑𝜑 Use Student’s t for small 𝑛𝑛.
� 𝜎𝜎�𝜀𝜀
𝛉𝛉, 2 se
� 𝑗𝑗
large 𝑛𝑛
𝜎𝜎�𝜀𝜀 𝜎𝜎�𝜀𝜀 se
�0
Analysis of stand. err. Special case: if �
𝑋𝑋𝑛𝑛 = 0 ⇒ se �0 = ; se
�1 = = ;
𝑛𝑛 𝑛𝑛 𝜎𝜎�𝑋𝑋,𝑛𝑛 𝜎𝜎�𝑋𝑋,𝑛𝑛
Errors decay with 1/ 𝑛𝑛, error in the slope 𝜃𝜃̂1 decays also with 1/𝜎𝜎�𝑋𝑋,𝑛𝑛 .
12
24 704: Prob Est Eng Sys Lec. 21 Linear Regression 21
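A sketch of the standard-error formulas above on synthetic data; the unbiased $1/(n-2)$ noise estimate is used for $\hat{\sigma}_\varepsilon$:

```python
# Standard errors of the intercept and slope estimators.
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(-1.0, 3.0, size=200)
y = 0.5 + 0.4 * x + rng.normal(0.0, 0.3, size=200)
n = len(x)

theta1_hat = (np.mean(x * y) - x.mean() * y.mean()) / (np.mean(x**2) - x.mean() ** 2)
theta0_hat = y.mean() - theta1_hat * x.mean()

rss_hat = np.sum((y - (theta0_hat + theta1_hat * x)) ** 2)
sigma_eps_hat = np.sqrt(rss_hat / (n - 2))          # unbiased noise-level estimate
sigma_x_hat = np.sqrt(np.mean((x - x.mean()) ** 2)) # 1/n convention, as above

se1 = sigma_eps_hat / (np.sqrt(n) * sigma_x_hat)
se0 = se1 * np.sqrt(np.mean(x**2))

print(f"theta0 = {theta0_hat:.3f} +/- {se0:.3f}, theta1 = {theta1_hat:.3f} +/- {se1:.3f}")
```
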
Confidence Intervals, hypothesis testing, p-value

Typical test: is $X$ affecting $Y$?

$H_0$: $\forall x\;\; \mathbb{E}[Y \mid X = x] = \mathbb{E}[Y]$, i.e. $f(x) = f_0 = \theta_0 \;\Leftrightarrow\; \theta_1 = 0$

If $0 \in CI_n(\theta_1)$, then $H_0$ is retained, at significance level $\alpha$:
- $CI_n(\theta_1)$ excludes $0$: reject $H_0$;
- $CI_n(\theta_1)$ contains $0$: retain $H_0$.

p-value: $\mathcal{P} = 2\, \Phi\!\left( -\left| \hat{\theta}_1 \right| / \widehat{\mathrm{se}}_1 \right)$ for the normal approximation; under $H_0$: $\hat{\Theta}_1 / \widehat{\mathrm{se}}_1 \mid H_0 \sim t_{n-2} \approx \varphi$.

$\mathcal{P} < \alpha \;\Rightarrow\; \text{reject } H_0$

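A sketch of the slope test; the estimate and standard error below are illustrative placeholders, not lecture values:

```python
# p-value for H0: theta1 = 0, from the t distribution with n-2 dof
# (close to the normal approximation for large n).
from scipy import stats

n = 200
theta1_hat, se1 = 0.4, 0.02        # illustrative values

t_stat = theta1_hat / se1
p_value = 2.0 * stats.t.sf(abs(t_stat), df=n - 2)   # = 2 * (1 - CDF(|t|))
alpha = 0.05
print(p_value, "reject H0" if p_value < alpha else "retain H0")
```
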
Prediction of new values of $f$ and $y$, given $x$

Predicting the/a value of $y$ at $x = x_*$: $y_* = y(x_*) = f(x_*, \boldsymbol{\theta}) + \varepsilon_* = \theta_0 + \theta_1 x_* + \varepsilon_*$

$\widehat{\mathrm{se}}(\hat{y}_*) = \sqrt{\widehat{\mathrm{se}}^2(\hat{f}_*) + \hat{\sigma}_\varepsilon^2}$ $\qquad$ [$\widehat{\mathrm{se}}^2(\hat{f}_*)$ is minimal at $x_* = \bar{X}_n$, where $\widehat{\mathrm{se}}^2(\hat{f}_*) = \hat{\sigma}_\varepsilon^2 / n$]

Using the normal assumption, we get a CI for $f$ and for $y$ at any $x$:
- the $CI_n$ of $y$ accounts for $y$ coming from a normal RV with mean $f$ and std. dev. $\sigma_\varepsilon$.

[Figure: fitted line $\hat{f}$, true $f$, and the bands $CI_n(y_*)$ over the range of $x$.]

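A sketch of the prediction interval on synthetic data; it uses the standard normal-errors expression $\widehat{\mathrm{se}}^2(\hat{f}_*) = \hat{\sigma}_\varepsilon^2 \left( \frac{1}{n} + \frac{(x_* - \bar{X}_n)^2}{\sum_i (x_i - \bar{X}_n)^2} \right)$, which reduces to $\hat{\sigma}_\varepsilon^2 / n$ at $x_* = \bar{X}_n$, as stated above:

```python
# 95% prediction interval for y at a new point x*.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.uniform(-1.0, 3.0, size=200)
y = 0.5 + 0.4 * x + rng.normal(0.0, 0.3, size=200)
n = len(x)

theta1_hat = (np.mean(x * y) - x.mean() * y.mean()) / (np.mean(x**2) - x.mean() ** 2)
theta0_hat = y.mean() - theta1_hat * x.mean()
sigma2_hat = np.sum((y - (theta0_hat + theta1_hat * x)) ** 2) / (n - 2)

x_star = 2.0
f_star = theta0_hat + theta1_hat * x_star
se2_f = sigma2_hat * (1.0 / n + (x_star - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2))
se_y = np.sqrt(se2_f + sigma2_hat)          # se(y*) = sqrt(se^2(f*) + sigma_eps^2)

t = stats.t.ppf(0.975, df=n - 2)            # Student's t quantile for 95%
print(f"CI_n(y*) = [{f_star - t * se_y:.3f}, {f_star + t * se_y:.3f}]")
```
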
Example: linear regression

[Figure: scatter of strength [MPa] vs. density [kg/m^3] for $n = 40$ samples, with the fitted regression line; $R^2 = 19\%$.]

$n = 40$; $\quad \hat{\theta}_1 = 0.1368\ \mathrm{MPa\, m^3 / kg}$

$\alpha = 5\% \;\Rightarrow\; 1 - \alpha = 95\%$ confidence:

$CI_n(\theta_0) = \left[\, \hat{\theta}_0 - t_{\alpha/2,\, n-2}\, \widehat{\mathrm{se}}_0;\;\; \hat{\theta}_0 + t_{\alpha/2,\, n-2}\, \widehat{\mathrm{se}}_0 \,\right] = [-500.9;\; -47.9]\ \mathrm{MPa}$

$CI_n(\theta_1) = \left[\, \hat{\theta}_1 - t_{\alpha/2,\, n-2}\, \widehat{\mathrm{se}}_1;\;\; \hat{\theta}_1 + t_{\alpha/2,\, n-2}\, \widehat{\mathrm{se}}_1 \,\right] = [0.0442;\; 0.2295]\ \mathrm{MPa\, m^3 / kg}$

$t_{\alpha/2,\, n-2}$: quantile of the t-distribution with $n - 2$ degrees of freedom.

$0 \notin CI_n(\theta_1) \;\Rightarrow\; \text{reject } H_0\!: \theta_1 = 0$, with significance $\alpha = 5\%$.

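A small consistency check using only numbers reported on this slide; $\widehat{\mathrm{se}}_1$ is not stated, so it is back-derived from the interval half-width (an inferred value, not a lecture number):

```python
# Recover se1 from the reported CI, then rebuild the interval and run the test.
from scipy import stats

n, alpha = 40, 0.05
theta1_hat = 0.1368                    # MPa m^3 / kg, from the slide
ci_lo, ci_hi = 0.0442, 0.2295          # reported CI_n(theta1)

t_q = stats.t.ppf(1.0 - alpha / 2.0, df=n - 2)   # t_{alpha/2, n-2}
se1 = (ci_hi - ci_lo) / (2.0 * t_q)              # implied standard error

print(f"t quantile = {t_q:.4f}, implied se1 = {se1:.4f}")
print(f"rebuilt CI = [{theta1_hat - t_q * se1:.4f}, {theta1_hat + t_q * se1:.4f}]")
print("reject H0" if not (ci_lo <= 0.0 <= ci_hi) else "retain H0")
```
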
Summary

Linear regression is a simple and important method to investigate relations between variables.
In linear regression, the relation between the regression function and the parameters is linear.
Regression analysis is computationally simple.

From the dataset, we compute some sample moments and then the estimated parameters $\hat{\boldsymbol{\theta}}$ of the straight line:

dataset: $\{(x_i, y_i)\}_{i=1}^{n}$

sample moments: $\hat{C}_{X,Y,n} = \overline{XY}_n - \bar{X}_n \bar{Y}_n$; $\quad \hat{V}_{X,n} = \overline{X^2}_n - \bar{X}_n^2$

estimated parameters: $\hat{\theta}_1 = \frac{\hat{C}_{X,Y,n}}{\hat{V}_{X,n}}$, $\quad \hat{\theta}_0 = \bar{Y}_n - \hat{\theta}_1 \bar{X}_n$

estimated regression function: $\hat{f}(x) = \hat{\theta}_0 + \hat{\theta}_1 x$

References and readings
https://en.wikipedia.org/wiki/Linear_regression