Linear Regression

Linear regression aims to model the relationship between an input variable X and an output variable Y using linear functions. It estimates the regression function f(x) as the conditional expectation of Y given X = x. The linear regression model assumes f(x, θ) = θ0 + θ1·x, where the parameters θ = (θ0, θ1) are estimated to minimize the sum of squared residuals between the observed and predicted Y values. If the joint distribution of X and Y is estimated to be multivariate normal, then the regression function f(x) is a linear function of x.



12/24-704: Probability and Estimation Methods for Engineering Systems

Lec. 21: Linear Regression

Instructor: Matteo Pozzi

Regression in practice

In MS Excel: just "add trendline" to a scatter graph, reporting the equation and the R-squared value.

[Figure: scatter plot of US GDP in T$ vs. year (2000-2025), with the fitted trendline y = 0.5781x - 1146.5, R² = 0.9751; that is, $y \cong \theta_1 x + \theta_0$.]

In a programming language such as R: call the routine for linear regression, and you get optimal estimators, standard errors, test statistics, p-values, ...

We will learn how to compute those values.

https://www.datacamp.com/community/tutorials/linear-regression-R
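As a quick illustration of the R workflow mentioned above, the sketch below fits a straight line with R's built-in `lm()` and prints the quantities we will learn to compute by hand. The data are synthetic stand-ins for the GDP series (assumed values), not the actual figures from the plot.

```r
# Minimal sketch: fit y = theta0 + theta1 * x by least squares with lm().
# The data are synthetic (assumed), roughly mimicking the GDP trend above.
set.seed(1)
year <- 2000:2023
gdp  <- 0.58 * year - 1146 + rnorm(length(year), sd = 0.4)

fit <- lm(gdp ~ year)   # ordinary least-squares straight-line fit
summary(fit)            # estimates, standard errors, t values, p-values, R-squared
coef(fit)               # theta0_hat (intercept) and theta1_hat (slope)
```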

Regression

So far, our problem was: given samples of one RV, estimate some parameters related to its distribution. In regression problems, instead, we have joint samples of (at least) two RVs, say $X$ and $Y$, and we aim at understanding the relation between these variables.

$X$: "input", $Y$: "output". How do we predict the output as a function of the input?

Because of randomness, we cannot predict $Y$ deterministically from $X$. But we can estimate the regression function $f(x) \triangleq \mathbb{E}[Y \mid X = x]$.

Task: from the dataset $\{(x_i, y_i)\}_{i=1}^{n}$, estimate $\hat{f}(x) \cong f(x)$ (dataset → analysis → regression function) and assess the uncertainty in the estimation.

Intuitive methods for regression

For a given $\tilde{x}$, how do we estimate $\hat{f}(\tilde{x}) \cong f(\tilde{x}) = \mathbb{E}[Y \mid X = \tilde{x}]$?

We could estimate $\hat{f}$ as the arithmetic average of the $y_i$ in the subset of pairs $(x_i, y_i)$ where $x_i = \tilde{x}$. But it is unlikely that we can find any sample exactly at $\tilde{x}$. Hence, we can define an interval $I = [\tilde{x} - \Delta;\ \tilde{x} + \Delta]$ around $\tilde{x}$ and average the $y_i$ of the samples whose $x_i$ falls inside $I$: this is an example of smoothing and of non-parametric regression.

In parametric regression, instead, we use all the data and assume a parametric form for $f$, e.g. $f(x, \boldsymbol{\theta}) = \theta_0 + \theta_1 x$, with $\boldsymbol{\theta} = [\theta_0 \;\; \theta_1]^T$. To estimate $\hat{f} \cong f$ is to estimate the parameters, $\hat{\boldsymbol{\theta}} \cong \boldsymbol{\theta}$, so that $\forall x:\ \hat{f}(x) \cong f(x, \hat{\boldsymbol{\theta}}) = \hat{\theta}_0 + \hat{\theta}_1 x$.

In linear regression, the relation between $\boldsymbol{\theta}$ and $f$ is linear: $f = \mathbf{v}^T\boldsymbol{\theta}$, for some $\mathbf{v}$ depending on $x$.
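The sketch below contrasts the two ideas on synthetic data (all values assumed): a non-parametric local average of the $y_i$ whose $x_i$ fall inside the interval $I$ around $\tilde{x}$, versus the parametric straight-line fit that uses all the data.

```r
# Non-parametric local averaging vs. parametric straight-line regression (synthetic data).
set.seed(1)
x <- runif(200)
y <- 0.3 + 0.8 * x + rnorm(200, sd = 0.1)

x_tilde <- 0.5; Delta <- 0.25                        # interval I = [x_tilde - Delta, x_tilde + Delta]
f_hat_local <- mean(y[abs(x - x_tilde) <= Delta])    # average of y_i with x_i inside I

fit <- lm(y ~ x)                                     # parametric: f(x, theta) = theta0 + theta1 * x
f_hat_param <- predict(fit, newdata = data.frame(x = x_tilde))

c(local = f_hat_local, parametric = unname(f_hat_param))
```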

Density estimation vs regression

Univariate density estimation: dataset $\{x_i\}_{i=1}^{n} \sim_{\mathrm{IID}} p_X$ ⇒ estimate $\hat{p}_X \cong p_X$;
parametric estimation: $\{x_i\}_{i=1}^{n} \mid \boldsymbol{\theta} \sim_{\mathrm{IID}} p_{X|\boldsymbol{\theta}}$ ⇒ estimate $\hat{\boldsymbol{\theta}} \cong \boldsymbol{\theta}$.

Bivariate density estimation: dataset $\{(x_i, y_i)\}_{i=1}^{n} \sim_{\mathrm{IID}} p_{X,Y}$ ⇒ estimate $\hat{p}_{X,Y} \cong p_{X,Y}$;
parametric estimation: $\{(x_i, y_i)\}_{i=1}^{n} \mid \boldsymbol{\theta} \sim_{\mathrm{IID}} p_{X,Y|\boldsymbol{\theta}}$ ⇒ estimate $\hat{\boldsymbol{\theta}} \cong \boldsymbol{\theta}$.

Regression: dataset $\{(x_i, y_i)\}_{i=1}^{n}$, with $Y \mid X \sim p_{Y|X}$ (conditional distribution);
parametric regression: $Y \mid X, \boldsymbol{\theta} \sim p_{Y|X,\boldsymbol{\theta}}$, i.e. $Y_i \mid x_i, \boldsymbol{\theta} \sim_{\mathrm{IID}} p_{Y|X=x_i,\boldsymbol{\theta}}$;
regression function (conditional mean): $f(x, \boldsymbol{\theta}) = \mathbb{E}[Y \mid X = x, \boldsymbol{\theta}]$.

Chain rule: $p_{X,Y} = p_X\, p_{Y|X}$. The marginal distribution $p_X$ can (also) be estimated from $\{x_i\}_{i=1}^{n}$, but regression is about estimating only $p_{Y|X}$ from $\{(x_i, y_i)\}_{i=1}^{n}$.

Basis of linear regression

Straight-line regression function (generating model):
$f(x, \boldsymbol{\theta}) = \theta_0 + \theta_1 x = [1 \;\; x]\,\boldsymbol{\theta}$, with $\boldsymbol{\theta} = [\theta_0 \;\; \theta_1]^T$

Model for the data $(x_i, y_i)$:
$Y_i = f(x_i, \boldsymbol{\theta}) + \varepsilon_i = \theta_0 + \theta_1 x_i + \varepsilon_i$

Noise $\varepsilon_i$: zero-mean, homoscedastic:
$\forall x \in \mathbb{R}: \quad \mathbb{E}[\varepsilon_i \mid X_i = x] = 0, \qquad \mathbb{V}[\varepsilon_i \mid X_i = x] = \sigma_\varepsilon^2$

Linear regression: from $\{(x_i, y_i)\}_{i=1}^{n}$, obtain $\hat{\boldsymbol{\theta}}, \hat{\sigma}_\varepsilon^2$. Inferred model:
$\hat{f}(x) \triangleq f(x, \hat{\boldsymbol{\theta}}) = \hat{\theta}_0 + \hat{\theta}_1 x, \qquad \hat{f}_i \triangleq \hat{f}(x_i) = f(x_i, \hat{\boldsymbol{\theta}}) = \hat{\theta}_0 + \hat{\theta}_1 x_i$

Residuals: $r_i = \hat{\varepsilon}_i = y_i - \hat{f}_i \cong \varepsilon_i$
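A small simulation of this generating model (all parameter values assumed) makes the roles of noise and residuals concrete: the residuals of the fitted line approximate the unobserved errors $\varepsilon_i$.

```r
# Simulate Y_i = theta0 + theta1 * x_i + eps_i with zero-mean homoscedastic noise (values assumed).
set.seed(0)
theta0 <- 1.0; theta1 <- 0.5; sigma_eps <- 0.3
x   <- runif(50, 0, 3)
eps <- rnorm(50, mean = 0, sd = sigma_eps)   # E[eps|x] = 0, V[eps|x] = sigma_eps^2, for all x
y   <- theta0 + theta1 * x + eps             # observed data (x_i, y_i)

fit <- lm(y ~ x)              # inferred model f_hat
r   <- residuals(fit)         # residuals r_i = y_i - f_hat_i, approximating eps_i
head(cbind(eps, r))
```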

From MVN to linear regression

One approach to regression would be:
1. estimate the joint density, then
2. derive the conditional distribution (and its mean) from that.

If the estimated joint density is Multi-Variate Normal (MVN), then the regression function is linear.

$\{(x_i, y_i)\}_{i=1}^{n}$ (dataset) ⇒ $\hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\Sigma}}$ (parameters of the MVN) ⇒ $\hat{f}(x) \cong f(x)$ (regression function)

[Figure: left, scatter of the dataset $(x_i, y_i)$ in the $(X, Y)$ plane with the contours of the estimated density $\hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\Sigma}}$; right, the estimated regression line $\hat{f}$.]

Bivariate Normal RVs, recap

$\mathbf{Z} = \begin{bmatrix} X \\ Y \end{bmatrix} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma}) \;\Leftrightarrow\; p_{\mathbf{Z}}(\mathbf{z}) = \dfrac{1}{2\pi\sqrt{|\boldsymbol{\Sigma}|}} \exp\left[-\dfrac{1}{2}(\mathbf{z} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{z} - \boldsymbol{\mu})\right]$

Parameters are moments: $\boldsymbol{\mu} = \begin{bmatrix} \mu_X \\ \mu_Y \end{bmatrix}$; $\quad \boldsymbol{\Sigma} = \begin{bmatrix} \sigma_X^2 & \rho\sigma_X\sigma_Y \\ \rho\sigma_X\sigma_Y & \sigma_Y^2 \end{bmatrix}$

[Figure: contour plot of a bivariate normal density with $\mu_X = 0.55$, $\mu_Y = 0.55$, $\sigma_X = 0.13$, $\sigma_Y = 0.10$, $\rho = 0.87$.]

Contour lines are ellipses, centered at the mean $\boldsymbol{\mu}$.
Estimating a MVN model

Dataset $\{(x_i, y_i)\}_{i=1}^{n} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma} \sim_{\mathrm{IID}} \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$; vector notation: $\mathbf{z}_i = \begin{bmatrix} x_i \\ y_i \end{bmatrix}$.

Log-likelihood function:
$l_n(\boldsymbol{\mu}, \boldsymbol{\Sigma}) = -\dfrac{n}{2}(\bar{\mathbf{z}}_n - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\bar{\mathbf{z}}_n - \boldsymbol{\mu}) - \dfrac{n-1}{2}\,\mathrm{tr}\!\left(\boldsymbol{\Sigma}^{-1}\mathbf{S}_n\right) - \dfrac{n}{2}\log|\boldsymbol{\Sigma}|$

with sample average vector: $\bar{\mathbf{z}}_n = \dfrac{1}{n}\sum_{i=1}^{n}\mathbf{z}_i = \begin{bmatrix} \bar{X}_n \\ \bar{Y}_n \end{bmatrix}$

and sample covariance matrix: $\mathbf{S}_n = \dfrac{1}{n-1}\sum_{i=1}^{n}(\mathbf{z}_i - \bar{\mathbf{z}}_n)(\mathbf{z}_i - \bar{\mathbf{z}}_n)^T = \begin{bmatrix} \hat{V}_{X,n} & \hat{C}_{X,Y,n} \\ \hat{C}_{X,Y,n} & \hat{V}_{Y,n} \end{bmatrix}$ (sample variances on the diagonal, sample covariance off the diagonal).

MLE: $\hat{\boldsymbol{\mu}} = \bar{\mathbf{z}}_n$; $\quad \hat{\boldsymbol{\Sigma}} = \dfrac{n-1}{n}\,\mathbf{S}_n$
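A sketch of these MLE formulas, using the parameter values of the bivariate-normal recap above as the (assumed) ground truth: the estimated mean is the sample average vector, and the estimated covariance is the sample covariance matrix rescaled by $(n-1)/n$.

```r
# MLE of a bivariate normal: mu_hat = sample mean, Sigma_hat = (n-1)/n * S_n.
library(MASS)                                   # mvrnorm() for sampling from a MVN
set.seed(7)
n  <- 500
mu <- c(0.55, 0.55)
Sg <- matrix(c(0.13^2,             0.87 * 0.13 * 0.10,
               0.87 * 0.13 * 0.10, 0.10^2), nrow = 2)
z  <- mvrnorm(n, mu, Sg)                        # rows: (x_i, y_i)

mu_hat    <- colMeans(z)                        # MLE of the mean vector
S_n       <- cov(z)                             # sample covariance (1/(n-1) normalization)
Sigma_hat <- (n - 1) / n * S_n                  # MLE of the covariance matrix
mu_hat; Sigma_hat
```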
MVN: Conditional distributions

The conditional distribution is normal: $Y \mid X = x \sim \mathcal{N}\!\left(\mu_{Y|x},\, \sigma^2_{Y|X}\right)$, with

- conditional mean $\mathbb{E}[Y \mid x] = \mu_{Y|x} = \mu_Y + \rho\,\dfrac{\sigma_Y}{\sigma_X}(x - \mu_X)$: the regression function $f(x) = \mu_{Y|x}$ is linear in $x$;
- conditional variance $\mathbb{V}[Y \mid x] = \sigma^2_{Y|X} = (1 - \rho^2)\,\sigma_Y^2$: invariant with respect to $x$.

The straight line $\mu_{Y|x} = \mathrm{argmax}_y\, p_{X,Y}(x, y)$ passes through all the maxima of the joint density along vertical slices.

[Figure: contour plot of the joint density, with the conditional-mean line and the 95% confidence interval band for $Y \mid x$.]

This is related to linear regression: observing $x$, inferring $Y$.
Estimating coefficients for linear regression in MVN

Regression function: $f(x) = \theta_0 + \theta_1 x = \mathbb{E}[Y \mid X = x] = \mu_{Y|X=x} = \mu_Y + \rho\,\dfrac{\sigma_Y}{\sigma_X}(x - \mu_X)$

Parameters of the regression function: $\theta_1 = \rho\,\dfrac{\sigma_Y}{\sigma_X} = \dfrac{\mathbb{C}[X, Y]}{\mathbb{V}[X]}$; $\quad \theta_0 = \mu_Y - \rho\,\dfrac{\sigma_Y}{\sigma_X}\,\mu_X = \mu_Y - \theta_1\mu_X$

Sample averages: $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} x_i \cong \mu_X$; $\quad \bar{Y}_n = \frac{1}{n}\sum_{i=1}^{n} y_i \cong \mu_Y$

Sample variances: $\hat{V}_{X,n} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{X}_n)^2 \cong \mathbb{V}[X] = \sigma_X^2$; $\quad \hat{V}_{Y,n} = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{Y}_n)^2 \cong \mathbb{V}[Y] = \sigma_Y^2$

Sample covariance: $\hat{C}_{X,Y,n} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{X}_n)(y_i - \bar{Y}_n) \cong \mathbb{C}[X, Y] = \rho\sigma_X\sigma_Y$

Estimated parameters: $\hat{\theta}_1 = \dfrac{\hat{C}_{X,Y,n}}{\hat{V}_{X,n}} = \dfrac{\sum_{i=1}^{n}(x_i - \bar{X}_n)(y_i - \bar{Y}_n)}{\sum_{i=1}^{n}(x_i - \bar{X}_n)^2}$; $\quad \hat{\theta}_0 = \bar{Y}_n - \hat{\theta}_1\bar{X}_n$
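These closed-form expressions can be checked directly against R's `lm()` on synthetic data (a sketch, all values assumed): the slope is the ratio of sample covariance to sample variance of $x$, and the intercept follows from the sample means.

```r
# Closed-form moment estimates vs. the coefficients returned by lm() (synthetic data).
set.seed(2)
x <- rnorm(50, mean = 2, sd = 1.5)
y <- 1.0 + 0.7 * x + rnorm(50, sd = 0.5)

theta1_hat <- cov(x, y) / var(x)             # C_hat_{X,Y,n} / V_hat_{X,n}
theta0_hat <- mean(y) - theta1_hat * mean(x)

c(theta0_hat, theta1_hat)
coef(lm(y ~ x))                              # same values
```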

Understanding the estimated coefficients

The regression line passes through the average point $\boldsymbol{\mu} = (\mu_X, \mu_Y)$.

Exact regression function: $f(x) = \mu_Y + \rho\,\dfrac{\sigma_Y}{\sigma_X}(x - \mu_X)$
$\;\Rightarrow\; \dfrac{f - \mu_Y}{\sigma_Y} = \rho\,\dfrac{x - \mu_X}{\sigma_X}$
$\;\Rightarrow\; g = \rho\, u_X$, with $u_X = \dfrac{x - \mu_X}{\sigma_X}$, $g = \dfrac{f - \mu_Y}{\sigma_Y}$
(in standardized coordinates, the regression line has slope $\rho$).

The estimated regression line passes through the sample average point $\mathbf{m} = (\bar{X}_n, \bar{Y}_n)$.

Estimated regression function: $\hat{f}(x) = \hat{\theta}_0 + \hat{\theta}_1 x$
$\;\Rightarrow\; \dfrac{\hat{f} - \bar{Y}_n}{\hat{\sigma}_{Y,n}} = \hat{\rho}\,\dfrac{x - \bar{X}_n}{\hat{\sigma}_{X,n}}$
$\;\Rightarrow\; \hat{g} = \hat{\rho}\,\hat{u}_X$, with $\hat{u}_X = \dfrac{x - \bar{X}_n}{\hat{\sigma}_{X,n}}$, $\hat{g} = \dfrac{\hat{f} - \bar{Y}_n}{\hat{\sigma}_{Y,n}}$

Linear regression and MLE

The joint probability of $X$ and $Y$ depends on parameters:
$\boldsymbol{\theta}$: parameters of the regression function;
$\sigma_\varepsilon^2$: noise variance;
$\boldsymbol{\eta}_X$: parameters of the marginal distribution $p_X$ (but we do not care about it, in regression).

Chain rule: $p(X, Y \mid \boldsymbol{\theta}, \sigma_\varepsilon^2, \boldsymbol{\eta}_X) = p(Y \mid X, \boldsymbol{\theta}, \sigma_\varepsilon^2)\; p(X \mid \boldsymbol{\eta}_X)$
(joint prob. = conditional prob. × marginal prob.)

[Figure: graphical model with $x_i$, $\boldsymbol{\theta}$ and $\sigma_\varepsilon^2$ as parents of $Y_i$, for each observation $i$.]

Likelihood for observation $i$, $(x_i, y_i)$: $\mathcal{L}_i(\boldsymbol{\theta}, \sigma_\varepsilon^2) = p_{Y|X,\boldsymbol{\theta},\sigma_\varepsilon^2}(y_i \mid x_i, \boldsymbol{\theta}, \sigma_\varepsilon^2)$

Conditional independence: $i \ne j \;\Rightarrow\; Y_i \perp Y_j \mid x_i, x_j, \boldsymbol{\theta}, \sigma_\varepsilon^2$

Global likelihood for the dataset $\{(x_i, y_i)\}_{i=1}^{n}$: $\mathcal{L}_n(\boldsymbol{\theta}, \sigma_\varepsilon^2) = \prod_{i=1}^{n}\mathcal{L}_i(\boldsymbol{\theta}, \sigma_\varepsilon^2)$

Linear regression and MLE, cont.

Log-likelihood for observation $i$, $(x_i, y_i)$: $l_i(\boldsymbol{\theta}, \sigma_\varepsilon^2) = \log\mathcal{L}_i(\boldsymbol{\theta}, \sigma_\varepsilon^2)$

Global log-likelihood for the dataset $\{(x_i, y_i)\}_{i=1}^{n}$: $l_n(\boldsymbol{\theta}, \sigma_\varepsilon^2) = \sum_{i=1}^{n} l_i(\boldsymbol{\theta}, \sigma_\varepsilon^2)$

MLE: $\hat{\boldsymbol{\theta}}, \hat{\sigma}_\varepsilon^2 = \mathrm{argmax}\; l_n(\boldsymbol{\theta}, \sigma_\varepsilon^2)$

Function $l_n$ is defined on the domain of the straight-line parameters $\boldsymbol{\theta}$ and of the noise level $\sigma_\varepsilon^2$, and it depends on the dataset $\{(x_i, y_i)\}_{i=1}^{n}$. The estimators are the optimal values $\hat{\boldsymbol{\theta}}, \hat{\sigma}_\varepsilon^2$.

MLE for linear regression under Normal errors, I

Likelihood for normal errors: $Y \mid X, \boldsymbol{\theta}, \sigma_\varepsilon^2 \sim \mathcal{N}\!\left(f(X, \boldsymbol{\theta}),\, \sigma_\varepsilon^2\right)$

$p_{Y|X,\boldsymbol{\theta},\sigma_\varepsilon^2}(y_i \mid x_i, \boldsymbol{\theta}, \sigma_\varepsilon^2) = \mathcal{N}\!\left(y_i;\, f(x_i, \boldsymbol{\theta}), \sigma_\varepsilon^2\right) = \dfrac{1}{\sigma_\varepsilon}\,\varphi\!\left(\dfrac{y_i - f(x_i, \boldsymbol{\theta})}{\sigma_\varepsilon}\right) = \mathcal{N}\!\left(f(x_i, \boldsymbol{\theta});\, y_i, \sigma_\varepsilon^2\right)$ (by symmetry)

[Figure: left, the generating model, line $f$ with errors $\varepsilon_i$ between $f(x_i)$ and the observations $y_i$; right, the inferred model, line $\hat{f}$ with residuals $r_i$.]

The inferred line should pass close to the data. The best trend line minimizes a "penalty" proportional to the squared residuals $r_i^2$.

Residual Sum of Squares: $\mathrm{rss}_n \triangleq \sum_{i=1}^{n} r_i^2$. Find the $\boldsymbol{\theta}$ which minimizes $\mathrm{rss}_n$.

MLE for linear regression under Normal errors, II

Likelihood for normal errors: $Y \mid X, \boldsymbol{\theta}, \sigma_\varepsilon^2 \sim \mathcal{N}\!\left(f(X, \boldsymbol{\theta}),\, \sigma_\varepsilon^2\right)$, i.e.
$p_{Y|X,\boldsymbol{\theta},\sigma_\varepsilon^2}(y_i \mid x_i, \boldsymbol{\theta}, \sigma_\varepsilon^2) = \mathcal{N}\!\left(y_i;\, f(x_i, \boldsymbol{\theta}), \sigma_\varepsilon^2\right) = \dfrac{1}{\sigma_\varepsilon}\,\varphi\!\left(\dfrac{y_i - f(x_i, \boldsymbol{\theta})}{\sigma_\varepsilon}\right)$

Log-likelihood, individual observation:
$l_i(\boldsymbol{\theta}, \sigma_\varepsilon^2) = \log p_{Y|X,\boldsymbol{\theta},\sigma_\varepsilon^2}(y_i \mid x_i, \boldsymbol{\theta}, \sigma_\varepsilon^2) = -\log\sigma_\varepsilon - \dfrac{\left[y_i - f(x_i, \boldsymbol{\theta})\right]^2}{2\sigma_\varepsilon^2} + \text{const.}$

Global log-likelihood:
$l_n(\boldsymbol{\theta}, \sigma_\varepsilon^2) = \sum_{i=1}^{n} l_i(\boldsymbol{\theta}, \sigma_\varepsilon^2) = -n\log\sigma_\varepsilon - \dfrac{1}{2\sigma_\varepsilon^2}\sum_{i=1}^{n}\left[y_i - f(x_i, \boldsymbol{\theta})\right]^2 + \text{const.}$

Residual Sum of Squares: $\mathrm{rss}_n(\boldsymbol{\theta}) \triangleq \sum_{i=1}^{n}\left[y_i - f(x_i, \boldsymbol{\theta})\right]^2$, so that
$l_n(\boldsymbol{\theta}, \sigma_\varepsilon^2) = -n\log\sigma_\varepsilon - \dfrac{1}{2\sigma_\varepsilon^2}\,\mathrm{rss}_n(\boldsymbol{\theta})$

MLE: first find $\hat{\boldsymbol{\theta}} = \mathrm{argmax}_{\boldsymbol{\theta}}\, l_n = \mathrm{argmin}_{\boldsymbol{\theta}}\, \mathrm{rss}_n$ ($\sigma_\varepsilon^2$ is irrelevant in this step);
then find $\hat{\sigma}_\varepsilon^2 = \mathrm{argmax}_{\sigma_\varepsilon^2}\, l_n(\hat{\boldsymbol{\theta}}, \sigma_\varepsilon^2)$.

MLE for linear regression under Normal errors, III

RSS is a quadratic form of $\boldsymbol{\theta}$: $\mathrm{rss}_n(\boldsymbol{\theta}) = \sum_{i=1}^{n}\left[y_i - f(x_i, \boldsymbol{\theta})\right]^2 = \sum_{i=1}^{n}\left(y_i - \theta_0 - \theta_1 x_i\right)^2$

To minimize the rss, we set the gradient to zero, $\nabla\,\mathrm{rss}_n(\boldsymbol{\theta}) = \mathbf{0}$, obtaining a linear system of 2 equations in 2 unknowns:

$\dfrac{\partial\,\mathrm{rss}_n}{\partial\theta_j} = -\sum_{i=1}^{n} 2\left[y_i - f(x_i, \boldsymbol{\theta})\right]\dfrac{\partial f(x_i, \boldsymbol{\theta})}{\partial\theta_j} = -\sum_{i=1}^{n} 2\left[y_i - (\theta_0 + \theta_1 x_i)\right]x_i^{\,j}$, since $\dfrac{\partial f(x, \boldsymbol{\theta})}{\partial\theta_j} = x^{\,j}$, i.e. $\dfrac{\partial f}{\partial\theta_0} = 1$, $\dfrac{\partial f}{\partial\theta_1} = x$.

$\dfrac{\partial\,\mathrm{rss}_n}{\partial\theta_0} = -\sum_{i=1}^{n} 2\left[y_i - (\theta_0 + \theta_1 x_i)\right] = 0 \;\Leftrightarrow\; \sum_{i=1}^{n} y_i - n\hat{\theta}_0 - \hat{\theta}_1\sum_{i=1}^{n} x_i = 0 \;\Leftrightarrow\; \hat{\theta}_0 = \bar{Y}_n - \hat{\theta}_1\bar{X}_n$

$\dfrac{\partial\,\mathrm{rss}_n}{\partial\theta_1} = -\sum_{i=1}^{n} 2\left[y_i - (\theta_0 + \theta_1 x_i)\right]x_i = 0 \;\Leftrightarrow\; \sum_{i=1}^{n} x_i y_i - \hat{\theta}_0\sum_{i=1}^{n} x_i - \hat{\theta}_1\sum_{i=1}^{n} x_i^2 = 0$

MLE for linear regression under Normal errors, IV

$\sum_{i=1}^{n} x_i y_i - \hat{\theta}_0\sum_{i=1}^{n} x_i - \hat{\theta}_1\sum_{i=1}^{n} x_i^2 = 0 \;\Leftrightarrow\; \overline{XY}_n - \bar{X}_n\bar{Y}_n + \hat{\theta}_1\bar{X}_n^2 - \hat{\theta}_1\overline{X^2}_n = 0$
(for any quantity $f$, $\bar{f}_n \triangleq \sum_{i=1}^{n} f_i / n$ denotes its sample average; with this notation, $\hat{C}_{X,Y,n} = \overline{XY}_n - \bar{X}_n\bar{Y}_n$ and $\hat{V}_{X,n} = \overline{X^2}_n - \bar{X}_n^2$ are the $1/n$-normalized sample moments, and the normalization cancels in the ratio below)

$\;\Leftrightarrow\; \hat{C}_{X,Y,n} = \hat{\theta}_1\hat{V}_{X,n} \;\Leftrightarrow\; \hat{\theta}_1 = \dfrac{\hat{C}_{X,Y,n}}{\hat{V}_{X,n}}, \qquad \hat{\theta}_0 = \bar{Y}_n - \hat{\theta}_1\bar{X}_n$

These are closed-form expressions for estimating the parameters of the fitting line.
[Figure: contours of $\mathrm{rss}_n$ over the $(\theta_0, \theta_1)$ plane, with the minimum at $(\hat{\theta}_0, \hat{\theta}_1)$.]

MLE for $\sigma_\varepsilon^2$: $l_n(\hat{\boldsymbol{\theta}}, \sigma_\varepsilon^2) = -n\log\sigma_\varepsilon - \dfrac{1}{2\sigma_\varepsilon^2}\,\widehat{\mathrm{rss}}_n$, with $\widehat{\mathrm{rss}}_n \triangleq \mathrm{rss}_n(\hat{\boldsymbol{\theta}}) = \sum_{i=1}^{n}\left(y_i - \hat{f}_i\right)^2$

$\dfrac{\partial l_n}{\partial\sigma_\varepsilon^2} = -\dfrac{n}{2\sigma_\varepsilon^2} + \dfrac{1}{2\sigma_\varepsilon^4}\,\widehat{\mathrm{rss}}_n = 0 \;\Leftrightarrow\; \hat{\sigma}_\varepsilon^2 = \dfrac{\widehat{\mathrm{rss}}_n}{n}$

This noise-level estimator is biased (dividing by $n-2$ instead of $n$ gives the unbiased estimator).
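The two noise-level estimators can be compared on synthetic data (a sketch, values assumed): the MLE divides $\widehat{\mathrm{rss}}_n$ by $n$ and is biased, while dividing by $n-2$ gives the unbiased estimate, whose square root is what `summary(lm(...))$sigma` reports as the residual standard error.

```r
# Biased (MLE, divide by n) vs. unbiased (divide by n - 2) noise-level estimates.
set.seed(3)
n <- 30
x <- runif(n, 0, 3)
y <- 0.5 + 0.4 * x + rnorm(n, sd = 0.3)

fit <- lm(y ~ x)
rss <- sum(residuals(fit)^2)                  # rss_n(theta_hat)

sigma2_mle      <- rss / n                    # biased MLE
sigma2_unbiased <- rss / (n - 2)              # unbiased (two parameters calibrated)
c(sqrt(sigma2_mle), sqrt(sigma2_unbiased), summary(fit)$sigma)
```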

Example: graph of regression

[Figure: two datasets, with n = 20 (few samples) and n = 200 (many samples). Top row: samples $(x_i, y_i)$ with the true regression line $f$ and the estimated line $\hat{f}$; with many samples, $\hat{f} \cong f$. Middle row: absolute residuals $|r_i|$ vs. $x_i$, whose spread gives the estimated noise level ($\hat{\sigma}_\varepsilon \cong \sigma_\varepsilon$ for large $n$). Bottom row: contours of $\mathrm{rss}_n$ over $(\theta_0, \theta_1)$, whose minimizer satisfies $\hat{\boldsymbol{\theta}} \cong \boldsymbol{\theta}$ for large $n$.]

Uncertainty of the estimator

Distribution of the estimator. Generating model: $Y = f(x, \boldsymbol{\theta}) + \varepsilon = \theta_0 + \theta_1 x + \varepsilon$, or, in vector notation, $\mathbf{Y} = \mathbf{f}(\mathbf{x}, \boldsymbol{\theta}) + \boldsymbol{\varepsilon} = \theta_0 + \theta_1\mathbf{x} + \boldsymbol{\varepsilon}$.

$\mathbb{E}[\hat{\boldsymbol{\theta}} \mid \boldsymbol{\theta}, \mathbf{x}, \sigma_\varepsilon^2] = \boldsymbol{\theta}$: the estimators are unbiased.

$\mathbb{V}[\hat{\boldsymbol{\theta}} \mid \boldsymbol{\theta}, \mathbf{x}, \sigma_\varepsilon^2] = \boldsymbol{\Sigma}_{\hat{\Theta}} = \dfrac{\sigma_\varepsilon^2}{n\,\hat{V}_{X,n}}\begin{bmatrix} \overline{X^2}_n & -\bar{X}_n \\ -\bar{X}_n & 1 \end{bmatrix} \cong \dfrac{\hat{\sigma}_\varepsilon^2}{n\,\hat{V}_{X,n}}\begin{bmatrix} \overline{X^2}_n & -\bar{X}_n \\ -\bar{X}_n & 1 \end{bmatrix}$ [proven in the next lecture]

Uncertainty is proportional to $1/n$, hence the estimators are consistent.

$\hat{\boldsymbol{\theta}} \mid \boldsymbol{\theta}, \mathbf{x}, \sigma_\varepsilon^2 \sim \mathcal{N}(\boldsymbol{\theta}, \boldsymbol{\Sigma}_{\hat{\Theta}})$: $\hat{\boldsymbol{\theta}}$ is normally distributed, if $\sigma_\varepsilon^2$ is known.

$\hat{\boldsymbol{\theta}} \mid \boldsymbol{\theta}, \mathbf{x} \approx \mathcal{N}(\boldsymbol{\theta}, \boldsymbol{\Sigma}_{\hat{\Theta}})$: $\hat{\boldsymbol{\theta}}$ is approximately normally distributed if $\sigma_\varepsilon^2$ is estimated, but it is actually Student's t distributed.

$(n-2)\,\dfrac{\hat{\sigma}_\varepsilon^2}{\sigma_\varepsilon^2} \sim \chi^2_{n-2}$: $\hat{\sigma}_\varepsilon^2$ is (proportional to a) chi-squared distributed variable, with $n-2$ degrees of freedom because two parameters, $\theta_0$ and $\theta_1$, have been calibrated.

Uncertainty of the estimator, II

Distribution of the estimator, with the generating model $\mathbf{Y} = \mathbf{f}(\mathbf{x}, \boldsymbol{\theta}) + \boldsymbol{\varepsilon} = \theta_0 + \theta_1\mathbf{x} + \boldsymbol{\varepsilon}$.

Standard errors:
$\widehat{\mathrm{se}}_0 \triangleq \widehat{\mathrm{se}}[\hat{\Theta}_0] = \dfrac{\hat{\sigma}_\varepsilon}{\sqrt{n}}\,\dfrac{\sqrt{\overline{X^2}_n}}{\hat{\sigma}_{X,n}}$, with $\hat{\Theta}_0 \mid \theta_0, \mathbf{x}, \sigma_\varepsilon^2 \sim \mathcal{N}(\theta_0, \mathrm{se}_0^2)$
$\widehat{\mathrm{se}}_1 \triangleq \widehat{\mathrm{se}}[\hat{\Theta}_1] = \dfrac{\hat{\sigma}_\varepsilon}{\sqrt{n}}\,\dfrac{1}{\hat{\sigma}_{X,n}}$, with $\hat{\Theta}_1 \mid \theta_1, \mathbf{x}, \sigma_\varepsilon^2 \sim \mathcal{N}(\theta_1, \mathrm{se}_1^2)$

With estimated $\sigma_\varepsilon^2$: $\dfrac{\hat{\Theta}_j - \theta_j}{\widehat{\mathrm{se}}_j} \sim t_{n-2} \cong \varphi$ for large $n$; use Student's t for small $n$.

Analysis of the standard errors. Special case: if $\bar{X}_n = 0$, then $\widehat{\mathrm{se}}_0 = \dfrac{\hat{\sigma}_\varepsilon}{\sqrt{n}}$ and $\widehat{\mathrm{se}}_1 = \dfrac{\hat{\sigma}_\varepsilon}{\sqrt{n}\,\hat{\sigma}_{X,n}} = \dfrac{\widehat{\mathrm{se}}_0}{\hat{\sigma}_{X,n}}$.

Errors decay with $1/\sqrt{n}$; the error in the slope $\hat{\theta}_1$ also decays with $1/\hat{\sigma}_{X,n}$.
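A sketch (synthetic data, all values assumed) checking these standard-error formulas against the values printed by `summary(lm(...))`; with the $1/n$ normalization of the sample moments, the two computations agree.

```r
# Standard errors of intercept and slope: closed-form formulas vs. summary(lm(...)).
set.seed(4)
n <- 40
x <- rnorm(n, 1, 2)
y <- -0.2 + 0.9 * x + rnorm(n, sd = 0.6)

fit       <- lm(y ~ x)
sigma_eps <- sqrt(sum(residuals(fit)^2) / (n - 2))   # unbiased noise-level estimate
sx        <- sqrt(mean((x - mean(x))^2))             # sigma_hat_{X,n} (1/n normalization)

se1 <- sigma_eps / (sqrt(n) * sx)                    # standard error of the slope
se0 <- sigma_eps * sqrt(mean(x^2)) / (sqrt(n) * sx)  # standard error of the intercept

rbind(manual = c(se0, se1),
      lm     = summary(fit)$coefficients[, "Std. Error"])
```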

Confidence Intervals, hypothesis testing, p-value

Normal approximation:
for $\theta_0$: $CI_n(\theta_0) = \left[\hat{\theta}_0 - z_{\alpha/2}\,\widehat{\mathrm{se}}_0;\ \hat{\theta}_0 + z_{\alpha/2}\,\widehat{\mathrm{se}}_0\right]$
for $\theta_1$: $CI_n(\theta_1) = \left[\hat{\theta}_1 - z_{\alpha/2}\,\widehat{\mathrm{se}}_1;\ \hat{\theta}_1 + z_{\alpha/2}\,\widehat{\mathrm{se}}_1\right]$
(use Student's t if the dataset is small)

Typical test: is $X$ affecting $Y$?
$H_0$: $\forall x\;\; \mathbb{E}[Y \mid X = x] = \mathbb{E}[Y]$, i.e. $f(x) = f_0 = \theta_0 \;\Leftrightarrow\; \theta_1 = 0$

If $0 \in CI_n(\theta_1)$, then $H_0$ is retained, at significance level $\alpha$; if $0 \notin CI_n(\theta_1)$, $H_0$ is rejected.

p-value (normal approximation): $\mathcal{P} = 2\,\Phi\!\left(-\left|\hat{\theta}_1\right| / \widehat{\mathrm{se}}_1\right)$, since $\hat{\Theta}_1 / \widehat{\mathrm{se}}_1 \mid H_0 \sim t_{n-2} \cong \varphi$ for large $n$.
$\mathcal{P} < \alpha \;\Rightarrow\;$ reject $H_0$.
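The sketch below (synthetic data, generated with $\theta_1 = 0$ so that $H_0$ is true, all values assumed) builds the normal-approximation interval and p-value for the slope by hand, and compares them with R's exact Student-t intervals from `confint()`.

```r
# Confidence interval and p-value for the test H0: theta_1 = 0 (synthetic data).
set.seed(5)
x <- runif(25, 0, 3)
y <- 0.8 + 0.0 * x + rnorm(25, sd = 0.5)     # generated with theta_1 = 0

fit   <- lm(y ~ x)
est   <- summary(fit)$coefficients           # Estimate, Std. Error, t value, Pr(>|t|)
alpha <- 0.05

ci_slope <- est["x", "Estimate"] + c(-1, 1) * qnorm(1 - alpha / 2) * est["x", "Std. Error"]
p_slope  <- 2 * pnorm(-abs(est["x", "Estimate"]) / est["x", "Std. Error"])

ci_slope                      # contains 0 here, so H0 is retained
p_slope                       # > alpha, same conclusion
confint(fit, level = 0.95)    # exact Student-t intervals, for comparison
```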
Prediction of new values of $f$ and $y$, given $x$

Predicting the/a value of $y$ at $x = x_*$: $y_* = y(x_*) = f(x_*, \boldsymbol{\theta}) + \varepsilon_* = \theta_0 + \theta_1 x_* + \varepsilon_* = f_* + \varepsilon_*$, with $f_* = f(x_*, \boldsymbol{\theta})$

Point prediction: $\hat{y}_* = \hat{f}_* = \hat{\theta}_0 + \hat{\theta}_1 x_* = \mathbf{v}_*^T\hat{\boldsymbol{\theta}}$, with $\mathbf{v}_* = \begin{bmatrix} 1 \\ x_* \end{bmatrix}$

Squared standard error:
$\widehat{\mathrm{se}}^2[f_*] = \mathbf{v}_*^T\,\boldsymbol{\Sigma}_{\hat{\Theta}}\,\mathbf{v}_* = \dfrac{\hat{\sigma}_\varepsilon^2}{n\,\hat{V}_{X,n}}\,[1 \;\; x_*]\begin{bmatrix} \overline{X^2}_n & -\bar{X}_n \\ -\bar{X}_n & 1 \end{bmatrix}\begin{bmatrix} 1 \\ x_* \end{bmatrix} = \dfrac{\hat{\sigma}_\varepsilon^2}{n\,\hat{V}_{X,n}}\left(\overline{X^2}_n - 2\bar{X}_n x_* + x_*^2\right)$
$= \dfrac{\hat{\sigma}_\varepsilon^2}{n\,\hat{V}_{X,n}}\left(\overline{X^2}_n - \bar{X}_n^2 + \bar{X}_n^2 - 2\bar{X}_n x_* + x_*^2\right) = \dfrac{\hat{\sigma}_\varepsilon^2}{n}\left[1 + \dfrac{(\bar{X}_n - x_*)^2}{\hat{V}_{X,n}}\right]$

$\widehat{\mathrm{se}}^2[y_*] = \widehat{\mathrm{se}}^2[f_*] + \hat{\sigma}_\varepsilon^2$   [$\widehat{\mathrm{se}}^2[f_*]$ is minimum at $x_* = \bar{X}_n$, where $\widehat{\mathrm{se}}^2[f_*] = \hat{\sigma}_\varepsilon^2 / n$]

Confidence intervals:
for $f_*$: $CI_n(f_*) = \left[\hat{f}_* - z_{\alpha/2}\,\widehat{\mathrm{se}}[f_*];\ \hat{f}_* + z_{\alpha/2}\,\widehat{\mathrm{se}}[f_*]\right]$
for $y_*$: $CI_n(y_*) = \left[\hat{f}_* - z_{\alpha/2}\,\widehat{\mathrm{se}}[y_*];\ \hat{f}_* + z_{\alpha/2}\,\widehat{\mathrm{se}}[y_*]\right]$
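A sketch (synthetic data, assumed) of the two intervals via `predict.lm()`: the confidence interval for $f_*$, narrowest at $x_* = \bar{X}_n$, and the wider prediction interval for a new observation $y_*$, which also carries the noise $\hat{\sigma}_\varepsilon^2$.

```r
# Confidence interval for f* vs. prediction interval for y* at a new x* (synthetic data).
set.seed(6)
x <- runif(30, -1, 3)
y <- 0.2 + 0.6 * x + rnorm(30, sd = 0.4)
fit <- lm(y ~ x)

new <- data.frame(x = c(mean(x), 3))                       # at the sample mean and near the edge
predict(fit, new, interval = "confidence", level = 0.95)   # CI for f*: narrowest at x* = mean(x)
predict(fit, new, interval = "prediction", level = 0.95)   # CI for y*: wider, adds sigma_eps^2
```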
Example of 95% confidence bounds

[Figure: two panels, n = 20 and n = 200, showing the data, the actual regression line $f$ (black straight line), the estimated line $\hat{f}$, and the 95% confidence bounds $CI_n(f_*)$ and $CI_n(y_*)$.]

The black straight line represents the actual regression line $f$. From a dataset with $n$ samples, we estimate the function $\hat{f}$ and the noise level $\hat{\sigma}_\varepsilon^2$. Using the normal assumption, we get a CI for $f$ and for $y$ at any $x$.

$CI_n$ for $f$ is smaller for central values of $x$. $CI_n$ for $y$ is always larger than $CI_n$ for $f$.

When $n$ is large:
- $\hat{f}$ converges to $f$;
- $CI_n$ of $f$ converges to a point at $f$;
- $CI_n$ of $y$ converges to the interval of a normal RV with mean $f$ and std. dev. $\sigma_\varepsilon$.
Example: linear regression

[Figure: two scatter plots of strength [MPa] vs. density [kg/m³] for the n = 40 samples; the right panel adds the fitted line $\hat{f} = \hat{\theta}_0 + \hat{\theta}_1 x$ and the confidence bounds. R² = 19%.]

$n = 40$; $\quad \hat{\theta}_0 = -274.4$ MPa; $\quad \hat{\theta}_1 = 0.1368$ MPa·m³/kg

$\alpha = 5\% \;\Rightarrow\; 1 - \alpha = 95\%$ confidence:
$CI_n(\theta_0) = \left[\hat{\theta}_0 - t_{\alpha/2,\,n-2}\,\widehat{\mathrm{se}}_0;\ \hat{\theta}_0 + t_{\alpha/2,\,n-2}\,\widehat{\mathrm{se}}_0\right] = [-500.9;\ -47.9]$ MPa
$CI_n(\theta_1) = \left[\hat{\theta}_1 - t_{\alpha/2,\,n-2}\,\widehat{\mathrm{se}}_1;\ \hat{\theta}_1 + t_{\alpha/2,\,n-2}\,\widehat{\mathrm{se}}_1\right] = [0.0442;\ 0.2295]$ MPa·m³/kg

$t_{\alpha/2,\,n-2}$: quantile of the t-distribution with $n - 2$ degrees of freedom.
$0 \notin CI_n(\theta_1) \;\Rightarrow\;$ reject $H_0$: $\theta_1 = 0$, with significance $\alpha = 5\%$.
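The slide does not report the standard errors $\widehat{\mathrm{se}}_0$, $\widehat{\mathrm{se}}_1$; the sketch below uses placeholder values backed out from the intervals printed above (so they are illustrative, not part of the original example) and shows how the Student-t intervals are assembled in R.

```r
# Student-t confidence intervals from estimates and standard errors
# (se0, se1 are hypothetical placeholders, backed out from the intervals on this slide).
n     <- 40
alpha <- 0.05
t_q   <- qt(1 - alpha / 2, df = n - 2)     # t_{alpha/2, n-2}, about 2.02 here

theta0_hat <- -274.4;  se0 <- 111.9        # [MPa]          (se0: assumed)
theta1_hat <-  0.1368; se1 <- 0.0458       # [MPa m^3/kg]   (se1: assumed)

theta0_hat + c(-1, 1) * t_q * se0          # ~ [-500.9, -47.9] MPa
theta1_hat + c(-1, 1) * t_q * se1          # ~ [0.044, 0.229]  MPa m^3/kg
```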

Summary

Linear regression is a simple and important method to investigate relations between variables. In linear regression, the relation between the regression function and the parameters is linear. Regression analysis is computationally simple.

From the dataset, we compute some sample moments and then the estimated parameters $\hat{\boldsymbol{\theta}}$ of the straight line:

dataset: $\{(x_i, y_i)\}_{i=1}^{n}$ ⇒ sample moments: $\hat{C}_{X,Y,n} = \overline{XY}_n - \bar{X}_n\bar{Y}_n$, $\;\hat{V}_{X,n} = \overline{X^2}_n - \bar{X}_n^2$ ⇒ estimated parameters: $\hat{\theta}_1 = \dfrac{\hat{C}_{X,Y,n}}{\hat{V}_{X,n}}$, $\;\hat{\theta}_0 = \bar{Y}_n - \hat{\theta}_1\bar{X}_n$ ⇒ estimated regression function: $\hat{f}(x) = \hat{\theta}_0 + \hat{\theta}_1 x$

We can also estimate the noise level $\hat{\sigma}_\varepsilon^2$.

We can assess the uncertainty of the estimators, e.g. via confidence bounds, and we can test hypotheses.
References and readings

Baron, chapter 11.1
Wasserman, chapters 13.1-13.4
Kottegoda, Rosso, chapter 6.1

https://en.wikipedia.org/wiki/Linear_regression
