Linear Regression

Linear regression aims to model the relationship between an input variable X and an output variable Y using linear functions. It estimates the regression function f(x) as the conditional expectation of Y given X = x. The linear regression model assumes f(x, θ) = θ0 + θ1·x, where the parameters θ = (θ0, θ1) are estimated to minimize the sum of squared residuals between the observed and predicted Y values. If the joint distribution of X and Y is estimated to be multivariate normal, then the regression function f(x) is a linear function of x.



12/24-704: Probability and Estimation Methods for Engineering Systems

Lec. 21: Linear Regression

Instructor: Matteo Pozzi

Regression in practice

In MS Excel: just "add trendline" to a scatter graph, reporting the equation and the R-squared value.

[Figure: scatter plot of US GDP in T$ vs. year (2000-2025), with the fitted trendline y = 0.5781x - 1146.5, R² = 0.9751; that is, $y \cong \theta_1 x + \theta_0$.]

In a programming language such as R: call the routine for linear regression, and you get optimal estimators, standard errors, test statistics, p-values, ...

We will learn how to compute those values.

https://www.datacamp.com/community/tutorials/linear-regression-R
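As a quick illustration of the R workflow mentioned above, the sketch below fits a straight line with R's built-in `lm()` and prints the quantities we will learn to compute by hand. The data are synthetic stand-ins for the GDP series (assumed values), not the actual figures from the plot.

```r
# Minimal sketch: fit y = theta0 + theta1 * x by least squares with lm().
# The data are synthetic (assumed), roughly mimicking the GDP trend above.
set.seed(1)
year <- 2000:2023
gdp  <- 0.58 * year - 1146 + rnorm(length(year), sd = 0.4)

fit <- lm(gdp ~ year)   # ordinary least-squares straight-line fit
summary(fit)            # estimates, standard errors, t values, p-values, R-squared
coef(fit)               # theta0_hat (intercept) and theta1_hat (slope)
```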

Regression

So far, our problem was: given samples of one RV, estimate some parameters related to its distribution. In regression problems, instead, we have joint samples of (at least) two RVs, say $X$ and $Y$, and we aim at understanding the relation between these variables.

$X$: "input", $Y$: "output". How do we predict the output as a function of the input?

Because of randomness, we cannot predict $Y$ deterministically from $X$. But we can estimate the regression function $f(x) \triangleq \mathbb{E}[Y \mid X = x]$.

Task: from the dataset $\{(x_i, y_i)\}_{i=1}^{n}$, estimate $\hat{f}(x) \cong f(x)$ (dataset → analysis → regression function) and assess the uncertainty in the estimation.

Intuitive methods for regression

For a given $\tilde{x}$, how do we estimate $\hat{f}(\tilde{x}) \cong f(\tilde{x}) = \mathbb{E}[Y \mid X = \tilde{x}]$?

We could estimate $\hat{f}$ as the arithmetic average of the $y_i$ in the subset of pairs $(x_i, y_i)$ where $x_i = \tilde{x}$. But it is unlikely that we can find any sample exactly at $\tilde{x}$. Hence, we can define an interval $I = [\tilde{x} - \Delta;\ \tilde{x} + \Delta]$ around $\tilde{x}$ and average the $y_i$ of the samples whose $x_i$ falls inside $I$: this is an example of smoothing and of non-parametric regression.

In parametric regression, instead, we use all the data and assume a parametric form for $f$, e.g. $f(x, \boldsymbol{\theta}) = \theta_0 + \theta_1 x$, with $\boldsymbol{\theta} = [\theta_0 \;\; \theta_1]^T$. To estimate $\hat{f} \cong f$ is to estimate the parameters, $\hat{\boldsymbol{\theta}} \cong \boldsymbol{\theta}$, so that $\forall x:\ \hat{f}(x) \cong f(x, \hat{\boldsymbol{\theta}}) = \hat{\theta}_0 + \hat{\theta}_1 x$.

In linear regression, the relation between $\boldsymbol{\theta}$ and $f$ is linear: $f = \mathbf{v}^T\boldsymbol{\theta}$, for some $\mathbf{v}$ depending on $x$.
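The sketch below contrasts the two ideas on synthetic data (all values assumed): a non-parametric local average of the $y_i$ whose $x_i$ fall inside the interval $I$ around $\tilde{x}$, versus the parametric straight-line fit that uses all the data.

```r
# Non-parametric local averaging vs. parametric straight-line regression (synthetic data).
set.seed(1)
x <- runif(200)
y <- 0.3 + 0.8 * x + rnorm(200, sd = 0.1)

x_tilde <- 0.5; Delta <- 0.25                        # interval I = [x_tilde - Delta, x_tilde + Delta]
f_hat_local <- mean(y[abs(x - x_tilde) <= Delta])    # average of y_i with x_i inside I

fit <- lm(y ~ x)                                     # parametric: f(x, theta) = theta0 + theta1 * x
f_hat_param <- predict(fit, newdata = data.frame(x = x_tilde))

c(local = f_hat_local, parametric = unname(f_hat_param))
```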

Density estimation vs regression

Univariate density estimation: dataset $\{x_i\}_{i=1}^{n} \sim_{\mathrm{IID}} p_X$ ⇒ estimate $\hat{p}_X \cong p_X$;
parametric estimation: $\{x_i\}_{i=1}^{n} \mid \boldsymbol{\theta} \sim_{\mathrm{IID}} p_{X|\boldsymbol{\theta}}$ ⇒ estimate $\hat{\boldsymbol{\theta}} \cong \boldsymbol{\theta}$.

Bivariate density estimation: dataset $\{(x_i, y_i)\}_{i=1}^{n} \sim_{\mathrm{IID}} p_{X,Y}$ ⇒ estimate $\hat{p}_{X,Y} \cong p_{X,Y}$;
parametric estimation: $\{(x_i, y_i)\}_{i=1}^{n} \mid \boldsymbol{\theta} \sim_{\mathrm{IID}} p_{X,Y|\boldsymbol{\theta}}$ ⇒ estimate $\hat{\boldsymbol{\theta}} \cong \boldsymbol{\theta}$.

Regression: dataset $\{(x_i, y_i)\}_{i=1}^{n}$, with $Y \mid X \sim p_{Y|X}$ (conditional distribution);
parametric regression: $Y \mid X, \boldsymbol{\theta} \sim p_{Y|X,\boldsymbol{\theta}}$, i.e. $Y_i \mid x_i, \boldsymbol{\theta} \sim_{\mathrm{IID}} p_{Y|X=x_i,\boldsymbol{\theta}}$;
regression function (conditional mean): $f(x, \boldsymbol{\theta}) = \mathbb{E}[Y \mid X = x, \boldsymbol{\theta}]$.

Chain rule: $p_{X,Y} = p_X\, p_{Y|X}$. The marginal distribution $p_X$ can (also) be estimated from $\{x_i\}_{i=1}^{n}$, but regression is about estimating only $p_{Y|X}$ from $\{(x_i, y_i)\}_{i=1}^{n}$.

Basis of linear regression

Straight-line regression function (generating model):
$f(x, \boldsymbol{\theta}) = \theta_0 + \theta_1 x = [1 \;\; x]\,\boldsymbol{\theta}$, with $\boldsymbol{\theta} = [\theta_0 \;\; \theta_1]^T$

Model for the data $(x_i, y_i)$:
$Y_i = f(x_i, \boldsymbol{\theta}) + \varepsilon_i = \theta_0 + \theta_1 x_i + \varepsilon_i$

Noise $\varepsilon_i$: zero-mean, homoscedastic:
$\forall x \in \mathbb{R}: \quad \mathbb{E}[\varepsilon_i \mid X_i = x] = 0, \qquad \mathbb{V}[\varepsilon_i \mid X_i = x] = \sigma_\varepsilon^2$

Linear regression: from $\{(x_i, y_i)\}_{i=1}^{n}$, obtain $\hat{\boldsymbol{\theta}}, \hat{\sigma}_\varepsilon^2$. Inferred model:
$\hat{f}(x) \triangleq f(x, \hat{\boldsymbol{\theta}}) = \hat{\theta}_0 + \hat{\theta}_1 x, \qquad \hat{f}_i \triangleq \hat{f}(x_i) = f(x_i, \hat{\boldsymbol{\theta}}) = \hat{\theta}_0 + \hat{\theta}_1 x_i$

Residuals: $r_i = \hat{\varepsilon}_i = y_i - \hat{f}_i \cong \varepsilon_i$
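A small simulation of this generating model (all parameter values assumed) makes the roles of noise and residuals concrete: the residuals of the fitted line approximate the unobserved errors $\varepsilon_i$.

```r
# Simulate Y_i = theta0 + theta1 * x_i + eps_i with zero-mean homoscedastic noise (values assumed).
set.seed(0)
theta0 <- 1.0; theta1 <- 0.5; sigma_eps <- 0.3
x   <- runif(50, 0, 3)
eps <- rnorm(50, mean = 0, sd = sigma_eps)   # E[eps|x] = 0, V[eps|x] = sigma_eps^2, for all x
y   <- theta0 + theta1 * x + eps             # observed data (x_i, y_i)

fit <- lm(y ~ x)              # inferred model f_hat
r   <- residuals(fit)         # residuals r_i = y_i - f_hat_i, approximating eps_i
head(cbind(eps, r))
```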

From MVN to linear regression

One approach to regression would be:
1. estimate the joint density, then
2. derive the conditional distribution (and its mean) from that.

If the estimated joint density is Multi-Variate Normal (MVN), then the regression function is linear.

$\{(x_i, y_i)\}_{i=1}^{n}$ (dataset) ⇒ $\hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\Sigma}}$ (parameters of the MVN) ⇒ $\hat{f}(x) \cong f(x)$ (regression function)

[Figure: left, scatter of the dataset $(x_i, y_i)$ in the $(X, Y)$ plane with the contours of the estimated density $\hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\Sigma}}$; right, the estimated regression line $\hat{f}$.]

Bivariate Normal RVs, recap

$\mathbf{Z} = \begin{bmatrix} X \\ Y \end{bmatrix} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma}) \;\Leftrightarrow\; p_{\mathbf{Z}}(\mathbf{z}) = \dfrac{1}{2\pi\sqrt{|\boldsymbol{\Sigma}|}} \exp\left[-\dfrac{1}{2}(\mathbf{z} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{z} - \boldsymbol{\mu})\right]$

Parameters are moments: $\boldsymbol{\mu} = \begin{bmatrix} \mu_X \\ \mu_Y \end{bmatrix}$; $\quad \boldsymbol{\Sigma} = \begin{bmatrix} \sigma_X^2 & \rho\sigma_X\sigma_Y \\ \rho\sigma_X\sigma_Y & \sigma_Y^2 \end{bmatrix}$

[Figure: contour plot of a bivariate normal density with $\mu_X = 0.55$, $\mu_Y = 0.55$, $\sigma_X = 0.13$, $\sigma_Y = 0.10$, $\rho = 0.87$.]

Contour lines are ellipses, centered at the mean $\boldsymbol{\mu}$.
Estimating a MVN model

Dataset $\{(x_i, y_i)\}_{i=1}^{n} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma} \sim_{\mathrm{IID}} \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$; vector notation: $\mathbf{z}_i = \begin{bmatrix} x_i \\ y_i \end{bmatrix}$.

Log-likelihood function:
$l_n(\boldsymbol{\mu}, \boldsymbol{\Sigma}) = -\dfrac{n}{2}(\bar{\mathbf{z}}_n - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\bar{\mathbf{z}}_n - \boldsymbol{\mu}) - \dfrac{n-1}{2}\,\mathrm{tr}\!\left(\boldsymbol{\Sigma}^{-1}\mathbf{S}_n\right) - \dfrac{n}{2}\log|\boldsymbol{\Sigma}|$

with sample average vector: $\bar{\mathbf{z}}_n = \dfrac{1}{n}\sum_{i=1}^{n}\mathbf{z}_i = \begin{bmatrix} \bar{X}_n \\ \bar{Y}_n \end{bmatrix}$

and sample covariance matrix: $\mathbf{S}_n = \dfrac{1}{n-1}\sum_{i=1}^{n}(\mathbf{z}_i - \bar{\mathbf{z}}_n)(\mathbf{z}_i - \bar{\mathbf{z}}_n)^T = \begin{bmatrix} \hat{V}_{X,n} & \hat{C}_{X,Y,n} \\ \hat{C}_{X,Y,n} & \hat{V}_{Y,n} \end{bmatrix}$ (sample variances on the diagonal, sample covariance off the diagonal).

MLE: $\hat{\boldsymbol{\mu}} = \bar{\mathbf{z}}_n$; $\quad \hat{\boldsymbol{\Sigma}} = \dfrac{n-1}{n}\,\mathbf{S}_n$
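A sketch of these MLE formulas, using the parameter values of the bivariate-normal recap above as the (assumed) ground truth: the estimated mean is the sample average vector, and the estimated covariance is the sample covariance matrix rescaled by $(n-1)/n$.

```r
# MLE of a bivariate normal: mu_hat = sample mean, Sigma_hat = (n-1)/n * S_n.
library(MASS)                                   # mvrnorm() for sampling from a MVN
set.seed(7)
n  <- 500
mu <- c(0.55, 0.55)
Sg <- matrix(c(0.13^2,             0.87 * 0.13 * 0.10,
               0.87 * 0.13 * 0.10, 0.10^2), nrow = 2)
z  <- mvrnorm(n, mu, Sg)                        # rows: (x_i, y_i)

mu_hat    <- colMeans(z)                        # MLE of the mean vector
S_n       <- cov(z)                             # sample covariance (1/(n-1) normalization)
Sigma_hat <- (n - 1) / n * S_n                  # MLE of the covariance matrix
mu_hat; Sigma_hat
```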
MVN: Conditional distributions

The conditional distribution is normal: $Y \mid X = x \sim \mathcal{N}\!\left(\mu_{Y|x},\, \sigma^2_{Y|X}\right)$, with

- conditional mean $\mathbb{E}[Y \mid x] = \mu_{Y|x} = \mu_Y + \rho\,\dfrac{\sigma_Y}{\sigma_X}(x - \mu_X)$: the regression function $f(x) = \mu_{Y|x}$ is linear in $x$;
- conditional variance $\mathbb{V}[Y \mid x] = \sigma^2_{Y|X} = (1 - \rho^2)\,\sigma_Y^2$: invariant with respect to $x$.

The straight line $\mu_{Y|x} = \mathrm{argmax}_y\, p_{X,Y}(x, y)$ passes through all the maxima of the joint density along vertical slices.

[Figure: contour plot of the joint density, with the conditional-mean line and the 95% confidence interval band for $Y \mid x$.]

This is related to linear regression: observing $x$, inferring $Y$.
Estimating coefficients for linear regression in MVN

Regression function: $f(x) = \theta_0 + \theta_1 x = \mathbb{E}[Y \mid X = x] = \mu_{Y|X=x} = \mu_Y + \rho\,\dfrac{\sigma_Y}{\sigma_X}(x - \mu_X)$

Parameters of the regression function: $\theta_1 = \rho\,\dfrac{\sigma_Y}{\sigma_X} = \dfrac{\mathbb{C}[X, Y]}{\mathbb{V}[X]}$; $\quad \theta_0 = \mu_Y - \rho\,\dfrac{\sigma_Y}{\sigma_X}\,\mu_X = \mu_Y - \theta_1\mu_X$

Sample averages: $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} x_i \cong \mu_X$; $\quad \bar{Y}_n = \frac{1}{n}\sum_{i=1}^{n} y_i \cong \mu_Y$

Sample variances: $\hat{V}_{X,n} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{X}_n)^2 \cong \mathbb{V}[X] = \sigma_X^2$; $\quad \hat{V}_{Y,n} = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{Y}_n)^2 \cong \mathbb{V}[Y] = \sigma_Y^2$

Sample covariance: $\hat{C}_{X,Y,n} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{X}_n)(y_i - \bar{Y}_n) \cong \mathbb{C}[X, Y] = \rho\sigma_X\sigma_Y$

Estimated parameters: $\hat{\theta}_1 = \dfrac{\hat{C}_{X,Y,n}}{\hat{V}_{X,n}} = \dfrac{\sum_{i=1}^{n}(x_i - \bar{X}_n)(y_i - \bar{Y}_n)}{\sum_{i=1}^{n}(x_i - \bar{X}_n)^2}$; $\quad \hat{\theta}_0 = \bar{Y}_n - \hat{\theta}_1\bar{X}_n$
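These closed-form expressions can be checked directly against R's `lm()` on synthetic data (a sketch, all values assumed): the slope is the ratio of sample covariance to sample variance of $x$, and the intercept follows from the sample means.

```r
# Closed-form moment estimates vs. the coefficients returned by lm() (synthetic data).
set.seed(2)
x <- rnorm(50, mean = 2, sd = 1.5)
y <- 1.0 + 0.7 * x + rnorm(50, sd = 0.5)

theta1_hat <- cov(x, y) / var(x)             # C_hat_{X,Y,n} / V_hat_{X,n}
theta0_hat <- mean(y) - theta1_hat * mean(x)

c(theta0_hat, theta1_hat)
coef(lm(y ~ x))                              # same values
```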

Understanding the estimated coefficients

The regression line passes through the average point $\boldsymbol{\mu} = (\mu_X, \mu_Y)$.

Exact regression function: $f(x) = \mu_Y + \rho\,\dfrac{\sigma_Y}{\sigma_X}(x - \mu_X)$
$\;\Rightarrow\; \dfrac{f - \mu_Y}{\sigma_Y} = \rho\,\dfrac{x - \mu_X}{\sigma_X}$
$\;\Rightarrow\; g = \rho\, u_X$, with $u_X = \dfrac{x - \mu_X}{\sigma_X}$, $g = \dfrac{f - \mu_Y}{\sigma_Y}$
(in standardized coordinates, the regression line has slope $\rho$).

The estimated regression line passes through the sample average point $\mathbf{m} = (\bar{X}_n, \bar{Y}_n)$.

Estimated regression function: $\hat{f}(x) = \hat{\theta}_0 + \hat{\theta}_1 x$
$\;\Rightarrow\; \dfrac{\hat{f} - \bar{Y}_n}{\hat{\sigma}_{Y,n}} = \hat{\rho}\,\dfrac{x - \bar{X}_n}{\hat{\sigma}_{X,n}}$
$\;\Rightarrow\; \hat{g} = \hat{\rho}\,\hat{u}_X$, with $\hat{u}_X = \dfrac{x - \bar{X}_n}{\hat{\sigma}_{X,n}}$, $\hat{g} = \dfrac{\hat{f} - \bar{Y}_n}{\hat{\sigma}_{Y,n}}$

Linear regression and MLE

The joint probability of $X$ and $Y$ depends on parameters:
$\boldsymbol{\theta}$: parameters of the regression function;
$\sigma_\varepsilon^2$: noise variance;
$\boldsymbol{\eta}_X$: parameters of the marginal distribution $p_X$ (but we do not care about it, in regression).

Chain rule: $p(X, Y \mid \boldsymbol{\theta}, \sigma_\varepsilon^2, \boldsymbol{\eta}_X) = p(Y \mid X, \boldsymbol{\theta}, \sigma_\varepsilon^2)\; p(X \mid \boldsymbol{\eta}_X)$
(joint prob. = conditional prob. × marginal prob.)

[Figure: graphical model with $x_i$, $\boldsymbol{\theta}$ and $\sigma_\varepsilon^2$ as parents of $Y_i$, for each observation $i$.]

Likelihood for observation $i$, $(x_i, y_i)$: $\mathcal{L}_i(\boldsymbol{\theta}, \sigma_\varepsilon^2) = p_{Y|X,\boldsymbol{\theta},\sigma_\varepsilon^2}(y_i \mid x_i, \boldsymbol{\theta}, \sigma_\varepsilon^2)$

Conditional independence: $i \ne j \;\Rightarrow\; Y_i \perp Y_j \mid x_i, x_j, \boldsymbol{\theta}, \sigma_\varepsilon^2$

Global likelihood for the dataset $\{(x_i, y_i)\}_{i=1}^{n}$: $\mathcal{L}_n(\boldsymbol{\theta}, \sigma_\varepsilon^2) = \prod_{i=1}^{n}\mathcal{L}_i(\boldsymbol{\theta}, \sigma_\varepsilon^2)$

Linear regression and MLE, cont.

Log-likelihood for observation $i$, $(x_i, y_i)$: $l_i(\boldsymbol{\theta}, \sigma_\varepsilon^2) = \log\mathcal{L}_i(\boldsymbol{\theta}, \sigma_\varepsilon^2)$

Global log-likelihood for the dataset $\{(x_i, y_i)\}_{i=1}^{n}$: $l_n(\boldsymbol{\theta}, \sigma_\varepsilon^2) = \sum_{i=1}^{n} l_i(\boldsymbol{\theta}, \sigma_\varepsilon^2)$

MLE: $\hat{\boldsymbol{\theta}}, \hat{\sigma}_\varepsilon^2 = \mathrm{argmax}\; l_n(\boldsymbol{\theta}, \sigma_\varepsilon^2)$

Function $l_n$ is defined on the domain of the straight-line parameters $\boldsymbol{\theta}$ and of the noise level $\sigma_\varepsilon^2$, and it depends on the dataset $\{(x_i, y_i)\}_{i=1}^{n}$. The estimators are the optimal values $\hat{\boldsymbol{\theta}}, \hat{\sigma}_\varepsilon^2$.

MLE for linear regression under Normal errors, I

Likelihood for normal errors: $Y \mid X, \boldsymbol{\theta}, \sigma_\varepsilon^2 \sim \mathcal{N}\!\left(f(X, \boldsymbol{\theta}),\, \sigma_\varepsilon^2\right)$

$p_{Y|X,\boldsymbol{\theta},\sigma_\varepsilon^2}(y_i \mid x_i, \boldsymbol{\theta}, \sigma_\varepsilon^2) = \mathcal{N}\!\left(y_i;\, f(x_i, \boldsymbol{\theta}), \sigma_\varepsilon^2\right) = \dfrac{1}{\sigma_\varepsilon}\,\varphi\!\left(\dfrac{y_i - f(x_i, \boldsymbol{\theta})}{\sigma_\varepsilon}\right) = \mathcal{N}\!\left(f(x_i, \boldsymbol{\theta});\, y_i, \sigma_\varepsilon^2\right)$ (by symmetry)

[Figure: left, the generating model, line $f$ with errors $\varepsilon_i$ between $f(x_i)$ and the observations $y_i$; right, the inferred model, line $\hat{f}$ with residuals $r_i$.]

The inferred line should pass close to the data. The best trend line minimizes a "penalty" proportional to the squared residuals $r_i^2$.

Residual Sum of Squares: $\mathrm{rss}_n \triangleq \sum_{i=1}^{n} r_i^2$. Find the $\boldsymbol{\theta}$ which minimizes $\mathrm{rss}_n$.

MLE for linear regression under Normal errors, II

Likelihood for normal errors: $Y \mid X, \boldsymbol{\theta}, \sigma_\varepsilon^2 \sim \mathcal{N}\!\left(f(X, \boldsymbol{\theta}),\, \sigma_\varepsilon^2\right)$, i.e.
$p_{Y|X,\boldsymbol{\theta},\sigma_\varepsilon^2}(y_i \mid x_i, \boldsymbol{\theta}, \sigma_\varepsilon^2) = \mathcal{N}\!\left(y_i;\, f(x_i, \boldsymbol{\theta}), \sigma_\varepsilon^2\right) = \dfrac{1}{\sigma_\varepsilon}\,\varphi\!\left(\dfrac{y_i - f(x_i, \boldsymbol{\theta})}{\sigma_\varepsilon}\right)$

Log-likelihood, individual observation:
$l_i(\boldsymbol{\theta}, \sigma_\varepsilon^2) = \log p_{Y|X,\boldsymbol{\theta},\sigma_\varepsilon^2}(y_i \mid x_i, \boldsymbol{\theta}, \sigma_\varepsilon^2) = -\log\sigma_\varepsilon - \dfrac{\left[y_i - f(x_i, \boldsymbol{\theta})\right]^2}{2\sigma_\varepsilon^2} + \text{const.}$

Global log-likelihood:
$l_n(\boldsymbol{\theta}, \sigma_\varepsilon^2) = \sum_{i=1}^{n} l_i(\boldsymbol{\theta}, \sigma_\varepsilon^2) = -n\log\sigma_\varepsilon - \dfrac{1}{2\sigma_\varepsilon^2}\sum_{i=1}^{n}\left[y_i - f(x_i, \boldsymbol{\theta})\right]^2 + \text{const.}$

Residual Sum of Squares: $\mathrm{rss}_n(\boldsymbol{\theta}) \triangleq \sum_{i=1}^{n}\left[y_i - f(x_i, \boldsymbol{\theta})\right]^2$, so that
$l_n(\boldsymbol{\theta}, \sigma_\varepsilon^2) = -n\log\sigma_\varepsilon - \dfrac{1}{2\sigma_\varepsilon^2}\,\mathrm{rss}_n(\boldsymbol{\theta})$

MLE: first find $\hat{\boldsymbol{\theta}} = \mathrm{argmax}_{\boldsymbol{\theta}}\, l_n = \mathrm{argmin}_{\boldsymbol{\theta}}\, \mathrm{rss}_n$ ($\sigma_\varepsilon^2$ is irrelevant in this step);
then find $\hat{\sigma}_\varepsilon^2 = \mathrm{argmax}_{\sigma_\varepsilon^2}\, l_n(\hat{\boldsymbol{\theta}}, \sigma_\varepsilon^2)$.

MLE for linear regression under Normal errors, III

RSS is a quadratic form of $\boldsymbol{\theta}$: $\mathrm{rss}_n(\boldsymbol{\theta}) = \sum_{i=1}^{n}\left[y_i - f(x_i, \boldsymbol{\theta})\right]^2 = \sum_{i=1}^{n}\left(y_i - \theta_0 - \theta_1 x_i\right)^2$

To minimize the rss, we set the gradient to zero, $\nabla\,\mathrm{rss}_n(\boldsymbol{\theta}) = \mathbf{0}$, obtaining a linear system of 2 equations in 2 unknowns:

$\dfrac{\partial\,\mathrm{rss}_n}{\partial\theta_j} = -\sum_{i=1}^{n} 2\left[y_i - f(x_i, \boldsymbol{\theta})\right]\dfrac{\partial f(x_i, \boldsymbol{\theta})}{\partial\theta_j} = -\sum_{i=1}^{n} 2\left[y_i - (\theta_0 + \theta_1 x_i)\right]x_i^{\,j}$, since $\dfrac{\partial f(x, \boldsymbol{\theta})}{\partial\theta_j} = x^{\,j}$, i.e. $\dfrac{\partial f}{\partial\theta_0} = 1$, $\dfrac{\partial f}{\partial\theta_1} = x$.

$\dfrac{\partial\,\mathrm{rss}_n}{\partial\theta_0} = -\sum_{i=1}^{n} 2\left[y_i - (\theta_0 + \theta_1 x_i)\right] = 0 \;\Leftrightarrow\; \sum_{i=1}^{n} y_i - n\hat{\theta}_0 - \hat{\theta}_1\sum_{i=1}^{n} x_i = 0 \;\Leftrightarrow\; \hat{\theta}_0 = \bar{Y}_n - \hat{\theta}_1\bar{X}_n$

$\dfrac{\partial\,\mathrm{rss}_n}{\partial\theta_1} = -\sum_{i=1}^{n} 2\left[y_i - (\theta_0 + \theta_1 x_i)\right]x_i = 0 \;\Leftrightarrow\; \sum_{i=1}^{n} x_i y_i - \hat{\theta}_0\sum_{i=1}^{n} x_i - \hat{\theta}_1\sum_{i=1}^{n} x_i^2 = 0$

MLE for linear regression under Normal errors, IV

$\sum_{i=1}^{n} x_i y_i - \hat{\theta}_0\sum_{i=1}^{n} x_i - \hat{\theta}_1\sum_{i=1}^{n} x_i^2 = 0 \;\Leftrightarrow\; \overline{XY}_n - \bar{X}_n\bar{Y}_n + \hat{\theta}_1\bar{X}_n^2 - \hat{\theta}_1\overline{X^2}_n = 0$
(for any quantity $f$, $\bar{f}_n \triangleq \sum_{i=1}^{n} f_i / n$ denotes its sample average; with this notation, $\hat{C}_{X,Y,n} = \overline{XY}_n - \bar{X}_n\bar{Y}_n$ and $\hat{V}_{X,n} = \overline{X^2}_n - \bar{X}_n^2$ are the $1/n$-normalized sample moments, and the normalization cancels in the ratio below)

$\;\Leftrightarrow\; \hat{C}_{X,Y,n} = \hat{\theta}_1\hat{V}_{X,n} \;\Leftrightarrow\; \hat{\theta}_1 = \dfrac{\hat{C}_{X,Y,n}}{\hat{V}_{X,n}}, \qquad \hat{\theta}_0 = \bar{Y}_n - \hat{\theta}_1\bar{X}_n$

These are closed-form expressions for estimating the parameters of the fitting line.
[Figure: contours of $\mathrm{rss}_n$ over the $(\theta_0, \theta_1)$ plane, with the minimum at $(\hat{\theta}_0, \hat{\theta}_1)$.]

MLE for $\sigma_\varepsilon^2$: $l_n(\hat{\boldsymbol{\theta}}, \sigma_\varepsilon^2) = -n\log\sigma_\varepsilon - \dfrac{1}{2\sigma_\varepsilon^2}\,\widehat{\mathrm{rss}}_n$, with $\widehat{\mathrm{rss}}_n \triangleq \mathrm{rss}_n(\hat{\boldsymbol{\theta}}) = \sum_{i=1}^{n}\left(y_i - \hat{f}_i\right)^2$

$\dfrac{\partial l_n}{\partial\sigma_\varepsilon^2} = -\dfrac{n}{2\sigma_\varepsilon^2} + \dfrac{1}{2\sigma_\varepsilon^4}\,\widehat{\mathrm{rss}}_n = 0 \;\Leftrightarrow\; \hat{\sigma}_\varepsilon^2 = \dfrac{\widehat{\mathrm{rss}}_n}{n}$

This noise-level estimator is biased (dividing by $n-2$ instead of $n$ gives the unbiased estimator).
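The two noise-level estimators can be compared on synthetic data (a sketch, values assumed): the MLE divides $\widehat{\mathrm{rss}}_n$ by $n$ and is biased, while dividing by $n-2$ gives the unbiased estimate, whose square root is what `summary(lm(...))$sigma` reports as the residual standard error.

```r
# Biased (MLE, divide by n) vs. unbiased (divide by n - 2) noise-level estimates.
set.seed(3)
n <- 30
x <- runif(n, 0, 3)
y <- 0.5 + 0.4 * x + rnorm(n, sd = 0.3)

fit <- lm(y ~ x)
rss <- sum(residuals(fit)^2)                  # rss_n(theta_hat)

sigma2_mle      <- rss / n                    # biased MLE
sigma2_unbiased <- rss / (n - 2)              # unbiased (two parameters calibrated)
c(sqrt(sigma2_mle), sqrt(sigma2_unbiased), summary(fit)$sigma)
```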

Example: graph of regression

[Figure: two datasets, with n = 20 (few samples) and n = 200 (many samples). Top row: samples $(x_i, y_i)$ with the true regression line $f$ and the estimated line $\hat{f}$; with many samples, $\hat{f} \cong f$. Middle row: absolute residuals $|r_i|$ vs. $x_i$, whose spread gives the estimated noise level ($\hat{\sigma}_\varepsilon \cong \sigma_\varepsilon$ for large $n$). Bottom row: contours of $\mathrm{rss}_n$ over $(\theta_0, \theta_1)$, whose minimizer satisfies $\hat{\boldsymbol{\theta}} \cong \boldsymbol{\theta}$ for large $n$.]

Uncertainty of the estimator

Distribution of the estimator. Generating model: $Y = f(x, \boldsymbol{\theta}) + \varepsilon = \theta_0 + \theta_1 x + \varepsilon$, or, in vector notation, $\mathbf{Y} = \mathbf{f}(\mathbf{x}, \boldsymbol{\theta}) + \boldsymbol{\varepsilon} = \theta_0 + \theta_1\mathbf{x} + \boldsymbol{\varepsilon}$.

$\mathbb{E}[\hat{\boldsymbol{\theta}} \mid \boldsymbol{\theta}, \mathbf{x}, \sigma_\varepsilon^2] = \boldsymbol{\theta}$: the estimators are unbiased.

$\mathbb{V}[\hat{\boldsymbol{\theta}} \mid \boldsymbol{\theta}, \mathbf{x}, \sigma_\varepsilon^2] = \boldsymbol{\Sigma}_{\hat{\Theta}} = \dfrac{\sigma_\varepsilon^2}{n\,\hat{V}_{X,n}}\begin{bmatrix} \overline{X^2}_n & -\bar{X}_n \\ -\bar{X}_n & 1 \end{bmatrix} \cong \dfrac{\hat{\sigma}_\varepsilon^2}{n\,\hat{V}_{X,n}}\begin{bmatrix} \overline{X^2}_n & -\bar{X}_n \\ -\bar{X}_n & 1 \end{bmatrix}$ [proven in the next lecture]

Uncertainty is proportional to $1/n$, hence the estimators are consistent.

$\hat{\boldsymbol{\theta}} \mid \boldsymbol{\theta}, \mathbf{x}, \sigma_\varepsilon^2 \sim \mathcal{N}(\boldsymbol{\theta}, \boldsymbol{\Sigma}_{\hat{\Theta}})$: $\hat{\boldsymbol{\theta}}$ is normally distributed, if $\sigma_\varepsilon^2$ is known.

$\hat{\boldsymbol{\theta}} \mid \boldsymbol{\theta}, \mathbf{x} \approx \mathcal{N}(\boldsymbol{\theta}, \boldsymbol{\Sigma}_{\hat{\Theta}})$: $\hat{\boldsymbol{\theta}}$ is approximately normally distributed if $\sigma_\varepsilon^2$ is estimated, but it is actually Student's t distributed.

$(n-2)\,\dfrac{\hat{\sigma}_\varepsilon^2}{\sigma_\varepsilon^2} \sim \chi^2_{n-2}$: $\hat{\sigma}_\varepsilon^2$ is (proportional to a) chi-squared distributed variable, with $n-2$ degrees of freedom because two parameters, $\theta_0$ and $\theta_1$, have been calibrated.

Uncertainty of the estimator, II

Distribution of the estimator, with the generating model $\mathbf{Y} = \mathbf{f}(\mathbf{x}, \boldsymbol{\theta}) + \boldsymbol{\varepsilon} = \theta_0 + \theta_1\mathbf{x} + \boldsymbol{\varepsilon}$.

Standard errors:
$\widehat{\mathrm{se}}_0 \triangleq \widehat{\mathrm{se}}[\hat{\Theta}_0] = \dfrac{\hat{\sigma}_\varepsilon}{\sqrt{n}}\,\dfrac{\sqrt{\overline{X^2}_n}}{\hat{\sigma}_{X,n}}$, with $\hat{\Theta}_0 \mid \theta_0, \mathbf{x}, \sigma_\varepsilon^2 \sim \mathcal{N}(\theta_0, \mathrm{se}_0^2)$
$\widehat{\mathrm{se}}_1 \triangleq \widehat{\mathrm{se}}[\hat{\Theta}_1] = \dfrac{\hat{\sigma}_\varepsilon}{\sqrt{n}}\,\dfrac{1}{\hat{\sigma}_{X,n}}$, with $\hat{\Theta}_1 \mid \theta_1, \mathbf{x}, \sigma_\varepsilon^2 \sim \mathcal{N}(\theta_1, \mathrm{se}_1^2)$

With estimated $\sigma_\varepsilon^2$: $\dfrac{\hat{\Theta}_j - \theta_j}{\widehat{\mathrm{se}}_j} \sim t_{n-2} \cong \varphi$ for large $n$; use Student's t for small $n$.

Analysis of the standard errors. Special case: if $\bar{X}_n = 0$, then $\widehat{\mathrm{se}}_0 = \dfrac{\hat{\sigma}_\varepsilon}{\sqrt{n}}$ and $\widehat{\mathrm{se}}_1 = \dfrac{\hat{\sigma}_\varepsilon}{\sqrt{n}\,\hat{\sigma}_{X,n}} = \dfrac{\widehat{\mathrm{se}}_0}{\hat{\sigma}_{X,n}}$.

Errors decay with $1/\sqrt{n}$; the error in the slope $\hat{\theta}_1$ also decays with $1/\hat{\sigma}_{X,n}$.
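A sketch (synthetic data, all values assumed) checking these standard-error formulas against the values printed by `summary(lm(...))`; with the $1/n$ normalization of the sample moments, the two computations agree.

```r
# Standard errors of intercept and slope: closed-form formulas vs. summary(lm(...)).
set.seed(4)
n <- 40
x <- rnorm(n, 1, 2)
y <- -0.2 + 0.9 * x + rnorm(n, sd = 0.6)

fit       <- lm(y ~ x)
sigma_eps <- sqrt(sum(residuals(fit)^2) / (n - 2))   # unbiased noise-level estimate
sx        <- sqrt(mean((x - mean(x))^2))             # sigma_hat_{X,n} (1/n normalization)

se1 <- sigma_eps / (sqrt(n) * sx)                    # standard error of the slope
se0 <- sigma_eps * sqrt(mean(x^2)) / (sqrt(n) * sx)  # standard error of the intercept

rbind(manual = c(se0, se1),
      lm     = summary(fit)$coefficients[, "Std. Error"])
```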

Confidence Intervals, hypothesis testing, p-value

Normal approximation:
for $\theta_0$: $CI_n(\theta_0) = \left[\hat{\theta}_0 - z_{\alpha/2}\,\widehat{\mathrm{se}}_0;\ \hat{\theta}_0 + z_{\alpha/2}\,\widehat{\mathrm{se}}_0\right]$
for $\theta_1$: $CI_n(\theta_1) = \left[\hat{\theta}_1 - z_{\alpha/2}\,\widehat{\mathrm{se}}_1;\ \hat{\theta}_1 + z_{\alpha/2}\,\widehat{\mathrm{se}}_1\right]$
(use Student's t if the dataset is small)

Typical test: is $X$ affecting $Y$?
$H_0$: $\forall x\;\; \mathbb{E}[Y \mid X = x] = \mathbb{E}[Y]$, i.e. $f(x) = f_0 = \theta_0 \;\Leftrightarrow\; \theta_1 = 0$

If $0 \in CI_n(\theta_1)$, then $H_0$ is retained, at significance level $\alpha$; if $0 \notin CI_n(\theta_1)$, $H_0$ is rejected.

p-value (normal approximation): $\mathcal{P} = 2\,\Phi\!\left(-\left|\hat{\theta}_1\right| / \widehat{\mathrm{se}}_1\right)$, since $\hat{\Theta}_1 / \widehat{\mathrm{se}}_1 \mid H_0 \sim t_{n-2} \cong \varphi$ for large $n$.
$\mathcal{P} < \alpha \;\Rightarrow\;$ reject $H_0$.
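The sketch below (synthetic data, generated with $\theta_1 = 0$ so that $H_0$ is true, all values assumed) builds the normal-approximation interval and p-value for the slope by hand, and compares them with R's exact Student-t intervals from `confint()`.

```r
# Confidence interval and p-value for the test H0: theta_1 = 0 (synthetic data).
set.seed(5)
x <- runif(25, 0, 3)
y <- 0.8 + 0.0 * x + rnorm(25, sd = 0.5)     # generated with theta_1 = 0

fit   <- lm(y ~ x)
est   <- summary(fit)$coefficients           # Estimate, Std. Error, t value, Pr(>|t|)
alpha <- 0.05

ci_slope <- est["x", "Estimate"] + c(-1, 1) * qnorm(1 - alpha / 2) * est["x", "Std. Error"]
p_slope  <- 2 * pnorm(-abs(est["x", "Estimate"]) / est["x", "Std. Error"])

ci_slope                      # contains 0 here, so H0 is retained
p_slope                       # > alpha, same conclusion
confint(fit, level = 0.95)    # exact Student-t intervals, for comparison
```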
Prediction of new values of $f$ and $y$, given $x$

Predicting the/a value of $y$ at $x = x_*$: $y_* = y(x_*) = f(x_*, \boldsymbol{\theta}) + \varepsilon_* = \theta_0 + \theta_1 x_* + \varepsilon_* = f_* + \varepsilon_*$, with $f_* = f(x_*, \boldsymbol{\theta})$

Point prediction: $\hat{y}_* = \hat{f}_* = \hat{\theta}_0 + \hat{\theta}_1 x_* = \mathbf{v}_*^T\hat{\boldsymbol{\theta}}$, with $\mathbf{v}_* = \begin{bmatrix} 1 \\ x_* \end{bmatrix}$

Squared standard error:
$\widehat{\mathrm{se}}^2[f_*] = \mathbf{v}_*^T\,\boldsymbol{\Sigma}_{\hat{\Theta}}\,\mathbf{v}_* = \dfrac{\hat{\sigma}_\varepsilon^2}{n\,\hat{V}_{X,n}}\,[1 \;\; x_*]\begin{bmatrix} \overline{X^2}_n & -\bar{X}_n \\ -\bar{X}_n & 1 \end{bmatrix}\begin{bmatrix} 1 \\ x_* \end{bmatrix} = \dfrac{\hat{\sigma}_\varepsilon^2}{n\,\hat{V}_{X,n}}\left(\overline{X^2}_n - 2\bar{X}_n x_* + x_*^2\right)$
$= \dfrac{\hat{\sigma}_\varepsilon^2}{n\,\hat{V}_{X,n}}\left(\overline{X^2}_n - \bar{X}_n^2 + \bar{X}_n^2 - 2\bar{X}_n x_* + x_*^2\right) = \dfrac{\hat{\sigma}_\varepsilon^2}{n}\left[1 + \dfrac{(\bar{X}_n - x_*)^2}{\hat{V}_{X,n}}\right]$

$\widehat{\mathrm{se}}^2[y_*] = \widehat{\mathrm{se}}^2[f_*] + \hat{\sigma}_\varepsilon^2$   [$\widehat{\mathrm{se}}^2[f_*]$ is minimum at $x_* = \bar{X}_n$, where $\widehat{\mathrm{se}}^2[f_*] = \hat{\sigma}_\varepsilon^2 / n$]

Confidence intervals:
for $f_*$: $CI_n(f_*) = \left[\hat{f}_* - z_{\alpha/2}\,\widehat{\mathrm{se}}[f_*];\ \hat{f}_* + z_{\alpha/2}\,\widehat{\mathrm{se}}[f_*]\right]$
for $y_*$: $CI_n(y_*) = \left[\hat{f}_* - z_{\alpha/2}\,\widehat{\mathrm{se}}[y_*];\ \hat{f}_* + z_{\alpha/2}\,\widehat{\mathrm{se}}[y_*]\right]$
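A sketch (synthetic data, assumed) of the two intervals via `predict.lm()`: the confidence interval for $f_*$, narrowest at $x_* = \bar{X}_n$, and the wider prediction interval for a new observation $y_*$, which also carries the noise $\hat{\sigma}_\varepsilon^2$.

```r
# Confidence interval for f* vs. prediction interval for y* at a new x* (synthetic data).
set.seed(6)
x <- runif(30, -1, 3)
y <- 0.2 + 0.6 * x + rnorm(30, sd = 0.4)
fit <- lm(y ~ x)

new <- data.frame(x = c(mean(x), 3))                       # at the sample mean and near the edge
predict(fit, new, interval = "confidence", level = 0.95)   # CI for f*: narrowest at x* = mean(x)
predict(fit, new, interval = "prediction", level = 0.95)   # CI for y*: wider, adds sigma_eps^2
```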
Example of 95% confidence bounds

[Figure: two panels, n = 20 and n = 200, showing the data, the actual regression line $f$ (black straight line), the estimated line $\hat{f}$, and the 95% confidence bounds $CI_n(f_*)$ and $CI_n(y_*)$.]

The black straight line represents the actual regression line $f$. From a dataset with $n$ samples, we estimate the function $\hat{f}$ and the noise level $\hat{\sigma}_\varepsilon^2$. Using the normal assumption, we get a CI for $f$ and for $y$ at any $x$.

$CI_n$ for $f$ is smaller for central values of $x$. $CI_n$ for $y$ is always larger than $CI_n$ for $f$.

When $n$ is large:
- $\hat{f}$ converges to $f$;
- $CI_n$ of $f$ converges to a point at $f$;
- $CI_n$ of $y$ converges to the interval of a normal RV with mean $f$ and std. dev. $\sigma_\varepsilon$.
Example: linear regression

[Figure: two scatter plots of strength [MPa] vs. density [kg/m³] for the n = 40 samples; the right panel adds the fitted line $\hat{f} = \hat{\theta}_0 + \hat{\theta}_1 x$ and the confidence bounds. R² = 19%.]

$n = 40$; $\quad \hat{\theta}_0 = -274.4$ MPa; $\quad \hat{\theta}_1 = 0.1368$ MPa·m³/kg

$\alpha = 5\% \;\Rightarrow\; 1 - \alpha = 95\%$ confidence:
$CI_n(\theta_0) = \left[\hat{\theta}_0 - t_{\alpha/2,\,n-2}\,\widehat{\mathrm{se}}_0;\ \hat{\theta}_0 + t_{\alpha/2,\,n-2}\,\widehat{\mathrm{se}}_0\right] = [-500.9;\ -47.9]$ MPa
$CI_n(\theta_1) = \left[\hat{\theta}_1 - t_{\alpha/2,\,n-2}\,\widehat{\mathrm{se}}_1;\ \hat{\theta}_1 + t_{\alpha/2,\,n-2}\,\widehat{\mathrm{se}}_1\right] = [0.0442;\ 0.2295]$ MPa·m³/kg

$t_{\alpha/2,\,n-2}$: quantile of the t-distribution with $n - 2$ degrees of freedom.
$0 \notin CI_n(\theta_1) \;\Rightarrow\;$ reject $H_0$: $\theta_1 = 0$, with significance $\alpha = 5\%$.
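The slide does not report the standard errors $\widehat{\mathrm{se}}_0$, $\widehat{\mathrm{se}}_1$; the sketch below uses placeholder values backed out from the intervals printed above (so they are illustrative, not part of the original example) and shows how the Student-t intervals are assembled in R.

```r
# Student-t confidence intervals from estimates and standard errors
# (se0, se1 are hypothetical placeholders, backed out from the intervals on this slide).
n     <- 40
alpha <- 0.05
t_q   <- qt(1 - alpha / 2, df = n - 2)     # t_{alpha/2, n-2}, about 2.02 here

theta0_hat <- -274.4;  se0 <- 111.9        # [MPa]          (se0: assumed)
theta1_hat <-  0.1368; se1 <- 0.0458       # [MPa m^3/kg]   (se1: assumed)

theta0_hat + c(-1, 1) * t_q * se0          # ~ [-500.9, -47.9] MPa
theta1_hat + c(-1, 1) * t_q * se1          # ~ [0.044, 0.229]  MPa m^3/kg
```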

Summary

Linear regression is a simple and important method to investigate relations between variables. In linear regression, the relation between the regression function and the parameters is linear. Regression analysis is computationally simple.

From the dataset, we compute some sample moments and then the estimated parameters $\hat{\boldsymbol{\theta}}$ of the straight line:

dataset: $\{(x_i, y_i)\}_{i=1}^{n}$ ⇒ sample moments: $\hat{C}_{X,Y,n} = \overline{XY}_n - \bar{X}_n\bar{Y}_n$, $\;\hat{V}_{X,n} = \overline{X^2}_n - \bar{X}_n^2$ ⇒ estimated parameters: $\hat{\theta}_1 = \dfrac{\hat{C}_{X,Y,n}}{\hat{V}_{X,n}}$, $\;\hat{\theta}_0 = \bar{Y}_n - \hat{\theta}_1\bar{X}_n$ ⇒ estimated regression function: $\hat{f}(x) = \hat{\theta}_0 + \hat{\theta}_1 x$

We can also estimate the noise level $\hat{\sigma}_\varepsilon^2$.

We can assess the uncertainty of the estimators, e.g. via confidence bounds, and we can test hypotheses.
References and readings

Baron, chapter 11.1
Wasserman, chapters 13.1-13.4
Kottegoda, Rosso, chapter 6.1

https://en.wikipedia.org/wiki/Linear_regression
