Lecture 3
go to board
[Figures: three scatter plots, Response/Output (y-axis) vs. Predictor/Input (x-axis)]
\[
\begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix}
=
\begin{pmatrix} X_1 & 1 \\ \vdots & \vdots \\ X_n & 1 \end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_0 \end{pmatrix}
\]
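A minimal sketch of solving this matrix form numerically (NumPy assumed; the data values here are made up for illustration):

```python
import numpy as np

# Hypothetical data (made up for illustration)
X = np.array([1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11.])
Y = np.array([12., 14., 15., 17., 18., 21., 22., 24., 26., 27., 29.])

# Design matrix: a column for the slope (X) and a column of ones for the
# intercept, matching the coefficient order (beta_1, beta_0) above
A = np.column_stack([X, np.ones_like(X)])

# Least-squares solution of Y ~ A @ beta
beta, *_ = np.linalg.lstsq(A, Y, rcond=None)
b1, b0 = beta
print(f"b1 (slope) = {b1:.3f}, b0 (intercept) = {b0:.3f}")
```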
[Figure: scatter plot, Response/Output (y-axis) vs. Predictor/Input (x-axis)]
• Residual: $e_i = Y_i - \hat{Y}_i$
• The sum of the residuals is zero:
\[
\sum_i e_i = \sum_i (Y_i - b_0 - b_1 X_i) = \sum_i Y_i - n b_0 - b_1 \sum_i X_i = 0
\]
by the first normal equation.
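A quick numerical check of this zero-sum property (a sketch with made-up data, assuming NumPy; $b_0$ and $b_1$ come from the usual closed-form least-squares formulas):

```python
import numpy as np

# Same made-up data as the sketch above
X = np.array([1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11.])
Y = np.array([12., 14., 15., 17., 18., 21., 22., 24., 26., 27., 29.])

# Closed-form least-squares coefficients (from the normal equations)
b1 = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean()) ** 2).sum()
b0 = Y.mean() - b1 * X.mean()

# Residuals e_i = Y_i - Yhat_i; their sum is zero up to floating-point error
e = Y - (b0 + b1 * X)
print(f"sum of residuals = {e.sum():.2e}")
```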
$\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$
• Estimator, a function of the samples: $\hat{\theta} = f(\{Y_1, \dots, Y_n\})$
• Unknown quantity / parameter: $\theta$
• Definition: Bias of estimator
$B(\hat{\theta}) = E(\hat{\theta}) - \theta$
[Figure: density plot illustrating estimator bias; run bias_example_plot.m]
Distribution of Estimator
• If the estimator is a function of the samples and the distribution of the samples is known, then the distribution of the estimator can (often) be determined.
– Methods
• Distribution (CDF) functions
• Transformations
• Moment generating functions
• Jacobians (change of variable)
• Example: $Y_i \sim \mathrm{Normal}(\mu, \sigma^2)$, with $\theta = \mu$ and $\hat{\theta} = \bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$
• Bias: $B(\hat{\theta}) = E(\hat{\theta}) - \theta = 0$
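A one-line check of the zero-bias claim, using only linearity of expectation (the step is standard, though not spelled out on the slide):
\[
E(\bar{Y}) = E\!\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(Y_i) = \frac{1}{n}\cdot n\mu = \mu ,
\]
so $B(\bar{Y}) = E(\bar{Y}) - \mu = 0$.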
• Remember:
$V(cY) = c^2 V(Y)$
$V\!\left(\sum_{i=1}^{n} Y_i\right) = \sum_{i=1}^{n} V(Y_i)$, only if the $Y_i$ are independent with finite variance
• Note assumptions
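Under those assumptions the two rules combine to give the variance of the sample mean (a standard consequence, added here for completeness):
\[
V(\bar{Y}) = V\!\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n^{2}}\sum_{i=1}^{n} V(Y_i) = \frac{n\sigma^{2}}{n^{2}} = \frac{\sigma^{2}}{n} .
\]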
[Figure: histogram of the estimator computed from 1000 samples]
Bias Variance Trade-off
• The mean squared error of an estimator: $\mathrm{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2]$
• Can be re-expressed as variance plus squared bias: $\mathrm{MSE}(\hat{\theta}) = V(\hat{\theta}) + B(\hat{\theta})^2$
• $E(\mathrm{MSE}) = \sigma^2$ (the regression mean square error is unbiased for the error variance)
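The re-expression follows by adding and subtracting $E(\hat{\theta})$ inside the square (standard derivation, not shown on the slide):
\[
\mathrm{MSE}(\hat{\theta}) = E\big[(\hat{\theta} - E(\hat{\theta}) + E(\hat{\theta}) - \theta)^2\big]
= \underbrace{E\big[(\hat{\theta} - E(\hat{\theta}))^2\big]}_{V(\hat{\theta})}
+ \underbrace{\big(E(\hat{\theta}) - \theta\big)^2}_{B(\hat{\theta})^2},
\]
where the cross term $2\big(E(\hat{\theta}) - \theta\big)E\big[\hat{\theta} - E(\hat{\theta})\big]$ vanishes.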
• Examples
– $\theta \sim \mathrm{Poisson}(\lambda)$
– $z \sim G(\theta)$
• Likelihood of an i.i.d. sample: $L(\{X_i\}_{i=1}^{n}, \Theta) = \prod_{i=1}^{n} F(X_i; \Theta)$
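As a concrete instance of this product form, here is a sketch that evaluates the likelihood of an i.i.d. Poisson sample on a grid of rates (sample values and grid are illustrative assumptions; SciPy assumed):

```python
import numpy as np
from scipy.stats import poisson

# Hypothetical i.i.d. Poisson sample
x = np.array([3, 7, 4, 6, 5, 4, 8, 5])

# L({x_i}, lam) = prod_i f(x_i; lam), evaluated over a grid of candidate rates
lams = np.linspace(0.5, 12.0, 500)
L = np.array([poisson.pmf(x, lam).prod() for lam in lams])

# The grid maximizer approximates the Poisson MLE, which is the sample mean
print(f"grid argmax ~ {lams[L.argmax()]:.2f}, sample mean = {x.mean():.2f}")
```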
• Likelihood of the normal simple regression model:
\[
L = \prod_{i=1}^{n} \frac{1}{(2\pi\sigma^2)^{1/2}} \, e^{-\frac{1}{2\sigma^2}(Y_i - \beta_0 - \beta_1 X_i)^2}
  = \frac{1}{(2\pi\sigma^2)^{n/2}} \, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(Y_i - \beta_0 - \beta_1 X_i)^2}
\]
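Because $\beta_0$ and $\beta_1$ enter only through the exponent, maximizing this likelihood is equivalent to minimizing the sum of squared residuals. A numerical sketch of that equivalence (made-up data; SciPy assumed):

```python
import numpy as np
from scipy.optimize import minimize

# Made-up data
X = np.array([1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11.])
Y = np.array([12., 14., 15., 17., 18., 21., 22., 24., 26., 27., 29.])

# Negative log-likelihood in (beta_0, beta_1); with sigma held fixed it is,
# up to constants, half the sum of squared residuals
def nll(beta):
    b0, b1 = beta
    return 0.5 * ((Y - b0 - b1 * X) ** 2).sum()

# Numerical maximum likelihood vs. the closed-form least-squares line
mle_b0, mle_b1 = minimize(nll, x0=[0.0, 0.0]).x
ls_b1, ls_b0 = np.polyfit(X, Y, deg=1)  # polyfit returns highest degree first
print(f"MLE:           b0 = {mle_b0:.3f}, b1 = {mle_b1:.3f}")
print(f"Least squares: b0 = {ls_b0:.3f}, b1 = {ls_b1:.3f}")
```

The two fits agree, which is the point of the equivalence: for fixed $\sigma$, maximum likelihood under normal errors reproduces ordinary least squares.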