
Machine Learning Course - CS-433

Least Squares

Sept 24, 2024

Martin Jaggi
Last updated on: September 24, 2024
credits to Mohammad Emtiyaz Khan & Rüdiger Urbanke
Motivation

In rare cases, one can compute the optimum of the cost function analytically. Linear regression using a mean-squared error cost function is one such case. Here the solution can be obtained explicitly, by solving a linear system of equations. These equations are sometimes called the normal equations. This method is one of the most popular methods for data fitting. It is called least squares.

To derive the normal equations, we first show that the problem is convex. We then use the optimality conditions for convex functions (see the previous lecture notes on optimization). I.e., at the optimum parameter, call it w*, it must be true that the gradient of the cost function is 0. I.e.,

\nabla L(w^\star) = 0.

This is a system of D equations.
Normal Equations

Recall that the cost function for linear regression with mean-squared error is given by

L(w) = \frac{1}{2N} \sum_{n=1}^{N} \big( y_n - x_n^\top w \big)^2
     = \frac{1}{2N} (y - Xw)^\top (y - Xw),

where

y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix},
\qquad
X = \begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1D} \\
x_{21} & x_{22} & \cdots & x_{2D} \\
\vdots & \vdots & \ddots & \vdots \\
x_{N1} & x_{N2} & \cdots & x_{ND}
\end{bmatrix}.
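As a quick numerical sanity check, here is a minimal NumPy sketch (the toy X, y, and w below are made up for illustration) confirming that the sum form and the matrix form of L(w) agree:

import numpy as np

# Toy data (made up for illustration): N = 5 samples, D = 2 features.
rng = np.random.default_rng(0)
N, D = 5, 2
X = rng.normal(size=(N, D))
y = rng.normal(size=N)
w = rng.normal(size=D)

# Sum form: (1 / 2N) * sum_n (y_n - x_n^T w)^2
L_sum = sum((y[n] - X[n] @ w) ** 2 for n in range(N)) / (2 * N)

# Matrix form: (1 / 2N) * (y - Xw)^T (y - Xw)
e = y - X @ w
L_mat = (e @ e) / (2 * N)

assert np.isclose(L_sum, L_mat)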

We claim that this cost function is convex in w. There are several ways of proving this:

1. Simplest way: observe that L is naturally represented as the sum (with positive coefficients) of the simple terms (y_n - x_n^T w)^2. Further, each of these simple terms is the composition of a linear function with a convex function (the square function). Therefore, each of these simple terms is convex and hence the sum is convex.

2. Directly verify the definition, that for any \lambda \in [0, 1] and any w, w',

L(\lambda w + (1 - \lambda) w') - \big( \lambda L(w) + (1 - \lambda) L(w') \big) \le 0.

Computation: using the identity \|\lambda a + (1 - \lambda) b\|^2 = \lambda \|a\|^2 + (1 - \lambda) \|b\|^2 - \lambda (1 - \lambda) \|a - b\|^2 with a = y - Xw and b = y - Xw', the left-hand side equals

- \frac{\lambda (1 - \lambda)}{2N} \, \| X (w - w') \|_2^2,

which indeed is non-positive.

3. We can compute the second derivative (the Hessian) and show that it is positive semidefinite (all its eigenvalues are non-negative). For the present case a computation shows that the Hessian has the form

\frac{1}{N} X^\top X.

This matrix is indeed positive semidefinite since its non-zero eigenvalues are the squares of the non-zero singular values of the matrix X. (A small numerical check follows this list.)
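To make the Hessian argument from item 3 concrete, here is a minimal NumPy check on made-up toy data that the eigenvalues of (1/N) X^T X are non-negative and equal to the squared singular values of X, scaled by 1/N:

import numpy as np

# Toy data (made up for illustration): N = 6 samples, D = 3 features.
rng = np.random.default_rng(1)
N, D = 6, 3
X = rng.normal(size=(N, D))

# Hessian of the MSE cost: (1/N) X^T X.
H = (X.T @ X) / N

# Eigenvalues of the symmetric Hessian: all non-negative (up to round-off).
eigvals = np.linalg.eigvalsh(H)
assert np.all(eigvals >= -1e-12)

# The non-zero eigenvalues are the squared singular values of X, scaled by 1/N.
sing_vals = np.linalg.svd(X, compute_uv=False)
assert np.allclose(np.sort(eigvals), np.sort(sing_vals ** 2 / N))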
Now that we know that the function is convex, let us find its minimum. If we take the gradient of this expression with respect to the weight vector w, we get

\nabla L(w) = -\frac{1}{N} X^\top (y - Xw).

If we set this expression to 0, we get the normal equations for linear regression,

X^\top \underbrace{(y - Xw)}_{\text{error}} = 0.

This is a system of D equations in D variables.
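A minimal sketch (toy data, made up for illustration) that verifies the analytic gradient above against central finite differences:

import numpy as np

# Toy data (made up for illustration).
rng = np.random.default_rng(2)
N, D = 8, 3
X = rng.normal(size=(N, D))
y = rng.normal(size=N)
w = rng.normal(size=D)

def cost(w):
    e = y - X @ w
    return (e @ e) / (2 * N)

# Analytic gradient: -(1/N) X^T (y - Xw).
grad_analytic = -(X.T @ (y - X @ w)) / N

# Central finite differences, one coordinate at a time.
eps = 1e-6
grad_numeric = np.zeros(D)
for d in range(D):
    step = np.zeros(D)
    step[d] = eps
    grad_numeric[d] = (cost(w + step) - cost(w - step)) / (2 * eps)

assert np.allclose(grad_analytic, grad_numeric, atol=1e-6)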
Geometric Interpretation

The error is orthogonal to all columns of X.

The span of X is the space spanned by the columns of X. Every element of the span can be written as u = Xw for some choice of w. Which element of span(X) shall we take? The normal equations tell us that the optimum choice for u, call it u*, is that element so that y - u* is orthogonal to span(X). In other words, we should pick u* to be equal to the projection of y onto span(X).

The following figure illustrates this:

[Figure (taken from Bishop's book): the target y ∈ R^N is projected onto the subspace S = span(X); the projection Xw* is the element of the span closest to y, i.e., the minimizer of ‖y - Xw‖.]
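A minimal numerical illustration of this orthogonality (toy data, made up for illustration): the residual at the least-squares solution is orthogonal to every column of X.

import numpy as np

# Toy data (made up for illustration).
rng = np.random.default_rng(3)
N, D = 10, 3
X = rng.normal(size=(N, D))
y = rng.normal(size=N)

# Least-squares solution via the normal equations (X assumed to have full column rank).
w_star = np.linalg.solve(X.T @ X, X.T @ y)

# The residual (error) at the optimum is orthogonal to every column of X.
residual = y - X @ w_star
assert np.allclose(X.T @ residual, np.zeros(D), atol=1e-8)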
Least Squares

The matrix X^T X ∈ R^{D×D} is called the Gram matrix. If it is invertible, we can multiply the normal equations by the inverse of the Gram matrix from the left to get a closed-form expression for the minimum:

w^\star = (X^\top X)^{-1} X^\top y.

We can use this model to predict a new value for an unseen datapoint (test point) x_m:

\hat{y}_m := x_m^\top w^\star = x_m^\top (X^\top X)^{-1} X^\top y.
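A minimal sketch of the closed-form solution and prediction in NumPy (toy data, made up for illustration; in practice one solves the linear system rather than forming the inverse, see the Additional Notes):

import numpy as np

# Toy data (made up for illustration): N samples, D features.
rng = np.random.default_rng(4)
N, D = 20, 4
X = rng.normal(size=(N, D))
y = rng.normal(size=N)

# Closed-form least-squares solution: solve the normal equations
# (X^T X) w = X^T y rather than forming the inverse explicitly.
w_star = np.linalg.solve(X.T @ X, X.T @ y)

# Prediction for an unseen test point x_m.
x_m = rng.normal(size=D)
y_hat_m = x_m @ w_star

# Cross-check against NumPy's dedicated least-squares routine.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(w_star, w_lstsq)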

Invertibility and Uniqueness

Note that the Gram matrix X^T X ∈ R^{D×D} is invertible if and only if X has full column rank, or in other words rank(X) = D.

Proof: To see this, assume first that rank(X) < D. Then there exists a non-zero vector u so that Xu = 0. It follows that X^T X u = 0, and so rank(X^T X) < D. Therefore, X^T X is not invertible.

Conversely, assume that X^T X is not invertible. Hence, there exists a non-zero vector v so that X^T X v = 0. It follows that

0 = v^\top X^\top X v = (Xv)^\top (Xv) = \| Xv \|^2.

This implies that Xv = 0, i.e., rank(X) < D.
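A small numerical illustration of this equivalence (toy data, made up for illustration): a rank-deficient X produces a singular Gram matrix.

import numpy as np

# Toy data (made up for illustration).
rng = np.random.default_rng(5)
N, D = 8, 3

# Full column rank: the Gram matrix X^T X is invertible.
X_full = rng.normal(size=(N, D))
assert np.linalg.matrix_rank(X_full) == D
assert np.linalg.matrix_rank(X_full.T @ X_full) == D

# Rank-deficient X: copy the first column into the third.
X_def = X_full.copy()
X_def[:, 2] = X_def[:, 0]
assert np.linalg.matrix_rank(X_def) == 2
assert np.linalg.matrix_rank(X_def.T @ X_def) == 2   # singular Gram matrix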


Rank Deficiency and Ill-Conditioning

Unfortunately, in practice, X is often rank deficient.

• If D > N (the over-parametrized setting), we always have rank(X) < D (since row rank = column rank), so the Gram matrix is not invertible.

• If D ≤ N, but some of the columns x_{:d} of X are (nearly) collinear, then the matrix is ill-conditioned, leading to numerical issues when solving the linear system.

Can we solve least squares if X is rank deficient? Yes, using a linear system solver applied to the normal equations

X^\top X w = X^\top y.
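One way this can look in practice (a sketch on made-up toy data, not prescribed by the notes): NumPy's np.linalg.lstsq solves the least-squares problem via the SVD even when X is rank deficient, returning the minimum-norm minimizer among the infinitely many solutions.

import numpy as np

# Rank-deficient toy example (made up for illustration): a duplicated column
# makes rank(X) < D, so the Gram matrix is singular.
rng = np.random.default_rng(6)
N, D = 10, 3
X = rng.normal(size=(N, D))
X[:, 2] = X[:, 1]
y = rng.normal(size=N)

# SVD-based solver; returns the minimum-norm least-squares solution.
w, residuals, rank, sing_vals = np.linalg.lstsq(X, y, rcond=None)
assert rank == 2

# Any minimizer satisfies the normal equations X^T (y - Xw) = 0.
assert np.allclose(X.T @ (y - X @ w), 0, atol=1e-8)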
Summary of Linear Regression

We have studied three types of methods:

1. Grid Search

2. Iterative Optimization Algorithms: (Stochastic) Gradient Descent

3. Least Squares: closed-form solution, for the linear MSE case
Additional Notes

Solving linear systems

There are many ways to solve a linear system Xw = y, but it usually involves a decomposition of the matrix X, such as the QR or LU decomposition, which are very robust. Matlab's backslash operator and also NumPy's linalg package implement this in just one line:

w = np.linalg.solve(X, y)

It is important to never invert a matrix to solve a linear system, as this would incur a cost at least three times the cost of using a linear solver. For more, see this blog post: https://gregorygundersen.com/blog/2020/12/09/matrix-inversion/.
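Note that np.linalg.solve expects a square coefficient matrix; in the least-squares setting with a tall (N × D) data matrix X, the square system to hand to it is the normal equations (X^T X) w = X^T y. A minimal sketch (toy data, for illustration only) contrasting the factorization-based solver with the discouraged explicit inverse:

import numpy as np

# Toy data (made up for illustration).
rng = np.random.default_rng(7)
N, D = 100, 5
X = rng.normal(size=(N, D))
y = rng.normal(size=N)

# The square system to solve is the normal equations (X^T X) w = X^T y.
A = X.T @ X
b = X.T @ y

w_solve = np.linalg.solve(A, b)   # preferred: factorization-based solver
w_inv = np.linalg.inv(A) @ b      # discouraged: forms the explicit inverse

# Both agree here, but the solver is cheaper and numerically more stable.
assert np.allclose(w_solve, w_inv)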

For a robust implementation, see Sec. 7.5.2 of Kevin Murphy’s book.

Closed-form solution for MAE

Can you derive a closed-form solution for the 1-parameter model when using the MAE cost function, i.e., when minimizing ‖y - Xw‖_1 (up to a constant factor)?
See this short article: http://www.johnmyleswhite.com/notebook/2013/03/22/modes-medians-and-means-an-unifying-perspective/.
