ML - Lec 5 - Regression - Gradient Descent Least Square
• Using our new notation, the basis function linear regression problem can be written as

    y = ∑_{j=0}^{k} w_j φ_j(x)

• The squared loss over the training set is (writing ŷ^i = ∑_j w_j φ_j(x^i) for the prediction on example i)

    J_{X,y}(w) = ∑_i ( y^i − ŷ^i )² = ∑_i ( y^i − ∑_j w_j φ_j(x^i) )²

• Taking the partial derivative with respect to w_j:

    ∂/∂w_j J(w) = ∂/∂w_j ∑_i ( y^i − ŷ^i )²
                = 2 ∑_i ( y^i − ŷ^i ) ∂/∂w_j ( y^i − ŷ^i )
                = −2 ∑_i ( y^i − ŷ^i ) ∂/∂w_j ∑_{j'} w_{j'} φ_{j'}(x^i)
                = −2 ∑_i ( y^i − ŷ^i ) φ_j(x^i)
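A minimal NumPy sketch of this gradient computation (not from the slides; the names `Phi`, `y`, and `w` are illustrative, with `Phi[i, j] = φ_j(x^i)`):

```python
import numpy as np

def squared_loss(Phi, y, w):
    """J(w) = sum_i (y^i - yhat^i)^2, with yhat^i = sum_j w_j * phi_j(x^i)."""
    residual = y - Phi @ w            # vector of (y^i - yhat^i)
    return np.sum(residual ** 2)

def squared_loss_grad(Phi, y, w):
    """dJ/dw_j = -2 * sum_i (y^i - yhat^i) * phi_j(x^i), for all j at once."""
    residual = y - Phi @ w
    return -2.0 * Phi.T @ residual    # shape (k+1,): one entry per basis function
```

A quick finite-difference check of `squared_loss_grad` against `squared_loss` is an easy way to confirm the sign.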
Gradient Descent for Linear Regression
Learning algorithm:
• Compute the gradient of the loss: ∂/∂w_j J(w) = −2 ∑_i ( y^i − ŷ^i ) φ_j(x^i)
• Stacking these partial derivatives (one per w_j) gives a vector g
• Update: w = w − λ g
• λ is the learning rate.
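A sketch of this learning algorithm as a batch gradient-descent loop, under the same assumed `Phi`/`y` setup; the step size and iteration count are arbitrary illustrative choices:

```python
import numpy as np

def gradient_descent(Phi, y, lam=1e-3, n_iters=1000):
    """Minimize the squared loss by repeating w <- w - lambda * g."""
    w = np.zeros(Phi.shape[1])               # one weight per basis function
    for _ in range(n_iters):
        g = -2.0 * Phi.T @ (y - Phi @ w)     # g_j = -2 * sum_i (y^i - yhat^i) * phi_j(x^i)
        w = w - lam * g                      # lam is the learning rate
    return w
```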
Gradient Descent for Linear Regression
We can use any of the usual tricks:
– stochastic gradient descent (if the data is too big to fit in memory; a sketch follows below)
– regularization
– …
Linear regression is a convex optimization problem, so again gradient descent will reach a global optimum.

    J_{X,y}(w) = ∑_i ( y^i − ŷ^i )² = ∑_i ( y^i − ∑_j w_j φ_j(x^i) )²

    ∂/∂w_j J(w) = −2 ∑_i ( y^i − ŷ^i ) φ_j(x^i)
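The stochastic variant mentioned above, sketched with single-example updates (the epoch count, step size, and shuffling scheme are illustrative assumptions, not prescribed by the slides):

```python
import numpy as np

def stochastic_gradient_descent(Phi, y, lam=1e-3, n_epochs=20, seed=0):
    """Update w using the gradient of one example's squared error at a time."""
    rng = np.random.default_rng(seed)
    n, k_plus_1 = Phi.shape
    w = np.zeros(k_plus_1)
    for _ in range(n_epochs):
        for i in rng.permutation(n):              # visit examples in random order
            residual = y[i] - Phi[i] @ w          # y^i - yhat^i for this example
            w = w + lam * 2.0 * residual * Phi[i] # w - lam * (-2 * residual * phi^i)
    return w
```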
Goal: minimize the loss function          predict with:  ŷ^i = ∑_j w_j φ_j(x^i)

    J_{X,y}(w) = ∑_i ( y^i − ŷ^i )² = ∑_i ( y^i − ∑_j w_j φ_j(x^i) )²

    ∂/∂w_j J(w) = −2 ∑_i ( y^i − ŷ^i ) φ_j(x^i)

Notation: collect the targets, the basis-function values, and the weights into

    y = [ y^1, …, y^n ]ᵀ                                      (n entries)

    Φ = [ φ_0(x^1)  φ_1(x^1)  …  φ_k(x^1) ]
        [ φ_0(x^2)  φ_1(x^2)  …  φ_k(x^2) ]
        [     ⋮          ⋮             ⋮  ]
        [ φ_0(x^n)  φ_1(x^n)  …  φ_k(x^n) ]                   (n examples × (k+1) basis vectors)

    w = [ w_0, …, w_k ]ᵀ                                      (k+1 entries)

The i-th row of Φ is written φ^i, so Φ = [ φ^1; … ; φ^n ].
Deriving the closed-form solution

Writing out one partial derivative per weight:

    ∂/∂w_0 J(w) = −2 ∑_i ( y^i − ŷ^i ) φ_0(x^i)
    …
    ∂/∂w_k J(w) = −2 ∑_i ( y^i − ŷ^i ) φ_k(x^i)

Shorthand notation: φ_ij ≡ φ_j(x^i), and recall that ŷ^i = ∑_j w_j φ_ij = φ^i w, where φ^i is the i-th row of Φ. The partial derivatives become

    ∂/∂w_0 J(w) = −2 ∑_i ( y^i φ_i0 − ŷ^i φ_i0 )
    …
    ∂/∂w_k J(w) = −2 ∑_i ( y^i φ_ik − ŷ^i φ_ik )
Substituting ŷ^i = φ^i w and stacking the k+1 partial derivatives into a single gradient vector:

    ∂/∂w_0 J(w) = −2 ∑_i ( y^i φ_i0 − φ^i w φ_i0 )
    …
    ∂/∂w_k J(w) = −2 ∑_i ( y^i φ_ik − φ^i w φ_ik )

    ∇J(w) = −2 Φᵀ y + 2 Φᵀ Φ w

Here Φᵀ is a (k+1) × n matrix, so Φᵀ y collects the ∑_i φ_ij y^i terms (k+1 entries) and Φᵀ Φ is a (k+1) × (k+1) matrix.

Setting the gradient to zero:

    −2 Φᵀ y + 2 Φᵀ Φ w = 0   ⇒   Φᵀ Φ w = Φᵀ y   ⇒   w = (Φᵀ Φ)⁻¹ Φᵀ y
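In practice the normal equations Φᵀ Φ w = Φᵀ y are solved with a linear solver rather than an explicit inverse; a minimal sketch:

```python
import numpy as np

def least_squares_closed_form(Phi, y):
    """Solve Phi^T Phi w = Phi^T y (equivalent to w = (Phi^T Phi)^-1 Phi^T y)."""
    return np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

# np.linalg.lstsq(Phi, y, rcond=None)[0] returns the same w and is more robust
# when Phi^T Phi is ill-conditioned.
```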
recap: Solving linear regression
• To optimize – closed form:
• We just take the derivative w.r.t. w and set it to 0:

    ∂/∂w ∑_i ( y_i − w x_i )² = −2 ∑_i x_i ( y_i − w x_i )  ⇒  ∑_i x_i y_i = ∑_i w x_i²

    ⇒  w = ∑_i x_i y_i / ∑_i x_i²

which equals covar(X,Y)/var(X) if mean(X) = mean(Y) = 0.
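A tiny numerical check of this 1D formula and its covariance interpretation on centered synthetic data (the data and the true slope of 3 are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
x = x - x.mean()                                 # mean(X) = 0
y = 3.0 * x + rng.normal(scale=0.1, size=100)
y = y - y.mean()                                 # mean(Y) = 0

w = np.sum(x * y) / np.sum(x ** 2)               # w = sum x_i y_i / sum x_i^2
w_cov = np.cov(x, y)[0, 1] / np.var(x, ddof=1)   # covar(X,Y) / var(X)
print(w, w_cov)                                  # same value, close to the slope 3
```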
The general case parallels the 1D recap: setting the derivative to zero gives

    2 ∑_i x_i y_i − 2 ∑_i w x_i x_i = 0          (1D case)

    Φᵀ Φ w = Φᵀ y                                (general basis-function case)

so again  w = (Φᵀ Φ)⁻¹ Φᵀ y.
LMS for general linear regression problem

    J(w) = ∑_i ( y^i − wᵀ φ(x^i) )²

Deriving w we get:  w = (Φᵀ Φ)⁻¹ Φᵀ y

where y is an n-entry vector, w is a (k+1)-entry vector, and Φ is an n × (k+1) matrix.
LMS solution:
+ Very simple in Matlab or something similar
- Requires a matrix inverse, which is expensive for a large matrix.

Gradient descent:
+ Fast for large matrices
+ Stochastic GD is very memory efficient
+ Easily extended to other cases
- Parameters to tweak (how to decide convergence? what is the learning rate? …)
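A small sanity check, on assumed synthetic data, that the LMS closed form and gradient descent reach the same weights:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 3
x = rng.uniform(-1, 1, size=n)
Phi = np.vstack([x ** j for j in range(k + 1)]).T        # polynomial basis
y = Phi @ np.array([0.5, -1.0, 2.0, 0.3]) + rng.normal(scale=0.05, size=n)

# Closed form (LMS solution)
w_lms = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

# Batch gradient descent
w_gd = np.zeros(k + 1)
for _ in range(20000):
    w_gd -= 1e-3 * (-2.0 * Phi.T @ (y - Phi @ w_gd))

print(np.allclose(w_lms, w_gd, atol=1e-3))               # should print True once GD has converged
```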
Regression and Overfitting
An example: polynomial basis vectors on a small dataset
– From Bishop Ch. 1
[Figure slides, from Bishop Ch. 1: 0th, 1st, 3rd, and 9th order polynomial fits to the same n = 10 data points]
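A sketch reproducing this kind of experiment numerically: fit polynomials of order 0, 1, 3, and 9 to n = 10 noisy samples of sin(2πx) and compare train/test RMSE (the noise level and test grid are illustrative choices in the spirit of Bishop's setup):

```python
import numpy as np

rng = np.random.default_rng(2)

def poly_design(x, order):
    return np.vstack([x ** j for j in range(order + 1)]).T

# n = 10 noisy training points from a sin(2*pi*x) curve, plus a clean test grid
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.2, size=10)
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)

for order in (0, 1, 3, 9):
    Phi = poly_design(x_train, order)
    w, *_ = np.linalg.lstsq(Phi, y_train, rcond=None)
    rmse_train = np.sqrt(np.mean((y_train - Phi @ w) ** 2))
    rmse_test = np.sqrt(np.mean((y_test - poly_design(x_test, order) @ w) ** 2))
    print(f"order {order}: train RMSE {rmse_train:.3f}, test RMSE {rmse_test:.3f}")
# Typically the 9th-order fit drives the training error to ~0 while the test error grows.
```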
Over-fitting
Regularization: add a penalty on the size of the weights to the loss:

    J_{X,y}(w) = (1/2) ∑_i ( y^i − ∑_j w_j φ_j(x^i) )² + (λ/2) ‖w‖²

[Table slide: polynomial coefficients under different amounts of regularization – none, exp(18), huge]
[Figure slide: over-regularization]
Regularized Gradient Descent for LR
Goal: minimize the loss function          predict with:  ŷ^i = ∑_j w_j φ_j(x^i)

    J_{X,y}(w) = (1/2) ∑_i ( y^i − ∑_j w_j φ_j(x^i) )² + (λ/2) ‖w‖²
              = (1/2) ∑_i ( y^i − ∑_j w_j φ_j(x^i) )² + (λ/2) ∑_j w_j²

    ∂/∂w_j J(w) = −∑_i ( y^i − ŷ^i ) φ_j(x^i) + λ w_j
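A sketch of the regularized (ridge) version: one gradient step using the derivative above, plus the standard closed form that this penalized loss admits (the closed form is not shown on the slides; `lam` and `step` are illustrative):

```python
import numpy as np

def ridge_gradient_step(Phi, y, w, lam, step):
    """One step of w <- w - step * dJ/dw with dJ/dw = -Phi^T (y - Phi w) + lam * w."""
    grad = -Phi.T @ (y - Phi @ w) + lam * w
    return w - step * grad

def ridge_closed_form(Phi, y, lam):
    """Minimizer of 1/2 * sum residual^2 + lam/2 * ||w||^2."""
    k_plus_1 = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(k_plus_1), Phi.T @ y)
```

Note that with the 1/2 in front of the squared error the factor of 2 cancels, matching the derivative on the slide.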
Probabilistic Interpretation of Least Squares

A probabilistic interpretation
Our least squares minimization solution can also be motivated by a probabilistic interpretation of the regression problem:

    y = wᵀ φ(x) + ε

where ε is Gaussian noise. Maximizing the likelihood of the training data under this model yields the same solution:

    w = (Φᵀ Φ)⁻¹ Φᵀ y
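A short sketch of the standard maximum-likelihood argument behind this interpretation (σ² denotes the assumed noise variance):

```latex
% Likelihood of the data under y^i = w^T phi(x^i) + eps, with eps ~ N(0, sigma^2)
p(y^1,\dots,y^n \mid X, w) = \prod_{i=1}^{n}
    \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\Big(-\frac{(y^i - w^T\phi(x^i))^2}{2\sigma^2}\Big)

% Taking the log turns the product into a sum
\log p = -\frac{n}{2}\log(2\pi\sigma^2)
         - \frac{1}{2\sigma^2}\sum_{i=1}^{n} \big(y^i - w^T\phi(x^i)\big)^2

% Maximizing over w is therefore equivalent to minimizing the squared loss
\sum_{i=1}^{n} \big(y^i - w^T\phi(x^i)\big)^2
\quad\Longrightarrow\quad
w_{\mathrm{ML}} = (\Phi^T\Phi)^{-1}\Phi^T y
```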
Bias-Variance
Example (figures courtesy of Tom Dietterich, Oregon State)