2 Linear Regression

The document provides an overview of classical machine learning algorithms, focusing on supervised and unsupervised learning, with a detailed explanation of linear regression. It describes the components of supervised learning, including the network function, loss function, and optimization algorithm, and outlines the optimization process for linear regression using gradient descent. Additionally, it discusses the closed-form solution for minimizing the loss function in linear regression.


Classical machine learning algorithms

• Supervised Learning
• Generic equation

• Three main ingredients of any supervised algorithm
• Network function
• Maps the features to the labels
• Loss function
• Quantifies the deviation between the actual label and the corresponding prediction
• Optimization algorithm
• Technique used to minimize the loss (see the sketch after this list)

• Unsupervised Learning

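A minimal sketch of how the three supervised-learning ingredients above fit together for the linear-regression case developed in the following slides (NumPy is assumed; the function names and the learning-rate value are illustrative, not from the slides):

```python
import numpy as np

def network(X, w):
    """Network function: map features to predictions, y_hat_i = w^T x_i (X has shape (d, N))."""
    return X.T @ w

def loss(y, y_hat):
    """Loss function: quantify the deviation between actual and predicted labels."""
    return 0.5 * np.sum((y - y_hat) ** 2)

def gd_step(w, X, y, lr=0.01):
    """Optimization algorithm: one gradient-descent step on the loss."""
    grad = -X @ (y - network(X, w))
    return w - lr * grad
```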
Linear Regression
• We aim at predicting a continuous target value given an input feature vector.
• We assume a d-dimensional feature vector is denoted by $x \in \mathbb{R}^d$, while $y \in \mathbb{R}$ is the output variable.
• The hypothesis function is defined by $\hat{y} = w^T x + w_0$
• Geometrically, when d = 1, $\hat{y} = w x + w_0$ is actually a line in a 2D plane
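A minimal NumPy sketch of this hypothesis function (the name `predict` and the example numbers are illustrative):

```python
import numpy as np

def predict(x, w, w0):
    """Hypothesis: y_hat = w^T x + w_0 for a single d-dimensional feature vector x."""
    return w @ x + w0

# Example with d = 2
x = np.array([1.5, -0.3])
w = np.array([2.0, 0.5])
w0 = 1.0
print(predict(x, w, w0))  # 2.0*1.5 + 0.5*(-0.3) + 1.0 = 3.85
```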
Linear Regression

Slope: $w$; Intercept: $w_0$ in $\hat{y} = w^T x + w_0$

• The intercept term $w_0$ is often called the bias parameter
• This terminology derives from the point of view that the output of the transformation is biased toward being $w_0$ in the absence of any input.
• This term is different from the idea of a statistical bias, in which a statistical estimation algorithm's expected estimate of a quantity is not equal to the true quantity.
Optimization: Linear Regression
• Mapping of the $i$-th instance: $\hat{y}_i = w^T x_i + w_0$

• Deviation in approximation for the $i$-th instance: $|y_i - \hat{y}_i|$

• Overall deviation: $J = \sum_{i=1}^{N} |y_i - \hat{y}_i|$
Optimization: Linear Regression
• Overall deviation: $J = \sum_{i=1}^{N} |y_i - \hat{y}_i|$

• Optimization problem: $\arg\min_{w, w_0} J$

• Gradient descent will be inapplicable, as the absolute value is a non-differentiable statistic.
Optimization: Linear Regression
• We minimize the square of the deviation instead of the absolute value.

• Gradient descent
1. Start with an initial guess for $w$, say $w^{(0)}$
• Gradient descent lets you initialize the starting point of the search for the minimum anywhere
2. Iterate until convergence:
• Compute the gradient of $J$ w.r.t. the linear coefficients at step $t$
• Update $w^{(t)}$ to get $w^{(t+1)}$ by taking a step in the opposite direction of the gradient

Eventually you will reach the minimum.
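A minimal NumPy sketch of this procedure, using the vectorized gradient derived on the later slides; the learning rate, tolerance, and iteration cap are illustrative choices, and $X$ is assumed to already contain the row of ones so that $w_0$ is part of $w$:

```python
import numpy as np

def gradient_descent(X, y, lr=0.01, tol=1e-6, max_iters=10_000):
    """Minimize J(w) = 0.5 * ||y - X^T w||^2 for X of shape (d, N)."""
    w = np.zeros(X.shape[0])                  # 1. initial guess w^(0)
    for t in range(max_iters):                # 2. iterate until convergence
        grad = -X @ (y - X.T @ w)             #    gradient of J w.r.t. w at step t
        w_next = w - lr * grad                #    step opposite to the gradient
        if np.linalg.norm(w_next - w) < tol:  #    stop once the update is negligible
            return w_next
        w = w_next
    return w
```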


Optimization: Linear Regression
$$\arg\min_{w_0, w_1, \ldots, w_d} J = \frac{1}{2} \sum_{i=1}^{N} \left( y_i - \left( w^T x_i + w_0 \right) \right)^2$$

In matrix form, each prediction is the row vector $w^T$ times the column $x_i$ of the data matrix, plus the intercept:

$$\hat{y}_i = w^T x_i + w_0, \qquad X \in \mathbb{R}^{d \times N}$$
Optimization: Linear Regression
$$\arg\min_{w_0, w_1, \ldots, w_d} J = \frac{1}{2} \sum_{i=1}^{N} \left( y_i - \left( w^T x_i + w_0 \right) \right)^2$$

Absorbing the intercept: append a constant feature $x_{0i} = 1$ to every instance, so that

$$\hat{y}_i = w^T x_i, \qquad w = (w_0, w_1, \ldots, w_d)^T, \qquad X \in \mathbb{R}^{(d+1) \times N}$$
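A short sketch of this vectorized setup; the data values are made up for illustration, while the row-of-ones augmentation and the loss formula follow the slide:

```python
import numpy as np

# Illustrative data: N = 4 instances, d = 2 features each (columns are instances).
X_raw = np.array([[0.5, 1.0, 1.5, 2.0],
                  [1.0, 0.0, 2.0, 1.0]])
y = np.array([2.0, 1.5, 4.0, 3.0])

# Absorb the intercept: prepend a row of ones so that w[0] plays the role of w_0.
X = np.vstack([np.ones(X_raw.shape[1]), X_raw])   # shape (d + 1, N)

def loss(w, X, y):
    """J(w) = 0.5 * sum_i (y_i - w^T x_i)^2 = 0.5 * ||y - X^T w||^2."""
    residual = y - X.T @ w
    return 0.5 * residual @ residual

print(loss(np.zeros(X.shape[0]), X, y))   # 0.5 * (4 + 2.25 + 16 + 9) = 15.625
```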
Optimization: Linear Regression
$$\arg\min_{w} J = \frac{1}{2} \sum_{i=1}^{N} \left( y_i - w^T x_i \right)^2 = \frac{1}{2} \left\lVert y - X^T w \right\rVert^2$$

$$\arg\min_{w} J = \frac{1}{2} \left[ \left( y_1 - w^T x_1 \right)^2 + \left( y_2 - w^T x_2 \right)^2 + \ldots + \left( y_N - w^T x_N \right)^2 \right]$$

By the chain rule,

$$\frac{\partial \left( y_1 - w^T x_1 \right)^2}{\partial w_0} = \frac{\partial \left( y_1 - w^T x_1 \right)^2}{\partial \left( y_1 - w^T x_1 \right)} \times \frac{\partial \left( y_1 - w^T x_1 \right)}{\partial w_0} = -2 \left( y_1 - w^T x_1 \right) x_{01}$$
Optimization: Linear Regression
$$\arg\min_{w} J = \frac{1}{2} \left[ \left( y_1 - w^T x_1 \right)^2 + \left( y_2 - w^T x_2 \right)^2 + \ldots + \left( y_N - w^T x_N \right)^2 \right]$$

$$\frac{\partial \left( y_1 - w^T x_1 \right)^2}{\partial w_0} = \frac{\partial \left( y_1 - w^T x_1 \right)^2}{\partial \left( y_1 - w^T x_1 \right)} \times \frac{\partial \left( y_1 - w^T x_1 \right)}{\partial w_0} = -2 \left( y_1 - w^T x_1 \right) x_{01}$$

$$\frac{\partial J}{\partial w_0} = \frac{1}{2} \left[ -2 \left( y_1 - w^T x_1 \right) x_{01} - 2 \left( y_2 - w^T x_2 \right) x_{02} - \ldots - 2 \left( y_N - w^T x_N \right) x_{0N} \right]$$
Optimization: Linear Regression
$$\frac{\partial J}{\partial w_0} = - \left[ \left( y_1 - w^T x_1 \right) x_{01} + \left( y_2 - w^T x_2 \right) x_{02} + \ldots + \left( y_N - w^T x_N \right) x_{0N} \right]$$

In matrix form, this is minus the $0$-th row of $X$ (the row of ones) times the residual vector $y - X^T w$:

$$\frac{\partial J}{\partial w_0} = - X_{0 \cdot} \left( y - X^T w \right)$$
Optimization: Linear Regression

$$\frac{\partial J}{\partial w_0} = - X_{0 \cdot} \left( y - X^T w \right)$$

$$\frac{\partial J}{\partial w_j} = - X_{j \cdot} \left( y - X^T w \right)$$

$$\frac{\partial J}{\partial w} = - X \left( y - X^T w \right)$$
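A quick check of the vectorized gradient against a central finite-difference approximation (all data values are randomly generated for illustration):

```python
import numpy as np

def J(w, X, y):
    return 0.5 * np.sum((y - X.T @ w) ** 2)

def gradient(w, X, y):
    """dJ/dw = -X (y - X^T w)."""
    return -X @ (y - X.T @ w)

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))   # 3 rows of X (e.g. ones row plus 2 features), 5 instances
y = rng.normal(size=5)
w = rng.normal(size=3)

eps = 1e-6
numeric = np.array([(J(w + eps * e, X, y) - J(w - eps * e, X, y)) / (2 * eps)
                    for e in np.eye(3)])
print(np.allclose(gradient(w, X, y), numeric, atol=1e-4))   # expect True
```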
Optimization: Closed form Solution
At the point of minimization, the gradient of the loss function with respect to the model parameters is zero, indicating that the best fit has been found:

$$\frac{\partial J}{\partial w} = - X \left( y - X^T w \right) = 0$$

Solving for $w$ gives the normal equations $X X^T w = X y$, i.e. the closed-form solution $w = \left( X X^T \right)^{-1} X y$ (assuming $X X^T$ is invertible).
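A minimal sketch of this closed-form solution; np.linalg.solve is used instead of an explicit inverse, and the data are synthetic so the known coefficients should be recovered exactly:

```python
import numpy as np

def closed_form(X, y):
    """Solve the normal equations X X^T w = X y for w."""
    return np.linalg.solve(X @ X.T, X @ y)

# Illustrative check: noiseless data generated from known coefficients.
rng = np.random.default_rng(1)
w_true = np.array([1.0, -2.0, 0.5])                       # [w_0, w_1, w_2]
X = np.vstack([np.ones(20), rng.normal(size=(2, 20))])    # augmented (d + 1, N)
y = X.T @ w_true
print(np.round(closed_form(X, y), 3))                     # expect [ 1.  -2.   0.5]
```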