

Introduction to Machine Learning


Course 3 - Supervised Learning: Linear Regression
4th year Statistics and Data Science

Ayoub Asri

12 February 2025


Section 1

Supervised Learning


Supervised Learning


Supervised Learning Process I

A simplified process can be presented as:


Supervised Learning Process II

After some adjustments, it can be redefined as:


Supervised Learning Process III

The final process is then defined as:


Section 2

Supervised Learning: Linear Regression


Theory 1

The basic intuitive idea behind linear regression is to find a constant linear relationship between the variables.

Ideally, we would find the relationship y = x.


Theory 2

After finding the relationship between y and x, we can use it to estimate the value of a new observation, i.e., one that was not present in the original data and was therefore not used to build the regression line.


Theory 3

The real question is where to place the regression line for real-life data.


Theory 4

Could this line be a good fit?


Theory 5

Or even this one!


Theory 6

The fundamental idea is to minimize the overall distance between the points and the regression line.


Theory 7

This distance (measure) is called the residual error.


Theory 8

The method that provides the best solution to this problem is Ordinary Least Squares (OLS).
N.B. The details can be found in the econometrics course.


Linear Regression: Example

To better understand linear regression, we can use a real-life data set as an example.


Linear Regression: General formulation

For a more general case:


OLS
For each variable of the data set, we associate a coefficient β:

$$\hat{y} = \beta_0 x_0 + \cdots + \beta_n x_n$$

or

$$\hat{y} = \sum_{i=0}^{n} \beta_i x_i$$


OLS solution formulation I

OLS gives an algebraic (closed-form) formulation of the solution, which means the solution is unique.

Univariate case:

For example, for a simple regression problem:

$$y = b_0 + b_1 x$$

The solution is given by:

$$b_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$$

$$b_0 = \bar{y} - b_1 \bar{x}$$
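As an illustration (not from the original slides; the variable names are ours), a minimal sketch of these two formulas in Python with NumPy:

```python
import numpy as np

def ols_univariate(x, y):
    """Closed-form OLS for y = b0 + b1 * x (simple regression)."""
    x_bar, y_bar = x.mean(), y.mean()
    # b1 = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
    b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # b0 = y_bar - b1 * x_bar
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Tiny usage example on synthetic data:
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
b0, b1 = ols_univariate(x, y)  # roughly b0 = 0, b1 = 2
```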


OLS solution formulation II

Multivariate case

For the multivariate case, where we have k explanatory variables, the solution is given by:

$$\beta = (\beta_0, \cdots, \beta_k)' = (X'X)^{-1}(X'Y)$$
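A minimal sketch of this formula as well (again ours, not the slides'); np.linalg.solve is applied to the normal equations rather than forming the inverse explicitly, which is the standard numerically stable choice:

```python
import numpy as np

def ols_multivariate(X, y):
    """Closed-form OLS: beta = (X'X)^{-1} X'y.
    X is the m x (k+1) design matrix whose first column is all ones."""
    # Solve (X'X) beta = X'y instead of inverting X'X.
    return np.linalg.solve(X.T @ X, X.T @ y)
```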


OLS: drawbacks!

Since OLS provides a theoretical, algebraic solution, does it always provide a good solution in practice?
What happens to this solution when we have many observations, or many variables?
Will there be an effect?
What is the alternative?


The alternative

The alternative to OLS, for large data sets or in ML more generally, is to use iterative methods.

Iterative methods can easily be implemented and are much easier to handle.
To introduce the iterative solution, we first need to introduce some concepts.


Section 3

Gradient Descent


The cost function

The main goal of this problem is to find the value of β that minimizes the residual error:

$$\sum_{j=1}^{m} (y^j - \hat{y}^j)^2$$

Or we can instead use the mean of the squared errors:

$$\frac{1}{m} \sum_{j=1}^{m} (y^j - \hat{y}^j)^2$$


Minimization Problem

The goal of the problem is then: find the values of β that minimize the mean of the squared residuals.
This is called a minimization problem with respect to a cost function:

$$J(\beta) = \frac{1}{2m} \sum_{j=1}^{m} (y^j - \hat{y}^j)^2$$

(The extra factor of 1/2 is conventional: it simplifies the derivative and does not change the minimizer.)
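As a sketch, the cost function translates directly into Python; the vectorized prediction X @ beta anticipates the matrix notation introduced in the Gradient Descent section below:

```python
import numpy as np

def cost(beta, X, y):
    """J(beta) = (1 / 2m) * sum_j (y_j - y_hat_j)^2."""
    m = len(y)
    y_hat = X @ beta          # predictions for all m observations
    residuals = y - y_hat
    return (residuals ** 2).sum() / (2 * m)
```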


The cost function: simplification

Substituting ŷ, we can rewrite the cost function as:

$$J(\beta) = \frac{1}{2m} \sum_{j=1}^{m} (y^j - \hat{y}^j)^2 = \frac{1}{2m} \sum_{j=1}^{m} \left( y^j - \sum_{i=0}^{n} \beta_i x_i^j \right)^2$$


Derivative of the cost function

To find the minimum, we need to calculate the derivative:

$$\frac{\partial J}{\partial \beta_k}(\beta) = \frac{\partial}{\partial \beta_k} \left[ \frac{1}{2m} \sum_{j=1}^{m} \left( y^j - \sum_{i=0}^{n} \beta_i x_i^j \right)^2 \right] = \frac{1}{m} \sum_{j=1}^{m} \left( y^j - \sum_{i=0}^{n} \beta_i x_i^j \right) (-x_k^j)$$


The cost function 2

The analytical solution is very complex and takes a lot of computing power and time to execute.
We propose instead to determine the solution by Gradient Descent.


Gradient Descent 1
We will present the different steps of the Gradient Descent algorithm applied to this problem.
We start by calculating the derivative:

$$\frac{\partial J}{\partial \beta_k}(\beta) = \frac{1}{m} \sum_{j=1}^{m} \left( y^j - \sum_{i=0}^{n} \beta_i x_i^j \right) (-x_k^j)$$

For simplicity and more general use, we present the matrix form of the partial derivatives:

$$\nabla_\beta J = \begin{pmatrix} \frac{\partial J}{\partial \beta_0} \\ \vdots \\ \frac{\partial J}{\partial \beta_n} \end{pmatrix}$$


Gradient Descent 2

The matrix form of the data is given by:

$$X = \begin{pmatrix} 1 & x_1^1 & x_2^1 & \ldots & x_n^1 \\ 1 & x_1^2 & x_2^2 & \ldots & x_n^2 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_1^m & x_2^m & \ldots & x_n^m \end{pmatrix} \qquad y = \begin{pmatrix} y^1 \\ y^2 \\ \vdots \\ y^m \end{pmatrix} \qquad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_n \end{pmatrix}$$
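Building this design matrix in NumPy amounts to prepending a column of ones to the raw feature matrix. A small sketch (the `features` array here is a hypothetical stand-in for a real data set):

```python
import numpy as np

features = np.random.rand(100, 3)          # hypothetical m x n data (m=100, n=3)
ones = np.ones((features.shape[0], 1))     # column of ones for the intercept beta_0
X = np.hstack([ones, features])            # m x (n + 1) design matrix
```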


Gradient Descent 3

Now, we calculate the gradient in matrix form:

$$\nabla_\beta J = \begin{pmatrix} -\frac{1}{m} \sum_{j=1}^{m} \left( y^j - \sum_{i=0}^{n} \beta_i x_i^j \right) x_0^j \\ \vdots \\ -\frac{1}{m} \sum_{j=1}^{m} \left( y^j - \sum_{i=0}^{n} \beta_i x_i^j \right) x_n^j \end{pmatrix}$$


Gradient Descent 3

We can simplify this form of the gradient:

$$\nabla_\beta J = -\frac{1}{m} \begin{pmatrix} \sum_{j=1}^{m} y^j x_0^j \\ \vdots \\ \sum_{j=1}^{m} y^j x_n^j \end{pmatrix} + \frac{1}{m} \begin{pmatrix} \sum_{j=1}^{m} \left( \sum_{i=0}^{n} \beta_i x_i^j \right) x_0^j \\ \vdots \\ \sum_{j=1}^{m} \left( \sum_{i=0}^{n} \beta_i x_i^j \right) x_n^j \end{pmatrix}$$

Equivalently, using the matrices above: $\nabla_\beta J = -\frac{1}{m} X'(y - X\beta)$.
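This matrix form translates into a one-line NumPy function (a sketch using the X, y, β layout defined above):

```python
import numpy as np

def gradient(beta, X, y):
    """Vectorized gradient of J: -(1/m) * X'(y - X beta)."""
    m = len(y)
    return -(X.T @ (y - X @ beta)) / m
```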


Gradient Descent 4

Q. Can you determine what the unknowns are in the last formula?


Gradient Descent 5

The only unknown is β


Gradient Descent 5

Goal of Gradient Descent

We have to find a method that allows us to “guess” the correct values of β that minimize the cost function, i.e., the values at which the gradient vanishes.


Gradient Descent 6

Given a cost function J(β), how can we computationally search for the value of β that minimizes that function?
What would the search process look like in the case of a single value of β?


Gradient Descent 7

A common answer to the second question is the “common mountain analogy”.


The Common mountain analogy I


The Common mountain analogy II


The Common mountain analogy III


The Common mountain analogy IV


The Common mountain analogy V


The Common mountain analogy VI


Gradient Descent 8

This is exactly what gradient descent does.
It even looks similar in the case of a single-coefficient search.


Example of 1 dimensional cost function 1

This is the case of a regression with only one explanatory variable.


Example of 1 dimensional cost function 2

We start by choosing a starting point.


Example of 1 dimensional cost function 3

Then, we calculate the gradient at that point


Example of 1 dimensional cost function 3

Then we take a step forward, proportional to the negative gradient.
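Written out, the update applied at each step is the standard rule below; the learning-rate symbol α is our notation, as the slides do not name it:

$$\beta_k \leftarrow \beta_k - \alpha \, \frac{\partial J}{\partial \beta_k}(\beta), \qquad \alpha > 0$$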


Example of 1 dimensional cost function 4

Repeat the steps


Example of 1 dimensional cost function 5

Repeat the steps


Example of 1 dimensional cost function 6

What we are essentially doing is following the gradient downhill.

If we keep stepping against the gradient, we will eventually reach the value that minimizes the cost function.


Gradient Descent 8

Some remarks about Gradient Descent:

Since the steps are proportional to the negative of the gradient, steeper slopes at the start give larger steps, and smaller gradients near the end give smaller steps.
For a convex cost function such as ours, this practically assures convergence to the minimum.


Example of 2 dimensional cost function 1


We can apply the same principle to a 2-D cost function (two variables).


Example of 2 dimensional cost function 2


Example of 2 dimensional cost function 3

We can show the contour plot of this solution


Gradient Descent 9

Finally, we can state the Gradient Descent algorithm, which can be applied to any minimization problem:

Gradient descent algorithm
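As a concrete sketch of that algorithm (ours, under the assumptions used earlier: X carries a leading column of ones, and the learning rate and iteration count are illustrative choices, not values from the slides):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, n_iters=1000):
    """Minimize J(beta) by batch gradient descent."""
    beta = np.zeros(X.shape[1])                    # starting point
    for _ in range(n_iters):
        grad = -(X.T @ (y - X @ beta)) / len(y)    # gradient of J at beta
        beta = beta - alpha * grad                 # step along the negative gradient
    return beta

# Usage: beta_hat = gradient_descent(X, y), with X built as in the matrix form above.
```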

