GradientDescent-Regression_slides

Regression is a fundamental supervised learning algorithm used to predict continuous values based on input data. It includes models like linear and polynomial regression, where the relationship between variables is established through parameters that are learned from data. Techniques such as least squares and gradient descent are employed to optimize these models and minimize prediction errors.


Regression

● Regression is one of the most fundamental machine learning algorithms.
● We have all come across regression problems at some stage of our lives:
○ Predicting the price of a house given the house features
○ The relationship between height and weight

● Regression is a supervised learning problem where we provide the algorithm with the true value of each data point during the training process. The trained model is used to predict continuous values.
Regression
Let us assume that we are given m data points D = <x1, y1>, <x2, y2>, …, <xm, ym>.
● Then the problem is to determine a function f such that f(x) is the best predictor for y, with respect to D.

There are several common regression models like:


● Linear regression
● Polynomial regression
Linear Regression
● The simplest form, which assumes a linear relationship between the independent variables and the dependent variable.
● Example:
○ Predict the price of a house based on factors like area, location, etc.
○ Predict height from age

The goal is to build a system that can take a vector x ∈ Rⁿ as input and predict the value of a scalar y ∈ R as its output. The output of linear regression is a linear function of the input. Let ŷ be the value that our model predicts y should take on. We define the output to be ŷ = wᵀx, where w ∈ Rⁿ is a vector of parameters.
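As a minimal illustration (the feature values and weights below are made up; only the ŷ = wᵀx rule comes from the slide), the prediction is simply a dot product:

import numpy as np

# Hypothetical example: 3 input features and an already-learned weight vector w.
x = np.array([2.0, 1.5, 3.0])   # input vector x ∈ R^3
w = np.array([0.4, -0.2, 1.1])  # parameter vector w ∈ R^3

y_hat = w @ x                   # ŷ = wᵀx, a single scalar prediction
print(y_hat)                    # 3.8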
Linear Regression
Parameters are values that control the behavior of the system. In this case, wi is the

coefficient that we multiply by feature xi before summing up the contributions from all
the features. We can think of w as a set of weights that determine how
each feature affects the prediction. If a feature xi receives a positive weight wi,
then increasing the value of that feature increases the value of our prediction ŷ.
If a feature receives a negative weight, then increasing the value of that feature
decreases the value of our prediction. If a feature’s weight is large in magnitude,
then it has a large effect on the prediction. If a feature’s weight is zero, it has no
effect on the prediction.
Linear Regression
● Consider a simple linear regression model y = 𝛽0 + 𝛽1x + 𝜀, where
y is termed the dependent variable and x the independent variable.
● The terms 𝛽0 and 𝛽1 are the parameters of the model.
○ 𝛽0 is termed the intercept term.
○ 𝛽1 is termed the slope parameter. These are called the regression coefficients.
○ The error component 𝜀 accounts for the failure of the data to lie on a straight line. It represents the difference between the true and observed realization of y.
Least Squares Linear Regression
● Least squares linear regression is a method used to find the best-fitting line through a set of data points. The goal is to minimize the sum of the squared differences between the observed data points and the predicted values on the line.
● We can write the model for each observation as yi = 𝛽0 + 𝛽1xi + 𝜀i (i = 1, 2, …, n).
● The least squares method minimizes the sum of squared differences:

SSE(𝛽0, 𝛽1) = Σi (yi − 𝛽0 − 𝛽1xi)²

● We have to find the values of 𝛽0 and 𝛽1 which minimize the above error.
Least Squares Linear Regression
How to learn 𝛽0 and 𝛽1

It can be solved in different ways. We can find a closed-form solution given the training examples.
● Line slope = cov(x, y) / var(x)
● Line intercept = mean(y) − slope · mean(x)
● The slope 𝛽1 is calculated as

𝛽1 = Σi (xi − x̄)(yi − ȳ) / Σi (xi − x̄)²

where x̄ and ȳ are the means of x and y over the training data.

● The intercept 𝛽0 is calculated as

𝛽0 = ȳ − 𝛽1x̄


Least Squares Linear Regression
● For a linear regression model with multiple features, the relationship between the target variable y and the input features is expressed as:

yi = 𝛳0 + 𝛳1xi1 + 𝛳2xi2 + ⋯ + 𝛳pxip + 𝜀i

Least Squares Linear Regression
Transforming this to matrix form, we get:

y = X𝛳 + 𝜀

y is an n×1 vector of observed outputs.

X is an n×(p+1) matrix of inputs (independent variables), where n is the number of observations and p+1 represents the features (including the intercept term, which is typically the first column of ones).

𝛳 is a (p+1)×1 vector of the model parameters (coefficients).
Least Squares Linear Regression
We want to minimize the sum of squared differences between the observed outputs and the predictions made by the linear model:

J(𝛳) = Σi (yi − xiᵀ𝛳)²

Using matrix notation, we can rewrite it as

J(𝛳) = (y − X𝛳)ᵀ(y − X𝛳)

To find the least squares solution, we need to find the point at which the gradient of the objective function is zero. The gradient of J(𝛳) with respect to 𝛳 is given by:

∇J(𝛳) = −2Xᵀ(y − X𝛳)
Least Squares Linear Regression
Setting the gradient to 0 and solving for 𝛳, we get

XᵀX𝛳 = Xᵀy

The above equation is the normal equation for least squares linear regression.
The closed-form solution for linear regression is

𝛳 = (XᵀX)⁻¹Xᵀy

It is important to note that the closed-form solution exists only when the matrix XᵀX is invertible. Also, it is suitable only for smaller datasets where matrix inversion is computationally feasible.
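As a sketch of this closed-form computation (the dataset below is made up, and NumPy is assumed), solving the normal equation with np.linalg.solve avoids explicitly inverting XᵀX:

import numpy as np

# Hypothetical data: n = 4 observations, p = 1 feature.
x_raw = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.1, 4.9, 7.2, 8.8])

# Prepend a column of ones for the intercept, giving an n×(p+1) design matrix.
X = np.hstack([np.ones((x_raw.shape[0], 1)), x_raw])

# Normal equation: XᵀX 𝛳 = Xᵀy. Solving the linear system is numerically
# preferable to forming (XᵀX)⁻¹ explicitly.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # [intercept, slope]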
Least Squares Linear Regression

● In least squares linear regression, the closed-form solution allows us to calculate the regression coefficients (e.g., β0 and β1) in one step using matrix algebra, specifically by solving the normal equation.
Example
Let’s assume we have the following data points: x=[1,2,3,4,5], y=[2,3,5,4,6].
Compute β0 and β1
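A minimal sketch (assuming NumPy) that applies the slope and intercept formulas from the earlier slides to this data; it should print β0 = 1.3 and β1 = 0.9:

import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 3, 5, 4, 6], dtype=float)

# Slope: β1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# Intercept: β0 = ȳ − β1·x̄
beta0 = y.mean() - beta1 * x.mean()

print(beta0, beta1)  # 1.3 0.9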
Polynomial Regression
● It is an extension of linear regression where the relationship between the independent variable x and the dependent variable y is modeled as an nth-degree polynomial. It allows for more complex, non-linear relationships between the input and output variables by fitting a polynomial curve to the data.
● In polynomial regression, the model takes the form:

y = β0 + β1x + β2x² + ⋯ + βnxⁿ + ϵ
Polynomial Regression
Polynomial regression can be solved using matrix algebra. Given a design matrix X (which contains the original and polynomial-transformed features), the parameters 𝛳 can be estimated by:

𝛳 = (XᵀX)⁻¹Xᵀy

where X is the design matrix whose i-th row contains the polynomial features of xi, i.e., [1, xi, xi², …, xiⁿ].
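A minimal sketch (assuming NumPy, a made-up dataset, and degree n = 2) that builds such a design matrix and solves for 𝛳 with the same normal-equation approach:

import numpy as np

# Hypothetical data that roughly follows a quadratic trend.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 5.2, 10.1, 16.8])

degree = 2
# Design matrix X with rows [1, xi, xi², ..., xi^degree].
X = np.vander(x, N=degree + 1, increasing=True)

# 𝛳 = (XᵀX)⁻¹Xᵀy, computed by solving the normal equation.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # [β0, β1, β2]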
Learning Parameters: Gradient Descent
Gradient descent (GD) is a mechanism in supervised learning to learn the parameters of a neural network by navigating the error surface in an efficient and principled way. It is used to find the function parameters (coefficients) that minimize a loss function.
Error surfaces, on the other hand, are graphical representations of the relationship between the model’s parameters and the corresponding error values.

Let θ be the vector of parameters, θ = [ω, b] ∈ R², where ω and b are randomly initialized.

Let the change in the values of ω and b be ∆θ = [∆ω, ∆b].

Learning Parameters: Gradient Descent
● We move only by a small amount η. Moving in the direction of ∆θ in small steps η∆θ, the resultant vector is θnew, shown in red in the figure.

θnew = θ + η · ∆θ. Now, how do we find the value of ∆θ?

Let us denote ∆θ = u. Then, by the Taylor series, L(θ + ηu) can be written as

L(θ + ηu) = L(θ) + η · uᵀ∇L(θ) + (η²/2!) · uᵀ∇²L(θ)u + (η³/3!) · ⋯

          ≈ L(θ) + η · uᵀ∇L(θ)   [η is typically small, so the η², η³, … terms → 0]

where ∇L(θ) is the gradient vector, i.e., the vector of partial derivatives of the loss L with respect to each parameter.


Learning Parameters: Gradient Descent
L(θ + ηu) = L(θ) + η · uᵀ∇L(θ)

Note that the move (ηu) would be favorable only if L(θ + ηu) − L(θ) < 0 [i.e., if the new loss is less than the previous loss].

This implies that uᵀ∇L(θ) must be less than zero.

Let β be the angle between u and ∇L(θ). Then we know that

cos(β) = uᵀ∇L(θ) / (‖u‖ · ‖∇L(θ)‖), with −1 ≤ cos(β) ≤ 1

Multiplying throughout by k = ‖u‖ · ‖∇L(θ)‖ gives

−k ≤ uᵀ∇L(θ) = k · cos(β) ≤ k

so uᵀ∇L(θ) is most negative when cos(β) = −1, i.e., when the angle between u and the gradient is 180°.

Therefore u should move in a direction opposite to the gradient.
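A minimal sketch of the resulting update rule θnew = θ − η·∇L(θ) on a simple one-parameter loss (L(θ) = (θ − 3)²; the loss, learning rate, and step count are all chosen here just for illustration):

# Minimize L(θ) = (θ − 3)² by repeatedly stepping opposite to the gradient.
def grad_L(theta):
    return 2.0 * (theta - 3.0)   # dL/dθ

theta = 0.0   # initial value
eta = 0.1     # learning rate (assumed)
for _ in range(100):
    theta = theta - eta * grad_L(theta)   # θ ← θ − η·∇L(θ)

print(theta)  # ≈ 3.0, the minimizer of L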


Learning Parameters: Gradient Descent


Learning Parameters: Gradient Descent


Gradient Descent: Example


Batch Gradient Descent
● In batch gradient descent, gradient descent uses all n training examples for each weight update.
● In this approach, the algorithm calculates the gradient of the cost function using the entire dataset before updating the weights (or parameters).

Single Weight Update per Epoch: After calculating the average gradient using all training examples, the algorithm updates the model's weights once. This process is repeated for multiple iterations until the cost function converges to a minimum.
Batch Gradient Descent
● Consider the example of linear regression hθ(x) = θ0 + θ1x, and let the cost function be the mean squared error (the factor ½ is a common convention that simplifies the gradients):

J(θ0, θ1) = (1/2m) Σi (hθ(xi) − yi)²

● As per gradient descent, to update the parameters we need the gradients with respect to θ0 and θ1:

∂J/∂θ0 = (1/m) Σi (hθ(xi) − yi)
∂J/∂θ1 = (1/m) Σi (hθ(xi) − yi) · xi

and each update takes the form θj := θj − α · ∂J/∂θj, where α is the learning rate.

Note that it uses all m training examples for each weight update.

In ML applications, where we may have millions of data points, this becomes very expensive.
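A minimal batch gradient descent sketch for hθ(x) = θ0 + θ1x with the cost above (assuming NumPy; the data, learning rate, and iteration count are illustrative choices):

import numpy as np

# Training data (m examples) — the same small example used earlier.
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 3, 5, 4, 6], dtype=float)

theta0, theta1 = 0.0, 0.0
alpha = 0.05                     # learning rate (assumed)

for _ in range(2000):            # iterate until (approximate) convergence
    error = (theta0 + theta1 * x) - y      # hθ(xi) − yi for all i
    grad0 = error.mean()                   # ∂J/∂θ0
    grad1 = (error * x).mean()             # ∂J/∂θ1
    theta0 -= alpha * grad0                # a single weight update per pass
    theta1 -= alpha * grad1                # over the full dataset

print(theta0, theta1)            # ≈ 1.3, 0.9 for this data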


Stochastic Gradient Descent
Stochastic Gradient Descent (SGD) is a variation of the gradient descent algorithm where we update the model parameters (weights) using one training example at a time.

SGD speeds things up by updating the model’s parameters after processing each individual training example.

This makes the algorithm faster and more memory-efficient, especially for large datasets, but it can introduce some noise into the updates.

In the above cost function, take m = 1, so a weight update happens for each data point.
Stochastic Gradient Descent
Algorithm:
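The algorithm figure is not reproduced in this extract; a minimal sketch of SGD for the same hθ(x) = θ0 + θ1x model (learning rate, epoch count, and random shuffling are assumed choices) looks like:

import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 3, 5, 4, 6], dtype=float)

theta0, theta1 = 0.0, 0.0
alpha = 0.02                     # learning rate (assumed)
rng = np.random.default_rng(0)

for _ in range(500):             # epochs (assumed)
    for i in rng.permutation(len(x)):      # visit examples in random order
        error = (theta0 + theta1 * x[i]) - y[i]
        theta0 -= alpha * error            # update after every single example
        theta1 -= alpha * error * x[i]

print(theta0, theta1)            # close to the batch solution (≈ 1.3, 0.9), up to SGD noise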

Mini-Batch Gradient Descent
The algorithm updates the parameters after it sees a mini-batch of data points (i.e., a fixed mini-batch-size number of examples).

Averaging the gradients over the mini-batch gives a better estimate of the gradient direction; the estimate becomes more consistent as the number of samples in the batch grows.

The mini-batch version of SGD is the default option for training neural networks.
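A corresponding mini-batch sketch (batch size, learning rate, and epoch count are assumed values), averaging the gradient over each mini-batch before updating:

import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 3, 5, 4, 6], dtype=float)

theta0, theta1 = 0.0, 0.0
alpha, batch_size = 0.02, 2      # assumed hyperparameters
rng = np.random.default_rng(0)

for _ in range(1000):                              # epochs (assumed)
    order = rng.permutation(len(x))
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]      # one mini-batch of indices
        error = (theta0 + theta1 * x[idx]) - y[idx]
        theta0 -= alpha * error.mean()             # average gradient over the batch
        theta1 -= alpha * (error * x[idx]).mean()

print(theta0, theta1)            # ≈ 1.3, 0.9, matching the full-batch solution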
