Notes Unit 1-3 Part-III

The document discusses the concept of gradient descent, an optimization algorithm used in machine learning to minimize cost functions by iteratively adjusting parameters based on the gradient. It explains the process of gradient descent, including its advantages and disadvantages, as well as variations like Stochastic Gradient Descent and Mini Batch Stochastic Gradient Descent. Additionally, it touches on linear regression, highlighting the relationship between independent and dependent variables and the formulation of the regression line.


Dr. Rahul Dubey
Intuition Behind Linear Regression

Gradient Descent
➢ Gradient descent was first proposed by Augustin-Louis Cauchy in 1847, in the mid-19th
century. Gradient descent is one of the most commonly used iterative optimization
algorithms in machine learning, used to train machine learning and deep learning models.
➢ It helps in finding a local minimum or a local maximum of a function.
➢ The main objective of using a gradient descent algorithm is to minimize the cost
function through iteration. To achieve this goal, it performs two steps repeatedly:
1) Calculate the first-order derivative of the function to compute the gradient (slope)
of that function at the current point.
2) Move in the direction opposite to the gradient, i.e. step away from the current point
by alpha times the gradient, where alpha is the learning rate.
➢ If we move in the direction of the negative gradient, i.e. away from the gradient of the
function at the current point, we will reach a local minimum of that function.
➢ If we move in the direction of the positive gradient, i.e. towards the gradient of the
function at the current point, we will reach a local maximum of that function.

➢ To understand how gradient descent works, let’s consider a simple one-
dimensional problem. Imagine we have a machine learning model with a
single parameter, x. Our goal is to find the value for x that minimizes the
loss function.

f(x) = 6x^2 – 12x + 3



➢ Analytically, we can find the function’s minimum by setting the first derivative to
zero and solving for x (here, 12x – 12 = 0 gives x = 1).
➢ However, in real-world applications, problems are often so complex that we don’t know
the form of the function in advance and cannot solve it analytically. This is where
gradient descent comes into play and helps us find the minimum value.

➢ In gradient descent, we calculate the gradient, which for a one-dimensional problem is
simply the first derivative:

df/dx = 12x – 12

➢ We then initialize x to a random value and calculate the output of the function. By
evaluating the gradient at x, we obtain the direction of the slope.

➢ We adjust x by a small amount in the opposite direction of the gradient. The step size
is controlled by the learning rate η and is kept small to avoid overshooting and missing
the minimum.

➢ The gradient descent update rule is

x = x – η * df/dx
➢ Suppose we initialize x at –0.9 and set η = 0.03. The gradient there is
df/dx = 12 * (–0.9) – 12 = –22.8, so the first update is

x = –0.9 – 0.03 * (–22.8) = –0.216

➢ Again, we have adjusted x by a small amount in the direction opposite to the gradient.

➢ As we continue, x will eventually reach 1. Notice that as we get closer to the minimum,
the updates become smaller because the slope flattens.
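The iteration above can be reproduced with a few lines of code. The snippet below is a minimal sketch (the iteration cap and stopping tolerance are illustrative choices, not from these notes) that repeatedly applies x = x – η * df/dx to f(x) = 6x^2 – 12x + 3, starting from x = –0.9 with η = 0.03:

```python
# Minimal sketch of gradient descent on f(x) = 6x^2 - 12x + 3.
# The starting point and learning rate match the worked example above;
# the iteration cap and stopping tolerance are illustrative choices.

def f(x):
    return 6 * x**2 - 12 * x + 3

def df_dx(x):
    return 12 * x - 12  # first derivative of f

x = -0.9     # initial guess
eta = 0.03   # learning rate

for step in range(100):
    grad = df_dx(x)
    x = x - eta * grad       # move against the gradient
    if abs(grad) < 1e-6:     # stop once the slope is (almost) flat
        break

print(x, f(x))  # x approaches 1.0, where f reaches its minimum value of -3
```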

➢ Consider a two-dimensional function

f(x,y) = 6x^2 + 9y^2 – 12x – 14y + 3.

∂f/∂x = 12x – 12        x = x – η * ∂f/∂x

∂f/∂y = 18y – 14        y = y – η * ∂f/∂y

➢ In this way, we adjust each parameter in the direction that reduces the
function’s value the most, guided by the corresponding partial derivative.
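For example, starting from the illustrative point (x, y) = (0, 0) with η = 0.05 (values chosen here for illustration, not taken from these notes): ∂f/∂x = –12 and ∂f/∂y = –14, so the first update gives x = 0 – 0.05 * (–12) = 0.6 and y = 0 – 0.05 * (–14) = 0.7. Repeating these updates drives the parameters towards the minimum at x = 1, y = 7/9 ≈ 0.78, where both partial derivatives become zero.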

Advantages:
1. Very simple to implement.

Disadvantages:

1. This algorithm processes the entire dataset of n points at a time to compute the
derivative and update the weights, which requires a lot of memory.
2. The minimum may be reached only after a long time, or may never be reached.
3. The algorithm can get stuck at a local minimum.

Batch, Stochastic and Mini Batch GD

Stochastic Gradient Descent (SGD)

Mini Batch Stochastic Gradient
Descent (MB-SGD)
➢ The MB-SGD algorithm is an extension of the SGD algorithm, and it overcomes the problem
of large time complexity in the case of the SGD algorithm. The MB-SGD algorithm takes a
batch (subset) of points from the dataset to compute the derivative.
➢ It is observed that the derivative of the loss function for MB-SGD is almost the same
as the derivative of the loss function for GD after some number of iterations. However,
the number of iterations needed to reach the minimum is larger for MB-SGD than for GD,
and the cost of computation is also larger.
➢ The weight update depends on the derivative of the loss for a batch of points. The
updates in MB-SGD are noisier because the derivative does not always point towards the
minimum.
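As a rough illustration of how a mini-batch update can be implemented, the sketch below (the toy data, batch size, learning rate and number of epochs are assumptions for illustration, not values from these notes) fits a simple linear model with mean-squared-error loss:

```python
import numpy as np

# Rough sketch of mini-batch SGD for a linear model y ≈ w0 + w1 * x with
# mean-squared-error loss. The toy data, batch size, learning rate and
# number of epochs are illustrative assumptions.

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=200)
y = 3.0 + 2.0 * X + rng.normal(0, 1, size=200)   # noisy points around y = 3 + 2x

w0, w1 = 0.0, 0.0
eta, batch_size, epochs = 0.01, 16, 200

for _ in range(epochs):
    order = rng.permutation(len(X))              # shuffle the data each epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]    # indices of one mini-batch
        xb, yb = X[idx], y[idx]
        err = (w0 + w1 * xb) - yb                # prediction error on the batch
        grad_w0 = 2 * err.mean()                 # d(loss)/d(w0) averaged over the batch
        grad_w1 = 2 * (err * xb).mean()          # d(loss)/d(w1) averaged over the batch
        w0 -= eta * grad_w0
        w1 -= eta * grad_w1

print(w0, w1)   # should end up close to the true values 3 and 2
```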

Advantages:
1. Lower time complexity to converge compared to the standard SGD algorithm.

Disadvantages:
1. The updates of MB-SGD are much noisier than the updates of the GD algorithm.
2. It may get stuck at a local minimum.

Linear Regression
Notation
m = number of training examples
x's = input variables / features
y's = output ("target") variables
(x, y) - a single training example
(x^(i), y^(i)) - a specific training example (the i-th one), where i is an index into the training set

Training set (data set) - how do we use it?
• Take the training set
• Pass it into a learning algorithm
• The algorithm outputs a function
• This function takes an input (e.g. the size of a new house) and tries to output the estimated value of y

Hypothesis
hθ(x) = θ0 + θ1x

What does this mean?
• It means y is a linear function of x
• θi are the parameters: θ0 is the intercept (the value of hθ(x) when x is zero), and θ1 is the gradient (slope)

This is actually univariate linear regression (linear regression with a single variable).


Linear Regression Cost function
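A standard way to write this cost function for the hypothesis hθ(x) = θ0 + θ1x (assuming the usual squared-error convention) is

J(θ0, θ1) = (1 / 2m) * Σ_{i=1..m} ( hθ(x^(i)) – y^(i) )^2

i.e. half the mean of the squared differences between the predicted values and the actual target values; the goal is to choose θ0 and θ1 that minimize J(θ0, θ1).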

Gradient Descent
Do the following until convergence
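In the usual formulation (assumed here), each parameter is updated using the partial derivative of the cost function, scaled by the learning rate α:

θj = θj – α * ∂/∂θj J(θ0, θ1)      (for j = 0 and j = 1, with both updates applied simultaneously)

and the loop is repeated until J(θ0, θ1) stops decreasing.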

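Putting the cost function and the update rule together, a minimal sketch in code (the toy data, learning rate and iteration count below are illustrative assumptions, not values from these notes) could look like this:

```python
import numpy as np

# Minimal sketch of batch gradient descent for univariate linear regression,
# hθ(x) = θ0 + θ1 * x, with the squared-error cost J(θ0, θ1).
# The toy data, learning rate and iteration count are illustrative choices.

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])      # e.g. house sizes (scaled)
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])     # e.g. prices; roughly y = 1 + 2x

theta0, theta1 = 0.0, 0.0
alpha = 0.05                                  # learning rate
m = len(x)

for _ in range(2000):
    h = theta0 + theta1 * x                   # current predictions hθ(x)
    # Partial derivatives of J(θ0, θ1) = (1 / 2m) * Σ (hθ(x) - y)^2
    d_theta0 = (1.0 / m) * np.sum(h - y)
    d_theta1 = (1.0 / m) * np.sum((h - y) * x)
    # Simultaneous update of both parameters
    theta0, theta1 = theta0 - alpha * d_theta0, theta1 - alpha * d_theta1

cost = (1.0 / (2 * m)) * np.sum((theta0 + theta1 * x - y) ** 2)
print(theta0, theta1, cost)                   # θ0 ≈ 1, θ1 ≈ 2, small final cost
```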
Types of Regression
1) Simple Linear Regression
➢ In this case, we only have a single independent variable and a single
dependent variable.

➢ In linear regression, while developing the model we assume a linear relationship
between the independent and dependent variables.

➢ In simple linear regression, we try to find a relationship between the target variable
and the input variable by fitting a line, known as the regression line.

y = m * x + b
y(x) = w0 + w1 * x

where the w's are the parameters of the model, x is the input, and y is the target
variable.
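Because there is only one input variable, the line's parameters can also be computed directly from the data with the closed-form least-squares formulas; the snippet below is a small illustrative sketch (the sample data are assumptions, not values from these notes):

```python
import numpy as np

# Closed-form least-squares fit of the regression line y(x) = w0 + w1 * x.
# The sample data are illustrative.

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])

w1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)  # slope
w0 = y.mean() - w1 * x.mean()                                               # intercept

print(w0, w1)   # roughly 1.05 and 1.99 for this sample data
```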

Multiple Linear Regression using
statistical approach
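The statistical approach here typically refers to ordinary least squares, which generalizes the single-variable case to several input variables. As an assumed illustration (not taken from these notes), the parameter vector can be obtained from the normal equation w = (XᵀX)⁻¹ Xᵀ y, where X contains the input features (with a leading column of ones for the intercept) and y is the vector of targets:

```python
import numpy as np

# Sketch of multiple linear regression via the normal equation
# w = (XᵀX)⁻¹ Xᵀ y. The sample data are illustrative.

X_raw = np.array([[1.0, 2.0],
                  [2.0, 1.0],
                  [3.0, 4.0],
                  [4.0, 3.0],
                  [5.0, 5.0]])                 # two input features per example
y = np.array([5.1, 6.0, 11.2, 11.9, 16.1])     # roughly y = 1 + 2*x1 + 1*x2

X = np.column_stack([np.ones(len(X_raw)), X_raw])  # prepend a column of ones (intercept)
w = np.linalg.solve(X.T @ X, X.T @ y)              # solve the normal equation

print(w)   # approximately [1, 2, 1]: intercept and the two coefficients
```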
