Notes Unit 1-3 Part-III
Rahul Dubey
Intuition Behind Linear Regression
Gradient Descent
➢ Gradient descent was first proposed by Augustin-Louis Cauchy in 1847, in the middle of the 19th century. It is one of the most commonly used iterative optimization algorithms in machine learning and is used to train both machine learning and deep learning models.
➢ It helps in finding a local minimum (or local maximum) of a function. The main objective of the gradient descent algorithm is to minimize the cost function iteratively. To achieve this goal, it performs two steps at every iteration:
1) Compute the first-order derivative of the function at the current point to obtain its gradient (slope).
2) Take a step in the direction opposite to the gradient, i.e. move the current point by alpha times the gradient, where alpha is the learning rate.
➢ If we move in the direction of the negative gradient, i.e. away from the gradient of the function at the current point, we approach a local minimum of that function.
➢ If we move in the direction of the positive gradient, i.e. towards the gradient of the function at the current point, we approach a local maximum of that function.
➢ To understand how gradient descent works, let's consider a simple one-dimensional problem. Imagine we have a machine learning model with a single parameter, x. Our goal is to find the value of x that minimizes the loss function.
➢ In gradient descent, we calculate the gradient, which for a one-dimensional problem is essentially the first derivative, and apply the update x = x - η * df(x)/dx, where η (eta) is the learning rate.
➢ Suppose we initialize x at -0.9 and eta at 0.03.
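A minimal sketch of this one-dimensional update in Python (the slides give no code and do not reproduce the loss function itself; f(x) = x**2 and the 50-step loop are assumptions, while the starting value x = -0.9 and eta = 0.03 come from the slide):

def f(x):           # assumed loss function (not stated on the slide)
    return x ** 2

def df(x):          # its first derivative, i.e. the gradient in one dimension
    return 2 * x

x = -0.9            # initial parameter value (from the slide)
eta = 0.03          # learning rate (from the slide)

for step in range(50):
    x = x - eta * df(x)     # update rule: x = x - eta * df/dx

print(round(x, 4))  # x moves towards the minimizer x = 0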
➢ Consider a two-dimensional function
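The extracted notes do not show which two-dimensional function the slide uses; a sketch of the same idea, assuming f(x, y) = x**2 + 2*y**2 and an arbitrary starting point, could look like this:

def grad(x, y):             # gradient of the assumed function f(x, y) = x**2 + 2*y**2
    return 2 * x, 4 * y

x, y = -2.0, 1.5            # starting point (assumption)
eta = 0.1                   # learning rate (assumption)

for step in range(100):
    gx, gy = grad(x, y)
    x, y = x - eta * gx, y - eta * gy   # update each coordinate against its partial derivative

print(round(x, 4), round(y, 4))         # both coordinates approach the minimizer (0, 0)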
Advantages:
1. Very simple to implement.
Disadvantages:
Batch, Stochastic and Mini Batch GD
Stochastic Gradient Descent (SGD)
Mini Batch Stochastic Gradient Descent (MB-SGD)
➢ The MB-SGD algorithm is an extension of the SGD algorithm, and it overcomes the problem of the large time complexity of the SGD algorithm. The MB-SGD algorithm takes a batch of points, i.e. a subset of the dataset, to compute the derivative.
➢ It is observed that the derivative of the loss function for MB-SGD is almost the same as the derivative of the loss function for GD after some number of iterations. However, the number of iterations needed to reach the minimum is larger for MB-SGD than for GD, since each update is based on only a subset of the data (although each individual update is cheaper to compute).
➢ The weight update depends on the derivative of the loss computed over a batch of points. The updates in MB-SGD are noisier than in GD because the batch derivative does not always point towards the minimum.
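A minimal sketch of a mini-batch update, assuming a linear model with squared-error loss (the slides do not specify the model, loss, batch size, or learning rate; every concrete choice below is illustrative):

import numpy as np

def minibatch_sgd_step(X, y, w, batch_size=32, eta=0.01):
    # sample a random subset (mini-batch) of the training points
    idx = np.random.choice(len(X), size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    # derivative of the mean squared error computed over the mini-batch only
    grad = 2.0 / batch_size * Xb.T @ (Xb @ w - yb)
    # move the weights against the (noisy) batch gradient
    return w - eta * grad

# toy usage: 200 points, 3 features, true weights [1, -2, 3]
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 3.0])
w = np.zeros(3)
for _ in range(500):
    w = minibatch_sgd_step(X, y, w)
print(np.round(w, 2))   # approaches the true weights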
Advantages:
1. Lower time complexity to converge than the standard SGD algorithm.
Linear Regression
Notation:
• m = number of training examples
• x's = input variables / features
• y's = output ("target") variable
• (x, y) = a single training example

Training set (data set): how do we use it?
• Take the training set
• Pass it into a learning algorithm
• The algorithm outputs a function
• This function takes an input (e.g. the size of a new house) and tries to output the estimated value of y
Types of Regression
1) Simple Linear Regression
➢ In this case, we only have a single independent variable and a single
dependent variable.
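As an illustration of the one-variable case, a short sketch using the standard closed-form least-squares estimates of the intercept and slope (the data and variable names below are made up; the slide itself shows no code):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])     # single independent variable
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])    # single dependent variable

# closed-form least-squares estimates for y ≈ b0 + b1 * x
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x                          # predictions of the fitted line
print(round(b0, 3), round(b1, 3))            # fitted intercept and slope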
Multiple Linear Regression using a Statistical Approach
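The derivation itself is not reproduced in the extracted notes; a minimal sketch of the usual statistical (ordinary least squares) approach, solving the normal equation beta = (X^T X)^(-1) X^T y with an added intercept column and made-up data, is:

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))                    # 100 examples, 3 features
y = 4.0 + X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

Xb = np.column_stack([np.ones(len(X)), X])       # prepend a column of ones for the intercept
beta = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)      # solve the normal equation (X^T X) beta = X^T y

print(np.round(beta, 2))   # approximately [4.0, 1.5, -2.0, 0.5]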