Notes Unit 1-3 Part-III

The document discusses the concept of gradient descent, an optimization algorithm used in machine learning to minimize cost functions by iteratively adjusting parameters based on the gradient. It explains the process of gradient descent, including its advantages and disadvantages, as well as variations like Stochastic Gradient Descent and Mini Batch Stochastic Gradient Descent. Additionally, it touches on linear regression, highlighting the relationship between independent and dependent variables and the formulation of the regression line.


Dr. Rahul Dubey
Intuition Behind Linear Regression

Gradient Descent
➢ Gradient descent was first proposed by Augustin-Louis Cauchy in 1847, in the mid-19th
century. Gradient descent is one of the most commonly used iterative optimization
algorithms in machine learning, used to train machine learning and deep learning models.
➢ It helps in finding a local minimum or a local maximum of a function.
➢ The main objective of using a gradient descent algorithm is to minimize the cost
function through iteration. To achieve this goal, it performs two steps repeatedly:
1) Calculate the first-order derivative of the function to compute the gradient (slope)
of that function at the current point.
2) Move in the direction opposite to the gradient, i.e. step away from the current point
by alpha times the gradient, where alpha is the learning rate.
➢ If we move in the direction of the negative gradient, i.e. away from the gradient of the
function at the current point, we will reach a local minimum of that function.
➢ If we move in the direction of the positive gradient, i.e. towards the gradient of the
function at the current point, we will reach a local maximum of that function.

➢ To understand how gradient descent works, let’s consider a simple one-
dimensional problem. Imagine we have a machine learning model with a
single parameter, x. Our goal is to find the value for x that minimizes the
loss function.

f(x) = 6x^2 – 12x + 3



➢ Analytically, we can find the function’s minimum by setting the first derivative to
zero and solving for x (here, 12x – 12 = 0 gives x = 1).
➢ However, in real-world applications, problems are often so complex that we don’t know
the form of the function in advance and cannot solve it analytically. This is where
gradient descent comes into play and helps us find the minimum value.

➢ In gradient descent, we calculate the gradient, which for a one-dimensional problem is
simply the first derivative:

df/dx = 12x – 12

➢ We then initialize x to a random value and calculate the output of the function. By
evaluating the gradient at x, we obtain the direction of the slope.

➢ We adjust x by a small amount in the opposite direction of the gradient. The step size
is controlled by the learning rate η and is kept small to avoid overshooting and missing
the minimum.

➢ The gradient descent update rule is

x = x – η * df/dx
➢ Suppose we initialize x at –0.9 and set η = 0.03. The gradient there is
df/dx = 12 * (–0.9) – 12 = –22.8, so the first update is

x = –0.9 – 0.03 * (–22.8) = –0.216

➢ Again, we have adjusted x by a small amount in the direction opposite to the gradient.

➢ As we continue, x will eventually reach 1. Notice that as we get closer to the minimum,
the updates become smaller because the slope flattens.
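The iteration above can be reproduced with a few lines of code. The snippet below is a minimal sketch (the iteration cap and stopping tolerance are illustrative choices, not from these notes) that repeatedly applies x = x – η * df/dx to f(x) = 6x^2 – 12x + 3, starting from x = –0.9 with η = 0.03:

```python
# Minimal sketch of gradient descent on f(x) = 6x^2 - 12x + 3.
# The starting point and learning rate match the worked example above;
# the iteration cap and stopping tolerance are illustrative choices.

def f(x):
    return 6 * x**2 - 12 * x + 3

def df_dx(x):
    return 12 * x - 12  # first derivative of f

x = -0.9     # initial guess
eta = 0.03   # learning rate

for step in range(100):
    grad = df_dx(x)
    x = x - eta * grad       # move against the gradient
    if abs(grad) < 1e-6:     # stop once the slope is (almost) flat
        break

print(x, f(x))  # x approaches 1.0, where f reaches its minimum value of -3
```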

➢ Consider a two-dimensional function

f(x,y) = 6x^2 + 9y^2 – 12x – 14y + 3.

∂f/∂x = 12x – 12        x = x – η * ∂f/∂x

∂f/∂y = 18y – 14        y = y – η * ∂f/∂y

➢ In this way, we adjust each parameter in the direction that reduces the
function’s value the most, guided by the corresponding partial derivative.
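For example, starting from the illustrative point (x, y) = (0, 0) with η = 0.05 (values chosen here for illustration, not taken from these notes): ∂f/∂x = –12 and ∂f/∂y = –14, so the first update gives x = 0 – 0.05 * (–12) = 0.6 and y = 0 – 0.05 * (–14) = 0.7. Repeating these updates drives the parameters towards the minimum at x = 1, y = 7/9 ≈ 0.78, where both partial derivatives become zero.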

Advantages:
1. Very simple to implement.

Disadvantages:

1. This algorithm processes the entire dataset of n points at a time to compute the
derivative and update the weights, which requires a lot of memory.
2. The minimum may be reached only after a long time, or may never be reached.
3. The algorithm can get stuck at a local minimum.

Batch, Stochastic and Mini Batch GD

Stochastic Gradient Descent (SGD)

Mini Batch Stochastic Gradient
Descent (MB-SGD)
➢ The MB-SGD algorithm is an extension of the SGD algorithm, and it overcomes the problem
of large time complexity in the case of the SGD algorithm. The MB-SGD algorithm takes a
batch (subset) of points from the dataset to compute the derivative.
➢ It is observed that the derivative of the loss function for MB-SGD is almost the same
as the derivative of the loss function for GD after some number of iterations. However,
the number of iterations needed to reach the minimum is larger for MB-SGD than for GD,
and the cost of computation is also larger.
➢ The weight update depends on the derivative of the loss for a batch of points. The
updates in MB-SGD are noisier because the derivative does not always point towards the
minimum.
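As a rough illustration of how a mini-batch update can be implemented, the sketch below (the toy data, batch size, learning rate and number of epochs are assumptions for illustration, not values from these notes) fits a simple linear model with mean-squared-error loss:

```python
import numpy as np

# Rough sketch of mini-batch SGD for a linear model y ≈ w0 + w1 * x with
# mean-squared-error loss. The toy data, batch size, learning rate and
# number of epochs are illustrative assumptions.

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=200)
y = 3.0 + 2.0 * X + rng.normal(0, 1, size=200)   # noisy points around y = 3 + 2x

w0, w1 = 0.0, 0.0
eta, batch_size, epochs = 0.01, 16, 200

for _ in range(epochs):
    order = rng.permutation(len(X))              # shuffle the data each epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]    # indices of one mini-batch
        xb, yb = X[idx], y[idx]
        err = (w0 + w1 * xb) - yb                # prediction error on the batch
        grad_w0 = 2 * err.mean()                 # d(loss)/d(w0) averaged over the batch
        grad_w1 = 2 * (err * xb).mean()          # d(loss)/d(w1) averaged over the batch
        w0 -= eta * grad_w0
        w1 -= eta * grad_w1

print(w0, w1)   # should end up close to the true values 3 and 2
```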

Advantages:
1. Lower time complexity to converge compared to the standard SGD algorithm.

Disadvantages:
1. The updates of MB-SGD are much noisier than the updates of the GD algorithm.
2. It may get stuck at a local minimum.

Linear Regression
Notation
m = number of training examples
x's = input variables / features
y's = output ("target") variables
(x, y) - a single training example
(x^(i), y^(i)) - a specific training example (the i-th one), where i is an index into the training set

Training set (data set) - how do we use it?
• Take the training set
• Pass it into a learning algorithm
• The algorithm outputs a function
• This function takes an input (e.g. the size of a new house) and tries to output the estimated value of y

Hypothesis
hθ(x) = θ0 + θ1x

What does this mean?
• It means y is a linear function of x
• θi are the parameters: θ0 is the intercept (the value of hθ(x) when x is zero), and θ1 is the gradient (slope)

This is actually univariate linear regression (linear regression with a single variable).


Linear Regression Cost function
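A standard way to write this cost function for the hypothesis hθ(x) = θ0 + θ1x (assuming the usual squared-error convention) is

J(θ0, θ1) = (1 / 2m) * Σ_{i=1..m} ( hθ(x^(i)) – y^(i) )^2

i.e. half the mean of the squared differences between the predicted values and the actual target values; the goal is to choose θ0 and θ1 that minimize J(θ0, θ1).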

Gradient Descent
Do the following until convergence
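In the usual formulation (assumed here), each parameter is updated using the partial derivative of the cost function, scaled by the learning rate α:

θj = θj – α * ∂/∂θj J(θ0, θ1)      (for j = 0 and j = 1, with both updates applied simultaneously)

and the loop is repeated until J(θ0, θ1) stops decreasing.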

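Putting the cost function and the update rule together, a minimal sketch in code (the toy data, learning rate and iteration count below are illustrative assumptions, not values from these notes) could look like this:

```python
import numpy as np

# Minimal sketch of batch gradient descent for univariate linear regression,
# hθ(x) = θ0 + θ1 * x, with the squared-error cost J(θ0, θ1).
# The toy data, learning rate and iteration count are illustrative choices.

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])      # e.g. house sizes (scaled)
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])     # e.g. prices; roughly y = 1 + 2x

theta0, theta1 = 0.0, 0.0
alpha = 0.05                                  # learning rate
m = len(x)

for _ in range(2000):
    h = theta0 + theta1 * x                   # current predictions hθ(x)
    # Partial derivatives of J(θ0, θ1) = (1 / 2m) * Σ (hθ(x) - y)^2
    d_theta0 = (1.0 / m) * np.sum(h - y)
    d_theta1 = (1.0 / m) * np.sum((h - y) * x)
    # Simultaneous update of both parameters
    theta0, theta1 = theta0 - alpha * d_theta0, theta1 - alpha * d_theta1

cost = (1.0 / (2 * m)) * np.sum((theta0 + theta1 * x - y) ** 2)
print(theta0, theta1, cost)                   # θ0 ≈ 1, θ1 ≈ 2, small final cost
```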
Types of Regression
1) Simple Linear Regression
➢ In this case, we only have a single independent variable and a single
dependent variable.

➢ In linear regression, while developing the model we assume a linear relationship
between the independent and dependent variables.

➢ In simple linear regression, we try to find a relationship between the target variable
and the input variable by fitting a line, known as the regression line.

y = m * x + b
y(x) = w0 + w1 * x

where the w's are the parameters of the model, x is the input, and y is the target
variable.
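Because there is only one input variable, the line's parameters can also be computed directly from the data with the closed-form least-squares formulas; the snippet below is a small illustrative sketch (the sample data are assumptions, not values from these notes):

```python
import numpy as np

# Closed-form least-squares fit of the regression line y(x) = w0 + w1 * x.
# The sample data are illustrative.

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])

w1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)  # slope
w0 = y.mean() - w1 * x.mean()                                               # intercept

print(w0, w1)   # roughly 1.05 and 1.99 for this sample data
```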

Multiple Linear Regression using
statistical approach
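The statistical approach here typically refers to ordinary least squares, which generalizes the single-variable case to several input variables. As an assumed illustration (not taken from these notes), the parameter vector can be obtained from the normal equation w = (XᵀX)⁻¹ Xᵀ y, where X contains the input features (with a leading column of ones for the intercept) and y is the vector of targets:

```python
import numpy as np

# Sketch of multiple linear regression via the normal equation
# w = (XᵀX)⁻¹ Xᵀ y. The sample data are illustrative.

X_raw = np.array([[1.0, 2.0],
                  [2.0, 1.0],
                  [3.0, 4.0],
                  [4.0, 3.0],
                  [5.0, 5.0]])                 # two input features per example
y = np.array([5.1, 6.0, 11.2, 11.9, 16.1])     # roughly y = 1 + 2*x1 + 1*x2

X = np.column_stack([np.ones(len(X_raw)), X_raw])  # prepend a column of ones (intercept)
w = np.linalg.solve(X.T @ X, X.T @ y)              # solve the normal equation

print(w)   # approximately [1, 2, 1]: intercept and the two coefficients
```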
