
Gradient Descent in Linear Regression
In machine learning model development, our main objective is model accuracy: how much the model's predictions differ from the actual data points.

Based on the difference between the model's predictions and the actual data points, we try to find the model parameters that give the best accuracy on the dataset concerned.

To find these parameters, we apply gradient descent to the cost function of the machine learning model.
What is Gradient Descent
• Gradient Descent is an iterative optimization algorithm that tries to find an optimum value (minimum or maximum) of an objective function
• Gradient Descent is one of the most widely used optimization techniques in machine learning, updating a model's parameters in order to minimize the cost function
• Gradient Descent finds the model parameters that minimize the cost function on the training data and, ideally, generalize well to the test data
• In gradient descent, the gradient is a vector that points in the direction of steepest increase of the function at a specific point
• Moving in the opposite direction of the gradient allows the algorithm to gradually descend toward lower values of the function and eventually reach its minimum
Steps Required in Gradient Descent Algorithm

• Step 1: Initialize the parameters of the model randomly
• Step 2: Compute the gradient of the cost function with respect to each parameter [this involves taking the partial derivative of the cost function with respect to each parameter]
• Step 3: Update the parameters by taking a step in the opposite direction of the gradient [here we choose the learning rate, a hyperparameter that decides the step size]
• Step 4: Repeat steps 2 and 3 iteratively to obtain the best parameters for the defined model
In the update rule:

• max_iterations is the number of iterations used to update the parameters
• w, b are the weight and bias parameters
• η is the learning rate
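The update rule these symbols describe is not reproduced in the text; for a cost function J(w, b) it is conventionally written as:

```latex
w \leftarrow w - \eta \,\frac{\partial J}{\partial w},
\qquad
b \leftarrow b - \eta \,\frac{\partial J}{\partial b}
```

applied repeatedly for max_iterations steps.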


To apply gradient descent to a dataset, we use functions that update the parameters and make predictions:
• gradient_descent
• compute_predictions
• compute_gradient
• update_parameters
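The four functions above are named but not shown; a minimal sketch of how they might fit together for linear regression (the MSE cost and NumPy-based signatures are assumptions, not the attached implementation):

```python
import numpy as np

def compute_predictions(X, w, b):
    """Predict y for inputs X with weights w and bias b."""
    return X @ w + b

def compute_gradient(X, y, w, b):
    """Gradient of the mean-squared-error cost with respect to w and b."""
    m = len(y)
    error = compute_predictions(X, w, b) - y
    grad_w = (2 / m) * (X.T @ error)
    grad_b = (2 / m) * error.sum()
    return grad_w, grad_b

def update_parameters(w, b, grad_w, grad_b, eta):
    """Take one step in the opposite direction of the gradient."""
    return w - eta * grad_w, b - eta * grad_b

def gradient_descent(X, y, eta=0.01, max_iterations=1000):
    """Run gradient descent from a random initialization (Steps 1-4)."""
    rng = np.random.default_rng(0)
    w = rng.normal(size=X.shape[1])  # Step 1: random initialization
    b = 0.0
    for _ in range(max_iterations):  # Step 4: repeat
        grad_w, grad_b = compute_gradient(X, y, w, b)          # Step 2
        w, b = update_parameters(w, b, grad_w, grad_b, eta)    # Step 3
    return w, b
```

gradient_descent ties the other three together: each iteration computes the gradient and steps against it, scaled by the learning rate eta.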
In regression problems, our model aims to find the best-fit regression line in order to predict the value y from a given input value x.

While training, the model evaluates a cost function, such as the Root Mean Squared Error between the predicted and true values, and tries to minimize it.

To minimize this cost function, the model needs the best values of θ1 and θ2.

The model initially selects θ1 and θ2 randomly and then iteratively updates these values to reduce the cost function until it reaches its minimum.

By the time the model achieves the minimum cost, it has the best θ1 and θ2 values.

Substituting these updated values of θ1 and θ2 into the hypothesis equation of the linear model, our model predicts the output value y.
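In the usual notation, the hypothesis and the cost being minimized can be written as follows (using MSE rather than RMSE, since both share the same minimizing θ1 and θ2):

```latex
h_\theta(x) = \theta_1 + \theta_2 x,
\qquad
J(\theta_1, \theta_2) = \frac{1}{2m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2
```

where m is the number of training examples.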
How do θ1 and θ2 values get updated?
Gradient Descent Algorithm For Linear Regression
How Does Gradient Descent Work

• Gradient descent works by moving downward toward the pits or valleys of the graph to find the minimum value
• This is achieved by taking the derivative of the cost function
• During each iteration, gradient descent steps down the cost function in the direction of steepest descent
• By adjusting the parameters in this direction, it seeks to reach the minimum of the cost function and find the best-fit values of the parameters
• The size of each step is determined by the learning rate
In the gradient descent update, the sign of the slope determines which way θj moves:

If the slope is +ve: θj = θj – (+ve value) [hence the value of θj decreases]

If the slope is -ve: θj = θj – (-ve value) [hence the value of θj increases]
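The two cases can be checked numerically on, say, J(θ) = θ² (a hypothetical cost whose slope is 2θ and whose minimum is at θ = 0):

```python
def step(theta, eta=0.1):
    """One gradient descent step on J(theta) = theta**2 (slope = 2*theta)."""
    slope = 2 * theta
    return theta - eta * slope

# Positive slope (theta = 3, slope = +6): the update subtracts a positive
# value, so theta decreases toward the minimum at 0.
assert step(3.0) < 3.0
# Negative slope (theta = -3, slope = -6): the update subtracts a negative
# value, so theta increases toward the minimum at 0.
assert step(-3.0) > -3.0
```

Either way, θ moves toward the minimum, which is exactly why the update subtracts the slope rather than adding it.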
How to Choose Learning Rate
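A quick way to see the trade-off is to run gradient descent on J(θ) = θ² with different learning rates (the rates below are arbitrary illustrative choices, not recommendations):

```python
def descend(eta, theta=1.0, steps=50):
    """Run gradient descent on J(theta) = theta**2; return the final theta."""
    for _ in range(steps):
        theta -= eta * 2 * theta  # slope of theta**2 is 2*theta
    return theta

for eta in (0.01, 0.1, 1.1):
    # eta = 0.01: converges, but slowly (theta still far from 0)
    # eta = 0.1:  converges quickly (theta is essentially 0)
    # eta = 1.1:  overshoots the minimum each step and diverges
    print(f"eta={eta}: theta after 50 steps = {descend(eta):.4f}")
```

Too small a rate wastes iterations; too large a rate overshoots the minimum and can diverge, which is the sensitivity discussed under the disadvantages below.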
Python implementation of the Gradient Descent Algorithm is attached alongside
Python code output

100 epochs elapsed
Current accuracy is: 0.9836456109008862
Regression line before gradient descent iteration
Regression line after gradient descent iteration
Accuracy graph for gradient descent on model
Advantages of Gradient Descent

• Flexibility: Gradient descent can be used with various cost functions and can handle non-linear regression problems

• Scalability: Gradient descent scales to large datasets, since its stochastic variant updates the parameters one training example at a time

• Convergence: Gradient descent can converge to the global minimum of the cost function, provided that the learning rate is set appropriately and the cost function is convex
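The per-example updates mentioned under Scalability describe the stochastic variant (SGD); a minimal sketch for the single-feature case (function name and signature are illustrative, not from the attached code):

```python
import random

def sgd_linear(data, eta=0.01, epochs=100, seed=0):
    """Stochastic gradient descent for y = w*x + b, one example at a time."""
    random.seed(seed)
    data = list(data)          # copy so shuffling doesn't mutate the caller's list
    w, b = 0.0, 0.0
    for _ in range(epochs):
        random.shuffle(data)   # visit examples in random order each epoch
        for x, y in data:
            error = (w * x + b) - y   # prediction error on this one example
            w -= eta * 2 * error * x  # gradient of error**2 w.r.t. w
            b -= eta * 2 * error      # gradient of error**2 w.r.t. b
    return w, b
```

Because each update touches a single example, memory use is independent of dataset size, which is what makes the approach scale; the price is the noisy, high-variance updates listed among the disadvantages below.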
Disadvantages of Gradient Descent

• Sensitivity to learning rate: The choice of learning rate is critical, since a high learning rate can cause the algorithm to overshoot the minimum, while a low learning rate makes it converge slowly
• Slow convergence: The algorithm may require many iterations to converge to the minimum when it updates the parameters one training example at a time
• Local minima: The algorithm can get stuck in a local minimum if the cost function has multiple local minima
• Noisy updates: Per-example updates are noisy and have high variance, which can make the optimization process less stable and lead to oscillations around the minimum
