Linear Regression: Gradient Descent Vs Analytical Solution
Egor Howell
Introduction
Let’s begin by explaining Gradient Descent. This will only be a brief description, as the topic has been covered thoroughly elsewhere, so please refer to other blogs or tutorials if you want a more in-depth explanation. In short, Gradient Descent is an iterative optimization algorithm that finds the minimum of a differentiable function by repeatedly taking small steps in the opposite direction of its gradient.
In Machine Learning, this differentiable function is the Loss Function, which tells us how well our current model fits the data. Gradient Descent is then used to update the model’s parameters so as to minimize the Loss Function.
Perhaps the easiest example to demonstrate Gradient Descent is for a Simple Linear
Regression Model. In this case, our hypothesis function, h(x), depends on a single feature
variable, x:
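h(x) = \theta_0 + \theta_1 x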
Where θ_0 and θ_1 are the parameters of the model. The Loss Function for this problem is
the Sum of Squares Error (SSE):
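For a dataset of n points (x_i, y_i), this is:

SSE(\theta_0, \theta_1) = \sum_{i=1}^{n} \left( y_i - \theta_0 - \theta_1 x_i \right)^2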
Therefore, we will use Gradient Descent to find the values of the parameters that minimize the above Loss Function.
As you can see, the Loss Function is differentiable and has a parabolic, bowl-like shape, hence it has a single minimum. As mentioned before, Gradient Descent updates the parameters of the model by taking small steps in the opposite direction of the gradient. Therefore, we need to compute the gradient of the Loss Function with respect to the two parameters:
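\frac{\partial SSE}{\partial \theta_0} = -2 \sum_{i=1}^{n} (y_i - \theta_0 - \theta_1 x_i)

\frac{\partial SSE}{\partial \theta_1} = -2 \sum_{i=1}^{n} x_i (y_i - \theta_0 - \theta_1 x_i)

Each parameter is then updated by a small step against its own partial derivative:

\theta_0 \leftarrow \theta_0 - \eta \frac{\partial SSE}{\partial \theta_0}, \qquad \theta_1 \leftarrow \theta_1 - \eta \frac{\partial SSE}{\partial \theta_1}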
Where η is the learning rate, which determines the size of the step each parameter is updated by. The learning rate is typically a small positive value (often between zero and one) and controls how quickly we converge to the minimum. If it is too large, we may overshoot the minimum; if it is too small, convergence requires many more iterations and hence more computation time. Therefore, a happy medium needs to be found. This is where Hyperparameter Tuning comes in, through methods such as Grid Search, Random Search or even a Bayesian approach.
Given enough iterations (known in Data Science as training epochs), the gradient will tend to zero. At that point the parameter values have converged and minimize the Loss Function.
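To make this concrete, below is a minimal sketch in C (not from the original article) that runs Gradient Descent for a Simple Linear Regression Model; the data points, learning rate and epoch count are purely illustrative assumptions.

#include <stdio.h>

int main(void) {
    /* hypothetical data points (x, y), roughly following y = 2x + 1 */
    double x[] = {1.0, 2.0, 3.0, 4.0, 5.0};
    double y[] = {3.1, 4.9, 7.2, 9.0, 11.1};
    int n = 5, i, e;

    double theta0 = 0.0, theta1 = 0.0; /* initial parameter guesses */
    double eta = 0.01;                 /* assumed learning rate */
    int epochs = 10000;                /* assumed number of training epochs */

    for (e = 0; e < epochs; e++) {
        double grad0 = 0.0, grad1 = 0.0;
        /* gradient of the SSE with respect to theta0 and theta1 */
        for (i = 0; i < n; i++) {
            double error = y[i] - (theta0 + theta1 * x[i]);
            grad0 += -2.0 * error;
            grad1 += -2.0 * error * x[i];
        }
        /* small step in the opposite direction of the gradient */
        theta0 -= eta * grad0;
        theta1 -= eta * grad1;
    }

    printf("theta0 = %f, theta1 = %f\n", theta0, theta1);
    return 0;
}

With these illustrative settings the parameters settle near θ_0 ≈ 1.03 and θ_1 ≈ 2.01, which matches the line the hypothetical points were scattered around.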
Analytical Solution
However, there does exist an analytical solution to the Simple Linear Regression Model. Instead of using numerical techniques, we can simply set the partial derivatives to zero:
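\frac{\partial SSE}{\partial \theta_0} = -2 \sum_{i=1}^{n} (y_i - \theta_0 - \theta_1 x_i) = 0

\frac{\partial SSE}{\partial \theta_1} = -2 \sum_{i=1}^{n} x_i (y_i - \theta_0 - \theta_1 x_i) = 0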
This is a system of two linear equations with two unknowns that can be solved analytically.
Through mathematical derivation and rearranging, the values of the parameters that satisfy the
above equations are:
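\theta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \theta_0 = \bar{y} - \theta_1 \bar{x}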
Where x̅ and ȳ are the means of the feature variable and the target variable respectively. Therefore, by simply calculating these averages we can find the parameters that minimize the Loss Function without any iterative approach!
The above expressions are the analytical solution for the Simple Linear Regression Model. They are just a reduced version of the general solution for Linear Regression Models, where we could have more than two unknown parameters:
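\Theta = (X^T X)^{-1} X^T Y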
Where X is the matrix of feature data, Y is the vector of target values and Θ is the vector of parameters.
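As a quick companion illustration (again not from the original article), here is a minimal C sketch that evaluates the closed-form expressions directly, using the same hypothetical data as the Gradient Descent sketch above:

#include <stdio.h>

int main(void) {
    /* same hypothetical data points as the Gradient Descent sketch */
    double x[] = {1.0, 2.0, 3.0, 4.0, 5.0};
    double y[] = {3.1, 4.9, 7.2, 9.0, 11.1};
    int n = 5, i;

    /* means of the feature and target variables */
    double x_mean = 0.0, y_mean = 0.0;
    for (i = 0; i < n; i++) {
        x_mean += x[i];
        y_mean += y[i];
    }
    x_mean /= n;
    y_mean /= n;

    /* closed-form solution: theta1 = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2) */
    double num = 0.0, den = 0.0;
    for (i = 0; i < n; i++) {
        num += (x[i] - x_mean) * (y[i] - y_mean);
        den += (x[i] - x_mean) * (x[i] - x_mean);
    }
    double theta1 = num / den;
    double theta0 = y_mean - theta1 * x_mean;

    printf("theta0 = %f, theta1 = %f\n", theta0, theta1);
    return 0;
}

It prints θ_0 ≈ 1.03 and θ_1 ≈ 2.01 in a single pass over the data, the same values Gradient Descent converges towards, with no iterations or learning rate required.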
So why do we use Gradient Descent when an analytical solution exists? The answer comes down to computational time and space costs.
The cost of each Gradient Descent pass over the data scales roughly as O(kn), where k is the number of features and n is the total number of data points, and in practice it is made even faster through vectorized implementations. This is how most Machine Learning algorithms are implemented today.
However, the general analytical solution for Linear Regression requires forming and inverting the matrix XᵀX, which costs roughly O(nk² + k³). Therefore, for small datasets the difference is negligible, but the computational gap grows rapidly as the dataset size and, in particular, the number of features increase. Many datasets in practice have on the order of 100 features and a million rows, and at that scale the analytical solution quickly becomes impractical.
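As a rough worked illustration of these estimates: with k = 100 features and n = 1,000,000 rows, forming XᵀX alone takes on the order of nk² ≈ 10¹⁰ multiplications, whereas a single Gradient Descent epoch takes on the order of nk ≈ 10⁸, roughly a hundred times fewer, and that factor widens with every extra feature.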
Furthermore, for some models, such as Poisson Regression and Logistic Regression, setting the derivatives to zero leads to a set of non-linear equations with no closed-form analytical solution. Thus, we are forced to use numerical methods such as Gradient Descent.
Conclusion
Gradient Descent is preferred over analytical solutions because of its computational efficiency and because some Regression models have no closed-form solution at all, which makes iterative numerical methods a necessity.
I hope you guys enjoyed this article and have learned something new! There are plenty of other articles taking a deeper dive into some of the derivations I condensed in this post, so I would recommend checking them out!
LinkedIn
Twitter
GitHub
Kaggle
(All emojis designed by OpenMoji — the open-source emoji and icon project. License: CC BY-SA
4.0)
Something Extra!
Shown below is some sample code I wrote in C to showcase how Gradient Descent can be programmed!
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

double dydx(double x);

int main() {
    int epochs, i;
    double learning_rate, x, x_new;
    printf("Enter your initial guess: ");
    scanf("%lf", &x);
    printf("Enter how many epochs: ");
    scanf("%d", &epochs);
    /* Assumed completion from here on: the original listing is truncated at this point */
    printf("Enter the learning rate: ");
    scanf("%lf", &learning_rate);
    /* Gradient Descent: step opposite to the derivative each epoch */
    for (i = 0; i < epochs; i++) {
        x_new = x - learning_rate * dydx(x);
        x = x_new;
    }
    printf("Minimum found at x = %f\n", x);
    return 0;
}

/* Assumed example function: derivative of f(x) = x^2 */
double dydx(double x) { return 2.0 * x; }