
Linear Regression: Gradient Descent Vs Analytical Solution

Egor Howell

An explanation of why Gradient Descent is frequently used in Data Science with an implementation in C

Photo by Trần Ngọc Vân on Unsplash

Introduction

Gradient Descent is a ubiquitous optimization algorithm used throughout Data Science in algorithms such as Neural Networks, Linear Regression, and Gradient Boosting Machines. However, why is it used so frequently?

Gradient Descent Intuition

Let’s begin by explaining Gradient Descent. This is going to be a brief description, as this topic has been covered thoroughly elsewhere, so please refer to other blogs or tutorials if you want a more in-depth explanation.

Gradient Descent is a first-order iterative method for finding the minimum of a differentiable function. We start with an initial guess and slowly descend in the opposite direction of the gradient computed at the current guess. Each update produces a new, improved value, and this process is repeated until we converge to the minimum.

In Machine Learning, this differentiable function is the Loss Function, which tells us how well our current model fits the data. Gradient Descent is then used to update the current parameters of the model to minimize the Loss Function.

Gradient Descent For Simple Linear Regression

Perhaps the easiest example to demonstrate Gradient Descent is for a Simple Linear
Regression Model. In this case, our hypothesis function, h(x), depends on a single feature
variable, x:

h(x) = θ_0 + θ_1·x

Where θ_0 and θ_1 are the parameters of the model. The Loss Function for this problem is the Sum of Squared Errors (SSE):

SSE = Σ_i (y_i − h(x_i))² = Σ_i (y_i − θ_0 − θ_1·x_i)²

Therefore, we will use Gradient Descent to find the values of the parameters that minimize the above Loss Function.

As you can see, the Loss Function is differentiable and convex (parabolic in each parameter), hence it has a single global minimum. As mentioned before, Gradient Descent updates the parameters of the model by taking small steps in the opposite direction of the gradient. Therefore, we need to compute the gradient of the Loss Function with respect to the two parameters:

∂SSE/∂θ_0 = −2·Σ_i (y_i − θ_0 − θ_1·x_i)

∂SSE/∂θ_1 = −2·Σ_i x_i·(y_i − θ_0 − θ_1·x_i)


These parameters are then updated as:

θ_0 := θ_0 − η·∂SSE/∂θ_0

θ_1 := θ_1 − η·∂SSE/∂θ_1

Where η is the learning rate, which determines the step size by which each parameter is updated. The learning rate is typically between zero and one and controls how quickly we converge to the minimum. If it is too large, we may overshoot the minimum; if it is too small, convergence takes much longer. Therefore, a happy medium needs to be found. This is where Hyperparameter Tuning comes in, through methods such as Grid and Random Search or even a Bayesian approach; a small grid search is sketched below.
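To see the effect in practice, here is a minimal grid-search sketch in C (my own illustrative example, reusing the toy function f(x) = x² − 5x from the code at the end of this article, which has its minimum at x = 2.5). It runs Gradient Descent for a fixed number of epochs with several candidate learning rates:

#include <stdio.h>
#include <math.h>

/* Derivative of the toy objective f(x) = x^2 - 5x (minimum at x = 2.5) */
double dydx(double x){
    return 2*x - 5;
}

int main(){
    /* Candidate learning rates for a simple grid search */
    double grid[] = {0.001, 0.01, 0.1, 0.5, 1.1};
    int n_grid = 5, epochs = 100;
    int g, e;

    for (g = 0; g < n_grid; ++g){
        double eta = grid[g];
        double x = 0.0; /* fixed initial guess for a fair comparison */

        for (e = 0; e < epochs; ++e){
            x = x - eta*dydx(x); /* one Gradient Descent step */
        }

        /* Distance from the true minimum shows the effect of the step size */
        printf("eta = %.3f -> final x = %f (error %e)\n", eta, x, fabs(x - 2.5));
    }
    return 0;
}

Running this shows the trade-off directly: the smallest rates crawl towards the minimum, the moderate rates converge within the budget, and a rate above one overshoots and diverges for this function.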

Given enough iterations (known in Data Science as training epochs), the gradient will tend to zero. At that point, the current values of the parameters have minimized the Loss Function, and the algorithm has converged.
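To make the update rules concrete, here is a minimal sketch in C (my own illustrative example; the toy data, learning rate, and epoch count are arbitrary choices, with the data generated roughly as y = 2 + 3x plus noise):

#include <stdio.h>

int main(){
    /* Toy dataset: y roughly follows 2 + 3x */
    double x[] = {1.0, 2.0, 3.0, 4.0, 5.0};
    double y[] = {5.1, 7.9, 11.2, 13.8, 17.1};
    int n = 5;

    double theta0 = 0.0, theta1 = 0.0; /* initial guesses */
    double eta = 0.01;                 /* learning rate */
    int epochs = 10000, e, i;

    for (e = 0; e < epochs; ++e){
        double grad0 = 0.0, grad1 = 0.0;

        /* Gradients of the SSE Loss Function with respect to each parameter */
        for (i = 0; i < n; ++i){
            double residual = y[i] - (theta0 + theta1*x[i]);
            grad0 += -2.0*residual;
            grad1 += -2.0*residual*x[i];
        }

        /* Step in the opposite direction of the gradient */
        theta0 -= eta*grad0;
        theta1 -= eta*grad1;
    }

    printf("theta0 = %f, theta1 = %f\n", theta0, theta1);
    return 0;
}

With these settings the parameters should settle very close to the least-squares fit for this data (θ_0 ≈ 2.05, θ_1 ≈ 2.99), which the analytical solution in the next section recovers directly.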

Analytical Solution

However, there does exist an analytical solution for the Simple Linear Regression Model. Instead of using numerical techniques, we can simply set the partial derivatives of the Loss Function to zero:

∂SSE/∂θ_0 = −2·Σ_i (y_i − θ_0 − θ_1·x_i) = 0

∂SSE/∂θ_1 = −2·Σ_i x_i·(y_i − θ_0 − θ_1·x_i) = 0

This is a system of two linear equations with two unknowns that can be solved analytically.
Through mathematical derivation and rearranging, the values of the parameters that satisfy the
above equations are:

θ_0 = ȳ − θ_1·x̅


θ_1 = Σ_i (x_i − x̅)(y_i − ȳ) / Σ_i (x_i − x̅)²

Where x̅ and ȳ are the mean of the feature and the mean of the target variable, respectively. Therefore, by calculating these averages, we can find the parameters that minimize the Loss Function without using an iterative approach!
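For comparison, here is a short sketch in C (my own illustrative example, using the same assumed toy data as the Gradient Descent sketch above) that computes the closed-form solution directly from the sample means:

#include <stdio.h>

int main(){
    double x[] = {1.0, 2.0, 3.0, 4.0, 5.0};
    double y[] = {5.1, 7.9, 11.2, 13.8, 17.1};
    int n = 5, i;

    /* Means of the feature and the target */
    double x_mean = 0.0, y_mean = 0.0;
    for (i = 0; i < n; ++i){
        x_mean += x[i]/n;
        y_mean += y[i]/n;
    }

    /* Slope: covariance of x and y divided by the variance of x */
    double num = 0.0, den = 0.0;
    for (i = 0; i < n; ++i){
        num += (x[i] - x_mean)*(y[i] - y_mean);
        den += (x[i] - x_mean)*(x[i] - x_mean);
    }
    double theta1 = num/den;
    double theta0 = y_mean - theta1*x_mean;

    printf("theta0 = %f, theta1 = %f\n", theta0, theta1);
    return 0;
}

No iterations and no learning rate: two passes over the data give the exact minimizer.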

The above equations are the analytical solution for the Simple Linear Regression Model. This is just a reduced version of the general solution for Linear Regression Models, where we could have more than two unknown parameters:

ϴ = (XᵀX)⁻¹·XᵀY

Where X is the matrix of the data, Y is the vector of target values, and ϴ is the vector of parameters.
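As a sanity check on the general formula, the sketch below (again my own illustration, with the same assumed toy data) evaluates ϴ = (XᵀX)⁻¹·XᵀY for the two-parameter case, where X carries a column of ones for the intercept; the 2×2 matrix XᵀX is inverted with Cramer's rule:

#include <stdio.h>

int main(){
    double x[] = {1.0, 2.0, 3.0, 4.0, 5.0};
    double y[] = {5.1, 7.9, 11.2, 13.8, 17.1};
    int n = 5, i;

    /* Entries of X^T X (2x2) and X^T Y (2x1), with a bias column of ones in X */
    double sx = 0.0, sxx = 0.0, sy = 0.0, sxy = 0.0;
    for (i = 0; i < n; ++i){
        sx  += x[i];
        sxx += x[i]*x[i];
        sy  += y[i];
        sxy += x[i]*y[i];
    }

    /* Invert X^T X = [[n, sx], [sx, sxx]] via Cramer's rule */
    double det = n*sxx - sx*sx;
    double theta0 = (sxx*sy - sx*sxy)/det;
    double theta1 = (n*sxy - sx*sy)/det;

    printf("theta0 = %f, theta1 = %f\n", theta0, theta1);
    return 0;
}

This agrees with the mean-based formulas above; with more parameters, the same formula is evaluated with general matrix routines rather than by hand.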

Why Gradient Descent Then?

So why do we use Gradient Descent when an analytical solution exists? The answer comes down to computational time and space costs.

The time complexity of Gradient Descent is O(kn) per epoch, where k is the number of features and n is the total number of data points. It can be sped up further through vectorized implementations, which is how most Machine Learning algorithms are implemented today.

However, the general analytical solution requires forming and inverting the matrix XᵀX, which has a time complexity of roughly O(k²·n + k³). For small datasets the difference is negligible, but the gap in computational time grows rapidly as the data size increases. Datasets in practice can easily reach hundreds of features and millions of rows, so the analytical solution is not feasible in these scenarios.

Furthermore, for some models, such as Poisson Regression and Logistic Regression, setting the derivatives to zero leads to a set of non-linear equations with no closed-form analytical solution. Thus, we are forced to use numerical methods such as Gradient Descent.
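To make this concrete, the sketch below (my own illustrative example, with made-up toy data) fits a single-feature Logistic Regression by Gradient Descent. The sigmoid makes the equations obtained by setting the derivatives of the cross-entropy loss to zero non-linear in the parameters, so we iterate instead of solving in closed form:

#include <stdio.h>
#include <math.h>

/* Sigmoid function: maps any real score into (0, 1) */
double sigmoid(double z){
    return 1.0/(1.0 + exp(-z));
}

int main(){
    /* Toy binary labels (deliberately not perfectly separable) */
    double x[] = {-2.0, -1.0, 0.0, 1.0, 2.0};
    double y[] = { 0.0,  1.0, 0.0, 1.0, 1.0};
    int n = 5, e, i;

    double theta0 = 0.0, theta1 = 0.0;
    double eta = 0.1;

    for (e = 0; e < 5000; ++e){
        double grad0 = 0.0, grad1 = 0.0;

        /* Gradient of the cross-entropy loss: setting this to zero has no
           closed-form solution because of the sigmoid */
        for (i = 0; i < n; ++i){
            double p = sigmoid(theta0 + theta1*x[i]);
            grad0 += (p - y[i]);
            grad1 += (p - y[i])*x[i];
        }

        theta0 -= eta*grad0;
        theta1 -= eta*grad1;
    }

    printf("theta0 = %f, theta1 = %f\n", theta0, theta1);
    return 0;
}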

Conclusion

Gradient Descent is preferred over analytical solutions due to its computational speed, and because some Regression models lack closed-form solutions altogether, necessitating iterative numerical methods.

I hope you guys enjoyed this article and learned something new! There are plenty of other articles taking a deeper dive into some of the derivations I condensed in this post, so I would recommend checking them out!

Connect With Me!


To read unlimited stories on Medium make sure to sign up here!

To get an update when I post sign up to email notifications here!

LinkedIn

Twitter

GitHub

Kaggle


Something Extra!

Shown below is some sample code I wrote in C to showcase how Gradient Descent can be programmed!

#include <stdio.h>
#include <math.h>

/* Derivative of the toy objective f(x) = x^2 - 5x */
double dydx(double x){
    return 2*x - 5;
}

int main(){
    int epochs, i;
    double learning_rate, x, x_new;

    printf("Enter your initial guess: ");
    scanf("%lf", &x);

    printf("Enter how many epochs: ");
    scanf("%d", &epochs);

    printf("Enter your learning rate: ");
    scanf("%lf", &learning_rate);

    for (i = 1; i < epochs + 1; ++i){
        /* Take one step in the opposite direction of the gradient */
        x_new = x - learning_rate*dydx(x);

        /* Stop early once the update becomes negligibly small */
        if (fabs(x - x_new) < 0.000001){
            printf("number of epochs to converge: %d\n", i);
            break;
        }
        x = x_new;
    }

    printf("The value of x that minimises is %lf\n", x);
    return 0;
}

The full code can be found at my GitHub:
