Everything You Need To Know About Linear Regression
K KAVITA MALI
24 May, 2024 • 19 min read
Introduction
Linear Regression, a foundational algorithm in data science, plays a pivotal role
in predicting continuous outcomes. This guide provides an in-depth
exploration of Linear Regression, covering its principles, applications, and
implementation in Python on a real-world dataset. From understanding simple
and multiple linear regression to unveiling its significance, limitations, and
practical use cases, this article serves as a comprehensive resource for both
beginners and practitioners. Join us on this journey through the intricacies of
linear regression, offering insights into its workings and hands-on application.
This article is part of the Data Science Blogathon, delivering valuable knowledge for data enthusiasts.
Learning Objectives
Understand the principles and applications of linear regression.
Differentiate between simple and multiple linear regression.
Learn how to implement linear regression in Python.
Grasp the concept of gradient descent and its use in optimizing linear
regression.
Explore evaluation metrics for assessing linear regression models.
Recognize the assumptions and potential pitfalls, such as overfitting and
multicollinearity, in linear regression.
What is Linear Regression?
Linear regression predicts the relationship between two variables by assuming
a linear connection between the independent and dependent variables. It
seeks the optimal line that minimizes the sum of squared differences between
predicted and actual values. Applied in various domains like economics and
finance, this method analyzes and forecasts data trends. It can extend to multiple linear regression, involving several independent variables, and to logistic regression, which is suitable for binary classification problems.
The graph above presents the linear relationship between the output(y) and
predictor(X) variables. The blue line is referred to as the best-fit straight line.
Based on the given data points, we attempt to plot a line that fits the points
the best.
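Such a plot can be reproduced in a few lines of Python. Below is a minimal sketch using synthetic data; the data and coefficients are made up for illustration, and np.polyfit stands in for the least-squares fit described in the next section:

# Plotting data points and the best-fit straight line (synthetic, illustrative data)
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, 50)
y = 2.0 * X + 1.0 + rng.normal(0, 2, 50)  # true line y = 2x + 1, plus noise

# np.polyfit with degree 1 performs an ordinary least-squares line fit
slope, intercept = np.polyfit(X, y, 1)

plt.scatter(X, y, label='data points')
plt.plot(X, slope * X + intercept, color='blue', label='best-fit line')
plt.xlabel('X (predictor)')
plt.ylabel('y (output)')
plt.legend()
plt.show()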
Simple Regression Calculation
To calculate the best-fit line, linear regression uses the traditional slope-intercept form given below,

Yi = β0 + β1Xi

where Yi = dependent variable, β0 = constant/intercept, β1 = slope/coefficient, and Xi = independent variable.
This algorithm explains the linear relationship between the dependent (output) variable y and the independent (predictor) variable X using the straight line Y = β0 + β1X.
But how does linear regression find out which is the best-fit line?

The goal of the linear regression algorithm is to get the best values for β0 and β1 to find the best-fit line. The best-fit line is the line that has the least error, which means the error between predicted values and actual values should be minimal.
Random Error(Residuals)
In regression, the difference between the observed value of the dependent variable (yi) and the predicted value (ŷi) is called the residual,

εi = yi − ŷi
where ŷi = β0 + β1Xi
What is the Best Fit Line?
In simple terms, the best-fit line is a line that fits the given scatter plot in the
best way. Mathematically, the best-fit line is obtained by minimizing the
Residual Sum of Squares (RSS).
Using the MSE (Mean Squared Error) function, we update the values of β0 and β1 such that the MSE value settles at its minimum. These parameters can be determined using the gradient descent method, such that the value of the cost function is at its minimum.
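Concretely, using the standard definitions with ŷi = β0 + β1xi,

RSS = Σi (yi − ŷi)²

MSE = (1/n) · Σi (yi − ŷi)² = RSS / n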
Gradient Descent Example
Let’s take an example to understand this. Imagine a U-shaped pit. You are
standing at the uppermost point in the pit, and your motive is to reach the
bottom of the pit. Suppose there is a treasure at the bottom of the pit, and you
can only take a discrete number of steps to reach the bottom. If you opt to take one small step at a time, you will eventually reach the bottom of the pit, but this will take a longer time. If you decide to take larger steps each time, you may reach the bottom sooner, but there is a chance that you overshoot the bottom of the pit and don't even land near it. In the gradient descent algorithm, the size of the steps you take can be considered as the learning rate, and this decides how fast the algorithm converges to the minima.
To update β0 and β1, we take gradients from the cost function. To find these gradients, we take partial derivatives of the cost function with respect to β0 and β1.
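Taking J to be the MSE cost function, the standard gradients and update rules are,

∂J/∂β0 = −(2/n) · Σi (yi − ŷi)

∂J/∂β1 = −(2/n) · Σi (yi − ŷi) · xi

β0 := β0 − α · ∂J/∂β0

β1 := β1 − α · ∂J/∂β1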
We need to minimize the cost function J. One of the ways to achieve this is to apply the batch gradient descent algorithm. In batch gradient descent, the values are updated at each iteration over the entire training set (the last two equations above show the update step). The partial derivatives are the gradients, and they are used to update the values of β0 and β1. Alpha (α) is the learning rate.
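As a concrete illustration, here is a minimal batch gradient descent sketch for simple linear regression. The synthetic data, learning rate, and iteration count are all illustrative assumptions, not values from the article:

# Batch gradient descent for simple linear regression on synthetic data
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(0, 1, 100)  # true line: y = 2x + 1, plus noise

b0, b1 = 0.0, 0.0  # initial guesses for intercept and slope
alpha = 0.01       # learning rate
n = len(x)

for _ in range(5000):
    y_pred = b0 + b1 * x
    # Partial derivatives of the MSE cost with respect to b0 and b1
    grad_b0 = -(2 / n) * np.sum(y - y_pred)
    grad_b1 = -(2 / n) * np.sum((y - y_pred) * x)
    # Move against the gradient, scaled by the learning rate
    b0 -= alpha * grad_b0
    b1 -= alpha * grad_b1

print(b0, b1)  # should approach the true intercept (1.0) and slope (2.0)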
Total Sum of Squares (TSS) is defined as the sum of squared deviations of the data points from the mean of the response variable. Mathematically, TSS is,
TSS = Σi (yi − ȳ)²

where ȳ is the mean of the sample response values.
R-squared measures the proportion of variance in the response that the model explains: R² = 1 − RSS/TSS. The closer R² is to 1, the better the fit.
To make this estimate unbiased, one has to divide the sum of the squared residuals by the degrees of freedom rather than the total number of data points in the model. This term is then called the Residual Standard Error (RSE). Mathematically, it can be represented as,

RSE = √( RSS / (n − 2) )

where n − 2 is the degrees of freedom for simple linear regression.
R-squared is a better measure than RMSE, because the value of Root Mean Squared Error depends on the units of the variables (i.e., it is not a normalized measure) and can change with a change in the units of the variables.
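To make the relationship between these metrics concrete, here is a small sketch computing them on toy arrays; the values are made up for illustration:

# Computing RSS, TSS, R-squared, RMSE, and RSE on toy data
import numpy as np

y_actual = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.3, 6.9, 9.2])
n = len(y_actual)

rss = np.sum((y_actual - y_pred) ** 2)
tss = np.sum((y_actual - y_actual.mean()) ** 2)

r_squared = 1 - rss / tss
rmse = np.sqrt(np.mean((y_actual - y_pred) ** 2))
rse = np.sqrt(rss / (n - 2))  # divide by degrees of freedom, n - 2

print(r_squared, rmse, rse)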
Assumptions of Linear Regression

1. Linear relationship: There should be a linear relationship between the independent variable(s) and the dependent variable.
2. Independence of residuals: The error terms should not be dependent on one another (as in time-series data, where the next value depends on the previous one). There should be no correlation between the residual terms; the presence of such correlation is known as autocorrelation. There should not be any visible patterns in the error terms.
3. Normal distribution of residuals: The error terms should be normally distributed.

4. Equal variance of residuals: The error terms must have constant variance. This phenomenon is known as homoscedasticity. The presence of non-constant variance in the error terms is referred to as heteroscedasticity. Generally, non-constant variance arises in the presence of outliers or extreme leverage values.
Hypothesis in Linear Regression
Once you have fitted a straight line on the data, you need to ask, "Is this straight line a significant fit for the data?" or "Does the beta coefficient explain the variance in the plotted data?" This is where the idea of hypothesis testing on the beta coefficient comes in. The null and alternative hypotheses in this case are:
H0: β1 = 0
HA: β1 ≠ 0
To test this hypothesis, we use a t-test; the test statistic for the beta coefficient is given by,

t = β̂1 / SE(β̂1)

where SE(β̂1) is the standard error of the estimated slope.
Multicollinearity
As multicollinearity makes it difficult to determine which variable is contributing to the prediction of the response variable, it can lead one to conclude incorrectly about the effects of a variable on the target variable. Though it does not affect the precision of the model's predictions, it is essential to properly detect and deal with the multicollinearity present in the model, as the random removal of any of these correlated variables can cause the coefficient values to swing wildly and even change signs.
Multicollinearity can be detected using the following methods.
Pairwise Correlations: Checking the pairwise correlations between different pairs of independent variables can provide useful insights for detecting multicollinearity.

Variance Inflation Factor (VIF): Pairwise correlations may not always be useful, since it is possible that no single variable can completely explain some other variable, yet several variables combined could do so. To check for these sorts of relationships between variables, one can use the VIF. The VIF quantifies the relationship of one independent variable with all the other independent variables, and is given by,

VIFi = 1 / (1 − Ri²)

where Ri² is the R-squared obtained by regressing the i-th independent variable on all the others.
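Both detection methods can be sketched in a few lines. Here `X` is assumed to be a pandas DataFrame containing only the predictor columns; the name is hypothetical, not defined in the article up to this point:

# Detecting multicollinearity: pairwise correlations and VIF
# `X` is assumed to be a DataFrame of predictor columns only
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Pairwise correlations between the independent variables
print(X.corr())

# VIF for each predictor; a constant is added so each auxiliary
# regression includes an intercept
X_const = sm.add_constant(X)
vif = pd.DataFrame({
    "feature": X_const.columns,
    "VIF": [variance_inflation_factor(X_const.values, i)
            for i in range(X_const.shape[1])],
})
print(vif)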
Overfitting
When a model learns every pattern and noise in the data to such an extent
that it affects the performance of the model on the unseen future dataset, it is
referred to as overfitting. The model fits the data so well that it interprets
noise as patterns in the data.
When a model has low bias and high variance, it ends up memorizing the data, causing overfitting. Overfitting causes the model to become specific rather than generic. This usually leads to high training accuracy and very low test accuracy.
Detecting overfitting is useful, but it doesn’t solve the actual problem. There
are several ways to prevent overfitting, which are stated below:
Cross-validation
If the training data is too small, add more relevant and clean data.
If the training data is too large, do some feature selection and remove unnecessary features.
Regularization
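As a brief illustration of two of the remedies above, here is a hedged sklearn sketch; `X_train` is assumed to be a 2-D feature matrix and `y_train` its target vector, and alpha=1.0 is an illustrative default, not a tuned choice:

# Cross-validation and ridge (L2) regularization with sklearn
# `X_train` (2-D) and `y_train` are assumed to exist from a prior split
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# 5-fold cross-validated R-squared scores for a plain linear model
lin = LinearRegression()
print(cross_val_score(lin, X_train, y_train, cv=5, scoring="r2"))

# Ridge regression shrinks the coefficients toward zero
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)
print(ridge.coef_, ridge.intercept_)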
Underfitting
Underfitting is not discussed as often as overfitting. When a model fails to learn from the training dataset and is also unable to generalize to the test dataset, it is referred to as underfitting. This type of problem can be detected very easily by the performance metrics.

When a model has high bias and low variance, it ends up not generalizing the data, causing underfitting. It is unable to find the hidden underlying patterns in the data. This usually leads to low training accuracy and very low test accuracy. The ways to prevent underfitting are stated below,
Increase the model complexity
Increase the number of features in the training data
Remove noise from the data.
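For the first remedy, one common way to increase model complexity while staying within the linear regression framework is to add polynomial features. A minimal sketch, again assuming a 2-D `X_train` and a `y_train`, with degree=2 as an illustrative choice:

# Adding polynomial features to increase model complexity
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# degree=2 adds squared and interaction terms to the feature set
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X_train, y_train)
print(model.score(X_train, y_train))  # R-squared on the training data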
Step 3: Visualization
Let us plot the scatter plot for the target variable vs. the predictor variables in a single plot to get some intuition. We will also plot a heatmap for all the variables,
#Importing seaborn library for visualizations
import seaborn as sns
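A minimal sketch of these plots, assuming the data sits in a DataFrame named `advertising` with columns 'TV', 'Radio', 'Newspaper', and 'Sales' (the 'Radio' and 'Newspaper' column names are assumptions based on a typical advertising dataset):

# Scatter plots of Sales against each predictor, plus a correlation heatmap
# The DataFrame name `advertising` and some column names are assumed
import matplotlib.pyplot as plt

sns.pairplot(advertising, x_vars=['TV', 'Radio', 'Newspaper'],
             y_vars=['Sales'], height=4, kind='scatter')
plt.show()

sns.heatmap(advertising.corr(), annot=True, cmap='YlGnBu')
plt.show()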
From the scatter plot and the heatmap, we can observe that 'Sales' and 'TV' have a higher correlation than the others, because 'TV' shows a linear pattern in the scatter plot and has a correlation of about 0.9 in the heatmap.
You can go ahead and play with the visualizations and can find out interesting
insights from the data.
Step 4: Performing Simple Linear Regression
Here, as the TV and Sales have a higher correlation we will perform the simple
linear regression for these variables.
We can use sklearn or statsmodels to apply linear regression; here, we will go ahead with statsmodels.

We first assign the feature variable, `TV` in this case, to the variable `X`, and the response variable, `Sales`, to the variable `y`.
X = advertising['TV']
y = advertising['Sales']
After assigning the variables, you need to split them into training and testing sets. You'll perform this by importing train_test_split from the sklearn.model_selection library. It is usually good practice to keep 70% of the data in your train dataset and the remaining 30% in your test dataset.
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, test_size=0.3)
In this way, you can split the data into train and test sets.
One can check the shapes of train and test sets with the following code,
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
By default, the statsmodels library fits a line on the dataset that passes through the origin. In order to have an intercept, you need to manually add a constant using the add_constant attribute of statsmodels. Once you've added the constant to your X_train dataset, you can go ahead and fit a regression line using the OLS (Ordinary Least Squares) attribute of statsmodels, as shown below,
# Import statsmodels, needed for add_constant and OLS
import statsmodels.api as sm

# Add a constant to get an intercept
X_train_sm = sm.add_constant(X_train)

# Fit the regression line using 'OLS'
lr = sm.OLS(y_train, X_train_sm).fit()
One can see the values of betas using the following code,
# Print the parameters, i.e. the intercept and the slope of the fitted regression line
lr.params
Here, 6.948 is the intercept, and 0.0545 is the slope for the variable TV.
Now, let's see the evaluation metrics for this linear regression. You can simply view the summary using the following code,
#Performing a summary operation lists out all different parameters of the regression
print(lr.summary())
Summary
As you can see, this code gives you a brief summary of the linear regression.
Here are some key statistics from the summary:
1. The coefficient for TV is 0.054, with a very low p-value. The coefficient is statistically significant, so the association is not purely by chance.
2. R-squared is 0.816, meaning that 81.6% of the variance in `Sales` is explained by `TV`. This is a decent R-squared value.
3. The F-statistic has a very low p-value (practically zero), meaning that the model fit is statistically significant and the explained variance isn't purely by chance.
Step 5: Performing predictions on the test set
Now that you have fitted a regression line on your train dataset, it is time to make some predictions on the test data. For this, you first need to add a constant to the X_test data, as you did for X_train, and then you can simply go ahead and predict the y values corresponding to X_test using the predict attribute of the fitted regression line.
# Add a constant to X_test
X_test_sm = sm.add_constant(X_test)
# Predict the y values corresponding to X_test_sm
y_pred = lr.predict(X_test_sm)
You can see the predicted values with the following code,
y_pred.head()
To check how well the values are predicted on the test data we will check
some evaluation metrics using sklearn library.
# Importing libraries
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score

# RMSE value
print("RMSE: ", np.sqrt(mean_squared_error(y_test, y_pred)))

# R-squared value
print("R-squared: ", r2_score(y_test, y_pred))
We are getting a decent score for both train and test sets.
Apart from `statsmodels`, there is another package namely `sklearn` that can
be used to perform linear regression. We will use the `linear_model` library
from `sklearn` to build the model. Since we have already performed a train-
test split, we don’t need to do it again.
There's one small step that we need to add, though. When there's only a single feature, we need to reshape it into a two-dimensional array so that the sklearn fit can be performed successfully. The code is given below,
# Reshape the 1-D feature into a 2-D array of shape (n_samples, 1)
X_train_lm = X_train.values.reshape(-1, 1)
X_test_lm = X_test.values.reshape(-1, 1)
One can check the changed shapes of the above arrays,

print(X_train_lm.shape)
print(X_test_lm.shape)
To get the intercept and slope values with sklearn, first fit a LinearRegression model on the reshaped data, then read its attributes,

from sklearn.linear_model import LinearRegression

# Fit the model on the reshaped training data
lr = LinearRegression()
lr.fit(X_train_lm, y_train)

# Get intercept and slope
print(lr.intercept_)
print(lr.coef_)
Conclusion
This is how we can perform simple linear regression.
In conclusion, Linear Regression is a cornerstone of data science, providing a robust framework for predicting continuous outcomes. As we unravel its intricacies and applications, it becomes evident that Linear Regression is a versatile tool with widespread implications. This article has served as a comprehensive guide, from its role in modeling relationships to real-world implementation in Python.
For those eager to delve deeper into the world of data science and machine
learning, Analytics Vidhya’s AI & ML BlackBelt+ program offers an immersive
learning experience. Elevate your skills and navigate the evolving landscape of
data science with mentorship and hands-on projects. Join BB+ today and
unlock the next level in your data science journey!
Key Takeaways
Linear regression predicts relationships between variables by fitting a line
that minimizes prediction errors.
Simple linear regression involves one predictor and one outcome variable,
while multiple linear regression includes several predictors.
The cost function, often minimized using gradient descent, determines the
best-fit line in linear regression.
Evaluation metrics like R-squared and RMSE measure the model's performance and fit.
Assumptions such as linearity, independence, normal distribution, and
constant variance of residuals are crucial for valid regression analysis.
Proper feature selection and validation techniques help mitigate overfitting
and multicollinearity in regression models.
The media shown in this article are not owned by Analytics Vidhya and are
used at the Author’s discretion.
K KAVITA MALI
24 May 2024
A Mathematics student turned Data Scientist. I am an aspiring data scientist who aims at learning all the necessary concepts of Data Science in detail. I am passionate about Data Science, with knowledge of data manipulation, data visualization, data analysis, EDA, Machine Learning, etc., which helps in finding valuable insights from data.