
Linear Regression

Roadmap
• How linear regression works
• Common pitfalls to avoid while building linear regression
• How to build linear regression in Python


Linear Regression

• Linear: arranged in or extending along a straight or nearly straight line, as in "linear movement."

• Regression: a technique for determining the statistical relationship between two or more variables, where a change in one variable is associated with a change in another variable.

• Linear regression describes a relationship between two variables in which an increase in one variable leads the other variable to increase or decrease proportionately.
Linear Regression

• Linear regression helps in interpolating the value of an unknown (continuous) variable based on known values.

An application could be: "What is the demand for a product as its price is varied?" Here we would look at demand at historical price points and estimate the demand at a new price point.

Given that we are looking at history to make an estimate for a new data point, this is a regression problem.

The fact that price and demand are linearly related (the higher the price, the lower the demand, and vice versa) makes it a linear regression problem.
Variables: Dependent and Independent

• A dependent variable (also called the response or outcome) is the value that we are predicting, and an independent variable is the variable that we use to predict the dependent variable.

➢ For example, temperature is directly proportional to the number of ice creams purchased.
➢ As temperature increases, the number of ice creams purchased also increases.
➢ Here temperature is the independent variable, and based on it the number of ice creams purchased (the dependent variable) is predicted.
Simple vs. Multivariate Linear Regression

• A dependent variable is usually not influenced by just one variable but by a multitude of variables.

• For example, ice cream sales are influenced by temperature, but they are also influenced by the price at which the ice cream is sold, along with other factors such as location, brand, and so on.

• In the case of multivariate linear regression, some of the variables will be positively correlated with the dependent variable and some will be negatively correlated with it.
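A minimal sketch of multivariate linear regression in Python. The ice cream numbers below are made up for illustration, and scikit-learn is one reasonable library choice, not one the slides prescribe:

import numpy as np
from sklearn.linear_model import LinearRegression

# Features: [temperature (deg C), price]. Made-up illustrative data.
X = np.array([
    [20, 3.0],
    [25, 3.0],
    [30, 2.5],
    [35, 2.5],
    [40, 2.0],
])
y = np.array([110, 150, 210, 260, 320])  # ice creams sold

model = LinearRegression().fit(X, y)
print("bias a:", model.intercept_)
print("weights b1, b2:", model.coef_)  # one weight per independent variable
print("prediction at 28 deg C, price 2.8:", model.predict([[28, 2.8]]))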
Formalizing Simple Linear Regression

• A simple linear regression is represented as:

Y = a + b*X

• Y is the dependent variable that we are predicting.
• X is the independent variable.
• a is the bias term (the intercept).
• b is the slope (the weight assigned to the independent variable).

The baby's weight starts at 3 (a, the bias) and increases linearly by 0.75 (b, the slope) every month.
Note that the bias term is the value of the dependent variable when all the independent variables are 0.
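A minimal plain-Python sketch of this equation, using the baby-weight numbers above (a = 3, b = 0.75):

# Simple linear regression as a prediction rule: Y = a + b*X.
def predict(x, a=3.0, b=0.75):
    # a is the bias (value of Y when X = 0), b is the slope.
    return a + b * x

# Estimated weight at 0, 6, and 12 months.
for month in (0, 6, 12):
    print(month, predict(month))  # 3.0, 7.5, 12.0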
Formalizing Simple Linear Regression

• The slope b of the line is the change in the y coordinate between the two extremes of the line divided by the corresponding change in the x coordinate:

b = (y2 − y1) / (x2 − x1)
Example
• Let's take a real-world example: the price of an agricultural product and how it varies based on where it is sold. The price will be low when bought directly from farmers and high when bought in the downtown area.
• Given this dataset, we can predict the price of the product at intermediate locations.

When a dataset is used to build a model for predictions, it is also called a training data set.
Example
• Problem statement: given this dataset, predict the price of the agricultural product if it is sold at intermediate locations between the farmer's house and the city downtown.
• Training data set
• The aim is to come up with a straight line, drawn using the equation of a straight line, that minimizes the error between the training data and our prediction model.
Example – Finding y = mx + b

• Equation of a straight line: y = mx + b
• Let's consider the farmer's home and its price as the starting point and the city downtown as the ending point:
➢ (x1, y1) = (1, 4)
➢ (x2, y2) = (5, 80)
• The first step is to come up with a formula in the form y = mx + b, where x is a known value and y is the predicted value.
• To calculate the prediction y for any input value x, we have two unknowns: m, the slope (gradient), and b, the y-intercept (also called the bias).

Slope: m = (change in y) / (change in x)
Example – Finding y = mx + b

• With the slope m known, the y-intercept / bias b can be calculated using the point-slope form of the line: y − y1 = m(x − x1).
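Working the two points through these formulas (a small sketch; the intermediate locations used for prediction are illustrative):

# Slope and intercept from the two known points.
x1, y1 = 1, 4    # farmer's home: location 1, price 4
x2, y2 = 5, 80   # city downtown: location 5, price 80

m = (y2 - y1) / (x2 - x1)  # (80 - 4) / (5 - 1) = 19.0
b = y1 - m * x1            # from y - y1 = m(x - x1): 4 - 19*1 = -15.0
print(f"y = {m}x + {b}")   # y = 19.0x + -15.0

# Predict prices at intermediate locations between the farm and downtown.
for x in (2, 3, 4):
    print(x, m * x + b)    # 23.0, 42.0, 61.0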
Example – Verifying y = mx + b

Predicting y values for unknown x values (see the sketch above).

Example: Actual Line vs. Projected Straight Line
Minimizing the Error

• How do we determine the best-fit line?

➢ The line for which the error between the predicted values and the observed values is minimum is called the best-fit line, or the regression line.

➢ These errors are also called residuals. The residuals can be visualized as vertical lines from the observed data values to the regression line.
Minimizing the Error

• Least squares regression is a method that fits the line so that the sum of all the squared errors is minimized.
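A minimal from-scratch sketch of least squares regression; the closed-form slope and intercept formulas below are the standard ones, and the training data is made up:

# Least squares regression: choose m and b to minimize the sum of
# squared residuals, sum((y_i - (m*x_i + b))**2).
def least_squares(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form solution:
    #   m = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
    #   b = mean_y - m * mean_x
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - m * mean_x
    return m, b

xs = [1, 2, 3, 4, 5]          # made-up training data, roughly linear
ys = [4, 25, 40, 62, 80]
m, b = least_squares(xs, ys)
print(f"best-fit line: y = {m:.2f}x + {b:.2f}")  # y = 18.90x + -14.50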
Least Square Regression Example

Using least squares regression on the (x, y) values, the overall formula can now be written in the form y = mx + b with

m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b = ȳ − m·x̄

Why is this method called least squares regression?

• Because it chooses the line that makes the sum of the squared residuals as small as possible: squaring makes every error positive and penalizes large errors more heavily.
Solve it
• Compute
Goodness of Fit / Loss Functions
Mean Squared Error (MSE)

• MSE is calculated by taking the average of the squares of the differences between the original and predicted values of the data.
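As a formula (the standard definition, matching the description above, with yᵢ the original values and ŷᵢ the predicted values):

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$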
Root Mean Square Error (RMSE)

• Root Mean Square Error (RMSE) is the standard deviation of the residuals (prediction errors).
• Residuals are a measure of how far from the regression line the data points are; RMSE is a measure of how spread out these residuals are.
• In other words, it tells you how concentrated the data is around the line of best fit.
• Root mean square error is commonly used in climatology, forecasting, and regression analysis to verify experimental results.

P = forecasts (predicted or expected values), O = observed values (known results).

Find the RMSE by:
• squaring the residuals,
• finding the average of the squared residuals,
• taking the square root of the result.

RMSE measures the vertical spread around the regression line in absolute y-terms.
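As a formula, using the slide's P/O notation:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(P_i - O_i\right)^2}$$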
MAE

• The Mean Absolute Error (MAE) is a very good KPI for measuring forecast accuracy.
• The error is the absolute difference between the actual (true) values and the predicted values; "absolute" means that a negative difference has its sign ignored.
• MAE takes the average of this error over every sample in the dataset: MAE = mean(|true values − predicted values|).
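A minimal sketch computing MAE alongside MSE and RMSE in plain Python (the observed and predicted values are made up):

import math

def mae(observed, predicted):
    # Mean Absolute Error: average of |observed - predicted|.
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def mse(observed, predicted):
    # Mean Squared Error: average of squared differences.
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

def rmse(observed, predicted):
    # Root Mean Square Error: square root of the MSE.
    return math.sqrt(mse(observed, predicted))

observed  = [4, 25, 40, 62, 80]   # made-up true values
predicted = [4, 23, 42, 61, 80]   # made-up model predictions
print(mae(observed, predicted))   # 1.0
print(mse(observed, predicted))   # 1.8
print(rmse(observed, predicted))  # ~1.34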
MAE Vs RMSE
Shared by both metrics:
 The error is given in terms of the value you are predicting for.
 The lower the value, the more accurate the model is.
 The resulting values can be between 0 and infinity.

Differences:
 RMSE is more sensitive to outliers.
 RMSE penalizes large errors more than MAE, because the errors are squared first.
 MAE returns values that are more interpretable, as it is simply the average of the absolute errors.
MAE Vs RMSE

RMSE is higher when there are outliers in the dataset, because the outliers increase the (squared) error.
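A small sketch illustrating this point (made-up values; a single large error is added to the second set of predictions):

import math

def mae(o, p):
    # Mean Absolute Error.
    return sum(abs(a - b) for a, b in zip(o, p)) / len(o)

def rmse(o, p):
    # Root Mean Square Error.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(o, p)) / len(o))

observed     = [10, 20, 30, 40, 50]
pred_clean   = [12, 18, 31, 39, 52]
pred_outlier = [12, 18, 31, 39, 90]  # one outlier prediction

print(mae(observed, pred_clean), rmse(observed, pred_clean))      # 1.6, ~1.67
print(mae(observed, pred_outlier), rmse(observed, pred_outlier))  # 9.2, ~17.94
# The single outlier inflates RMSE far more than MAE, since errors are squared.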
R²

• R-squared is a statistical measure of how close the data are to the fitted regression line (R is the correlation coefficient).
• It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression.
• By definition, R-squared is the percentage of the response variable's variation that is explained by the linear model.

In general, the higher the R-squared, the better the model fits your data.
Formula for R Squared
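The standard formula, consistent with the "explained variation" definition above (ŷᵢ are the fitted values and ȳ is the mean of the observed values):

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$

Equivalently, R² = explained variation / total variation.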
Example 1

Calculate the correlation coefficient for the following data:

X = 4, 8, 12, 16 and Y = 5, 10, 15, 20.
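A quick worked check: every Y value here is exactly 1.25 × X, so the points lie perfectly on a straight line. Using NumPy's corrcoef as one way to compute it:

import numpy as np

x = np.array([4, 8, 12, 16])
y = np.array([5, 10, 15, 20])  # y = 1.25 * x exactly

r = np.corrcoef(x, y)[0, 1]    # Pearson correlation coefficient
print(r, r ** 2)               # 1.0 1.0 -> a perfect linear fit, R^2 = 1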
Interpretation of the R² Value

R Squared vs. RMSE
References

• https://www.mathsisfun.com/data/least-squares-regression.html
• https://towardsdatascience.com/implementation-linear-regression-in-python-in-5-minutes-from-scratch-f111c8cc5c99
• https://towardsdatascience.com/mathematics-for-machine-learning-linear-regression-least-square-regression-de09cf53757c
