5_AML Lecture 5_Linear regression

The document provides an overview of regression analysis, focusing on linear and polynomial regression, including their definitions, formulas, assumptions, and evaluation metrics. It explains the concepts of simple and multiple linear regression, the need for polynomial regression in non-linear datasets, and the methods for calculating regression coefficients. Additionally, it discusses the key benefits of linear regression and the assumptions required for accurate modeling.


Regression

Read this

For linear regression: basics + formula + graph with line + assumptions (NILH) + calculation with x = CGPA, y = package, 2 ways to find the solution (OLS, gradient descent) + key benefits (IOES) + evaluation metrics: MAE, MSE, RMSE, R2, Adjusted R2

Read this

For polynomial regression: basics + formula + graph with line and curve + accuracy, non-linear dataset, need for it (2 points), 2 ways to find the solution (OLS, gradient descent) + assumptions (NILH) + key benefits (IOES) + evaluation metrics: MAE, MSE, RMSE, R2, Adjusted R2
• The term regression is used when you try to find the relationship
between variables.
• In Machine Learning, and in statistical modeling, that relationship is
used to predict the outcome of future events.
What Is Linear Regression?
• Linear regression is an algorithm that provides a linear relationship
between an independent variable and a dependent variable to predict the
outcome of future events.
• It is a statistical method used in data science and machine learning for
predictive analysis.
• The independent variable, also called the predictor or explanatory variable, is not affected by changes in the other variables.
• However, the dependent variable changes with fluctuations in the
independent variable.
• The regression model predicts the value of the dependent variable, which is
the response or outcome variable being analyzed or studied.
• Thus, linear regression is a supervised learning algorithm that
simulates a mathematical relationship between variables and makes
predictions for continuous or numeric variables such as sales, salary,
age, product price, etc.
• This analysis method is advantageous when at least two variables are
available in the data, as observed in stock market forecasting,
portfolio management, scientific analysis, etc.
Examples
1. Let us say we want to have a system that can predict the price of a used car. Inputs are the car attributes (brand, year, engine capacity, mileage, and other information) that we believe affect a car's worth. The output is the price of the car.

2. Consider the navigation of a mobile robot, say an autonomous car. The output is the angle by
which the steering wheel should be turned at each time, to advance without hitting obstacles
and deviating from the route. Inputs are provided by sensors on the car like a video camera,
GPS, and so forth.

3. In finance, the capital asset pricing model uses regression for analyzing and quantifying the
systematic risk of an investment.

4. In economics, regression is the predominant empirical tool. For example, it is used to predict
consumption spending, inventory investment, purchases of a country’s exports, spending on
imports, labor demand, and labor supply.
Types of linear regression
The different regression models are defined based on the type of function used to represent the relation
between the dependent variable y and the independent variables.
• Simple Linear Regression
This is the simplest form of linear regression, and it involves only one independent variable and one
dependent variable. The equation for simple linear regression is:
y = β0 + β1X
where:
Y is the dependent variable
X is the independent variable
β0 is the intercept
β1 is the slope
Example: The relationship between pollution levels and rising temperatures.
The value of the dependent variable is based on the value of the independent variable.
Multiple Linear Regression
• This involves more than one independent variable and one dependent variable. The equation for
multiple linear regression is:
y = β0 + β1X1 + β2X2 + … + βnXn

where:
• Y is the dependent variable
• X1, X2, …, Xn are the independent variables
• β0 is the intercept
• β1, β2, …, βn are the slopes
Example: Consider the task of calculating blood pressure. In this case, height, weight, and amount of
exercise can be considered independent variables. Here, we can use multiple linear regression to
analyze the relationship between the three independent variables and one dependent variable, as all
the variables considered are quantitative.
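As a quick illustration of the multiple regression equation above, here is a minimal sketch using scikit-learn's LinearRegression class; the blood-pressure numbers below are made up purely for illustration and are not from the lecture.

```python
# Minimal sketch: multiple linear regression with scikit-learn.
# The blood-pressure figures are invented for illustration only.
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row: [height_cm, weight_kg, exercise_hours_per_week]
X = np.array([
    [170, 70, 3],
    [160, 60, 5],
    [180, 90, 1],
    [175, 80, 2],
    [165, 65, 4],
])
y = np.array([120, 115, 140, 130, 118])  # systolic blood pressure

model = LinearRegression()
model.fit(X, y)

print("Intercept (beta0):", model.intercept_)
print("Slopes (beta1..beta3):", model.coef_)
print("Prediction for [172, 75, 3]:", model.predict([[172, 75, 3]]))
```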
Polynomial Regression
• If your data points clearly will not fit a linear regression (a straight line
through the data points), polynomial regression may be a better choice.
• Polynomial regression, like linear regression, uses the relationship
between the variables x and y to find the best way to draw a line
through the data points.
Read the highlighted parts from here till page 21
• The Polynomial Regression equation is given below:
y = β0 + β1x + β2x² + β3x³ + … + βnxⁿ
• It is also called the special case of Multiple Linear Regression in ML. Because we add
some polynomial terms to the Multiple Linear regression equation to convert it into
Polynomial Regression.
• It is a linear model with some modifications made in order to increase the accuracy.
• The dataset used in Polynomial regression for training is of a non-linear nature.
• It makes use of a linear regression model to fit the complicated and non-linear functions
and datasets.
• Hence, "In Polynomial regression, the original features are converted into Polynomial
features of required degree (2,3,..,n) and then modeled using a linear model."
Need for Polynomial Regression:
The need for Polynomial Regression in ML can be understood from the points below:
• If we apply a linear model to a linear dataset, it gives a good result, as we have
seen in Simple Linear Regression. But if we apply the same model, without any
modification, to a non-linear dataset, the output will be poor: the loss function
will increase, the error rate will be high, and the accuracy will decrease.
• So for such cases, where data points are arranged in a non-linear
fashion, we need the Polynomial Regression model. We can understand
it in a better way using the below comparison diagram of the linear
dataset and non-linear dataset.
In the above image, we have taken a dataset which is arranged non-linearly. If we try to cover it with a
linear model, we can clearly see that it hardly covers any data points. A curve, on the other hand, covers
most of the data points; that curve is the Polynomial model.

Hence, if the datasets are arranged in a non-linear fashion, then we should use the Polynomial Regression
model instead of Simple Linear Regression.
When we compare the three regression equations, we can clearly see that all of them are Polynomial
equations, differing only by the degree of the variables:
Simple Linear Regression: y = β0 + β1x
Multiple Linear Regression: y = β0 + β1x1 + β2x2 + … + βnxn
Polynomial Regression: y = β0 + β1x + β2x² + … + βnxⁿ
The Simple and Multiple Linear equations are also Polynomial equations of degree one, and the Polynomial
regression equation is a Linear equation of degree n.
So if we add a degree to our linear equations, they are converted into Polynomial Linear equations.
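A minimal sketch of how this works in practice, assuming scikit-learn's PolynomialFeatures and LinearRegression; the synthetic quadratic data below stand in for a non-linear dataset.

```python
# Minimal sketch: polynomial regression as linear regression on polynomial features.
# The x/y data are synthetic, generated from a quadratic relationship for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

x = np.linspace(-3, 3, 30).reshape(-1, 1)
y = 2 + 1.5 * x.ravel() + 0.8 * x.ravel() ** 2   # non-linear (quadratic) target

# Convert the original feature into polynomial features of degree 2,
# then fit an ordinary linear model on those features.
poly = PolynomialFeatures(degree=2, include_bias=False)
x_poly = poly.fit_transform(x)

model = LinearRegression()
model.fit(x_poly, y)
print("Coefficients:", model.coef_, "Intercept:", model.intercept_)
```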
Linear Regression
• Suppose we have data of college students' CGPA and their salary (in LPA) after
placement, with CGPA as the input feature.
• The model should predict the salary of a student if we provide his/her CGPA.
• How do we approach this problem?
• Suppose we have no CGPA data and someone asks us what the package of a student will be.
• Then we would simply take the average package of the students and give that as the
answer. This is a very bad guess.
• But if we have data, then we will plot this data.
Data is sort of linear, not completely linear. Completely linear data looks like this:
• This is real-world data with some deviations, as seen in the graph.
• Those errors which cannot be quantified are called
stochastic errors.
If the data were completely linear, we would draw a line and
write the line equation
y = mx + b, where x is CGPA and y is salary.
But our data is only roughly linear,
so we will draw a line called the best fit line.
• Our primary objective while using linear regression is to locate the best fit line, which is the line that
passes closest to all the data points in the data set. This implies that the error between the predicted and
actual values should be kept to a minimum; the best fit line has the least error.
• The best Fit Line equation provides a straight line that represents the relationship between the
dependent and independent variables. The slope of the line indicates how much the dependent variable
changes for a unit change in the independent variable(s).
In the above figure,

X-axis = Independent variable

Y-axis = Output / dependent variable

Line of regression = Best fit line for a model

Here, a line is plotted for the given data points so that it suitably fits all of them. Hence, it is called
the ‘best fit line.’ The goal of the linear regression algorithm is to find this best fit line, seen in
the above figure.
It is called the best fit line because it has the minimum error.
• So linear regression draws a line on roughly linear data which passes
closest to all the data points.
• That means it finds the values of m (slope) and b (intercept) which
give the minimum error on the data points.
Human intuition
• So mathematically we understood that linear regression draws a best
fit line by finding the values of m and b in the straight-line equation
• y = mx + b
• Package = m * CGPA + b
• Here m acts as a weight: as the value of m increases, the impact of CGPA on the package also increases.
• To understand b, let us set b to 0:
• y = mx
• Let us say x is experience in years and y is the package. As per the above
equation, if experience is 0 then the package will also be 0, which is not correct;
the intercept b captures this base value when x is 0.
• To find the values of m and b we have 2 methods:
1. Closed-form solution, where we use direct mathematical
formulas; this is called OLS (Ordinary Least Squares), and the scikit-learn
library also uses this method in its LinearRegression class.
2. Non-closed-form solution, where we use iterative approximations
based on differentiation. The technique we use here is
called Gradient Descent.
When we work in higher dimensions it is difficult to do the
calculations using the formula, so we use approximations.
In the SGDRegressor class, gradient descent is implemented.
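A minimal sketch comparing the two routes on made-up CGPA/package data: LinearRegression (closed-form OLS) and SGDRegressor (gradient descent). The exact SGD numbers depend on its hyperparameters, so treat them as approximate.

```python
# Minimal sketch: the two solver routes mentioned above, on made-up CGPA/package data.
import numpy as np
from sklearn.linear_model import LinearRegression, SGDRegressor

X = np.array([[6.5], [7.0], [7.8], [8.2], [9.0]])   # CGPA
y = np.array([3.0, 3.5, 4.5, 5.0, 6.5])             # package (LPA)

ols = LinearRegression().fit(X, y)                       # closed-form (OLS) solution
sgd = SGDRegressor(max_iter=10000, tol=1e-6).fit(X, y)   # gradient-descent solution
# Note: in practice features are usually standardized before SGD,
# so the SGD coefficients here will only roughly match OLS.

print("OLS m, b:", ols.coef_[0], ols.intercept_)
print("SGD m, b:", sgd.coef_[0], sgd.intercept_)
```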
In short: pages 23 to 35 cover the calculations.
• We can calculate m and b using the following formulas, where y is the package and x is the CGPA:
m = Σ(Xi - X̄)(Yi - Ȳ) / Σ(Xi - X̄)²
b = Ȳ - m·X̄
• X̄ and Ȳ are the mean values of X and Y.
• Xi and Yi are the current row's CGPA and package values.
• So after calculating m we calculate b.
• The LinearRegression class of scikit-learn uses these formulas and gives the answers.
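A minimal sketch of the closed-form formulas above in NumPy, on made-up CGPA/package values; the result should match what scikit-learn's LinearRegression computes for the same data.

```python
# Minimal sketch of the closed-form (OLS) formulas for m and b.
# The CGPA/package values are made up for illustration.
import numpy as np

x = np.array([6.5, 7.0, 7.8, 8.2, 9.0])    # CGPA
y = np.array([3.0, 3.5, 4.5, 5.0, 6.5])    # package in LPA

x_bar, y_bar = x.mean(), y.mean()

# m = sum((xi - x_bar) * (yi - y_bar)) / sum((xi - x_bar)^2)
m = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
# b = y_bar - m * x_bar
b = y_bar - m * x_bar

print("slope m:", m, "intercept b:", b)
print("predicted package for CGPA 8.0:", m * 8.0 + b)
```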
Where do these formulas come from?
• Let us assume that our data points are roughly linear and we have to
draw a best fit line on them, i.e. we have to find the values of m and b.
• We know that the best fit line is the line with the minimum total
distance between the points and the line.
• If the distances are d1, d2, and so on, then the total error will be
d1² + d2² + d3² + … + dn².
• We square the distances because some points lie below the line and some above, so the
positive and negative differences would otherwise cancel each other out.
• We do not take the modulus (absolute value) because squaring penalizes the points
which are outliers more heavily. Also, at some point we have to differentiate the
expression, and we cannot do this for the modulus function at 0.
• The square function, on the other hand, is differentiable at every point.
• In machine learning language, this expression is also called the LOSS FUNCTION.

• So earlier we were saying that we want a line segment which pass


closest to all data points.
• But now we can say that we need that value of m and b which
minimize the value of error function.
• For every point, d can be represented as di = Yi - ŷi,
where Yi is the actual value and ŷi is the predicted value.
• So the total error E = Σ(Yi - ŷi)²
• The average error = (1/n) Σ(Yi - ŷi)²
• So we need to find the values of m and b for which the error is minimum. But
where are m and b in this formula?
• Since ŷi = m·Xi + b, we can write E = Σ(Yi - m·Xi - b)².
• That means we need to find the values of m and b for which the error E is
minimum.
• So E = f(m, b).
• That means the error is impacted by both m and b. Let us see this
impact separately.
• E = f(m), with b = 0 (the intercept is 0, so the line passes through the origin).
• Here we will see the impact of changing m (the slope) on the error E, keeping b = 0.
• Suppose for m = 10 the error E is 50. As we rotate the line, the error first
reduces; if we rotate the line further, the error increases again.
• That means when we change m,
• E is initially large, then reduces, and then increases again,
• as shown in the figure.
Now suppose m is constant (say the slope is 1).
• Then the equation changes to E = Σ(Yi - Xi - b)².
• Here m is constant but we are changing b.
• For certain values of m and b the error will be maximum, and for some values it
will be minimum.
• So we need to find the values of m and b where the error value is
minimum.
• At the minimum point, the slope of the error curve is 0.
• To find the minimum, we take the derivative of E and equate it to 0:
∂E/∂b = -2 Σ(Yi - m·Xi - b) = 0
Dividing by -2 on both sides and solving gives b = Ȳ - m·X̄. Similarly, setting ∂E/∂m = 0 gives the formula for m shown earlier.
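Instead of solving these derivative equations in closed form, the same minimum can be found iteratively. A minimal gradient-descent sketch on made-up CGPA/package data; the learning rate and iteration count are arbitrary choices, not values from the lecture.

```python
# Minimal sketch: minimizing the squared-error loss E(m, b) by gradient descent
# instead of the closed-form formulas. Data, learning rate and iterations are made up.
import numpy as np

x = np.array([6.5, 7.0, 7.8, 8.2, 9.0])   # CGPA
y = np.array([3.0, 3.5, 4.5, 5.0, 6.5])   # package (LPA)

m, b = 0.0, 0.0        # start from an arbitrary line
lr = 0.01              # learning rate
n = len(x)

for _ in range(100_000):
    y_hat = m * x + b
    # dE/dm = -(2/n) * sum(x * (y - y_hat));  dE/db = -(2/n) * sum(y - y_hat)
    dm = -(2 / n) * np.sum(x * (y - y_hat))
    db = -(2 / n) * np.sum(y - y_hat)
    m -= lr * dm
    b -= lr * db

print("m:", m, "b:", b)   # should be close to the OLS values
```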
Key benefits of linear regression (IOES)
1. Easy implementation
The linear regression model is computationally simple to implement as it does not demand a lot of engineering overheads,
neither before the model launch nor during its maintenance.
2. Interpretability
Unlike deep learning models such as neural networks, linear regression is relatively straightforward. As a result, this
algorithm stands ahead of black-box models that fall short in justifying which input variable causes the output variable to
change.
3. Scalability
Linear regression is not computationally heavy and, therefore, fits well in cases where scaling is essential. For example, the
model can scale well regarding increased data volume (big data).

4. Optimal for online settings


The ease of computation of these algorithms allows them to be used in online settings. The model can be trained and
retrained with each new example to generate predictions in real time, unlike neural networks or support vector machines,
which are computationally heavy and require plenty of computing resources and substantial waiting time to retrain on a new
dataset.
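A minimal sketch of such an online setting, assuming scikit-learn's SGDRegressor and its partial_fit method; the stream of (CGPA, package) pairs is made up, and predictions after only a few updates will still be rough.

```python
# Minimal sketch: incremental ("online") updates with scikit-learn's SGDRegressor.
# The stream of (cgpa, package) pairs is invented for illustration.
import numpy as np
from sklearn.linear_model import SGDRegressor

model = SGDRegressor()
stream = [(6.5, 3.0), (7.0, 3.5), (7.8, 4.5), (8.2, 5.0), (9.0, 6.5)]

for cgpa, package in stream:
    # partial_fit updates the model with one new example at a time,
    # without retraining from scratch on the full dataset.
    model.partial_fit(np.array([[cgpa]]), np.array([package]))

print(model.predict([[8.0]]))   # rough estimate after only five updates
```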
Assumptions of simple linear regression (NILH)
Linear regression is a powerful tool for understanding and predicting the behavior
of a variable; however, it needs to meet a few conditions in order to give accurate and
dependable solutions.
• Linearity: The independent and dependent variables have a linear relationship
with one another. This implies that changes in the dependent variable follow those
in the independent variable(s) in a linear fashion. This means that there should be
a straight line that can be drawn through the data points. If the relationship is not
linear, then linear regression will not be an accurate model.
• Independence: The observations in the dataset are independent of each
other. This means that the value of the dependent variable for one
observation does not depend on the value of the dependent variable for
another observation. If the observations are not independent, then
linear regression will not be an accurate model.
• Homoscedasticity: Across all levels of the independent variable(s),
the variance of the errors is constant. This indicates that the amount
of the independent variable(s) has no impact on the variance of the
errors. If the variance of the residuals is not constant, then linear
regression will not be an accurate model.
• Normality: The residuals should be normally distributed. This means
that the residuals should follow a bell-shaped curve. If the residuals
are not normally distributed, then linear regression will not be an
accurate model.
Evaluation Metrics for Linear Regression
A variety of evaluation measures can be used to determine the strength
of any linear regression model. These assessment metrics often give an
indication of how well the model is producing the observed outputs.
MEAN ABSOLUTE ERROR (MAE)
• Total absolute error = Σ|Yi - ŷi|
• For the mean absolute error, we divide it by n: MAE = (1/n) Σ|Yi - ŷi|
Advantage: robust to outliers
• The number we get here, the mean absolute error, is the loss, which should be
as small as possible.
• This number is exactly in the units of y. That means if x is CGPA and y is the
package in LPA, then the MAE will also come out in LPA (the unit of y), so
interpretation becomes very easy. For example, if the MAE is 1.5, that means the
error is 1.5 LPA and we can think about reducing it further.
• It is robust to outliers.
• DISADVANTAGE: not differentiable
• Here we are using the modulus function, and the graph of the modulus function is
not differentiable at 0. So MSE comes in.
Mean Squared Error (MSE)
• MSE = (1/n) Σ(Yi - ŷi)²
• Geometrically, each squared error is the area of a square whose side is the distance
between the point and the line, and MSE tries to add up all these square areas and minimize the sum.
• Advantage
• We can use it as a loss function since it is differentiable.
• Disadvantage
• The result is not in the units of y: if x is CGPA and y is the package in
LPA, then the MSE will be in LPA squared.
• It penalizes outliers a lot, so it is not robust to outliers.
Root Mean Squared Error (RMSE)
• The square root of the mean of the squared residuals is the Root Mean Squared
Error: RMSE = √MSE = √((1/n) Σ(Yi - ŷi)²). It describes how well the observed data points match the expected
values, i.e. the model's absolute fit to the data.
• Instead of the raw values, the residuals (y actual - y predicted) are used.
• Whether an MAE, MSE or RMSE value is acceptable depends on the context: a 1.5 LPA error
may be fine, but a 1.5 degree error in a self-driving car is not acceptable.
R2 score
• Let us suppose we have a dataset of CGPA and package.
• If we had no CGPA and had to predict the package, we would use the
average (mean) value.
• But we do have the CGPA, so we draw the best fit line.
• The R2 score tells us how good the model is compared to simply predicting the mean:
R2 = 1 - Σ(Yi - ŷi)² / Σ(Yi - Ȳ)²
IMP
So the numerator is the best we have done (the error of our model), and the denominator is the worst that can happen (the error of the mean-only model).
R-squared is a statistic that indicates how much variation the developed model can explain or
capture. It is usually in the range of 0 to 1.
In general, the better the model matches the data, the greater the R-squared number.
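A minimal sketch of computing these metrics with scikit-learn; the actual/predicted package values below are made up for illustration.

```python
# Minimal sketch: computing MAE, MSE, RMSE and R2 with scikit-learn.
# y_true / y_pred values are invented for illustration.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, 3.5, 4.5, 5.0, 6.5])   # actual packages (LPA)
y_pred = np.array([3.2, 3.4, 4.8, 4.7, 6.1])   # model predictions (LPA)

mae  = mean_absolute_error(y_true, y_pred)      # same units as y (LPA)
mse  = mean_squared_error(y_true, y_pred)       # units of LPA squared
rmse = np.sqrt(mse)                             # back in LPA
r2   = r2_score(y_true, y_pred)                 # 1 - SS_res / SS_tot

print(mae, mse, rmse, r2)
```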
Interpretation of the R2 score
• If, for the CGPA and package data, the R2 score is 80%,
• it means the CGPA (the input column) is able to explain 80% of the
variance in the output column.
• Suppose we have another dataset with IQ and CGPA as inputs and package
as output, and the R2 score is 90%. That means 90% of the variation in the package
column is explained by the CGPA and IQ columns, while for the remaining 10% of the
variation we have no explanation.
R2 score flaw
• As we add more and more input columns, the R2 score keeps
increasing,
• because the model is assumed to be explaining more of the variance in the
output.
• But the problem comes when we add one more column, say
temperature, along with IQ and CGPA.
• This column is totally irrelevant, so ideally the R2 score should reduce, but
it will either remain constant or increase.
• To solve this problem we use the adjusted R2 score.
Adjusted R-squared score
• Adjusted R-squared accounts for the number of predictors in the model and
penalizes the model for including irrelevant predictors that do not
contribute significantly to explaining the variance in the dependent variable.
• Mathematically, adjusted R2 is expressed as:
Adjusted R2 = 1 - [(1 - R2)(n - 1) / (n - k - 1)]
where n is the number of samples and k is the number of predictors (inputs).
Case 1: When we add an irrelevant column (temperature) to the input features.
Then k will increase, so the denominator (n - k - 1) will decrease.
The (n - 1) term in the numerator will be constant.
Since the column is irrelevant, the R2 score will either stay the same or
increase slightly, so we can assume the numerator (1 - R2)(n - 1) is almost constant.
But, as shown above, the denominator decreases, so the overall fraction will increase.
Hence 1 - (fraction) decreases.
That means the adjusted R2 score will decrease.
• Adjusted R-squared helps to prevent overfitting. It penalizes the model
for additional predictors that do not contribute significantly to
explaining the variance in the dependent variable.
Case 2: When we add a relevant column (IQ) to the input features.
Then k will increase, so the denominator (n - k - 1) will decrease.
The (n - 1) term in the numerator will be constant.
Since the column is relevant, the R2 score will increase, so (1 - R2)
will decrease.
The numerator therefore decreases faster than the denominator, so the overall fraction will decrease.
Hence 1 - (fraction) increases, i.e. the adjusted R2 score increases.
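A minimal sketch of the adjusted R2 formula above; the R2 value, n and k below are made up purely to show the effect of adding an irrelevant predictor.

```python
# Minimal sketch: adjusted R2 computed from R2, the number of samples n,
# and the number of predictors k (all values are made up for illustration).
def adjusted_r2(r2: float, n: int, k: int) -> float:
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Adding an irrelevant column: R2 barely moves but k grows, so adjusted R2 drops.
print(adjusted_r2(0.80, n=100, k=2))   # e.g. CGPA + IQ
print(adjusted_r2(0.80, n=100, k=3))   # e.g. CGPA + IQ + temperature (irrelevant)
```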
