Regression Modelling

Introduction

 Regression analysis is a form of predictive modelling technique that
investigates the relationship between a dependent (target) variable and one or
more independent (predictor) variables. This technique is used for forecasting,
time-series modelling and finding the causal effect between variables. For
example, the relationship between rash driving and the number of road accidents
by a driver is best studied through regression.

 Regression analysis is an important tool for modelling and analyzing
data. Here, we fit a curve or line to the data points in such a manner that the
sum of the squared distances of the data points from the curve or line is
minimized. This is explained in more detail in the coming sections.
Need for Regression Analysis
 As mentioned above, regression analysis estimates the relationship between
two or more variables. Let's understand this with an easy example:
 Say you want to estimate the growth in sales of a company based on current
economic conditions. The recent company data indicates that the growth in
sales is around two and a half times the growth in the economy. Using this
insight, we can predict future sales of the company based on current and past
information.
 There are multiple benefits of using regression analysis. They are as follows:
 It indicates significant relationships between the dependent variable and the
independent variables.
 It indicates the strength of impact of multiple independent variables on a
dependent variable.
 It allows us to compare the effects of variables measured on different scales,
such as the effect of price changes and the number of promotional activities.
These benefits help market researchers / data analysts / data scientists to
evaluate and select the best set of variables for building predictive models.
Types of Regression

 Regression analysis is commonly divided into:
 Linear Regression
 Multiple Regression
 Logistic Regression
Types of Regression
 Simple linear regression
 One dependent variable (interval or ratio)
 One independent variable (interval or ratio or dichotomous)
 Multiple linear regression
 One dependent variable (interval or ratio)
 Two or more independent variables (interval or ratio or dichotomous)
 Logistic regression
 One dependent variable (binary)
 One or more independent variable(s) (interval or ratio or dichotomous)
 Ordinal regression
 One dependent variable (ordinal)
 One or more independent variable(s) (nominal or dichotomous)
 Multinomial regression
 One dependent variable (nominal)
 One or more independent variable(s) (interval or ratio or dichotomous)
 Discriminant analysis
 One dependent variable (nominal)
 One or more independent variable(s) (interval or ratio)
Linear Regression
 Linear regression is one of the simplest statistical regression methods used for
predictive analysis in machine learning. It models the linear relationship
between the independent (predictor) variable on the X-axis and the
dependent (output) variable on the Y-axis. If there is a single
input variable X (independent variable), the model is called simple
linear regression.

A graph of such data presents the linear relationship between the output (y) variable
and the predictor (X) variable. The fitted line is referred to as the best-fit straight
line: based on the given data points, we attempt to plot the line that fits the points
the best.
Linear Regression
 To calculate the best-fit line, linear regression uses the traditional slope-intercept
form given below:
 Yi = β0 + β1Xi
 Where Yi = dependent variable for the given value of the independent variable.
 β0 = constant/intercept (the predicted value of y when x is 0).
 β1 = slope or regression coefficient (how much we expect y to change as x increases).
 Xi = independent variable (the variable we expect to influence the dependent variable
y).
 This algorithm explains the linear relationship between the dependent (output) variable
y and the independent (predictor) variable X using a straight line.

 But how does linear regression find the best-fit line?

The goal of the linear regression algorithm is to find the values of β0 and β1 that give
the best-fit line. The best-fit line is the line with the least error, which means the
error between predicted values and actual values should be minimum.
 You can use simple linear regression when you want to know:
1. How strong the relationship between two variables is.
2. The value of the dependent variable at a certain value of the independent variable.
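As a sketch, the least-squares estimates of β0 and β1 can be computed directly from the data. The data values below are made up for illustration:

```python
import numpy as np

# Hypothetical sample data (made-up values for illustration)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # independent variable
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])   # dependent variable

# Least-squares estimates:
# b1 = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2)
# b0 = y_mean - b1 * x_mean
x_mean, y_mean = X.mean(), y.mean()
b1 = np.sum((X - x_mean) * (y - y_mean)) / np.sum((X - x_mean) ** 2)
b0 = y_mean - b1 * x_mean

# Predict y on the fitted line Yi = b0 + b1 * Xi
y_pred = b0 + b1 * X
print(round(b0, 4), round(b1, 4))   # intercept and slope
```

These closed-form estimates are exactly the values that minimize the sum of squared errors between `y` and `y_pred`.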
Assumptions of Linear Regression
1. Homogeneity of variance: the size of the error in our prediction does not change
significantly across the values of the independent variable.
2. Independence of observations: the observations in the dataset were collected using
statistically valid sampling methods, and there are no hidden relationships among
variables.
3. Normality: the data follow a normal distribution.

Simple linear regression makes one additional assumption:

4. The relationship between the independent and dependent variable is linear: the line
of best fit through the data points is a straight line rather than a curve.
Calculate R2
 To check the goodness of fit, R2 is calculated.
 The R-squared value is a statistical measure of how close the data
are to the fitted regression line.
 It determines whether x and y are correlated or not; a larger value
indicates a better model.
 For this, calculate:
 the distance of each predicted value from the mean
 the distance of each actual value from the mean
 This gives R2 = Σ(yp − ȳ)² / Σ(y − ȳ)²
where yp is the predicted value and ȳ is the mean of the actual values.
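The ratio above can be computed in a few lines. The actual and predicted values here are hypothetical, chosen only to illustrate the calculation:

```python
import numpy as np

# Hypothetical actual values and the corresponding model predictions
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])            # actual
y_pred = np.array([2.24, 4.18, 6.12, 8.06, 10.0])  # predicted by a fitted line

y_mean = y.mean()
ss_pred = np.sum((y_pred - y_mean) ** 2)   # distances of predicted values from the mean
ss_total = np.sum((y - y_mean) ** 2)       # distances of actual values from the mean

# R^2 = sum((y_pred - y_mean)^2) / sum((y - y_mean)^2)
r2 = ss_pred / ss_total
print(round(r2, 4))
```

A value close to 1 means the predictions track the actual data closely; a value near 0 means the model explains little of the variation.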
Linear Regression Solved Numerical
Multiple Linear Regression
 Multiple linear regression (MLR), also known simply as multiple
regression, is a statistical technique that uses several explanatory variables
to predict the outcome of a response variable. The goal of multiple linear
regression is to model the linear relationship between the explanatory
(independent) variables and the response (dependent) variable.

 yi = β0 + β1xi1 + β2xi2 + ... + βpxip + ϵ
 where, for i = 1, ..., n observations:
 yi = dependent variable
 xi1, ..., xip = explanatory (independent) variables
 β0 = y-intercept (constant term)
 β1, ..., βp = slope coefficients for each explanatory variable
 ϵ = the model's error term (also known as the residual)
 As the number of independent variables increases to 2, our graph becomes
3D. The added third dimension represents the second independent variable.
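The coefficients of the model above can be estimated by least squares. As a sketch, with hypothetical data generated from y = 1 + 2·x1 + 3·x2 (no noise, so the fit is exact):

```python
import numpy as np

# Hypothetical data: two explanatory variables and a response generated
# from y = 1 + 2*x1 + 3*x2 (made-up values for illustration)
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = np.array([9.0, 8.0, 19.0, 18.0, 26.0])

# Prepend a column of ones so the first coefficient acts as the intercept b0
A = np.column_stack([np.ones(len(X)), X])

# Least-squares solution for [b0, b1, b2] in y = b0 + b1*x1 + b2*x2 + e
coeffs, residuals, rank, _ = np.linalg.lstsq(A, y, rcond=None)
print(coeffs)   # approximately [1. 2. 3.]
```

The same call scales to any number of explanatory variables: each extra column of `X` adds one slope coefficient.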
Multiple Linear Regression Numerical
Logistic Regression
 Logistic regression is a supervised machine learning algorithm that can be
used to model the probability of a certain class or event. It is used when the data
are linearly separable and the outcome is binary in nature.
 That means logistic regression is usually used for binary classification problems.

 Logistic regression models P(y = 1 | x) = 1 / (1 + e^−(β0 + β1x)), the sigmoid of
the weighted sum of the inputs.
 Binary classification refers to predicting an output variable that is discrete,

with two classes. A few examples of binary classification are Yes/No, Pass/Fail,
Win/Lose, Cancerous/Non-cancerous, etc.
Logistic Regression
 Logistic regression is one of the most popular machine learning algorithms that
come under supervised learning techniques.
 It can be used for classification as well as for regression problems, but it is
mainly used for classification problems.
 Logistic regression is used to predict the categorical dependent variable with the
help of independent variables.
 The output of a logistic regression problem can only be between 0 and 1.
 Logistic regression can be used where probabilities between two classes are
required, such as whether it will rain today or not: either 0 or 1, true or false, etc.
 Logistic regression is based on the concept of maximum likelihood estimation.
According to this estimation, the observed data should be most probable.
 In logistic regression, we pass the weighted sum of inputs through an activation
function that maps values to between 0 and 1. This activation function is known
as the sigmoid function, and the curve obtained is called the sigmoid curve or S-curve.
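The sigmoid step above can be sketched in a few lines. The coefficients here are hypothetical (in practice they would be found by maximum likelihood estimation):

```python
import math

def sigmoid(z):
    """Map any real-valued input to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients (in practice estimated by maximum likelihood)
b0, b1 = -4.0, 2.0

def predict_proba(x):
    # P(y = 1 | x) = sigmoid(b0 + b1 * x), the weighted sum passed
    # through the sigmoid activation
    return sigmoid(b0 + b1 * x)

def predict_class(x, threshold=0.5):
    # Binary classification: predict class 1 when the probability
    # reaches the threshold
    return 1 if predict_proba(x) >= threshold else 0

print(predict_proba(2.0))   # sigmoid(0) = 0.5
print(predict_class(3.0), predict_class(1.0))
```

At x = 2.0 the weighted sum is exactly 0, so the predicted probability sits on the 0.5 decision boundary; larger x values push the prediction towards class 1, smaller ones towards class 0.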
Difference between Linear Regression and Logistic Regression
 Linear regression is used to predict a continuous dependent variable using a
given set of independent variables; logistic regression is used to predict a
categorical dependent variable.
 Linear regression is used for solving regression problems; logistic regression
is used for solving classification problems.
 In linear regression, we predict the values of continuous variables; in
logistic regression, we predict the values of categorical variables.
 In linear regression, we find the best-fit line, with which we can easily
predict the output; in logistic regression, we find the S-curve, with which we
can classify the samples.
 Linear regression uses the least-squares method to estimate its parameters;
logistic regression uses the maximum likelihood estimation method.
 The output of linear regression must be a continuous value, such as price or
age; the output of logistic regression must be a categorical value such as
0 or 1, Yes or No.
 Linear regression requires the relationship between the dependent and
independent variables to be linear; logistic regression does not require a
linear relationship between the dependent and independent variables.
 In linear regression, there may be collinearity between the independent
variables; in logistic regression, there should not be collinearity between
the independent variables.
