7 Regression Techniques
Linear and logistic regression are usually the first
algorithms people learn in data science. Because of their
popularity, many analysts end up thinking that they are
the only forms of regression, and those who are slightly
more involved believe they are the most important of all
forms of regression analysis. The truth is that there are
innumerable forms of regression that can be performed.
Each form has its own importance and a specific set of
conditions under which it is best applied.

Contents
1. What is Regression Analysis?
2. Why do we use Regression Analysis?
3. What are the types of Regressions?
a. Linear Regression
b. Logistic Regression
c. Polynomial Regression
d. Stepwise Regression
e. Ridge Regression
f. Lasso Regression
g. ElasticNet Regression
4. How to select the right Regression Model?
What is Regression Analysis?
Regression analysis is a predictive modelling technique that investigates the
relationship between a dependent (target) variable and independent (predictor)
variable(s). This technique is used for forecasting, time series modelling and
finding the causal effect relationship between variables. For example, the
relationship between rash driving and the number of road accidents caused by a
driver is best studied through regression.

Regression analysis is an important tool for modelling and analyzing data. Here,
we fit a curve / line to the data points in such a manner that the distances of
the data points from the curve or line are minimized. I'll explain this in more
detail in the coming sections.
Why do we use Regression Analysis?
As mentioned above, regression analysis estimates the relationship between two
or more variables. Let's understand this with an easy example:

Let's say you want to estimate the growth in sales of a company based on current
economic conditions. You have recent company data which indicates that the
growth in sales is around two and a half times the growth in the economy.
Using this insight, we can predict the future sales of the company based on
current and past information.

There are multiple benefits of using regression analysis. They are as follows:

1. It indicates the significant relationships between the dependent variable and
the independent variables.
2. It indicates the strength of the impact of multiple independent variables on
the dependent variable.

Regression analysis also allows us to compare the effects of variables measured
on different scales, such as the effect of price changes and the number of
promotional activities. These benefits help market researchers / data analysts /
data scientists to evaluate and select the best set of variables to be used
for building predictive models.

How many types of regression techniques do we have?
There are various kinds of regression techniques available to make predictions.
These techniques are mostly driven by three metrics (the number of independent
variables, the type of dependent variable and the shape of the regression line).
We'll discuss them in detail in the following sections.

For the creative ones, you can even cook up new regressions if you feel the
need to use a combination of the parameters above which people haven't used
before. But before you start that, let us understand the most commonly used
regressions:

1. Linear Regression

It is one of the most widely known modeling techniques. Linear regression is
usually among the first few topics people pick up while learning predictive
modeling. In this technique, the dependent variable is continuous, the independent
variable(s) can be continuous or discrete, and the nature of the regression line is linear.

Linear regression establishes a relationship between the dependent variable
(Y) and one or more independent variables (X) using a best fit straight
line (also known as the regression line).
It is represented by the equation Y = a + b*X + e, where a is the intercept, b is
the slope of the line and e is the error term. This equation can be used to predict
the value of the target variable based on the given predictor variable(s).

The difference between simple linear regression and multiple linear regression
is that multiple linear regression has more than one independent variable, whereas
simple linear regression has only one. Now, the question is:
"How do we obtain the best fit line?"

How to obtain the best fit line (values of a and b)?

This task can be easily accomplished with the Least Squares Method. It is the most
common method used for fitting a regression line. It calculates the best-fit line
for the observed data by minimizing the sum of the squares of the vertical
deviations from each data point to the line. Because the deviations are squared
before being added, positive and negative values do not cancel each other out.
We can evaluate the model performance using the metric R-square.
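To make the least squares idea concrete, here is a minimal sketch in Python on synthetic data (the numbers are invented purely for illustration): it computes a and b from the closed-form least squares formulas and then the R-square of the fit.

```python
import numpy as np

# Synthetic data: y is roughly 2.5*x plus noise (illustrative only)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2.5 * x + 3 + rng.normal(0, 2, 50)

# Least squares estimates of the slope (b) and intercept (a)
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

# R-square: proportion of variance in y explained by the fitted line
y_pred = a + b * x
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_square = 1 - ss_res / ss_tot

print(f"a = {a:.3f}, b = {b:.3f}, R-square = {r_square:.3f}")
```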

Important Points:

• There must be a linear relationship between the independent and dependent
variables.
• Multiple regression can suffer from multicollinearity, autocorrelation and
heteroskedasticity.
• Linear regression is very sensitive to outliers. They can terribly affect the
regression line and eventually the forecasted values.
• Multicollinearity can increase the variance of the coefficient estimates and make
the estimates very sensitive to minor changes in the model. The result is that the
coefficient estimates are unstable.
• In the case of multiple independent variables, we can go with forward
selection, backward elimination or a stepwise approach to select the most
significant independent variables.
2. Logistic Regression

Logistic regression is used to find the probability of event = Success and
event = Failure. We should use logistic regression when the dependent variable is
binary (0/1, True/False, Yes/No) in nature. Here the value of Y ranges from 0
to 1 and it can be represented by the following equations.

odds = p / (1-p) = probability of event occurrence / probability of event non-occurrence
ln(odds) = ln(p / (1-p))
logit(p) = ln(p / (1-p)) = b0 + b1X1 + b2X2 + b3X3 + .... + bkXk

Above, p is the probability of presence of the characteristic of interest. A
question that you should ask here is: "why have we used log in the equation?"

Since we are working here with a binomial distribution (dependent variable), we
need to choose a link function which is best suited for this distribution, and that
is the logit function. In the equation above, the parameters are chosen to maximize
the likelihood of observing the sample values rather than minimizing the sum of
squared errors (as in ordinary regression).
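As a brief illustration, a logistic model can be fitted by maximum likelihood with scikit-learn's LogisticRegression; the toy data below is made up purely to show the b0 and b1 coefficients and a predicted probability, and is not from this article.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy binary-outcome data (illustrative only): one predictor, 0/1 target
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, (200, 1))
# Probability of success rises with X through the logit link
p = 1 / (1 + np.exp(-(0.5 + 2.0 * X[:, 0])))
y = rng.binomial(1, p)

# Coefficients are chosen by maximum likelihood, not by least squares
model = LogisticRegression().fit(X, y)
print("b0 =", model.intercept_[0], "b1 =", model.coef_[0, 0])

# Predicted probability of event = Success for a new observation
print("P(y=1 | x=1.0) =", model.predict_proba([[1.0]])[0, 1])
```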
Important Points:

• Logistic regression is widely used for classification problems.
• Logistic regression doesn't require a linear relationship between the dependent and
independent variables. It can handle various types of relationships because it
applies a non-linear log transformation to the predicted odds ratio.
• To avoid over fitting and under fitting, we should include all significant
variables. A good approach to ensure this practice is to use a stepwise method
to estimate the logistic regression.
• It requires large sample sizes because maximum likelihood estimates are less
powerful at low sample sizes than ordinary least squares.
• The independent variables should not be correlated with each other, i.e. no
multicollinearity. However, we have the option to include interaction effects of
categorical variables in the analysis and in the model.
• If the values of the dependent variable are ordinal, it is called ordinal
logistic regression.
• If the dependent variable is multi-class, it is known as multinomial logistic
regression.

3. Polynomial Regression

A regression equation is a polynomial regression equation if the power of the
independent variable is more than 1. The equation below represents a
polynomial equation:

y = a + b*x^2

In this regression technique, the best fit line is not a straight line. It is rather a
curve that fits the data points.
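A minimal sketch of a degree-2 polynomial fit, assuming scikit-learn is available; the data is synthetic and only illustrates expanding x into polynomial terms before fitting a linear model on them.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# Toy data following y = a + b*x^2 plus noise (illustrative only)
rng = np.random.default_rng(2)
x = np.linspace(-3, 3, 60).reshape(-1, 1)
y = 1.0 + 0.8 * x[:, 0] ** 2 + rng.normal(0, 0.5, 60)

# Degree-2 polynomial regression: expand x into [x, x^2] and fit linearly
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(x, y)

print("R-square on training data:", poly_model.score(x, y))
```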
Important Points:

• While there might be a temptation to fit a higher degree polynomial to get lower
error, this can result in over-fitting. Always plot the relationships to see the fit
and focus on making sure that the curve fits the nature of the problem.
• Especially look out for the curve towards the ends and see whether those shapes
and trends make sense. Higher-degree polynomials can end up producing weird results
on extrapolation.

4. Stepwise Regression

This form of regression is used when we deal with multiple independent
variables. In this technique, the selection of independent variables is done by
an automatic process that involves no human intervention.

This feat is achieved by observing statistical values like R-square, t-stats and
the AIC metric to discern significant variables. Stepwise regression basically fits
the regression model by adding/dropping covariates one at a time based on a
specified criterion. Some of the most commonly used stepwise regression
methods are listed below:

• Standard stepwise regression does two things: it adds and removes predictors as
needed at each step.
• Forward selection starts with the most significant predictor in the model and adds
a variable at each step.
• Backward elimination starts with all predictors in the model and removes the
least significant variable at each step.
The aim of this modeling technique is to maximize prediction power with the
minimum number of predictor variables. It is one of the methods used to
handle high dimensionality in a data set.
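The sketch below is one simple way forward selection might be implemented, using cross-validated R-square as the selection criterion (real stepwise procedures often use p-values, AIC or other statistics instead); the data and stopping rule are illustrative assumptions, not part of the original article.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Toy data: 6 candidate predictors, only the first two actually matter (illustrative only)
rng = np.random.default_rng(3)
X = rng.normal(size=(120, 6))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 1, 120)

selected, remaining = [], list(range(X.shape[1]))
best_score = -np.inf

# Forward selection: greedily add the predictor that most improves CV R-square
while remaining:
    scores = {j: cross_val_score(LinearRegression(), X[:, selected + [j]], y, cv=5).mean()
              for j in remaining}
    j_best = max(scores, key=scores.get)
    if scores[j_best] <= best_score:
        break  # no remaining candidate improves the criterion; stop adding
    selected.append(j_best)
    remaining.remove(j_best)
    best_score = scores[j_best]

print("Selected predictor indices:", selected, "CV R-square:", round(best_score, 3))
```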

5. Ridge Regression

Ridge regression is a technique used when the data suffers from
multicollinearity (the independent variables are highly correlated). Under
multicollinearity, even though the least squares (OLS) estimates are unbiased,
their variances are large, which pushes the observed value far from the true
value. By adding a degree of bias to the regression estimates, ridge regression
reduces the standard errors.

Above, we saw the equation for linear regression. Remember? It can be
represented as:

y = a + b*x

This equation also has an error term. The complete equation becomes:

y = a + b*x + e (error term), [the error term is the value needed to correct for the
prediction error between the observed and predicted value]
=> y = a + b1*x1 + b2*x2 + .... + e, for multiple independent variables.

In a linear equation, prediction errors can be decomposed into two sub-components.
The first is due to bias and the second is due to variance. Prediction error can
occur due to either of these two components or both. Here, we'll discuss the error
caused due to variance.
Ridge regression solves the multicollinearity problem through a shrinkage
parameter λ (lambda). It minimizes

sum of squared residuals + λ * (sum of β^2)

In this objective, we have two components. The first is the least squares term and
the other is lambda times the summation of β² (beta squared), where β is the
coefficient. This penalty is added to the least squares term in order to shrink the
parameters towards a very low variance.

Important Points:

• The assumptions of this regression are the same as those of least squares
regression, except that normality is not to be assumed.
• Ridge regression shrinks the value of the coefficients but never reaches exactly
zero, which means it performs no feature selection.
• This is a regularization method and uses L2 regularization.
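A brief, illustrative sketch of how ridge shrinkage behaves under multicollinearity, using scikit-learn's Ridge (its alpha parameter plays the role of λ); the toy data below is invented purely for demonstration.

```python
import numpy as np
from sklearn.linear_model import Ridge, LinearRegression

# Toy data with two almost identical predictors (multicollinearity, illustrative only)
rng = np.random.default_rng(4)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(0, 0.01, 100)          # x2 is nearly a copy of x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(0, 1, 100)

# OLS coefficients become unstable under multicollinearity; ridge shrinks them
print("OLS coefficients:  ", LinearRegression().fit(X, y).coef_)
print("Ridge coefficients:", Ridge(alpha=1.0).fit(X, y).coef_)
```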

6. Lasso Regression

Similar to ridge regression, Lasso (Least Absolute Shrinkage and Selection
Operator) also penalizes the absolute size of the regression coefficients. In
addition, it is capable of reducing the variability and improving the accuracy of
linear regression models. Lasso regression differs from ridge regression in that it
uses absolute values in the penalty function instead of squares: it minimizes

sum of squared residuals + λ * (sum of |β|)

Penalizing (or, equivalently, constraining) the sum of the absolute values of the
estimates causes some of the parameter estimates to turn out exactly zero. The
larger the penalty applied, the further the estimates are shrunk towards zero.
This results in variable selection out of the given n variables.

Important Points:

• The assumptions of lasso regression are the same as those of least squares
regression, except that normality is not to be assumed.
• Lasso regression shrinks coefficients to exactly zero, which certainly helps with
feature selection.
• Lasso is a regularization method and uses L1 regularization.
• If a group of predictors is highly correlated, lasso picks only one of them and
shrinks the others to zero.
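For illustration, a lasso fit on synthetic data with scikit-learn's Lasso shows the coefficients of irrelevant predictors being driven to exactly zero; the data and the alpha value are assumptions made for this example.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Toy data: 10 predictors, only the first three are truly relevant (illustrative only)
rng = np.random.default_rng(5)
X = rng.normal(size=(200, 10))
y = 4 * X[:, 0] - 3 * X[:, 1] + 2 * X[:, 2] + rng.normal(0, 1, 200)

# The L1 penalty drives the coefficients of irrelevant predictors to exactly zero
lasso = Lasso(alpha=0.1).fit(X, y)
print("Coefficients:", np.round(lasso.coef_, 2))
print("Selected predictors:", np.flatnonzero(lasso.coef_))
```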

7. ElasticNet Regression

ElasticNet is a hybrid of the lasso and ridge regression techniques. It is trained
with both the L1 and L2 priors as regularizers. Elastic-net is useful when there are
multiple features which are correlated. Lasso is likely to pick one of these at
random, while elastic-net is likely to pick both.

A practical advantage of trading off between lasso and ridge is that it allows
elastic-net to inherit some of ridge's stability under rotation.
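A small sketch, assuming scikit-learn's ElasticNet, showing the mixing parameter l1_ratio that trades off the L1 and L2 penalties (1.0 is pure lasso, 0.0 is pure ridge); the correlated toy data is made up for illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Toy data with a correlated group of predictors (illustrative only)
rng = np.random.default_rng(6)
base = rng.normal(size=200)
X = np.column_stack([base + rng.normal(0, 0.1, 200) for _ in range(3)]
                    + [rng.normal(size=200) for _ in range(3)])
y = 2 * base + rng.normal(0, 1, 200)

# l1_ratio mixes the two penalties; alpha controls the overall strength
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print("Coefficients:", np.round(enet.coef_, 2))
```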

Important Points:

• It encourages a group effect in the case of highly correlated variables.
• There are no limitations on the number of selected variables.
• It can suffer from double shrinkage.

Beyond these 7 most commonly used regression techniques, you can also look
at other models like Bayesian, Ecological and Robust regression.

How to select the right regression model?

Life is usually simple when you know only one or two techniques. One of the
training institutes I know of tells its students: if the outcome is continuous,
apply linear regression; if it is binary, use logistic regression! However, the
higher the number of options available at our disposal, the more difficult it
becomes to choose the right one. A similar case happens with regression models.

Within the multiple types of regression models, it is important to choose the best
suited technique based on the types of independent and dependent variables, the
dimensionality of the data and other essential characteristics of the data. Below
are the key factors you should consider to select the right regression model:

1. Data exploration is an inevitable part of building a predictive model. It
should be your first step before selecting a model: identify the
relationships and impact of the variables.
2. To compare the goodness of fit of different models, we can analyse
different metrics like the statistical significance of parameters, R-square,
adjusted R-square, AIC, BIC and the error term. Another one is Mallow's
Cp criterion. This essentially checks for possible bias in your model by
comparing it with all possible submodels (or a careful selection
of them).
3. Cross-validation is the best way to evaluate models used for prediction.
Here you divide your data set into two groups (train and validate). A
simple mean squared difference between the observed and predicted
values gives you a measure of prediction accuracy (see the sketch after this list).
4. If your data set has multiple confounding variables, you should not
choose an automatic model selection method because you do not want to put
these in a model at the same time.
5. It will also depend on your objective. It can occur that a less powerful
model is easier to implement than a highly statistically
significant model.
6. Regression regularization methods (Lasso, Ridge and ElasticNet) work
well in the case of high dimensionality and multicollinearity among the
variables in the data set.
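As a rough sketch of point 3 above, the snippet below holds out a validation split and scores a model by the mean squared difference between observed and predicted values; the data, the choice of Ridge as the model and the split ratio are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Toy data (illustrative only)
rng = np.random.default_rng(7)
X = rng.normal(size=(300, 5))
y = X @ np.array([1.5, -2.0, 0.0, 0.5, 0.0]) + rng.normal(0, 1, 300)

# Hold-out validation: fit on the training split, score on the unseen split
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
model = Ridge(alpha=1.0).fit(X_train, y_train)
mse = mean_squared_error(y_val, model.predict(X_val))
print("Validation mean squared error:", round(mse, 3))
```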
