
Regression Analysis PPT

The document is about regression analysis. It discusses simple linear regression, where there is one independent variable and one dependent variable. It explains how to estimate the parameters of the linear regression model using least squares estimation. It also discusses properties of the parameter estimates and assumptions of the regression model like normality and constant variance of the error term. Finally, it briefly introduces multiple linear regression and logistic regression.

Uploaded by

keshavmandowra

Regression

"IBM - 312"

September 2022

Sumit Kumar Yadav

Department of Management Studies


Indian Institute of Technology, Roorkee

”IBM - 312” Regression 1 / 22


Regression Analysis

Looking for a relationship among a set of variables

Usually, there is only one dependent variable (Y)
The others are explanatory variables, also called independent variables (X1, X2, ...)
Can we assume a functional relationship between Y and the independent variables, Y = f(X1, X2, ...)?
Usually, because of the inherent randomness of the phenomena we are trying to model, we write instead
Y = f(X1, X2, ...) + ε
ε is typically assumed to be a random variable with mean 0 and standard deviation σ
Thus, for given values of X1, X2, ..., E(Y) = f(X1, X2, ...)
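To make the role of ε concrete, the sketch below simulates many values of Y at one fixed setting of the independent variables. It then checks that their average is close to f(X1, X2). The function f, its coefficients, σ = 0.5, and the choice of normally distributed ε are all assumptions made for illustration. This slide only requires ε to have mean 0 and standard deviation σ.

```python
import random
import statistics

# Hypothetical regression function f, chosen only for illustration.
def f(x1, x2):
    return 3.0 + 2.0 * x1 - 1.5 * x2

def draw_y(x1, x2, sigma, rng):
    """Draw one Y = f(x1, x2) + eps, with eps assumed Normal(0, sigma)."""
    return f(x1, x2) + rng.gauss(0.0, sigma)

rng = random.Random(42)  # fixed seed so the run is reproducible
samples = [draw_y(1.0, 2.0, sigma=0.5, rng=rng) for _ in range(100_000)]

# The sample mean of Y should be close to f(1, 2) = 2.0, and the
# sample standard deviation close to sigma = 0.5.
print(f"f(1, 2) = {f(1.0, 2.0):.3f}")
print(f"mean of simulated Y = {statistics.fmean(samples):.3f}")
print(f"s.d. of simulated Y = {statistics.stdev(samples):.3f}")
```

Individual values of Y scatter around f(1, 2), but their average settles near it. This is what E(Y) = f(X1, X2, ...) expresses.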



Simple Linear Regression

If the assumed functional form is linear, we call it linear regression
If there is only one independent variable, we call it simple linear regression
The linear form typically assumed is Y = α + βX + ε

Simple Linear Regression Model

Y = α + βX + ε



Simple Linear Regression

Simple Linear Regression Model


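Y = α + βX + ε

β can be interpreted as the average increase in Y for a unit increase in X
α is the average value of Y when X = 0; in general, it has no practical interpretation, since X = 0 is often meaningless or far outside the range of the data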



Simple Linear Regression

Simple Linear Regression Model


Y = α + βX + ε

α and β are population parameters, and hence are unknown
Our task is to estimate the values of α and β from the sample observations
When there is just one independent variable, we can look at the scatter plot to check whether a linear relationship between the variables can be assumed
If the scatter plot does not indicate a linear relationship, we should probably drop the idea of simple linear regression and investigate the relationship between the variables further



Simple Linear Regression

Simple Linear Regression Model


Y = α + βX + ε

We estimate the values of α and β from the sample observations
Denote by α̂ and β̂ the estimates of α and β, respectively
Note that α and β uniquely determine the line
Thus, given the data, we determine α̂ and β̂, which uniquely determine a fitted line
Which line should we fit? The one that minimizes the sum of squared residuals, where the residual of observation i is e_i = y_i − α̂ − β̂x_i



Simple Linear Regression

Simple Linear Regression Model


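Y = α + βX + ε

Given data: (x_1, y_1), (x_2, y_2), ..., (x_n, y_n)

Minimize: $\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(y_i - \hat{\alpha} - \hat{\beta} x_i\right)^2$
Differentiate w.r.t. α̂ and β̂ and equate to zero
We obtain 2 equations in 2 unknowns (the normal equations):
$\sum_{i=1}^{n} \left(y_i - \hat{\alpha} - \hat{\beta} x_i\right) = 0, \qquad \sum_{i=1}^{n} x_i \left(y_i - \hat{\alpha} - \hat{\beta} x_i\right) = 0$
which on solving give
$\hat{\beta} = \dfrac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$
$\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}$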
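Here is a minimal sketch of these closed-form estimates in plain Python. The data set (x = 1, ..., 5 and y = 2, 4, 5, 4, 5) is a made-up toy example, and the function name fit_simple_ols is hypothetical. It simply evaluates the formulas for β̂ and α̂ given on this slide.

```python
def fit_simple_ols(x, y):
    """Return (alpha_hat, beta_hat) minimizing the sum of squared residuals."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of beta_hat.
    s_xy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat

# Made-up toy data, used only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
alpha_hat, beta_hat = fit_simple_ols(x, y)
print(f"alpha_hat = {alpha_hat:.3f}, beta_hat = {beta_hat:.3f}")
```

On this toy data the sketch prints `alpha_hat = 2.200, beta_hat = 0.600`, so the fitted line is ŷ = 2.2 + 0.6x.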



Simple Linear Regression - Estimation of Parameters

Simple Linear Regression Model


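Y = α + βX + ε

$\hat{\beta} = \dfrac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$
$\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}$
To fully specify the model, one more parameter needs to be estimated: σ
σ is estimated from the residuals, dividing the sum of squared residuals by n − 2 because two parameters (α and β) have already been estimated from the data
$\hat{\sigma} = s = \sqrt{\dfrac{\sum_{i=1}^{n} \left(y_i - \hat{\alpha} - \hat{\beta} x_i\right)^2}{n - 2}}$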
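The estimate σ̂ can be sketched the same way. This reuses the made-up toy data and a hypothetical helper, fit_simple_ols. Note the divisor n − 2 rather than n or n − 1.

```python
import math

def fit_simple_ols(x, y):
    """Closed-form least-squares estimates (alpha_hat, beta_hat)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = s_xy / s_xx
    return y_bar - beta_hat * x_bar, beta_hat

def residual_std_error(x, y):
    """sigma_hat = sqrt(SSE / (n - 2)) for the simple linear regression fit."""
    n = len(x)
    if n < 3:
        raise ValueError("need n >= 3 so that n - 2 > 0")
    alpha_hat, beta_hat = fit_simple_ols(x, y)
    sse = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (n - 2))

# Made-up toy data, used only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
sigma_hat = residual_std_error(x, y)
print(f"sigma_hat = {sigma_hat:.4f}")
```

For the toy data SSE = 2.4 and n = 5, so σ̂ = √(2.4/3) ≈ 0.8944.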



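Goodness of fit - R²

Excel reports an R² value along with the regression output

What does it mean?
It gives the proportion (percentage) of the variability in Y that is explained by the regression equation
SST = SSR + SSE, where
$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$ (total sum of squares)
$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$ (regression sum of squares)
$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ (error sum of squares)
$R^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST}$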
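The decomposition and R² can be checked numerically. The sketch below uses the same made-up toy data and hypothetical helper names. It computes SST, SSR and SSE from the fitted values and then forms R² = SSR/SST.

```python
def fit_simple_ols(x, y):
    """Closed-form least-squares estimates (alpha_hat, beta_hat)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = s_xy / s_xx
    return y_bar - beta_hat * x_bar, beta_hat

def sums_of_squares(x, y):
    """Return (SST, SSR, SSE) for the least-squares line fitted to (x, y)."""
    alpha_hat, beta_hat = fit_simple_ols(x, y)
    y_bar = sum(y) / len(y)
    y_hat = [alpha_hat + beta_hat * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((yhi - y_bar) ** 2 for yhi in y_hat)
    sse = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))
    return sst, ssr, sse

# Made-up toy data, used only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
sst, ssr, sse = sums_of_squares(x, y)
r2 = ssr / sst
print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}, R^2 = {r2:.2f}")
```

For this toy data it gives SST = 6, SSR = 3.6, SSE = 2.4 and R² = 0.6. So about 60% of the variability in y is explained by the fitted line.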



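Simple Linear Regression - Properties of Estimates of Parameters

Simple Linear Regression Model

Y = α + βX + ε

The sum of the residuals is zero
The residuals are uncorrelated with the x_i's
It can also be shown that the fitted values ŷ_i and the residuals e_i are uncorrelated
$\sum_{i=1}^{n} y_i = \sum_{i=1}^{n} \hat{y}_i$, since $y_i = \hat{y}_i + e_i$ and the residuals sum to zero
The first two properties follow directly from the two equations obtained by setting the derivatives of the sum of squared residuals to zero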
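These properties are easy to confirm on data. The sketch below fits the made-up toy data used earlier and prints four quantities:

- the sum of the residuals,
- their sample covariance with x,
- their sample covariance with ŷ,
- the sums of y and ŷ.

The first three should be zero up to floating-point rounding. Names such as sample_covariance are illustrative only.

```python
def fit_simple_ols(x, y):
    """Closed-form least-squares estimates (alpha_hat, beta_hat)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = s_xy / s_xx
    return y_bar - beta_hat * x_bar, beta_hat

def sample_covariance(u, v):
    """Sample covariance of two equal-length sequences (divisor n - 1)."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    return sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v)) / (n - 1)

# Made-up toy data, used only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
alpha_hat, beta_hat = fit_simple_ols(x, y)
y_hat = [alpha_hat + beta_hat * xi for xi in x]
e = [yi - yhi for yi, yhi in zip(y, y_hat)]  # residuals

print("sum of residuals:", round(sum(e), 12))
print("cov(e, x):", round(sample_covariance(e, x), 12))
print("cov(e, y_hat):", round(sample_covariance(e, y_hat), 12))
print("sum of y:", sum(y), " sum of y_hat:", round(sum(y_hat), 12))
```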



Assumptions of Regression

ε is a random variable that is normally distributed with mean 0 and s.d. σ
The variance of ε is the same for all values of x (constant variance)
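Residual plots are the usual way to check the constant-variance assumption. As a rough numerical stand-in, the sketch compares the spread of the residuals for the upper half of the x values with the spread for the lower half. This heuristic is assumed here purely for illustration and is not a formal test.

- A ratio near 1 is consistent with constant variance.
- A ratio well above 1 suggests the variance grows with x.

Both data sets are simulated with made-up coefficients and noise levels. Normality of ε is not checked here.

```python
import random
import statistics

def fit_simple_ols(x, y):
    """Closed-form least-squares estimates (alpha_hat, beta_hat)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = s_xy / s_xx
    return y_bar - beta_hat * x_bar, beta_hat

def spread_ratio(x, y):
    """Residual s.d. for the upper half of x divided by that for the lower half.

    Assumes x is sorted in increasing order. A crude heuristic, not a formal test.
    """
    alpha_hat, beta_hat = fit_simple_ols(x, y)
    e = [yi - alpha_hat - beta_hat * xi for xi, yi in zip(x, y)]
    half = len(e) // 2
    return statistics.stdev(e[half:]) / statistics.stdev(e[:half])

rng = random.Random(0)
x = [1 + 9 * i / 999 for i in range(1000)]  # 1000 sorted x values from 1 to 10
# Simulated data with made-up coefficients: the same error s.d. for every x ...
y_constant = [2 + 0.5 * xi + rng.gauss(0, 1.0) for xi in x]
# ... versus an error s.d. that grows in proportion to x.
y_growing = [2 + 0.5 * xi + rng.gauss(0, 0.3 * xi) for xi in x]

ratio_constant = spread_ratio(x, y_constant)
ratio_growing = spread_ratio(x, y_growing)
print(f"constant variance: ratio = {ratio_constant:.2f}")
print(f"growing variance:  ratio = {ratio_growing:.2f}")
```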



Examples of Residual Plots

Source - http://analyticspro.org/2016/03/05/r-tutorial-residual-analysis-for-regression/



Multiple Linear Regression

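We now have more than one independent variable (say k)

Multiple Linear Regression Model
Y = α + β1X1 + β2X2 + · · · + βkXk + ε

Interpretation of the β's? (βj is the average change in Y for a unit increase in Xj, holding the other independent variables constant)
How do we obtain α and the β's?
Partial differentiation of the sum of squared residuals with respect to α, β1, ..., βk gives k + 1 equations in k + 1 unknowns
Example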
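For k independent variables, the k + 1 equations can be written in matrix form and solved directly. The minimal sketch below uses NumPy and assumes the usual design matrix D with a leading column of ones for α. The data are made up, and fit_multiple_ols is a hypothetical name.

```python
import numpy as np

def fit_multiple_ols(X, y):
    """Least-squares estimates [alpha, beta_1, ..., beta_k] via the normal equations."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept alpha.
    design = np.column_stack([np.ones(len(y)), X])
    # Setting the k + 1 partial derivatives of the SSE to zero gives
    # (D^T D) b = D^T y, a system of k + 1 linear equations in k + 1 unknowns.
    return np.linalg.solve(design.T @ design, design.T @ y)

# Made-up toy data with two independent variables (k = 2).
X = [[1, 3], [2, 1], [3, 4], [4, 2], [5, 5], [6, 3]]
y = [5.1, 5.9, 6.8, 8.9, 9.2, 11.8]
coef = fit_multiple_ols(X, y)
print("alpha_hat =", round(coef[0], 3), " betas =", np.round(coef[1:], 3))
```

Forming DᵀD and solving mirrors the partial-differentiation argument above. Numerically, however, `np.linalg.lstsq` is usually the safer choice when the independent variables are nearly collinear, because it does not form DᵀD explicitly.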



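Some other aspects of Regression

Adjusted R²
$R^2_{adj} = 1 - \dfrac{(1 - R^2)(n - 1)}{n - k - 1}$
where n is the number of observations and k is the number of independent variables; unlike R², it can decrease when an added variable does not improve the fit enough
Outliers
Multicollinearity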
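Two of these aspects can be computed directly. The sketch below evaluates the adjusted R² formula. For multicollinearity, it computes the variance inflation factor VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing X_j on the other independent variables. VIF is one common diagnostic, not something the slides prescribe. The data and function names are made up for illustration.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an OLS fit of y on the column(s) of X, with an intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    sse = residuals @ residuals
    sst = ((y - y.mean()) ** 2).sum()
    return 1.0 - sse / sst

def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), regressing column j of X on the other columns."""
    X = np.asarray(X, dtype=float)
    vifs = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        vifs.append(1.0 / (1.0 - r_squared(others, X[:, j])))
    return vifs

# Made-up toy data with one independent variable (n = 5, k = 1).
r2 = r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adjusted_r_squared(r2, n=5, k=1):.3f}")

# Made-up independent variables where x2 is almost exactly 2 * x1.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
vifs = variance_inflation_factors(np.column_stack([x1, x2]))
print("VIFs:", [round(v, 1) for v in vifs])
```

For the one-variable toy data, R² = 0.6 but the adjusted R² is about 0.467. Large VIF values indicate that an independent variable is nearly a linear combination of the others; rules of thumb such as 5 or 10 are often quoted.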



Dealing with Categorical Variables in Regression

How to include them in the model?


If there are more than 2 categories
Interpretation
Need for an interaction term in regression
Interpretation
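One common way to include a categorical variable, assumed here, is dummy (0/1) coding. With m categories, create m − 1 dummy variables and treat the omitted category as the baseline. In a model without interaction terms, each dummy's coefficient is then the average difference in Y from the baseline, with the other variables held fixed. An interaction term is the product of a dummy and another independent variable; it lets the slope of that variable differ by category. The variables region and price below are hypothetical.

```python
def dummy_code(values, baseline):
    """Return (levels, rows): one 0/1 column per category other than the baseline."""
    levels = sorted(set(values) - {baseline})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows

# Hypothetical data: a 3-category variable and a numeric variable.
region = ["North", "South", "East", "South", "North", "East"]
price = [10, 12, 11, 15, 9, 14]

levels, dummies = dummy_code(region, baseline="North")  # 3 categories -> 2 dummies
south = levels.index("South")

# Interaction column: nonzero only for South, so price can have a different slope there.
interaction_south = [row[south] * p for row, p in zip(dummies, price)]

for r, row, inter in zip(region, dummies, interaction_south):
    print(r, row, inter)
```

The dummy columns and the interaction column would then be added to the regression as ordinary independent variables.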



Logistic Regression

Example - A customer's decisions on whether to purchase onions during her visits to a vegetable store are shown in the table below (Y = purchase, N = no purchase).

Visit Index   1   2   3   4   5   6   7   8   9  10
Price         2   2   2   2   2   3   3   3   3   4
Decision      Y   Y   Y   N   Y   Y   Y   N   N   Y

Visit Index  11  12  13  14  15  16  17  18  19  20
Price         4   4   4   4   4   5   5   5   5   5
Decision      N   N   Y   N   N   N   N   N   N   Y

When the price is 2.5, what is the probability that the person will
make a purchase?



Example Contd.

When the price is 2.5, what is the probability that the person will make a purchase?

Price   No. of Visits   No. of Purchases   Probability of Purchase
2       5               4                  0.8
3       4               2                  0.5
4       6               2                  0.33
5       5               1                  0.2
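The aggregated table can be reproduced from the visit-level data. In this minimal sketch, the lists transcribe the 20 visits from the visit table (price, and the Y/N decision). The observed probability of purchase at each price is simply purchases / visits.

```python
from collections import Counter

# Visits 1-20 in order, transcribed from the visit table.
prices = [2] * 5 + [3] * 4 + [4] * 6 + [5] * 5
decisions = "YYYNYYYNNYNNYNNNNNNY"

visits = Counter(prices)
purchases = Counter(p for p, d in zip(prices, decisions) if d == "Y")

print("Price  Visits  Purchases  Probability")
for price in sorted(visits):
    rate = purchases[price] / visits[price]
    print(f"{price:>5}  {visits[price]:>6}  {purchases[price]:>9}  {rate:>11.2f}")
```

Prices 2.5 and 6 do not appear in the data, so these observed proportions cannot answer the questions directly. That is what motivates fitting a model.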



Example Contd.

When the price is 6, what is the probability that the person will make a purchase?



Ideas to answer the problem

Objective - Find the probability of purchase as a function of price

P(purchase) = f(Price)

Can we make use of linear regression?
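A first attempt is to regress the 0/1 decision (1 = purchase) directly on price, which is sometimes called a linear probability model. The sketch below applies the simple-regression formulas to the 20 visits. It then evaluates the fitted line at the two prices asked about.

```python
# Visits 1-20 from the visit table; 1 = purchase, 0 = no purchase.
prices = [2] * 5 + [3] * 4 + [4] * 6 + [5] * 5
decisions = "YYYNYYYNNYNNYNNNNNNY"
y = [1 if d == "Y" else 0 for d in decisions]

# Closed-form least-squares estimates for Y = alpha + beta * price + eps.
n = len(prices)
x_bar = sum(prices) / n
y_bar = sum(y) / n
s_xy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(prices, y))
s_xx = sum((xi - x_bar) ** 2 for xi in prices)
beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar

for price in (2.5, 6.0):
    print(f"price {price}: fitted 'probability' = {alpha_hat + beta_hat * price:.3f}")
```

The fitted line gives about 0.658 at price 2.5 but about −0.036 at price 6. That is a "probability" below zero.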



Excel regression

Attempt a non-linear fit?
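One standard non-linear choice, and the model this part of the deck leads to, is the logistic curve P(purchase) = 1 / (1 + e^−(a + b·price)). This curve always stays between 0 and 1. The sketch below fits a and b by maximum likelihood using Newton-Raphson steps in NumPy. This fitting method is one choice made for illustration, and fit_logistic is a hypothetical name.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(x, y, iterations=25):
    """Maximum-likelihood (a, b) for P(y = 1) = sigmoid(a + b * x), by Newton-Raphson."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(x), x])
    coef = np.zeros(2)
    for _ in range(iterations):
        p = sigmoid(design @ coef)
        gradient = design.T @ (y - p)  # gradient of the log-likelihood
        # Information matrix X^T W X with W = diag(p * (1 - p)),
        # i.e. the negative Hessian of the log-likelihood.
        information = design.T @ (design * (p * (1.0 - p))[:, None])
        coef = coef + np.linalg.solve(information, gradient)
    return coef

# Visits 1-20 from the visit table; 1 = purchase, 0 = no purchase.
prices = [2] * 5 + [3] * 4 + [4] * 6 + [5] * 5
decisions = "YYYNYYYNNYNNYNNNNNNY"
y = [1 if d == "Y" else 0 for d in decisions]

a_hat, b_hat = fit_logistic(prices, y)
for price in (2.5, 6.0):
    print(f"price {price}: estimated P(purchase) = {sigmoid(a_hat + b_hat * price):.3f}")
```

Unlike the straight line, the fitted curve gives a probability strictly between 0 and 1 at price 6. The estimated b is negative, so the probability of purchase falls as price rises.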



Limitations of Linear Regression

Does not explicitly recognize the 0-1 nature of the response (fitted values can fall below 0 or above 1)
The impact of a change in price on the probability of purchase is different at different price levels
The assumed linear regression equation treats it as a constant (the slope β)
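The second limitation can be seen from the slope of the logistic curve. For P = 1 / (1 + e^−(a + b·price)), dP/dprice = b·P(1 − P), which changes with the price level. A straight line, by contrast, has the same slope β everywhere. The sketch uses hypothetical coefficients a = 3 and b = −1, chosen only so that P = 0.5 at price 3. They are not estimates from the onion data.

```python
import math

# Hypothetical coefficients, chosen only for illustration (not fitted to the slide data).
A, B = 3.0, -1.0

def purchase_probability(price):
    """Logistic model P = 1 / (1 + exp(-(A + B * price)))."""
    return 1.0 / (1.0 + math.exp(-(A + B * price)))

def logistic_slope(price):
    """dP/dprice for the logistic model, which equals B * P * (1 - P)."""
    p = purchase_probability(price)
    return B * p * (1.0 - p)

for price in (2, 3, 4, 5, 6):
    print(f"price {price}: P = {purchase_probability(price):.3f}, "
          f"slope = {logistic_slope(price):.3f}")
```

The slope is steepest (−0.25) where P = 0.5 and flattens towards both ends. So the same price change moves the probability of purchase by different amounts at different price levels.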



Thank you for your attention
