Session 11-12-Linear and Multiple Regression


Linear and Multiple Regression

Prof. Rajiv Kumar


IIM Kashipur
What is Regression?

▪ Regression is a predictive model used to predict the value of one variable, referred to as the dependent variable or outcome variable, from the value(s) of one or more other variables, referred to as input variable(s) or independent variable(s).

▪ There are two types of regression:
  • Simple linear regression
  • Multiple regression
Simple Linear Regression

▪ Simple linear regression has a single independent variable. The general form of the regression model is:

    Y = β0 + β1 X1 + e

▪ which is estimated by

    Ŷ = a + b1 X1

▪ where b1 is the regression coefficient (i.e., b1 represents the expected change in Y when X1 is changed by one unit)

▪ and 'a' is the intercept.


Multiple Regression

▪ Multiple regression has more than one independent variable. The general form of the multiple regression model is:

    Y = β0 + β1 X1 + β2 X2 + ... + βk Xk + e

▪ which is estimated by

    Ŷ = a + b1 X1 + b2 X2 + ... + bk Xk

▪ where the bi's represent the partial regression coefficients (i.e., b1 represents the expected change in Y when X1 is changed by one unit and X2 through Xk are held constant)

▪ and 'a' is the intercept.


Regression in Python (1 of 5)

▪ File: Student_Sales_Data.xlsx, sheet: Student

NoOfHours   Skill   Freshmen_Score
2           87      55
2.5         84      62
3           82      65
3.5         74      70
4           73      77
4.5         69      82
5           68      75
5.5         55      83
6           63      85
6.5         80      88
Regression in R

library(readxl)
df <- read_excel('Student_Sales_Data.xlsx', sheet = 'Student')
# Print the data frame
df
# Call lm() to fit the regression
model <- lm(df$Freshmen_Score ~ df$NoOfHours + df$Skill)
# Print the results
summary(model)

OUTPUT:
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)    56.5860    15.0293   3.765 0.007028 **
df$NoOfHours    6.3242     1.0509   6.018 0.000533 ***
df$Skill       -0.1260     0.1582  -0.797 0.451709
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Regression Model
Freshmen_Score = 56.586 + 6.3242*NoOfHours

Note: Skill is not found to be significant (p > 0.05), hence not included in the model. The coefficients shown come from the fit that included Skill; refitting without Skill would give different estimates.
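The effect of dropping Skill and refitting can be checked with a minimal sketch. It is written in Python with the standard library only, as a cross-check rather than a replacement for the R workflow, and the function name `fit_simple` is made up for this illustration. It applies the usual closed-form least-squares formulas to the NoOfHours and Freshmen_Score columns of the Student sheet.

```python
# Sketch: simple linear regression of Freshmen_Score on NoOfHours only.
# Data are the ten rows of the Student sheet; fit_simple is an illustrative name.
hours = [2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5]
score = [55, 62, 65, 70, 77, 82, 75, 83, 85, 88]

def fit_simple(x, y):
    """Return (a, b1) for y_hat = a + b1 * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    a = mean_y - b1 * mean_x  # the fitted line passes through the means
    return a, b1

a, b1 = fit_simple(hours, score)
print(f"Freshmen_Score = {a:.3f} + {b1:.4f} * NoOfHours")
# Freshmen_Score = 44.939 + 6.8848 * NoOfHours
```

On these ten rows the one-predictor fit gives an intercept near 44.94 and a slope near 6.88, which differ from the 56.586 and 6.3242 estimated with Skill in the model.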
Regression in Python (1 of 5)

▪ File: Student_Sales_Data.xlsx, sheet: Sales

DayType    Price($)   Advertising ($100s)   Sales($)
Weekdays   5.5        3.3                   350
Weekdays   7.5        3.3                   460
Weekdays   8          3                     350
Weekdays   8          4.5                   430
Weekdays   6.8        3                     350
Weekdays   7.5        4                     380
Weekdays   4.5        3                     430
Weekdays   6.4        3.7                   470
Weekend    7          3.5                   450
Weekend    5          4                     490
Weekend    7.2        3.5                   340
Weekend    7.9        3.2                   300
Weekend    5.9        4                     440
Weekend    5          3.5                   450
Weekend    7          2.7                   300
Regression in R

library(readxl)
df <- read_excel('Student_Sales_Data.xlsx', sheet = 'Sales')
# Print the data frame
# df
# Call lm() to fit the regression
# (df$ prefixes are used so that the coefficient labels match the output below)
model <- lm(df$Sales ~ df$Price + df$Advertising)
# Print the results
summary(model)

OUTPUT:
Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)      306.53     114.25   2.683   0.0199 *
df$Price         -24.98      10.83  -2.306   0.0398 *
df$Advertising    74.13      25.97   2.855   0.0145 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Regression Model
Sales = 306.53 - 24.98*Price + 74.13*Advertising
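As a cross-check of the coefficients reported by R, the same least-squares fit can be sketched in Python using only the standard library. This is an illustration under assumptions, not how `lm()` works internally: it solves the normal equations (X'X) b = X'y by Gaussian elimination, and the helper names `solve` and `fit_ols` are made up for this sketch. The data are the 15 rows of the Sales sheet.

```python
# Sketch: ordinary least squares via the normal equations, standard library only.
price = [5.5, 7.5, 8, 8, 6.8, 7.5, 4.5, 6.4, 7, 5, 7.2, 7.9, 5.9, 5, 7]
advertising = [3.3, 3.3, 3, 4.5, 3, 4, 3, 3.7, 3.5, 4, 3.5, 3.2, 4, 3.5, 2.7]
sales = [350, 460, 350, 430, 350, 380, 430, 470, 450, 490, 340, 300, 440, 450, 300]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical safety.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ols(columns, y):
    """Return [intercept, b1, ..., bk] for the given predictor columns."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]  # design matrix
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

b0, b_price, b_adv = fit_ols([price, advertising], sales)
print(round(b0, 3), round(b_price, 3), round(b_adv, 3))
```

The three values should agree with the Estimate column of the R output up to rounding in the last printed digit.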
Using The Equation to Make Predictions

Regression Model
Sales= 306.53 -24.98 *Price+74.13 *Advertising

where
Sales is in number of pies per week
Price is in $
Advertising is in $100’s.

b1 = -24.98: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, net of the effects of changes due to advertising.

b2 = 74.13: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, net of the effects of changes due to price.
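The "net of the effects" reading can be illustrated with a short Python sketch. The function name `predict_sales` and the specific price and advertising values are chosen only for illustration; the coefficients are the rounded ones from the regression model above.

```python
# Coefficients from the fitted model, rounded to two decimals.
B0, B_PRICE, B_ADV = 306.53, -24.98, 74.13

def predict_sales(price, advertising_100s):
    """Predicted weekly pie sales; advertising is expressed in $100s."""
    return B0 + B_PRICE * price + B_ADV * advertising_100s

# Raise price by $1 with advertising held fixed: the prediction moves by b1.
price_effect = predict_sales(6.50, 3.5) - predict_sales(5.50, 3.5)
# Raise advertising by $100 (one unit) with price held fixed: it moves by b2.
adv_effect = predict_sales(5.50, 4.5) - predict_sales(5.50, 3.5)
print(round(price_effect, 2), round(adv_effect, 2))
# -24.98 74.13
```

Because the model is linear, these differences do not depend on the starting price or advertising level used in the comparison.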
Using The Equation to Make Predictions

Predict sales for a week in which the selling price is $5.50 and advertising is $350:

Sales = 306.53 - 24.98*Price + 74.13*Advertising
      = 306.53 - 24.98*5.50 + 74.13*3.5
      = 428.60 (approximately, using the rounded coefficients)

Note that Advertising is in $100s, so $350 means that X2 = 3.5.

Predicted sales is 428.62 pies when the unrounded coefficients are used.
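The unit conversion is the step most easily missed, so a small Python sketch may help. The function name `predict_sales_dollars` is made up for this illustration; it takes advertising in dollars and converts it to $100s before applying the rounded coefficients.

```python
def predict_sales_dollars(price_dollars, advertising_dollars):
    """Predicted weekly pie sales from price ($) and advertising ($)."""
    advertising_100s = advertising_dollars / 100  # the model expects $100s
    return 306.53 - 24.98 * price_dollars + 74.13 * advertising_100s

prediction = predict_sales_dollars(5.50, 350)
print(f"{prediction:.3f}")
```

Passing 350 straight into the equation without dividing by 100 would overstate the advertising term by a factor of 100.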
Dummy Variables
▪ To handle a categorical independent variable, one can use a dummy variable.
▪ A dummy variable is a categorical independent variable with two levels
  • e.g., Yes, No
  • Male, Female
  • coded as 0 or 1
▪ The number of dummy variables required is one less than the number of levels (number of levels - 1)
▪ E.g., number of dummy variables required for Gender (Male, Female) = 2 - 1 = 1
▪ E.g., number of dummy variables required for year quarters (Q1, Q2, Q3, Q4) = 4 - 1 = 3
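The "levels minus one" rule can be sketched in Python as follows. The function name `dummy_code` is made up for this illustration, and treating the last listed level as the baseline (all dummies 0) is an assumption of the sketch; any one level could serve as the baseline.

```python
def dummy_code(values, levels):
    """Code a categorical column into len(levels) - 1 dummy (0/1) columns.

    The last entry of `levels` is treated as the baseline, so it is
    represented by all zeros (an assumption made for this sketch).
    """
    coded_levels = levels[:-1]
    return [[1 if v == level else 0 for level in coded_levels] for v in values]

print(dummy_code(["Male", "Female", "Female"], ["Male", "Female"]))
# [[1], [0], [0]]
print(dummy_code(["Q1", "Q2", "Q3", "Q4"], ["Q1", "Q2", "Q3", "Q4"]))
# [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
```

Two levels yield one column and four levels yield three, matching the counts in the examples above.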
Dummy-Variable Example (with 2 Levels)

Let:
Y = pie sales
X1 = price
X2 = holiday (X2 = 1 if a holiday occurred during the week; X2 = 0 if there was no holiday that week)

Ŷ = b0 + b1 X1 + b2 X2
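Substituting the two possible values of the dummy into the estimated equation shows what the holiday coefficient does. The short derivation below follows directly from the model as written (the `cases` environment assumes the amsmath package): the slope on price is the same in both cases, and b2 is the shift in the intercept for holiday weeks.

```latex
\hat{Y} = b_0 + b_1 X_1 + b_2 X_2
\quad\Longrightarrow\quad
\hat{Y} =
\begin{cases}
(b_0 + b_2) + b_1 X_1, & X_2 = 1 \ \text{(holiday week)} \\
b_0 + b_1 X_1, & X_2 = 0 \ \text{(no holiday)}
\end{cases}
```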


Dummy-Variable Models (more than 2 Levels)

The number of dummy variables is one less than the number of levels.

Example:
Y = house price; X1 = square feet

If style of the house is also thought to matter:
Style = ranch, split level, colonial

Three levels, so two dummy variables are needed.
Dummy-Variable Example-Predicting Auto Sales

Let:
Y = Sales
X1 = LAGGNP
X2 = LAGUEMP
X3 = LAGINT
X4 = Quarter (Q1-Dummy, Q2-Dummy, Q3-Dummy)

Sales = Constant + b1*LAGGNP + b2*LAGUEMP + b3*LAGINT + b4*Q_Dummy1 + b5*Q_Dummy2 + b6*Q_Dummy3

Quarter   Q_Dummy1   Q_Dummy2   Q_Dummy3
1         1          0          0
2         0          1          0
3         0          0          1
4         0          0          0

Total number of dummy variables = 4 (total quarters) - 1 = 3
Dummy-Variable Example-Predicting Auto Sales

Regression Equation

Predicting quarterly sales = 3146.86 + 0.17*LAGGNP - 93.55*LAGUEMP - 73.90*LAGINT + 167.94*Q_Dummy1 + 378.01*Q_Dummy2 + 200.30*Q_Dummy3

Not found to be significant. Exclude from the model.
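A minimal Python sketch of how the quarter dummies enter a prediction is given below. It uses every term of the regression equation as printed, because the text does not say which term was flagged as not significant. The function name `predict_quarterly_sales` and the input values are hypothetical, chosen only to show the call.

```python
# Quarter coding from the dummy table: quarter 4 is the baseline (all zeros).
QUARTER_DUMMIES = {1: (1, 0, 0), 2: (0, 1, 0), 3: (0, 0, 1), 4: (0, 0, 0)}

def predict_quarterly_sales(laggnp, laguemp, lagint, quarter):
    """Apply the regression equation as printed, with all terms included."""
    d1, d2, d3 = QUARTER_DUMMIES[quarter]
    return (3146.86 + 0.17 * laggnp - 93.55 * laguemp - 73.90 * lagint
            + 167.94 * d1 + 378.01 * d2 + 200.30 * d3)

# Hypothetical inputs, not taken from the slides.
inputs = dict(laggnp=3000.0, laguemp=6.0, lagint=8.0)
q2 = predict_quarterly_sales(quarter=2, **inputs)
q4 = predict_quarterly_sales(quarter=4, **inputs)
print(round(q2 - q4, 2))
# 378.01
```

With the other inputs held fixed, each dummy coefficient is the predicted difference between that quarter and the baseline quarter 4.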
