Module 5 - Regression Models
551)
Lecturer: Dr. Emmanuel Quansah
Department: Supply Chain and Information Systems - KSB
Office: SF 25, KSB Undergraduate Block
Learning Objectives (1 of 2)
After completing this module, students will be able to:
1. Identify variables, visualize them in a scatter diagram,
and use them in a regression model.
2. Develop simple linear regression equations from
sample data and interpret the slope and intercept.
3. Calculate the coefficient of determination and the
coefficient of correlation and interpret their
meanings.
4. Interpret the F test in a linear regression model.
5. Use computer software for regression analysis.
6. Develop a multiple regression model and use it
for prediction purposes.
Module Outline
1. Introduction and some terminology
2. Scatter Diagrams
3. Simple Linear Regression
4. Measuring the Fit of the Regression Model
5. Assumptions of the Regression Model
6. Testing the Model for Significance
7. Using Computer Software for Regression
8. Multiple Regression Analysis
Introduction (1 of 2)
• Regression analysis – very valuable tool for a manager
• Understand the relationship between variables
• Predict the value of one variable based on another variable
• Simple linear regression models have only two variables
• Multiple regression models have more than one
independent variable
Introduction (1 of 2)
• Dependent Variable - A dependent or outcome variable
is any variable that changes its value in response to
another variable
• Independent Variable - An independent or predictor or
explanatory variable is a variable that influences another
variable
• Correlation - is a statistical method used to determine
whether a linear relationship exists between any two
variables. It allows you to find out if there is a statistically
significant relationship between TWO variables
• Regression - is a statistical method used to describe the
nature of the relationship between variables. It quantifies how
the dependent variable changes with the independent variable;
on its own it does not prove causation. It allows you to make
predictions based on the relationship that exists between the
variables.
Introduction (2 of 2)
• Variable to be predicted is called the dependent variable
or response variable
• Value depends on the value of the independent variable(s)
• The independent variable is also called the explanatory
or predictor variable
The General Idea
• Simple Regression Analysis
• Considers the relationship between two variables
(an independent variable (X) and a dependent
variable (Y)) OR between a single explanatory
variable and response variable
The General Idea
• Multiple regression simultaneously considers
the influence of multiple explanatory variables
(X1, X2…Xk) on a response variable Y
Y = β0 + β1X + ε
where
Y = dependent variable (response)
X = independent variable (predictor or explanatory)
β0 = intercept (value of Y when X = 0)
β1 = slope of the regression line
ε = random error
Simple Linear Regression (2 of 2)
• True values for the slope and intercept are not known
• Estimated using sample data
Ŷ = b0 + b1X
where
Ŷ = predicted value of Y
b0 = estimate of β0, based on sample results
b1 = estimate of β1, based on sample results
Triple A Construction (3 of 7)
• Predict sales based on area payroll
Y = Sales
X = Area payroll
– The line in Figure 4.1 minimizes the errors
e = Y − Ŷ
Ŷ = b0 + b1X
X̅ = ΣX ÷ n = average (mean) of X values
Y̅ = ΣY ÷ n = average (mean) of Y values
b1 = Σ(X − X̅)(Y − Y̅) ÷ Σ(X − X̅)²
b0 = Y̅ − b1X̅
Triple A Construction (5 of 7)
TABLE 4.2 Regression Calculations for Triple A Construction
Y      X     (X − X̅)²          (X − X̅)(Y − Y̅)
6      3     (3 − 4)² = 1      (3 − 4)(6 − 7) = 1
8      4     (4 − 4)² = 0      (4 − 4)(8 − 7) = 0
9      6     (6 − 4)² = 4      (6 − 4)(9 − 7) = 4
5      4     (4 − 4)² = 0      (4 − 4)(5 − 7) = 0
4.5    2     (2 − 4)² = 4      (2 − 4)(4.5 − 7) = 5
9.5    5     (5 − 4)² = 1      (5 − 4)(9.5 − 7) = 2.5
ΣY = 42    ΣX = 24    Σ(X − X̅)² = 10    Σ(X − X̅)(Y − Y̅) = 12.5
Y̅ = 42 ÷ 6 = 7
X̅ = 24 ÷ 6 = 4
Triple A Construction (6 of 7)
• Regression calculations
X̅ = ΣX ÷ 6 = 24 ÷ 6 = 4
Y̅ = ΣY ÷ 6 = 42 ÷ 6 = 7
b1 = Σ(X − X̅)(Y − Y̅) ÷ Σ(X − X̅)² = 12.5 ÷ 10 = 1.25
b0 = Y̅ − b1X̅ = 7 − (1.25)(4) = 2
Therefore Ŷ = 2 + 1.25X
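The hand calculation above can be checked with a few lines of Python. This is only a minimal sketch: the function name fit_simple_regression is made up for illustration, and the data are the six (X, Y) pairs from Table 4.2.

```python
def fit_simple_regression(xs, ys):
    """Return (b0, b1) for the least-squares line Y-hat = b0 + b1*X."""
    n = len(xs)
    x_bar = sum(xs) / n                      # mean of X
    y_bar = sum(ys) / n                      # mean of Y
    # Sum of cross-products and sum of squared X deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx                           # slope
    b0 = y_bar - b1 * x_bar                  # intercept
    return b0, b1


payroll = [3, 4, 6, 4, 2, 5]        # X values from Table 4.2
sales = [6, 8, 9, 5, 4.5, 9.5]      # Y values from Table 4.2

b0, b1 = fit_simple_regression(payroll, sales)
print(f"Y-hat = {b0:.2f} + {b1:.2f} X")
```

The slope and intercept it returns should agree with the values worked out by hand (b0 = 2, b1 = 1.25).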
Triple A Construction (7 of 7)
• Regression calculations
X̅ = ΣX ÷ 6 = 24 ÷ 6 = 4
Y̅ = ΣY ÷ 6 = 42 ÷ 6 = 7
b1 = Σ(X − X̅)(Y − Y̅) ÷ Σ(X − X̅)² = 12.5 ÷ 10 = 1.25
b0 = Y̅ − b1X̅ = 7 − (1.25)(4) = 2
Therefore Ŷ = 2 + 1.25X
sales = 2 + 1.25(payroll)
If the payroll next year is $600 million
Ŷ = 2 + 1.25(6) = 9.5 or $950,000
Simple Linear Regression
Problem Data
Question: Given the data below, what is the simple linear
regression model that can be used to predict sales?
Week Sales
1 150
2 157
3 162
4 166
5 177
Resulting regression model: Ŷ = 143.5 + 6.3x
(Chart: Sales and Forecast plotted against Period, 1 to 5)
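As a rough check on this problem, the sketch below applies the same least-squares formulas to the five weekly sales figures. The week 6 forecast is an added illustration (an extrapolation that is not part of the original problem), and the helper name forecast is hypothetical.

```python
weeks = [1, 2, 3, 4, 5]
sales = [150, 157, 162, 166, 177]

n = len(weeks)
x_bar = sum(weeks) / n
y_bar = sum(sales) / n

# Least-squares slope and intercept
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(weeks, sales)) / sum(
    (x - x_bar) ** 2 for x in weeks
)
b0 = y_bar - b1 * x_bar


def forecast(week):
    """Predicted sales for a given week from the fitted line."""
    return b0 + b1 * week


print(f"Y-hat = {b0:.1f} + {b1:.1f} x")
print(f"Week 6 forecast (illustrative extrapolation): {forecast(6):.1f}")
```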
SST = Σ(Y − Y̅)²
SSE = Σe² = Σ(Y − Ŷ)²
SSR = Σ(Ŷ − Y̅)²
• An important relationship
SST = SSR + SSE
For Triple A Construction
SST = 22.5
SSE = 6.875
SSR = 15.625
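The three sums of squares for Triple A Construction can be reproduced from the Table 4.2 data and the fitted line Ŷ = 2 + 1.25X. The sketch below assumes those coefficients rather than refitting them, and the variable names are illustrative.

```python
payroll = [3, 4, 6, 4, 2, 5]        # X values from Table 4.2
sales = [6, 8, 9, 5, 4.5, 9.5]      # Y values from Table 4.2
b0, b1 = 2.0, 1.25                  # fitted line from the earlier slides

y_bar = sum(sales) / len(sales)
predicted = [b0 + b1 * x for x in payroll]

sst = sum((y - y_bar) ** 2 for y in sales)                  # total variability
sse = sum((y - p) ** 2 for y, p in zip(sales, predicted))   # unexplained (error)
ssr = sum((p - y_bar) ** 2 for p in predicted)              # explained by regression

print(f"SST = {sst}, SSE = {sse}, SSR = {ssr}")
print("SST equals SSR + SSE:", abs(sst - (ssr + sse)) < 1e-9)
```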
Measuring the Fit of the Regression Model (5 of 5)
FIGURE 4.2 Deviations from the Regression Line and from the Mean
Coefficient of Determination (1 of 2)
• The proportion of the variability in Y explained by the
regression equation
• The coefficient of determination is r²
r² = SSR ÷ SST = 1 − SSE ÷ SST
• For Triple A Construction
r² = 15.625 ÷ 22.5 = 0.6944
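Coefficient of Determination (2 of 2)
• The coefficient of correlation, r, is the square root of r²
and takes the sign of the slope b1
r = ±√r²
• For Triple A Construction
r = √0.6944 = 0.8333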
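A short sketch of the r² and r calculation for Triple A Construction follows. It takes SST, SSR and the slope as given from the earlier slides; giving r the sign of the slope is the usual convention for simple linear regression.

```python
import math

sst = 22.5      # total sum of squares (from the earlier slide)
ssr = 15.625    # regression sum of squares (from the earlier slide)
b1 = 1.25       # slope of the fitted line

r_squared = ssr / sst                           # coefficient of determination
r = math.copysign(math.sqrt(r_squared), b1)     # coefficient of correlation

print(f"r^2 = {r_squared:.4f}")
print(f"r   = {r:.4f}")
```

Read this way, about 69% of the variability in sales is explained by the regression on payroll.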
Four Values of the Correlation Coefficient
FIGURE 4.3 Four Values of the Correlation Coefficient
Estimating the Variance (1 of 2)
• Errors are assumed to have a constant variance (σ²),
usually unknown
• Estimated using the mean squared error (MSE), s²
s² = MSE = SSE ÷ (n − k − 1)
where
n = number of observations in the sample
k = number of independent variables
Estimating the Variance (2 of 2)
• For Triple A Construction
s² = MSE = SSE ÷ (n − k − 1) = 6.8750 ÷ (6 − 1 − 1) = 6.8750 ÷ 4 = 1.7188
• Estimate the standard deviation, s
• The standard error of the estimate or the standard
deviation of the regression
s = √MSE = √1.7188 = 1.31
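The variance estimate can be sketched in Python as below. The function name mean_squared_error is illustrative, and the inputs (SSE = 6.875, n = 6, k = 1) are the Triple A Construction values from the slides.

```python
import math


def mean_squared_error(sse, n, k):
    """MSE = SSE / (n - k - 1), the estimate of the error variance."""
    return sse / (n - k - 1)


mse = mean_squared_error(sse=6.875, n=6, k=1)
s = math.sqrt(mse)      # standard error of the estimate

print(f"MSE = {mse:.4f}")
print(f"s   = {s:.2f}")
```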
Y = β0 + β1X + ε
F = MSR ÷ MSE
Steps in a Hypothesis Test (2 of 2)
4. Make a decision using one of the following methods
a) Reject the null hypothesis if the test statistic is
greater than the F value from the table in Appendix D.
Otherwise, do not reject the null hypothesis:
Reject if Fcalculated > Fα,df1,df2
df1 = k
df2 = n − k − 1
b) Reject the null hypothesis if the observed significance
level, or p-value, is less than the level of significance
(α). Otherwise, do not reject the null hypothesis:
p-value = P(F > calculated test statistic)
Reject if p-value < α
Triple A Construction (1 of 3)
Step 1:
H0: β1 = 0 (no linear relationship between X and Y)
H1: β1 ≠ 0 (linear relationship exists between X and Y)
Step 2:
Select α = 0.05
Step 3:
– Calculate the value of the test statistic
MSR = SSR ÷ k = 15.6250 ÷ 1 = 15.6250
F = MSR ÷ MSE = 15.6250 ÷ 1.7188 = 9.09
Triple A Construction (2 of 3)
• Step 4:
• Reject the null hypothesis if the test statistic is greater
than the F value in Appendix D
df1 = k = 1
df2 = n − k − 1 = 6 − 1 − 1 = 4
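Steps 3 and 4 can be sketched as follows. The critical value of 7.71 for α = 0.05 with df1 = 1 and df2 = 4 is an assumption taken from a standard F table; it does not appear on the slides and should be checked against Appendix D. The function name f_statistic is illustrative.

```python
def f_statistic(ssr, sse, n, k):
    """F = MSR / MSE with df1 = k and df2 = n - k - 1."""
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr / mse


# Triple A Construction values from the slides
f_calc = f_statistic(ssr=15.625, sse=6.875, n=6, k=1)

# Assumed table value for alpha = 0.05, df1 = 1, df2 = 4 (check Appendix D)
F_CRITICAL = 7.71

reject_h0 = f_calc > F_CRITICAL
print(f"F calculated = {f_calc:.2f}, F critical = {F_CRITICAL}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Under that assumed table value the calculated F exceeds the critical F, so the null hypothesis of no linear relationship would be rejected at the 0.05 level.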
             DF          SS     MS                       F            SIGNIFICANCE F
Regression   k           SSR    MSR = SSR÷k              MSR÷MSE      P(F > MSR÷MSE)
Residual     n−k−1       SSE    MSE = SSE÷(n − k − 1)
Total        n−1         SST
ANOVA for Triple A Construction
PROGRAM 4.1C Excel 2016 Output for Triple A Construction Example
Y = β0 + β1X1 + β2X2 + … + βkXk + ε
where
Y = dependent variable (response variable)
Xi = ith independent variable (predictor or
explanatory variable)
β0 = intercept (value of Y when all Xi = 0)
βi = coefficient of the ith independent variable
k = number of independent variables
ε = random error
Multiple Regression Analysis (2 of 2)
• To estimate these values, a sample is taken and the
following equation is developed
Ŷ = b0 + b1X1 + b2X2 + … + bkXk
where
Ŷ = predicted value of Y
b0 = sample intercept (an estimate of β0)
bi = sample coefficient of the ith variable (an
estimate of βi)
Multiple Regression Model…1
• Ŷ = a + b1x1 + b2x2
Formula for multiple linear regression
with two independent variables
Ŷi = b0 + b1X1 + b2X2
(Figure: regression plane for Ŷ over the X1 and X2 axes)
Multiple Regression Model…2
• A simple regression model (one independent variable)
fits a regression line in 2-dimensional space
• A multiple regression model with two explanatory variables
fits a regression plane in 3-dimensional space
Ŷ = b0 + b1X1 + b2X2
(Figure: regression plane with axes Y, X1 and X2)
Multiple Regression Model…3
• Ŷ = a + b1x1 + b2x2
Formula for multiple linear regression with two
independent variables
We will be calculating a, b1 and b2 using Excel software
Multiple Regression Example…1
A distributor of frozen dessert pies wants to evaluate
factors thought to influence demand
• Excel:
• Tools / Data Analysis... / Regression
Multiple Regression Equation 2 Variable Example, Excel
Regression Statistics
Multiple R           0.72213
R Square             0.52148
Adjusted R Square    0.44172
Standard Error       47.46341
Observations         15

ANOVA        df    SS           MS           F          Significance F
Regression   2     29460.027    14730.013    6.53861    0.01201
Residual     12    27033.306    2252.776
Total        14    56493.333

Sales = 306.526 − 24.975(X1) + 74.131(X2)
where
Sales is in number of pies per week
Price (X1) is in $
Advertising (X2) is in $100's
• b1 = −24.975: sales will decrease, on average, by
24.975 pies per week for each $1 increase in selling price,
net of the effects of changes due to advertising
• b2 = 74.131: sales will increase, on average, by
74.131 pies per week for each $100 increase in advertising,
net of the effects of changes due to price
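The summary statistics in the Excel output can be cross-checked from the ANOVA sums of squares. The sketch below uses only the figures shown above. The adjusted R² formula used here, 1 − (1 − R²)(n − 1) ÷ (n − k − 1), is the standard one and is stated as an assumption because the slides do not give it.

```python
import math

# Figures read from the ANOVA table in the Excel output
n = 15                      # observations
k = 2                       # independent variables (price, advertising)
ssr = 29460.027             # regression sum of squares
sse = 27033.306             # residual sum of squares
sst = 56493.333             # total sum of squares

r_squared = ssr / sst
# Standard adjusted R-squared formula (an assumption; not shown on the slides)
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

msr = ssr / k
mse = sse / (n - k - 1)
f_calc = msr / mse
std_error = math.sqrt(mse)  # standard error of the estimate

print(f"R Square          = {r_squared:.5f}")
print(f"Adjusted R Square = {adj_r_squared:.5f}")
print(f"F                 = {f_calc:.5f}")
print(f"Standard Error    = {std_error:.5f}")
```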
Multiple Regression Equation 2 Variable Example
Predict sales for a week in which the selling price is $5.50
and advertising is $350:
Sales = 306.526 − 24.975(X1) + 74.131(X2)
      = 306.526 − 24.975(5.50) + 74.131(3.5)
      = 428.62
Predicted sales is 428.62 pies
Note that Advertising is in $100's, so $350 means that X2 = 3.5
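The prediction above, including the conversion of advertising dollars into $100 units, can be written as a small function. The name predict_pie_sales is hypothetical; the coefficients are those from the Excel output.

```python
def predict_pie_sales(price_dollars, advertising_dollars):
    """Predicted pies sold per week from the fitted two-variable model."""
    x1 = price_dollars                  # price is already in $
    x2 = advertising_dollars / 100      # advertising must be in $100's
    return 306.526 - 24.975 * x1 + 74.131 * x2


predicted = predict_pie_sales(price_dollars=5.50, advertising_dollars=350)
print(f"Predicted sales: {predicted:.2f} pies per week")
```

Passing 350 directly as X2 instead of 3.5 would inflate the forecast by roughly a factor of 60, which is why the unit conversion is done inside the function.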
Jenny Wilson Realty (1 of 9)
• Develop a model to determine the suggested listing price for
houses based on the size and age of the house
Ŷ = b0 + b1X1 + b2X2
where
Ŷ = predicted value of dependent variable (selling price)
b0 = Y intercept
X1 and X2 = value of the two independent variables
(square footage and age) respectively
b1 and b2 = slopes for X1 and X2 respectively
• Selects a sample of houses that have sold recently and
records the data
Jenny Wilson Real Estate Data
TABLE 4.5 Jenny Wilson Real Estate Data
SELLING PRICE ($) SQUARE FOOTAGE AGE CONDITION
95,000 1,926 30 Good
119,000 2,069 40 Excellent
124,800 1,720 30 Excellent
135,000 1,396 15 Good
142,000 1,706 32 Mint
145,000 1,847 38 Mint
159,000 1,950 27 Mint
165,000 2,323 30 Excellent
182,000 2,285 26 Mint
183,000 3,752 35 Good
200,000 2,300 18 Good
211,000 2,525 17 Good
215,000 3,800 40 Excellent
219,000 1,740 12 Mint
Jenny Wilson Realty (2 of 9)
PROGRAM 4.4A Input Screen for Jenny Wilson Realty
Multiple Regression in Excel 2016
Jenny Wilson Realty (3 of 9)
PROGRAM 4.4B Excel 2016 Output Screen for Jenny
Wilson Realty Multiple Regression Example
Ŷ = b0 + b1X1 + b2X2
  = 146,630.89 + 43.82X1 − 2,898.69X2
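To illustrate how the fitted equation would be used to suggest a listing price, the sketch below plugs in a hypothetical house. The 1,900 square feet and 10 years of age are made-up inputs for illustration only, and the coefficient signs (positive for square footage, negative for age) are as read from the Excel output.

```python
def suggested_listing_price(square_feet, age_years):
    """Predicted selling price ($) from the Jenny Wilson Realty model."""
    b0 = 146630.89      # intercept
    b1 = 43.82          # dollars per additional square foot
    b2 = -2898.69       # dollars per additional year of age
    return b0 + b1 * square_feet + b2 * age_years


# Hypothetical house, not taken from Table 4.5
price = suggested_listing_price(square_feet=1900, age_years=10)
print(f"Suggested listing price: ${price:,.2f}")
```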
Evaluating the Multiple Regression Model (1 of 2)
• Similar to simple linear regression models
• The p-value for the F test and r² are interpreted the same way
• The hypothesis is different because there is more than
one independent variable
• The F test is investigating whether all the coefficients are
equal to 0 at the same time
Evaluating the Multiple Regression Model (2 of 2)
• To determine which independent variables are significant,
tests are performed for each variable
H0: β1 = 0
H1: β1 ≠ 0