Introduction To Management Science: Post Mid Sessions 2 & 3 November 4 and 6 2019
Regression
• Regression Analysis is used to estimate a function f( ) that
describes the relationship between a continuous dependent
variable and one or more independent variables.
Y = f(X1, X2, X3,…, Xn) + e
Note:
• f( ) describes systematic variation in the relationship.
• e represents the unsystematic variation (or random error) in the
relationship.
Advertising – Sales Example
• Consider the relationship between advertising (X1) and
sales (Y) for a company.
• There probably is a relationship...
...as advertising increases, sales should increase.
• But how would we measure and quantify this
relationship?

Obs   Advertising (in $1000s)   Actual Sales (in $1000s)
1     30                        184.4
2     40                        279.1
3     40                        244.0
4     50                        314.2
5     60                        382.2
6     70                        450.2
7     70                        423.6
8     70                        410.2
9     80                        500.4
10    90                        505.3
Scatter Plot
[Scatter plot: Actual Sales in $1000s (vertical axis, 0.0 to 600.0) against Advertising in $1000s (horizontal axis, 20 to 100)]
Simple Linear Regression Model
The scatter plot shows a linear relation between
advertising and sales.
So the following regression model is suggested by the
data,
$Y_i = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$
This refers to the true relationship between the entire
population of advertising and sales values.
The estimated regression function (based on our
sample) will be represented as,
$\hat{Y}_i = b_0 + b_1 X_{1i}$
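The error sum of squares (ESS) measures how far the estimated values fall from the actual values:
$ESS = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} (Y_i - (b_0 + b_1 X_{1i}))^2$
[Figure: regression curve plotted against X]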
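As an illustration of how the estimates b0 and b1 can be obtained from the sample, the sketch below fits the advertising data by ordinary least squares, assuming that is the fitting method intended here (it is what Excel's regression tool uses). The function name `fit_simple_regression` is a hypothetical helper, not something from the slides.

```python
import numpy as np

# Advertising and sales data from the example table (both in $1000s).
advertising = np.array([30, 40, 40, 50, 60, 70, 70, 70, 80, 90], dtype=float)
sales = np.array([184.4, 279.1, 244.0, 314.2, 382.2,
                  450.2, 423.6, 410.2, 500.4, 505.3])

def fit_simple_regression(x, y):
    """Return (b0, b1) that minimize the sum of squared errors for y ~ b0 + b1*x."""
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: co-variation of x and y divided by the variation in x.
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: forces the line through the point of means.
    b0 = y_mean - b1 * x_mean
    return b0, b1

b0, b1 = fit_simple_regression(advertising, sales)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")  # b0 = 36.342, b1 = 5.550
```

These values agree with the coefficients used in the prediction example for this data set.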
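R² Statistic
[Figure: the deviation of an actual value Yi from the mean Ȳ, split at the estimated value Ŷi on the line Ŷ = b0 + b1X into Yi − Ŷi and Ŷi − Ȳ; horizontal axis X]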
Understanding R²
$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 + \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$
or,
TSS = ESS + RSS
$R^2 = \frac{RSS}{TSS} = 1 - \frac{ESS}{TSS}$
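The decomposition can be checked numerically on the advertising data. The sketch below assumes an ordinary least-squares fit (via `numpy.polyfit`) and computes TSS, ESS, RSS, and R² directly from their definitions; the variable names are illustrative.

```python
import numpy as np

# Advertising and sales data from the example table (both in $1000s).
advertising = np.array([30, 40, 40, 50, 60, 70, 70, 70, 80, 90], dtype=float)
sales = np.array([184.4, 279.1, 244.0, 314.2, 382.2,
                  450.2, 423.6, 410.2, 500.4, 505.3])

# Least-squares line; np.polyfit returns the slope first, then the intercept.
b1, b0 = np.polyfit(advertising, sales, deg=1)
estimated = b0 + b1 * advertising

tss = np.sum((sales - sales.mean()) ** 2)      # total sum of squares
ess = np.sum((sales - estimated) ** 2)         # error sum of squares
rss = np.sum((estimated - sales.mean()) ** 2)  # regression sum of squares

r_squared = rss / tss
print(f"TSS = {tss:.1f}, ESS = {ess:.1f}, RSS = {rss:.1f}")
print(f"R^2 = {r_squared:.3f}")
```

Both forms of the R² formula, RSS/TSS and 1 − ESS/TSS, give the same value because TSS = ESS + RSS holds for a least-squares fit with an intercept.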
Predicting the Dependent Value
Suppose we want to estimate the average
levels of sales expected if $65,000 is spent on
advertising.
$\hat{Y}_i = 36.342 + 5.550 X_{1i}$
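A minimal sketch of the calculation for the $65,000 case, using the rounded coefficients 36.342 and 5.550 from the estimated regression function. Both variables are measured in $1000s, so the input is 65. The function name `predict_sales` is a hypothetical helper.

```python
# Rounded coefficients of the estimated regression function.
B0 = 36.342  # intercept
B1 = 5.550   # change in sales (in $1000s) per $1000 of advertising

def predict_sales(advertising_thousands):
    """Estimated average sales (in $1000s) for a given advertising level (in $1000s)."""
    return B0 + B1 * advertising_thousands

estimate = predict_sales(65)
print(f"{estimate:.3f}")  # 397.092
```

That is, about $397,092 in expected average sales when $65,000 is spent on advertising.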
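The standard error Se measures the scatter of the actual values around the estimated regression function, where n is the number of observations and k is the number of independent variables:
$S_e = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - k - 1}}$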
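The standard error formula can be applied to the advertising example as follows. This is a sketch assuming an ordinary least-squares fit with k = 1 independent variable; `standard_error` is a hypothetical helper name.

```python
import numpy as np

def standard_error(actual, estimated, k):
    """Se = sqrt(sum of squared errors / (n - k - 1)) for k independent variables."""
    actual = np.asarray(actual, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    n = actual.size
    return np.sqrt(np.sum((actual - estimated) ** 2) / (n - k - 1))

# Advertising and sales data from the example table (both in $1000s).
advertising = np.array([30, 40, 40, 50, 60, 70, 70, 70, 80, 90], dtype=float)
sales = np.array([184.4, 279.1, 244.0, 314.2, 382.2,
                  450.2, 423.6, 410.2, 500.4, 505.3])

# Least-squares line; np.polyfit returns the slope first, then the intercept.
b1, b0 = np.polyfit(advertising, sales, deg=1)
se = standard_error(sales, b0 + b1 * advertising, k=1)
print(f"Se = {se:.3f}")
```

The result is in the same units as Y, here $1000s of sales.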
[Figure: data points plotted against two independent variables, X1 and X2]
Real Estate Example
• A real estate appraiser wants to develop a model to
help predict the fair market values of residential
properties.
• Three independent variables will be used to estimate
the selling price of a house:
– total square footage
– number of bedrooms
– size of the garage
Selecting the Model
• We want to identify the simplest model that
adequately accounts for the systematic variation in
the Y variable.
• Arbitrarily using all the independent variables may
result in overfitting.
• A sample reflects characteristics:
– representative of the population
– specific to the sample
• We want to avoid fitting sample-specific
characteristics, that is, overfitting the data.
Models with One IV
• With simplicity in mind, suppose we fit three
simple linear regression functions:
$\hat{Y}_i = b_0 + b_1 X_{1i}$
$\hat{Y}_i = b_0 + b_2 X_{2i}$
$\hat{Y}_i = b_0 + b_3 X_{3i}$
Key regression results are:
Variables in the Model   R²      Adjusted R²   Se       Parameter Estimates
X1                       0.870   0.855         10.299   b0=9.503, b1=56.394
X2                       0.759   0.731         14.030   b0=78.290, b2=28.382
X3                       0.793   0.770         12.982   b0=16.250, b3=27.607
The model using X1 accounts for 87% of the
variation in Y, leaving 13% unaccounted for.
Multiple Regression in Excel
When using more than one independent
variable, all variables for the X-range must be
in one contiguous block of cells (that is, in
adjacent columns).
Models with 2 IVs
• Now suppose we fit the following models with two
independent variables:
$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$
$\hat{Y}_i = b_0 + b_1 X_{1i} + b_3 X_{3i}$
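Outside Excel, a two-variable model of this form can be fitted by stacking the independent variables as adjacent columns of one matrix, much like the contiguous X-range. The sketch below uses `numpy.linalg.lstsq`. The real estate data are not reproduced in these notes, so the arrays are made-up illustrative numbers, generated without noise from Y = 10 + 2·X1 + 3·X2 so that the recovered coefficients can be checked; the helper name `fit_multiple_regression` is also hypothetical.

```python
import numpy as np

def fit_multiple_regression(columns, y):
    """Return [b0, b1, ..., bk] from a least-squares fit of y on the given columns."""
    y = np.asarray(y, dtype=float)
    # Design matrix: a column of ones for b0, then one column per independent variable.
    design = np.column_stack([np.ones(y.size)] + [np.asarray(c, dtype=float) for c in columns])
    coefficients, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    return coefficients

# Made-up illustrative data (NOT the real estate sample).
x1 = np.array([1, 2, 3, 4, 5, 6], dtype=float)
x2 = np.array([2, 1, 4, 3, 6, 5], dtype=float)
y = 10 + 2 * x1 + 3 * x2  # exact relationship, no random error

b0, b1, b2 = fit_multiple_regression([x1, x2], y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

Swapping the second column for a different variable fits the other two-variable model in the same way.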