BA 275 Modeling Relationships
Winter 2007 Multiple Regression Analysis
Multiple Regression with Two Independent Variables
A collector of antique grandfather clocks believes that the price received for the clocks at an
antique auction increases with the age of the clocks and with the number of bidders. Thus, the
model hypothesized is
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$
where Y = Auction price, X1 = Age of clock (years), and X2 = Number of bidders.
1. Data Collection
Age Bidder Price
X1 X2 Y
127 13 1235
115 12 1080
127 7 846
150 9 1522
156 6 1047
182 11 1979
156 12 1822
132 10 1253
137 9 1297
113 9 946
137 15 1713
117 11 1024
137 8 1147
153 6 1092
117 13 1152
126 10 1336
170 14 2131
182 8 1550
162 11 1884
184 10 2041
143 6 854
159 9 1483
108 14 1055
175 8 1545
108 6 729
179 9 1792
111 15 1175
187 8 1593
111 7 785
115 7 744
194 5 1356
168 7 1262
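The hypothesized model can be fitted to the data above by ordinary least squares. The following is a minimal sketch, assuming NumPy is available; the variable names (clocks, age, bidders, price, b0, b1, b2) are illustrative and are not part of the handout, and the output is not meant to replace the STATGRAPHICS PLUS printout.

```python
import numpy as np

# (age in years, number of bidders, auction price) copied from the table above
clocks = np.array([
    (127, 13, 1235), (115, 12, 1080), (127, 7, 846), (150, 9, 1522),
    (156, 6, 1047), (182, 11, 1979), (156, 12, 1822), (132, 10, 1253),
    (137, 9, 1297), (113, 9, 946), (137, 15, 1713), (117, 11, 1024),
    (137, 8, 1147), (153, 6, 1092), (117, 13, 1152), (126, 10, 1336),
    (170, 14, 2131), (182, 8, 1550), (162, 11, 1884), (184, 10, 2041),
    (143, 6, 854), (159, 9, 1483), (108, 14, 1055), (175, 8, 1545),
    (108, 6, 729), (179, 9, 1792), (111, 15, 1175), (187, 8, 1593),
    (111, 7, 785), (115, 7, 744), (194, 5, 1356), (168, 7, 1262),
], dtype=float)

age, bidders, price = clocks.T

# Design matrix: a column of ones for the intercept, then X1 and X2
X = np.column_stack([np.ones(len(price)), age, bidders])

# Least squares estimates of beta_0, beta_1, beta_2
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
b0, b1, b2 = coef

print(f"Price-hat = {b0:.1f} + {b1:.2f} * Age + {b2:.2f} * Bidder")
```

The signs of b1 and b2 can be compared with the collector's belief that price increases with both age and the number of bidders.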
Hsieh, P-H 1
2. Initial Analysis
[Figure: Plot of Price vs Age; Plot of Price vs Bidder]
3. Fitting the Model and 4. Assessing the Model
5. Model Diagnostics : Checking the Conditions
[Figure: Normal Probability Plot for SRESIDUALS; Residual Plot (studentized residual vs row number); Histogram for SRESIDUALS; Residual Plot (studentized residual vs predicted Price)]
[Figure: Residual Plot (studentized residual vs Bidder); Residual Plot (studentized residual vs Age)]
Is there any violation of the required conditions (normality, independence, constant variance,
and zero mean)?
When the number of bidders is around 10, does the current model tend to overestimate or
underestimate Price? What about when the number of bidders is around 5 or 15?
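The residual questions above can also be explored numerically. The sketch below refits the two-variable model, computes internally studentized residuals, and averages the raw residuals within three bidder ranges. It assumes NumPy; the cut-offs for the ranges are illustrative choices, and statistical packages may report a slightly different (deleted) version of the studentized residual.

```python
import numpy as np

# (age, bidders, price) rows copied from the data table in Section 1
clocks = np.array([
    (127, 13, 1235), (115, 12, 1080), (127, 7, 846), (150, 9, 1522),
    (156, 6, 1047), (182, 11, 1979), (156, 12, 1822), (132, 10, 1253),
    (137, 9, 1297), (113, 9, 946), (137, 15, 1713), (117, 11, 1024),
    (137, 8, 1147), (153, 6, 1092), (117, 13, 1152), (126, 10, 1336),
    (170, 14, 2131), (182, 8, 1550), (162, 11, 1884), (184, 10, 2041),
    (143, 6, 854), (159, 9, 1483), (108, 14, 1055), (175, 8, 1545),
    (108, 6, 729), (179, 9, 1792), (111, 15, 1175), (187, 8, 1593),
    (111, 7, 785), (115, 7, 744), (194, 5, 1356), (168, 7, 1262),
], dtype=float)
age, bidders, price = clocks.T
n = len(price)

X = np.column_stack([np.ones(n), age, bidders])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)

# Residual = actual - fitted; a positive value means the model underestimates
resid = price - X @ coef

# Leverages h_ii are the squared row norms of Q in the thin QR factorization of X
Q, _ = np.linalg.qr(X)
leverage = (Q ** 2).sum(axis=1)

# Internally studentized residuals: e_i / (s * sqrt(1 - h_ii))
p = X.shape[1]
s = np.sqrt(resid @ resid / (n - p))
sresid = resid / (s * np.sqrt(1.0 - leverage))

# Mean residual for few, middling, and many bidders (illustrative cut-offs)
groups = {
    "5-7 bidders": bidders <= 7,
    "8-12 bidders": (bidders >= 8) & (bidders <= 12),
    "13-15 bidders": bidders >= 13,
}
group_means = {label: resid[mask].mean() for label, mask in groups.items()}
for label, mean_resid in group_means.items():
    print(f"{label}: mean residual = {mean_resid:8.1f}")
```

A group mean well below zero suggests the model tends to overestimate Price in that range, and a mean well above zero suggests it tends to underestimate.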
6. Model Selection
After trying out several models, only a few remain that passed all the tests and satisfied the
required conditions. The following table summarizes the STATGRAPHICS PLUS output for each
model. Which of the competing models should be chosen as our final model, and why? (Assume
that our current model also passed all the tests and satisfied the required conditions.)
Candidate   SSResidual   sE      R-sq    R-sq(adj)   # of X’s
Current                                              2
A           492317       132.6   0.912   0.854       3
B           489735       134.7   0.920   0.832       4
C           354900       130.0   0.960   0.893       10
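The sE column is linked to SSResidual through sE = sqrt(SSResidual / (n - k - 1)), where k is the number of X's. The short check below reproduces the sE column from the SSResidual column, assuming every candidate was fitted to the same n = 32 clocks; that sample size is an assumption, not something stated in the table.

```python
import math

n = 32  # assumed: all candidates use the 32 clocks from Section 1

# candidate name -> (SSResidual, number of X's), copied from the table
candidates = {"A": (492317, 3), "B": (489735, 4), "C": (354900, 10)}

def standard_error(ss_residual, k, n):
    """Standard error of estimate: sqrt(SSResidual / (n - k - 1))."""
    return math.sqrt(ss_residual / (n - k - 1))

s_e = {name: round(standard_error(ss, k, n), 1)
       for name, (ss, k) in candidates.items()}
print(s_e)
```

Because the divisor n - k - 1 shrinks as X's are added, sE can rise even when SSResidual falls, as happens when moving from candidate A to candidate B.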
7. Using the (Final) Model
What is the total variation of auction prices? How much of it is explained by the model?
If there are 10 bidders and the clock is 100 years old, what is the expected auction
price?
If Age is held fixed and the number of bidders increases from 10 to 11, by how much is Price
expected to increase?
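These three questions correspond to the sum-of-squares decomposition, a point prediction, and the interpretation of the Bidder coefficient. The sketch below works through them for the two-variable model, assuming NumPy; the helper name predict is hypothetical. Note that an age of 100 years lies slightly below the youngest clock in the data (108 years), so the prediction is a mild extrapolation.

```python
import numpy as np

# (age, bidders, price) rows copied from the data table in Section 1
clocks = np.array([
    (127, 13, 1235), (115, 12, 1080), (127, 7, 846), (150, 9, 1522),
    (156, 6, 1047), (182, 11, 1979), (156, 12, 1822), (132, 10, 1253),
    (137, 9, 1297), (113, 9, 946), (137, 15, 1713), (117, 11, 1024),
    (137, 8, 1147), (153, 6, 1092), (117, 13, 1152), (126, 10, 1336),
    (170, 14, 2131), (182, 8, 1550), (162, 11, 1884), (184, 10, 2041),
    (143, 6, 854), (159, 9, 1483), (108, 14, 1055), (175, 8, 1545),
    (108, 6, 729), (179, 9, 1792), (111, 15, 1175), (187, 8, 1593),
    (111, 7, 785), (115, 7, 744), (194, 5, 1356), (168, 7, 1262),
], dtype=float)
age, bidders, price = clocks.T

X = np.column_stack([np.ones(len(price)), age, bidders])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
b0, b1, b2 = coef
fitted = X @ coef

# Total, explained, and unexplained variation
sst = ((price - price.mean()) ** 2).sum()
ssr = ((fitted - price.mean()) ** 2).sum()
sse = ((price - fitted) ** 2).sum()
r_sq = ssr / sst

def predict(age_years, n_bidders):
    """Expected price from the fitted two-variable model."""
    return b0 + b1 * age_years + b2 * n_bidders

expected_price = predict(100, 10)
# With Age fixed, one extra bidder changes the prediction by exactly b2
one_more_bidder = predict(100, 11) - predict(100, 10)

print(f"SST = {sst:.0f}, SSR = {ssr:.0f}, SSE = {sse:.0f}, R-sq = {r_sq:.3f}")
print(f"Expected price (Age 100, 10 bidders) = {expected_price:.1f}")
print(f"Increase per additional bidder = {one_more_bidder:.2f}")
```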
Multiple Regression with One Dummy Variable
Sometimes a predictor/independent variable (X) can take only two possible values, e.g. gender
(male or female). Such qualitative variables are handled in a multiple regression analysis by using
0-1 variables, which are also referred to as “dummy” variables.
A bank would like to develop a model to predict the total sum of money that customers withdraw
(Y) from Automatic Teller Machines (ATMs) on a weekend based on the median value of homes
(X1) in the neighborhood in which the ATM is located and the location of the ATM (X2) (no =
not a shopping center and yes = shopping center). A random sample of 15 ATM locations is
selected. The multiple linear regression model:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$
with normal error terms is expected to be appropriate. Perform a multiple linear regression
analysis.
Median Value Location Amount Median Value Location Amount
of Homes of ATM Withdrawn of Homes of ATM Withdrawn
($000) ($000) ($000) ($000)
X1 X2 Y X1 X2 Y
225 yes 120 225 120
170 no 99 170 99
153 yes 91 153 91
132 no 82 132 82
237 yes 124 237 124
187 yes 104 187 104
245 yes 127 245 127
125 yes 80 125 80
215 yes 115 215 115
170 no 97 170 97
223 no 117 223 117
147 no 86 147 86
197 yes 109 197 109
167 no 94 167 94
210 no 112 210 112
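To fit the model, the yes/no location must first be coded as a 0-1 variable. The sketch below uses the coding yes = 1 (shopping center) and no = 0, which is one common convention and an assumption here, then fits the model by least squares with NumPy. Variable names are illustrative.

```python
import numpy as np

# (median home value in $000, ATM in shopping center?, amount withdrawn in $000)
atm = [
    (225, "yes", 120), (170, "no", 99), (153, "yes", 91), (132, "no", 82),
    (237, "yes", 124), (187, "yes", 104), (245, "yes", 127), (125, "yes", 80),
    (215, "yes", 115), (170, "no", 97), (223, "no", 117), (147, "no", 86),
    (197, "yes", 109), (167, "no", 94), (210, "no", 112),
]

value = np.array([row[0] for row in atm], dtype=float)
# Dummy coding (assumed): 1 = shopping center, 0 = not a shopping center
shopping = np.array([1.0 if row[1] == "yes" else 0.0 for row in atm])
amount = np.array([row[2] for row in atm], dtype=float)

X = np.column_stack([np.ones(len(amount)), value, shopping])
coef, *_ = np.linalg.lstsq(X, amount, rcond=None)
b0, b1, b2 = coef

# Predictions at a median home value of $200,000 (X1 = 200)
in_center = b0 + b1 * 200 + b2 * 1
outside = b0 + b1 * 200 + b2 * 0

print(f"Amount-hat = {b0:.2f} + {b1:.4f} * X1 + {b2:.2f} * X2")
print(f"In shopping center: {in_center:.2f}, outside: {outside:.2f}")
```

With this coding, the difference between the two predictions at any fixed home value equals b2, the coefficient of the dummy variable.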
Regression Printout
Questions
1. Write down the fitted model.
2. Is the assumed model reliable? Why?
3. What is the value of $R^2$? The adjusted $R^2$? To select a model, why do we prefer adjusted $R^2$ to $R^2$?
4. Predict the amount of money withdrawn for a neighborhood in which the median value of
homes is $200,000 for an ATM that is located in a shopping center.
5. If the median value of homes increases by $2,000, then the amount of money withdrawn from
an ATM located in a shopping center is expected to increase by ________.
6. If the median value of homes is $200,000, then the amount of money withdrawn from an
ATM located in a shopping center is ________; and the amount of money withdrawn
from an ATM located outside a shopping center is ________. What is the difference?