
Multiple Regression With Two Independent Variables: 1. Data Collection

The document describes a multiple regression analysis of the relationship between the auction price of antique grandfather clocks (Y), the age of the clock (X1), and the number of bidders (X2). Data on 32 clocks are listed and plotted. A regression model is fit to the data and model diagnostics are examined, with residual plots used to check the normality and constant variance assumptions against X1 and X2.

Uploaded by

Bharani Dharan
BA 275 Modeling Relationships

Winter 2007 Multiple Regression Analysis

Multiple Regression with Two Independent Variables

A collector of antique grandfather clocks believes that the price received for the clocks at an
antique auction increases with the age of the clocks and with the number of bidders. Thus, the
model hypothesized is

$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$

where Y = Auction price, X1 = Age of clock (years), and X2 = Number of bidders.

1. Data Collection

Age Bidder Price
X1 X2 Y
127 13 1235
115 12 1080
127 7 846
150 9 1522
156 6 1047
182 11 1979
156 12 1822
132 10 1253
137 9 1297
113 9 946
137 15 1713
117 11 1024
137 8 1147
153 6 1092
117 13 1152
126 10 1336
170 14 2131
182 8 1550
162 11 1884
184 10 2041
143 6 854
159 9 1483
108 14 1055
175 8 1545
108 6 729
179 9 1792
111 15 1175
187 8 1593
111 7 785
115 7 744
194 5 1356
168 7 1262

Hsieh, P-H 1

2. Initial Analysis

[Figure: two scatterplots. Left: Plot of Price vs Age (Price from 700 to 2200, Age from 100 to 200). Right: Plot of Price vs Bidder (Price from 700 to 2200, Bidder from 5 to 15).]


3. Fitting the Model and 4. Assessing the Model
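The regression printout for this step is not reproduced here. As a rough stand-in, the sketch below fits the hypothesized model by ordinary least squares with NumPy, using the data transcribed from the table in Section 1. The variable names (`clocks`, `a`, `b1`, `b2`, `s_e`) are illustrative choices, not STATGRAPHICS output labels, and the numbers it prints should be checked against the original printout.

```python
import numpy as np

# (age, bidders, price) rows transcribed from the Section 1 table.
clocks = np.array([
    (127, 13, 1235), (115, 12, 1080), (127, 7, 846), (150, 9, 1522),
    (156, 6, 1047), (182, 11, 1979), (156, 12, 1822), (132, 10, 1253),
    (137, 9, 1297), (113, 9, 946), (137, 15, 1713), (117, 11, 1024),
    (137, 8, 1147), (153, 6, 1092), (117, 13, 1152), (126, 10, 1336),
    (170, 14, 2131), (182, 8, 1550), (162, 11, 1884), (184, 10, 2041),
    (143, 6, 854), (159, 9, 1483), (108, 14, 1055), (175, 8, 1545),
    (108, 6, 729), (179, 9, 1792), (111, 15, 1175), (187, 8, 1593),
    (111, 7, 785), (115, 7, 744), (194, 5, 1356), (168, 7, 1262),
], dtype=float)

age, bidders, price = clocks.T
n = len(price)   # number of clocks
k = 2            # number of X variables

# Design matrix with an intercept column, then least-squares estimates.
X = np.column_stack([np.ones(n), age, bidders])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
a, b1, b2 = coef

# Fit summaries: residual sum of squares, R-sq, adjusted R-sq, standard error.
fitted = X @ coef
sse = np.sum((price - fitted) ** 2)
sst = np.sum((price - price.mean()) ** 2)
r2 = 1 - sse / sst
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
s_e = np.sqrt(sse / (n - k - 1))

print(f"Price = {a:.2f} + {b1:.2f} * Age + {b2:.2f} * Bidder")
print(f"SSResidual = {sse:.0f}, sE = {s_e:.1f}, R-sq = {r2:.3f}, R-sq(adj) = {adj_r2:.3f}")
```

The same quantities fill in the "Current" row of the table in Section 6.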

5. Model Diagnostics : Checking the Conditions

[Figure: four diagnostic plots. Normal Probability Plot for SRESIDUALS (percentage vs SRESIDUALS); Residual Plot (studentized residual vs row number); Histogram for SRESIDUALS (frequency vs SRESIDUALS); Residual Plot (studentized residual vs predicted Price).]


[Figure: two residual plots. Left: studentized residual vs Bidder (5 to 15). Right: studentized residual vs Age (100 to 200).]

• Is there any violation of the required conditions (normality, independence, constant variance,
and zero mean)?
• When the number of bidders is around 10, does the current model tend to overestimate or
underestimate Price? What about when the number of bidders is around 5 or 15?
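One way to explore these questions numerically is to compute the studentized residuals and average them within groups of bidder counts. The sketch below assumes internally studentized residuals (residual divided by its estimated standard deviation), which may differ slightly from what STATGRAPHICS plots, and the three bidder groups are an arbitrary illustrative split. A positive group average means the model tends to underestimate Price in that group; a negative one means it tends to overestimate.

```python
import numpy as np

# (age, bidders, price) rows transcribed from the Section 1 table.
clocks = np.array([
    (127, 13, 1235), (115, 12, 1080), (127, 7, 846), (150, 9, 1522),
    (156, 6, 1047), (182, 11, 1979), (156, 12, 1822), (132, 10, 1253),
    (137, 9, 1297), (113, 9, 946), (137, 15, 1713), (117, 11, 1024),
    (137, 8, 1147), (153, 6, 1092), (117, 13, 1152), (126, 10, 1336),
    (170, 14, 2131), (182, 8, 1550), (162, 11, 1884), (184, 10, 2041),
    (143, 6, 854), (159, 9, 1483), (108, 14, 1055), (175, 8, 1545),
    (108, 6, 729), (179, 9, 1792), (111, 15, 1175), (187, 8, 1593),
    (111, 7, 785), (115, 7, 744), (194, 5, 1356), (168, 7, 1262),
], dtype=float)

age, bidders, price = clocks.T
n = len(price)
X = np.column_stack([np.ones(n), age, bidders])
p = X.shape[1]  # number of estimated coefficients (intercept included)

# Ordinary residuals: observed minus fitted.
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
resid = price - X @ coef

# Leverages are the diagonal of the hat matrix H = X (X'X)^-1 X'.
leverage = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)

# Internally studentized residuals (an assumption about the definition).
s = np.sqrt(resid @ resid / (n - p))
sresid = resid / (s * np.sqrt(1 - leverage))

# Illustrative grouping of bidder counts around 5, 10, and 15.
groups = {
    "5-7 bidders": bidders <= 7,
    "9-11 bidders": (bidders >= 9) & (bidders <= 11),
    "13-15 bidders": bidders >= 13,
}
for label, mask in groups.items():
    print(f"{label}: mean studentized residual = {sresid[mask].mean():.2f}")
```

A curved pattern in these group averages would suggest that a term is missing from the model, for example an interaction between Age and Bidder.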

6. Model Selection

• After several models were tried, only a few passed all the tests and satisfied the required
conditions. The following table summarizes the STATGRAPHICS PLUS output for each
model. Which of the competing models should be chosen as our final model, and why?
(Assume that our current model passed all the tests and satisfied the required conditions.)

Candidate   SSResidual   sE      R-sq    R-sq(adj)   # of X’s
Current                                              2
A           492317       132.6   0.912   0.854       3
B           489735       134.7   0.920   0.832       4
C           354900       130.0   0.960   0.893       10
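The sE column can be recovered from SSResidual and the number of X's, since sE is the square root of SSResidual divided by (n - k - 1). The short check below assumes n = 32, the number of clocks listed in Section 1; the dictionary name `candidates` is only illustrative.

```python
import math

n = 32  # assumed sample size: the number of clocks in the Section 1 table

# Candidate model -> (SSResidual, number of X's), copied from the table.
candidates = {"A": (492317, 3), "B": (489735, 4), "C": (354900, 10)}

# Standard error of estimate: sqrt(SSResidual / (n - k - 1)).
s_e = {name: math.sqrt(sse / (n - k - 1)) for name, (sse, k) in candidates.items()}

for name, value in s_e.items():
    print(f"Model {name}: sE = {value:.1f}")
```

Because the divisor shrinks as more X's are added, a model with a smaller SSResidual does not automatically have a smaller sE, which is one reason to weigh the number of predictors when comparing candidates.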

7. Using the (Final) Model

• What is the total variation of auction prices? How much has been explained by the model?
• If there are 10 bidders and the clock is 100 years old, what is the expected auction
price?
• If Age is held fixed and the number of bidders increases from 10 to 11, how much does Price
increase?
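The three questions above can be worked through with a small script. The sketch below refits the two-variable model from the Section 1 data, splits the total variation into explained and unexplained parts, and evaluates the fitted equation; the helper `predict` is a hypothetical name introduced for illustration. Note that an age of 100 years lies slightly below the youngest clock in the data (108 years), so that prediction is a mild extrapolation.

```python
import numpy as np

# (age, bidders, price) rows transcribed from the Section 1 table.
clocks = np.array([
    (127, 13, 1235), (115, 12, 1080), (127, 7, 846), (150, 9, 1522),
    (156, 6, 1047), (182, 11, 1979), (156, 12, 1822), (132, 10, 1253),
    (137, 9, 1297), (113, 9, 946), (137, 15, 1713), (117, 11, 1024),
    (137, 8, 1147), (153, 6, 1092), (117, 13, 1152), (126, 10, 1336),
    (170, 14, 2131), (182, 8, 1550), (162, 11, 1884), (184, 10, 2041),
    (143, 6, 854), (159, 9, 1483), (108, 14, 1055), (175, 8, 1545),
    (108, 6, 729), (179, 9, 1792), (111, 15, 1175), (187, 8, 1593),
    (111, 7, 785), (115, 7, 744), (194, 5, 1356), (168, 7, 1262),
], dtype=float)

age, bidders, price = clocks.T
X = np.column_stack([np.ones(len(price)), age, bidders])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
a, b1, b2 = coef

# Total variation (SST) = explained (SSR) + unexplained (SSE).
sst = np.sum((price - price.mean()) ** 2)
sse = np.sum((price - X @ coef) ** 2)
ssr = sst - sse
print(f"SST = {sst:.0f}, explained = {ssr:.0f} ({ssr / sst:.1%})")

def predict(age_years, n_bidders):
    """Expected price from the fitted equation (hypothetical helper)."""
    return a + b1 * age_years + b2 * n_bidders

p10 = predict(100, 10)
p11 = predict(100, 11)
print(f"Expected price, Age = 100, 10 bidders: {p10:.0f}")
print(f"Increase from 10 to 11 bidders at fixed Age: {p11 - p10:.2f}")
```

With Age held fixed, the increase from one extra bidder equals the Bidder coefficient `b2`, whatever the age.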


Multiple Regression with One Dummy Variable


Sometimes a predictor/independent variable (X) can take only two possible values, e.g. gender
(male or female). Such qualitative variables are handled in a multiple regression analysis by
coding them as 0-1 variables, which are also referred to as “dummy” variables.

A bank would like to develop a model to predict the total sum of money that customers withdraw
(Y) from Automatic Teller Machines (ATMs) on a weekend based on the median value of homes
(X1) in the neighborhood in which the ATM is located and the location of the ATM (X2) (no =
not a shopping center and yes = shopping center). A random sample of 15 ATM locations is
selected. The multiple linear regression model:

$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$

with normal error terms is expected to be appropriate. Perform a multiple linear regression
analysis.

Median Value   Location   Amount        Median Value   Location   Amount
of Homes       of ATM     Withdrawn     of Homes       of ATM     Withdrawn
($000)                    ($000)        ($000)                    ($000)
X1             X2         Y             X1             X2         Y
225            yes        120           225                       120
170            no         99            170                       99
153            yes        91            153                       91
132            no         82            132                       82
237            yes        124           237                       124
187            yes        104           187                       104
245            yes        127           245                       127
125            yes        80            125                       80
215            yes        115           215                       115
170            no         97            170                       97
223            no         117           223                       117
147            no         86            147                       86
197            yes        109           197                       109
167            no         94            167                       94
210            no         112           210                       112
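Before the model can be fit, the yes/no location column has to be converted to a 0-1 variable. The sketch below assumes the coding yes = 1 (shopping center) and no = 0, which is a common convention but is not stated in the handout; reversing the coding flips the sign of the location coefficient without changing the fit. The array names are illustrative.

```python
import numpy as np

# Data transcribed from the table above (home values and amounts in $000).
home_value = np.array([225, 170, 153, 132, 237, 187, 245, 125,
                       215, 170, 223, 147, 197, 167, 210], dtype=float)
location = ["yes", "no", "yes", "no", "yes", "yes", "yes", "yes",
            "yes", "no", "no", "no", "yes", "no", "no"]
withdrawn = np.array([120, 99, 91, 82, 124, 104, 127, 80,
                      115, 97, 117, 86, 109, 94, 112], dtype=float)

# Dummy coding (assumed): 1 = shopping center, 0 = not a shopping center.
shopping = np.array([1.0 if loc == "yes" else 0.0 for loc in location])

# Least-squares fit of Y on X1 and the dummy X2, with an intercept.
X = np.column_stack([np.ones(len(withdrawn)), home_value, shopping])
coef, *_ = np.linalg.lstsq(X, withdrawn, rcond=None)
a, b1, b2 = coef

sse = np.sum((withdrawn - X @ coef) ** 2)
sst = np.sum((withdrawn - withdrawn.mean()) ** 2)
r2 = 1 - sse / sst

print(f"Y = {a:.2f} + {b1:.3f} * X1 + {b2:.2f} * X2, R-sq = {r2:.3f}")
```

The dummy variable shifts the intercept for shopping-center ATMs while leaving the slope on median home value unchanged.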


Regression Printout

Questions

1. Write down the fitted model.

2. Is the assumed model reliable? Why?

3. What is the value of R²? The adjusted R²? To select a model, why do we prefer adjusted R² to R²?

4. Predict the amount of money withdrawn for a neighborhood in which the median value of
homes is $200,000 for an ATM that is located in a shopping center.

5. If the median value of homes increases by $2,000, then the amount of money withdrawn from
an ATM located in a shopping center is expected to increase by ______.

6. If the median value of homes is $200,000, then the amount of money withdrawn from an
ATM located in a shopping center is ______; and the amount of money withdrawn
from an ATM located outside a shopping center is ______. What is the difference?
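Questions 4 to 6 all follow from how a 0-1 variable enters the fitted equation. The derivation below is a sketch in which a, b_1, and b_2 denote the estimates of α, β_1, and β_2 from the printout (this notation is introduced here for illustration), and X_2 = 1 is assumed to mean a shopping-center location. X_1 is measured in $000, so a median home value of $200,000 corresponds to X_1 = 200 and an increase of $2,000 to a change of 2 units.

```latex
\begin{align*}
\hat{Y} &= a + b_1 X_1 + b_2 X_2 \\
\text{shopping center } (X_2 = 1):\quad \hat{Y} &= (a + b_2) + b_1 X_1 \\
\text{not a shopping center } (X_2 = 0):\quad \hat{Y} &= a + b_1 X_1 \\
\text{difference at the same } X_1:\quad
  \hat{Y}\big|_{X_2 = 1} - \hat{Y}\big|_{X_2 = 0} &= b_2 \\
\text{effect of raising } X_1 \text{ by 2 units } (\$2{,}000):\quad
  \Delta \hat{Y} &= 2\, b_1
\end{align*}
```

The two locations therefore share the same slope and differ only in intercept, so the difference asked for in question 6 does not depend on the home value chosen.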
