QM-II Midterm OCT 2014 Solution
Name ________________________
Total marks:51
Section ________________________
Instructions
1. This is a closed book exam. You are NOT allowed to use text book and class notes.
2. Answer all questions only in the space provided following the question.
3. Show all work and give adequate explanations to get full credit.
4. You may use the backside of the last page for rough work only if needed. Do NOT attach any rough work/sheets.
5. Encircle or underline your final answer for each part.
6. No clarifications will be made during the exam.
7. Assume 95% confidence level if necessary (α = 0.05).
8. Use approximate critical values for Z, t, F, and χ² tests if the exact value is not available in the tables attached with the question paper.
Question Number   Max Marks   Marks Scored
Q1                16
Q2                18
Q3                17
Total
1
-0.5453
Sales
PriceIndex
1.0000
-0.6089
-0.5033
-0.3842
Income
Interest
1.0000
1.0000
1.0000
SPSS was used to carry out Stepwise Regression in order to predict Sales. The summary of the models
fitted in the first 2 steps, the ANOVA table and Coefficients table obtained are given below.
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1
2       .708   .502       .465                760459.004
ANOVA
Model            Sum of Squares   df   Mean Square   F   Sig.
1   Regression
    Residual
    Total
2   Regression   1.571E13
    Residual     1.561E13
    Total        3.132E13
Coefficients
                   Unstandardized Coefficients   Standardized Coefficients
Model              B              Std. Error     Beta     t         Sig.
1   (Constant)     9102897.600    433224.149              21.012    .000
    ________       -17258.553     4248.694       -.609    -4.062    .000
2   (Constant)     1.023E7        576427.867              17.740    .000
    ________       -16873.654     3853.737       -.595    -4.379    .000
    Interest       -124592.212    46820.081      -.362    -2.661    .013
Question 1.1
a) What is the predictor variable used in Model 1? Explain clearly.
(1 point)
PriceIndex.
Since it has the largest correlation (in absolute value) with Sales, it will be the first variable to enter the
regression equation and will give the highest value of R².
b) What proportion of variation in Sales does this predictor variable explain in model 1? Explain
clearly.
(1 point)
c) What is the Std. Error of the Estimate for Model 1? Explain clearly.
(2 points)
R² = 1 - SSE/SST
Therefore, SSE = SST × (1 - R²) = 1.97 × 10^13
Standard Error (se) = sqrt(SSE / (n-k-1)) = 838930.95 (n = 30, k = 1)
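The arithmetic for parts b) and c) can be checked with a short sketch. It assumes that -0.6089 from the correlation table is the Sales-PriceIndex correlation (this agrees with the standardized beta of -.609 reported for Model 1) and uses the simple-regression identity R² = r². The variable names are illustrative only.

```python
import math

# Inputs read from the question (assumption: -0.6089 is corr(Sales, PriceIndex)).
r_sales_price = -0.6089
sst = 3.132e13          # total sum of squares from the ANOVA table
n, k = 30, 1            # observations and predictors in Model 1

r_squared = r_sales_price ** 2            # simple regression: R^2 = r^2
sse = sst * (1 - r_squared)               # residual sum of squares
std_error = math.sqrt(sse / (n - k - 1))  # standard error of the estimate

print(f"R^2 (Model 1)          = {r_squared:.4f}")
print(f"SSE                    = {sse:.3e}")
print(f"Std. error of estimate = {std_error:,.0f}")
```

The small gap between this value and 838930.95 comes from rounding SSE to 1.97 × 10^13 before taking the square root.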
Question 1.2
a) What is the magnitude of the semipartial (or part) correlation for the variable [...] in Model 2? Explain.
(1 point)
b) Carry out an appropriate test, at 95% confidence level, to determine if Model 2 as a whole is valid
(significant). State the null and alternate hypotheses and show all work.
(2 points)
H0: β1 = β2 = 0
H1: At least one βi ≠ 0 (i = 1, 2)
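A minimal sketch of the overall F-test for these hypotheses, using the Model 2 sums of squares from the ANOVA table with n = 30 and k = 2. The critical value 3.35 is the tabulated F(0.05; 2, 27) and is hard-coded here as an assumption rather than computed.

```python
# Model 2 sums of squares from the ANOVA table.
ssr = 1.571e13   # regression sum of squares
sse = 1.561e13   # residual sum of squares
n, k = 30, 2     # observations and predictors

msr = ssr / k                # mean square regression
mse = sse / (n - k - 1)      # mean square residual
f_stat = msr / mse

F_CRIT = 3.35                # tabulated F(0.05; 2, 27), hard-coded assumption

print(f"F = {f_stat:.2f}, critical value = {F_CRIT}")
print("Reject H0: Model 2 is significant" if f_stat > F_CRIT
      else "Fail to reject H0")
```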
c) Given no change in the other significant explanatory variables, can it be concluded from Model 2
that Interest has a higher impact on Sales than the other variable used in the model? Explain
clearly.
(1 point)
No.
Standardized beta for Interest = -0.362, while standardized beta for
PriceIndex = -0.595.
This implies that a one-SD change in Interest has a smaller impact on Sales than a one-SD
change in PriceIndex.
rate by 5%
(3 points)
Question 1.4: What can you say about the relationship between
used in Models 1 and 2? Explain clearly.
The estimate for PriceIndex in Model 1 (-17258.553) becomes less negative in the
presence of Interest in Model 2 (-16873.654).
Therefore, omitting Interest gives the PriceIndex coefficient in Model 1 a negative bias.
Also, Interest has a negative relationship with Sales (Model 2).
Since PriceIndex, when proxying for Interest, was picking up some of this negative
relationship (resulting in the lower value of its estimate), PriceIndex must have a
positive relationship with Interest.
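The sign argument can be made concrete with the omitted-variable identity for least squares: the Model 1 slope equals the Model 2 slope plus the Interest coefficient times delta, where delta is the slope from regressing Interest on PriceIndex in the same sample. The sketch below backs out the implied delta; the variable names are illustrative and the result is only as precise as the rounded coefficients.

```python
# Coefficients reported in the Coefficients table.
b_price_model1 = -17258.553      # PriceIndex slope with Interest omitted
b_price_model2 = -16873.654      # PriceIndex slope with Interest included
b_interest_model2 = -124592.212  # Interest slope in Model 2

# Identity: b_price_model1 = b_price_model2 + b_interest_model2 * delta
bias = b_price_model1 - b_price_model2
delta = bias / b_interest_model2

print(f"Bias in the Model 1 slope = {bias:.3f}")
print(f"Implied slope of Interest on PriceIndex = {delta:.6f}")
```

A negative bias divided by a negative Interest coefficient gives a positive delta, which matches the conclusion that PriceIndex and Interest move together.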
Question 1.5: The partial correlation of the excluded variables after Model 2 was fitted are 0.184 and
0.246 and the corresponding part correlations (semi-partial correlations) are 0.077 and 0.098 respectively.
Conduct an appropriate test, at 95% confidence level, to determine if one of these excluded variables
should be added to the regression model. State the null and alternate hypotheses and show all work.
(3 points)
We need to use the Partial F-test to see if the variable should be included.
Fval = (Change in R² / # of added variables) / ((1 - R²(Full model)) / (n-k-1))
where Change in R² = R²(Full model with 3 variables) - R²(Reduced model with 2
variables), # of added variables = 1, and n-k-1 = 30-3-1 = 26. The change in R² equals the squared part correlation of the added variable.
Let us check the variable with the larger partial correlation (0.246, part correlation 0.098):
Fval = (0.098)² / ((1 - (0.502 + (0.098)²)) / 26)
= 0.51127
Fcrit = 3.84 (used by SPSS) or Fcrit = F0.05,1,26 = 4.23
Since Fval < Fcrit, the excluded variable should not be added to the model.
Since the variable with the larger partial correlation does not qualify, the other variable should
not be added either.
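The partial F computation above can be reproduced for both excluded variables with a short sketch. It relies on the fact that the change in R² from adding one variable equals its squared part correlation; the function name `partial_f` is illustrative only.

```python
def partial_f(part_corr, r2_reduced, n, k_full, added=1):
    """Partial F statistic for adding `added` variables to a regression."""
    delta_r2 = part_corr ** 2            # change in R^2 = squared part correlation
    r2_full = r2_reduced + delta_r2
    return (delta_r2 / added) / ((1 - r2_full) / (n - k_full - 1))

R2_MODEL2 = 0.502
F_CRIT = 4.23    # F(0.05; 1, 26) as quoted in the solution

f_larger = partial_f(0.098, R2_MODEL2, n=30, k_full=3)
f_smaller = partial_f(0.077, R2_MODEL2, n=30, k_full=3)

for label, f_val in (("part corr 0.098", f_larger), ("part corr 0.077", f_smaller)):
    decision = "add" if f_val > F_CRIT else "do not add"
    print(f"{label}: F = {f_val:.5f} -> {decision}")
```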
Question 2:
Coefficients
               Unstandardized Coefficients   Standardized Coefficients
Model          B             Std. Error      Beta     t          Sig.
(Constant)     -12738.581    200.801                  -63.439    .000
Carat          18381.261     141.733
Table 3. Coefficients
               Unstandardized Coefficients   Standardized Coefficients
Model          B        Std. Error           Beta     t          Sig.
(Constant)     7.265    .011                          682.302    .000
Carat          1.375    .008                 .921     183.004    .000
Ŷ_i ± t_{α/2, n-2} × S_e × sqrt( 1/n + (X_i - X̄)² / Σ(X_i - X̄)² )
Table 4. Coefficients
               Unstandardized Coefficients   Standardized Coefficients            Collinearity Statistics
Model          B        Std. Error           Beta     t          Sig.    Tolerance   VIF
(Constant)     7.346    .012                          607.930    .000
Carat          1.392    .009                 .932     158.992    .000    .705        1.418
Good           -.211    .016                 -.096    -13.549    .000    .484        2.064
V Good         -.134    .013                 -.093    -10.143    .000    .290        3.454
Carat_I        -.044    .009                 -.046    -4.758     .000    .257        3.892
                                      Partial        Collinearity Statistics
            Beta In   t        Sig.   Correlation    Tolerance   VIF       Minimum Tolerance
Ideal       .015      .878     .380   .011           .085        11.697    .085
Carat_G     -.023     -1.263   .207   -.016          .074        13.505    .074
Carat_VG    -.023     -1.263   .207   -.016          .074        13.505    .074
Here we are looking for carat. So the second term (marked blue) is fixed. The third term, i.e. the
variable Good (marked red), has the maximum impact on the log of price.
The same can be concluded from its standardized beta value, -0.096, which is lower than those of
the others.
Question 3:
A large grocery store in the US wishes to understand the key drivers that determine the amount
spent per transaction by their customers. Therefore, it obtained a random sample of 4000
transactions with information on the amount spent (Revenue), the product category on which the
transaction was made (Product Family), the annual income of the customer (Annual income), the
number of children in the household the customer belongs to, and finally whether the customer
is a homeowner.
In order to enable regression analysis, the following indicator (dummy) variables were created:
Own_Hm = 1 (Yes to Homeowner), 0 otherwise,
Ann_Inc2 = 1 (Annual Income in the range $30K - $50K), 0 otherwise
Ann_Inc3 = 1 (Annual Income in the range $50K - $70K), 0 otherwise
Ann_Inc4 = 1 (Annual Income in the range $70K - $90K), 0 otherwise
Ann_Inc5 = 1 (Annual Income in the range $90K and above), 0 otherwise
Prod_Fam2 = 1 (Product Family is Drink), 0 otherwise
Prod_Fam3 = 1 (Product Family is Non-Consumable), 0 otherwise.
Regression Output 1
Multiple R           0.0340
R Square             0.0012
Adjusted R Square    0.0002
Standard Error       8.1499
Observations         4000

ANOVA
             df      SS             MS         F         Significance F
Regression   4       306.4494       76.6123    1.1535    0.3294
Residual     3995    265348.3672    66.4201
Total        3999    265654.8166

             Coefficients   Standard Error   t Stat     P-value
Intercept    12.6841        0.2786           45.5352    0.0000
Ann_Inc2     0.2787         0.3588           0.7766     0.4374
Ann_Inc3     0.7617         0.4106           1.8551     0.0637
Ann_Inc4     0.4524         0.4712           0.9602     0.3370
Ann_Inc5     -0.0130        0.4229           -0.0307    0.9755
Regression Output 2
Multiple R           0.059
R Square             0.004
Adjusted R Square    0.003
Standard Error       8.137
Observations         4000

ANOVA
             df      SS            MS         F         Significance F
Regression   1       933.834       933.834    14.103    0.000
Residual     3998    264720.983    66.213
Total        3999    265654.817

             Coefficients   Standard Error   t Stat    P-value
Intercept    12.136         0.255            47.564    0.000
Children     0.326          0.087            3.755     0.000
Regression Output 3
Multiple R           0.046
R Square             0.002
Adjusted R Square    0.002
Standard Error       8.144
Observations         4000

ANOVA
             df      SS            MS         F        Significance F
Regression   2       550.538       275.269    4.150    0.016
Residual     3997    265104.278    66.326
Total        3999    265654.817

             Coefficients   Standard Error   t Stat    P-value
Intercept    13.192         0.152            86.958    0.000
Prod_Fam2    -0.975         0.458            -2.131    0.033
Prod_Fam3    -0.743         0.332            -2.240    0.025
Regression Output 4
Multiple R           0.075
R Square             0.006
Adjusted R Square    0.005
Standard Error       8.131
Observations         4000

ANOVA
             df      SS            MS         F        Significance F
Regression   3       1496.798      498.933    7.548    0.000
Residual     3996    264158.018    66.106
Total        3999    265654.817

             Coefficients   Standard Error   t Stat    P-value
Intercept    12.361         0.267            46.345    0.000
Children     0.329          0.087            3.783     0.000
Prod_Fam2    -0.992         0.457            -2.171    0.030
Prod_Fam3    -0.747         0.331            -2.256    0.024
Regression Output 5
Multiple R           0.080
R Square             0.006
Adjusted R Square    0.006
Standard Error       8.127
Observations         4000

ANOVA
             df      SS            MS         F        Sig F
Regression   3       1708.947      569.649    8.624    0.000
Residual     3996    263945.870    66.053
Total        3999    265654.817

             Coefficients   Standard Error   t Stat    P-value
Intercept    12.214         0.258            47.399    0.000
Children     0.393          0.090            4.379     0.000
Prod_Fam2    -1.010         0.455            -2.218    0.027
Child_Fam3   -0.322         0.112            -2.882    0.004
Use the information given above to answer the following questions. For each question give adequate
explanation and support your answer with given information precisely.
a) Rank the income groups based on average revenue obtained per transaction in the sample
data from largest to smallest. Provide precise reasons as to how you obtained this
ranking. Is this ranking valid for the population? What is the average revenue per
transaction obtained for the income group ($10K-$30K)? (2 points)
Consider the prediction model obtained from Regression Output 1. It is:
Ŷ = 12.6841 + 0.2787 Ann_Inc2 + 0.7617 Ann_Inc3 + 0.4524 Ann_Inc4 - 0.0130 Ann_Inc5
It is clear from the problem description that Ann_Inc1 (income group ($10K-$30K)) is treated as the base. Therefore, the coefficient of each variable Ann_Inc_i,
where i > 1, describes the difference between the average revenue obtained for
customers belonging to income group Ann_Inc_i and those belonging to Ann_Inc1 in
the sample dataset. Thus, by simply comparing the signed values of the coefficients
(with 0 for the base group), we obtain the rank to be:
1) Ann_Inc3 ($50K-$70K), 2) Ann_Inc4 ($70K-$90K), 3) Ann_Inc2 ($30K-$50K), 4)
Ann_Inc1 ($10K-$30K), 5) Ann_Inc5 ($90K and above).
This ranking is, however, not valid for the entire population because, at a
significance level of 0.05, the model as a whole obtained from Regression Output 1 is
not significant (Significance F = 0.3294). That is, the average revenue obtained from the
other income groups is not significantly different from that obtained from the income
group ($10K-$30K), so in the population all income groups share the same rank as
Ann_Inc1.
Since the base income group is Ann_Inc1, the y-intercept b0 provides the average
revenue obtained for the income group ($10K-$30K), which is $12.6841.
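The sample ranking can be reproduced from the Regression Output 1 coefficients with a short sketch. The base group gets a coefficient of 0, and each group's sample mean is the intercept plus its coefficient; the dictionary keys are display labels chosen for illustration.

```python
INTERCEPT = 12.6841   # mean revenue for the base group Ann_Inc1

# Dummy coefficients from Regression Output 1; the base group is 0 by construction.
coefficients = {
    "Ann_Inc1 ($10K-$30K)": 0.0,
    "Ann_Inc2 ($30K-$50K)": 0.2787,
    "Ann_Inc3 ($50K-$70K)": 0.7617,
    "Ann_Inc4 ($70K-$90K)": 0.4524,
    "Ann_Inc5 ($90K and above)": -0.0130,
}

group_means = {group: INTERCEPT + b for group, b in coefficients.items()}
ranking = sorted(group_means, key=group_means.get, reverse=True)

for rank, group in enumerate(ranking, start=1):
    print(f"{rank}) {group}: {group_means[group]:.4f}")
```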
b) The grocery store wishes to estimate the average amount spent per transaction on non-consumables. Provide the most accurate estimate possible. Provide details on how you
obtained this estimate. (2 points)
The most appropriate prediction equation to use in this question is the
one listed in Regression Output 3. It is,
Ŷ = 13.192 - 0.975 Prod_Fam2 - 0.743 Prod_Fam3
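As a sketch of how the estimate follows from Regression Output 3, the snippet below plugs the dummy values for each product family into the fitted equation. Food is taken to be the base category (both dummies equal to 0), which is an assumption inferred from the dummy definitions; the names are illustrative.

```python
# Coefficients from Regression Output 3.
INTERCEPT = 13.192
B_PROD_FAM2 = -0.975   # Drink
B_PROD_FAM3 = -0.743   # Non-Consumable

def average_spend(prod_fam2, prod_fam3):
    """Predicted average amount spent per transaction for given dummy values."""
    return INTERCEPT + B_PROD_FAM2 * prod_fam2 + B_PROD_FAM3 * prod_fam3

estimates = {
    "Food": average_spend(0, 0),            # assumed base category
    "Drink": average_spend(1, 0),
    "Non-Consumable": average_spend(0, 1),
}

for family, value in estimates.items():
    print(f"{family}: ${value:.3f}")
```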
d) Is there a significant difference in the average amount spent per transaction between that
on food and non-consumables? Why or why not? Provide precise reasons. (2
points)
The difference in the average amount spent per transaction between food and non-consumables is measured by the coefficient of the variable Prod_Fam3. Therefore, in Regression
Output 3, this question is answered by testing the hypotheses:
H0: β_Prod_Fam3 = 0
HA: β_Prod_Fam3 ≠ 0
This test is conducted using the t test, for which t = -2.240. The associated p-value
= 0.025 < 0.05. Consequently, we reject the null hypothesis and conclude that the
difference is significant.
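The p-value for this t statistic can be approximated with the standard library alone. With 3997 residual degrees of freedom the t distribution is practically indistinguishable from the standard normal, so the sketch below uses the normal approximation (an assumption, not the exact t computation used by the software).

```python
import math

t_stat = -2.240   # t Stat for Prod_Fam3 in Regression Output 3
ALPHA = 0.05

# Two-sided p-value under the normal approximation: 2 * (1 - Phi(|t|)).
p_value = math.erfc(abs(t_stat) / math.sqrt(2))

print(f"two-sided p-value = {p_value:.4f}")
print("Reject H0" if p_value < ALPHA else "Fail to reject H0")
```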
e) The grocery store wishes to target those customers, as well as items on which the amount
spent is maximum. Assuming that no customer has more than five children, identify the
appropriate customer segment as well as the appropriate product family. Provide precise
reasons behind your answer. (2 points)
Prediction equations obtained from Regression Output 4 and Regression Output 5
are the most appropriate models to consider as only these two models incorporate
the effect of both product family and number of children. The prediction equation
obtained from Regression Output 5 is slightly superior to that obtained from
Regression Output 4 since it has a slightly lower standard error and slightly higher
adjusted R². Both models as a whole are significant, as is each variable in the
presence of the others. The appropriate prediction equation from Regression Output 5
is:
Ŷ = 12.214 + 0.393 Children - 1.010 Prod_Fam2 - 0.322 Child_Fam3
From the above model it is clear that with Children = 0, Product Family food and
non-consumables are superior to drinks, as the coefficient of Prod_Fam2 is -1.010.
As the number of children increases, for food the average revenue increases at a faster
rate than for non-consumables. For food, the rate of increase is 0.393. For non-consumables it is (0.393 - 0.322) = 0.071.
Therefore, the grocery store should target those with 5 children purchasing food
items. The same conclusion is also obtained from the prediction equation in
Regression Output 4.
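The conclusion can be checked by evaluating the Regression Output 5 equation over every combination of product family and number of children from 0 to 5. The sketch assumes that Child_Fam3 is the interaction Children × Prod_Fam3, which the solution's slope calculation implies but the source does not define explicitly.

```python
# Coefficients from Regression Output 5.
INTERCEPT = 12.214
B_CHILDREN = 0.393
B_PROD_FAM2 = -1.010    # Drink
B_CHILD_FAM3 = -0.322   # assumed to multiply Children * Prod_Fam3

def predicted_revenue(children, family):
    """Predicted revenue per transaction for a product family and child count."""
    drink = 1 if family == "Drink" else 0
    non_consumable = 1 if family == "Non-Consumable" else 0
    return (INTERCEPT
            + B_CHILDREN * children
            + B_PROD_FAM2 * drink
            + B_CHILD_FAM3 * children * non_consumable)

grid = {
    (family, children): predicted_revenue(children, family)
    for family in ("Food", "Drink", "Non-Consumable")
    for children in range(6)   # 0 through 5 children
}
best = max(grid, key=grid.get)

print(f"Best segment: {best[0]}, {best[1]} children -> {grid[best]:.3f}")
```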
f) What is the chance that a customer with 3 children will spend more than $10.00 on food
items per transaction? Provide details on your calculations. (3 points)
Here, the option is to use either the prediction equation obtained from Regression
Output 4 or Regression Output 5. However, the prediction equation obtained from
Regression Output 5 is marginally better than that from Regression Output 4, since
it has a lower standard error and a slightly better R² value.
Therefore, the appropriate prediction equation to use (for food) is:
Ŷ = 12.214 + 0.393 Children
The average revenue obtained from a customer with 3 children spending on food is
12.214 + 3 × 0.393 = 13.393. The estimated standard deviation around this mean is
8.127. Therefore the z value associated with $10.00 is (10 - 13.393)/8.127 = -0.4175, and
P(Z > -0.4175) = 0.6618.
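The probability can be reproduced with Python's `statistics.NormalDist` (available from Python 3.8). As in the solution, this sketch treats spending as normally distributed around the predicted mean with the regression standard error as its standard deviation, which is a modelling assumption.

```python
from statistics import NormalDist

# Regression Output 5, food items (both product dummies are 0).
INTERCEPT = 12.214
B_CHILDREN = 0.393
STD_ERROR = 8.127

children = 3
threshold = 10.00

mean_spend = INTERCEPT + B_CHILDREN * children
z_value = (threshold - mean_spend) / STD_ERROR
prob_above = 1 - NormalDist(mu=mean_spend, sigma=STD_ERROR).cdf(threshold)

print(f"mean = {mean_spend:.3f}, z = {z_value:.4f}, P(spend > $10) = {prob_above:.4f}")
```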
g) Does the number of children affect food purchases more than non-consumables? Why or
why not? State your reasons precisely. (2 points)
For food, the prediction equation obtained from Regression Output 5 is
Ŷ = 12.214 + 0.393 Children.
For non-consumables, the prediction equation becomes
Ŷ = 12.214 + 0.071 Children.
Since 0.393 > 0.071, the number of children affects food purchases more than non-consumables.
h) Suppose the grocery store has reason to believe that, in addition to the independent variables
considered in Regression Output 4, homeowners spend significantly more on non-consumables than non-homeowners on any product category. If so, how will you modify
the model provided in Regression Output 4? Provide the model in terms. If you are
adding new variables to the model, provide details on what you expect the value to be.
Positive? Negative? (2 points)
We first create an interaction variable OwnH_PFam3 = Own_Hm × Prod_Fam3. Clearly,
OwnH_PFam3 = 1 only if Own_Hm = 1 and Prod_Fam3 = 1. Thus, the modified model
would be:
Y = β0 + β1 Children + β2 Prod_Fam2 + β3 Child_Fam3 + β4 OwnH_PFam3 + ε.
We would expect β4 > 0.
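A minimal sketch of how the interaction variable could be constructed before refitting the model. The four rows are hypothetical values made up for illustration and are not taken from the exam data; the helper name `add_interaction` is also illustrative.

```python
# Hypothetical transactions covering every combination of the two dummies.
transactions = [
    {"Own_Hm": 1, "Prod_Fam3": 1},
    {"Own_Hm": 1, "Prod_Fam3": 0},
    {"Own_Hm": 0, "Prod_Fam3": 1},
    {"Own_Hm": 0, "Prod_Fam3": 0},
]

def add_interaction(row):
    """Return a copy of the row with OwnH_PFam3 = Own_Hm * Prod_Fam3."""
    return {**row, "OwnH_PFam3": row["Own_Hm"] * row["Prod_Fam3"]}

augmented = [add_interaction(row) for row in transactions]

for row in augmented:
    print(row)
```

The product is 1 only for homeowners buying non-consumables, so the coefficient on OwnH_PFam3 isolates the extra spending of that segment.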