Practice Questions - Multiple Linear Regression
Regression 165
CHAPTER 17
MULTIPLE REGRESSION
SECTIONS 1 - 3
1. In a multiple regression analysis, if the model provides a poor fit, this indicates that:
a. the sum of squares for error will be large
b. the standard error of estimate will be large
c. the multiple coefficient of determination will be close to zero
d. All of the above
ANSWER: d
2. In a multiple regression analysis, when there is no linear relationship between any of the
independent variables and the dependent variable, then:
a. multiple t-tests of the individual coefficients will likely show some are significant
b. we will conclude erroneously that the model has some validity
c. the chance of erroneously concluding that the model is useful is substantially less
with the F-test than with multiple t-tests
d. All of the above statements are correct
ANSWER: d
3. In testing the validity of a multiple regression model, a large value of the F-test statistic
indicates that:
a. most of the variation in the independent variables is explained by the variation in y
b. most of the variation in y is explained by the regression equation
c. most of the variation in y is unexplained by the regression equation
d. the model provides a poor fit
ANSWER: b
5. In a multiple regression analysis involving 25 data points, the standard error of estimate
squared is calculated as s² = 1.8 and the sum of squares for error as SSE = 36. Then, the
number of independent variables in the model must be:
ANSWER: 4 (since s² = SSE/(n − k − 1), 1.8 = 36/(24 − k), and so k = 4)
6. When the independent variables are correlated with one another in a multiple regression
analysis, this condition is called:
a. heteroscedasticity
b. homoscedasticity
c. multicollinearity
d. elasticity
ANSWER: c
7. In a multiple regression model, the mean of the probability distribution of the error
variable is assumed to be:
a. 1.0
b. 0.0
c. Any value greater than 1
d. k, where k is the number of independent variables included in the model
ANSWER: b
9. In a multiple regression model, the standard deviation of the error variable is assumed
to be:
a. constant for all values of the independent variables
b. constant for all values of the dependent variable
c. 1.0
d. not enough information is given to answer this question
ANSWER: a
11. In a multiple regression analysis involving 6 independent variables, the sums of squares
are calculated as: total variation in Y = SSY = 900, SSR = 600, and SSE = 300. Then, the
value of the F-test statistic for this model is:
a. 150
b. 100
c. 50
d. None of the above
ANSWER: d
12. In order to test the validity of a multiple regression model involving 5 independent
variables and 30 observations, the numerator and denominator degrees of freedom for the
critical value of F are, respectively,
a. 5 and 30
b. 6 and 29
c. 5 and 24
d. 6 and 25
ANSWER: c
13. In multiple regression models, the values of the error variable are assumed to be:
a. autocorrelated
b. dependent of each other
c. independent of each other
d. always positive
ANSWER: c
14. A multiple regression model involves 5 independent variables and a sample of 10 data
points. If we want to test the validity of the model at the 5% significance level, the
critical value is:
a. 6.26
b. 3.33
c. 9.36
d. 4.24
ANSWER: a
16. In a multiple regression analysis involving k independent variables and n data points, the
number of degrees of freedom associated with the sum of squares for error is:
a. k-1
b. n-k
c. n-1
d. n-k-1
ANSWER: d
20. To test the validity of a multiple regression model involving two independent variables,
the null hypothesis is that:
a. β0 = β1 = β2
b. β1 = β2 = 0
c. β1 = β2
d. β1 ≠ β2
ANSWER: b
22. Which of the following is not true when we add an independent variable to a multiple
regression model?
a. Adjusted coefficient of determination can assume a negative value
b. Unadjusted coefficient of determination always increases
c. Unadjusted coefficient of determination may increase or decrease
d. Adjusted coefficient of determination may increase
ANSWER: c
24. A multiple regression analysis involving three independent variables and 25 data points
results in a value of 0.769 for the unadjusted multiple coefficient of determination. Then,
the adjusted multiple coefficient of determination is:
a. 0.385
b. 0.877
c. 0.591
d. 0.736
ANSWER: d
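The arithmetic behind question 24 can be verified with a short sketch (Python used here purely for illustration; the variable names are mine, not from the text):

```python
# Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1)
r2, n, k = 0.769, 25, 3
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(adj_r2, 3))  # 0.736, matching answer d
```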
26. For a multiple regression model, the following statistics are given: Total variation in Y =
SSY = 500, SSE = 80, and n = 25. Then, the coefficient of determination is:
a. 0.84
b. 0.16
c. 0.3125
d. 0.05
ANSWER: a
27. For a multiple regression model the following statistics are given: Total variation in Y =
SSY = 250, SSE = 50, k = 4, and n = 20. Then, the coefficient of determination adjusted
for the degrees of freedom is:
a. 0.800
b. 0.747
c. 0.840
d. 0.775
ANSWER: b
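Question 27 uses the same adjustment expressed through the sums of squares; a quick check (illustrative sketch, not part of the original question set):

```python
# Adjusted R^2 = 1 - (SSE/(n - k - 1)) / (SST/(n - 1))
sst, sse, n, k = 250, 50, 20, 4
adj_r2 = 1 - (sse / (n - k - 1)) / (sst / (n - 1))
print(round(adj_r2, 3))  # 0.747, matching answer b
```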
28. A multiple regression model has the form ŷ = 5.25 + 2x1 + 6x2. As x2 increases by one
unit, holding x1 constant, then the value of y will increase by:
a. 2 units
b. 7.25 units
c. 6 units on average
d. None of the above
ANSWER: c
29. The graphical depiction of the equation of a multiple regression model with k
independent variables (k > 1) is referred to as:
a. a straight line
b. response variable
c. response surface
d. a plane only when k = 3
ANSWER: c
31. If all the points for a multiple regression model with two independent variables were on
the regression plane, then the multiple coefficient of determination would equal:
a. 0
b. 1
c. 2, since there are two independent variables
d. any number between 0 and 2
ANSWER: b
32. If none of the data points for a multiple regression model with two independent variables
were on the regression plane, then the multiple coefficient of determination would be:
a. –1.0
b. 1.0
c. any number between –1 and 1, inclusive
d. any number greater than or equal to zero but smaller than 1
ANSWER: d
34. In a multiple regression model, the following statistics are given: SSE = 100, R² = 0.955,
k = 5, and n = 15. Then, the multiple coefficient of determination adjusted for degrees of
freedom is:
a. 0.955
b. 0.930
c. 0.900
d. 0.855
ANSWER: b
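In question 34 the total sum of squares is not given directly but can be recovered from SSE and R² before adjusting; a sketch of that two-step calculation (variable names are illustrative):

```python
# Recover SST from SSE and R^2, then adjust R^2 for degrees of freedom.
sse, r2, n, k = 100, 0.955, 15, 5
sst = sse / (1 - r2)                      # total variation in y
adj_r2 = 1 - (sse / (n - k - 1)) / (sst / (n - 1))
print(round(adj_r2, 3))  # 0.93, matching answer b
```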
35. In a multiple regression model, the error variable is assumed to have a mean of:
a. –1.0
b. 0.0
c. 1.0
d. Any value smaller than –1.0
ANSWER: b
37. In a multiple regression model, the probability distribution of the error variable is
assumed to be:
a. normal
b. nonnormal
c. positively skewed
d. negatively skewed
ANSWER: a
38. Which of the following measures can be used to assess the multiple regression model’s
fit?
a. sum of squares for error
b. sum of squares for regression
c. standard error of estimate
d. single t-test
ANSWER: c
41. In testing the validity of a multiple regression model involving 10 independent variables
and 100 observations, the numerator and denominator degrees of freedom for the critical
value of F will be, respectively,
a. 9 and 90
b. 10 and 100
c. 9 and 10
d. 10 and 89
ANSWER: d
42. In multiple regression analysis involving 10 independent variables and 100 observations,
the critical value of t for testing individual coefficients in the model will have:
a. 100 degrees of freedom
b. 10 degrees of freedom
c. 89 degrees of freedom
d. 9 degrees of freedom
ANSWER: c
46. Most statistical software provide a p-value for testing each coefficient in the multiple
regression model. In the case of b2, this represents the probability that:
a. b2 = 0
b. β2 = 0
c. | b2 | could be this large if β2 = 0
d. | b2 | could be this large if β2 ≠ 0
ANSWER: c
48. In testing the validity of a multiple regression model in which there are four independent
variables, the null hypothesis is:
a. H0: β1 = β2 = β3 = β4 = 1
b. H0: β0 = β1 = β2 = β3 = β4
c. H0: β1 = β2 = β3 = β4 = 0
d. H0: β0 = β1 = β2 = β3 = β4 = 0
ANSWER: c
49. For a set of 20 data points, a statistical software package listed the estimated multiple
regression equation as ŷ = 8.61 + 22x1 + 7x2 + 28x3, and also listed the t statistic for testing
the significance of each regression coefficient. Using the 5% significance level for
testing whether b2 = 7 differs significantly from zero, the critical region will be that the
absolute value of t is greater than or equal to:
a. 1.746
b. 2.120
c. 1.337
d. 1.333
ANSWER: b
51. In a multiple regression analysis, there are 20 data points and 4 independent variables,
and the sum of the squared differences between observed and predicted values of y is
180. The multiple standard error of estimate will be:
a. 6.708
b. 3.464
c. 9.000
d. 3.000
ANSWER: b
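The standard error of estimate in question 51 follows directly from the residual sum of squares; a quick check (a Python sketch, not part of the original material):

```python
import math

# Standard error of estimate: s = sqrt(SSE/(n - k - 1))
sse, n, k = 180, 20, 4
s = math.sqrt(sse / (n - k - 1))
print(round(s, 3))  # 3.464, matching answer b
```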
52. A multiple regression analysis including 4 independent variables results in a sum of squares
for regression of 1200 and a sum of squares for error of 800. Then, the multiple coefficient
of determination will be:
a. 0.667
b. 0.600
c. 0.400
d. 0.200
ANSWER: b
53. A multiple regression analysis including 20 data points and 4 independent variables
produced the following statistics: total variation in Y = SSY = 200 and SSR = 160. Then,
the multiple standard error of estimate will be:
a. 0.80
b. 3.266
c. 3.651
d. 1.633
ANSWER: d
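Question 53 first requires recovering SSE from the total and regression sums of squares; the two-step arithmetic can be sketched as (illustrative Python, names are mine):

```python
import math

# SSE = SSY - SSR, then s = sqrt(SSE/(n - k - 1))
ssy, ssr, n, k = 200, 160, 20, 4
sse = ssy - ssr
s = math.sqrt(sse / (n - k - 1))
print(round(s, 3))  # 1.633, matching answer d
```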
54. In a multiple regression analysis involving 25 data points and 5 independent variables,
the sum of squares terms are calculated as Total variation in Y = SSY = 500, SSR = 300,
and SSE = 200. In testing the validity of the regression model, the F value of the test
statistic will be:
a. 5.70
b. 2.50
c. 1.50
d. 0.176
ANSWER: a
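The F statistic in question 54 is the ratio of the two mean squares; a quick arithmetic check (a sketch, not part of the original question set):

```python
# F = MSR/MSE = (SSR/k) / (SSE/(n - k - 1))
ssr, sse, n, k = 300, 200, 25, 5
f = (ssr / k) / (sse / (n - k - 1))
print(round(f, 2))  # 5.7, matching answer a
```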
55. A multiple regression equation includes 5 independent variables, and the coefficient of
determination is 0.81. The percentage of the variation in y that is explained by the
regression equation is:
a. 81%
b. 90%
c. 86%
d. about 16%
ANSWER: a
56. In a simple linear regression problem, the following pairs of (yi, ŷi) are given: (6.75,
7.42), (8.96, 8.06), (10.30, 11.65), and (13.24, 12.15). Then, the sum of squares for error
is:
a. 39.2500
b. -0.0300
c. 4.2695
d. 39.2800
ANSWER: c
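The SSE in question 56 is just the sum of the squared residuals over the four pairs; a direct check (Python used for illustration):

```python
# SSE is the sum of squared residuals (y_i - yhat_i)^2
pairs = [(6.75, 7.42), (8.96, 8.06), (10.30, 11.65), (13.24, 12.15)]
sse = sum((y - yhat) ** 2 for y, yhat in pairs)
print(round(sse, 4))  # 4.2695, matching answer c
```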
58. In a multiple regression model, the value of the coefficient of multiple determination has
to fall between
a. – 1 and + 1
b. 0 and + 1
c. – 1 and 0
d. Any pair of real numbers
ANSWER: b
59. In a multiple regression model, which of the following is correct regarding the value
of R² adjusted for the degrees of freedom?
a. It can be negative
b. It has to be positive
c. It has to be larger than the coefficient of multiple determination
d. It can be larger than 1
ANSWER: a
60. An interaction term in a multiple regression model with two independent variables may
be used when
a. the coefficient of determination is small
b. there is a curvilinear relationship between the dependent and independent variables
c. neither one of the two independent variables contributes significantly to the regression
model
d. the relationship between x1 and y changes for differing values of x2
ANSWER: d
63. If a group of independent variables are not significant individually but are significant as a
group at a specified level of significance, this is most likely due to
a. autocorrelation
b. the presence of dummy variables
c. the absence of dummy variables
d. multicollinearity
ANSWER: d
64. Multiple regression is the process of using several independent variables to predict a
number of dependent variables.
ANSWER: F
65. In multiple regression, the descriptor “multiple” refers to more than one dependent
variable.
ANSWER: F
66. For each x term in the multiple regression equation, the corresponding β is referred to as
a partial regression coefficient.
ANSWER: T
67. In a multiple regression problem, the regression equation is ŷ = 60.6 − 5.2x1 + 0.75x2. The
estimated value for y when x1 = 3 and x2 = 4 is 48.
ANSWER: T
68. In reference to the equation ŷ = −0.80 + 0.12x1 + 0.08x2, the value −0.80 is the y-intercept.
ANSWER: T
69. In testing the significance of a multiple regression model in which there are three
independent variables, the null hypothesis is H0: β1 = β2 = β3.
ANSWER: F
71. A multiple regression equation includes 5 independent variables, and the coefficient of
determination is 0.81. Then, the percentage of the variation in y that is explained by the
regression equation is 90%.
ANSWER: F
72. In a multiple regression analysis involving 4 independent variables and 30 data points,
the number of degrees of freedom associated with the sum of squares for error, SSE, is
25.
ANSWER: T
73. In order to test the significance of a multiple regression model involving 4 independent
variables and 25 observations, the numerator and denominator degrees of freedom for the
critical value of F are 3 and 21, respectively.
ANSWER: F
75. A multiple regression analysis including 25 data points and 4 independent variables
produces SST = 400 and SSR = 300. Then, the multiple standard error of estimate is 5.
ANSWER: F
76. Multicollinearity is present if the dependent variable is linearly related to one of the
explanatory variables.
ANSWER: F
78. A multiple regression model has the form ŷ = 6.75 + 2.25x1 + 3.5x2. As x1 increases by
one unit, holding x2 constant, the value of y will increase by 9 units.
ANSWER: F
81. In order to test the significance of a multiple regression model involving 5 independent
variables and 30 observations, the numerator and denominator degrees of freedom for the
critical value of F are 5 and 24, respectively.
ANSWER: T
82. In reference to the equation ŷ = −0.80 + 0.12x1 + 0.08x2, the value 0.12 is the average
change in y per unit change in x1, when x2 is held constant.
ANSWER: T
83. In multiple regression, if the error sum of squares SSE equals the total variation in y, then
the value of the test statistic F is zero.
ANSWER: T
84. In reference to the equation ŷ = 1.86 + 0.51x1 + 0.60x2, the value 0.60 is the average change
in y per unit change in x2, regardless of the value of x1.
ANSWER: F
85. Most statistical software print a second R 2 statistic, called the coefficient of
determination adjusted for degrees of freedom, which has been adjusted to take into
account the sample size and the number of independent variables.
ANSWER: T
87. In regression analysis, the total variation in the dependent variable y, measured by
Σ(yi − ȳ)², can be decomposed into two parts: the explained variation, measured by
SSR, and the unexplained variation, measured by SSE.
ANSWER: T
88. In multiple regression, a large value of the test statistic F indicates that most of the
variation in y is unexplained by the regression equation and that the model is useless. A
small value of F indicates that most of the variation in y is explained by the regression
equation and that the model is useful.
ANSWER: F
89. When an additional explanatory variable is introduced into a multiple regression model,
coefficient of multiple determination adjusted for degrees of freedom can never decrease.
ANSWER: F
90. In multiple regression analysis, when the response surface (the graphical depiction of the
regression equation) hits every single point, the sum of squares for error SSE = 0, the
standard error of estimate s = 0, and the coefficient of determination R 2 = 1.
ANSWER: T
91. In a multiple regression analysis involving k independent variables, the t-tests of the
individual coefficients allows us to determine whether βi ≠ 0 (for i = 1, 2, …, k), which
tells us whether a linear relationship exists between xi and y.
ANSWER: T
92. In multiple regression analysis, the problem of multicollinearity affects the t-tests of the
individual coefficients as well as the F-test in the analysis of variance for regression,
since the F-test combines these t-tests into a single test.
ANSWER: F
93. A multiple regression model is assessed to be good if the error sum of squares SSE and
the standard error of estimate s are both small, the coefficient of multiple determination
R2 is close to 1, and the value of the test statistic F is large.
ANSWER: T
95. In multiple regression analysis, because of a commonly occurring problem called
multicollinearity, the t-tests of the individual coefficients may indicate that some
independent variables are not linearly related to the dependent variable, when in fact they
are.
ANSWER: T
96. Multicollinearity is present when there is a high degree of correlation between the
dependent variable and any of the independent variables.
ANSWER: F
98. When an additional explanatory variable is introduced into a multiple regression model,
the coefficient of multiple determination will never decrease.
ANSWER: T
99. In regression analysis, we judge the magnitude of the standard error of estimate relative
to the values of the dependent variable, and particularly to the mean of y.
ANSWER: T
100. In calculating the standard error of the estimate, s = √MSE, there are (n − k − 1) degrees
of freedom, where n is the sample size and k is the number of independent variables in
the model.
ANSWER: T
101. A multiple regression is called “multiple” because it has several explanatory variables.
ANSWER: T
102. The coefficient of multiple determination measures the proportion or percentage of the
total variation in the dependent variable y that is explained by the regression plane.
ANSWER: T
103. When an explanatory variable is dropped from a multiple regression model, the adjusted
coefficient of determination can increase.
ANSWER: T
104. The coefficient of multiple determination is calculated by dividing the regression sum of
squares by the total sum of squares (SSR/SST) and subtracting that value from 1.
ANSWER: F
105. In a multiple regression model involving 5 independent variables, if the sum of the
squared residuals is 847 and the data set contains 40 points, then, the value of the
standard error of the estimate is 24.911.
ANSWER: F
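Question 105 is false because 24.911 is s², not s; the actual standard error is its square root. A quick check (a Python sketch with illustrative names):

```python
import math

# s^2 = SSE/(n - k - 1) = 847/34 ≈ 24.91, so s is its square root.
sse, n, k = 847, 40, 5
s = math.sqrt(sse / (n - k - 1))
print(round(s, 3))  # 4.991, not 24.911
```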
107. When an explanatory variable is dropped from a multiple regression model, the
coefficient of multiple determination can increase.
ANSWER: F
108. Multicollinearity is a situation in which two or more of the independent variables are
highly correlated with each other.
ANSWER: T
109. You have just run a regression in which the coefficient of multiple determination is 0.78.
To determine if this indicates that the independent variables explain a significant portion
of the variation in the dependent variable, you would perform an F-test.
ANSWER: T
110. From the coefficient of multiple determination, we cannot detect the strength of the
relationship between the dependent variable y and any individual independent variable.
ANSWER: T
111. The total sum of squares (SST) in a regression model will never exceed the regression
sum of squares (SSR).
ANSWER: F
112. A regression had the following results: SST = 92.25, SSE = 34.55. It can be said that
37.45% of the variation in the dependent variable is explained by the independent
variables in the regression.
ANSWER: F
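Question 112 is false because 34.55/92.25 is the unexplained share of the variation; the explained proportion is its complement. A check of the arithmetic (illustrative sketch):

```python
# 34.55/92.25 is the UNexplained share; explained is 1 minus that.
sst, sse = 92.25, 34.55
r2 = 1 - sse / sst
print(round(r2, 4))  # 0.6255, i.e. about 62.55% explained, not 37.45%
```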
113. An interaction term in a multiple regression model involving two independent variables
may be used when the relationship between x1 and y changes for differing values of x2 .
ANSWER: T
114. Multicollinearity is present when there is a high degree of correlation between the
independent variables included in the regression model.
ANSWER: T
115. The interpretation of the slope is different in a multiple linear regression model as
compared to a simple linear regression model.
ANSWER: T
116. A multiple regression is called “multiple” because it has several data points, and multiple
dependent variables.
ANSWER: F
117. A high value of the coefficient of multiple determination significantly above 0 in multiple
regression, accompanied by insignificant t-values on all parameter estimates, very often
indicates a high correlation between the independent variables in the model.
ANSWER: T
119. A regression analysis showed that SST = 112.18 and SSE = 33.65. It can be said that
70% of the variation in the dependent variable is explained by the independent variables
in the regression.
ANSWER: T
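The 70% figure in question 119 comes from R² = 1 − SSE/SST; a quick check (Python sketch, not part of the original material):

```python
# R^2 = 1 - SSE/SST
sst, sse = 112.18, 33.65
r2 = 1 - sse / sst
print(round(r2, 2))  # 0.7, i.e. 70% explained
```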
121. When an explanatory variable is dropped from a multiple regression model, the adjusted
coefficient of multiple determination can increase.
ANSWER: T
122. The parameter estimates are biased when multicollinearity is present in a multiple
regression equation.
ANSWER: F
123. In trying to obtain a model to estimate grades on a statistics test, a professor wanted to
include, among other factors, whether the person had taken the course previously. To do
this, the professor included a dummy variable in her regression that was equal to 1 if the
person had previously taken the course, and 0 otherwise. The interpretation of the
coefficient associated with this dummy variable would be the average amount the repeat
students tended to be above or below non-repeaters, with all other factors the same.
ANSWER: T
124. When an additional explanatory variable is introduced into a multiple regression model,
the adjusted coefficient of multiple determination can never decrease.
ANSWER: F
125. If we have taken into account all relevant explanatory variables, the residuals from a
multiple regression should be random.
ANSWER: T
126. When an additional explanatory variable is introduced into a multiple regression model,
the coefficient of multiple determination will increase.
ANSWER: T
127. Multicollinearity will result in excessively low standard errors of the parameter estimates
reported in the regression output.
ANSWER: F
128. A multiple regression model is assessed to be perfect if the error sum of squares SSE = 0,
the standard error of estimate s = 0, the coefficient of multiple determination R 2 =1, and
the value of the test statistic F = ∞.
ANSWER: T
129. A multiple regression model is assessed to be poor if the error sum of squares SSE and
the standard error of estimate s are both large, the coefficient of multiple determination
R² is close to 0, and the value of the test statistic F is small.
ANSWER: T
130. Consider the following statistics of a multiple regression model: Total variation in y =
SSY = 1000, SSE = 300, n = 50, and k = 4 .
a. Determine the standard error of estimate
b. Determine the multiple coefficient of determination
c. Determine the F-statistics
ANSWER:
a. s = √(SSE/(n − k − 1)) = √(300/45) = 2.582
b. R² = 1 − SSE/SSY = 1 − 300/1000 = 0.70
c. F = MSR/MSE = (700/4)/(300/45) = 26.25
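The three parts of question 130 can be reproduced in a few lines; this is an illustrative sketch of the standard formulas, with variable names of my choosing:

```python
import math

# Question 130: s, R^2 and F from SSY = 1000, SSE = 300, n = 50, k = 4
ssy, sse, n, k = 1000, 300, 50, 4
ssr = ssy - sse
s = math.sqrt(sse / (n - k - 1))       # standard error of estimate
r2 = 1 - sse / ssy                     # coefficient of determination
f = (ssr / k) / (sse / (n - k - 1))    # F = MSR/MSE
print(round(s, 3), r2, round(f, 2))  # 2.582 0.7 26.25
```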
ANSWER:
H0: β1 = 0 vs. H1: β1 ≠ 0
Rejection region: | t | > t0.005,19 = 2.861, Test statistic: t = −2.117
Conclusion: Don't reject the null hypothesis. No.
132. The computer output for the multiple regression model y = β0 + β1x1 + β2x2 + ε is shown
below. However, because of a printer malfunction some of the results are not shown.
These are indicated by the boldface letters a to i. Fill in the missing results (up to three
decimal places).
S=d R-Sq = e
ANALYSIS OF VARIANCE
Source of Variation df SS MS F
Regression 2 412 g i
Error 37 f h
Total 39 974
ANSWER:
a = 25.277 b = 2.808 c = -2.367 d = 3.897 e = .423
f = 562 g = 206 h = 15.189 i = 13.5623
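The missing ANOVA entries in question 132 all follow from the two given sums of squares and the degrees of freedom; a sketch of the reconstruction (Python for illustration; letters match the blanks):

```python
import math

# Question 132: reconstruct the missing ANOVA entries from df and SS.
sst, ssr, df_reg, df_err = 974, 412, 2, 37
f_sse = sst - ssr            # f: error sum of squares
g_msr = ssr / df_reg         # g: mean square regression
h_mse = f_sse / df_err       # h: mean square error
i_f = g_msr / h_mse          # i: F statistic
d_s = math.sqrt(h_mse)       # d: standard error of estimate S
e_r2 = ssr / sst             # e: R-Sq
print(f_sse, g_msr, round(h_mse, 3), round(i_f, 3), round(d_s, 3), round(e_r2, 3))
# 562 206.0 15.189 13.562 3.897 0.423
```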
ANALYSIS OF VARIANCE
Source of Variation df SS MS F
Regression 3 936 312 3.477
Error 36 3230 89.722
Total 39 4166
133. {Life Expectancy Narrative} Is there enough evidence at the 10% significance level to
infer that the model is useful in predicting length of life?
ANSWER:
H0: β1 = β2 = β3 = 0
H1: At least one βi is not equal to zero.
Rejection region: F > F0.05,3,36 = 2.84
Test statistic: F = 3.477
Conclusion: Reject the null hypothesis. Yes, there is enough evidence at the 10%
significance level to infer that the model is useful in predicting length of life.
134. {Life Expectancy Narrative} Is there enough evidence at the 1% significance level to
infer that the average number of hours of exercise per week and the age at death are
linearly related?
ANSWER:
H0: β1 = 0 vs. H1: β1 ≠ 0
Rejection region: | t | > t0.005,36 = 2.724
Test statistic: t = 4.068
Conclusion: Reject the null hypothesis. Yes, there is enough evidence at the 1% significance
level to infer that the average number of hours of exercise per week and the age at death
are linearly related.
135. {Life Expectancy Narrative} Is there enough evidence at the 5% significance level to
infer that the cholesterol level and the age at death are negatively linearly related?
ANSWER:
H0: β2 = 0 vs. H1: β2 < 0
Rejection region: t < −t0.05,36 = −1.69
Test statistic: t = −1.909
Conclusion: Reject the null hypothesis. Yes, there is enough evidence at the 5% significance
level to infer that the cholesterol level and the age at death are negatively linearly related.
136. {Life Expectancy Narrative} Is there sufficient evidence at the 5% significance level to
infer that the number of points that the individual’s blood pressure exceeded the
recommended value and the age at death are negatively linearly related?
ANSWER:
H0: β3 = 0 vs. H1: β3 < 0
Rejection region: t < −t0.05,36 = −1.69
Test statistic: t = −1.143
Conclusion: Don't reject the null hypothesis. No, there is not sufficient evidence at the 5%
significance level to infer that the number of points that the individual's blood pressure
exceeded the recommended value and the age at death are negatively linearly related.
137. {Life Expectancy Narrative} What is the coefficient of determination? What does this
statistic tell you?
ANSWER:
R² = 0.225. This means that 22.5% of the variation in the age at death is explained by
the three variables: the average number of hours of exercise per week, the cholesterol
level, and the number of points that the individual’s blood pressure exceeded the
recommended value, while 77.5% of the variation remains unexplained.
ANSWER:
b1 = 1.79. This tells us for each additional hour increase of exercise per week, the age at
death on average is extended by 1.79 years (assuming that the other independent
variables in the model are held constant).
ANSWER:
b2 = -0.021. This tells us that for each additional unit increase in the cholesterol level,
the age at death on average is shortened by .021 years or equivalently about a week
(assuming that the other independent variables in the model are held constant).
ANSWER:
b3 = −0.016. This tells us that for each additional point by which the individual's blood
pressure exceeds the recommended value, the age at death on average is shortened
by 0.016 years, or equivalently about six days (assuming that the other independent
variables in the model are held constant).
ANALYSIS OF VARIANCE
Source of Variation df SS MS F
Regression 3 227 75.667 3.730
Error 21 426 20.286
Total 24 653
141. {Demographic Variables and TV Narrative} Test the overall validity of the model at the
5% significance level.
ANSWER:
H0: β1 = β2 = β3 = 0
H1: At least one βi is not equal to zero.
Rejection region: F > F0.05,3,21 = 3.07
Test statistic: F = 3.73
Conclusion: Reject the null hypothesis. The model is valid at α = .05.
ANSWER:
H0: β1 = 0 vs. H1: β1 ≠ 0
Rejection region: | t | > t0.005,21 = 2.831
Test statistic: t = 2.158
Conclusion: Don't reject the null hypothesis. No, there is not sufficient evidence at the 1%
significance level to indicate that hours of television watched and age are linearly related.
ANSWER:
H0: β2 = 0 vs. H1: β2 < 0
Rejection region: t < −t0.01,21 = −2.518
Test statistic: t = −2.231
Conclusion: Don't reject the null hypothesis. No, there is not sufficient evidence at the 1%
significance level to indicate that hours of television watched and education are
negatively linearly related.
ANSWER:
R² = 0.348. This means that 34.8% of the variation in the number of hours of television
watched per week is explained by the three variables: age, number of years of education,
and income, while 65.2% remains unexplained.
ANSWER:
b1 = 0.41. This tells us that for each additional year of age, the number of hours of
television watched per week on average increases by 0.41 (assuming that the other
independent variables in the model are held constant).
ANSWER:
b2 = -0.29. This tells us that for each additional year of education, the number of hours
of television watched per week on average decreases by 0.29 (assuming that the other
independent variables in the model are held constant).
ANSWER:
b3 = −0.12. This tells us that for each additional $1000 in income, the number of
hours of television watched per week on average decreases by 0.12 (assuming that the
other independent variables in the model are held constant).
ANALYSIS OF VARIANCE
Source of Variation df SS MS F
Regression 3 288 96 22.647
Error 46 195 4.239
Total 49 483
148. {Family Expenditure on Clothes Narrative} Test the overall model’s validity at the 5%
significance level.
ANSWER:
H0: β1 = β2 = β3 = 0
H1: At least one βi is not equal to zero.
Rejection region: F > F0.05,3,46 = 2.84
Test statistic: F = 22.647
Conclusion: Reject the null hypothesis. Yes, the model is valid at α = .05.
ANSWER:
H0: β1 = 0 vs. H1: β1 > 0
Rejection region: t > t0.05,46 ≈ 1.68
Test statistic: t = 3.64
Conclusion: Reject the null hypothesis. Yes, annual household income and annual family
clothes expenditure are positively linearly related.
150. {Family Expenditure on Clothes Narrative} Test at the 1% significance level to determine
whether the number of family members and annual family clothes expenditure are
linearly related.
ANSWER:
H0: β2 = 0 vs. H1: β2 ≠ 0
Rejection region: | t | > t0.005,46 ≈ 2.69
Test statistic: t = 3.207
Conclusion: Reject the null hypothesis. Yes, the number of family members and annual
family clothes expenditure are linearly related.
151. {Family Expenditure on Clothes Narrative} Test at the 1% significance level to determine
whether the number of children under 10 years of age and annual family clothes
expenditure are linearly related.
ANSWER:
H0: β3 = 0 vs. H1: β3 ≠ 0
Rejection region: | t | > t0.005,46 ≈ 2.69
Test statistic: t = 1.444
Conclusion: Don't reject the null hypothesis. There is not sufficient evidence to conclude that the
number of children under 10 years of age and annual family clothes expenditure are
linearly related.
ANSWER:
R² = 0.596. This means that 59.6% of the variation in the annual family clothes
expenditure is explained by the three variables: annual household income, number of
family members, and number of children under 10 years of age, while 40.4% of the
variation remains unexplained.
ANSWER:
b1 = 0.091. This tells us that for each additional $1000 in annual household income, the
annual family clothes expenditure increases on average by $91, assuming that the number
of family members, and the number of children under 10 years of age in the model are
held constant.
ANSWER:
b2 = 0.93. This tells us that for each additional family member, the annual family clothes
expenditure increases on average by $930, assuming that the annual household income,
and the number of children under 10 years of age in the model are held constant.
ANSWER:
b3 = 0.26. This tells us that for each additional child under the age of 10, the annual
family clothes expenditure increases on average by $260, assuming that the number of
family members and the annual household income in the model are held constant.
ANALYSIS OF VARIANCE
Source of Variation df SS MS F
Regression 3 3716 1238.667 6.558
Error 46 8688 188.870
Total 49 12404
156. {Student’s Final Grade Narrative} What is the coefficient of determination? What does
this statistic tell you?
ANSWER:
R² = 0.30. This means that 30% of the variation in the student's final grade in statistics
is explained by the three variables: number of lectures skipped, number of late
assignments, and mid-term test grade, while 70% remains unexplained.
157. {Student’s Final Grade Narrative} Do these data provide enough evidence to conclude at
the 5% significance level that the model is useful in predicting the final mark?
ANSWER:
H0: β1 = β2 = β3 = 0
H1: At least one βi is not equal to zero.
Rejection region: F > F0.05,3,46 = 2.84
Test statistic: F = 6.558
Conclusion: Reject the null hypothesis. Yes, the model is useful in predicting the final
mark.
158. {Student’s Final Grade Narrative} Do these data provide enough evidence to conclude at
the 5% significance level that the final mark and the number of skipped lectures are
linearly related?
ANSWER:
H0: β1 = 0 vs. H1: β1 ≠ 0
Rejection region: |t| > t0.025,46 = 2.014
Test statistic: t = -1.916
Conclusion: Don't reject the null hypothesis. No, there is not enough evidence to
conclude at the 5% significance level that the final mark and the number of skipped
lectures are linearly related.
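The two-sided decision rule in question 158 can be sketched in the same way, with the statistic and critical value taken from the answer above:

```python
# Two-sided t-test on a single coefficient: reject H0 (the coefficient
# is zero) when |t| exceeds the critical value.
t_stat = -1.916          # from the answer above
t_crit = 2.014           # tabled t(0.025, 46)

reject = abs(t_stat) > t_crit
print(reject)   # False -> do not reject H0
```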
159. {Student’s Final Grade Narrative} Do these data provide enough evidence at the 5%
significance level to conclude that the final mark and the number of late assignments are
negatively linearly related?
ANSWER:
H0: β2 = 0 vs. H1: β2 < 0
Rejection region: t < -t0.05,46 = -1.679
Test statistic: t = -1.035
Conclusion: Don't reject the null hypothesis. No, there is not enough evidence at the 5%
significance level to conclude that the final mark and the number of late assignments are
negatively linearly related.
160. {Student’s Final Grade Narrative} Do these data provide enough evidence at the 1%
significance level to conclude that the final mark and the mid-term mark are positively
linearly related?
ANSWER:
H0: β3 = 0 vs. H1: β3 > 0
Rejection region: t > t0.01,46 = 2.412
Test statistic: t = 4.846
Conclusion: Reject the null hypothesis. Yes, these data provide enough evidence at the
1% significance level to conclude that the final mark and the mid-term mark are
positively linearly related.
ANSWER:
b1 = -3.18. This tells us that for each additional lecture skipped, the student's final score
on average decreases by 3.18 points, assuming that the number of late assignments and
the mid-term test mark (out of 100) in the model are held constant.
ANSWER:
b2 = -1.17. This tells us that for each additional late assignment, the student's final score
on average decreases by 1.17 points, assuming that the number of lectures skipped and
the mid-term test mark (out of 100) in the model are held constant.
ANSWER:
b3 = 0.63. This tells us that for each additional point on the mid-term test (out of 100),
the student's final score on average increases by 0.63 points, assuming that the number of
lectures skipped and the number of late assignments in the model are held constant.
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.865
R Square 0.748
Adjusted R Square 0.726
Standard Error 5.195
Observations 50
ANOVA
df SS MS F Signif F
Regression 3605.7736 901.4434 0.0001
Residual 1214.2264 26.9828
Total 49 4820.0000
164. {Real Estate Narrative} What percentage of the variability in house size is explained by
income?
ANSWER:
74.8% of the variability in house size is explained by income
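The 74.8% figure can be recovered directly from the ANOVA portion of the printout; a quick check, using the total df of 49 from the output and the residual df of 45 implied by SS/MS = 1214.2264/26.9828:

```python
# R^2 = SS(Regression)/SS(Total); the adjusted value divides each SS
# by its degrees of freedom (45 residual, 49 total) before comparing.
ss_reg, ss_res, ss_total = 3605.7736, 1214.2264, 4820.0

r_sq = ss_reg / ss_total                        # 0.748
adj_r_sq = 1 - (ss_res / 45) / (ss_total / 49)  # 0.726

print(round(r_sq, 3), round(adj_r_sq, 3))
```

Both values match the "R Square" and "Adjusted R Square" lines of the summary output.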
165. {Real Estate Narrative} Which of the independent variables in the model are significant
at the 2% level?
ANSWER:
Family income and family size
166. {Real Estate Narrative} Which of the following values for the level of significance is the
smallest for which all explanatory variables are significant individually: α = .01, .05,
.10, and .15?
ANSWER:
α = .15
167. {Real Estate Narrative} When the builder used a simple linear regression model with
house size as the dependent variable and education as the independent variable, he
obtained an r 2 value of 23.0%. What additional percentage of the total variation in
house size has been explained by including family size and income in the multiple
regression?
ANSWER:
74.8% - 23.0% = 51.8%. This means that an additional 51.8% of the total variation in
house size has been explained by including family size and income in the multiple
regression.
168. {Real Estate Narrative} Which of the following values for the level of significance is the
smallest for which at least two explanatory variables are significant individually: α = .01,
.05, .10, and .15?
ANSWER:
α = .01
169. {Real Estate Narrative} Which of the following values for the level of significance is the
smallest for which the regression model as a whole is significant: α = .00005, .001, .01,
and .05?
ANSWER:
α = .001
170. {Real Estate Narrative} What is the predicted house size for an individual earning an
annual income of $40,000, having a family size of 4, and having 13 years of education?
ANSWER:
2488 square feet
171. {Real Estate Narrative} What minimum annual income would an individual with a family
size of 4 and 16 years of education need to attain a predicted 10,000 square foot home?
ANSWER:
$211,850
172. {Real Estate Narrative} What minimum annual income would an individual with a family
size of 9 and 10 years of education need to attain a predicted 5,000 square foot home?
ANSWER:
$44,140
173. {Real Estate Narrative} One individual in the sample had an annual income of $100,000,
a family size of 10, and an education of 16 years. This individual owned a home with an
area of 7,000 square feet. What is the residual (in hundreds of square feet) for this data
point?
ANSWER:
-5.40
174. {Real Estate Narrative} One individual in the sample had an annual income of $10,000, a
family size of 1, and an education of 8 years. This individual owned a home with an area
of 1,000 square feet (House = 10.00). What is the residual (in hundreds of square feet) for
this data point?
ANSWER:
y - ŷ = 70 - 75.404 = -5.404, or -540.4 square feet
175. {Real Estate Narrative} Suppose the builder wants to test whether the coefficient on
income is significantly different from 0. What is the value of the relevant t-statistic?
ANSWER:
t = 3.9549
176. {Real Estate Narrative} At the 0.01 level of significance, what conclusion should the
builder draw regarding the inclusion of income in the regression model?
ANSWER:
Income is significant in explaining house size and should be included in the model
because its p value of .0003 is less than 0.01.
177. {Real Estate Narrative} Suppose the builder wants to test whether the coefficient on
education is significantly different from 0. What is the value of the relevant t-statistic?
ANSWER:
t = -1.509
178. {Real Estate Narrative} What is the value of the calculated F test statistic that is missing
from the output for testing whether the whole regression model is significant?
ANSWER:
F = 901.4434/26.9828 = 33.408
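The missing ANOVA entries asked for in questions 178, 180, and 181 all follow from the same two identities, df = SS/MS on each row and F = MS(Regression)/MS(Residual); a short check:

```python
# Reconstruct the missing printout entries from SS and MS alone.
ss_reg, ms_reg = 3605.7736, 901.4434
ss_res, ms_res = 1214.2264, 26.9828

df_reg = round(ss_reg / ms_reg)   # 4
df_res = round(ss_res / ms_res)   # 45
f_stat = ms_reg / ms_res          # 33.408

print(df_reg, df_res, round(f_stat, 3))
```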
179. {Real Estate Narrative} At the 0.01 level of significance, what conclusion should the
builder draw regarding the inclusion of education in the regression model?
ANSWER:
Education is not significant in explaining house size and should not be included in the
model because its p value of 0.1383 is larger than 0.01
180. {Real Estate Narrative} What are the regression degrees of freedom that are missing from
the output?
ANSWER:
df = 3605.7736/901.4434 = 4
181. {Real Estate Narrative} What are the residual degrees of freedom that are missing from
the output?
ANSWER:
df = 1214.2264/26.9828 = 45
182. {Real Estate Narrative} The observed value of the F-statistic is missing from the
printout. What are the numerator and denominator degrees of freedom for this
F-statistic?
ANSWER:
df = 4 for the numerator, and 45 for the denominator
183. Three predictor variables are being considered for use in a linear regression model. Given
the correlation matrix below, does it appear that multicollinearity could be a problem?
x1 x2 x3
x1 1.000
x2 0.025 1.000
x3 0.968 0.897 1.000
ANSWER:
It appears that multicollinearity could be a problem because x3 is highly correlated with
both x1 and x2 .
ANSWER:
There are several clues to the presence of multicollinearity:
a. An independent variable known to be an important predictor ends up having a partial
regression coefficient that is not significant.
b. A partial regression coefficient exhibits the wrong sign.
c. When an independent variable is added or deleted, the partial regression coefficients
for the other variables change dramatically.
A more practical way to identify multicollinearity is through the examination of a
correlation matrix, which is a matrix that shows the correlation of each variable with
each of the other variables. A high correlation between two independent variables is an
indication of multicollinearity.
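The correlation-matrix screen described above can be sketched in a few lines of Python; the variable names and correlations match question 183's matrix, while the 0.8 cutoff is an assumed rule of thumb, not a value from the text:

```python
# Flag any pair of predictors whose pairwise correlation exceeds a
# chosen threshold; such pairs signal possible multicollinearity.
corr = {
    ("x1", "x2"): 0.025,
    ("x1", "x3"): 0.968,
    ("x2", "x3"): 0.897,
}
threshold = 0.8   # assumed rule-of-thumb cutoff

flagged = [pair for pair, r in corr.items() if abs(r) > threshold]
print(flagged)   # both flagged pairs involve x3
```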
S = d    R-Sq = e
ANALYSIS OF VARIANCE
Source of Variation df SS MS F
Regression f i j l
Error g 388 k
Total h 519
ANSWER:
a = 7.125  b = 4.567  c = -1.643  d = 3.039  e = .252  f = 2
g = 42  h = 44  i = 131  j = 65.5  k = 9.238  l = 7.090
ANSWER:
Multicollinearity is a condition in which two or more of the independent variables are
highly correlated with each other.
187. A multiple regression equation has been developed for y = daily attendance at a
community swimming pool, x1 = temperature (degrees Fahrenheit), and x2 = weekend
versus weekday ( x2 = 1 for Saturday and Sunday, and 0 for other days of the week). For
the regression equation shown below, interpret each partial regression coefficient:
ŷ = 100 + 10x1 + 75x2.
ANSWER:
The partial regression coefficient for x1 implies that, holding the day of the week
constant, a one degree Fahrenheit increase in the temperature will result in an increase of
10 in attendance. The partial regression coefficient for x2 implies that the attendance
increases by 75 people on Saturdays and Sundays (assuming a constant temperature).
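These interpretations can be illustrated with a minimal sketch, hard-coding the coefficients 10 and 75 quoted in the answer above:

```python
# Predicted pool attendance: 100 + 10 * temperature + 75 * weekend,
# where weekend is 1 for Saturday/Sunday and 0 otherwise.
def predicted_attendance(temp_f, weekend):
    return 100 + 10 * temp_f + 75 * (1 if weekend else 0)

# Same 80-degree day: the weekend prediction is exactly 75 higher,
# which is the dummy variable's partial regression coefficient.
weekday = predicted_attendance(80, weekend=False)
weekend = predicted_attendance(80, weekend=True)
print(weekday, weekend, weekend - weekday)
```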
SECTION 4
MULTIPLE CHOICE QUESTIONS
188. If the Durbin-Watson statistic has a value close to 0, which assumption is violated?
a. Normality of the errors
b. Independence of errors
c. Homoscedasticity
d. None of the above.
ANSWER: b
189. If the Durbin-Watson statistic d has values smaller than 2, this indicates
a. a positive first-order autocorrelation
b. a negative first-order autocorrelation
c. no first-order autocorrelation at all
d. None of the above.
ANSWER: a
190. If the Durbin-Watson statistic d has values greater than 2, this indicates
a. a positive first-order autocorrelation
b. a negative first-order autocorrelation
c. no first-order autocorrelation at all
d. None of the above.
ANSWER: b
191. If the Durbin-Watson statistic has a value close to 4, which assumption is violated?
a. Normality of the errors
b. Independence of errors
c. Homoscedasticity
d. None of the above
ANSWER: b
194. The Durbin-Watson test is used to test for positive first-order autocorrelation by
comparing its statistic value d to the critical values dL and dU available in most statistics
books. Which of the following statements is true?
a. If d < dL, we conclude that there is enough evidence to show that positive first-order
autocorrelation exists.
b. If d > dU, we conclude that there is not enough evidence to show that positive
first-order autocorrelation exists.
c. If dL ≤ d ≤ dU, we conclude that the test is inconclusive.
d. All of the above
ANSWER: d
195. In reference to the Durbin-Watson statistic d and the critical values dL and dU, which
of the following statements is false?
a. If d > 4 - dL, we conclude that negative first-order autocorrelation exists.
b. If d < 4 - dU, we conclude that there is not enough evidence to show that negative
first-order autocorrelation exists.
c. If dU ≤ d ≤ 4 - dU, we conclude that there is no evidence of first-order
autocorrelation.
d. None of the above
ANSWER: d
196. In reference to the Durbin-Watson statistic d and the critical values dL and dU, which
of the following statements is false?
a. If d < dL, we conclude that positive first-order autocorrelation exists.
b. If d > dU, we conclude that there is not enough evidence to show that positive
first-order autocorrelation exists.
c. If d < dL or d > 4 - dL, we conclude that there is no evidence of first-order
autocorrelation.
d. None of the above
ANSWER: c
198. The Durbin-Watson test allows the statistics practitioner to determine whether there is
evidence of first-order autocorrelation.
ANSWER: T
201. Time series data refer to data that are gathered sequentially over a series of time periods.
ANSWER: T
202. Small values of the Durbin-Watson statistic d (d < 2) indicate a negative first-order
autocorrelation.
ANSWER: F
203. Large values of the Durbin-Watson statistic d (d > 2) indicate a positive first-order
autocorrelation.
ANSWER: F
204. If the value of the Durbin-Watson statistic d satisfies the inequality dL ≤ d ≤ dU, where
dL and dU are the critical values for d, then the test for positive first-order autocorrelation
is inconclusive.
ANSWER: T
205. If the value of the Durbin-Watson test statistic d satisfies the inequality d > 4 - dL,
where dL is a critical value of d, we conclude that positive first-order autocorrelation exists.
ANSWER: F
206. If the value of the Durbin-Watson test statistic d satisfies the inequalities d < dL or d >
4 - dL, where dL is a critical value of d, we conclude that first-order autocorrelation
exists.
ANSWER: T
ANSWER:
dL = 0.86 and dU = 1.27
The decision is made as follows:
If d > 4 - dL = 3.14, reject the null hypothesis and conclude that negative autocorrelation
is present.
If 2.73 = 4 - dU ≤ d ≤ 4 - dL = 3.14, we say that the test is inconclusive.
If d ≤ 4 - dU = 2.73, we conclude that there is no evidence of negative autocorrelation.
Since d = 1.75, we conclude that there is no evidence of negative autocorrelation.
ANSWER:
dL = 1.29 and dU = 1.78
The decision is made as follows:
If d < dL = 1.29, reject the null hypothesis and conclude that positive autocorrelation is
present.
If 1.29 = dL ≤ d ≤ dU = 1.78, we say that the test is inconclusive.
If d > dU = 1.78, we conclude that there is no evidence of positive autocorrelation.
Since d = 1.12, we reject the null hypothesis and conclude that positive autocorrelation is
present.
209. If the residuals in a regression analysis of time ordered data are not correlated, the value
of the Durbin-Watson d statistic should be near __________.
ANSWER:
2
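The statistic itself is defined as d = Σ(e_t - e_{t-1})² / Σe_t² over the time-ordered residuals; a minimal sketch, with made-up illustrative residuals:

```python
# Durbin-Watson statistic from a series of residuals. Uncorrelated
# residuals give d near 2; alternating signs push d above 2.
def durbin_watson(e):
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(x ** 2 for x in e)
    return num / den

# Made-up residuals that alternate in sign: successive values move in
# opposite directions, so d comes out well above 2 (negative
# first-order autocorrelation).
residuals = [0.5, -0.3, 0.8, -0.6, 0.2, -0.4, 0.7, -0.5]
print(round(durbin_watson(residuals), 2))
```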
210. If the value of the Durbin-Watson statistic d is small (d < 2), this indicates a
__________ (positive/negative) first-order autocorrelation exists.
ANSWER:
positive
211. Test the hypotheses H0: There is no first-order autocorrelation vs. H1: There is first-
order autocorrelation, given that: Durbin-Watson statistic d = 1.89, n = 28, k = 3, and
α = 0.05.
ANSWER:
dL = 0.97 and dU = 1.41
The decision is made as follows:
If d < dL = 0.97 or d > 4 - dL = 3.03, reject the null hypothesis and conclude that
autocorrelation is present.
If 0.97 = dL ≤ d ≤ dU = 1.41, or 2.59 = 4 - dU ≤ d ≤ 4 - dL = 3.03, we say that the
test is inconclusive.
If 1.41 = dU ≤ d ≤ 4 - dU = 2.59, we conclude that there is no evidence of
autocorrelation.
Since d = 1.89, we conclude that there is no evidence of autocorrelation.
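The three-way decision rule used in question 211 can be written as a small function; a sketch, with the critical values dL = 0.97 and dU = 1.41 from the answer above:

```python
# Two-sided Durbin-Watson decision: reject in either tail, declare the
# two buffer zones inconclusive, and otherwise find no evidence.
def dw_decision(d, dl, du):
    if d < dl or d > 4 - dl:
        return "autocorrelation present"
    if dl <= d <= du or 4 - du <= d <= 4 - dl:
        return "inconclusive"
    return "no evidence of autocorrelation"

print(dw_decision(1.89, 0.97, 1.41))   # no evidence of autocorrelation
```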
212. If the value of the Durbin-Watson statistic d is large (d > 2), this indicates a
__________ (positive/negative) first-order autocorrelation exists.
ANSWER:
negative
213. To use the Durbin-Watson test to test for positive first-order autocorrelation, the null
hypothesis will be H0: __________ (there is, there is no) first-order autocorrelation.
ANSWER:
there is no
214. To use the Durbin-Watson test to test for negative first-order autocorrelation, the null
hypothesis will be H0: __________ (there is, there is no) first-order autocorrelation.
ANSWER:
there is no
ANSWER:
0 ≤ d ≤ 4
216. Given that the Durbin-Watson test is conducted to test for positive first-order
autocorrelation with α = .05, n = 20, and there are two independent variables in the
model, the critical values for the test are dL = __________ and dU = __________,
respectively.
ANSWER:
1.10 and 1.54