Reading 1 Multiple Regression - Answers

The document consists of a series of regression analysis questions and answers, focusing on concepts such as heteroskedasticity, multicollinearity, and statistical significance of coefficients. Each question presents a scenario involving regression equations, statistical tests, and interpretations of results. The explanations clarify the reasoning behind the correct answers, emphasizing the importance of understanding regression assumptions and their implications.

Uploaded by

r379764

Question #1 of 144 Question ID: 1586011

An analyst is estimating whether company sales are related to three economic variables. The
regression exhibits conditional heteroskedasticity, serial correlation, and multicollinearity. The
analyst uses White and Newey-West corrected standard errors. Which of the following is most
accurate?

A) The regression will still exhibit heteroskedasticity and multicollinearity, but the serial correlation problem will be solved.
B) The regression will still exhibit multicollinearity, but the heteroskedasticity and serial correlation problems will be solved.
C) The regression will still exhibit serial correlation and multicollinearity, but the heteroskedasticity problem will be solved.

Explanation

White-corrected standard errors address heteroskedasticity, and Newey-West standard errors address serial correlation; neither correction addresses multicollinearity.

(Module 1.3, LOS 1.i)

Question #2 of 144 Question ID: 1471868

Consider the following estimated regression equation, with the standard errors of the slope
coefficients as noted:

Salesi = 10.0 + 1.25 R&Di + 1.0 ADVi – 2.0 COMPi + 8.0 CAPi

where the standard error for the estimated coefficient on R&D is 0.45, the standard
error for the estimated coefficient on ADV is 2.2, the standard error for the estimated
coefficient on COMP is 0.63, and the standard error for the estimated coefficient on
CAP is 2.5.

The equation was estimated over 40 companies. Using a 5% level of significance, which of the
estimated coefficients are significantly different from zero?

A) ADV and CAP only.


B) R&D, ADV, COMP, and CAP.
C) R&D, COMP, and CAP only.

Explanation

The critical t-values for 40 − 4 − 1 = 35 degrees of freedom and a 5% level of significance are ±
2.03.

The calculated t-values are:

t for R&D = 1.25 / 0.45 = 2.778

t for ADV = 1.0 / 2.2 = 0.455

t for COMP = -2.0 / 0.63 = -3.175

t for CAP = 8.0 / 2.5 = 3.2

Therefore, R&D, COMP, and CAP are statistically significant.

(Module 1.1, LOS 1.b)
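The arithmetic above can be sketched in a few lines (a hypothetical check, not part of the original answer; the coefficient and standard-error values are those given in the question):

```python
# Sketch of the t-tests above: t = coefficient / standard error,
# compared with the critical t-value of 2.03 (df = 40 - 4 - 1 = 35, 5% level).
coefs = {"R&D": 1.25, "ADV": 1.0, "COMP": -2.0, "CAP": 8.0}
std_errs = {"R&D": 0.45, "ADV": 2.2, "COMP": 0.63, "CAP": 2.5}
t_crit = 2.03

t_stats = {v: coefs[v] / std_errs[v] for v in coefs}
# A coefficient is significant when |t| exceeds the critical value.
significant = [v for v in coefs if abs(t_stats[v]) > t_crit]
print(significant)
```

Only ADV fails the test, matching answer choice C.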

Question #3 of 144 Question ID: 1479906

Which of the following is least likely a method used to detect heteroskedasticity?

A) Scatter plot.
B) Breusch-Pagan test.
C) Breusch-Godfrey test.

Explanation

The Breusch-Godfrey test is used to detect serial correlation. The Breusch-Pagan test is a formal test used to detect heteroskedasticity, while a scatter plot can give visual clues about the presence of heteroskedasticity.

(Module 1.3, LOS 1.h)

Question #4 of 144 Question ID: 1479913


One of the main assumptions of a multiple regression model is that the variance of the
residuals is constant across all observations in the sample. A violation of the assumption is
most likely to be described as:

A) unstable remnant deviation.


B) positive serial correlation.
C) heteroskedasticity.

Explanation

Heteroskedasticity is present when the variance of the residuals is not the same across all
observations in the sample, and there are sub-samples that are more spread out than the
rest of the sample.

(Module 1.3, LOS 1.h)

Question #5 of 144 Question ID: 1479918

During the course of a multiple regression analysis, an analyst has observed several items that
she believes may render incorrect conclusions. For example, the coefficient standard errors are
too small, although the estimated coefficients are accurate. She believes that these small
standard error terms will result in the computed t-statistics being too big, resulting in too many
Type I errors. The analyst has most likely observed which of the following assumption violations
in her regression analysis?

A) Positive serial correlation.


B) Homoskedasticity.
C) Multicollinearity.

Explanation

Positive serial correlation is the condition where a positive regression error in one time period increases the likelihood of having a positive regression error in the next time period. The residual terms are correlated with one another, leading to coefficient standard errors that are too small.

(Module 1.3, LOS 1.i)


Question #6 - 11 of 144 Question ID: 1471914

Using the regression model developed, the closest prediction of sales for December 20x6 is:

A) $44,000.
B) $36,000.
C) $55,000.

Explanation

1.76 + 0.23 * (150) − 0.08 * (7.5) = 35.66.

(Module 1.1, LOS 1.b)

Question #7 - 11 of 144 Question ID: 1471915

Will Stumper conclude that the housing starts coefficient is statistically different from zero and
how will he interpret it at the 5% significance level:

A) not different from zero; sales will rise by $0 for every 100 house starts.
B) different from zero; sales will rise by $100 for every 23 house starts.
C) different from zero; sales will rise by $23 for every 100 house starts.

Explanation

A p-value (0.017) below significance (0.05) indicates a variable which is statistically different
from zero. The coefficient of 0.23 indicates that sales will rise by $23 for every 100 house
starts.

(Module 1.1, LOS 1.b)

Question #8 - 11 of 144 Question ID: 1585998

Is the regression coefficient of changes in mortgage interest rates different from zero at the 5
percent level of significance?

A) no, because coefficient is negative.


B) yes, because p-value < 0.05.
C) yes, because -0.08 < 0.05.

Explanation

A coefficient is statistically significantly different from zero if its p-value is less than the level
of significance.

(Module 1.1, LOS 1.b)

Question #9 - 11 of 144 Question ID: 1471917

In this multiple regression, the F-statistic indicates the:

A) degree of correlation between the independent variables.


B) deviation of the estimated values from the actual values of the dependent variable.
C) joint significance of the independent variables.

Explanation

The F-statistic indicates the joint significance of the independent variables. The deviation of
the estimated values from the actual values of the dependent variable is the standard error
of estimate. The degree of correlation between the independent variables is the coefficient of
correlation.

(Module 1.1, LOS 1.b)

Question #10 - 11 of 144 Question ID: 1471918

The regression statistics above indicate that for the period under study, the independent
variables (housing starts, mortgage interest rate) together explained approximately what
percentage of the variation in the dependent variable (sales)?

A) 9.80.
B) 67.00.
C) 77.00.

Explanation
The question is asking for the coefficient of determination.

(Module 1.1, LOS 1.b)

Question #11 - 11 of 144 Question ID: 1585999

In this multiple regression, if Stumper discovers that the residuals exhibit positive serial
correlation, the most likely effect is:

A) standard errors are not affected but coefficient estimate is inconsistent.


B) standard errors are too low but coefficient estimate is consistent.
C) standard errors are too high but coefficient estimate is consistent.

Explanation

Positive serial correlation in residuals does not affect the consistency of coefficients (i.e., the
coefficients are still consistent) but the estimated standard errors are too low leading to
artificially high t-statistics.

(Module 1.1, LOS 1.b)

Question #12 of 144 Question ID: 1586006

Consider the following estimated regression equation:

AUTOt = 10.0 + 1.25 PIt + 1.0 TEENt – 2.0 INSt

The equation was estimated over 40 companies. The predicted value of AUTO if PI is 4, TEEN is
0.30, and INS = 0.6 is closest to:

A) 14.90.
B) 14.10.
C) 17.50.

Explanation
Predicted AUTO

= 10 + 1.25 (4) + 1.0 (0.30) – 2.0 (0.6)

= 10 + 5 + 0.3 – 1.2

= 14.10

(Module 1.2, LOS 1.f)
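As a quick sketch (values taken from the question), the forecast is just the intercept plus each coefficient times its predicted value:

```python
# Predicted AUTO = intercept + sum(coefficient * forecast value)
# for PI = 4, TEEN = 0.30, INS = 0.6.
intercept = 10.0
coefs = [1.25, 1.0, -2.0]   # PI, TEEN, INS
values = [4.0, 0.30, 0.6]
predicted_auto = intercept + sum(b * x for b, x in zip(coefs, values))
print(predicted_auto)
```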

Question #13 - 16 of 144 Question ID: 1501597

Salve runs a regression using the squared residuals from the model using the original
dependent variables. The coefficient of determination of this model is 6%. Which of the
following is the most appropriate conclusion at a 5% level of significance?

A) Because the test statistic of 7.20 is higher than the critical value of 3.84, we reject the null hypothesis of no conditional heteroskedasticity in residuals.
B) Because the test statistic of 7.20 is lower than the critical value of 7.81, we fail to reject the null hypothesis of no conditional heteroskedasticity in residuals.
C) Because the test statistic of 3.60 is lower than the critical value of 3.84, we reject the null hypothesis of no conditional heteroskedasticity in residuals.

Explanation

The chi-square test statistic = n × R2 = 120 × 0.06 = 7.20.

The one-tailed critical value for a chi-square distribution with k = 3 degrees of freedom and α
of 5% is 7.81. Therefore, we should not reject the null hypothesis and conclude that we don't
have a problem with conditional heteroskedasticity.

(Module 1.3, LOS 1.h)
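A minimal sketch of the Breusch-Pagan arithmetic used here (the critical value is hard-coded from a chi-square table, an assumption for illustration):

```python
# Breusch-Pagan test statistic: n * R^2 from regressing squared residuals
# on the original independent variables (n = 120, R^2 = 6% from the question).
n, r2 = 120, 0.06
bp_stat = n * r2
chi2_crit_5pct_3df = 7.81      # chi-square critical value, k = 3 df, 5% level
reject_null = bp_stat > chi2_crit_5pct_3df
print(bp_stat, reject_null)
```

Because 7.20 < 7.81, the null of no conditional heteroskedasticity is not rejected.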

Question #14 - 16 of 144 Question ID: 1501598

Which of the following misspecifications is most likely to cause serial correlation in residuals?

A) Data improperly pooled.


B) Improper variable scaling.
C) Improper variable form.

Explanation

Out of the four forms of model misspecifications, serial correlation in residuals may be
caused by omission of important variables (not an answer choice) and by improper data
pooling.

(Module 1.3, LOS 1.g)

Question #15 - 16 of 144 Question ID: 1501599

Should Salve be concerned about residual serial correlation?

A) Yes, for one lag only.


B) Yes, for two lags only.
C) No.

Explanation

The BG test statistic has an F-distribution with p and n – p – k – 1 degrees of freedom, where p = the number of lags tested. Given n = 120 and k = 3, the critical F-values (5% level of significance) are 3.92 (p = 1) and 3.08 (p = 2). The BG statistics for the Indian Equities Fama-French model are lower than the critical F-values; therefore, serial correlation does not appear to be a problem at either lag.

(Module 1.3, LOS 1.i)

Question #16 - 16 of 144 Question ID: 1501600

Should Salve be concerned about residual multicollinearity?

A) Yes, and Salve should exclude either variable SMB or HML from the model.
B) Yes, and Salve should exclude variable Rm-Rf from the model.
C) No.
Explanation

Multicollinearity is detected using the variance inflation factor (VIF). VIF values greater than 5
(i.e., R2 > 80%) warrant further investigation, while values above 10 (i.e., R2 > 90%) indicate
severe multicollinearity. None of the variables have VIF > 5.

(Module 1.3, LOS 1.j)
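The VIF thresholds quoted can be checked directly from the definition VIF = 1 / (1 − R²), where R² comes from regressing one independent variable on the others (a sketch, not part of the original answer):

```python
def vif(r2_j: float) -> float:
    """Variance inflation factor for a variable whose regression on the
    other independent variables has coefficient of determination r2_j."""
    return 1.0 / (1.0 - r2_j)

# R^2 of 80% corresponds to VIF = 5; R^2 of 90% corresponds to VIF = 10.
print(vif(0.80), vif(0.90))
```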

Question #17 of 144 Question ID: 1479903

When constructing a regression model to predict portfolio returns, an analyst runs a regression
for the past five year period. After examining the results, she determines that an increase in
interest rates two years ago had a significant impact on portfolio results for the time of the
increase until the present. By performing a regression over two separate time periods, the
analyst would be attempting to prevent which type of misspecification?

A) Incorrectly pooling data.


B) Inappropriate variable scaling.
C) Inappropriate variable form.

Explanation

The relationship between returns and the dependent variables can change over time, so it is
critical that the data be pooled correctly. Running the regression for multiple sub-periods (in
this case two) rather than one time period can produce more accurate results.

(Module 1.3, LOS 1.g)

Question #18 - 20 of 144 Question ID: 1471907

Concerning the assumptions of multiple regression, Grimbles is:

A) correct to agree with Voiku’s list of assumptions.


B) incorrect to agree with Voiku’s list of assumptions because one of the assumptions is stated incorrectly.
C) incorrect to agree with Voiku’s list of assumptions because two of the assumptions are stated incorrectly.

Explanation

Assumption 2 is stated incorrectly. Some correlation between independent variables is unavoidable, and high correlation results in multicollinearity. However, an exact linear relationship between two or more independent variables (or linear combinations of them) should not exist.

Assumption 4 is also stated incorrectly. The assumption is that the residuals are serially uncorrelated (i.e., they are not serially correlated).

(Module 1.1, LOS 1.b)

Question #19 - 20 of 144 Question ID: 1471909

The most appropriate decision with regard to the F-statistic for testing the null hypothesis that
all of the independent variables are simultaneously equal to zero at the 5 percent significance
level is to:

A) reject the null hypothesis because the F-statistic is larger than the critical F-value of 2.66.
B) reject the null hypothesis because the F-statistic is larger than the critical F-value of 3.19.
C) fail to reject the null hypothesis because the F-statistic is smaller than the critical F-value of 2.66.

Explanation

RSS = 368.7 – 140.3 = 228.4, F-statistic = (228.4 / 3) / (140.3 / 176) = 95.51. The critical value for
a one-tailed 5% F-test with 3 and 176 degrees of freedom is 2.66. Because the F-statistic is
greater than the critical F-value, the null hypothesis that all of the independent variables are
simultaneously equal to zero should be rejected.

(Module 1.1, LOS 1.b)
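The F-statistic computation can be sketched as follows (sums of squares taken from the explanation; n = 180 is an inference from the 176 error degrees of freedom with k = 3):

```python
# F = (RSS / k) / (SSE / (n - k - 1)) with SST = 368.7, SSE = 140.3.
sst, sse = 368.7, 140.3
k = 3
df_error = 176                 # n - k - 1, implying n = 180
rss = sst - sse                # regression sum of squares = 228.4
f_stat = (rss / k) / (sse / df_error)
f_crit = 2.66                  # one-tailed 5%, df = (3, 176)
reject_null = f_stat > f_crit
print(round(f_stat, 2), reject_null)
```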

Question #20 - 20 of 144 Question ID: 1585996

The multiple regression, as specified, most likely suffers from:


A) heteroskedasticity.
B) multicollinearity.
C) serial correlation of the error terms.

Explanation

The regression is highly significant (based on the F-stat in Part 3), but the individual
coefficients are not (all p-values > 0.05). This is a result of a regression with significant
multicollinearity problems.

(Module 1.1, LOS 1.b)

Question #21 of 144 Question ID: 1586002

Which of the following statements regarding the R2 is least accurate?

A) The R2 of a regression will be greater than or equal to the adjusted-R2 for the same regression.

B) R2 is the coefficient of determination of the regression.

C) The R2 is the ratio of the unexplained variation to the explained variation of the dependent variable.

Explanation

The R2 is the ratio of the explained variation to the total variation.

(Module 1.2, LOS 1.d)

Question #22 - 26 of 144 Question ID: 1472050

Using the regression model developed, the closest prediction of sales for December 20X6 is:

A) $36,000.
B) $55,000.
C) $44,000.

Explanation
1.76 + 0.23 × (150) – 0.08 × (7.5) = 35.66.

(Module 1.2, LOS 1.f)

Question #23 - 26 of 144 Question ID: 1489312

Will Jack conclude that the housing starts coefficient is statistically different from zero and how
will he interpret it at the 5% significance level?

A) Different from zero; sales will rise by $100 for every 23 house starts.
B) Not different from zero; sales will rise by $0 for every 100 house starts.
C) Different from zero; sales will rise by $23 for every 100 house starts.

Explanation

A p-value (0.017) below significance (0.05) indicates a variable that is statistically different from zero. The coefficient of 0.23 indicates that sales will rise by $23 for every 100 house starts. Remember the rule: if the p-value is less than the significance level, reject the null.

(Module 1.1, LOS 1.b)

Question #24 - 26 of 144 Question ID: 1479916

In this multiple regression, the F-statistic indicates the:

A) joint significance of the independent variables.


B) degree of correlation between the independent variables.
C) deviation of the estimated values from the actual values of the dependent variable.

Explanation

The F-statistic is for the general linear F-test of the null hypothesis that the slope coefficients on all variables are simultaneously equal to zero.

(Module 1.2, LOS 1.e)


Question #25 - 26 of 144 Question ID: 1472054

The regression statistics indicate that for the period under study, the independent variables
(housing starts, mortgage interest rate) together explain approximately what percentage of the
variation in the dependent variable (sales)?

A) 67.00.
B) 9.80.
C) 77.00.

Explanation

The question is asking for the coefficient of determination.

(Module 1.2, LOS 1.d)

Question #26 - 26 of 144 Question ID: 1484386

For this question only, assume that the regression of squared residuals on the independent
variables has R2 = 11%. At a 5% level of significance, which of the following conclusions is most
accurate?

A) Because the critical value is 3.84, we reject the null hypothesis of no conditional heteroskedasticity.
B) With a test statistic of 13.53, we can conclude the presence of conditional heteroskedasticity.
C) With a test statistic of 0.22, we cannot reject the null hypothesis of no conditional heteroskedasticity.

Explanation

Chi-square = n × R2 = 123 × 0.11 = 13.53. Critical chi-square (degrees of freedom = k = 2) = 5.99. Because the test statistic exceeds the critical value, we reject the null hypothesis (of no conditional heteroskedasticity).

(Module 1.3, LOS 1.h)


Question #27 - 31 of 144 Question ID: 1636800

Which model would be a better choice for making a forecast?

A) Model TWO because it has a higher adjusted R2.

B) Model TWO because serial correlation is not a problem.

C) Model ONE because it has a higher R2.

Explanation

Model TWO has a higher adjusted R2 and thus would produce the more reliable estimates. As
is always the case when a variable is removed, R2 for Model TWO is lower. The increase in
adjusted R2 indicates that the removed variable, Q3, has very little explanatory power, and
removing it should improve the accuracy of the estimates.

(Module 1.2, LOS 1.d)

Question #28 - 31 of 144 Question ID: 1479951

Using Model ONE, what is the sales forecast for the second quarter of the next year?

A) $51.09 million.
B) $46.31 million.
C) $56.02 million.

Explanation

The estimate for the second quarter of the following year would be (in millions):

31.4083 + (−2.4631) + (24 + 2) × 0.851786 = 51.091666.

(Module 1.2, LOS 1.f)
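The arithmetic behind this forecast, as a sketch (coefficients copied from the explanation; the trend observation for the second quarter of the next year is 24 + 2 = 26):

```python
# Model ONE forecast: intercept + Q2 dummy coefficient + TREND coefficient * trend.
intercept = 31.4083
q2_dummy = -2.4631             # seasonal dummy for the second quarter
trend_coef = 0.851786
trend = 24 + 2                 # 26th trend observation
forecast_millions = intercept + q2_dummy + trend * trend_coef
print(round(forecast_millions, 2))   # approximately 51.09
```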

Question #29 - 31 of 144 Question ID: 1479952

Which model misspecification is most likely to cause multicollinearity?

A) Inappropriate variable form.


B) Omission of important variable(s).
C) Inappropriate variable scaling.

Explanation

Inappropriate variable scaling may lead to multicollinearity or heteroskedasticity in residuals. Omission of important variables may lead to biased and inconsistent regression parameters and also heteroskedasticity/serial correlation in residuals. Inappropriate variable form can lead to heteroskedasticity in residuals.

(Module 1.3, LOS 1.g)

Question #30 - 31 of 144 Question ID: 1479953

If it is determined that conditional heteroskedasticity is present in Model ONE, which of the following inferences is most accurate?

A) Regression coefficients will be unbiased but standard errors will be biased.


B) Both the regression coefficients and the standard errors will be biased.
C) Regression coefficients will be biased but standard errors will be unbiased.

Explanation

Presence of conditional heteroskedasticity will not affect the consistency of regression coefficients but will bias the standard errors, leading to incorrect application of t-tests for statistical significance of regression parameters.

(Module 1.3, LOS 1.h)

Question #31 - 31 of 144 Question ID: 1479955

If Mercado determines that Model TWO is the appropriate specification, then he is essentially saying that for each year, the value of sales from quarter three to four is expected to:

A) grow by more than $1,000,000.


B) remain approximately the same.
C) grow, but by less than $1,000,000.

Explanation
The specification of Model TWO essentially assumes there is no difference attributed to the change of the season from the third to fourth quarter. However, the time trend is significant. The trend effect for moving from one season to the next is the coefficient on TREND times $1,000,000, which is $852,182 for Model TWO.

(Module 1.1, LOS 1.b)

Question #32 - 34 of 144 Question ID: 1710719

What is the correct interpretation of the coefficient of closed in the first regression?

A) If a fund is closed to new investors, the expected excess fund return is 1.65%.
B) A closed fund is likely to generate a return of 1.65%.
C) A closed fund is estimated to have an extra return of 1.65% relative to funds that are not closed.

Explanation

The coefficient on a dummy variable is interpreted as the extra return relative to the alternative (omitted) outcome.

(Module 1.4, LOS 1.l)

Question #33 - 34 of 144 Question ID: 1710720

To check for only the outliers in the sample, Lee should most appropriately use:

A) Studentized residuals.
B) leverage.
C) Breusch-Pagan statistic.

Explanation
Outliers are extreme observations of the dependent variable. Studentized residuals are used to identify outliers. Leverage is used to identify high-leverage observations (extreme values of the independent variables). The Breusch-Pagan test statistic is used to identify conditional heteroskedasticity in residuals.

(Module 1.4, LOS 1.k)

Question #34 - 34 of 144 Question ID: 1710721

Which of the following is the least accurate statement about logit models?

A) Logistic regression (logit) models use log odds as the dependent variable.
B) A logit model assumes that residuals have a normal distribution.
C) The coefficients of the logit model are estimated using the maximum likelihood estimation methodology.

Explanation

A logit model assumes that residuals have a logistic distribution, which is similar to a normal
distribution but with fatter tails. The other statements are correct.

(Module 1.4, LOS 1.m)

Question #35 - 38 of 144 Question ID: 1501587

Regarding Sophie's statements on multiple regression:

A) only Statement 1 is correct.


B) only Statement 2 is correct.
C) both statements are correct.

Explanation
Multiple regression models can be used to identify relations between variables, forecast the dependent variable, and test existing theories. Statement 1 is inaccurate because it refers to forecasting independent (and not dependent) variables.
(Module 1.1, LOS 1.a)

Question #36 - 38 of 144 Question ID: 1501588

Based on the credit spread model, if an issuer gets included in the CDX index and assuming
everything else the same, which of the following statements most accurately describes the
model's forecast?

A) The credit spread on the firm’s issue would decrease by 10 bps.


B) The credit spread on the firm’s issue will increase by 32 bps.
C) The credit spread on the firm’s issue will decrease by 32 bps.

Explanation

The coefficient on the index dummy variable is –0.32, and if the variable takes a value of 1
(inclusion in the index), the credit spread would decrease by 0.32%, or 32 bps.

(Module 1.1, LOS 1.b)

Question #37 - 38 of 144 Question ID: 1501589

Which of the following is least likely an assumption of multiple linear regression?

A) The dependent variable is not serially correlated.


B) There is no linear relationship between the independent variables.
C) The error term is normally distributed.

Explanation
The assumption calls for the residuals (or errors) to not be serially correlated. The dependent variable can have serial correlation. The other assumptions are accurate.

(Module 1.1, LOS 1.c)

Question #38 - 38 of 144 Question ID: 1501590

Which assumption of multiple regression is most likely evaluated using a QQ plot?

A) Serial correlation of residuals.


B) Conditional heteroskedasticity.
C) Error term is normally distributed.

Explanation

A normal QQ plot of the residuals can visually indicate violation of the assumption that the
residuals are normally distributed.

(Module 1.1, LOS 1.c)

Question #39 of 144 Question ID: 1479867

Jason Fye, CFA, wants to check for seasonality in monthly stock returns (i.e., the January effect)
after controlling for market cap and systematic risk. The type of model that Fye would most
appropriately select is:

A) neither multiple regression nor logistic regression.
B) multiple regression model.
C) logistic regression model.

Explanation
Fye wants to test a theory of January effect on stock returns (dependent variable) using a
dummy (January = 1, other months = 0), market cap, and beta (independent variables). A
multiple regression model would be most appropriate. Because the dependent variable
(stock returns) is not a qualitative variable, a logistic regression would not apply.

(Module 1.1, LOS 1.a)

Question #40 of 144 Question ID: 1630876

One choice a researcher can use to test for nonstationarity is to use a:

A) Breusch-Pagan test, which uses a modified t-statistic.


B) Dickey-Fuller test, which uses a modified t-statistic.

C) Dickey-Fuller test, which uses a modified χ2 statistic.

Explanation

The Dickey-Fuller test estimates the equation (xt – xt−1) = b0 + (b1 – 1)xt−1 + et and tests H0: (b1 – 1) = 0. Using a modified t-test, if (b1 – 1) is found to be not significantly different from zero, then it is concluded that b1 must be equal to 1.0 and the series has a unit root.

(Module 1.3, LOS 1.h)

Question #41 - 45 of 144 Question ID: 1471970

According to the model and the data for the Chicago metropolitan area, the forecast of
generator sales is:

A) $65 million above the average.


B) $35.2 million above the average.
C) $55 million above average.

Explanation

The model uses a multiple regression equation to predict sales by multiplying the estimated
coefficient by the observed value to get:

[5 + (2 × 0.10) + (3 × 5) + (10 × 3) + (5 × (−3))] × $1,000,000 = $35.2 million.

(Module 1.2, LOS 1.e)


Question #42 - 45 of 144 Question ID: 1479876

Williams proceeds to test the hypothesis that none of the independent variables has significant
explanatory power. Using the joint F-test for the significance of all slope coefficients, at a 5%
level of significance:

A) all of the independent variables have explanatory power.


B) none of the independent variables has explanatory power.
C) at least one of the independent variables has explanatory power.

Explanation

From the ANOVA table, the calculated F-statistic is (mean square regression / mean square
error) = (83.80 / 28.88) = 2.9017. From the F distribution table (4 df numerator, 21 df
denominator) the critical F value is 2.84. Because 2.9017 is greater than 2.84, Williams rejects
the null hypothesis and concludes that at least one of the independent variables has
explanatory power.

(Module 1.2, LOS 1.e)

Question #43 - 45 of 144 Question ID: 1479877

With respect to testing the validity of the model's results, Williams may wish to perform:

A) both a Breusch-Godfrey test and a Breusch-Pagan test.


B) a Breusch-Pagan test, but not Breusch-Godfrey.
C) a Breusch-Godfrey test, but not a Breusch-Pagan test.

Explanation

Since the model utilized is not an autoregressive time series, a test for serial correlation is appropriate, so the Breusch-Godfrey test would be used. The Breusch-Pagan test for heteroskedasticity would also be a good idea.

(Module 1.2, LOS 1.e)


Question #44 - 45 of 144 Question ID: 1471974

When Williams ran the model, the computer said the R2 is 0.233. She examines the other
output and concludes that this is the:

A) neither the unadjusted nor adjusted R2 value, nor the coefficient of correlation.

B) unadjusted R2 value.

C) adjusted R2 value.

Explanation

This can be answered by recognizing that the unadjusted R-square is (335.2 / 941.6) = 0.356.
Thus, the reported value must be the adjusted R2. To verify this we see that the adjusted R-
squared is: 1− ((26 − 1) / (26 − 4 − 1)) × (1 − 0.356) = 0.233. Note that whenever there is more
than one independent variable, the adjusted R2 will always be less than R2.

(Module 1.2, LOS 1.e)
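A short sketch of the verification in this explanation (sums of squares taken from the ANOVA output quoted there; RSS here is the regression, i.e. explained, sum of squares, following the document's usage):

```python
# Unadjusted R^2 = RSS / SST; adjusted R^2 applies the (n-1)/(n-k-1) penalty.
rss, sst = 335.2, 941.6
n, k = 26, 4
r2 = rss / sst
adj_r2 = 1 - ((n - 1) / (n - k - 1)) * (1 - r2)
print(round(r2, 3), round(adj_r2, 3))
```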

Question #45 - 45 of 144 Question ID: 1471975

In preparing and using this model, Williams has least likely relied on which of the following
assumptions?

A) The residuals are homoskedastic.


B) The disturbance or error term is normally distributed.
C) There is a linear relationship between the independent variables.

Explanation

Multiple regression models assume that there is no linear relationship between two or more
of the independent variables. The other answer choices are both assumptions of multiple
regression.

(Module 1.2, LOS 1.e)

Question #46 of 144 Question ID: 1479901


A multiple regression model has included independent variables that are not linearly related to
the dependent variable. The model is most likely misspecified due to:

A) incorrect data pooling.


B) incorrect variable form.
C) incorrect variable scaling.

Explanation

Incorrect variable form misspecification occurs if the relationship between dependent and
independent variables is nonlinear.

(Module 1.3, LOS 1.g)

Question #47 of 144 Question ID: 1471870

Consider the following regression equation:

Salesi = 10.0 + 1.25 R&Di + 1.0 ADVi – 2.0 COMPi + 8.0 CAPi

where Sales is dollar sales in millions, R&D is research and development expenditures
in millions, ADV is dollar amount spent on advertising in millions, COMP is the
number of competitors in the industry, and CAP is the capital expenditures for the
period in millions of dollars.

Which of the following is NOT a correct interpretation of this regression information?

A) If R&D and advertising expenditures are $1 million each, there are 5 competitors, and capital expenditures are $2 million, expected Sales are $8.25 million.
B) If a company spends $1 million more on capital expenditures (holding everything else constant), Sales are expected to increase by $8.0 million.
C) One more competitor will mean $2 million less in Sales (holding everything else constant).

Explanation

Predicted sales = 10 + 1.25(1) + 1.0(1) – 2.0(5) + 8.0(2) = 10 + 1.25 + 1 – 10 + 16 = $18.25 million, not $8.25 million, so choice A is the incorrect interpretation.

(Module 1.1, LOS 1.b)


Question #48 of 144 Question ID: 1479883

One possible problem that could jeopardize the validity of the employment growth rate model
is multicollinearity. Which of the following would most likely suggest the existence of
multicollinearity?

A) The variance of the observations has increased over time.
B) The Durbin–Watson statistic is significant.
C) The F-statistic suggests that the overall regression is significant; however, the regression coefficients are not individually significant.

Explanation

One symptom of multicollinearity is that the regression coefficients may not be individually statistically significant even though, according to the F-statistic, the overall regression is significant. The problem of multicollinearity involves the existence of high correlation between two or more independent variables. Clearly, as service employment rises, construction employment must rise to facilitate the growth in these sectors. Alternatively, as manufacturing employment rises, the service sector must grow to serve the broader manufacturing sector.

Increasing variance of the observations over time suggests the possible existence of heteroskedasticity, and the Durbin–Watson statistic may be used to test for serial correlation at a single lag.

(Module 1.2, LOS 1.f)

Question #49 of 144 Question ID: 1586005

Consider the following estimated regression equation:

Salesi = 10.0 + 1.25 R&Di + 1.0 ADVi − 2.0 COMPi + 8.0 CAPi

Sales are in millions of dollars. An analyst is given the following predictions on the independent
variables: R&D = 5, ADV = 4, COMP = 10, and CAP = 40.

The predicted level of sales is closest to:

A) $310.25 million.
B) $300.25 million.
C) $320.25 million.

Explanation
Predicted sales

= $10 + 1.25 (5) + 1.0 (4) −2.0 (10) + 8 (40)

= 10 + 6.25 + 4 − 20 + 320 = $320.25

(Module 1.2, LOS 1.f)
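The plug-in calculation above can be sketched in a few lines of Python. The coefficients and predicted input values come from the question; the dictionary layout is just one illustrative choice:

```python
# Coefficients of the estimated equation:
# Sales = 10.0 + 1.25 R&D + 1.0 ADV - 2.0 COMP + 8.0 CAP
intercept = 10.0
coefs = {"R&D": 1.25, "ADV": 1.0, "COMP": -2.0, "CAP": 8.0}

# Analyst's predictions for the independent variables
inputs = {"R&D": 5, "ADV": 4, "COMP": 10, "CAP": 40}

# Predicted sales (in $ millions)
predicted_sales = intercept + sum(coefs[k] * inputs[k] for k in coefs)
print(predicted_sales)  # 320.25
```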

Question #50 of 144 Question ID: 1479902

When pooling the samples over multiple economic environments in a multiple regression
model, which of the following errors is most likely to occur?

A) Model misspecification.
B) Heteroskedasticity.
C) Multicollinearity.

Explanation

When data are improperly pooled over multiple economic environments in a multiple
regression analysis, the model would be misspecified.

(Module 1.3, LOS 1.g)

Question #51 of 144 Question ID: 1472067

An analyst runs a regression of portfolio returns on three independent variables. These


independent variables are price-to-sales (P/S), price-to-cash flow (P/CF), and price-to-book (P/B).
The analyst discovers that the p-values for each independent variable are relatively
high. However, the F-test has a very small p-value. The analyst is puzzled and tries to figure out
how the F-test can be statistically significant when the individual independent variables are not
significant. What violation of regression analysis has occurred?

A) Multicollinearity.
B) Conditional heteroskedasticity.
C) Serial correlation.

Explanation
An indication of multicollinearity is when the independent variables individually are not
statistically significant but the F-test suggests that the variables as a whole do an excellent
job of explaining the variation in the dependent variable.

(Module 1.3, LOS 1.j)

Question #52 of 144 Question ID: 1472012

Which of the following statements regarding heteroskedasticity is least accurate?

A) Heteroskedasticity may occur in cross-sectional or time-series analyses.


Heteroskedasticity results in an estimated variance that is too small and, therefore,
B)
affects statistical inference.
C) The assumption of linear regression is that the residuals are heteroskedastic.

Explanation

The assumption of regression is that the residuals are homoskedastic (i.e., the residuals are
drawn from the same distribution).

(Module 1.3, LOS 1.h)

Question #53 of 144 Question ID: 1586003

May Jones estimated a regression that produced the following analysis of variance (ANOVA)
table:

Source Sum of squares Degrees of freedom Mean square

Regression 20 1 20

Error 80 40 2

Total 100 41

The values of R2 and the F-statistic for joint test of significance of all the slope coefficients are:

A) R2 = 0.25 and F = 0.909.

B) R2 = 0.20 and F = 10.


C) R2 = 0.25 and F = 10.

Explanation

R2 = RSS / SST = 20 / 100 = 0.20

The F-statistic is equal to the ratio of the mean squared regression to the mean squared
error.

F = 20 / 2 = 10

(Module 1.2, LOS 1.e)
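Recovering R2 and the F-statistic from an ANOVA table is pure arithmetic, as this short sketch (using the table's entries) shows:

```python
# ANOVA table entries from the question
rss = 20.0          # regression sum of squares
sse = 80.0          # error sum of squares
df_regression = 1
df_error = 40

sst = rss + sse                                    # total sum of squares = 100
r_squared = rss / sst                              # 0.20
f_stat = (rss / df_regression) / (sse / df_error)  # MSR / MSE = 20 / 2 = 10
```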

Question #54 - 56 of 144 Question ID: 1479936

The predicted price of a house that has 2,000 square feet of space and 4 bedrooms is closest
to:

A) $114,000.
B) $256,000.
C) $185,000.

Explanation

66,500 + 74.30(2,000) + 10,306(4) = $256,324

(Module 1.2, LOS 1.f)

Question #55 - 56 of 144 Question ID: 1479937

The conclusion from the hypothesis test of H0: b1 = b2 = 0, is that the null hypothesis should:

A) not be rejected as the calculated F of 40.73 is greater than the critical value of 3.29.
B) be rejected as the calculated F of 40.73 is greater than the critical value of 3.33.
C) be rejected as the calculated F of 40.73 is greater than the critical value of 3.29.

Explanation
We can reject the null hypothesis that the coefficients of both independent variables equal 0. Degrees of freedom in the numerator is 2, equal to the number of independent variables; degrees of freedom for the denominator is 32 − (2 + 1) = 29. The critical value is therefore F(2,29) = 3.33. Because the calculated F of 40.73 is greater than the critical value of 3.33, the null hypothesis should be rejected.

(Module 1.2, LOS 1.e)

Question #56 - 56 of 144 Question ID: 1479938

Which of the following is most likely to present a problem in using this regression for
forecasting?

A) Heteroskedasticity.
B) Multicollinearity.
C) Autocorrelation.

Explanation

Multicollinearity is present in a regression model when some linear combination of the independent variables is highly correlated. We are told that the two independent variables in this question are highly correlated. We also recognize that unconditional heteroskedasticity is present, but this would not pose any major problems in using this model for forecasting. No information is given about autocorrelation in residuals, but this is generally a concern with time series data (in this case, the model uses cross-sectional data).

(Module 1.3, LOS 1.j)

Question #57 of 144 Question ID: 1471872


Henry Hilton, CFA, is undertaking an analysis of the bicycle industry. He hypothesizes that
bicycle sales (SALES) are a function of three factors: the population under 20 (POP), the level of
disposable income (INCOME), and the number of dollars spent on advertising (ADV). All data are
measured in millions of units. Hilton gathers data for the last 20 years and estimates the
following equation (standard errors in parentheses):

SALES = α + 0.004 POP + 1.031 INCOME + 2.002 ADV

(0.005) (0.337) (2.312)

The critical t-statistic for a 95% confidence level is 2.120. Which of the independent variables is
statistically different from zero at the 95% confidence level?

A) INCOME and ADV.


B) ADV only.
C) INCOME only.

Explanation

The calculated test statistic is coefficient/standard error. Hence, the t-stats are 0.8 for POP,
3.059 for INCOME, and 0.866 for ADV. Since the t-stat for INCOME is the only one greater
than the critical t-value of 2.120, only INCOME is significantly different from zero.

(Module 1.1, LOS 1.b)
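The screening step (t-statistic = coefficient / standard error, compared against the critical value) can be sketched as follows; the coefficients and standard errors are taken from the question:

```python
# Estimated coefficients and their standard errors
coefs = {"POP": 0.004, "INCOME": 1.031, "ADV": 2.002}
std_errs = {"POP": 0.005, "INCOME": 0.337, "ADV": 2.312}

# t-statistic for each variable
t_stats = {name: coefs[name] / std_errs[name] for name in coefs}

critical_t = 2.120  # 95% confidence level, 16 degrees of freedom

# Variables whose |t| exceeds the critical value
significant = [name for name, t in t_stats.items() if abs(t) > critical_t]
print(significant)  # ['INCOME']
```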

Question #58 of 144 Question ID: 1489310

Jacob Warner, CFA, is evaluating a regression analysis recently published in a trade journal that
hypothesizes that the annual performance of the S&P 500 stock index can be explained by
movements in the Federal Funds rate and the U.S. Producer Price Index (PPI). Which of the
following statements regarding his analysis is most accurate?

If the p-value of a variable is less than the significance level, the null hypothesis can
A)
be rejected.
If the t-value of a variable is less than the significance level, the null hypothesis
B)
should be rejected.
If the p-value of a variable is less than the significance level, the null hypothesis
C)
cannot be rejected.

Explanation
The p-value is the smallest level of significance for which the null hypothesis can be rejected.
Therefore, for any given variable, if the p-value of a variable is less than the significance level,
the null hypothesis can be rejected and the variable is considered to be statistically
significant.

(Module 1.1, LOS 1.b)

Question #59 of 144 Question ID: 1472026

Which of the following statements regarding serial correlation that might be encountered in
regression analysis is least accurate?

A) Serial correlation occurs least often with time series data.


B) Serial correlation does not affect consistency of regression coefficients.
C) Positive serial correlation and heteroskedasticity can both lead to Type I errors.

Explanation

Serial correlation, which is sometimes referred to as autocorrelation, occurs when the


residual terms are correlated with one another, and is most frequently encountered with
time series data. Positive serial correlation can lead to standard errors that are too small,
which will cause computed t-statistics to be larger than they should be, which will lead to too
many Type I errors (i.e. the rejection of the null hypothesis when it is actually true). Serial
correlation however does not affect the consistency of the regression coefficients.

(Module 1.3, LOS 1.h)

Question #60 of 144 Question ID: 1479923

Assume that in a particular multiple regression model, it is determined that the error terms are
uncorrelated with each other. Which of the following statements is most accurate?

Serial correlation may be present in this multiple regression model, and can be
A)
confirmed only through a Durbin-Watson test.
This model is in accordance with the basic assumptions of multiple regression
B)
analysis because the errors are not serially correlated.
Unconditional heteroskedasticity present in this model should not pose a problem,
C)
but can be corrected by using robust standard errors.
Explanation

One of the basic assumptions of multiple regression analysis is that the error terms are not
correlated with each other. In other words, the error terms are not serially correlated.
Multicollinearity and heteroskedasticity are problems in multiple regression that are not
related to the correlation of the error terms.

(Module 1.3, LOS 1.i)

Question #61 of 144 Question ID: 1471980

An analyst runs a regression of monthly value-stock returns on five independent variables over
48 months. The total sum of squares is 430, and the sum of squared errors is 170. Test the null
hypothesis at the 2.5% and 5% significance level that all five of the independent variables are
equal to zero.

A) Rejected at 5% significance only.


B) Rejected at 2.5% significance and 5% significance.
C) Not rejected at 2.5% or 5.0% significance.

Explanation

The F-statistic is equal to the ratio of the mean squared regression (MSR) to the mean
squared error (MSE).

RSS = SST – SSE = 430 – 170 = 260

MSR = 260 / 5 = 52

MSE = 170 / (48 – 5 – 1) = 4.05

F = 52 / 4.05 = 12.84

The critical F-value for 5 and 42 degrees of freedom at a 5% significance level is


approximately 2.44. The critical F-value for 5 and 42 degrees of freedom at a 2.5%
significance level is approximately 2.89. Therefore, we can reject the null hypothesis at either
level of significance and conclude that at least one of the five independent variables explains
a significant portion of the variation of the dependent variable.

(Module 1.2, LOS 1.e)
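The arithmetic of this joint F-test can be checked with a short script. The critical values 2.44 and 2.89 are the table values quoted in the explanation, not computed here:

```python
sst, sse = 430.0, 170.0
n, k = 48, 5

rss = sst - sse          # regression sum of squares = 260
msr = rss / k            # mean squared regression = 52
mse = sse / (n - k - 1)  # mean squared error = 170 / 42
f_stat = msr / mse       # about 12.85

# Table values quoted in the explanation (assumed, not computed)
reject_at_5pct = f_stat > 2.44
reject_at_2_5pct = f_stat > 2.89
```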

Question #62 of 144 Question ID: 1479874


Wilson estimated a regression that produced the following analysis of variance (ANOVA) table:

Source Sum of squares Degrees of freedom Mean square

Regression 100 1 100.0

Error 300 40 7.5

Total 400 41

The values of R2 and the F-statistic to test the null hypothesis that slope coefficients on all
variables are equal to zero are:

A) R2 = 0.20 and F = 13.333.

B) R2 = 0.25 and F = 0.930.

C) R2 = 0.25 and F = 13.333.

Explanation

R2 = RSS / SST = 100 / 400 = 0.25

The F-statistic is equal to the ratio of the mean squared regression to the mean squared
error.

F = 100 / 7.5 = 13.333

(Module 1.2, LOS 1.e)

Question #63 - 65 of 144 Question ID: 1508634

Which of the following tests is least likely to be used to detect autocorrelation?

A) Durbin-Watson.
B) Breusch-Godfrey.
C) Breusch-Pagan.

Explanation

The Durbin-Watson and Breusch-Godfrey test statistics are used to detect autocorrelation. The Breusch-Pagan test is used to detect heteroskedasticity.

(Module 1.3, LOS 1.i)


Question #64 - 65 of 144 Question ID: 1472023

One of the most popular ways to correct heteroskedasticity is to:

A) improve the specification of the model.


B) adjust the standard errors.
C) use robust standard errors.

Explanation

Using generalized least squares and calculating robust standard errors are possible remedies for heteroskedasticity. Improving the model specification is a remedy for serial correlation. Standard errors are not simply "adjusted"; rather, robust (White-corrected) standard errors are computed in place of the biased ones.

(Module 1.3, LOS 1.h)

Question #65 - 65 of 144 Question ID: 1479933

If a regression equation shows that no individual t-tests are significant, but the F-statistic is
significant, the regression probably exhibits:

A) serial correlation.
B) multicollinearity.
C) heteroskedasticity.

Explanation

Common indicators of multicollinearity include: high correlation (>0.7) between independent


variables, no individual t-tests are significant but the F-statistic is, and signs on the
coefficients that are opposite of what is expected.

(Module 1.3, LOS 1.j)


Question #66 of 144 Question ID: 1472073

A fund has changed managers twice during the past 10 years. An analyst wishes to measure
whether either of the changes in managers has had an impact on performance. R is the return
on the fund, and M is the return on a market index. Which of the following regression
equations can appropriately measure the desired impacts?

A) The desired impact cannot be measured.


R = a + bM + c1D1 + c2D2 + ε, where D1 = 1 if the return is from the first manager,
B)
and D2 = 1 if the return is from the third manager.

R = a + bM + c1D1 + c2D2 + c3D3 + ε, where D1 = 1 if the return is from the first


C) manager, and D2 = 1 if the return is from the second manager, and D3 = 1 is the
return is from the third manager.

Explanation

The effect needs to be measured by two distinct dummy variables. The use of three variables
will cause collinearity, and the use of one dummy variable will not appropriately specify the
manager impact.

(Module 1.4, LOS 1.l)
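The two-dummy encoding can be sketched as below; the manager labels and the sample series are hypothetical:

```python
# Manager in charge for each period: three regimes, so two dummies,
# with manager 3 as the base case absorbed by the intercept.
managers = [1, 1, 1, 2, 2, 3, 3, 3]

d1 = [1 if m == 1 else 0 for m in managers]
d2 = [1 if m == 2 else 0 for m in managers]

# Adding a third dummy d3 would make d1 + d2 + d3 = 1 for every
# observation, which is perfectly collinear with the intercept.
```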

Question #67 of 144 Question ID: 1471869

Consider the following regression equation:

Salesi = 20.5 + 1.5 R&Di + 2.5 ADVi – 3.0 COMPi

where Sales is dollar sales in millions, R&D is research and development expenditures
in millions, ADV is dollar amount spent on advertising in millions, and COMP is the
number of competitors in the industry.

Which of the following is NOT a correct interpretation of this regression information?

If R&D and advertising expenditures are $1 million each and there are 5
A)
competitors, expected sales are $9.5 million.
One more competitor will mean $3 million less in sales (holding everything else
B)
constant).
If a company spends $1 more on R&D (holding everything else constant), sales are
C)
expected to increase by $1.5 million.

Explanation

If a company spends $1 million more on R&D (holding everything else constant), sales are
expected to increase by $1.5 million. Always be aware of the units of measure for the
different variables.

(Module 1.1, LOS 1.b)

Question #68 of 144 Question ID: 1479922

Which of the following is a potential remedy for multicollinearity?

A) Add dummy variables to the regression.


B) Take first differences of the dependent variable.
C) Omit one or more of the collinear variables.

Explanation

First differencing is not a remedy for collinearity, nor is the inclusion of dummy variables. The best potential remedy is to attempt to eliminate highly correlated variables.

(Module 1.3, LOS 1.i)

Question #69 of 144 Question ID: 1586007

Which of the following conditions will least likely affect the statistical inference about
regression parameters by itself?

A) Multicollinearity.
B) Unconditional heteroskedasticity.
C) Model misspecification.

Explanation
Unconditional heteroskedasticity does not impact the statistical inference concerning the
parameters. Misspecified models have inconsistent and biased regression parameters.
Multicollinearity results in unreliable estimates of regression parameters.

(Module 1.3, LOS 1.h)

Question #70 - 73 of 144 Question ID: 1685256

What is most likely represented by the Y intercept of the regression?

A) The drift of a random walk.


B) The return on a particular trading day.
C) The intercept is not a driver of returns, only the independent variables.

Explanation

The omitted class is represented by the intercept. So, if we have four dummy variables to represent Monday through Thursday, the intercept would represent returns on Friday. Remember that when we want to distinguish between n classes, we always use one less dummy variable than the number of classes (n − 1).

(Module 1.2, LOS 1.e)

Question #71 - 73 of 144 Question ID: 1479957

What can be said of the overall explanatory power of the model at the 5% significance?

A) There is no value to calendar trading.


B) There is value to calendar trading.
The coefficient of determination for the above regression is significantly higher
C) than the standard error of the estimate, and therefore there is value to calendar
trading.

Explanation

This question calls for a computation of the F-stat for all independent variables jointly. F = (0.0039 / 4) / (0.9534 / (780 − 4 − 1)) = 0.79. The critical F is somewhere between 2.37 and 2.45, so we fail to reject the null that the coefficients are equal to zero.

(Module 1.2, LOS 1.e)


Question #72 - 73 of 144 Question ID: 1472059

The test mentioned by Jessica is known as the:

A) Breusch-Pagan, which is a one-tailed test.


B) Breusch-Pagan, which is a two-tailed test.
C) Durbin-Watson, which is a two-tailed test.

Explanation

The Breusch-Pagan test is used to detect conditional heteroskedasticity, and it is a one-tailed test. This is because we are only concerned about large values of the coefficient of determination from the regression of the squared residuals on the independent variables.

(Module 1.3, LOS 1.h)

Question #73 - 73 of 144 Question ID: 1479958

Are Jessica and her son Jonathan correct in terms of the method used to correct for
heteroskedasticity and the likely effects?

A) Neither is correct.
B) Both are correct.
C) One is correct.

Explanation

Jessica is correct. White-corrected standard errors are also known as robust standard errors.
Jonathan is correct because for financial data, generally, White-corrected errors are higher
than the biased errors leading to lower computed t-statistics and, therefore, less frequent
rejection of the null hypothesis (remember incorrectly rejecting a true null is Type I error).

(Module 1.3, LOS 1.h)

Question #74 - 77 of 144 Question ID: 1501592

The adjusted R2 of Model 2 is closest to:


A) 0.39.
B) 0.37.
C) 0.36.

Explanation

Given n = 120 months, k = 4 (for Model 2), and R2 = 0.39:

Adjusted R2 = 1 − [(n − 1) / (n − k − 1)] × (1 − R2) = 1 − [(120 − 1) / (120 − 4 − 1)] × (1 − 0.39) = 0.37

(Module 1.2, LOS 1.d)
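The adjusted R2 formula above translates directly into code; the inputs n, k, and R2 are those given in the question:

```python
n, k, r2 = 120, 4, 0.39

# Adjusted R2 penalizes additional regressors via the (n - 1)/(n - k - 1) factor
adj_r2 = 1 - ((n - 1) / (n - k - 1)) * (1 - r2)
print(round(adj_r2, 2))  # 0.37
```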

Question #75 - 77 of 144 Question ID: 1501593

The model better suited for prediction is:

A) Model 1 because it has a lower Bayesian information criterion.


B) Model 2 because it has a higher Akaike information criterion.
C) Model 2 because it has a lower Akaike information criterion.

Explanation

The Akaike information criterion (AIC) is used if the goal is to have a better forecast, while the
Bayesian information criterion (BIC) is used if the goal is a better goodness of fit. Lower
values of both criteria indicate a better model. Both criteria are lower for Model 2.

(Module 1.2, LOS 1.d)

Question #76 - 77 of 144 Question ID: 1639814

The F-statistic for testing H0: coefficient of LIQ = 0 versus Ha: coefficient of LIQ ≠ 0 is closest to:

A) 13.53.
B) 5.45.
C) 2.11.

Explanation

F = [(SSER − SSEU) / q] / [SSEU / (n − k − 1)]

where n = 120, k = 4, and q = 1:

F = [(38 − 34) / 1] / [34 / (120 − 4 − 1)] = 13.53

(Module 1.2, LOS 1.d)
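The restricted-versus-unrestricted F-statistic can be verified with a few lines; the sums of squared errors, q, n, and k are those stated in the question:

```python
sse_restricted, sse_unrestricted = 38.0, 34.0
q = 1          # number of restrictions (coefficient of LIQ = 0)
n, k = 120, 4  # k counts the unrestricted model's slope coefficients

f_stat = ((sse_restricted - sse_unrestricted) / q) / (
    sse_unrestricted / (n - k - 1)
)
print(round(f_stat, 2))  # 13.53
```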

Question #77 - 77 of 144 Question ID: 1501595

What is the predicted return for a stock using Model 1 when SMB = 3.30, HML = 1.25 and Rm-Rf
= 5?

A) 6.80%.
B) 7.88%.
C) 9.58%.

Explanation

Model 1:

Return = 1.22 + 0.23 × SMB + 0.34 × HML + 0.88 Rm-Rf

= 1.22 + 0.23 × 3.30 + 0.34 × 1.25 + 0.88 × 5

= 6.80%.

(Module 1.2, LOS 1.d)

Question #78 - 80 of 144 Question ID: 1651804

How many dummy variables should Rathod use?


A) Five.
B) Six.
C) Four.

Explanation

There are 5 trading days in a week, but we should use (n − 1) or 4 dummies in order to ensure
no violations of regression analysis occur.

(Module 1.4, LOS 1.l)

Question #79 - 80 of 144 Question ID: 1651805

What is most likely represented by the intercept of the regression?

A) The return on a particular trading day.


B) The intercept is not a driver of returns, only the independent variables.
C) The drift of a random walk.

Explanation

The omitted variable is represented by the intercept. So, if we have four variables to
represent Monday through Thursday, the intercept would represent returns on Friday.

(Module 1.4, LOS 1.l)

Question #80 - 80 of 144 Question ID: 1651806

Are Jessica and her son Jonathan, correct in terms of the method used to correct for
heteroskedasticity and the likely effects?

A) Both are correct.


B) Neither is correct.
C) One is correct.

Explanation
Jessica is correct. White-corrected standard errors are also known as robust standard errors.
Jonathan is correct because White-corrected errors are higher than the biased errors, leading to lower computed t-statistics and therefore less frequent rejection of the null hypothesis (remember, incorrectly rejecting a true null is a Type I error).

(Module 1.3, LOS 1.h)

Question #81 of 144 Question ID: 1471947

An analyst regresses the return of an S&P 500 index fund against the S&P 500, and also
regresses the return of an active manager against the S&P 500. The analyst uses the last five
years of data in both regressions. Without making any other assumptions, which of the
following is most accurate? The index fund:

regression should have higher sum of squares regression as a ratio to the total
A)
sum of squares.
B) should have a lower coefficient of determination.
C) should have a higher coefficient on the independent variable.

Explanation

The index fund regression should provide a higher R2 than the active manager regression. R2
is the sum of squares regression divided by the total sum of squares.

(Module 1.2, LOS 1.d)

Question #82 of 144 Question ID: 1479949

Suppose the analyst wants to add a dummy variable for whether a person has a business
college degree and an engineering degree. What is the CORRECT representation if a person has
both degrees?

Business Engineering
Degree Dummy Degree Dummy
Variable Variable

A) 0 1
B) 0 0

C) 1 1

Explanation

Assigning a zero to both categories is appropriate for someone with neither degree.
Assigning one to the business category and zero to the engineering category is appropriate
for someone with only a business degree. Assigning zero to the business category and one to
the engineering category is appropriate for someone with only an engineering degree.
Assigning a one to both categories is correct because it reflects the possession of both
degrees.

(Module 1.4, LOS 1.l)
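Because the two degree categories are not mutually exclusive, each gets its own independent dummy. A minimal sketch (the function name is ours, for illustration only):

```python
def encode_degrees(has_business: bool, has_engineering: bool) -> tuple:
    """Return (business dummy, engineering dummy). The categories are
    independent, so both dummies can be 1 at the same time."""
    return (int(has_business), int(has_engineering))

print(encode_degrees(True, True))    # (1, 1) -> both degrees
print(encode_degrees(False, False))  # (0, 0) -> neither degree
```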

Question #83 of 144 Question ID: 1471946

Which of the following statements regarding the R2 is least accurate?

A) The adjusted-R2 is not appropriate to use in simple regression.

It is possible for the adjusted-R2 to decline as more variables are added to the
B)
multiple regression.

C) The adjusted-R2 is greater than the R2 in multiple regression.

Explanation

The adjusted-R2 will always be less than R2 in multiple regression.

(Module 1.2, LOS 1.d)

Question #84 - 87 of 144 Question ID: 1471900

In regard to their conversation about the regression equation:

A) Brent’s statement is correct; Johnson’s statement is correct.


B) Brent’s statement is correct; Johnson’s statement is incorrect.
C) Brent’s statement is incorrect; Johnson’s statement is correct.
Explanation

Expected sales is the dependent variable in the equation, while expenditures for marketing
and salespeople are the independent variables. Therefore, a $1 million increase in marketing
expenditures will increase the dependent variable (expected sales) by $1.6 million. Brent's
statement is incorrect.

Johnson's statement is correct. 12.6 is the intercept in the equation, which means that if all
independent variables are equal to zero, expected sales will be $12.6 million.

(Module 1.1, LOS 1.b)

Question #85 - 87 of 144 Question ID: 1586001

Regarding Brent's Statements 1 and 2:

A) Only Statement 1 is correct.


B) Only Statement 2 is correct.
C) Both statements are correct.

Explanation

Statement 1 is correct. Comparing the formulae for computation of AIC and BIC, because
ln(n) is greater than 2 for even small sample sizes, the BIC metric imposes a higher penalty
for overfitting. Statement 2 is correct. Both AIC and BIC evaluate the quality of model fit
among competing models for the same dependent variable. Lower values indicate a better
model under either criterion. AIC is used if the goal is to have a better forecast, while BIC is used if the goal is a better goodness of fit.

(Module 1.2, LOS 1.d)

Question #86 - 87 of 144 Question ID: 1471904

Assuming that next year's marketing expenditures are $3,500,000 and there are five salespeople, predicted sales for Mega Flowers will be:

A) $11,600,000.
B) $24,000,000.
C) $24,200,000.

Explanation
Using the information provided, expected sales equals 12.6 + (1.6 × 3.5) + (1.2 × 5) = $24.2 million. Remember to check the details; this equation is denominated in millions of dollars.

(Module 1.1, LOS 1.b)

Question #87 - 87 of 144 Question ID: 1471905

Brent would like to further investigate whether at least one of the independent variables can
explain a significant portion of the variation of the dependent variable. Which of the following
methods would be best for Brent to use?

A) The F-statistic.
B) The multiple coefficient of determination.
C) An ANOVA table.

Explanation

To determine whether at least one of the coefficients is statistically significant, the calculated
F-statistic is compared with the critical F-value at the appropriate level of significance.
(Module 1.1, LOS 1.b)

Question #88 of 144 Question ID: 1479921

Alex Wade, CFA, is analyzing the result of a regression analysis comparing the performance of
gold stocks versus a broad equity market index. Wade believes that first lag serial correlation
may be present and, in order to prove his theory, should use which of the following methods to
detect its presence?

A) The Breusch-Pagan test.


B) The Durbin-Watson statistic.
C) The Hansen method.

Explanation
The Durbin-Watson statistic is the most commonly used method for the detection of serial correlation at the first lag, although residual plots can also be utilized. For testing of serial correlation beyond the first lag, we could instead use the Breusch-Godfrey test (which is not one of the answer choices).

(Module 1.3, LOS 1.i)

Question #89 - 91 of 144 Question ID: 1479910

If GDP rises 2.2% and the price of fuels falls $0.15, Baltz's model will predict Company sales to
be (in $ millions) closest to:

A) $82.00.
B) $128.00.
C) $206.00.

Explanation

Sales will be closest to $78 + ($30.22 × 2.2) + [(−412.39) × (−$0.15)] = $206.34 million.

(Module 1.2, LOS 1.f)

Question #90 - 91 of 144 Question ID: 1479911

Baltz proceeds to test the hypothesis that none of the independent variables has significant
explanatory power. He concludes that, at a 5% level of significance:

all of the independent variables have explanatory power, because the calculated F-
A)
statistic exceeds its critical value.
none of the independent variables has explanatory power, because the calculated
B)
F-statistic does not exceed its critical value.
at least one of the independent variables has explanatory power, because the
C)
calculated F-statistic exceeds its critical value.

Explanation
MSE = SSE / [n − (k + 1)] = 132.12 ÷ 27 = 4.89. From the ANOVA table, the calculated F-statistic
is (mean square regression / mean square error) = 145.65 / 4.89 = 29.7853. From the F-
distribution table (2 df numerator, 27 df denominator) the F-critical value may be
interpolated to be 3.36. Because 29.7853 is greater than 3.36, Baltz rejects the null
hypothesis and concludes that at least one of the independent variables has explanatory
power.

(Module 1.2, LOS 1.e)

Question #91 - 91 of 144 Question ID: 1479912

Presence of conditional heteroskedasticity is least likely to affect the:

A) computed F-statistic.
B) coefficient estimates.
C) computed t-statistic.

Explanation

Conditional heteroskedasticity results in consistent coefficient estimates, but it biases


standard errors, affecting the computed t-statistic and F-statistic.

(Module 1.3, LOS 1.h)

Question #92 of 144 Question ID: 1471882

When interpreting the results of a multiple regression analysis, which of the following terms
represents the value of the dependent variable when the independent variables are all equal to
zero?

A) Slope coefficient.
B) p-value.
C) Intercept term.

Explanation
The intercept term is the value of the dependent variable when the independent variables
are set to zero.

(Module 1.1, LOS 1.b)

Question #93 of 144 Question ID: 1479908

An analyst is trying to determine whether fund return performance is persistent. The analyst
divides funds into three groups based on whether their return performance was in the top
third (group 1), middle third (group 2), or bottom third (group 3) during the previous year. The
manager then creates the following equation: R = a + b1D1 + b2D2 + b3D3 + ε, where R is return
premium on the fund (the return minus the return on the S&P 500 benchmark) and Di is equal
to 1 if the fund is in group i. Assuming no other information, this equation will suffer from:

A) Serial correlation.
B) Heteroskedasticity.
C) Multicollinearity.

Explanation

When we use dummy variables, we have to use one less than the states of the world. In this
case, there are three states (groups) possible. We should have used only two dummy
variables. Multicollinearity is a problem in this case. Specifically, a linear combination of
independent variables is perfectly correlated with the intercept: D1 + D2 + D3 = 1 for every observation.

There are too many dummy variables specified, so the equation will suffer from
multicollinearity.

(Module 1.3, LOS 1.h)

Question #94 of 144 Question ID: 1471867


Consider the following estimated regression equation, with calculated t-statistics of the
estimates as indicated:

AUTOt = 10.0 + 1.25 PIt + 1.0 TEENt – 2.0 INSt

with a PI calculated t-statistic of 0.45, a TEEN calculated t-statistic of 2.2, and an INS
calculated t-statistic of 0.63.

The equation was estimated over 40 companies. Using a 5% level of significance, which of the independent variables is significantly different from zero?

A) TEEN only.
B) PI and INS only.
C) PI only.

Explanation

The critical t-values for 40-3-1 = 36 degrees of freedom and a 5% level of significance are ±
2.028. Therefore, only TEEN is statistically significant.

(Module 1.1, LOS 1.b)

Question #95 of 144 Question ID: 1472077

Consider the following model of earnings (EPS) regressed against dummy variables for the
quarters:

EPSt = α + β1Q1t + β2Q2t + β3Q3t

where:

EPSt is a quarterly observation of earnings per share

Q1t takes on a value of 1 if period t is the second quarter, 0 otherwise

Q2t takes on a value of 1 if period t is the third quarter, 0 otherwise

Q3t takes on a value of 1 if period t is the fourth quarter, 0 otherwise

Which of the following statements regarding this model is most accurate? The:
significance of the coefficients cannot be interpreted in the case of dummy
A)
variables.
B) EPS for the first quarter is represented by the residual.
coefficient on each dummy tells us about the difference in earnings per share
C)
between the respective quarter and the one left out (first quarter in this case).

Explanation

The coefficients on the dummy variables indicate the difference in EPS for a given quarter,
relative to the first quarter.

(Module 1.4, LOS 1.l)

Question #96 - 99 of 144 Question ID: 1471895

The percentage of the total variation in quarterly stock returns explained by the independent
variables is closest to:

A) 32%.
B) 47%.
C) 42%.

Explanation

The R2 is the percentage of variation in the dependent variable explained by the independent
variables. The R2 is equal to the SSRegression/SSTotal, where the SSTotal is equal to SSRegression
+ SSError. R2 = 126.00/ (126.00+267.00) = 32%.

(Module 1.1, LOS 1.b)

Question #97 - 99 of 144 Question ID: 1586009

Using a 5% level of significance, there is:

A) evidence of first-lag serial correlation in residuals.


B) evidence of second-lag serial correlation in residuals.
C) no evidence of serial correlation in the residuals.
Explanation

n=160, k=3. For lag 1, p = 1. (n-k-p-1 = 155). Critical F-stat (1, 155) = 3.90. BG Stat for lag 1 is
given as 3.15. We fail to reject the null of no serial correlation in residuals. For lag 2, critical F
(2,154) = 3.05. BG stat for lag 2 is given as 3.22 and exceeds the critical value and hence we
reject the null of no serial correlation at the 2nd lag.

(Module 1.3, LOS 1.i)
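The degrees-of-freedom bookkeeping for the Breusch-Godfrey F-test, as used in this explanation, can be sketched as:

```python
# Degrees of freedom for the Breusch-Godfrey F-test at lag p:
# numerator df = p, denominator df = n - k - p - 1.
def bg_df(n, k, p):
    return p, n - k - p - 1

print(bg_df(160, 3, 1))  # (1, 155)
print(bg_df(160, 3, 2))  # (2, 154)
```

The BG statistic at each lag is then compared against the critical F with these degrees of freedom.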

Question #98 - 99 of 144 Question ID: 1471897

What is the predicted quarterly stock return, given the following forecasts?

Employment growth = 2.0%


GDP growth = 1.0%
Private investment growth = -1.0%

A) 4.4%.
B) 4.7%.
C) 5.0%.

Explanation

Predicted quarterly stock return is 9.50% + (−4.50)(2.0%) + (4.20)(1.0%) + (−0.30)(−1.0%) = 5.0%.

(Module 1.1, LOS 1.b)

Question #99 - 99 of 144 Question ID: 1586010

Assuming a restricted model with all three variables removed and a 5% level of significance, the
most appropriate conclusion is:

A) With an F-statistic of 2.66, we fail to reject the null hypothesis of all slope coefficients equal to zero.
B) With an F-statistic of 24.54, we reject the null hypothesis that all the slope coefficients are equal to zero.
C) With an F-statistic of 0.472, we fail to reject the null hypothesis of all coefficients equal to zero.

Explanation

We are testing that all slope coefficients are equal to 0.

F = [RSS/K] / [SSE/(n-k-1)] = (126/3) / (267/156) = 42/1.711 = 24.54

Critical F(3,156) = 2.66

F-stat > critical F → reject the null that all slope coefficients are equal to zero.

(Module 1.2, LOS 1.e)
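The joint F-statistic computed above follows directly from the ANOVA quantities, as this sketch shows (RSS, SSE, n, and k from the question):

```python
# Joint F-test of H0: all slope coefficients equal zero.
# F = (RSS / k) / (SSE / (n - k - 1))
def f_statistic(rss, sse, n, k):
    return (rss / k) / (sse / (n - k - 1))

f = f_statistic(126.0, 267.0, 160, 3)
print(round(f, 2))  # 24.54, versus critical F(3, 156) = 2.66
```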

Question #100 of 144 Question ID: 1479914

Which of the following questions is least likely answered by using a qualitative dependent
variable?

A) Based on the following company-specific financial ratios, will company ABC enter bankruptcy?
B) Based on the following subsidiary and competition variables, will company XYZ divest itself of a subsidiary?
C) Based on the following executive-specific and company-specific variables, how many shares will be acquired through the exercise of executive stock options?

Explanation

The number of shares can be a broad range of values and is, therefore, not considered a
qualitative dependent variable.

(Module 1.3, LOS 1.h)

Question #101 - 106 of 144 Question ID: 1479925

The percent of the variation in the fund's return that is explained by the regression is:

A) 66.76%.
B) 81.71%.
C) 61.78%.

Explanation

The R2 tells us how much of the change in the dependent variable is explained by the
changes in the independent variables in the regression: 0.667632.

(Module 1.2, LOS 1.d)

Question #102 - 106 of 144 Question ID: 1479926

Suppose the Breusch-Godfrey statistic is 3.22. At a 5% level of significance, which of the following is the most accurate conclusion regarding the presence of serial correlation (at two lags) in the residuals?

No, because the BG statistic is less than the critical test statistic of 3.49, we don't
A)
have evidence of serial correlation.
No, because the BG statistic is less than the critical test statistic of 3.55, we don't
B)
have evidence of serial correlation.
Yes, because the BG statistic exceeds the critical test statistic of 3.16, there is
C)
evidence of serial correlation.

Explanation

Number of lags tested = p = 2. The appropriate test statistic for the BG test is an F-stat with (p = 2)
and (n – p – k – 1 = 18) degrees of freedom. From the table, critical value = 3.55.

(Module 1.3, LOS 1.i)

Question #103 - 106 of 144 Question ID: 1616908

Gloucester subsequently revises the model to exclude the small cap index and finds that the
revised model has a RSS of 106.332. Which of the following statements is most accurate? At a
5% level of significance, the test statistic:

A) of 1.40 indicates that we cannot reject the hypothesis that the coefficient of small-cap index is not significantly different from 0.
B) of 13.39 indicates that we cannot reject the hypothesis that the coefficient of small-cap index is significantly different from 0.
C) of 4.35 indicates that we cannot reject the hypothesis that the coefficient of small-cap index is significantly different from 0.

Explanation

SSER = SST – RSSR = 164.9963 – 106.3320 = 58.6643

F = [(SSER – SSEU) / q] / [SSEU / (n – k – 1)] = [(58.6643 – 54.8395) / 1] / (54.8395 / 20) = 3.8248 / 2.742 = 1.40

Critical F(1, 20) = 4.35 (from Exhibit 1)

Since the test statistic is not greater than the critical value, we cannot reject the null
hypothesis that b2 = 0.

(Module 1.2, LOS 1.e)
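The restricted-versus-unrestricted comparison above is the partial F-test, which can be sketched with the question's figures (error df = n − k − 1 = 20, q = 1 excluded variable):

```python
# Partial F-test for q excluded variables:
# F = [(SSE_restricted - SSE_unrestricted) / q] / [SSE_unrestricted / df_error]
def partial_f(sse_r, sse_u, q, df_error):
    return ((sse_r - sse_u) / q) / (sse_u / df_error)

f = partial_f(58.6643, 54.8395, 1, 20)
print(f < 4.35)  # True: below critical F(1, 20), so fail to reject H0: b2 = 0
```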

Question #104 - 106 of 144 Question ID: 1586013

The best test for unconditional heteroskedasticity is:

A) the Breusch-Pagan test only.


B) the Breusch-Godfrey test only.
C) neither the Durbin-Watson test nor the Breusch-Pagan test.

Explanation

Breusch-Godfrey and Durbin-Watson tests are for serial correlation. The Breusch-Pagan test
is for conditional heteroskedasticity; it tests to see if the size of the independent variables
influences the size of the residuals. Although tests for unconditional heteroskedasticity exist,
they are not part of the CFA curriculum, and unconditional heteroskedasticity is generally
considered less serious than conditional heteroskedasticity.

(Module 1.3, LOS 1.i)

Question #105 - 106 of 144 Question ID: 1479929


In the month of January, if both the small and large capitalization index have a zero return, we
would expect the fund to have a return equal to:

A) 2.799.
B) 2.322.
C) 2.561.

Explanation

The forecast of the return of the fund would be the intercept plus the coefficient on the
January effect: 2.322 = -0.238214 + 2.560552.

(Module 1.2, LOS 1.f)

Question #106 - 106 of 144 Question ID: 1479930

Assuming (for this question only) that the F-test was significant but that the t-tests of the
independent variables were insignificant, this would most likely suggest:

A) multicollinearity.
B) serial correlation.
C) conditional heteroskedasticity.

Explanation

When the F-test and the t-tests conflict, multicollinearity is indicated.

(Module 1.3, LOS 1.j)

Question #107 of 144 Question ID: 1479878


Consider the following analysis of variance table:

Source Sum of Squares Df Mean Square

Regression 20 1 20

Error 80 20 4

Total 100 21

The F-statistic for a test of joint significance of all the slope coefficients is closest to:

A) 0.2.
B) 0.05.
C) 5.

Explanation

The F-statistic is equal to the ratio of the mean squared regression to the mean squared
error.

F = MSR / MSE = 20 / 4 = 5.

(Module 1.2, LOS 1.e)

Question #108 - 111 of 144 Question ID: 1471887

If the number of analysts on NGR Corp. were to double to 4, the change in the forecast of NGR
would be closest to?

A) −0.035.
B) −0.055.
C) −0.019.

Explanation

Initially, the estimate is 0.1303 = 0.043 + ln(2)(−0.027) + ln(47000000)(0.006)

Then, the estimate is 0.1116 = 0.043 + ln(4)(−0.027) + ln(47000000)(0.006)

0.1116 − 0.1303 = −0.0187, or −0.019

(Module 1.1, LOS 1.b)


Question #109 - 111 of 144 Question ID: 1585993

Based on an R2 calculated from the information in Table 2, the analyst should conclude that the
number of analysts and ln(market value) of the firm explain:

A) 18.4% of the variation in returns.


B) 84.4% of the variation in returns.
C) 15.6% of the variation in returns.

Explanation

R2 is the percentage of the variation in the dependent variable (in this case, variation of
returns) explained by the set of independent variables. R2 is calculated as follows: R2 = (SSR /
SST) = (0.103 / 0.662) = 15.6%.

(Module 1.1, LOS 1.b)

Question #110 - 111 of 144 Question ID: 1507766

What is the F-statistic for the hypothesis that all slope coefficients are not statistically
significantly different from 0? And, what can be concluded from its value at a 1% level of
significance?

A) F = 17.00, reject a hypothesis that both of the slope coefficients are equal to zero.
B) F = 1.97, fail to reject a hypothesis that both of the slope coefficients are equal to zero.
C) F = 5.80, reject a hypothesis that both of the slope coefficients are equal to zero.

Explanation

The F-statistic is calculated as follows: F = MSR / MSE = 0.051 / 0.003 = 17.00; and 17.00 >
4.61, which is the critical F-value for the given degrees of freedom and a 1% level of
significance. However, when F-values are in excess of 10 for a large sample like this, a table is
not needed to know that the value is significant.

(Module 1.1, LOS 1.b)


Question #111 - 111 of 144 Question ID: 1585994

Upon further analysis, Turner concludes that multicollinearity is a problem. What might have
prompted this further analysis, and what is the intuition behind the conclusion?

A) At least one of the t-statistics was not significant, the F-statistic was significant, and a positive relationship between the number of analysts and the size of the firm would be expected.
B) At least one of the t-statistics was not significant, the F-statistic was significant, and an intercept not significantly different from zero would be expected.
C) At least one of the t-statistics was not significant, the F-statistic was not significant, and a positive relationship between the number of analysts and the size of the firm would be expected.

Explanation

Multicollinearity occurs when there is a high correlation among independent variables and
may exist if there is a significant F-statistic for the fit of the regression model, but at least one
insignificant independent variable when we expect all of them to be significant. In this case
the coefficient on ln(market value) was not significant at the 1% level, but the F-statistic was
significant. It would make sense that the size of the firm, i.e., the market value, and the
number of analysts would be positively correlated.

(Module 1.1, LOS 1.b)

Question #112 of 144 Question ID: 1471871

Henry Hilton, CFA, is undertaking an analysis of the bicycle industry. He hypothesizes that
bicycle sales (SALES) are a function of three factors: the population under 20 (POP), the level of
disposable income (INCOME), and the number of dollars spent on advertising (ADV). All data are
measured in millions of units. Hilton gathers data for the last 20 years. Which of the following
regression equations correctly represents Hilton's hypothesis?

A) SALES = α x β1 POP x β2 INCOME x β3 ADV x ε.

B) SALES = α + β1 POP + β2 INCOME + β3 ADV + ε.

C) INCOME = α + β1 POP + β2 SALES + β3 ADV + ε.

Explanation
SALES is the dependent variable. POP, INCOME, and ADV should be the independent
variables (on the right hand side) of the equation (in any order). Regression equations are
additive.

(Module 1.1, LOS 1.b)

Question #113 of 144 Question ID: 1479934

A regression with three independent variables has VIF values of 3, 4, and 2 for the first,
second, and third independent variables, respectively. Which of the following conclusions is
most appropriate?

A) Total VIF of 9 indicates a serious multicollinearity problem.


B) Only variable two has a problem with multicollinearity.
C) Multicollinearity does not seem to be a problem with the model.

Explanation

Multicollinearity occurs when an independent variable is highly correlated with a linear combination of the remaining independent variables. VIF values exceeding 5 need to be investigated, while values exceeding 10 indicate strong evidence of multicollinearity.

(Module 1.3, LOS 1.j)
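The VIF rule of thumb from the explanation can be sketched as follows (the VIF formula and the 5/10 thresholds are as stated above; the example R² values are hypothetical):

```python
# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing variable j
# on the other independent variables.
def vif(r2_j):
    return 1.0 / (1.0 - r2_j)

# Thresholds per the explanation: investigate above 5, strong evidence above 10.
def assessment(v):
    if v > 10:
        return "strong evidence of multicollinearity"
    if v > 5:
        return "investigate"
    return "no concern"

print(assessment(vif(2.0 / 3.0)))  # VIF = 3 -> "no concern"
```

With VIFs of 3, 4, and 2, every variable lands in the "no concern" bucket, matching the answer.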

Question #114 - 117 of 144 Question ID: 1479893

Which of the following is least likely to be an assumption regarding linear regression?

A) The variance of the residuals is constant.


B) The independent variable is correlated with the residuals.
C) A linear relationship exists between the dependent and independent variables.

Explanation

Although the linear regression model is fairly insensitive to minor deviations from any of
these assumptions, the independent variable is typically uncorrelated with the residuals.

(Module 1.1, LOS 1.c)


Question #115 - 117 of 144 Question ID: 1471957

Based upon the information presented in the ANOVA table, what is the coefficient of
determination?

A) 0.839, indicating that company returns explain about 83.9% of the variability of industry returns.
B) 0.084, indicating that the variability of industry returns explains about 8.4% of the variability of company returns.
C) 0.916, indicating that the variability of industry returns explains about 91.6% of the variability of company returns.

Explanation

The coefficient of determination (R2) is the percentage of the total variation in the dependent
variable explained by the independent variable.

The R2 = RSS / SSTotal = 3,257 / 3,555 = 0.916. This means that variation in the independent variable (the airline industry index) explains 91.6% of the variation in the dependent variable (Pinnacle stock).

(Module 1.2, LOS 1.d)

Question #116 - 117 of 144 Question ID: 1479894

Based upon her analysis, Carter has derived the following regression equation: Ŷ = 1.75 +
3.25X1. The predicted value of the Y variable equals 50.50, if the:

A) predicted value of the dependent variable equals 15.


B) predicted value of the independent variable equals 15.
C) coefficient of the determination equals 15.

Explanation

Note that the easiest way to answer this question is to plug numbers into the equation.

The predicted value for Y = 1.75 + 3.25(15) = 50.50.

The variable X1 represents the independent variable.

(Module 1.2, LOS 1.f)


Question #117 - 117 of 144 Question ID: 1479895

Carter realizes that although regression analysis is a useful tool when analyzing investments,
there are certain limitations. Carter made a list of points describing limitations that Smith
Brothers equity traders should be aware of when applying her research to their investment
decisions.

Point 1: Regression residuals may be homoskedastic.


Point 2: Data from regression relationships tends to exhibit parameter instability.
Point 3: Regression residuals may exhibit autocorrelation.
Point 4: The variance of the error term may change with one or more independent
variables.

When reviewing Carter's list, one of the Smith Brothers' equity traders points out that not all of
the points describe regression analysis limitations. Which of Carter's points most accurately
describes the limitations to regression analysis?

A) Points 2, 3, and 4.
B) Points 1, 3, and 4.
C) Points 1, 2, and 3.

Explanation

One of the basic assumptions of regression analysis is that the variance of the error terms is
constant, or homoskedastic. Any violation of this assumption is called heteroskedasticity.
Therefore, Point 1 is incorrect, but Point 4 is correct because it describes conditional
heteroskedasticity, which results in unreliable estimates of standard errors. Points 2 and 3
also describe limitations of regression analysis.

(Module 1.1, LOS 1.c)

Question #118 of 144 Question ID: 1472074

The management of a large restaurant chain believes that revenue growth is dependent upon
the month of the year. Using a standard 12 month calendar, how many dummy variables must
be used in a regression model that will test whether revenue growth differs by month?

A) 11.
B) 13.
C) 12.

Explanation

The appropriate number of dummy variables is one less than the number of categories because the intercept captures the effect of the omitted category. With 12 categories (months) the
appropriate number of dummy variables is 11 = 12 – 1. If the number of dummy variables
equals the number of categories, it is possible to state any one of the independent dummy
variables in terms of the others. This is a violation of the assumption of the multiple linear
regression model that none of the independent variables are linearly related.

(Module 1.4, LOS 1.l)
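A minimal sketch of the m − 1 encoding (the month labels and the choice of December as the omitted base category are illustrative):

```python
# With m categories, use m - 1 dummies; the omitted category (here December)
# is the base captured by the intercept.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def month_dummies(month):
    # one dummy per month except the omitted base
    return [1 if month == m else 0 for m in MONTHS[:-1]]

print(len(month_dummies("Jan")))       # 11 dummies for 12 categories
print(sum(month_dummies("Dec")))       # 0: December is represented by all zeros
```

Adding a twelfth dummy would make the dummies sum to the intercept column, the exact linear dependence the answer warns against.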

Question #119 of 144 Question ID: 1472011

Consider the following graph of residuals and the regression line from a time-series regression:

These residuals exhibit the regression problem of:

A) heteroskedasticity.
B) autocorrelation.
C) homoskedasticity.

Explanation

The residuals appear to be from two different distributions over time. In the earlier periods,
the model fits rather well compared to the later periods.

(Module 1.3, LOS 1.h)


Question #120 - 122 of 144 Question ID: 1508632

Using the regression model represented in Exhibit 1, what is the predicted number of housing
starts for 20X7?

A) 1,394,420.
B) 1,751,000.
C) 1,394.

Explanation

Housing starts = 0.42 − (1 × 0.07) + (0.03 × 46.7) = 1.751 million

(Module 1.2, LOS 1.f)

Question #121 - 122 of 144 Question ID: 1685254

Which of the following statements best describes the explanatory power of the estimated
regression?

A) The independent variables explain 61.58% of the variation in housing starts.


B) The residual standard error of only 0.3 indicates that the regression equation is a good fit for the sample data.
C) The large F-statistic indicates that both independent variables help explain changes in housing starts.

Explanation

The coefficient of determination is the statistic used to identify explanatory power. This can
be calculated from the ANOVA table as 3.896/6.327 × 100 = 61.58%.

The residual standard error of 0.3 indicates that the standard deviation of the residuals is 0.3
million housing starts. Without knowledge of the data for the dependent variable it is not
possible to assess whether this is a small or a large error.

The F-statistic does not enable us to conclude on both independent variables. It only allows
us to reject the hypothesis that all regression coefficients are zero and accept the
hypothesis that at least one isn't.

(Module 1.2, LOS 1.e)


Question #122 - 122 of 144 Question ID: 1543892

Which of the following is the least appropriate statement in relation to R-square and adjusted
R-square:

A) Adjusted R-square is a value between 0 and 1 and can be interpreted as a percentage.
B) Adjusted R-square decreases when the added independent variable adds little value to the regression model.
C) R-square typically increases when new independent variables are added to the regression regardless of their explanatory power.

Explanation

Adjusted R-square can be negative for a large number of independent variables that have no
explanatory power. The other two statements are correct.

(Module 1.2, LOS 1.d)

Question #123 - 125 of 144 Question ID: 1479880

Using the regression model represented in Exhibit 1, what is the predicted number of housing
starts for 20X7?

A) 1,394.
B) 1,394,420.
C) 1,751,000.

Explanation

Housing starts = 0.42 – (1 × 0.07) + (0.03 × 46.7) = 1.751 million

(Module 1.2, LOS 1.e)

Question #124 - 125 of 144 Question ID: 1472031


Which of the following statements best describes the explanatory power of the estimated
regression?

A) The large F-statistic indicates that both independent variables help explain changes in housing starts.
B) The residual standard error of only 0.3 indicates that the regression equation is a good fit for the sample data.
C) The independent variables explain 61.58% of the variation in housing starts.

Explanation

The coefficient of determination is the statistic used to identify explanatory power. This can
be calculated from the ANOVA table as 3.896 / 6.327 × 100 = 61.58%.

The residual standard error of 0.3 indicates that the standard deviation of the residuals is 0.3
million housing starts. Without knowledge of the data for the dependent variable, it is not
possible to assess whether this is a small or a large error.

The F-statistic does not enable us to conclude on both independent variables. It only allows
us to reject the hypothesis that all regression coefficients are zero and accept the
hypothesis that at least one isn't.

(Module 1.2, LOS 1.d)

Question #125 - 125 of 144 Question ID: 1479882

Which of the following is the least appropriate statement in relation to R-square and adjusted
R-square?

A) Adjusted R-square decreases when the added independent variable adds little value to the regression model.
B) R-square typically increases when new independent variables are added to the regression.
C) Adjusted R-square can be higher than the coefficient of determination for a model with a good fit.

Explanation

Adjusted R-squared cannot exceed R-squared (the coefficient of determination) for a multiple regression.

(Module 1.2, LOS 1.d)


Question #126 of 144 Question ID: 1471873

Henry Hilton, CFA, is undertaking an analysis of the bicycle industry. He hypothesizes that
bicycle sales (SALES) are a function of three factors: the population under 20 (POP), the level of
disposable income (INCOME), and the number of dollars spent on advertising (ADV). All data are
measured in millions of units. Hilton gathers data for the last 20 years and estimates the
following equation (standard errors in parentheses):

SALES = 0.000 + 0.004 POP + 1.031 INCOME + 2.002 ADV

(0.113) (0.005) (0.337) (2.312)

For next year, Hilton estimates the following parameters: (1) the population under 20 will be
120 million, (2) disposable income will be $300,000,000, and (3) advertising expenditures will be
$100,000,000. Based on these estimates and the regression equation, what are predicted sales
for the industry for next year?

A) $557,143,000.
B) $509,980,000.
C) $656,991,000.

Explanation

Predicted sales for next year are:

SALES = 0.000 + 0.004(120) + 1.031(300) + 2.002(100) = 509.98 (in millions), or $509,980,000.

(Module 1.1, LOS 1.b)

Question #127 of 144 Question ID: 1479919

Which of the following is least likely a method of detecting serial correlations?

A) A scatter plot of the residuals over time.


B) The Breusch-Pagan test.
C) The Breusch-Godfrey test.

Explanation
The Breusch-Pagan test is a test of the heteroskedasticity and not of serial correlation.

(Module 1.3, LOS 1.i)

Question #128 of 144 Question ID: 1479959

A high-yield bond analyst is trying to develop an equation using financial ratios to estimate the
probability of a company defaulting on its bonds. A technique that can be used to develop this
equation is:

A) logistic regression model.


B) dummy variable regression.
C) multiple linear regression adjusting for heteroskedasticity.

Explanation

The only one of the possible answers that estimates a probability of a discrete outcome is
logit or logistic modeling.

(Module 1.4, LOS 1.m)
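A minimal sketch of why a logistic model fits this problem: it maps any linear combination of financial ratios into a probability between 0 and 1. The intercept, coefficients, and ratio values below are hypothetical, for illustration only:

```python
import math

# Logit model: P(default) = 1 / (1 + exp(-(b0 + b1*x1 + b2*x2 + ...)))
def default_probability(intercept, coefs, ratios):
    z = intercept + sum(b * x for b, x in zip(coefs, ratios))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted values: two financial ratios as predictors.
p = default_probability(-2.0, [1.5, -0.8], [1.2, 0.5])
print(0.0 < p < 1.0)  # True: the output is always a valid probability
```

A plain linear regression offers no such guarantee, which is why the logit transformation is used for discrete outcomes like default.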

Question #129 - 131 of 144 Question ID: 1479889

What is the expected salary (in $1,000) of a woman with 16 years of education and 10 years of
experience?

A) 59.18.
B) 65.48.
C) 54.98.

Explanation

34.98 + 1.2(16) + 0.5(10) = 59.18

(Module 1.2, LOS 1.f)


Question #130 - 131 of 144 Question ID: 1479890

If the return on the industry index is 4%, the stock's expected return would be:

A) 7.6%.
B) 9.7%.
C) 11.2%.

Explanation

Y = b0 + b1X1

Y = 2.1 + 1.9(4) = 9.7%

(Module 1.2, LOS 1.f)

Question #131 - 131 of 144 Question ID: 1479891

The percentage of the variation in the stock return explained by the variation in the industry
index return is closest to:

A) 84.9%.
B) 63.2%.
C) 72.1%.

Explanation

The coefficient of determination, R2, is the square of the correlation coefficient: 0.849² = 0.721.

(Module 1.2, LOS 1.d)

Question #132 of 144 Question ID: 1471928

Which of the following statements least accurately describes one of the fundamental multiple
regression assumptions?

A) The independent variables are not random.


B) The error term is normally distributed.
C) The variance of the error terms is not constant (i.e., the errors are heteroskedastic).

Explanation

The variance of the error term IS assumed to be constant, resulting in errors that are
homoskedastic.

(Module 1.1, LOS 1.c)

Question #133 of 144 Question ID: 1472007

An analyst is trying to estimate the beta for a fund. The analyst estimates a regression equation
in which the fund returns are the dependent variable and the Wilshire 5000 is the independent
variable, using monthly data over the past five years. The analyst finds that the correlation
between the square of the residuals of the regression and the Wilshire 5000 is 0.2. Which of the
following is most accurate, assuming a 0.05 level of significance? There is:

A) no evidence that there is conditional heteroskedasticity or serial correlation in the regression equation.
B) evidence of serial correlation but not conditional heteroskedasticity in the regression equation.
C) evidence of conditional heteroskedasticity but not serial correlation in the regression equation.

Explanation

The test for conditional heteroskedasticity involves regressing the square of the residuals on
the independent variables of the regression and creating a test statistic that is n × R2, where
n is the number of observations and R2 is from the squared-residual regression. The test
statistic is distributed with a chi-squared distribution with the number of degrees of freedom
equal to the number of independent variables. For a single variable, the R2 will be equal to
the square of the correlation; so in this case, the test statistic is 60 × 0.2² = 2.4, which is less
than the chi-squared value (with one degree of freedom) of 3.84 for a p-value of 0.05. There is
no indication about serial correlation.

(Module 1.3, LOS 1.h)
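The Breusch-Pagan arithmetic from this explanation can be sketched as follows (n = 60 monthly observations and the 0.2 correlation are from the question; 3.84 is the chi-square(1) critical value at 5% quoted above):

```python
# Breusch-Pagan statistic: n * R^2 from regressing squared residuals on the
# independent variables; chi-square distributed with k degrees of freedom.
def bp_statistic(n, r2_resid):
    return n * r2_resid

n = 60                 # five years of monthly data
r2_resid = 0.2 ** 2    # for one variable, R^2 is the squared correlation
stat = bp_statistic(n, r2_resid)
chi_sq_crit = 3.84     # chi-square(1) critical value at the 5% level

print(round(stat, 1))      # 2.4
print(stat > chi_sq_crit)  # False: no evidence of conditional heteroskedasticity
```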


Question #134 of 144 Question ID: 1471927

One of the underlying assumptions of a multiple regression is that the variance of the residuals
is constant for various levels of the independent variables. This quality is referred to as:

A) a linear relationship.
B) homoskedasticity.
C) a normal distribution.

Explanation

Homoskedasticity refers to the basic assumption of a multiple regression model that the
variance of the error terms is constant.

(Module 1.1, LOS 1.c)

Question #135 of 144 Question ID: 1472075

Jill Wentraub is an analyst with the retail industry. She is modeling a company's sales over time
and has noticed a quarterly seasonal pattern. If she includes dummy variables to represent the
seasonality component of the sales she must use:

A) one dummy variable.


B) four dummy variables.
C) three dummy variables.

Explanation

Three. Always use one less dummy variable than the number of possibilities. For a
seasonality that varies by quarters in the year, three dummy variables are needed.

(Module 1.4, LOS 1.l)

Question #136 of 144 Question ID: 1471881


Which of the following statements most accurately interprets the following regression results at
the given significance level?

Variable p-value

Intercept 0.0201

X1 0.0284

X2 0.0310

X3 0.0143

A) The variable X2 is statistically significantly different from zero at the 3% significance level.
B) The variables X1 and X2 are statistically significantly different from zero at the 2% significance level.
C) The variable X3 is statistically significantly different from zero at the 2% significance level.

Explanation

The p-value is the smallest level of significance for which the null hypothesis can be rejected.
An independent variable is significant if the p-value is less than the stated significance level.
In this example, X3 is the variable that has a p-value less than the stated significance level.

(Module 1.1, LOS 1.b)

Question #137 - 140 of 144 Question ID: 1479940

Sutter has detected the presence of conditional heteroskedasticity in Smith's report. This is
evidence that:

A) the error terms are correlated with each other.


B) the variance of the error term is correlated with the values of the independent variables.
C) two or more of the independent variables are highly correlated with each other.

Explanation
Conditional heteroskedasticity exists when the variance of the error term is correlated with
the values of the independent variables.

Multicollinearity, on the other hand, occurs when two or more of the independent variables
are highly correlated with each other. Serial correlation exists when the error terms are
correlated with each other.

(Module 1.3, LOS 1.j)

Question #138 - 140 of 144 Question ID: 1479941

Suppose there is evidence that the variance of the error term is correlated with the values of
the independent variables. The most likely effect on the statistical inferences Smith can make
from the regressions results using financial data is to commit a:

A) Type I error by incorrectly failing to reject the null hypothesis that the regression parameters are equal to zero.
B) Type II error by incorrectly failing to reject the null hypothesis that the regression parameters are equal to zero.
C) Type I error by incorrectly rejecting the null hypothesis that the regression parameters are equal to zero.

Explanation

One problem with conditional heteroskedasticity while working with financial data, is that the
standard errors of the parameter estimates will be too small and the t-statistics too large.
This will lead Smith to incorrectly reject the null hypothesis that the parameters are equal to
zero. In other words, Smith will incorrectly conclude that the parameters are statistically
significant when in fact they are not. This is an example of a Type I error: incorrectly rejecting
the null hypothesis when it should not be rejected.

(Module 1.3, LOS 1.h)

Question #139 - 140 of 144 Question ID: 1479942


Which of the following is most likely to indicate that two or more of the independent variables,
or linear combinations of independent variables, may be highly correlated with each other?
Unless otherwise noted, significant and insignificant mean significantly different from zero and
not significantly different from zero, respectively.

A) The R2 is low, the F-statistic is insignificant and the Durbin-Watson statistic is significant.
B) The R2 is high, the F-statistic is significant and the t-statistics on the individual slope coefficients are insignificant.
C) The R2 is high, the F-statistic is significant and the t-statistics on the individual slope coefficients are significant.

Explanation

Multicollinearity occurs when two or more of the independent variables, or linear combinations of independent variables, may be highly correlated with each other. In a classic effect of multicollinearity, the R2 is high and the F-statistic is significant, but the t-statistics on the individual slope coefficients are insignificant.

(Module 1.3, LOS 1.j)

Question #140 - 140 of 144 Question ID: 1479943

Using the Durbin-Watson test statistic, Smith rejects the null hypothesis suggested by the test.
This is evidence that:

A) two or more of the independent variables are highly correlated with each other.
B) the error term is normally distributed.
C) the error terms are correlated with each other.

Explanation
Serial correlation (also called autocorrelation) exists when the error terms are correlated with
each other.

Multicollinearity, on the other hand, occurs when two or more of the independent variables
are highly correlated with each other. One assumption of multiple regression is that the error
term is normally distributed.

(Module 1.3, LOS 1.i)

Question #141 of 144 Question ID: 1472009

Which of the following statements regarding heteroskedasticity is least accurate?

A) Conditional heteroskedasticity can be detected using the Breusch-Pagan chi-square statistic.
B) When not related to independent variables, heteroskedasticity does not pose any major problems with the regression.
C) Heteroskedasticity only occurs in cross-sectional regressions.

Explanation

If there are shifting regimes in a time-series (e.g., change in regulation, economic environment), it is possible to have heteroskedasticity in a time-series. Unconditional heteroskedasticity occurs when the heteroskedasticity is not related to the level of the independent variables. Unconditional heteroskedasticity causes no major problems with the regression. The Breusch-Pagan statistic has a chi-square distribution and can be used to detect conditional heteroskedasticity.

(Module 1.3, LOS 1.h)

Question #142 of 144 Question ID: 1472066

When two or more of the independent variables in a multiple regression are correlated with
each other, the condition is called:

A) multicollinearity.
B) conditional heteroskedasticity.
C) serial correlation.
Explanation

Multicollinearity refers to the condition when two or more of the independent variables, or
linear combinations of the independent variables, in a multiple regression are highly
correlated with each other. This condition distorts the standard error of estimate and the
coefficient standard errors, leading to problems when conducting t-tests for statistical
significance of parameters.

(Module 1.3, LOS 1.j)

Question #143 of 144 Question ID: 1479904

Which of the following is least likely to result in misspecification of a regression model?

A) Omission of an important independent variable.


B) Inappropriate variable form.
C) Transforming a variable.

Explanation

The four types of model specification errors are: omission of an important independent
variable, inappropriate variable form, inappropriate variable scaling and data improperly
pooled. Transforming an independent variable is usually done to rectify inappropriate
variable scaling.

(Module 1.3, LOS 1.g)

Question #144 of 144 Question ID: 1471891

Which of the following statements regarding the results of a regression analysis is least
accurate? The:

A) slope coefficient in a multiple regression is the change in the dependent variable for a one-unit change in the independent variable, holding all other variables constant.
B) slope coefficient in a multiple regression is the value of the dependent variable for a given value of the independent variable.
C) slope coefficients in the multiple regression are referred to as partial betas.
Explanation

The slope coefficient is the change in the dependent variable for a one-unit change in the
independent variable.

(Module 1.1, LOS 1.b)
