
Reading 07-Correlation and Regression


3.8. Limitations of Regression Analysis


Although this reading has shown many of the uses of regression models for financial analysis,
regression models do have limitations. First, regression relations can change over time, just as
correlations can. This fact is known as the issue of parameter instability, and its existence
should not be surprising as the economic, tax, regulatory, political, and institutional contexts in
which financial markets operate change. Whether considering cross-sectional or time-series
regression, the analyst will probably face this issue. As one example, cross-sectional regression
relationships between stock characteristics may differ between growth-led and value-led
markets. As a second example, the time-series regression estimating the beta often yields
significantly different estimated betas depending on the time period selected. In both cross-
sectional and time-series contexts, the most common problem is sampling from more than one
population, with the challenge of identifying when doing so is an issue.
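
The beta example can be illustrated with a minimal sketch. The returns below are made up for illustration and are not data from this reading: the stock is assumed to have a beta of 0.8 in one sub-period and 1.4 in another, and `ols_slope` is a hypothetical helper that computes the least-squares slope.

```python
# Illustrative sketch of parameter instability; all return figures are hypothetical.

def ols_slope(x, y):
    """Least-squares slope of y on x: cross-products over squared deviations of x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx

# Hypothetical monthly market returns, reused in both sub-periods for simplicity.
market = [0.02, -0.01, 0.03, 0.00, -0.02, 0.01]

# Assumed betas: 0.8 in the first sub-period, 1.4 in the second (no noise added).
stock_first = [0.8 * m for m in market]
stock_second = [1.4 * m for m in market]

beta_first = ols_slope(market, stock_first)
beta_second = ols_slope(market, stock_second)
# Pooling both sub-periods mixes two different populations in one sample.
beta_pooled = ols_slope(market + market, stock_first + stock_second)

print(f"beta, first sub-period:  {beta_first:.2f}")
print(f"beta, second sub-period: {beta_second:.2f}")
print(f"beta, pooled sample:     {beta_pooled:.2f}")
```

Under these assumptions the pooled estimate falls between the two sub-period betas and describes neither regime, which is the practical danger of sampling from more than one population.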

A second limitation to the use of regression results specific to investment contexts is that public
knowledge of regression relationships may negate their future usefulness. Suppose, for
example, an analyst discovers that stocks with a certain characteristic have had historically very
high returns. If other analysts discover and act upon this relationship, then the prices of stocks
with that characteristic will be bid up. The knowledge of the relationship may result in the
relation no longer holding in the future.

Finally, if the regression assumptions listed in Section 3.2 are violated, hypothesis tests and
predictions based on linear regression will not be valid. Although there are tests for violations
of regression assumptions, often uncertainty exists as to whether an assumption has been
violated. This limitation will be discussed in detail in the reading on multiple regression.

SUMMARY

◾ A scatter plot shows graphically the relationship between two variables. If the points on
the scatter plot cluster together in a straight line, the two variables have a strong linear
relation.
◾ The sample correlation coefficient for two variables X and Y is $r = \dfrac{\mathrm{Cov}(X, Y)}{s_x s_y}$.

◾ If two variables have a very strong linear relation, then the absolute value of their
correlation will be close to 1. If two variables have a weak linear relation, then the
absolute value of their correlation will be close to 0.

http://e.pub/jthbhvadxmncfyrxzzot.vbk/OEBPS/CFA0014-R03-7-print-1539097160.xh... 2018/10/9
◾ The squared value of the correlation coefficient for two variables quantifies the
percentage of the variance of one variable that is explained by the other. If the correlation
coefficient is positive, the two variables are directly related; if the correlation coefficient
is negative, the two variables are inversely related.

◾ If we have n observations for two variables, we can test whether the population
correlation between the two variables is equal to 0 by using a t-test. This test statistic has
a t-distribution with n − 2 degrees of freedom if the null hypothesis of 0 correlation is
true.

◾ Even one outlier can greatly affect the correlation between two variables. Analysts should
examine a scatter plot for the variables to determine whether outliers might affect a
particular correlation.

◾ Correlations can be spurious in the sense of misleadingly pointing toward associations
between variables.

◾ The dependent variable in a linear regression is the variable that the regression model
tries to explain. The independent variables are the variables that a regression model uses
to explain the dependent variable.

◾ If there is one independent variable in a linear regression and there are n observations on
the dependent and independent variables, the regression model is Yi = b0 + b1Xi + εi, i = 1,
…, n, where Yi is the dependent variable, Xi is the independent variable, and εi is the error
term. In this model, the coefficient b0 is the intercept. The intercept is the predicted value
of the dependent variable when the independent variable has a value of zero. In this
model, the coefficient b1 is the slope of the regression line. If the value of the independent
variable increases by one unit, then the model predicts that the value of the dependent
variable will increase by b1 units.

◾ The assumptions of the classic normal linear regression model are the following:

• A linear relation exists between the dependent variable and the independent
variable.

• The independent variable is not random.

• The expected value of the error term is 0.

• The variance of the error term is the same for all observations (homoskedasticity).

• The error term is uncorrelated across observations.

• The error term is normally distributed.

◾ The estimated parameters in a linear regression model minimize the sum of the squared
regression residuals.

◾ The standard error of estimate measures how well the regression model fits the data. If
the SEE is small, the model fits well.

Printed by: yanzhao yang <[email protected]>. Printing is for personal, private use only. No part of this book may be reproduced or transmitted without the publisher's prior permission. Violators will be prosecuted.

◾ The coefficient of determination measures the fraction of the total variation in the
dependent variable that is explained by the independent variable. In a linear regression
with one independent variable, the simplest way to compute the coefficient of
determination is to square the correlation of the dependent and independent variables.

◾ To calculate a confidence interval for an estimated regression coefficient, we must know
the standard error of the estimated coefficient and the critical value for the t-distribution
at the chosen level of significance, tc.

◾ To test whether the population value of a regression coefficient, b1, is equal to a particular
hypothesized value, B1, we must know the estimated coefficient, $\hat{b}_1$, the standard error of
the estimated coefficient, $s_{\hat{b}_1}$, and the critical value for the t-distribution at the chosen
level of significance, tc. The test statistic for this hypothesis is $(\hat{b}_1 - B_1)/s_{\hat{b}_1}$. If the
absolute value of this statistic is greater than tc, then we reject the null hypothesis that b1
= B1.

◾ In the regression model Yi = b0 + b1Xi + εi, if we know the estimated parameters, $\hat{b}_0$ and $\hat{b}_1$,
for any value of the independent variable, X, then the predicted value of the dependent
variable Y is $\hat{Y} = \hat{b}_0 + \hat{b}_1 X$.

◾ The prediction interval for a regression equation for a particular predicted value of the
dependent variable is $\hat{Y} \pm t_c s_f$, where sf is the square root of the estimated variance of the
prediction error and tc is the critical level for the t-statistic at the chosen significance
level. This computation specifies a 100(1 − α) percent confidence interval. For example, if α
= 0.05, then this computation yields a 95 percent confidence interval.
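
The summary points on estimation, fit, and prediction can be tied together in a short sketch. The code below is an illustration rather than the reading's own procedure: the data are hypothetical, the function names are invented for this example, and the expression for sf is the standard textbook one, $s_f^2 = s^2\left[1 + 1/n + (X - \bar{X})^2 / \sum (X_i - \bar{X})^2\right]$, which is not quoted in this section. The critical value 2.776 is the two-tailed 5 percent t-value for n − 2 = 4 degrees of freedom.

```python
import math

def fit_simple_regression(x, y):
    """Least-squares estimates and fit statistics for Y = b0 + b1*X + error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)  # total variation
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = s_xy / s_xx            # slope
    b0 = y_bar - b1 * x_bar     # intercept
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)  # unexplained variation

    return {
        "b0": b0,
        "b1": b1,
        "r": s_xy / math.sqrt(s_xx * s_yy),   # sample correlation
        "r_squared": 1 - sse / s_yy,          # coefficient of determination
        "see": math.sqrt(sse / (n - 2)),      # standard error of estimate
        "n": n,
        "x_bar": x_bar,
        "s_xx": s_xx,
    }

def prediction_interval(fit, x_new, t_crit):
    """Return (lower, predicted, upper) for Y-hat +/- t_c * s_f at x_new."""
    y_hat = fit["b0"] + fit["b1"] * x_new
    s_f = fit["see"] * math.sqrt(
        1 + 1 / fit["n"] + (x_new - fit["x_bar"]) ** 2 / fit["s_xx"]
    )
    return y_hat - t_crit * s_f, y_hat, y_hat + t_crit * s_f

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 2.9, 4.2, 4.8, 6.1, 6.9]

fit = fit_simple_regression(x, y)
low, y_hat, high = prediction_interval(fit, x_new=7.0, t_crit=2.776)

print(f"intercept = {fit['b0']:.4f}, slope = {fit['b1']:.4f}")
print(f"R-squared = {fit['r_squared']:.4f}, SEE = {fit['see']:.4f}")
print(f"prediction at X = 7: {y_hat:.3f}, interval [{low:.3f}, {high:.3f}]")
```

With one independent variable, the squared correlation returned in `fit["r"]` equals `fit["r_squared"]`, which matches the shortcut for the coefficient of determination described above.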


PRACTICE PROBLEMS
© 2016 CFA Institute. All rights reserved.

1. The following table shows the sample correlations between the monthly returns for four
different mutual funds and the S&P 500. The correlations are based on 36 monthly
observations. The funds are as follows:

Fund 1     Large-cap fund
Fund 2     Mid-cap fund
Fund 3     Large-cap value fund
Fund 4     Emerging markets fund
S&P 500    US domestic stock index

           Fund 1    Fund 2    Fund 3    Fund 4    S&P 500
Fund 1     1
Fund 2     0.9231    1
Fund 3     0.4771    0.4156    1
Fund 4     0.7111    0.7238    0.3102    1
S&P 500    0.8277    0.8223    0.5791    0.7515    1

Test the null hypothesis that each of these correlations, individually, is equal to zero
against the alternative hypothesis that it is not equal to zero. Use a 5 percent significance
level.

2. Julie Moon is an energy analyst examining electricity, oil, and natural gas consumption in
different regions over different seasons. She ran a regression explaining the variation in
energy consumption as a function of temperature. The total variation of the dependent
variable was 140.58, the explained variation was 60.16, and the unexplained variation
was 80.42. She had 60 monthly observations.

A. Compute the coefficient of determination.

B. What was the sample correlation between energy consumption and temperature?

C. Compute the standard error of the estimate of Moon’s regression model.

D. Compute the sample standard deviation of monthly energy consumption.

3. You are examining the results of a regression estimation that attempts to explain the unit
sales growth of a business you are researching. The analysis of variance output for the
regression is given in the table below. The regression was based on five observations (n =
5).

ANOVA         df    SS      MSS     F         Significance F
Regression    1     88.0    88.0    36.667    0.00904
Residual      3     7.2     2.4
Total         4     95.2

A. How many independent variables are in the regression to which the ANOVA
refers?

B. Define Total SS.

C. Calculate the sample variance of the dependent variable using information in the
above table.

D. Define Regression SS and explain how its value of 88 is obtained in terms of other
quantities reported in the above table.

E. What hypothesis does the F-statistic test?

F. Explain how the value of the F-statistic of 36.667 is obtained in terms of other
quantities reported in the above table.

G. Is the F-test significant at the 5 percent significance level?

4. An economist collected the monthly returns for KDL’s portfolio and a diversified stock
index. The data collected are shown below:

Month    Portfolio Return (%)    Index Return (%)
1        1.11                    −0.59
2        72.10                   64.90
3        5.12                    4.81
4        1.01                    1.68
5        −1.72                   −4.97
6        4.06                    −2.06

The economist calculated the correlation between the two returns and found it to be
0.996. The regression results with the KDL return as the dependent variable and the index
return as the independent variable are given as follows:

Regression Statistics
Multiple R 0.996
R-squared 0.992
Standard error 2.861
Observations 6

ANOVA         df    SS         MSS        F         Significance F
Regression    1     4101.62    4101.62    500.79    0
Residual      4     32.76      8.19
Total         5     4134.38

             Coefficients    Standard Error    t-Statistic    p-Value
Intercept    2.252           1.274             1.768          0.1518
Slope        1.069           0.0477            22.379         0

When reviewing the results, Andrea Fusilier suspected that they were unreliable. She
found that the returns for Month 2 should have been 7.21 percent and 6.49 percent,
instead of the large values shown in the first table. Correcting these values resulted in a
revised correlation of 0.824 and the revised regression results shown as follows:

Regression Statistics
Multiple R 0.824
R-squared 0.678
Standard error 2.062
Observations 6

ANOVA         df    SS       MSS      F       Significance F
Regression    1     35.89    35.89    8.44    0.044
Residual      4     17.01    4.25
Total         5     52.91

             Coefficients    Standard Error    t-Statistic    p-Value
Intercept    2.242           0.863             2.597          0.060
Slope        0.623           0.214             2.905          0.044

Explain how the bad data affected the results.

The following information relates to Questions 5–10


Kenneth McCoin, CFA, is a fairly tough interviewer. Last year, he handed each job applicant a
sheet of paper with the information in the following table, and he then asked several questions
about regression analysis. Some of McCoin’s questions, along with a sample of the answers he
received to each, are given below. McCoin told the applicants that the independent variable is
the ratio of net income to sales for restaurants with a market cap of more than $100 million and
the dependent variable is the ratio of cash flow from operations to sales for those restaurants.
Which of the choices provided is the best answer to each of McCoin’s questions?

Regression Statistics
Multiple R 0.8623
R-squared 0.7436
Standard error 0.0213
Observations 24

ANOVA         df    SS       MSS         F        Significance F
Regression    1     0.029    0.029000    63.81    0
Residual      22    0.010    0.000455
Total         23    0.040

             Coefficients    Standard Error    t-Statistic    p-Value
Intercept    0.077           0.007             11.328         0
Slope        0.826           0.103             7.988          0

5. What is the value of the coefficient of determination?

A. 0.8261.

B. 0.7436.

C. 0.8623.

6. Suppose that you deleted several of the observations that had small residual values. If you
re-estimated the regression equation using this reduced sample, what would likely happen
to the standard error of the estimate and the R-squared?

     Standard Error of the Estimate    R-Squared
A    Decrease                          Decrease
B    Decrease                          Increase
C    Increase                          Decrease

7. What is the correlation between X and Y?

A. −0.7436.

B. 0.7436.

C. 0.8623.

8. Where did the F-value in the ANOVA table come from?

A. You look up the F-value in a table. The F depends on the numerator and
denominator degrees of freedom.

B. Divide the “Mean Square” for the regression by the “Mean Square” of the
residuals.

C. The F-value is equal to the reciprocal of the t-value for the slope coefficient.

9. If the ratio of net income to sales for a restaurant is 5 percent, what is the predicted ratio
of cash flow from operations to sales?

A. 0.007 + 0.103(5.0) = 0.524.

B. 0.077 − 0.826(5.0) = −4.054.

C. 0.077 + 0.826(5.0) = 4.207.

10. Is the relationship between the ratio of cash flow from operations to sales and the ratio
of net income to sales significant at the 5 percent level?

A. No, because the R-squared is greater than 0.05.

B. No, because the p-values of the intercept and slope are less than 0.05.

C. Yes, because the p-values for F and t for the slope coefficient are less than 0.05.

The following information relates to Questions 11–16


Howard Golub, CFA, is preparing to write a research report on Stellar Energy Corp. common
stock. One of the world’s largest companies, Stellar is in the business of refining and marketing
oil. As part of his analysis, Golub wants to evaluate the sensitivity of the stock’s returns to
various economic factors. For example, a client recently asked Golub whether the price of
Stellar Energy Corporation stock has tended to rise following increases in retail energy prices.
Golub believes the association between the two variables to be negative, but he does not know
the strength of the association.

Golub directs his assistant, Jill Batten, to study the relationships between Stellar monthly
common stock returns versus the previous month’s percent change in the US Consumer Price
Index for Energy (CPIENG), and Stellar monthly common stock returns versus the previous
month’s percent change in the US Producer Price Index for Crude Energy Materials (PPICEM).
Golub wants Batten to run both a correlation and a linear regression analysis. In response,
Batten compiles the summary statistics shown in Exhibit 1 for the 248 months between January
1980 and August 2000. All of the data are in decimal form, where 0.01 indicates a 1 percent
return. Batten also runs a regression analysis using Stellar monthly returns as the dependent
variable and the monthly change in CPIENG as the independent variable. Exhibit 2 displays the
results of this regression model.

Exhibit 1. Descriptive Statistics

                                   Monthly Return          Lagged Monthly Change
                                   Stellar Common Stock    CPIENG     PPICEM
Mean                               0.0123                  0.0023     0.0042
Standard Deviation                 0.0717                  0.0160     0.0534

Covariance, Stellar vs. CPIENG     −0.00017
Covariance, Stellar vs. PPICEM     −0.00048
Covariance, CPIENG vs. PPICEM      0.00044
Correlation, Stellar vs. CPIENG    −0.1452

Exhibit 2. Regression Analysis with CPIENG

Regression Statistics
Multiple R 0.1452
R-squared 0.0211
Standard error of the estimate 0.0710
Observations 248

                     Coefficients    Standard Error    t-Statistic
Intercept            0.0138          0.0046            3.0275
Slope coefficient    −0.6486         0.2818            −2.3014

11. Batten wants to determine whether the sample correlation between the Stellar and
CPIENG variables (−0.1452) is statistically significant. The critical value for the test
statistic at the 0.05 level of significance is approximately 1.96. Batten should conclude
that the statistical relationship between Stellar and CPIENG is:

A. significant, because the calculated test statistic has a lower absolute value than the
critical value for the test statistic.

B. significant, because the calculated test statistic has a higher absolute value than the
critical value for the test statistic.

C. not significant, because the calculated test statistic has a higher absolute value than
the critical value for the test statistic.

12. Did Batten’s regression analyze cross-sectional or time-series data, and what was the
expected value of the error term from that regression?

     Data Type          Expected Value of Error Term
A    Time-series        0
B    Time-series        εi
C    Cross-sectional    0

13. Based on the regression, which used data in decimal form, if the CPIENG decreases by
1.0 percent, what is the expected return on Stellar common stock during the next period?

A. 0.0073 (0.73 percent).

B. 0.0138 (1.38 percent).

C. 0.0203 (2.03 percent).

14. Based on Batten’s regression model, the coefficient of determination indicates that:

A. Stellar’s returns explain 2.11 percent of the variability in CPIENG.

B. Stellar’s returns explain 14.52 percent of the variability in CPIENG.

C. Changes in CPIENG explain 2.11 percent of the variability in Stellar’s returns.

15. For Batten’s regression model, the standard error of the estimate shows that the standard
deviation of:

A. the residuals from the regression is 0.0710.

B. values estimated from the regression is 0.0710.

C. Stellar’s observed common stock returns is 0.0710.

16. For the analysis run by Batten, which of the following is an incorrect conclusion from the
regression output?

A. The estimated intercept coefficient from Batten’s regression is statistically
significant at the 0.05 level.

B. In the month after the CPIENG declines, Stellar’s common stock is expected to
exhibit a positive return.

C. Viewed in combination, the slope and intercept coefficients from Batten’s
regression are not statistically significant at the 0.05 level.

The following information relates to Questions 17–26


Anh Liu is an analyst researching whether a company’s debt burden affects investors’ decision
to short the company’s stock. She calculates the short interest ratio (the ratio of short interest to
average daily share volume, expressed in days) for 50 companies as of the end of 2016 and
compares this ratio with the companies’ debt ratio (the ratio of total liabilities to total assets,
expressed in decimal form).

Liu provides a number of statistics in Exhibit 1. She also estimates a simple regression to
investigate the effect of the debt ratio on a company’s short interest ratio. The results of this
simple regression, including the analysis of variance (ANOVA), are shown in Exhibit 2.

In addition to estimating a regression equation, Liu graphs the 50 observations using a
scatterplot, with the short interest ratio on the vertical axis and the debt ratio on the horizontal
axis.

Exhibit 1. Summary Statistics

Statistic    Debt Ratio (Xi)    Short Interest Ratio (Yi)
Sum          19.8550            192.3000
Average      0.3971             3.8460

Sum of squared deviations from the mean:
    $\sum_{i=1}^{n} (X_i - \bar{X})^2 = 2.2225$        $\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = 412.2042$

Sum of cross-products of deviations from the mean:
    $\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = -9.2430$

Exhibit 2. Regression of the Short Interest Ratio on the Debt Ratio

ANOVA         Degrees of Freedom (df)    Sum of Squares (SS)    Mean Square (MS)
Regression    1                          38.4404                38.4404
Residual      48                         373.7638               7.7867
Total         49                         412.2042

Regression Statistics
Multiple R                    0.3054
R-squared                     0.0933
Standard error of estimate    2.7905
Observations                  50

              Coefficients    Standard Error    t-Statistic
Intercept     5.4975          0.8416            6.5322
Debt ratio    −4.1589         1.8718            −2.2219

Liu is considering three interpretations of these results for her report on the relationship
between debt ratios and short interest ratios:

Interpretation 1 Companies’ higher debt ratios cause lower short interest ratios.

Interpretation 2 Companies’ higher short interest ratios cause higher debt ratios.

Interpretation 3 Companies with higher debt ratios tend to have lower short interest
ratios.

She is especially interested in using her estimation results to predict the short interest ratio for
MQD Corporation, which has a debt ratio of 0.40.

17. Based on Exhibits 1 and 2, if Liu were to graph the 50 observations, the scatterplot
summarizing this relation would be best described as:

A. horizontal.

B. upward sloping.

C. downward sloping.

18. Based on Exhibit 1, the sample covariance is closest to:

A. −9.2430.

B. −0.1886.

C. 8.4123.

19. Based on Exhibit 1, the correlation between the debt ratio and the short interest ratio is
closest to:

A. −0.3054.

B. 0.0933.

C. 0.3054.

20. Which of the interpretations best describes Liu’s findings for her report?

A. Interpretation 1

B. Interpretation 2

C. Interpretation 3

21. The dependent variable in Liu’s regression analysis is the:

A. intercept.

B. debt ratio.

C. short interest ratio.

22. Based on Exhibit 2, the degrees of freedom for the t-test of the slope coefficient in this
regression are:

A. 48.

B. 49.

C. 50.

23. The upper bound for the 95% confidence interval for the coefficient on the debt ratio in
the regression is closest to:

A. −1.0199.

B. −0.3947.

C. 1.4528.

24. Which of the following should Liu conclude from these results shown in Exhibit 2?

A. The average short interest ratio is 5.4975.

B. The estimated slope coefficient is statistically significant at the 0.05 level.

C. The debt ratio explains 30.54% of the variation in the short interest ratio.

25. Based on Exhibit 2, the short interest ratio expected for MQD Corporation is closest to:

A. 3.8339.

B. 5.4975.

C. 6.2462.

26. Based on Liu’s regression results in Exhibit 2, the F-statistic for testing whether the slope
coefficient is equal to zero is closest to:

A. −2.2219.

B. 3.5036.

C. 4.9367.

SOLUTIONS
1. The critical t-value for n − 2 = 34 df, using a 5 percent significance level and a two-tailed
test, is 2.032. First, take the smallest correlation in the table, the correlation between Fund
3 and Fund 4, and see if it is significantly different from zero. Its calculated t-value is

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.3102\sqrt{36-2}}{\sqrt{1-0.3102^2}} = 1.903$$

This correlation is not significantly different from zero. If we take the next lowest
correlation, between Fund 2 and Fund 3, this correlation of 0.4156 has a calculated
t-value of 2.664. So this correlation is significantly different from zero at the 5 percent
level of significance. All of the other correlations in the table (besides the 0.3102) are
greater than 0.4156, so they too are significantly different from zero.
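
The same test can be run for every pair in the table with a few lines of Python. This is a sketch rather than part of the original solution: the function name is invented for the example, and the critical value 2.032 is taken from the solution text instead of being computed.

```python
import math

def correlation_t_stat(r, n):
    """t-statistic for H0: population correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

N_OBS = 36      # monthly observations, from the problem statement
T_CRIT = 2.032  # two-tailed 5 percent critical value for 34 df, from the solution

# Sample correlations copied from the table in Problem 1.
correlations = {
    ("Fund 1", "Fund 2"): 0.9231,
    ("Fund 1", "Fund 3"): 0.4771,
    ("Fund 2", "Fund 3"): 0.4156,
    ("Fund 1", "Fund 4"): 0.7111,
    ("Fund 2", "Fund 4"): 0.7238,
    ("Fund 3", "Fund 4"): 0.3102,
    ("Fund 1", "S&P 500"): 0.8277,
    ("Fund 2", "S&P 500"): 0.8223,
    ("Fund 3", "S&P 500"): 0.5791,
    ("Fund 4", "S&P 500"): 0.7515,
}

t_stats = {pair: correlation_t_stat(r, N_OBS) for pair, r in correlations.items()}
significant = {pair: abs(t) > T_CRIT for pair, t in t_stats.items()}

for pair, t in t_stats.items():
    verdict = "reject H0" if significant[pair] else "fail to reject H0"
    print(f"{pair[0]} vs. {pair[1]}: t = {t:.3f} ({verdict})")
```

Only the Fund 3 and Fund 4 pair fails to clear the critical value, in line with the conclusion above.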

2.

A. The coefficient of determination is

$$\frac{\text{Explained variation}}{\text{Total variation}} = \frac{60.16}{140.58} = 0.4279$$

B. For a linear regression with one independent variable, the absolute value of
correlation between the independent variable and the dependent variable equals the
square root of the coefficient of determination, so the correlation is √0.4279 =
0.6542. (The correlation will have the same sign as the slope coefficient.)

C. The standard error of the estimate is

$$\left(\frac{\sum_{i=1}^{n}\left(Y_i - \hat{b}_0 - \hat{b}_1 X_i\right)^2}{n-2}\right)^{1/2} = \left(\frac{\text{Unexplained variation}}{n-2}\right)^{1/2} = \sqrt{\frac{80.42}{60-2}} = 1.178$$

D. The sample variance of the dependent variable is

$$\sum_{i=1}^{n}\frac{(Y_i - \bar{Y})^2}{n-1} = \frac{\text{Total variation}}{n-1} = \frac{140.58}{60-1} = 2.3827$$

The sample standard deviation is √2.3827 = 1.544.
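
The four parts of this solution can be checked with a short script. It is a sketch that uses only the figures given in the problem; the variable names are chosen for this example.

```python
import math

# Inputs from Problem 2.
total_variation = 140.58
explained_variation = 60.16
unexplained_variation = 80.42
n = 60  # monthly observations

# A. Coefficient of determination.
r_squared = explained_variation / total_variation

# B. Absolute value of the correlation (its sign follows the slope coefficient).
abs_correlation = math.sqrt(r_squared)

# C. Standard error of the estimate, using n - 2 degrees of freedom.
see = math.sqrt(unexplained_variation / (n - 2))

# D. Sample variance and standard deviation of the dependent variable.
sample_variance = total_variation / (n - 1)
sample_std = math.sqrt(sample_variance)

print(f"R-squared       = {r_squared:.4f}")
print(f"|correlation|   = {abs_correlation:.4f}")
print(f"SEE             = {see:.3f}")
print(f"sample variance = {sample_variance:.4f}")
print(f"sample std dev  = {sample_std:.3f}")
```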

3.

A. The degrees of freedom for the regression is the number of slope parameters in the
regression, which is the same as the number of independent variables in the
regression. Because regression df = 1, we conclude that there is one independent
variable in the regression.

B. Total SS is the sum of the squared deviations of the dependent variable Y about its
mean.

C. The sample variance of the dependent variable is the total SS divided by its degrees
of freedom (n − 1 = 5 − 1 = 4 as given). Thus the sample variance of the dependent
variable is 95.2/4 = 23.8.

D. The Regression SS is the part of total sum of squares explained by the regression.
Regression SS equals the sum of the squared differences between predicted values
of the Y and the sample mean of Y: $\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$. In terms of other values in
the table, Regression SS is equal to Total SS minus Residual SS: 95.2 − 7.2 = 88.

E. The F-statistic tests whether all the slope coefficients in a linear regression are
equal to 0.

F. The calculated value of F in the table is equal to the Regression MSS divided by
the Residual MSS: 88/2.4 = 36.667.

G. Yes. The significance of 0.00904 given in the table is the p-value of the test (the
smallest level at which we can reject the null hypothesis). This value of 0.00904 is
less than the specified significance level of 0.05, so we reject the null hypothesis.
The regression equation has significant explanatory power.

4. The Month 2 data point is an outlier, lying far away from the other data values. Because
this outlier was caused by a data entry error, correcting the outlier improves the validity
and reliability of the regression. In this case, the correlation falls from 0.996 to
0.824 once the error is corrected. The revised R-squared is substantially lower (0.678 versus 0.992). The
significance of the regression is also lower, as can be seen in the decline of the F-value
from 500.79 to 8.44 and the decline in the t-statistic of the slope coefficient from 22.379
to 2.905.

The total sum of squares and regression sum of squares were greatly exaggerated in the
incorrect analysis. With the correction, the slope coefficient changes from 1.069 to 0.623.
This change is important. When the index moves up or down, the original model indicates
that the portfolio return goes up or down by 1.069 times as much, while the revised model
indicates that the portfolio return goes up or down by only 0.623 times as much. In this
example, incorrect data entry caused the outlier. Had it been a valid observation, not
caused by a data error, then the analyst would have had to decide whether the results were
more reliable including or excluding the outlier.
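
The effect of the single bad observation can be reproduced from the six data points in Problem 4. The sketch below recomputes the correlation and slope before and after the Month 2 correction; the helper name `corr_and_slope` is invented for this example.

```python
import math

def corr_and_slope(x, y):
    """Sample correlation and least-squares slope of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / math.sqrt(s_xx * s_yy), s_xy / s_xx

# Data as originally entered (Month 2 contains the data entry error).
index_bad = [-0.59, 64.90, 4.81, 1.68, -4.97, -2.06]
portfolio_bad = [1.11, 72.10, 5.12, 1.01, -1.72, 4.06]

# Corrected data: Month 2 should be 6.49 (index) and 7.21 (portfolio).
index_fixed = list(index_bad)
index_fixed[1] = 6.49
portfolio_fixed = list(portfolio_bad)
portfolio_fixed[1] = 7.21

r_bad, slope_bad = corr_and_slope(index_bad, portfolio_bad)
r_fixed, slope_fixed = corr_and_slope(index_fixed, portfolio_fixed)

print(f"with error: r = {r_bad:.3f}, slope = {slope_bad:.3f}")
print(f"corrected:  r = {r_fixed:.3f}, slope = {slope_fixed:.3f}")
```

One extreme point dominates both sums of squares and the cross-product, so it pulls the fitted line through itself and pushes the correlation toward 1.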

5. B is correct. The coefficient of determination is the same as R-squared.

6. C is correct. Deleting observations with small residuals will degrade the strength of the
regression, resulting in an increase in the standard error and a decrease in R-squared.

7. C is correct. For a regression with one independent variable, the correlation is the same as
the Multiple R with the sign of the slope coefficient. Because the slope coefficient is
positive, the correlation is 0.8623.

8. B is correct. This answer describes the calculation of the F-statistic.

9. C is correct. To make a prediction using the regression model, multiply the slope
coefficient by the forecast of the independent variable and add the result to the intercept.

10. C is correct. The p-value is the smallest level of significance at which the null hypothesis
concerning the slope coefficient can be rejected. In this case the p-value is less than 0.05,
and thus the regression of the ratio of cash flow from operations to sales on the ratio of
net income to sales is significant at the 5 percent level.

11. B is correct because the calculated test statistic is

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{-0.1452\sqrt{248-2}}{\sqrt{1-(-0.1452)^2}} = -2.3017$$

Because the absolute value of t = −2.3017 is greater than 1.96, the correlation coefficient
is statistically significant. For a regression with one independent variable, the t-value (and
significance) for the slope coefficient (which is −2.3014) should equal the t-value (and
significance) of the correlation coefficient. The slight difference between these two
t-values is caused by rounding error.

12. A is correct because the data are time series, and the expected value of the error term, E
(ε), is 0.

13. C is correct. From the regression equation, Expected return = 0.0138 + −0.6486(−0.01) =
0.0138 + 0.006486 = 0.0203, or 2.03 percent.

14. C is correct. R-squared is the coefficient of determination. In this case, it shows that 2.11
percent of the variability in Stellar’s returns is explained by changes in CPIENG.

15. A is correct, because the standard error of the estimate is the standard deviation of the
regression residuals.

16. C is the correct response, because it is a false statement. The slope and intercept are both
statistically significant.

17. C is correct because the slope coefficient (Exhibit 2) and the cross-product (Exhibit 1) are
negative.

18. B is correct. The sample covariance is calculated as

$$\frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{n-1} = -9.2430 \div 49 = -0.1886.$$

19. A is correct. The correlation coefficient equals the covariance between variables X and Y
divided by the product of the standard deviations of variables X and Y, as follows:

$$\frac{-9.2430/49}{\sqrt{2.2225/49}\,\sqrt{412.2042/49}} = \frac{-0.1886327}{0.2130 \times 2.9004} = -0.3054.$$

20. C is correct. Conclusions cannot be drawn regarding causation, only about association.

21. C is correct. Liu explains the short interest ratio using the debt ratio.

22. A is correct. The degrees of freedom are the number of observations minus the number of
parameters estimated, which equals two in this case (the intercept and the slope
coefficient). The number of degrees of freedom is 50 − 2 = 48.

23. B is correct. The calculation for the confidence interval is −4.1589 ± (2.011 × 1.8718).
The upper bound is −0.3947. The 2.011 is the critical t-value for the 5% level of
significance (2.5% in one tail) for 48 degrees of freedom.

24. B is correct. The t-statistic is −2.2219, which is outside of the bounds created by the
critical t-values of ± 2.011 for a two-tailed test with a 5% significance level. The 2.011 is
the critical t-value for the 5% level of significance (2.5% in one tail) for 48 degrees of
freedom.

25. A is correct because Predicted value = 5.4975 + (−4.1589 × 0.40) = 5.4975 − 1.6636 =
3.8339.
26. C is correct because $F = \dfrac{\text{Mean regression sum of squares}}{\text{Mean squared error}} = \dfrac{38.4404}{7.7867} = 4.9367$.
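
Most of the figures in Solutions 18 through 26 can be rebuilt from the summary sums in Liu's Exhibit 1. The sketch below does so under two assumptions: the critical t-value of 2.011 is taken from the solutions rather than computed, and the variable names are chosen for this example. Small differences in the last decimal place relative to Exhibit 2 come from rounding in the published inputs.

```python
import math

# Summary statistics from Exhibit 1 (X = debt ratio, Y = short interest ratio).
n = 50
sum_x, sum_y = 19.8550, 192.3000
s_xx = 2.2225     # sum of squared deviations of X
s_yy = 412.2042   # sum of squared deviations of Y (total sum of squares)
s_xy = -9.2430    # sum of cross-products of deviations
T_CRIT = 2.011    # two-tailed 5 percent critical value for 48 df, from the solutions

x_bar, y_bar = sum_x / n, sum_y / n

covariance = s_xy / (n - 1)                  # Solution 18
correlation = s_xy / math.sqrt(s_xx * s_yy)  # Solution 19

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

ssr = slope * s_xy        # regression (explained) sum of squares
sse = s_yy - ssr          # residual sum of squares
mse = sse / (n - 2)       # mean squared error, 48 df (Solution 22)
f_stat = ssr / mse        # Solution 26 (one slope, so regression df = 1)

se_slope = math.sqrt(mse / s_xx)
t_stat = slope / se_slope                   # Solution 24
ci_upper = slope + T_CRIT * se_slope        # Solution 23
predicted_mqd = intercept + slope * 0.40    # Solution 25

print(f"covariance = {covariance:.4f}, correlation = {correlation:.4f}")
print(f"intercept = {intercept:.4f}, slope = {slope:.4f}")
print(f"t = {t_stat:.4f}, F = {f_stat:.4f}, CI upper bound = {ci_upper:.4f}")
print(f"predicted short interest ratio for MQD = {predicted_mqd:.4f}")
```

With a single independent variable, the F-statistic equals the square of the slope's t-statistic, which gives a quick consistency check on the ANOVA table.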

NOTES
1. Examples in this reading were updated in 2014 by Professor Sanjiv Sabherwal of the University of Texas,
Arlington.

2. Later, we show that variables with a correlation of 0 can have a strong nonlinear relation.

3. The use of n − 1 in the denominator is a technical point; it ensures that the sample covariance is an unbiased
estimate of population covariance.
