Reading 07-Correlation and Regression

Uploaded by

杨坡

3.8. Limitations of Regression Analysis

Although this reading has shown many of the uses of regression models for financial analysis,
regression models do have limitations. First, regression relations can change over time, just as
correlations can. This fact is known as the issue of parameter instability, and its existence
should not be surprising as the economic, tax, regulatory, political, and institutional contexts in
which financial markets operate change. Whether considering cross-sectional or time-series
regression, the analyst will probably face this issue. As one example, cross-sectional regression
relationships between stock characteristics may differ between growth-led and value-led
markets. As a second example, the time-series regression estimating the beta often yields
significantly different estimated betas depending on the time period selected. In both cross-
sectional and time-series contexts, the most common problem is sampling from more than one
population, with the challenge of identifying when doing so is an issue.

A second limitation to the use of regression results specific to investment contexts is that public
knowledge of regression relationships may negate their future usefulness. Suppose, for
example, an analyst discovers that stocks with a certain characteristic have had historically very
high returns. If other analysts discover and act upon this relationship, then the prices of stocks
with that characteristic will be bid up. The knowledge of the relationship may result in the
relation no longer holding in the future.

Finally, if the regression assumptions listed in Section 3.2 are violated, hypothesis tests and
predictions based on linear regression will not be valid. Although there are tests for violations
of regression assumptions, often uncertainty exists as to whether an assumption has been
violated. This limitation will be discussed in detail in the reading on multiple regression.

SUMMARY

◾ A scatter plot shows graphically the relationship between two variables. If the points on
the scatter plot cluster together in a straight line, the two variables have a strong linear
relation.
◾ The sample correlation coefficient for two variables X and Y is $r = \dfrac{\operatorname{Cov}(X, Y)}{s_x s_y}$.

◾ If two variables have a very strong linear relation, then the absolute value of their
correlation will be close to 1. If two variables have a weak linear relation, then the
absolute value of their correlation will be close to 0.

http://e.pub/jthbhvadxmncfyrxzzot.vbk/OEBPS/CFA0014-R03-7-print-1539097160.xh... 2018/10/9
◾ The squared value of the correlation coefficient for two variables quantifies the
percentage of the variance of one variable that is explained by the other. If the correlation
coefficient is positive, the two variables are directly related; if the correlation coefficient
is negative, the two variables are inversely related.

◾ If we have n observations for two variables, we can test whether the population
correlation between the two variables is equal to 0 by using a t-test. This test statistic has
a t-distribution with n − 2 degrees of freedom if the null hypothesis of 0 correlation is
true.

◾ Even one outlier can greatly affect the correlation between two variables. Analysts should
examine a scatter plot for the variables to determine whether outliers might affect a
particular correlation.

◾ Correlations can be spurious in the sense of misleadingly pointing toward associations
between variables.

◾ The dependent variable in a linear regression is the variable that the regression model
tries to explain. The independent variables are the variables that a regression model uses
to explain the dependent variable.

◾ If there is one independent variable in a linear regression and there are n observations on
the dependent and independent variables, the regression model is Yi = b0 + b1Xi + εi, i = 1,
…, n, where Yi is the dependent variable, Xi is the independent variable, and εi is the error
term. In this model, the coefficient b0 is the intercept. The intercept is the predicted value
of the dependent variable when the independent variable has a value of zero. In this
model, the coefficient b1 is the slope of the regression line. If the value of the independent
variable increases by one unit, then the model predicts that the value of the dependent
variable will increase by b1 units.

◾ The assumptions of the classic normal linear regression model are the following:

• A linear relation exists between the dependent variable and the independent
variable.

• The independent variable is not random.

• The expected value of the error term is 0.

• The variance of the error term is the same for all observations (homoskedasticity).

• The error term is uncorrelated across observations.

• The error term is normally distributed.

◾ The estimated parameters in a linear regression model minimize the sum of the squared
regression residuals.

◾ The standard error of estimate measures how well the regression model fits the data. If
the SEE is small, the model fits well.

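Printed by: yanzhao yang <[email protected]>. Printing is for personal, private use only. No part of this book may be reproduced or transmitted without the publisher's prior permission. Violators will be prosecuted.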

◾ The coefficient of determination measures the fraction of the total variation in the
dependent variable that is explained by the independent variable. In a linear regression
with one independent variable, the simplest way to compute the coefficient of
determination is to square the correlation of the dependent and independent variables.

◾ To calculate a confidence interval for an estimated regression coefficient, we must know
the standard error of the estimated coefficient and the critical value for the t-distribution
at the chosen level of significance, tc.

◾ To test whether the population value of a regression coefficient, b1, is equal to a particular
hypothesized value, B1, we must know the estimated coefficient, $\hat{b}_1$, the standard error of
the estimated coefficient, $s_{\hat{b}_1}$, and the critical value for the t-distribution at the chosen
level of significance, tc. The test statistic for this hypothesis is $(\hat{b}_1 - B_1)/s_{\hat{b}_1}$. If the
absolute value of this statistic is greater than tc, then we reject the null hypothesis that b1
= B1.

◾ In the regression model Yi = b0 + b1Xi + εi, if we know the estimated parameters, $\hat{b}_0$ and $\hat{b}_1$,
for any value of the independent variable, X, then the predicted value of the dependent
variable Y is $\hat{Y} = \hat{b}_0 + \hat{b}_1 X$.

◾ The prediction interval for a regression equation for a particular predicted value of the
dependent variable is $\hat{Y} \pm t_c s_f$, where sf is the square root of the estimated variance of the
prediction error and tc is the critical level for the t-statistic at the chosen significance
level. This computation specifies a 100(1 − α) percent confidence interval. For example, if α
= 0.05, then this computation yields a 95 percent confidence interval.
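
The formulas summarized above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the five-point data set are hypothetical, and two standard textbook results that this summary does not spell out are assumed, namely that the standard error of the slope is SEE divided by the square root of the sum of squared deviations of X, and that the variance of the prediction error is SEE² × [1 + 1/n + (X − X̄)² / Σ(Xi − X̄)²].

```python
import math


def fit_simple_regression(x, y):
    """Least-squares fit of Y = b0 + b1*X + error, following the summary formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)          # squared deviations of X
    syy = sum((yi - y_bar) ** 2 for yi in y)          # total variation of Y
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = sxy / sxx                                    # slope
    b0 = y_bar - b1 * x_bar                           # intercept
    r = sxy / math.sqrt(sxx * syy)                    # Cov(X,Y)/(s_x s_y); the n-1 terms cancel
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    see = math.sqrt(sse / (n - 2))                    # standard error of estimate
    se_b1 = see / math.sqrt(sxx)                      # assumed standard result, see lead-in
    return {
        "n": n, "x_bar": x_bar, "sxx": sxx,
        "b0": b0, "b1": b1, "r": r,
        "r_squared": 1 - sse / syy,                   # coefficient of determination
        "see": see,
        "t_b1": b1 / se_b1,                           # tests H0: b1 = 0
    }


def prediction_interval(fit, x_new, t_crit):
    """Y-hat +/- t_c * s_f, with s_f from the assumed prediction-error variance."""
    y_hat = fit["b0"] + fit["b1"] * x_new
    s_f = fit["see"] * math.sqrt(
        1 + 1 / fit["n"] + (x_new - fit["x_bar"]) ** 2 / fit["sxx"]
    )
    return y_hat - t_crit * s_f, y_hat + t_crit * s_f


# Hypothetical data, chosen only so the arithmetic is easy to follow.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
fit = fit_simple_regression(x, y)
# 3.182 is the two-tailed 5 percent critical t-value for n - 2 = 3 degrees of freedom.
low, high = prediction_interval(fit, x_new=3, t_crit=3.182)
print(fit)
print(low, high)
```

The critical t-value is passed in by the caller so that the sketch needs only the standard library; in practice it would come from a t-table or a statistics package.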

PRACTICE PROBLEMS
© 2016 CFA Institute. All rights reserved.

1. The following table shows the sample correlations between the monthly returns for four
different mutual funds and the S&P 500. The correlations are based on 36 monthly
observations. The funds are as follows:

Fund 1     Large-cap fund
Fund 2     Mid-cap fund
Fund 3     Large-cap value fund
Fund 4     Emerging markets fund
S&P 500    US domestic stock index

           Fund 1    Fund 2    Fund 3    Fund 4    S&P 500
Fund 1     1
Fund 2     0.9231    1
Fund 3     0.4771    0.4156    1
Fund 4     0.7111    0.7238    0.3102    1
S&P 500    0.8277    0.8223    0.5791    0.7515    1

Test the null hypothesis that each of these correlations, individually, is equal to zero
against the alternative hypothesis that it is not equal to zero. Use a 5 percent significance
level.

2. Julie Moon is an energy analyst examining electricity, oil, and natural gas consumption in
different regions over different seasons. She ran a regression explaining the variation in
energy consumption as a function of temperature. The total variation of the dependent
variable was 140.58, the explained variation was 60.16, and the unexplained variation
was 80.42. She had 60 monthly observations.

A. Compute the coefficient of determination.

B. What was the sample correlation between energy consumption and temperature?

C. Compute the standard error of the estimate of Moon’s regression model.

D. Compute the sample standard deviation of monthly energy consumption.

3. You are examining the results of a regression estimation that attempts to explain the unit
sales growth of a business you are researching. The analysis of variance output for the
regression is given in the table below. The regression was based on five observations (n =
5).

ANOVA df SS MSS F Significance F

Regression 1 88.0 88.0 36.667 0.00904
Residual 3 7.2 2.4
Total 4 95.2

A. How many independent variables are in the regression to which the ANOVA
refers?

B. Define Total SS.

C. Calculate the sample variance of the dependent variable using information in the
above table.

D. Define Regression SS and explain how its value of 88 is obtained in terms of other
quantities reported in the above table.

E. What hypothesis does the F-statistic test?

F. Explain how the value of the F-statistic of 36.667 is obtained in terms of other
quantities reported in the above table.

G. Is the F-test significant at the 5 percent significance level?

4. An economist collected the monthly returns for KDL’s portfolio and a diversified stock
index. The data collected are shown below:

Month    Portfolio Return (%)    Index Return (%)
1              1.11                  −0.59
2             72.10                  64.90
3              5.12                   4.81
4              1.01                   1.68
5             −1.72                  −4.97
6              4.06                  −2.06

The economist calculated the correlation between the two returns and found it to be
0.996. The regression results with the KDL return as the dependent variable and the index
return as the independent variable are given as follows:

Regression Statistics
Multiple R 0.996
R-squared 0.992
Standard error 2.861
Observations 6

ANOVA df SS MSS F Significance F

Regression 1 4101.62 4101.62 500.79 0
Residual 4 32.76 8.19
Total 5 4134.38

Coefficients Standard Error t-Statistic p-Value

Intercept 2.252 1.274 1.768 0.1518
Slope 1.069 0.0477 22.379 0

When reviewing the results, Andrea Fusilier suspected that they were unreliable. She
found that the returns for Month 2 should have been 7.21 percent and 6.49 percent,
instead of the large values shown in the first table. Correcting these values resulted in a
revised correlation of 0.824 and the revised regression results shown as follows:

Regression Statistics
Multiple R 0.824
R-squared 0.678
Standard error 2.062
Observations 6

ANOVA df SS MSS F Significance F

Regression 1 35.89 35.89 8.44 0.044
Residual 4 17.01 4.25
Total 5 52.91

Coefficients Standard Error t-Statistic p-Value

Intercept 2.242 0.863 2.597 0.060
Slope 0.623 0.214 2.905 0.044

Explain how the bad data affected the results.

The following information relates to Questions 5–10

Kenneth McCoin, CFA, is a fairly tough interviewer. Last year, he handed each job applicant a
sheet of paper with the information in the following table, and he then asked several questions
about regression analysis. Some of McCoin’s questions, along with a sample of the answers he
received to each, are given below. McCoin told the applicants that the independent variable is
the ratio of net income to sales for restaurants with a market cap of more than $100 million and
the dependent variable is the ratio of cash flow from operations to sales for those restaurants.
Which of the choices provided is the best answer to each of McCoin’s questions?

Regression Statistics
Multiple R 0.8623
R-squared 0.7436
Standard error 0.0213
Observations 24

ANOVA df SS MSS F Significance F

Regression 1 0.029 0.029000 63.81 0
Residual 22 0.010 0.000455
Total 23 0.040

Coefficients Standard Error t-Statistic p-Value

Intercept 0.077 0.007 11.328 0
Slope 0.826 0.103 7.988 0

5. What is the value of the coefficient of determination?

A. 0.8261.

B. 0.7436.

C. 0.8623.

6. Suppose that you deleted several of the observations that had small residual values. If you
re-estimated the regression equation using this reduced sample, what would likely happen
to the standard error of the estimate and the R-squared?

Standard Error of the Estimate R-Squared

A Decrease Decrease
B Decrease Increase
C Increase Decrease

7. What is the correlation between X and Y?

A. −0.7436.

B. 0.7436.

C. 0.8623.

8. Where did the F-value in the ANOVA table come from?

A. You look up the F-value in a table. The F depends on the numerator and
denominator degrees of freedom.

B. Divide the “Mean Square” for the regression by the “Mean Square” of the
residuals.

C. The F-value is equal to the reciprocal of the t-value for the slope coefficient.

9. If the ratio of net income to sales for a restaurant is 5 percent, what is the predicted ratio
of cash flow from operations to sales?

A. 0.007 + 0.103(5.0) = 0.524.

B. 0.077 − 0.826(5.0) = −4.054.

C. 0.077 + 0.826(5.0) = 4.207.

10. Is the relationship between the ratio of cash flow from operations to sales and the ratio of
net income to sales significant at the 5 percent level?

A. No, because the R-squared is greater than 0.05.

B. No, because the p-values of the intercept and slope are less than 0.05.

C. Yes, because the p-values for F and t for the slope coefficient are less than 0.05.

The following information relates to Questions 11–16

Howard Golub, CFA, is preparing to write a research report on Stellar Energy Corp. common
stock. One of the world’s largest companies, Stellar is in the business of refining and marketing
oil. As part of his analysis, Golub wants to evaluate the sensitivity of the stock’s returns to
various economic factors. For example, a client recently asked Golub whether the price of
Stellar Energy Corporation stock has tended to rise following increases in retail energy prices.
Golub believes the association between the two variables to be negative, but he does not know
the strength of the association.

Golub directs his assistant, Jill Batten, to study the relationships between Stellar monthly
common stock returns versus the previous month’s percent change in the US Consumer Price
Index for Energy (CPIENG), and Stellar monthly common stock returns versus the previous
month’s percent change in the US Producer Price Index for Crude Energy Materials (PPICEM).
Golub wants Batten to run both a correlation and a linear regression analysis. In response,
Batten compiles the summary statistics shown in Exhibit 1 for the 248 months between January
1980 and August 2000. All of the data are in decimal form, where 0.01 indicates a 1 percent
return. Batten also runs a regression analysis using Stellar monthly returns as the dependent
variable and the monthly change in CPIENG as the independent variable. Exhibit 2 displays the
results of this regression model.

Exhibit 1. Descriptive Statistics

                                   Monthly Return,          Lagged Monthly Change
                                   Stellar Common Stock     CPIENG      PPICEM
Mean                               0.0123                   0.0023      0.0042
Standard Deviation                 0.0717                   0.0160      0.0534

Covariance, Stellar vs. CPIENG     −0.00017
Covariance, Stellar vs. PPICEM     −0.00048
Covariance, CPIENG vs. PPICEM      0.00044
Correlation, Stellar vs. CPIENG    −0.1452

Exhibit 2. Regression Analysis with CPIENG

Regression Statistics
Multiple R 0.1452
R-squared 0.0211
Standard error of the estimate 0.0710
Observations 248

Coefficients Standard Error t-Statistic

Intercept 0.0138 0.0046 3.0275
Slope coefficient −0.6486 0.2818 −2.3014

11. Batten wants to determine whether the sample correlation between the Stellar and
CPIENG variables (−0.1452) is statistically significant. The critical value for the test
statistic at the 0.05 level of significance is approximately 1.96. Batten should conclude
that the statistical relationship between Stellar and CPIENG is:

A. significant, because the calculated test statistic has a lower absolute value than the
critical value for the test statistic.

B. significant, because the calculated test statistic has a higher absolute value than the
critical value for the test statistic.

C. not significant, because the calculated test statistic has a higher absolute value than
the critical value for the test statistic.

12. Did Batten’s regression analyze cross-sectional or time-series data, and what was the
expected value of the error term from that regression?

Data Type Expected Value of Error Term

A Time-series 0
B Time-series εi
C Cross-sectional 0

13. Based on the regression, which used data in decimal form, if the CPIENG decreases by
1.0 percent, what is the expected return on Stellar common stock during the next period?

A. 0.0073 (0.73 percent).

B. 0.0138 (1.38 percent).

C. 0.0203 (2.03 percent).

14. Based on Batten’s regression model, the coefficient of determination indicates that:

A. Stellar’s returns explain 2.11 percent of the variability in CPIENG.

B. Stellar’s returns explain 14.52 percent of the variability in CPIENG.

C. Changes in CPIENG explain 2.11 percent of the variability in Stellar’s returns.

15. For Batten’s regression model, the standard error of the estimate shows that the standard
deviation of:

A. the residuals from the regression is 0.0710.

B. values estimated from the regression is 0.0710.

C. Stellar’s observed common stock returns is 0.0710.

16. For the analysis run by Batten, which of the following is an incorrect conclusion from the
regression output?

A. The estimated intercept coefficient from Batten’s regression is statistically
significant at the 0.05 level.

B. In the month after the CPIENG declines, Stellar’s common stock is expected to
exhibit a positive return.

C. Viewed in combination, the slope and intercept coefficients from Batten’s
regression are not statistically significant at the 0.05 level.

The following information relates to Questions 17–26

Anh Liu is an analyst researching whether a company’s debt burden affects investors’ decision
to short the company’s stock. She calculates the short interest ratio (the ratio of short interest to
average daily share volume, expressed in days) for 50 companies as of the end of 2016 and
compares this ratio with the companies’ debt ratio (the ratio of total liabilities to total assets,
expressed in decimal form).

Liu provides a number of statistics in Exhibit 1. She also estimates a simple regression to
investigate the effect of the debt ratio on a company’s short interest ratio. The results of this
simple regression, including the analysis of variance (ANOVA), are shown in Exhibit 2.

In addition to estimating a regression equation, Liu graphs the 50 observations using a
scatterplot, with the short interest ratio on the vertical axis and the debt ratio on the horizontal
axis.

Exhibit 1. Summary Statistics

Statistic                                  Debt Ratio (Xi)        Short Interest Ratio (Yi)
Sum                                        19.8550                192.3000
Average                                    0.3971                 3.8460
Sum of squared deviations from the mean    $\sum_{i=1}^{n}(X_i - \bar{X})^2 = 2.2225$    $\sum_{i=1}^{n}(Y_i - \bar{Y})^2 = 412.2042$
Sum of cross-products of deviations
from the mean                              $\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y}) = -9.2430$

Exhibit 2. Regression of the Short Interest Ratio on the Debt Ratio

ANOVA         Degrees of Freedom (df)    Sum of Squares (SS)    Mean Square (MS)
Regression     1                          38.4404                38.4404
Residual      48                         373.7638                 7.7867
Total         49                         412.2042

Regression Statistics
Multiple R                    0.3054
R²                            0.0933
Standard error of estimate    2.7905
Observations                  50

Coefficients Standard Error t-Statistic

Intercept 5.4975 0.8416 6.5322
Debt ratio –4.1589 1.8718 –2.2219

Liu is considering three interpretations of these results for her report on the relationship
between debt ratios and short interest ratios:

Interpretation 1 Companies’ higher debt ratios cause lower short interest ratios.

Interpretation 2 Companies’ higher short interest ratios cause higher debt ratios.

Interpretation 3 Companies with higher debt ratios tend to have lower short interest
ratios.

She is especially interested in using her estimation results to predict the short interest ratio for
MQD Corporation, which has a debt ratio of 0.40.

17. Based on Exhibits 1 and 2, if Liu were to graph the 50 observations, the scatterplot
summarizing this relation would be best described as:

A. horizontal.

B. upward sloping.

C. downward sloping.

18. Based on Exhibit 1, the sample covariance is closest to:

A. −9.2430.

B. −0.1886.

C. 8.4123.

19. Based on Exhibit 1, the correlation between the debt ratio and the short interest ratio is
closest to:

A. −0.3054.

B. 0.0933.

C. 0.3054.

20. Which of the interpretations best describes Liu’s findings for her report?

A. Interpretation 1

B. Interpretation 2

C. Interpretation 3

21. The dependent variable in Liu’s regression analysis is the:

A. intercept.

B. debt ratio.

C. short interest ratio.

22. Based on Exhibit 2, the degrees of freedom for the t-test of the slope coefficient in this
regression are:

A. 48.

B. 49.

C. 50.

23. The upper bound for the 95% confidence interval for the coefficient on the debt ratio in
the regression is closest to:

A. −1.0199.

B. −0.3947.

C. 1.4528.

24. Which of the following should Liu conclude from these results shown in Exhibit 2?

A. The average short interest ratio is 5.4975.

B. The estimated slope coefficient is statistically significant at the 0.05 level.

C. The debt ratio explains 30.54% of the variation in the short interest ratio.

25. Based on Exhibit 2, the short interest ratio expected for MQD Corporation is closest to:

A. 3.8339.

B. 5.4975.

C. 6.2462.

26. Based on Liu’s regression results in Exhibit 2, the F-statistic for testing whether the slope
coefficient is equal to zero is closest to:

A. −2.2219.

B. 3.5036.

C. 4.9367.

SOLUTIONS
1. The critical t-value for n − 2 = 34 df, using a 5 percent significance level and a two-tailed
test, is 2.032. First, take the smallest correlation in the table, the correlation between Fund
3 and Fund 4, and see if it is significantly different from zero. Its calculated t-value is

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.3102\sqrt{36-2}}{\sqrt{1-0.3102^2}} = 1.903$$

This correlation is not significantly different from zero. If we take the next lowest
correlation, between Fund 2 and Fund 3, this correlation of 0.4156 has a calculated
t-value of 2.664. So this correlation is significantly different from zero at the 5 percent
level of significance. All of the other correlations in the table (besides the 0.3102) are
greater than 0.4156, so they too are significantly different from zero.
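
The test in solution 1 can be checked for every pair in the table with a short script. This is a sketch under stated assumptions: the function name `correlation_t_stat` is made up for illustration, and the critical value 2.032 is taken from the solution rather than computed.

```python
import math


def correlation_t_stat(r, n):
    """t-statistic for H0: population correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


N_OBS = 36          # monthly observations, from the problem
T_CRITICAL = 2.032  # two-tailed, 5 percent, 34 df (value quoted in the solution)

# Sample correlations from the table in Problem 1.
correlations = {
    ("Fund 1", "Fund 2"): 0.9231,
    ("Fund 1", "Fund 3"): 0.4771,
    ("Fund 1", "Fund 4"): 0.7111,
    ("Fund 1", "S&P 500"): 0.8277,
    ("Fund 2", "Fund 3"): 0.4156,
    ("Fund 2", "Fund 4"): 0.7238,
    ("Fund 2", "S&P 500"): 0.8223,
    ("Fund 3", "Fund 4"): 0.3102,
    ("Fund 3", "S&P 500"): 0.5791,
    ("Fund 4", "S&P 500"): 0.7515,
}

# Map each pair to (t-statistic, whether H0 is rejected at the 5 percent level).
results = {
    pair: (correlation_t_stat(r, N_OBS), abs(correlation_t_stat(r, N_OBS)) > T_CRITICAL)
    for pair, r in correlations.items()
}

for pair, (t_stat, significant) in results.items():
    verdict = "reject H0" if significant else "fail to reject H0"
    print(f"{pair[0]} vs. {pair[1]}: t = {t_stat:.3f} -> {verdict}")
```

Only the Fund 3 versus Fund 4 pair fails to clear the critical value, which matches the reasoning in the solution.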

2. A. The coefficient of determination is

$$\frac{\text{Explained variation}}{\text{Total variation}} = \frac{60.16}{140.58} = 0.4279$$

B. For a linear regression with one independent variable, the absolute value of
correlation between the independent variable and the dependent variable equals the
square root of the coefficient of determination, so the correlation is √0.4279 =
0.6542. (The correlation will have the same sign as the slope coefficient.)

C. The standard error of the estimate is

$$\left(\frac{\sum_{i=1}^{n}\left(Y_i - \hat{b}_0 - \hat{b}_1 X_i\right)^2}{n-2}\right)^{1/2} = \left(\frac{\text{Unexplained variation}}{n-2}\right)^{1/2} = \sqrt{\frac{80.42}{60-2}} = 1.178$$

D. The sample variance of the dependent variable is

$$\frac{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}{n-1} = \frac{\text{Total variation}}{n-1} = \frac{140.58}{60-1} = 2.3827$$

The sample standard deviation is √2.3827 = 1.544.
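
The four quantities in solution 2 follow directly from the variation figures given in the problem. The sketch below reproduces them; the variable names are illustrative only.

```python
import math

# Figures given in Problem 2.
total_variation = 140.58
explained_variation = 60.16
unexplained_variation = 80.42
n = 60  # monthly observations

# A. Coefficient of determination: share of total variation that is explained.
r_squared = explained_variation / total_variation

# B. With one independent variable, |r| is the square root of R-squared.
#    The sign of r matches the sign of the slope, which the problem does not give.
abs_correlation = math.sqrt(r_squared)

# C. Standard error of estimate: unexplained variation over n - 2, square-rooted.
see = math.sqrt(unexplained_variation / (n - 2))

# D. Sample standard deviation of the dependent variable.
sample_std_dev = math.sqrt(total_variation / (n - 1))

print(f"R-squared       = {r_squared:.4f}")
print(f"|correlation|   = {abs_correlation:.4f}")
print(f"SEE             = {see:.3f}")
print(f"sample std dev  = {sample_std_dev:.3f}")
```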

3. A. The degrees of freedom for the regression is the number of slope parameters in the
regression, which is the same as the number of independent variables in the
regression. Because regression df = 1, we conclude that there is one independent
variable in the regression.

B. Total SS is the sum of the squared deviations of the dependent variable Y about its
mean.

C. The sample variance of the dependent variable is the total SS divided by its degrees
of freedom (n − 1 = 5 − 1 = 4 as given). Thus the sample variance of the dependent
variable is 95.2/4 = 23.8.

D. The Regression SS is the part of total sum of squares explained by the regression.
Regression SS equals the sum of the squared differences between predicted values
of the Y and the sample mean of Y: $\sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2$. In terms of other values in
the table, Regression SS is equal to Total SS minus Residual SS: 95.2 − 7.2 = 88.

E. The F-statistic tests whether all the slope coefficients in a linear regression are
equal to 0.

F. The calculated value of F in the table is equal to the Regression MSS divided by
the Residual MSS: 88/2.4 = 36.667.

G. Yes. The significance of 0.00904 given in the table is the p-value of the test (the
smallest level at which we can reject the null hypothesis). This value of 0.00904 is
less than the specified significance level of 0.05, so we reject the null hypothesis.
The regression equation has significant explanatory power.
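
The relationships among the ANOVA entries in solution 3 can be written out as a small helper. This is a sketch: `anova_summary` is a hypothetical name, and it takes the regression and residual sums of squares as given rather than computing them from raw data.

```python
def anova_summary(regression_ss, residual_ss, n, k=1):
    """Rebuild the ANOVA quantities for a regression with k independent variables."""
    regression_df = k
    residual_df = n - k - 1
    total_df = n - 1
    total_ss = regression_ss + residual_ss
    mean_regression_ss = regression_ss / regression_df
    mean_residual_ss = residual_ss / residual_df
    return {
        "regression_df": regression_df,
        "residual_df": residual_df,
        "total_df": total_df,
        "total_ss": total_ss,
        "mean_regression_ss": mean_regression_ss,
        "mean_residual_ss": mean_residual_ss,
        "f_statistic": mean_regression_ss / mean_residual_ss,
        # Sample variance of the dependent variable: Total SS over n - 1.
        "sample_variance_y": total_ss / total_df,
    }


# Values from the ANOVA table in Problem 3 (n = 5, one independent variable).
table = anova_summary(regression_ss=88.0, residual_ss=7.2, n=5)
for name, value in table.items():
    print(f"{name}: {value}")
```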

4. The Month 2 data point is an outlier, lying far away from the other data values. Because
this outlier was caused by a data entry error, correcting the outlier improves the validity
and reliability of the regression. In this case, the true correlation is reduced from 0.996 to
0.824. The revised R-squared is substantially lower (0.678 versus 0.992). The
significance of the regression is also lower, as can be seen in the decline of the F-value
from 500.79 to 8.44 and the decline in the t-statistic of the slope coefficient from 22.379
to 2.905.

The total sum of squares and regression sum of squares were greatly exaggerated in the
incorrect analysis. With the correction, the slope coefficient changes from 1.069 to 0.623.
This change is important. When the index moves up or down, the original model indicates
that the portfolio return goes up or down by 1.069 times as much, while the revised model
indicates that the portfolio return goes up or down by only 0.623 times as much. In this
example, incorrect data entry caused the outlier. Had it been a valid observation, not
caused by a data error, then the analyst would have had to decide whether the results were
more reliable including or excluding the outlier.
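
The effect of the single bad observation can be reproduced from the six data points in Problem 4. The sketch below fits the regression twice, once with the mistyped Month 2 values and once with the corrected ones; the helper name `simple_regression` is illustrative.

```python
import math


def simple_regression(x, y):
    """Return (correlation, intercept, slope) for a one-variable least-squares fit."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return sxy / math.sqrt(sxx * syy), intercept, slope


# Monthly returns in percent, as first recorded (Month 2 is the data entry error).
index_bad = [-0.59, 64.90, 4.81, 1.68, -4.97, -2.06]
portfolio_bad = [1.11, 72.10, 5.12, 1.01, -1.72, 4.06]

# Same series with Month 2 corrected to 6.49 and 7.21.
index_fixed = list(index_bad)
portfolio_fixed = list(portfolio_bad)
index_fixed[1] = 6.49
portfolio_fixed[1] = 7.21

r_bad, b0_bad, b1_bad = simple_regression(index_bad, portfolio_bad)
r_fixed, b0_fixed, b1_fixed = simple_regression(index_fixed, portfolio_fixed)

print(f"with error: r = {r_bad:.3f}, intercept = {b0_bad:.3f}, slope = {b1_bad:.3f}")
print(f"corrected:  r = {r_fixed:.3f}, intercept = {b0_fixed:.3f}, slope = {b1_fixed:.3f}")
```

One extreme point dominates the sums of squares and cross-products, so it pulls both the correlation and the slope toward the line through that point.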

5. B is correct. The coefficient of determination is the same as R-squared.

6. C is correct. Deleting observations with small residuals will degrade the strength of the
regression, resulting in an increase in the standard error and a decrease in R-squared.

7. C is correct. For a regression with one independent variable, the correlation is the same as
the Multiple R with the sign of the slope coefficient. Because the slope coefficient is
positive, the correlation is 0.8623.

8. B is correct. This answer describes the calculation of the F-statistic.

9. C is correct. To make a prediction using the regression model, multiply the slope
coefficient by the forecast of the independent variable and add the result to the intercept.

10. C is correct. The p-value is the smallest level of significance at which the null hypotheses
concerning the slope coefficient can be rejected. In this case the p-value is less than 0.05,
and thus the regression of the ratio of cash flow from operations to sales on the ratio of
net income to sales is significant at the 5 percent level.

11. B is correct because the calculated test statistic is

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{-0.1452\sqrt{248-2}}{\sqrt{1-(-0.1452)^2}} = -2.3017$$

Because the absolute value of t = −2.3017 is greater than 1.96, the correlation coefficient
is statistically significant. For a regression with one independent variable, the t-value (and
significance) for the slope coefficient (which is −2.3014) should equal the t-value (and
significance) of the correlation coefficient. The slight difference between these two
t-values is caused by rounding error.
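
The equality between the two t-values claimed in solution 11 can be sketched algebraically. The shorthand $S_{xx}$, $S_{yy}$, and $S_{xy}$ is introduced here for the sums of squared deviations and cross-products, and the derivation assumes two standard results that this section does not state: $\hat{b}_1 = S_{xy}/S_{xx}$ and $s_{\hat{b}_1} = \text{SEE}/\sqrt{S_{xx}}$.

```latex
% Shorthand (introduced for this sketch):
%   S_{xx} = \sum (X_i - \bar{X})^2,  S_{yy} = \sum (Y_i - \bar{Y})^2,
%   S_{xy} = \sum (X_i - \bar{X})(Y_i - \bar{Y}),  r = S_{xy} / \sqrt{S_{xx} S_{yy}}
\begin{align*}
\text{SEE}^2 &= \frac{\text{Unexplained variation}}{n-2}
              = \frac{S_{yy}\,(1 - r^2)}{n-2}
              && \text{(since } R^2 = r^2 \text{ with one independent variable)} \\[4pt]
\frac{\hat{b}_1}{s_{\hat{b}_1}}
             &= \frac{S_{xy}/S_{xx}}{\text{SEE}/\sqrt{S_{xx}}}
              = \frac{S_{xy}}{\sqrt{S_{xx}}} \cdot \frac{\sqrt{n-2}}{\sqrt{S_{yy}\,(1 - r^2)}} \\[4pt]
             &= \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} \cdot \frac{\sqrt{n-2}}{\sqrt{1 - r^2}}
              = \frac{r\sqrt{n-2}}{\sqrt{1 - r^2}}
\end{align*}
```

Under those assumptions the slope t-statistic for the null hypothesis of a zero slope and the correlation t-statistic are the same quantity, so any gap between reported values reflects rounding of the inputs.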

12. A is correct because the data are time series, and the expected value of the error term, E
(ε), is 0.

13. C is correct. From the regression equation, Expected return = 0.0138 + (−0.6486)(−0.01) =
0.0138 + 0.006486 = 0.0203, or 2.03 percent.

14. C is correct. R-squared is the coefficient of determination. In this case, it shows that 2.11
percent of the variability in Stellar’s returns is explained by changes in CPIENG.

15. A is correct, because the standard error of the estimate is the standard deviation of the
regression residuals.

16. C is the correct response, because it is a false statement. The slope and intercept are both
statistically significant.

17. C is correct because the slope coefficient (Exhibit 2) and the cross-product (Exhibit 1) are
negative.

18. B is correct. The sample covariance is calculated as

$$\frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{n-1} = -9.2430 \div 49 = -0.1886.$$

19. A is correct. The correlation coefficient equals the covariance between variables X and Y
divided by the product of the standard deviations of variables X and Y, as follows:

$$\frac{-9.2430/49}{\sqrt{2.2225/49}\,\sqrt{412.2042/49}} = \frac{-0.1886327}{0.2130 \times 2.9004} = -0.3054.$$

20. C is correct. Conclusions cannot be drawn regarding causation, only about association.

21. C is correct. Liu explains the short interest ratio using the debt ratio.

22. A is correct. The degrees of freedom are the number of observations minus the number of
parameters estimated, which equals two in this case (the intercept and the slope
coefficient). The number of degrees of freedom is 50 − 2 = 48.

23. B is correct. The calculation for the confidence interval is −4.1589 ± (2.011 × 1.8718).
The upper bound is −0.3947. The 2.011 is the critical t-value for the 5% level of
significance (2.5% in one tail) for 48 degrees of freedom.

24. B is correct. The t-statistic is −2.2219, which is outside of the bounds created by the
critical t-values of ± 2.011 for a two-tailed test with a 5% significance level. The 2.011 is
the critical t-value for the 5% level of significance (2.5% in one tail) for 48 degrees of
freedom.
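
Answers 23 and 24 use the same three inputs, so both can be verified together. The sketch below hard-codes the critical value 2.011 quoted in the text rather than looking it up from a t-distribution; the variable names are illustrative assumptions.

```python
slope = -4.1589       # estimated slope coefficient
std_error = 1.8718    # standard error of the slope
t_critical = 2.011    # from the text: 5% two-tailed test, 48 degrees of freedom

# 95 percent confidence interval: estimate +/- critical value * standard error
lower = slope - t_critical * std_error
upper = slope + t_critical * std_error

# t-statistic for H0: slope = 0
t_stat = slope / std_error
reject_null = abs(t_stat) > t_critical

print(f"CI = [{lower:.4f}, {upper:.4f}], t = {t_stat:.4f}, reject H0: {reject_null}")
```

The two checks are equivalent: the interval excludes 0 exactly when the absolute t-statistic exceeds the critical value.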

25. A is correct because Predicted value = 5.4975 + (−4.1589 × 0.40) = 5.4975 − 1.6636 =
3.8339.
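
The predictions in answers 13 and 25 both apply the fitted line, intercept plus slope times the value of the independent variable. A minimal sketch, with `predict` as a hypothetical helper name:

```python
def predict(intercept: float, slope: float, x: float) -> float:
    """Predicted value of the dependent variable from a one-variable linear regression."""
    return intercept + slope * x


# Answer 25: short interest ratio predicted for a debt ratio of 0.40
short_interest_hat = predict(intercept=5.4975, slope=-4.1589, x=0.40)

# Answer 13: expected return when the independent variable equals -0.01
expected_return = predict(intercept=0.0138, slope=-0.6486, x=-0.01)

print(f"{short_interest_hat:.4f}, {expected_return:.4f}")
```
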

26. C is correct because

$$F = \frac{\text{Mean regression sum of squares}}{\text{Mean squared error}} = \frac{38.4404}{7.7867} = 4.9367$$
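
The F-statistic in answer 26 can be cross-checked against the slope t-statistic from answer 24: in a regression with one independent variable, F equals the square of that t-statistic. The sketch below uses only figures quoted in the answers; the variable names are illustrative.

```python
mean_regression_ss = 38.4404   # mean regression sum of squares
mean_squared_error = 7.7867    # mean squared error

f_stat = mean_regression_ss / mean_squared_error

# Slope t-statistic from answer 24 (coefficient / standard error)
t_stat = -4.1589 / 1.8718

print(f"F = {f_stat:.4f}, t^2 = {t_stat ** 2:.4f}")
```

The two figures agree to about three decimal places; the small gap comes from rounding in the reported inputs.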

NOTES
1. Examples in this reading were updated in 2014 by Professor Sanjiv Sabherwal of the University of Texas,
Arlington.

2. Later, we show that variables with a correlation of 0 can have a strong nonlinear relation.

3. The use of n − 1 in the denominator is a technical point; it ensures that the sample covariance is an unbiased
estimate of population covariance.


