Dbs3e PPT ch14
• However, the article reports that S&P 500 sectors have been moving
together for almost two years, which is unusual, and for some
investors that raises red flags. The article states that increasing
correlations can lead to steep declines when stocks fall.
• The bottom line of the Wall Street Journal article is that correlation
tells us a lot about trends in the financial markets, and many
investors use correlation analysis to make investment decisions.
Financial investments that move independently of one another are
considered to be uncorrelated, which helps ensure that an investment
portfolio is properly diversified. What a great reason to master
correlation.
Based on: “Stocks Are Moving in Tandem. That Can Be Scary”, Wall Street Journal, February 15, 2018.
https://www.wsj.com/articles/stocks-are-moving-in-tandem-that-can-be-scary-1518720399
Copyright © 2020, 2015, 2013 Pearson Education, Inc.
14.1 Dependent and Independent
Variables
An independent variable, x, explains the
variation in another variable, which is called
the dependent variable, y.
• Variation in x explains variation in y, but not
the reverse (direction is only one way).
Independent variable (x) → Dependent variable (y)
Correlation Analysis
Example: A new car dealer wants to examine the
relationship between the number of TV ads run per
week and the number of cars sold that week.
• The number of ads per week is expected to affect
sales, not the reverse, so the number of ads is the
independent variable (x) and the number of cars
sold is the dependent variable (y).
Suppose a sample of 6 weeks is selected.
• Two values are recorded for each week: number of
TV ads and number of cars sold.
Week   Number of TV Ads (x)   Number of Cars Sold (y)
1      3                      13
2      6                      31
3      4                      19
4      5                      27
5      6                      23
6      3                      19
Because r = 0.836 is positive and close to +1, there is a fairly strong positive
relationship between the number of TV ads and cars sold.
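As a rough check, the value r = 0.836 can be reproduced from the six weeks of data with the standard computational formula r = (nΣxy − ΣxΣy) / √([nΣx² − (Σx)²][nΣy² − (Σy)²]). The sketch below is illustrative Python rather than part of the original slides; the names `ads`, `cars`, and `correlation` are assumptions chosen for this example.

```python
import math

# Data from the car-dealer example: TV ads (x) and cars sold (y) for 6 weeks.
ads = [3, 6, 4, 5, 6, 3]
cars = [13, 31, 19, 27, 23, 19]

def correlation(x, y):
    """Sample correlation coefficient via the computational formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = correlation(ads, cars)
print(round(r, 3))  # 0.836
```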
Using Excel to Calculate the Correlation
Coefficient
Use the CORREL function in Excel to calculate the correlation
coefficient:
=CORREL(array1, array2)
Notation used in the correlation formulas:
r = The sample correlation coefficient
n = The number of ordered pairs
The equation of a straight line:
$$\hat{y} = b_0 + b_1 x$$
where:
ŷ = The predicted value of y given a value of x
x = The independent variable
b0 = The y-intercept of the straight line
b1 = The slope of the straight line
This is a line described by the equation
ŷ = 2.0 + 0.5x.
Simple Regression Analysis
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$
where:
yi = The ith observation for the dependent variable from the population
β0 = The population y-intercept
β1 = The population slope
xi = The ith observation for the independent variable from the population
εi = The residual for the ith observation from the population
$$e_i = y_i - \hat{y}_i$$
where:
ei = The residual of the ith observation in the sample
yi = The actual value of the dependent variable for the ith data point
ŷi = The predicted value of the dependent variable for the ith data point
The Least Squares Method
The least squares method identifies the linear equation that
best fits a set of ordered pairs.
• It is used to find the values for b0 (the y-intercept) and b1 (the
slope of the line).
• The resulting best fit line is called the regression line.
Slope = 3.8947
Intercept = 4.4737
If we set x = 5 in the regression equation, we get:
ŷ = 4.4737 + 3.8947(5) = 23.95
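The slope (3.8947) and intercept (4.4737) reported above can be reproduced with the usual least squares computational formulas, b1 = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and b0 = ȳ − b1·x̄. The following is a minimal sketch in Python, assuming the TV ads data from the car-dealer example; the function name `least_squares` is hypothetical.

```python
# Data from the car-dealer example: TV ads (x) and cars sold (y).
ads = [3, 6, 4, 5, 6, 3]
cars = [13, 31, 19, 27, 23, 19]

def least_squares(x, y):
    """Return (b0, b1) for the fitted line y-hat = b0 + b1 * x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * (sum_x / n)
    return b0, b1

b0, b1 = least_squares(ads, cars)
y_hat_at_5 = b0 + b1 * 5  # predicted cars sold with 5 TV ads per week

print(round(b1, 4), round(b0, 4), round(y_hat_at_5, 2))  # 3.8947 4.4737 23.95
```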
[Excel output callouts: correlation coefficient r, y-intercept, slope value]
where:
y = A value of the dependent variable from the sample
ȳ = The average value of the dependent variable from the sample
ŷ = The estimated value of y for a given x value
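These three quantities are the ingredients of the standard sums of squares: SST = Σ(y − ȳ)², SSR = Σ(ŷ − ȳ)², and SSE = Σ(y − ŷ)², with SST = SSR + SSE. The sketch below is an illustration in Python, assuming the TV ads data from the car-dealer example; it also computes the coefficient of determination as SSR / SST. All variable names are assumptions for this example.

```python
# TV ads (x) and cars sold (y) from the car-dealer example.
ads = [3, 6, 4, 5, 6, 3]
cars = [13, 31, 19, 27, 23, 19]

n = len(ads)
x_bar = sum(ads) / n
y_bar = sum(cars) / n

# Least squares slope and intercept (deviation form).
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ads, cars))
sxx = sum((x - x_bar) ** 2 for x in ads)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

predicted = [b0 + b1 * x for x in ads]

sst = sum((y - y_bar) ** 2 for y in cars)                         # total
ssr = sum((y_hat - y_bar) ** 2 for y_hat in predicted)            # regression
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(cars, predicted))  # error
r_squared = ssr / sst

print(round(sst, 3), round(ssr, 3), round(sse, 3), round(r_squared, 4))
```

For this data set the partition works out to roughly 206 = 144.105 + 61.895.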
Partitioning the Sum of Squares
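[Excel regression output: the sum of squares column reports SSR, SSE, and SST, together with the degrees of freedom (df), the calculated F-test statistic, and the p-value for the F-test.]
[Figure: two scatter plots of y against x, one with a small se and one with a large se.]
The standard error of the estimate, se, can be calculated or found in Excel.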
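The standard error of the estimate is conventionally computed as se = √(SSE / (n − 2)). The sketch below assumes that formula and the TV ads data from the car-dealer example; it is an illustration in Python, not the Excel output referenced on the slide.

```python
import math

# TV ads (x) and cars sold (y) from the car-dealer example.
ads = [3, 6, 4, 5, 6, 3]
cars = [13, 31, 19, 27, 23, 19]

n = len(ads)
x_bar = sum(ads) / n
y_bar = sum(cars) / n

# Fit the regression line by least squares.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(ads, cars))
      / sum((x - x_bar) ** 2 for x in ads))
b0 = y_bar - b1 * x_bar

# Sum of squared residuals, then the standard error of the estimate.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(ads, cars))
se = math.sqrt(sse / (n - 2))

print(round(se, 2))  # 3.93
```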
$$CI = \hat{y} \pm t_{\alpha/2}\, s_e \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum x^2 - \frac{(\sum x)^2}{n}}}$$
where:
CI = The confidence interval for an average value of y
ŷ = The predicted y value for the desired value of x
tα/2 = The critical t-statistic from the Student's t-distribution with n – 2 df
se = The standard error of the estimate
n = The number of ordered pairs
x̄ = The average value of x from the sample
The Confidence Interval for an Average
Value of y Based on a Value of x
Example: Using the car sales vs. TV ads data with 5 ads per
week (x = 5),
tα/2 = 2.776
Example: (continued)
Completing the computation for the confidence interval:
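The computation can be sketched in Python as follows. This assumes the standard interval for an average value of y, ŷ ± tα/2 · se · √(1/n + (x − x̄)² / (Σx² − (Σx)²/n)), and takes the critical value tα/2 = 2.776 from the slide. The variable names are assumptions, and the printed endpoints come from this sketch rather than from the original deck.

```python
import math

# TV ads (x) and cars sold (y) from the car-dealer example.
ads = [3, 6, 4, 5, 6, 3]
cars = [13, 31, 19, 27, 23, 19]
x_new = 5          # 5 ads per week
t_crit = 2.776     # critical t value given on the slide (n - 2 = 4 df)

n = len(ads)
x_bar = sum(ads) / n
y_bar = sum(cars) / n
sxx = sum(x * x for x in ads) - sum(ads) ** 2 / n  # sum(x^2) - (sum x)^2 / n

b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(ads, cars)) / sxx
b0 = y_bar - b1 * x_bar
y_hat = b0 + b1 * x_new

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(ads, cars))
se = math.sqrt(sse / (n - 2))

margin = t_crit * se * math.sqrt(1 / n + (x_new - x_bar) ** 2 / sxx)
lower, upper = y_hat - margin, y_hat + margin

print(round(lower, 2), round(upper, 2))  # roughly 19.15 and 28.74
```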
The t-test statistic for the regression slope is:
$$t = \frac{b_1 - \beta_1}{s_b}$$
where:
b1 = The sample regression slope
β1 = The population regression slope from the null hypothesis
sb = The standard error of the slope
H0 : β1 = 0
H1 : β1 ≠ 0
Since t = 3.05 > tα/2 = 2.776, we reject H0 and conclude that the
population regression slope is not equal to zero and that there is a
relationship between TV ads and car sales.
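The test statistic t = 3.05 can be reproduced under the usual formulas sb = se / √(Σx² − n·x̄²) and t = (b1 − β1) / sb with β1 = 0. The sketch below is a hedged Python illustration using the TV ads data; the critical value 2.776 is the one quoted on the slide, and the variable names are assumptions.

```python
import math

# TV ads (x) and cars sold (y) from the car-dealer example.
ads = [3, 6, 4, 5, 6, 3]
cars = [13, 31, 19, 27, 23, 19]
t_crit = 2.776  # critical t value quoted on the slide

n = len(ads)
x_bar = sum(ads) / n
y_bar = sum(cars) / n
sxx = sum(x * x for x in ads) - n * x_bar ** 2

b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(ads, cars)) / sxx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(ads, cars))
se = math.sqrt(sse / (n - 2))

sb = se / math.sqrt(sxx)   # standard error of the slope
t_stat = (b1 - 0) / sb     # null hypothesis: beta1 = 0
reject_h0 = abs(t_stat) > t_crit

print(round(sb, 4), round(t_stat, 2), reject_h0)  # 1.2762 3.05 True
```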
The standard error of the slope, sb, the calculated t-test statistic for
the slope, and the confidence interval for the slope are reported by
Excel.
14.6 Assumptions for
Regression Analysis
For the results from regression analysis to be
reliable, certain key assumptions need to be
satisfied.
It is important when performing a regression
analysis to examine a scatter plot and a residual
plot for violations of regression assumptions.
• Regression estimates and predictions will be
less accurate or misleading if assumptions are
violated.
[Figure: two scatter plots of y against x]
Linear: Data in this scatter plot appear to follow a linear pattern.
Not Linear: For low and high values of x, the estimated value will be
too high; estimated values for x's in the middle of the x range will be
too low.
Assumptions for Regression Analysis
Assumption 2: The residuals exhibit no patterns across values of
the independent variable.
• The residual for each ordered pair is the difference between the actual
and the predicted values of the dependent variable.
• Excel will calculate the residual for each ordered pair in the data
set and generate a residual plot:
1. Enter the x and y data in separate columns in a worksheet.
2. Go to Data > Data Analysis.
3. Select Regression from Data Analysis and click OK.
4. In the Regression dialog box, check the Residuals and
Residual Plots options and click OK.
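Outside Excel, the same residuals can be computed directly as actual minus predicted values. The sketch below is a Python illustration using the TV ads data from the car-dealer example; it prints the residual for each ordered pair so that they can be inspected, or plotted against x, for patterns. The variable names are assumptions.

```python
# TV ads (x) and cars sold (y) from the car-dealer example.
ads = [3, 6, 4, 5, 6, 3]
cars = [13, 31, 19, 27, 23, 19]

n = len(ads)
x_bar = sum(ads) / n
y_bar = sum(cars) / n

# Least squares fit.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(ads, cars))
      / sum((x - x_bar) ** 2 for x in ads))
b0 = y_bar - b1 * x_bar

# Residual = actual y minus predicted y for each ordered pair.
residuals = [y - (b0 + b1 * x) for x, y in zip(ads, cars)]

for x, e in zip(ads, residuals):
    print(f"x = {x}, residual = {e:.4f}")
```

With a least squares fit that includes an intercept, the residuals sum to zero (up to rounding), which is a quick sanity check on the computation.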
Assumption 2: (continued)
[Figure: two scatter plots of y against x, each with its residual plot (residuals against x)]
• No pattern in this residual plot (random residuals).
• Residuals for low and high values of x are mostly negative; residuals
for x's in the middle of the x range are mostly positive.
Assumption 3: (homoscedasticity)
The variation of the dependent variable is the same across
all values of the independent variable.
• The residual plot can be used to evaluate this assumption.
[Figure: two residual plots (residuals against x)]
Assumption 4:
The residuals from the ordered pairs follow the normal
probability distribution. Inspect the residual plot to
evaluate this assumption.
t = –5.276 is in the rejection region, so we reject the null hypothesis
and conclude that the population correlation coefficient is not equal to
zero.
[Figure: two-tail t-distribution centered at 0, with "Reject H0" regions of area α/2 = 0.01 in each tail and a "Do not reject H0" region of area 0.98 in the middle]
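The test statistic for the correlation coefficient is commonly computed as t = r / √((1 − r²) / (n − 2)). The data behind t = –5.276 are not included in this excerpt, so the sketch below applies that formula to the TV ads data from the car-dealer example instead. It is an illustration under that assumption, and the variable names are hypothetical.

```python
import math

# TV ads (x) and cars sold (y) from the car-dealer example
# (not the data set that produced t = -5.276).
ads = [3, 6, 4, 5, 6, 3]
cars = [13, 31, 19, 27, 23, 19]

n = len(ads)
x_bar = sum(ads) / n
y_bar = sum(cars) / n

# Sample correlation coefficient (deviation form).
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ads, cars))
sxx = sum((x - x_bar) ** 2 for x in ads)
syy = sum((y - y_bar) ** 2 for y in cars)
r = sxy / math.sqrt(sxx * syy)

# t-statistic for H0: population correlation coefficient = 0, with n - 2 df.
t_stat = r / math.sqrt((1 - r ** 2) / (n - 2))

print(round(r, 3), round(t_stat, 2))  # 0.836 3.05
```

In simple regression this t-statistic equals the t-statistic for the slope, which is why the TV ads data give 3.05 in both tests.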
SSE = 15.155
SST = 85.875
SSR = 70.720
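As a quick consistency check on these three values, SSR and SSE should add up to SST, and the ratio SSR / SST gives the coefficient of determination. The short Python sketch below uses only the numbers listed above; the R² value it prints is derived here and is not quoted from the slides.

```python
# Sums of squares as listed above.
sse = 15.155
sst = 85.875
ssr = 70.720

# The partition SST = SSR + SSE should hold (allowing for floating-point rounding).
partition_holds = abs((ssr + sse) - sst) < 1e-9

# Coefficient of determination: share of the variation in y explained by x.
r_squared = ssr / sst

print(partition_holds, round(r_squared, 4))  # True 0.8235
```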
Confidence interval:
UCL = 25.7 + 1.81 = 27.51
Prediction interval:
UCL = 25.7 + 5.31 = 31.01
H0 : β1 = 0
H1 : β1 ≠ 0
Since | t | = 5.31 > tα/2 = 3.143, we reject H0 and conclude that the
population regression slope is not equal to zero and that there is a
relationship between car speed and MPG.