Coefficient of Determination
Correlation Coefficient
The CORREL function returns the correlation coefficient.
11–5 Coefficient of Determination and Standard Error of Estimate

The previous sections stated that if the correlation coefficient is significant, the equation
of the regression line can be determined. Also, for various values of the independent
variable x, the corresponding values of the dependent variable y can be predicted. Several
other measures are associated with the correlation and regression techniques. They
include the coefficient of determination, the standard error of estimate, and the prediction
interval. But before these concepts can be explained, the different types of variation
associated with the regression model must be defined.
Figure 11–17 Deviations for the Regression Equation. For a point (x, y), the figure marks
the total deviation (y − ȳ), the explained deviation (y' − ȳ), and the unexplained
deviation (y − y').
The procedure for finding the three types of variation is illustrated next.
STEP 1 Find the predicted y' values.
for x = 1   y' = 4.8 + 2.8x = 4.8 + (2.8)(1) = 7.6
for x = 2   y' = 4.8 + (2.8)(2) = 10.4
for x = 3   y' = 4.8 + (2.8)(3) = 13.2
for x = 4   y' = 4.8 + (2.8)(4) = 16
for x = 5   y' = 4.8 + (2.8)(5) = 18.8
Hence, the values for this example are as follows:
x y y'
1 10 7.6
2 8 10.4
3 12 13.2
4 16 16
5 20 18.8
Coefficient of Determination

Objective 5. Compute the coefficient of determination.

The coefficient of determination is the ratio of the explained variation to the total
variation and is denoted by \(r^2\). That is,

\[ r^2 = \frac{\text{explained variation}}{\text{total variation}} \]

For the example, \(r^2 = 78.4/92.8 = 0.845\). The term \(r^2\) is usually expressed as a
percentage. So in this case, 84.5% of the total variation is explained by the regression line
using the independent variable.

Another way to arrive at the value for \(r^2\) is to square the correlation coefficient. In
this case, \(r = 0.919\), and \(r^2 = 0.845\), which is the same value found by using the
variation ratio.

Historical Note: Karl Pearson recommended in 1897 that the French government close all its
casinos and turn the gambling devices over to the academic community to use in the study of
probability.

The coefficient of determination is a measure of the variation of the dependent variable that is explained
by the regression line and the independent variable. The symbol for the coefficient of determination is \(r^2\).
The coefficient of nondetermination is \(1.00 - r^2\).
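The variation ratio can be checked numerically. The following Python sketch is only an illustration: it uses the five (x, y) pairs and the line y' = 4.8 + 2.8x from the example above, and the variable names are chosen here for readability rather than taken from the text.

```python
import math

# Data and regression line from the example above: y' = 4.8 + 2.8x
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 12, 16, 20]
a, b = 4.8, 2.8

y_pred = [a + b * x for x in xs]   # predicted y' values
y_mean = sum(ys) / len(ys)         # mean of the observed y values

# Explained variation: sum of (y' - mean)^2
explained = sum((yp - y_mean) ** 2 for yp in y_pred)
# Unexplained variation: sum of (y - y')^2
unexplained = sum((y - yp) ** 2 for y, yp in zip(ys, y_pred))
# Total variation: sum of (y - mean)^2
total = sum((y - y_mean) ** 2 for y in ys)

r_squared = explained / total
print(round(explained, 1), round(unexplained, 1), round(total, 1))  # 78.4 14.4 92.8
print(round(r_squared, 3), round(math.sqrt(r_squared), 3))          # 0.845 0.919
```

The square root recovers only the magnitude of r. Here the slope is positive, so r = 0.919. The explained and unexplained variations also add up to the total variation, which is a useful check on hand calculations.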
Standard Error of Estimate

Objective 6. Compute the standard error of estimate.

When a y' value is predicted for a specific x value, the prediction is a point prediction.
However, a prediction interval about the y' value can be constructed, just as a confidence
interval was constructed for an estimate of the population mean. The prediction interval
uses a statistic called the standard error of estimate.
The standard error of estimate, denoted by \(s_{est}\), is the standard deviation of the observed y values about
the predicted y' values. The formula for the standard error of estimate is

\[ s_{est} = \sqrt{\frac{\sum (y - y')^2}{n - 2}} \]
The standard error of estimate is similar to the standard deviation, but the mean is
not used. As can be seen from the formula, the standard error of estimate is the square
root of the unexplained variation (that is, the variation due to the difference of the
observed values and the expected values) divided by \(n - 2\). So the closer the observed
values are to the predicted values, the smaller the standard error of estimate will be.
The next example shows how to compute the standard error of estimate.
Example 11–12 A researcher collects the following data and determines that there is a significant
relationship between the age of a copy machine and its monthly maintenance cost. The
regression equation is y' = 55.57 + 8.13x. Find the standard error of estimate.

Machine   Age, x (years)   Monthly cost, y
A         1                $62
B         2                $78
C         3                $70
D         4                $90
E         4                $93
F         6                $103
Solution
STEP 1 Make a table, as shown.
x    y      y'    y − y'    (y − y')²
1    62
2    78
3    70
4    90
4    93
6    103
STEP 2 Using the regression line equation, y' = 55.57 + 8.13x, compute the
predicted values y' for each x and place the results in the column labeled y'.
x = 1   y' = 55.57 + (8.13)(1) = 63.70
x = 2   y' = 55.57 + (8.13)(2) = 71.83
x = 3   y' = 55.57 + (8.13)(3) = 79.96
x = 4   y' = 55.57 + (8.13)(4) = 88.09
x = 6   y' = 55.57 + (8.13)(6) = 104.35
STEP 3 For each y, subtract y' and place the answer in the column labeled y − y'.
62 − 63.70 = −1.70     90 − 88.09 = 1.91
78 − 71.83 = 6.17      93 − 88.09 = 4.91
70 − 79.96 = −9.96     103 − 104.35 = −1.35
STEP 4 Square the numbers found in step 3 and place the squares in the column
labeled (y − y')².
494 Chapter 11 Correlation and Regression
STEP 5 Find the sum of the numbers in the last column. The completed table
follows.
x    y      y'        y − y'    (y − y')²
1    62     63.70     −1.70     2.89
2    78     71.83     6.17      38.0689
3    70     79.96     −9.96     99.2016
4    90     88.09     1.91      3.6481
4    93     88.09     4.91      24.1081
6    103    104.35    −1.35     1.8225
                      Σ(y − y')² = 169.7392

\[ s_{est} = \sqrt{\frac{\sum (y - y')^2}{n - 2}} = \sqrt{\frac{169.7392}{6 - 2}} = 6.51 \]
In this case, the standard deviation of observed values about the predicted
values is 6.51.
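The steps of Example 11–12 can be reproduced in a few lines of Python. This is a minimal sketch, not part of the original solution; the function name `standard_error_of_estimate` is hypothetical, and the coefficients are the rounded values a = 55.57 and b = 8.13 given in the example.

```python
import math

def standard_error_of_estimate(xs, ys, a, b):
    """Return (sum of squared residuals, s_est) for the line y' = a + b*x."""
    n = len(xs)
    # Sum of (y - y')^2 over all data pairs (steps 2 through 5 of the example)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Divide by n - 2 and take the square root
    return sse, math.sqrt(sse / (n - 2))

ages = [1, 2, 3, 4, 4, 6]            # x, age in years
costs = [62, 78, 70, 90, 93, 103]    # y, monthly cost in dollars

sse, s_est = standard_error_of_estimate(ages, costs, 55.57, 8.13)
print(round(sse, 4), round(s_est, 2))  # 169.7392 6.51
```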
The standard error of estimate can also be found by using the formula

\[ s_{est} = \sqrt{\frac{\sum y^2 - a \sum y - b \sum xy}{n - 2}} \]
Example 11–13 Find the standard error of estimate for the data for Example 11–12 by using the
preceding formula. The equation of the regression line is y' = 55.57 + 8.13x.
Solution
STEP 1 Make a table.
STEP 2 Find the product of x and y values and place the results in the third column.
STEP 3 Square the y values and place the results in the fourth column.
STEP 4 Find the sums of the second, third, and fourth columns. The completed table
is shown here.
x    y      xy     y²
1    62     62     3,844
2    78     156    6,084
3    70     210    4,900
4    90     360    8,100
4    93     372    8,649
6    103    618    10,609
Σy = 496    Σxy = 1778    Σy² = 42,186
STEP 5 From the regression equation, y' = 55.57 + 8.13x, a = 55.57, and b = 8.13.
STEP 6 Substitute in the formula and solve for sest.
\[ s_{est} = \sqrt{\frac{\sum y^2 - a \sum y - b \sum xy}{n - 2}}
          = \sqrt{\frac{42{,}186 - (55.57)(496) - (8.13)(1778)}{6 - 2}} = 6.48 \]
This value is close to the value found in Example 11–12. The difference is
due to rounding.
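The rounding effect can be made visible with a short Python sketch. It evaluates the shortcut formula twice: once with the rounded coefficients from the text, and once with unrounded coefficients. The unrounded a and b are computed from the usual least-squares formulas, which are assumed here from the earlier regression sections and do not appear in this section. The sums Σx = 20 and Σx² = 82 come from the same data, and the helper name `s_est_shortcut` is hypothetical.

```python
import math

# Sums for the copy machine data (Examples 11-12 and 11-13)
n = 6
sum_x, sum_y = 20, 496
sum_xy, sum_x2, sum_y2 = 1778, 82, 42186

def s_est_shortcut(a, b):
    """Shortcut formula: sqrt((sum y^2 - a*sum y - b*sum xy) / (n - 2))."""
    return math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))

# Rounded coefficients, as used in the text
s_rounded = s_est_shortcut(55.57, 8.13)

# Unrounded coefficients (assumption: standard least-squares formulas)
denom = n * sum_x2 - sum_x ** 2
a_full = (sum_y * sum_x2 - sum_x * sum_xy) / denom
b_full = (n * sum_xy - sum_x * sum_y) / denom
s_full = s_est_shortcut(a_full, b_full)

print(round(s_rounded, 2), round(s_full, 2))  # 6.48 6.51
```

With unrounded coefficients the shortcut formula gives 6.51, matching the value from Example 11–12. The shortcut is therefore sensitive to rounding in a and b, so it may be worth carrying extra decimal places when using it.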
Prediction Interval

Objective 7. Find a prediction interval.

The standard error of estimate can be used for constructing a prediction interval (similar
to a confidence interval) about a y' value.
When a specific value x is substituted into the regression equation, one gets y', which
is a point estimate for y. For example, if the regression line equation for the age of a
machine and the monthly maintenance cost is y' = 55.57 + 8.13x (Example 11–12), then the
predicted maintenance cost for a 3-year-old machine would be y' = 55.57 + 8.13(3), or
$79.96. Since this is a point estimate, one has no idea how accurate it is. But one can
construct a prediction interval about the estimate. By selecting an \(\alpha\) value, one can
achieve \((1 - \alpha) \cdot 100\%\) confidence that the interval contains the actual mean of
the y values that correspond to the given value of x.
The reason is that there are possible sources of prediction errors in finding the
regression line equation. One source occurs when finding the standard error of estimate,
\(s_{est}\). Two others are errors made in estimating the slope and the y intercept, since the
equation of the regression line will change somewhat if different random samples are
used when calculating the equation.
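\[ y' - t_{\alpha/2}\, s_{est} \sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n \sum x^2 - (\sum x)^2}}
   < y <
   y' + t_{\alpha/2}\, s_{est} \sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n \sum x^2 - (\sum x)^2}} \]

with d.f. = n − 2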
Example 11–14 For the data in Example 11–12, find the 95% prediction interval for the monthly
maintenance cost of a machine that is 3 years old.
Solution
STEP 1 Find Σx, Σx², and X̄.
Σx = 20     Σx² = 82     X̄ = 20/6 = 3.3
STEP 2 Find y' for x = 3.
y' = 55.57 + 8.13x
y' = 55.57 + 8.13(3) = 79.96
STEP 3 Find \(s_{est}\).
\(s_{est} = 6.48\), as shown in Example 11–13.
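STEP 4 Substitute in the formula and solve: \(t_{\alpha/2} = 2.776\), d.f. = 6 − 2 = 4 for 95%.

\[ y' - t_{\alpha/2}\, s_{est} \sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n \sum x^2 - (\sum x)^2}}
   < y <
   y' + t_{\alpha/2}\, s_{est} \sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n \sum x^2 - (\sum x)^2}} \]

\[ 79.96 - (2.776)(6.48) \sqrt{1 + \frac{1}{6} + \frac{6(3 - 3.3)^2}{6(82) - (20)^2}}
   < y <
   79.96 + (2.776)(6.48) \sqrt{1 + \frac{1}{6} + \frac{6(3 - 3.3)^2}{6(82) - (20)^2}} \]

\[ 79.96 - (2.776)(6.48)(1.08) < y < 79.96 + (2.776)(6.48)(1.08) \]

\[ 79.96 - 19.43 < y < 79.96 + 19.43 \]

\[ 60.53 < y < 99.39 \]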
Hence, one can be 95% confident that the interval 60.53 < y < 99.39
contains the actual value of y.
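The substitution in Example 11–14 can also be sketched in Python. The function below is a hypothetical helper, not a library routine. It takes the critical value t as an argument (2.776 here, read from a t table for d.f. = 4) rather than computing it, and it takes y' and the standard error of estimate as given in the example.

```python
import math

def prediction_interval(x, y_pred, s_est, t_crit, n, sum_x, sum_x2):
    """Return (lower, upper) limits of the prediction interval about y_pred."""
    x_bar = sum_x / n  # unrounded mean of the x values
    # Square-root factor from the prediction interval formula
    factor = math.sqrt(1 + 1 / n + n * (x - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2))
    margin = t_crit * s_est * factor
    return y_pred - margin, y_pred + margin

# Values from Example 11-14: a 3-year-old machine, 95% interval, d.f. = 4
low, high = prediction_interval(
    x=3, y_pred=79.96, s_est=6.48, t_crit=2.776, n=6, sum_x=20, sum_x2=82
)
print(round(low, 2), round(high, 2))
```

Because this sketch does not round the mean to 3.3 or the square-root factor to 1.08, the limits come out at roughly 60.47 and 99.45, slightly wider than the hand-computed 60.53 and 99.39.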
Exercises
11–67. What is meant by the explained variation? How is it computed?
11–68. What is meant by the unexplained variation? How is it computed?
11–69. What is meant by the total variation? How is it computed?
11–70. Define coefficient of determination.
11–71. How is the coefficient of determination found?
11–72. Define coefficient of nondetermination.
11–73. How is the coefficient of nondetermination found?

For Exercises 11–74 through 11–79, find the coefficients of determination and
nondetermination and explain the meaning of each.

11–74. r = 0.81
11–75. r = 0.70
11–76. r = 0.45
11–77. r = 0.37
11–78. r = 0.15
11–79. r = 0.05

11–80. Define standard error of estimate for regression. When can the standard error of
estimate be used to construct a prediction interval about a value y'?
11–81. Compute the standard error of estimate for Exercise 11–13. The regression line
equation was found in Exercise 11–41.
11–82. Compute the standard error of estimate for Exercise 11–14. The regression line
equation was found in Exercise 11–42.
11–83. Compute the standard error of estimate for Exercise 11–15. The regression line
equation was found in Exercise 11–43.
11–84. Compute the standard error of estimate for Exercise 11–16. The regression line
equation was found in Exercise 11–44.
11–85. For the data in Exercises 11–13, 11–41, and 11–81, find the 90% prediction interval
when x = 20 years.
11–86. For the data in Exercises 11–14, 11–42, and 11–82, find the 95% prediction interval
when x = $1100.
11–87. For the data in Exercises 11–15, 11–43, and 11–83, find the 90% prediction interval
when x = 4 years.
11–88. For the data in Exercises 11–16, 11–44, and 11–84, find the 98% prediction interval
when x = 47 years.