Correlation and Regression
interest rate set by the Federal Reserve. A marketing executive might want to know how
strong the relationship is between advertising dollars and sales dollars for a product or a
company.
In this chapter, we will study the concept of correlation and how it can be used to estimate the relationship between two variables. We will also explore simple regression analysis, through which mathematical models can be developed to predict one variable by another. We will examine tools for testing the strength and predictability of regression models, and we will learn how to use regression analysis to develop a forecasting trend line.
12.1 CORRELATION
Figure 12.1 depicts five different degrees of correlation: (a) represents strong negative
correlation, (b) represents moderate negative correlation, (c) represents moderate positive
correlation, (d) represents strong positive correlation, and (e) contains no correlation.
What is the measure of correlation between the interest rate of federal funds and the
commodities futures index? With data such as those shown in Table 12.1, which represent
the values for interest rates of federal funds and commodities futures indexes for a sample
of 12 days, a correlation coefficient, r, can be computed.
FIGURE 12.1
Five Correlations: (a) Strong Negative Correlation (r = -.933); (b) Moderate Negative Correlation (r = -.674); (c) Moderate Positive Correlation (r = .518); (d) Strong Positive Correlation (r = .909); (e) No Correlation
TABLE 12.2
Computation of r for the Economics Example

Day   Interest Rate x   Futures Index y      x²         y²          xy
 1         7.43               221          55.205     48,841     1,642.03
 2         7.48               222          55.950     49,284     1,660.56
 3         8.00               226          64.000     51,076     1,808.00
 4         7.75               225          60.063     50,625     1,743.75
 5         7.60               224          57.760     50,176     1,702.40
 6         7.63               223          58.217     49,729     1,701.49
 7         7.68               223          58.982     49,729     1,712.64
 8         7.67               226          58.829     51,076     1,733.42
 9         7.59               226          57.608     51,076     1,715.34
10         8.07               235          65.125     55,225     1,896.45
11         8.03               233          64.481     54,289     1,870.99
12         8.00               241          64.000     58,081     1,928.00
      Σx = 92.93         Σy = 2,725    Σx² = 720.220  Σy² = 619,207  Σxy = 21,115.07
r = [Σxy − (Σx)(Σy)/n] / √([Σx² − (Σx)²/n][Σy² − (Σy)²/n])

  = [21,115.07 − (92.93)(2,725)/12] / √([720.22 − (92.93)²/12][619,207 − (2,725)²/12]) = .815
FIGURE 12.2
Excel and Minitab Output for the Economics Example

Excel Output
                 Interest Rate   Futures Index
Interest Rate        1
Futures Index        0.815           1

Minitab Output
Correlations: Interest Rate, Futures Index

Pearson correlation of Interest Rate and Futures Index = 0.815
p-Value = 0.001
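To make the computation concrete, here is a minimal sketch that reproduces the r value from Table 12.2. Python is assumed here (the chapter itself works the example by hand and with Excel and Minitab), and the variable names are illustrative only.

```python
from math import sqrt

# Interest rates (x) and commodities futures index values (y)
# for the 12 sampled days in Table 12.2.
x = [7.43, 7.48, 8.00, 7.75, 7.60, 7.63, 7.68, 7.67, 7.59, 8.07, 8.03, 8.00]
y = [221, 222, 226, 225, 224, 223, 223, 226, 226, 235, 233, 241]
n = len(x)

sum_x, sum_y = sum(x), sum(y)               # 92.93 and 2,725
sum_x2 = sum(v * v for v in x)              # 720.220
sum_y2 = sum(v * v for v in y)              # 619,207
sum_xy = sum(a * b for a, b in zip(x, y))   # 21,115.07

# Pearson product-moment correlation coefficient.
r = (sum_xy - sum_x * sum_y / n) / sqrt(
    (sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
print(round(r, 3))   # 0.815
```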
12.1 PROBLEMS

12.1 Determine the value of the coefficient of correlation, r, for the following data.

     X    4    6    7   11   14   17   21
     Y   18   12   13    8    7    7    4
12.3 In an effort to determine whether any correlation exists between the stock prices of two airlines, an analyst sampled six days of activity of the stock market. Using the following prices of Delta stock and Southwest stock, compute the coefficient of correlation. Stock prices have been rounded off to the nearest tenth for ease of computation.
Delta Southwest
47.6 15.1
46.3 15.4
50.6 15.9
52.6 15.6
52.4 16.4
52.7 18.1
12.4 The following data are the claims (in $ millions) for BlueCross BlueShield benefits
for nine states, along with the surplus (in $ millions) that the company had in assets
in those states.
State Claims Surplus
Alabama $1,425 $277
Colorado 273 100
Florida 915 120
Illinois 1,687 259
Maine 234 40
Montana 142 25
North Dakota 259 57
Oklahoma 258 31
Texas 894 141
Use the data to compute r and determine how strongly claims and surplus are related.
12.2 INTRODUCTION TO SIMPLE REGRESSION ANALYSIS

Regression analysis is the process of constructing a mathematical model or function that can be used to predict or determine one variable by another variable or other variables. The most elementary regression model is called simple regression, or bivariate regression, involving two variables in which one variable is predicted by another variable. In simple regression, the variable to be predicted is called the dependent variable and is designated as y. The predictor is called the independent variable, or explanatory variable, and is designated as x. In simple regression analysis, only a straight-line relationship between two variables is examined.
Nonlinear relationships and regression models with more than one independent variable can be explored by using multiple regression models, which are presented in Chapters 13 and 14.

TABLE 12.3
Airline Cost Data

Number of Passengers   Cost ($1,000)
        61                 4.280
        63                 4.080
        67                 4.420
        69                 4.170
        70                 4.480
        74                 4.300
        76                 4.820
        81                 4.700
        86                 5.110
        91                 5.130
        95                 5.640
        97                 5.560

Can the cost of flying a commercial airliner be predicted using regression analysis? If so, what variables are related to such cost? A few of the many variables that can potentially contribute are type of plane, distance, number of passengers, amount of luggage/freight, weather conditions, direction of destination, and perhaps even pilot skill. Suppose a study is conducted using only Boeing 737s traveling 500 miles on comparable routes during the same season of the year. Can the number of passengers predict the cost of flying such routes? It seems logical that more passengers result in more weight and more baggage, which could, in turn, result in increased fuel consumption and other costs. Suppose the data displayed in Table 12.3 are the costs and associated number of passengers for twelve 500-mile commercial airline flights using Boeing 737s during the same season of the year. We will use these data to develop a regression model to predict cost by number of passengers.

Usually, the first step in simple regression analysis is to construct a scatter plot (or scatter diagram), discussed in Chapter 2. Graphing the data in this way yields preliminary information about the shape and spread of the data. Figure 12.3 is an Excel scatter plot of the data in Table 12.3. Figure 12.4 is a close-up view of the scatter plot produced by Minitab.
FIGURE 12.3
Excel Scatter Plot of Airline Cost Data (axes: Cost ($1,000) vs. Number of Passengers)
FIGURE 12.4
Close-Up Minitab Scatter Plot of Airline Cost Data (axes: Cost vs. Number of Passengers)
Try to imagine a line passing through the points. Is a linear fit possible? Would a curve fit the data better? The scatter plot gives some idea of how well a regression line fits the data. Later in the chapter, we present statistical techniques that can be used to determine more precisely how well a regression line fits the data.
12.3 DETERMINING THE EQUATION OF THE REGRESSION LINE

The first step in determining the equation of the regression line that passes through the sample data is to establish the equation's form. Several different types of equations of lines are discussed in algebra, finite math, or analytic geometry courses. Recall that among these equations of a line are the two-point form, the point-slope form, and the slope-intercept form. In regression analysis, researchers use the slope-intercept equation of a line. In math courses, the slope-intercept form of the equation of a line often takes the form
y = mx + b
where
m = slope of the line
b = y intercept of the line
In statistics, the slope-intercept form of the equation of the regression line through the population points is

ŷ = β₀ + β₁x

where
ŷ = the predicted value of y
β₀ = the population y intercept
β₁ = the population slope
To determine the equation of the regression line for a sample of data, the researcher must determine the values for b₀ and b₁. This process is sometimes referred to as least squares analysis. Least squares analysis is a process whereby a regression model is developed by producing the minimum sum of the squared error values. On the basis of this premise and calculus, a particular set of equations has been developed to produce components of the regression model.*
*Derivation of these formulas is beyond the scope of information being discussed here but is presented in
WileyPLUS.
FIGURE 12.5
Minitab Plot of a Regression Line (labeled elements: the regression line, the points (x, y), and the error of the prediction)
Examine the regression line fit through the points in Figure 12.5. Observe that the line does not actually pass through any of the points. The vertical distance from each point to the line is the error of the prediction. In theory, an infinite number of lines could be constructed to pass through these points in some manner. The least squares regression line is the regression line that results in the smallest sum of errors squared.
Formula 12.2 is an equation for computing the value of the sample slope. Several versions of the equation are given to afford latitude in doing the computations.

b₁ = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² = [Σxy − (Σx)(Σy)/n] / [Σx² − (Σx)²/n]    (12.2)

The expression in the numerator of the slope formula 12.2 appears frequently in this chapter and is denoted as SSxy:

SSxy = Σ(x − x̄)(y − ȳ) = Σxy − (Σx)(Σy)/n

The expression in the denominator of the slope formula 12.2 also appears frequently in this chapter and is denoted as SSxx:

SSxx = Σ(x − x̄)² = Σx² − (Σx)²/n
With these abbreviations, the equation for the slope can be expressed as in Formula 12.3:

b₁ = SSxy / SSxx    (12.3)

Formula 12.4 is used to compute the sample y intercept; the slope must be computed before the y intercept:

b₀ = ȳ − b₁x̄ = Σy/n − b₁(Σx/n)    (12.4)
Formulas 12.2, 12.3, and 12.4 show that the following data are needed from sample information to compute the slope and intercept: Σx, Σy, Σx², and Σxy, unless sample means are used. Table 12.4 contains the results of solving for the slope and intercept and determining the equation of the regression line for the data in Table 12.3.
The least squares equation of the regression line for this problem is

ŷ = 1.57 + .0407x
TABLE 12.4
Solving for the Slope and the y Intercept of the Regression Line for the Airline Cost Example

Number of Passengers x   Cost ($1,000) y       x²          xy
        61                   4.280           3,721      261.080
        63                   4.080           3,969      257.040
        67                   4.420           4,489      296.140
        69                   4.170           4,761      287.730
        70                   4.480           4,900      313.600
        74                   4.300           5,476      318.200
        76                   4.820           5,776      366.320
        81                   4.700           6,561      380.700
        86                   5.110           7,396      439.460
        91                   5.130           8,281      466.830
        95                   5.640           9,025      535.800
        97                   5.560           9,409      539.320
   Σx = 930              Σy = 56.690     Σx² = 73,764  Σxy = 4,462.220
SSxy = Σxy − (Σx)(Σy)/n = 4,462.22 − (930)(56.69)/12 = 68.745

SSxx = Σx² − (Σx)²/n = 73,764 − (930)²/12 = 1,689

b₁ = SSxy / SSxx = 68.745 / 1,689 = .0407

b₀ = Σy/n − b₁(Σx/n) = 56.69/12 − (.0407)(930/12) = 1.57

ŷ = 1.57 + .0407x
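The same arithmetic can be scripted. The following is a minimal sketch, again assuming Python, that recovers the slope and intercept in Table 12.4 directly from the raw data of Table 12.3; the variable names are our own.

```python
# Airline cost data from Table 12.3: passengers (x), cost in $1,000s (y).
x = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
y = [4.280, 4.080, 4.420, 4.170, 4.480, 4.300,
     4.820, 4.700, 5.110, 5.130, 5.640, 5.560]
n = len(x)

sum_x, sum_y = sum(x), sum(y)               # 930 and 56.69
sum_x2 = sum(v * v for v in x)              # 73,764
sum_xy = sum(a * b for a, b in zip(x, y))   # 4,462.22

ss_xy = sum_xy - sum_x * sum_y / n          # SSxy = 68.745
ss_xx = sum_x2 - sum_x ** 2 / n             # SSxx = 1,689
b1 = ss_xy / ss_xx                          # slope, about .0407
b0 = sum_y / n - b1 * (sum_x / n)           # intercept, about 1.57
print(f"y-hat = {b0:.2f} + {b1:.4f}x")      # y-hat = 1.57 + 0.0407x
```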
The slope of this regression line is .0407. Because the y values were recoded for ease of computation and are actually in $1,000 denominations, the slope is actually $40.70. One interpretation of the slope in this problem is that for every unit increase in x (every person added to the flight of the airplane), there is a $40.70 increase in the cost of the flight. The y-intercept is the point where the line crosses the y-axis (where x is zero). Sometimes in regression analysis, the y-intercept is meaningless in terms of the variables studied. However, in this problem, one interpretation of the y-intercept, which is 1.570 or $1,570, is that even if there were no passengers on the commercial flight, it would still cost $1,570. In other words, there are costs associated with a flight that carries no passengers.
Superimposing the line representing the least squares equation for this problem on the
scatter plot indicates how well the regression line fits the data points, as shown in the Excel
graph in Figure 12.6. The next several sections explore mathematical ways of testing how
well the regression line fits the points.
FIGURE 12.6
Excel Graph of Regression Line for the Airline Cost Example (axes: Cost ($1,000) vs. Number of Passengers)
12.4 RESIDUAL ANALYSIS
How does a business researcher test a regression line to determine whether the line is a good fit of the data other than by observing the fitted line plot (regression line fit through a scatter plot of the data)? One particularly popular approach is to use the historical data (x and y values used to construct the regression model) to test the model. With this approach, the values of the independent variable (x values) are inserted into the regression model and a predicted value (ŷ) is obtained for each x value. These predicted values (ŷ) are then compared to the actual y values to determine how much error the equation of the regression line produced. Each difference between the actual y values and the predicted y values is the error of the regression line at a given point, y − ŷ, and is referred to as the residual. It is the sum of squares of these residuals that is minimized to find the least squares line.
Table 12.5 shows ŷ values and the residuals for each pair of data for the airline cost regression model developed in Section 12.3. The predicted values are calculated by inserting an x value into the equation of the regression line and solving for ŷ. For example, when x = 61, ŷ = 1.57 + .0407(61) = 4.053, as displayed in column 3 of the table. Each of these predicted y values is subtracted from the actual y value to determine the error, or residual. For example, the first y value listed in the table is 4.280 and the first predicted value is 4.053, resulting in a residual of 4.280 − 4.053 = .227. The residuals for this problem are given in column 4 of the table.
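As a check on Table 12.5, here is a minimal sketch (Python assumed, names illustrative) that generates every predicted value and residual from the rounded model ŷ = 1.57 + .0407x.

```python
x = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
y = [4.280, 4.080, 4.420, 4.170, 4.480, 4.300,
     4.820, 4.700, 5.110, 5.130, 5.640, 5.560]

b0, b1 = 1.57, .0407                                 # rounded estimates
y_hat = [b0 + b1 * xi for xi in x]                   # predicted costs
residuals = [yi - yh for yi, yh in zip(y, y_hat)]    # y - y-hat

for xi, yi, yh, e in zip(x, y, y_hat, residuals):
    print(f"{xi:3d}  {yi:5.3f}  {yh:5.3f}  {e:6.3f}")
print(f"sum of residuals = {sum(residuals):.3f}")    # about zero
```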
Note that the sum of the residuals is approximately zero. Except for rounding error, the sum of the residuals is always zero. The reason is that a residual is geometrically the vertical distance from the regression line to a data point. The equations used to solve for the slope and intercept place the line geometrically in the middle of all points. Therefore, vertical distances from the line to the points will cancel each other and sum to zero. Figure 12.7 is a Minitab-produced scatter plot of the data and the residuals for the airline cost example.
TABLE 12.5
Predicted Values and Residuals for the Airline Cost Example

Number of Passengers x   Cost ($1,000) y   Predicted Value ŷ   Residual y − ŷ
        61                    4.280              4.053               .227
        63                    4.080              4.134              −.054
        67                    4.420              4.297               .123
        69                    4.170              4.378              −.208
        70                    4.480              4.419               .061
        74                    4.300              4.582              −.282
        76                    4.820              4.663               .157
        81                    4.700              4.867              −.167
        86                    5.110              5.070               .040
        91                    5.130              5.274              −.144
        95                    5.640              5.436               .204
        97                    5.560              5.518               .042
                                                            Σ(y − ŷ) = −.001
FIGURE 12.7
Close-Up Minitab Scatter Plot with Residuals for the Airline Cost Example (selected residuals labeled: .204, −.144, .157, −.282; axes: Cost vs. Number of Passengers)
An examination of the residuals may give the researcher an idea of how well the regression line fits the historical data points. The largest residual for the airline cost example is −.282, and the smallest is .040. Because the objective of the regression analysis was to predict the cost of flight in $1,000s, the regression line produces an error of $282 when there are 74 passengers and an error of only $40 when there are 86 passengers. This result presents the best and worst cases for the residuals. The researcher must examine other residuals to determine how well the regression model fits other data points.
Sometimes residuals are used to locate outliers. Outliers are data points that lie apart from the rest of the points. Outliers can produce residuals with large magnitudes and are usually easy to identify on scatter plots. Outliers can be the result of misrecorded or miscoded data, or they may simply be data points that do not conform to the general trend. The equation of the regression line is influenced by every data point used in its calculation in a manner similar to the arithmetic mean. Therefore, outliers sometimes can unduly influence the regression line by "pulling" the line toward the outliers. The origin of outliers must be investigated to determine whether they should be retained or whether the regression equation should be recomputed without them.
Residuals are usually plotted against the x-axis, which reveals a view of the residuals as x increases. Figure 12.8 shows the residuals plotted by Excel against the x-axis for the airline cost example.
FIGURE 12.8
Excel Graph of Residuals for the Airline Cost Example (axes: Residual vs. Number of Passengers)
FIGURE 12.11
Graphs of Nonindependent Error Terms (two panels of residuals plotted against x)
FIGURE 12.12
Healthy Residual Graph
FIGURE 12.13
Minitab Residual Analyses (normal probability plot of the residuals, residuals versus the fitted values, and a histogram of the residuals)
12.5 STANDARD ERROR OF THE ESTIMATE
Residuals represent errors of estimation for individual points. With large samples of data,
residual computations become laborious. Even with computers, a researcher sometimes
has difficulty working through pages of residuals in an effort to understand the error of the
regression model. An alternative way of examining the error of the model is the standard
error of the estimate, which provides a single measurement of the regression error.
Because the sum of the residuals is zero, attempting to determine the total amount of
error by summing the residuals is fruitless. This zero-sum characteristic of residuals can be
avoided by squaring the residuals and then summing them.
Table 12.6 contains the airline cost data from Table 12.3, along with the residuals and
the residuals squared. The total of the residuals squared column is called the sum of squares
of error (SSE).
In theory, infinitely many lines can be fit to a sample of points. However, formulas 12.2
and 12.4 produce a line of best fit for which the SSE is the smallest for any line that can be
fit to the sample data. This result is guaranteed, because formulas 12.2 and 12.4 are derived
from calculus to minimize SSE. For this reason, the regression process used in this chapter
is called least squares regression.
A computational version of the equation for computing SSE is less meaningful in terms of interpretation than Σ(y − ŷ)², but it is usually easier to compute. The computational formula for SSE follows.
COMPUTATIONAL FORMULA FOR SSE
SSE = Σy² − b₀Σy − b₁Σxy

For the airline cost example,

Σy² = 270.9251
b₀ = 1.5697928*
b₁ = .0407016*
Σy = 56.69
Σxy = 4,462.22

SSE = Σy² − b₀Σy − b₁Σxy
    = 270.9251 − (1.5697928)(56.69) − (.0407016)(4,462.22) = .31405

*Note: In previous sections, the values of the slope and intercept were rounded off for ease of computation and interpretation. They are shown here with more precision in an effort to reduce rounding error.

The slight discrepancy between this value and the value computed in Table 12.6 is due to rounding error.
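A minimal sketch of this computational formula, assuming Python as before and using the higher-precision (starred) slope and intercept:

```python
x = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
y = [4.280, 4.080, 4.420, 4.170, 4.480, 4.300,
     4.820, 4.700, 5.110, 5.130, 5.640, 5.560]

b0, b1 = 1.5697928, .0407016                # higher-precision estimates
sum_y = sum(y)                              # 56.69
sum_y2 = sum(v * v for v in y)              # 270.9251
sum_xy = sum(a * b for a, b in zip(x, y))   # 4,462.22

sse = sum_y2 - b0 * sum_y - b1 * sum_xy     # SSE = sum(y^2) - b0*sum(y) - b1*sum(xy)
print(round(sse, 5))                        # about .31405
```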
The sum of squares error is in part a function of the number of pairs of data being used to compute the sum, which lessens the value of SSE as a measurement of error. A more useful measurement of error is the standard error of the estimate. The standard error of the estimate, denoted se, is a standard deviation of the error of the regression model and has a more practical use than SSE. The standard error of the estimate follows.

STANDARD ERROR OF THE ESTIMATE
se = √(SSE/(n − 2))
The standard error of the estimate for the airline cost example is

se = √(SSE/(n − 2)) = √(.31434/10) = .1773
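Continuing the sketch in Python, the standard error of the estimate follows directly from SSE:

```python
from math import sqrt

n, sse = 12, .31434          # SSE as computed in Table 12.6
se = sqrt(sse / (n - 2))     # standard error of the estimate
print(round(se, 4))          # .1773
```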
How is the standard error of the estimate used? As previously mentioned, the standard error of the estimate is a standard deviation of error. Recall from Chapter 3 that if data are approximately normally distributed, the empirical rule states that about 68% of all values are within μ ± 1σ and that about 95% of all values are within μ ± 2σ. One of the assumptions for regression states that for a given x the error terms are normally distributed. Because the error terms are normally distributed, se is the standard deviation of error, and the average error is zero, approximately 68% of the error values (residuals) should be within 0 ± 1se and 95% of the error values (residuals) should be within 0 ± 2se. By having knowledge of the variables being studied and by examining the value of se, the researcher can often make a judgment about the fit of the regression model to the data by using se.
How can the se value for the airline cost example be interpreted?
The regression model in that example is used to predict airline cost by number of passengers. Note that the range of the airline cost data in Table 12.3 is from 4.08 to 5.64 ($4,080 to $5,640). The regression model for the data yields an se of .1773. An interpretation of se is that the standard deviation of error for the airline cost example is $177.30.
If the error terms were normally distributed about the given values of x, approximately 68% of the error terms would be within ±$177.30 and 95% would be within ±2($177.30) = ±$354.60. Examination of the residuals reveals that 100% of the residuals are within 2se.
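That claim is easy to verify from the residuals in Table 12.5; here is a minimal sketch, Python assumed:

```python
# Residuals from Table 12.5 and se = .1773 for the airline cost example.
residuals = [.227, -.054, .123, -.208, .061, -.282,
             .157, -.167, .040, -.144, .204, .042]
se = .1773

within_1se = sum(abs(e) <= se for e in residuals) / len(residuals)
within_2se = sum(abs(e) <= 2 * se for e in residuals) / len(residuals)
print(f"{within_1se:.0%} within 1se")   # 67%, close to the 68% guideline
print(f"{within_2se:.0%} within 2se")   # 100%, matching the text
```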
The standard error of the estimate provides a single measure of error, which, if the
researcher has enough background in the area being analyzed, can be used to understand
the magnitude of errors in the model. In addition, some researchers use the standard
error of the estimate to identify outliers. They do so by looking for data that are outside
±2se or ±3se.
DEMONSTRATION PROBLEM 12.3

Compute the sum of squares of error and the standard error of the estimate for Demonstration Problem 12.1, in which a regression model was developed to predict the number of FTEs at a hospital by the number of beds.
Multiple Regression Analysis
Simple regression analysis (discussed in Chapter 12) is bivariate linear regression in which one dependent variable, y, is predicted by one independent variable, x. Examples of simple regression applications include models to predict retail sales by population density, Dow Jones averages by prime interest rates, crude oil production by energy consumption, and CEO compensation by quarterly sales. However, in many cases, other independent variables, taken in conjunction with these variables, can make the regression model a better fit in predicting the dependent variable. For example, sales could be predicted by the size of store and number of competitors in addition to population density. A model to predict the Dow Jones average of 30 industrials could include, in addition to the prime interest rate, such predictors as yesterday's volume, the bond interest rate, and the producer price index. A model to predict CEO compensation could be developed by using variables such as company earnings per share, age of CEO, and size of company in addition to quarterly sales. A model could perhaps be developed to predict the cost of outsourcing by such variables as unit price, export taxes, cost of money, damage in transit, and other factors. Each of these examples contains only one dependent variable, y, as with simple regression analysis. However, multiple independent variables, x (predictors), are involved. Regression analysis with two or more independent variables or with at least one nonlinear predictor is called multiple regression analysis.
values are estimated by using sample information. Shown here is the form of the equation for estimating y with sample information:

ŷ = b₀ + b₁x₁ + b₂x₂ + b₃x₃ + ⋯ + bₖxₖ

where
ŷ = the predicted value of y
b₀ = the estimate of the regression constant
b₁ = the estimate of regression coefficient 1
b₂ = the estimate of regression coefficient 2
b₃ = the estimate of regression coefficient 3
bₖ = the estimate of regression coefficient k
k = the number of independent variables
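As a minimal sketch of this sample equation (Python assumed; the coefficient and predictor values below are hypothetical placeholders, not from the text):

```python
def y_hat(b0: float, b: list[float], x: list[float]) -> float:
    """Predicted y = b0 + b1*x1 + b2*x2 + ... + bk*xk."""
    return b0 + sum(bi * xi for bi, xi in zip(b, x))

# Hypothetical two-predictor example: b0 = 10, b1 = 2, b2 = -1.
print(y_hat(10, [2, -1], [3, 4]))   # 10 + 2(3) - 1(4) = 12
```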
FIGURE 13.1
Real Estate Data: Points in a Sample Space (axes: Price, Age, sq. ft.)
FIGURE 13.2
Real Estate Data: Response Plane for a First-Order Two-Predictor Multiple Regression Model (axes: Price, Age, sq. ft.)
some values of y for various values of x₁ and x₂. The error of the response plane (ε) in predicting or determining the y values is the distance from the points to the plane.
A number of statistical software packages can perform multiple regression analysis, including Excel and Minitab. The output for the Minitab multiple regression analysis on the real estate data is given in Figure 13.3. (Excel output is shown in Demonstration Problem 13.1.)
The Minitab output for regression analysis begins with "The regression equation is." From Figure 13.3, the regression equation for the real estate data in Table 13.1 is

ŷ = 57.4 + .0177x₁ − .666x₂

The regression constant, 57.4, is the y-intercept. The y-intercept is the value of ŷ if both x₁ (number of square feet) and x₂ (age) are zero. In this example, a practical understanding of the y-intercept is meaningless. It makes little sense to say that a house containing no square feet (x₁ = 0) and no years of age (x₂ = 0) would cost $57,400. Note in Figure 13.2 that the response plane crosses the y-axis (price) at 57.4.
FIGURE 13.3
Minitab Output of Regression for the Real Estate Example

Regression Analysis: Price versus Square Feet, Age

The regression equation is
Price = 57.4 + 0.0177 Square Feet - 0.666 Age

Predictor       Coef       SE Coef    T       P
Constant        57.35      10.01      5.73    0.000
Square Feet     0.017718   0.003146   5.63    0.000
Age            -0.6663     0.2280    -2.92    0.008

S = 11.9604   R-Sq = 74.1%   R-Sq(adj) = 71.5%

Analysis of Variance
Source           DF    SS        MS       F       P
Regression        2    8189.7    4094.9   28.63   0.000
Residual Error   20    2861.0    143.1
Total            22   11050.7
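Incidentally, the R-Sq value reported in the output can be recovered from the analysis of variance table: it is the regression sum of squares divided by the total sum of squares. A minimal sketch, Python assumed:

```python
ss_regression, ss_total = 8189.7, 11050.7   # from the Minitab ANOVA table
r_sq = ss_regression / ss_total
print(f"R-Sq = {r_sq:.1%}")                 # R-Sq = 74.1%
```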
The coefficient of x₁ (total number of square feet in the house) is .0177, which means that a one-unit increase in square footage would result in a predicted increase of (.0177)($1,000) = $17.70 in the price of the home if age were held constant. All other variables being held constant, the addition of 1 square foot of space in the house results in a predicted increase of $17.70 in the price of the home.

The coefficient of x₂ (age) is −.666. The negative sign on the coefficient denotes an inverse relationship between the age of a house and the price of the house: the older the house, the lower the price. In this case, if the total number of square feet in the house is kept constant, a one-unit increase in the age of the house (1 year) will result in (−.666)($1,000) = −$666, a predicted $666 drop in the price.
In examining the regression coefficients, it is important to remember that the independent variables are often measured in different units. It is usually not wise to compare the regression coefficients of predictors in a multiple regression model and decide that the variable with the largest regression coefficient is the best predictor. In this example, the two variables are in different units, square feet and years. Just because x₂ has the larger coefficient (.666) does not necessarily make x₂ the strongest predictor of y.
This regression model can be used to predict the price of a house in this small Louisiana city. If the house has 2,500 square feet total and is 12 years old, x₁ = 2500 and x₂ = 12. Substituting these values into the regression model yields

ŷ = 57.4 + .0177(2500) − .666(12) = 93.658

The predicted price of the house is $93,658. Figure 13.2 is a graph of these data with the response plane and the residual distances.
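A minimal sketch of this prediction, Python assumed (the function name is illustrative):

```python
def predict_price(square_feet: float, age: float) -> float:
    """Predicted price in $1,000s from the fitted real estate model."""
    return 57.4 + .0177 * square_feet - .666 * age

print(round(predict_price(2500, 12), 3))   # 93.658, i.e., about $93,658
```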
DEMONSTRATION PROBLEM 13.1

Since 1980, the prime interest rate in the United States has varied from less than 5% to over 15%. What factor in the U.S. economy seems to be related to the prime interest rate? Two possible predictors of the prime interest rate are the annual unemployment rate and the savings rate in the United States. Shown below are data for the annual prime interest rate for the even-numbered years over a 28-year period in the United States, along with the annual unemployment rate and the annual average personal saving (as a percentage of disposable personal income). Use these data to develop a multiple regression model to predict the annual prime interest rate by the unemployment rate and the average personal saving. Determine the predicted prime interest rate if the unemployment rate is 6.5 and the average personal saving is 5.0.