Correlation and Regression
interest rate set by the Federal Reserve. A marketing executive might want to know how
strong the relationship is between advertising dollars and sales dollars for a product or a
company.
In this chapter, we will study the concept of correlation and how it can be used to estimate the relationship between two variables. We will also explore simple regression analysis, through which mathematical models can be developed to predict one variable by another. We will examine tools for testing the strength and predictability of regression models, and we will learn how to use regression analysis to develop a forecasting trend line.
12.1 CORRELATION
Figure 12.1 depicts five different degrees of correlation: (a) represents strong negative
correlation, (b) represents moderate negative correlation, (c) represents moderate positive
correlation, (d) represents strong positive correlation, and (e) contains no correlation.
What is the measure of correlation between the interest rate of federal funds and the
commodities futures index? With data such as those shown in Table 12.1, which represent
the values for interest rates of federal funds and commodities futures indexes for a sample
of 12 days, a correlation coefficient, r, can be computed.
FIGURE 12.1
Five Correlations: (a) Strong Negative Correlation (r = -.933); (b) Moderate Negative Correlation (r = -.674); (c) Moderate Positive Correlation (r = .518); (d) Strong Positive Correlation (r = .909); (e) No Correlation
TABLE 12.2
Computation of r for the Economics Example

Day   Interest Rate x   Futures Index y      x²         y²          xy
 1         7.43               221          55.205     48,841     1,642.03
 2         7.48               222          55.950     49,284     1,660.56
 3         8.00               226          64.000     51,076     1,808.00
 4         7.75               225          60.063     50,625     1,743.75
 5         7.60               224          57.760     50,176     1,702.40
 6         7.63               223          58.217     49,729     1,701.49
 7         7.68               223          58.982     49,729     1,712.64
 8         7.67               226          58.829     51,076     1,733.42
 9         7.59               226          57.608     51,076     1,715.34
10         8.07               235          65.125     55,225     1,896.45
11         8.03               233          64.481     54,289     1,870.99
12         8.00               241          64.000     58,081     1,928.00
      Σx = 92.93         Σy = 2,725    Σx² = 720.220  Σy² = 619,207  Σxy = 21,115.07
r = [Σxy − (Σx)(Σy)/n] / √([Σx² − (Σx)²/n][Σy² − (Σy)²/n])

  = [21,115.07 − (92.93)(2,725)/12] / √([720.22 − (92.93)²/12][619,207 − (2,725)²/12]) = .815
FIGURE 12.2
Excel and Minitab Output for the Economics Example

Excel Output
                 Interest Rate   Futures Index
Interest Rate        1
Futures Index        0.815           1

Minitab Output
Correlations: Interest Rate, Futures Index

Pearson correlation of Interest Rate and Futures Index = 0.815
p-Value = 0.001
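To make the computation concrete, here is a minimal sketch that reproduces the r value from Table 12.2. Python is assumed here (the chapter itself works the example by hand and with Excel and Minitab), and the variable names are illustrative only.

```python
from math import sqrt

# Interest rates (x) and commodities futures index values (y)
# for the 12 sampled days in Table 12.2.
x = [7.43, 7.48, 8.00, 7.75, 7.60, 7.63, 7.68, 7.67, 7.59, 8.07, 8.03, 8.00]
y = [221, 222, 226, 225, 224, 223, 223, 226, 226, 235, 233, 241]
n = len(x)

sum_x, sum_y = sum(x), sum(y)               # 92.93 and 2,725
sum_x2 = sum(v * v for v in x)              # 720.220
sum_y2 = sum(v * v for v in y)              # 619,207
sum_xy = sum(a * b for a, b in zip(x, y))   # 21,115.07

# Pearson product-moment correlation coefficient.
r = (sum_xy - sum_x * sum_y / n) / sqrt(
    (sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
print(round(r, 3))   # 0.815
```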
12.1 PROBLEMS

12.1 Determine the value of the coefficient of correlation, r, for the following data.

     X    4    6    7   11   14   17   21
     Y   18   12   13    8    7    7    4
12.3 In an effort to determine whether any correlation exists between the stock prices of two airlines, an analyst sampled six days of activity of the stock market. Using the following prices of Delta stock and Southwest stock, compute the coefficient of correlation. Stock prices have been rounded off to the nearest tenth for ease of computation.
Delta Southwest
47.6 15.1
46.3 15.4
50.6 15.9
52.6 15.6
52.4 16.4
52.7 18.1
12.4 The following data are the claims (in $ millions) for BlueCross BlueShield benefits
for nine states, along with the surplus (in $ millions) that the company had in assets
in those states.
State Claims Surplus
Alabama $1,425 $277
Colorado 273 100
Florida 915 120
Illinois 1,687 259
Maine 234 40
Montana 142 25
North Dakota 259 57
Oklahoma 258 31
Texas 894 141
Use the data to compute r and determine how strongly claims and surplus are related.
12.2 INTRODUCTION TO SIMPLE REGRESSION ANALYSIS

Regression analysis is the process of constructing a mathematical model or function that can be used to predict or determine one variable by another variable or other variables. The most elementary regression model is called simple regression, or bivariate regression, involving two variables in which one variable is predicted by another variable. In simple regression, the variable to be predicted is called the dependent variable and is designated as y. The predictor is called the independent variable, or explanatory variable, and is designated as x. In simple regression analysis, only a straight-line relationship between two variables is examined.
Nonlinear relationships and regression models with more than one independent variable can be explored by using multiple regression models, which are presented in Chapters 13 and 14.

TABLE 12.3
Airline Cost Data

Number of Passengers   Cost ($1,000)
        61                 4.280
        63                 4.080
        67                 4.420
        69                 4.170
        70                 4.480
        74                 4.300
        76                 4.820
        81                 4.700
        86                 5.110
        91                 5.130
        95                 5.640
        97                 5.560

Can the cost of flying a commercial airliner be predicted using regression analysis? If so, what variables are related to such cost? A few of the many variables that can potentially contribute are type of plane, distance, number of passengers, amount of luggage/freight, weather conditions, direction of destination, and perhaps even pilot skill. Suppose a study is conducted using only Boeing 737s traveling 500 miles on comparable routes during the same season of the year. Can the number of passengers predict the cost of flying such routes? It seems logical that more passengers result in more weight and more baggage, which could, in turn, result in increased fuel consumption and other costs. Suppose the data displayed in Table 12.3 are the costs and associated number of passengers for twelve 500-mile commercial airline flights using Boeing 737s during the same season of the year. We will use these data to develop a regression model to predict cost by number of passengers.

Usually, the first step in simple regression analysis is to construct a scatter plot (or scatter diagram), discussed in Chapter 2. Graphing the data in this way yields preliminary information about the shape and spread of the data. Figure 12.3 is an Excel scatter plot of the data in Table 12.3. Figure 12.4 is a close-up view of the scatter plot produced by Minitab.
FIGURE 12.3
Excel Scatter Plot of Airline Cost Data (axes: Cost ($1,000) vs. Number of Passengers)
FIGURE 12.4
Close-Up Minitab Scatter Plot of Airline Cost Data (axes: Cost vs. Number of Passengers)
Try to imagine a line passing through the points. Is a linear fit possible? Would a curve fit the data better? The scatter plot gives some idea of how well a regression line fits the data. Later in the chapter, we present statistical techniques that can be used to determine more precisely how well a regression line fits the data.
12.3 DETERMINING THE EQUATION OF THE REGRESSION LINE

The first step in determining the equation of the regression line that passes through the sample data is to establish the equation's form. Several different types of equations of lines are discussed in algebra, finite math, or analytic geometry courses. Recall that among these equations of a line are the two-point form, the point-slope form, and the slope-intercept form. In regression analysis, researchers use the slope-intercept equation of a line. In math courses, the slope-intercept form of the equation of a line often takes the form
y = mx + b
where
m = slope of the line
b = y intercept of the line
In statistics, the slope-intercept form of the equation of the regression line through the population points is

ŷ = β₀ + β₁x

where
ŷ = the predicted value of y
β₀ = the population y intercept
β₁ = the population slope
To determine the equation of the regression line for a sample of data, the researcher must determine the values for b₀ and b₁. This process is sometimes referred to as least squares analysis. Least squares analysis is a process whereby a regression model is developed by producing the minimum sum of the squared error values. On the basis of this premise and calculus, a particular set of equations has been developed to produce components of the regression model.*
*Derivation of these formulas is beyond the scope of information being discussed here but is presented in
WileyPLUS.
FIGURE 12.5
Minitab Plot of a Regression Line (labeled elements: the regression line, the points (x, y), and the error of the prediction)
Examine the regression line fit through the points in Figure 12.5. Observe that the line does not actually pass through any of the points. The vertical distance from each point to the line is the error of the prediction. In theory, an infinite number of lines could be constructed to pass through these points in some manner. The least squares regression line is the regression line that results in the smallest sum of errors squared.
Formula 12.2 is an equation for computing the value of the sample slope. Several versions of the equation are given to afford latitude in doing the computations.

b₁ = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² = [Σxy − (Σx)(Σy)/n] / [Σx² − (Σx)²/n]    (12.2)

The expression in the numerator of the slope formula 12.2 appears frequently in this chapter and is denoted as SSxy:

SSxy = Σ(x − x̄)(y − ȳ) = Σxy − (Σx)(Σy)/n

The expression in the denominator of the slope formula 12.2 also appears frequently in this chapter and is denoted as SSxx:

SSxx = Σ(x − x̄)² = Σx² − (Σx)²/n
With these abbreviations, the equation for the slope can be expressed as in Formula 12.3:

b₁ = SSxy / SSxx    (12.3)

Formula 12.4 is used to compute the sample y intercept; the slope must be computed before the y intercept:

b₀ = ȳ − b₁x̄ = Σy/n − b₁(Σx/n)    (12.4)
Formulas 12.2, 12.3, and 12.4 show that the following data are needed from sample information to compute the slope and intercept: Σx, Σy, Σx², and Σxy, unless sample means are used. Table 12.4 contains the results of solving for the slope and intercept and determining the equation of the regression line for the data in Table 12.3.
The least squares equation of the regression line for this problem is

ŷ = 1.57 + .0407x
TABLE 12.4
Solving for the Slope and the y Intercept of the Regression Line for the Airline Cost Example

Number of Passengers x   Cost ($1,000) y       x²          xy
        61                   4.280           3,721      261.080
        63                   4.080           3,969      257.040
        67                   4.420           4,489      296.140
        69                   4.170           4,761      287.730
        70                   4.480           4,900      313.600
        74                   4.300           5,476      318.200
        76                   4.820           5,776      366.320
        81                   4.700           6,561      380.700
        86                   5.110           7,396      439.460
        91                   5.130           8,281      466.830
        95                   5.640           9,025      535.800
        97                   5.560           9,409      539.320
   Σx = 930              Σy = 56.690     Σx² = 73,764  Σxy = 4,462.220
SSxy = Σxy − (Σx)(Σy)/n = 4,462.22 − (930)(56.69)/12 = 68.745

SSxx = Σx² − (Σx)²/n = 73,764 − (930)²/12 = 1,689

b₁ = SSxy / SSxx = 68.745 / 1,689 = .0407

b₀ = Σy/n − b₁(Σx/n) = 56.69/12 − (.0407)(930/12) = 1.57

ŷ = 1.57 + .0407x
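The same arithmetic can be scripted. The following is a minimal sketch, again assuming Python, that recovers the slope and intercept in Table 12.4 directly from the raw data of Table 12.3; the variable names are our own.

```python
# Airline cost data from Table 12.3: passengers (x), cost in $1,000s (y).
x = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
y = [4.280, 4.080, 4.420, 4.170, 4.480, 4.300,
     4.820, 4.700, 5.110, 5.130, 5.640, 5.560]
n = len(x)

sum_x, sum_y = sum(x), sum(y)               # 930 and 56.69
sum_x2 = sum(v * v for v in x)              # 73,764
sum_xy = sum(a * b for a, b in zip(x, y))   # 4,462.22

ss_xy = sum_xy - sum_x * sum_y / n          # SSxy = 68.745
ss_xx = sum_x2 - sum_x ** 2 / n             # SSxx = 1,689
b1 = ss_xy / ss_xx                          # slope, about .0407
b0 = sum_y / n - b1 * (sum_x / n)           # intercept, about 1.57
print(f"y-hat = {b0:.2f} + {b1:.4f}x")      # y-hat = 1.57 + 0.0407x
```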
The slope of this regression line is .0407. Because the y values were recoded for ease of computation and are actually in $1,000 denominations, the slope is actually $40.70. One interpretation of the slope in this problem is that for every unit increase in x (every person added to the flight of the airplane), there is a $40.70 increase in the cost of the flight. The y-intercept is the point where the line crosses the y-axis (where x is zero). Sometimes in regression analysis, the y-intercept is meaningless in terms of the variables studied. However, in this problem, one interpretation of the y-intercept, which is 1.570 or $1,570, is that even if there were no passengers on the commercial flight, it would still cost $1,570. In other words, there are costs associated with a flight that carries no passengers.
Superimposing the line representing the least squares equation for this problem on the
scatter plot indicates how well the regression line fits the data points, as shown in the Excel
graph in Figure 12.6. The next several sections explore mathematical ways of testing how
well the regression line fits the points.
FIGURE 12.6
Excel Graph of Regression Line for the Airline Cost Example (axes: Cost ($1,000) vs. Number of Passengers)
12.4 RESIDUAL ANALYSIS
How does a business researcher test a regression line to determine whether the line is a good fit of the data other than by observing the fitted line plot (regression line fit through a scatter plot of the data)? One particularly popular approach is to use the historical data (x and y values used to construct the regression model) to test the model. With this approach, the values of the independent variable (x values) are inserted into the regression model and a predicted value (ŷ) is obtained for each x value. These predicted values (ŷ) are then compared to the actual y values to determine how much error the equation of the regression line produced. Each difference between the actual y values and the predicted y values is the error of the regression line at a given point, y − ŷ, and is referred to as the residual. It is the sum of squares of these residuals that is minimized to find the least squares line.
Table 12.5 shows ŷ values and the residuals for each pair of data for the airline cost regression model developed in Section 12.3. The predicted values are calculated by inserting an x value into the equation of the regression line and solving for ŷ. For example, when x = 61, ŷ = 1.57 + .0407(61) = 4.053, as displayed in column 3 of the table. Each of these predicted y values is subtracted from the actual y value to determine the error, or residual. For example, the first y value listed in the table is 4.280 and the first predicted value is 4.053, resulting in a residual of 4.280 − 4.053 = .227. The residuals for this problem are given in column 4 of the table.
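As a check on Table 12.5, here is a minimal sketch (Python assumed, names illustrative) that generates every predicted value and residual from the rounded model ŷ = 1.57 + .0407x.

```python
x = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
y = [4.280, 4.080, 4.420, 4.170, 4.480, 4.300,
     4.820, 4.700, 5.110, 5.130, 5.640, 5.560]

b0, b1 = 1.57, .0407                                 # rounded estimates
y_hat = [b0 + b1 * xi for xi in x]                   # predicted costs
residuals = [yi - yh for yi, yh in zip(y, y_hat)]    # y - y-hat

for xi, yi, yh, e in zip(x, y, y_hat, residuals):
    print(f"{xi:3d}  {yi:5.3f}  {yh:5.3f}  {e:6.3f}")
print(f"sum of residuals = {sum(residuals):.3f}")    # about zero
```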
Note that the sum of the residuals is approximately zero. Except for rounding error, the sum of the residuals is always zero. The reason is that a residual is geometrically the vertical distance from the regression line to a data point. The equations used to solve for the slope and intercept place the line geometrically in the middle of all points. Therefore, vertical distances from the line to the points will cancel each other and sum to zero. Figure 12.7 is a Minitab-produced scatter plot of the data and the residuals for the airline cost example.
TABLE 12.5
Predicted Values and Residuals for the Airline Cost Example

Number of Passengers x   Cost ($1,000) y   Predicted Value ŷ   Residual y − ŷ
        61                    4.280              4.053               .227
        63                    4.080              4.134              −.054
        67                    4.420              4.297               .123
        69                    4.170              4.378              −.208
        70                    4.480              4.419               .061
        74                    4.300              4.582              −.282
        76                    4.820              4.663               .157
        81                    4.700              4.867              −.167
        86                    5.110              5.070               .040
        91                    5.130              5.274              −.144
        95                    5.640              5.436               .204
        97                    5.560              5.518               .042
                                                            Σ(y − ŷ) = −.001
FIGURE 12.7
Close-Up Minitab Scatter Plot with Residuals for the Airline Cost Example (selected residuals labeled: .204, −.144, .157, −.282; axes: Cost vs. Number of Passengers)
An examination of the residuals may give the researcher an idea of how well the regression line fits the historical data points. The largest residual for the airline cost example is −.282, and the smallest is .040. Because the objective of the regression analysis was to predict the cost of flight in $1,000s, the regression line produces an error of $282 when there are 74 passengers and an error of only $40 when there are 86 passengers. This result presents the best and worst cases for the residuals. The researcher must examine other residuals to determine how well the regression model fits other data points.
Sometimes residuals are used to locate outliers. Outliers are data points that lie apart from the rest of the points. Outliers can produce residuals with large magnitudes and are usually easy to identify on scatter plots. Outliers can be the result of misrecorded or miscoded data, or they may simply be data points that do not conform to the general trend. The equation of the regression line is influenced by every data point used in its calculation in a manner similar to the arithmetic mean. Therefore, outliers sometimes can unduly influence the regression line by "pulling" the line toward the outliers. The origin of outliers must be investigated to determine whether they should be retained or whether the regression equation should be recomputed without them.
Residuals are usually plotted against the x-axis, which reveals a view of the residuals as x increases. Figure 12.8 shows the residuals plotted by Excel against the x-axis for the airline cost example.
FIGURE 12.8
Excel Graph of Residuals for the Airline Cost Example (axes: Residual vs. Number of Passengers)
FIGURE 12.11
Graphs of Nonindependent Error Terms (two panels of residuals plotted against x)
FIGURE 12.12
Healthy Residual Graph
FIGURE 12.13
Minitab Residual Analyses (normal probability plot of the residuals, residuals versus the fitted values, and a histogram of the residuals)
12.5 STANDARD ERROR OF THE ESTIMATE
Residuals represent errors of estimation for individual points. With large samples of data,
residual computations become laborious. Even with computers, a researcher sometimes
has difficulty working through pages of residuals in an effort to understand the error of the
regression model. An alternative way of examining the error of the model is the standard
error of the estimate, which provides a single measurement of the regression error.
Because the sum of the residuals is zero, attempting to determine the total amount of
error by summing the residuals is fruitless. This zero-sum characteristic of residuals can be
avoided by squaring the residuals and then summing them.
Table 12.6 contains the airline cost data from Table 12.3, along with the residuals and
the residuals squared. The total of the residuals squared column is called the sum of squares
of error (SSE).
In theory, infinitely many lines can be fit to a sample of points. However, formulas 12.2
and 12.4 produce a line of best fit for which the SSE is the smallest for any line that can be
fit to the sample data. This result is guaranteed, because formulas 12.2 and 12.4 are derived
from calculus to minimize SSE. For this reason, the regression process used in this chapter
is called least squares regression.
A computational version of the equation for computing SSE is less meaningful in terms of interpretation than Σ(y − ŷ)², but it is usually easier to compute. The computational formula for SSE follows.
COMPUTATIONAL FORMULA FOR SSE
SSE = Σy² − b₀Σy − b₁Σxy

For the airline cost example,

Σy² = 270.9251
b₀ = 1.5697928*
b₁ = .0407016*
Σy = 56.69
Σxy = 4,462.22

SSE = Σy² − b₀Σy − b₁Σxy
    = 270.9251 − (1.5697928)(56.69) − (.0407016)(4,462.22) = .31405

*Note: In previous sections, the values of the slope and intercept were rounded off for ease of computation and interpretation. They are shown here with more precision in an effort to reduce rounding error.

The slight discrepancy between this value and the value computed in Table 12.6 is due to rounding error.
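A minimal sketch of this computational formula, assuming Python as before and using the higher-precision (starred) slope and intercept:

```python
x = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
y = [4.280, 4.080, 4.420, 4.170, 4.480, 4.300,
     4.820, 4.700, 5.110, 5.130, 5.640, 5.560]

b0, b1 = 1.5697928, .0407016                # higher-precision estimates
sum_y = sum(y)                              # 56.69
sum_y2 = sum(v * v for v in y)              # 270.9251
sum_xy = sum(a * b for a, b in zip(x, y))   # 4,462.22

sse = sum_y2 - b0 * sum_y - b1 * sum_xy     # SSE = sum(y^2) - b0*sum(y) - b1*sum(xy)
print(round(sse, 5))                        # about .31405
```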
The sum of squares error is in part a function of the number of pairs of data being used to compute the sum, which lessens the value of SSE as a measurement of error. A more useful measurement of error is the standard error of the estimate. The standard error of the estimate, denoted se, is a standard deviation of the error of the regression model and has a more practical use than SSE. The standard error of the estimate follows.

STANDARD ERROR OF THE ESTIMATE
se = √(SSE/(n − 2))
The standard error of the estimate for the airline cost example is

se = √(SSE/(n − 2)) = √(.31434/10) = .1773
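Continuing the sketch in Python, the standard error of the estimate follows directly from SSE:

```python
from math import sqrt

n, sse = 12, .31434          # SSE as computed in Table 12.6
se = sqrt(sse / (n - 2))     # standard error of the estimate
print(round(se, 4))          # .1773
```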
How is the standard error of the estimate used? As previously mentioned, the standard error of the estimate is a standard deviation of error. Recall from Chapter 3 that if data are approximately normally distributed, the empirical rule states that about 68% of all values are within μ ± 1σ and that about 95% of all values are within μ ± 2σ. One of the assumptions for regression states that for a given x the error terms are normally distributed. Because the error terms are normally distributed, se is the standard deviation of error, and the average error is zero, approximately 68% of the error values (residuals) should be within 0 ± 1se and 95% of the error values (residuals) should be within 0 ± 2se. By having knowledge of the variables being studied and by examining the value of se, the researcher can often make a judgment about the fit of the regression model to the data by using se.
How can the se value for the airline cost example be interpreted?
The regression model in that example is used to predict airline cost by number of passengers. Note that the range of the airline cost data in Table 12.3 is from 4.08 to 5.64 ($4,080 to $5,640). The regression model for the data yields an se of .1773. An interpretation of se is that the standard deviation of error for the airline cost example is $177.30.
If the error terms were normally distributed about the given values of x, approximately 68% of the error terms would be within ±$177.30 and 95% would be within ±2($177.30) = ±$354.60. Examination of the residuals reveals that 100% of the residuals are within 2se.
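That claim is easy to verify from the residuals in Table 12.5; here is a minimal sketch, Python assumed:

```python
# Residuals from Table 12.5 and se = .1773 for the airline cost example.
residuals = [.227, -.054, .123, -.208, .061, -.282,
             .157, -.167, .040, -.144, .204, .042]
se = .1773

within_1se = sum(abs(e) <= se for e in residuals) / len(residuals)
within_2se = sum(abs(e) <= 2 * se for e in residuals) / len(residuals)
print(f"{within_1se:.0%} within 1se")   # 67%, close to the 68% guideline
print(f"{within_2se:.0%} within 2se")   # 100%, matching the text
```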
The standard error of the estimate provides a single measure of error, which, if the
researcher has enough background in the area being analyzed, can be used to understand
the magnitude of errors in the model. In addition, some researchers use the standard
error of the estimate to identify outliers. They do so by looking for data that are outside
±2se or ±3se.
DEMONSTRATION PROBLEM 12.3

Compute the sum of squares of error and the standard error of the estimate for Demonstration Problem 12.1, in which a regression model was developed to predict the number of FTEs at a hospital by the number of beds.
Multiple Regression Analysis
Simple regression analysis (discussed in Chapter 12) is bivariate linear regression in which one dependent variable, y, is predicted by one independent variable, x. Examples of simple regression applications include models to predict retail sales by population density, Dow Jones averages by prime interest rates, crude oil production by energy consumption, and CEO compensation by quarterly sales. However, in many cases, other independent variables, taken in conjunction with these variables, can make the regression model a better fit in predicting the dependent variable. For example, sales could be predicted by the size of store and number of competitors in addition to population density. A model to predict the Dow Jones average of 30 industrials could include, in addition to the prime interest rate, such predictors as yesterday's volume, the bond interest rate, and the producer price index. A model to predict CEO compensation could be developed by using variables such as company earnings per share, age of CEO, and size of company in addition to quarterly sales. A model could perhaps be developed to predict the cost of outsourcing by such variables as unit price, export taxes, cost of money, damage in transit, and other factors. Each of these examples contains only one dependent variable, y, as with simple regression analysis. However, multiple independent variables, x (predictors), are involved. Regression analysis with two or more independent variables or with at least one nonlinear predictor is called multiple regression analysis.
values are estimated by using sample information. Shown here is the form of the equation for estimating y with sample information:

ŷ = b₀ + b₁x₁ + b₂x₂ + b₃x₃ + ⋯ + bₖxₖ

where
ŷ = the predicted value of y
b₀ = the estimate of the regression constant
b₁ = the estimate of regression coefficient 1
b₂ = the estimate of regression coefficient 2
b₃ = the estimate of regression coefficient 3
bₖ = the estimate of regression coefficient k
k = the number of independent variables
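As a minimal sketch of this sample equation (Python assumed; the coefficient and predictor values below are hypothetical placeholders, not from the text):

```python
def y_hat(b0: float, b: list[float], x: list[float]) -> float:
    """Predicted y = b0 + b1*x1 + b2*x2 + ... + bk*xk."""
    return b0 + sum(bi * xi for bi, xi in zip(b, x))

# Hypothetical two-predictor example: b0 = 10, b1 = 2, b2 = -1.
print(y_hat(10, [2, -1], [3, 4]))   # 10 + 2(3) - 1(4) = 12
```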
FIGURE 13.1
Real Estate Data: Points in a Sample Space (axes: Price, Age, sq. ft.)
FIGURE 13.2
Real Estate Data: Response Plane for a First-Order Two-Predictor Multiple Regression Model (axes: Price, Age, sq. ft.)
some values of y for various values of x₁ and x₂. The error of the response plane (ε) in predicting or determining the y values is the distance from the points to the plane.
A number of statistical software packages can perform multiple regression analysis, including Excel and Minitab. The output for the Minitab multiple regression analysis on the real estate data is given in Figure 13.3. (Excel output is shown in Demonstration Problem 13.1.)
The Minitab output for regression analysis begins with "The regression equation is." From Figure 13.3, the regression equation for the real estate data in Table 13.1 is

ŷ = 57.4 + .0177x₁ − .666x₂

The regression constant, 57.4, is the y-intercept. The y-intercept is the value of ŷ if both x₁ (number of square feet) and x₂ (age) are zero. In this example, a practical understanding of the y-intercept is meaningless. It makes little sense to say that a house containing no square feet (x₁ = 0) and no years of age (x₂ = 0) would cost $57,400. Note in Figure 13.2 that the response plane crosses the y-axis (price) at 57.4.
FIGURE 13.3
Minitab Output of Regression for the Real Estate Example

Regression Analysis: Price versus Square Feet, Age

The regression equation is
Price = 57.4 + 0.0177 Square Feet - 0.666 Age

Predictor       Coef       SE Coef    T       P
Constant        57.35      10.01      5.73    0.000
Square Feet     0.017718   0.003146   5.63    0.000
Age            -0.6663     0.2280    -2.92    0.008

S = 11.9604   R-Sq = 74.1%   R-Sq(adj) = 71.5%

Analysis of Variance
Source           DF    SS        MS       F       P
Regression        2    8189.7    4094.9   28.63   0.000
Residual Error   20    2861.0    143.1
Total            22   11050.7
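Incidentally, the R-Sq value reported in the output can be recovered from the analysis of variance table: it is the regression sum of squares divided by the total sum of squares. A minimal sketch, Python assumed:

```python
ss_regression, ss_total = 8189.7, 11050.7   # from the Minitab ANOVA table
r_sq = ss_regression / ss_total
print(f"R-Sq = {r_sq:.1%}")                 # R-Sq = 74.1%
```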
The coefficient of x₁ (total number of square feet in the house) is .0177, which means that a one-unit increase in square footage would result in a predicted increase of (.0177)($1,000) = $17.70 in the price of the home if age were held constant. All other variables being held constant, the addition of 1 square foot of space in the house results in a predicted increase of $17.70 in the price of the home.

The coefficient of x₂ (age) is −.666. The negative sign on the coefficient denotes an inverse relationship between the age of a house and the price of the house: the older the house, the lower the price. In this case, if the total number of square feet in the house is kept constant, a one-unit increase in the age of the house (1 year) will result in (−.666)($1,000) = −$666, a predicted $666 drop in the price.
In examining the regression coefficients, it is important to remember that the independent variables are often measured in different units. It is usually not wise to compare the regression coefficients of predictors in a multiple regression model and decide that the variable with the largest regression coefficient is the best predictor. In this example, the two variables are in different units, square feet and years. Just because x₂ has the larger coefficient (.666) does not necessarily make x₂ the strongest predictor of y.
This regression model can be used to predict the price of a house in this small Louisiana city. If the house has 2,500 square feet total and is 12 years old, x₁ = 2500 and x₂ = 12. Substituting these values into the regression model yields

ŷ = 57.4 + .0177(2500) − .666(12) = 93.658

The predicted price of the house is $93,658. Figure 13.2 is a graph of these data with the response plane and the residual distances.
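A minimal sketch of this prediction, Python assumed (the function name is illustrative):

```python
def predict_price(square_feet: float, age: float) -> float:
    """Predicted price in $1,000s from the fitted real estate model."""
    return 57.4 + .0177 * square_feet - .666 * age

print(round(predict_price(2500, 12), 3))   # 93.658, i.e., about $93,658
```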
DEMONSTRATION PROBLEM 13.1

Since 1980, the prime interest rate in the United States has varied from less than 5% to over 15%. What factor in the U.S. economy seems to be related to the prime interest rate? Two possible predictors of the prime interest rate are the annual unemployment rate and the savings rate in the United States. Shown below are data for the annual prime interest rate for the even-numbered years over a 28-year period in the United States, along with the annual unemployment rate and the annual average personal saving (as a percentage of disposable personal income). Use these data to develop a multiple regression model to predict the annual prime interest rate by the unemployment rate and the average personal saving. Determine the predicted prime interest rate if the unemployment rate is 6.5 and the average personal saving is 5.0.