
Regression Primer

Knowing about regression analysis will help you to learn about SEM. Although the techniques considered next analyze observed variables only, their basic principles make up a core part of SEM. This includes the dependence of the results on not only what is measured (the data), but also on what is not measured, or omitted relevant variables, a kind of specification error. Some advice: Even if you think that you already know a lot about regression, you should nevertheless read this primer carefully. This is because many readers tell me that they learned something new after hearing about the issues outlined here. Next I assume that standard deviations (SD) for continuous variables are calculated as the square root of the sample variance s² = SS/df, where SS refers to the sum of squared deviations from the mean and the overall degrees of freedom are df = N – 1. Standardized scores, or normal deviates, are calculated as z = (X – M)/SD for a continuous variable X.

BIVARIATE REGRESSION

Presented in Table R.1 are scores on three continuous variables. Considered next is bivariate regression for variables X and Y, but later we deal with the multiple regression analysis that also includes variable W. The unstandardized bivariate regression equation for predicting Y from X—also called regressing Y on X—takes the form

Ŷ = BX X + AX   (R.1)

where Ŷ refers to predicted scores. Equation R.1 describes a straight line where BX, the unstandardized regression coefficient for predictor X, is the slope of the line, and AX is the constant or intercept term, or the value of Ŷ if X = 0. For the data in Table R.1,

Ŷ = 2.479X + 61.054

which says that a 1-point increase in X predicts an increase in Y of 2.479 points and that Ŷ = 61.054, given X = 0. Exercise 1 asks you to calculate these coefficients for the data in Table R.1.

The predicted scores defined by Equation R.1 make up a composite, or a weighted linear combination of the predictor and the intercept. The values of BX and AX in Equation R.1 are generally estimated with the method of ordinary least squares (OLS) so that the least squares criterion is satisfied. The latter means that the sum of squared residuals, or Σ(Y – Ŷ)², is as small as possible in a particular sample. Consequently, OLS estimation capitalizes on chance variation, which implies that values of BX and AX will vary over samples. As we will see later, capitalization on chance is a greater problem in smaller versus larger samples.

Coefficient BX in Equation R.1 is related to the Pearson correlation rXY and the standard deviations of X and Y as follows:

BX = rXY (SDY / SDX)   (R.2)

A formula for rXY is presented later, but for now we can

TABLE R.1. Example Data Set for Bivariate Regression and Multiple Regression

Case  X   W   Y      Case  X   W   Y
A     16  48  100    K     18  50  102
B     14  47   92    L     19  51  115
C     16  45   88    M     16  52   92
D     12  45   95    N     16  52  102
E     18  46   98    O     22  50  104
F     18  46  101    P     12  51   85
G     13  47   97    Q     20  54  118
H     16  48   98    R     14  53  105
I     18  49  110    S     21  52  111
J     22  49  124    T     17  53  122

Note. MX = 16.900, SDX = 3.007; MW = 49.400, SDW = 2.817; MY = 102.950, SDY = 10.870; rXY = .686, rXW = .272, rWY = .499.

see in Equation R.2 that BX is just a rearrangement of the expression for the covariance between X and Y, or covXY = rXY SDX SDY. Thus, BX corresponds to the covariance structure of Equation R.1. Because BX reflects the original metrics of X and Y, its value will change if the scale of either variable is altered (e.g., X is measured in centimeters instead of inches). For the same reason, values of BX are not limited to a particular range. For example, it may be possible to derive values of BX such as –7.50 or 1,225.80, depending on the raw score metrics of X and Y. Consequently, a numerical value of BX that appears "large" does not necessarily mean that X is an important or strong predictor of Y.

The intercept AX of Equation R.1 is related to both BX and the means of both variables:

AX = MY – BX MX   (R.3)

The term AX represents the mean structure of Equation R.1 because it conveys information about the means of both variables (and the regression coefficient) albeit with a single number. As stated, Ŷ = AX when X = 0, but sometimes scores of zero are impossible on certain predictors (e.g., there is no IQ score of zero in conventional standardized metrics for such scores). If so, scores on X may be centered, or converted to mean deviations x = X – MX, before analyzing the data. (Scores on Y are not centered.) Once centered, x = 0 corresponds to a score that equals the mean in the original (uncentered) scores, or X = MX. When regressing Y on x, the value of the intercept Ax equals Ŷ when x = 0; that is, the intercept is the predicted score on Y when X takes its average value in the raw data. Although centering generally changes the value of the intercept (AX ≠ Ax), centering does not affect the value of the unstandardized regression coefficient (BX = Bx). Exercise 2 asks you to prove this point for the data in Table R.1.

Regression residuals, or Y – Ŷ, sum to zero and are uncorrelated with the predictor, or

rX(Y – Ŷ) = 0   (R.4)

The equality represented in Equation R.4 is required in order for the computer to calculate unique values of the regression coefficient and intercept in a particular sample. Conceptually, assuming independence of residuals and predictors, or the regression rule (Kenny & Milan, 2012), permits estimation of the explanatory power of the latter (e.g., BX for X in Equation R.1) controlling for omitted (unmeasured) predictors. Bollen (1989) referred to this assumption as pseudo-isolation of the measured predictor X from all other unmeasured predictors of Y. This term describes the essence of statistical control where BX is estimated, assuming that X is unrelated to all possible unmeasured predictors of Y.
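To make Equations R.1–R.4 concrete, the following is a minimal sketch in Python (assuming only NumPy; it is not part of the original chapter) that computes BX and AX from the raw X and Y scores in Table R.1, checks the residual properties in Equation R.4, and verifies that centering X changes the intercept but not the slope:

import numpy as np

# X and Y scores for cases A-T in Table R.1
X = np.array([16, 14, 16, 12, 18, 18, 13, 16, 18, 22,
              18, 19, 16, 16, 22, 12, 20, 14, 21, 17], dtype=float)
Y = np.array([100, 92, 88, 95, 98, 101, 97, 98, 110, 124,
              102, 115, 92, 102, 104, 85, 118, 105, 111, 122], dtype=float)

r_xy = np.corrcoef(X, Y)[0, 1]                  # Pearson correlation, about .686
B_x = r_xy * Y.std(ddof=1) / X.std(ddof=1)      # Equation R.2, about 2.479
A_x = Y.mean() - B_x * X.mean()                 # Equation R.3, about 61.054
print(round(B_x, 3), round(A_x, 3))

# Residuals sum to zero and are uncorrelated with the predictor (Equation R.4)
resid = Y - (B_x * X + A_x)
print(round(resid.sum(), 6), round(np.corrcoef(X, resid)[0, 1], 6))

# Centering the predictor leaves the slope unchanged; the intercept becomes M_Y
x_c = X - X.mean()
B_c = np.corrcoef(x_c, Y)[0, 1] * Y.std(ddof=1) / x_c.std(ddof=1)
print(round(B_c, 3), round(Y.mean() - B_c * x_c.mean(), 3))   # about 2.479 and 102.95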
The predictor and criterion in bivariate regression are theoretically interchangeable; that is, it is possible to regress Y on X or to regress X on Y in two separate analyses. Regressing X on Y would make less sense if X were measured before Y or if X is known to cause Y. Otherwise, the roles of predictor and criterion are not fixed in regression. The unstandardized regression equation for regressing X on Y is

X̂ = BY Y + AY   (R.5)

where the regression coefficient and intercept in Equation R.5 are defined, respectively, as follows:

BY = rXY (SDX / SDY) and AY = MX – BY MY   (R.6)

The expression for BY is nothing more than a different rearrangement of the same covariance, or covXY = rXY SDX SDY, compared with the expression for BX (see Equation R.2). For the data in Table R.1, the unstandardized regression equation for predicting X from Y is

X̂ = .190Y – 2.631

which says that a 1-point increase in Y predicts an increase in X of .190 points and that X̂ = –2.631, given Y = 0. Presented in Figure R.1 are the unstandardized equations for regressing Y on X and for regressing X on Y for the data in Table R.1. In general, the two possible unstandardized prediction equations in bivariate regression are not identical. This is because the Y-on-X equation minimizes residuals on Y, but the X-on-Y equation minimizes residuals on X.

The equation for regressing Y on X when both variables are standardized (i.e., their scores are normal deviates, z) is

ẑY = rXY zX   (R.7)

where ẑY is the predicted standardized score on Y and the Pearson correlation rXY is the standardized regression coefficient. There is no intercept or constant term in Equation R.7 because the means of standardized variables equal zero. (Variances of standardized variables are 1.0.) For the data in Table R.1, rXY = .686. Given zX = 1.0 and rXY = .686, then ẑY = .686(1.0), or .686;


FIGURE R.1. Unstandardized prediction lines for regressing Y on X and for regressing X on Y for the data in
Table R.1.
that is, a score one standard deviation above the mean on X predicts a score almost seven-tenths of a standard deviation above the mean on Y. A standardized regression coefficient thus equals the expected difference on Y in standard deviation units, given an increase on X of one full standard deviation. Unlike the unstandardized regression coefficient BX (see Equation R.2), the value of the standardized regression coefficient (rXY) is unaffected by the scale on either X or Y. It is true that (1) rXY = .686 is also the standardized coefficient when regressing zX on zY, and (2) the standardized prediction equation in this case is ẑX = rXY zY.

There is a special relation between rXY and the unstandardized predicted scores. If Y is regressed on X, for example, then

1. rXY = rYŶ; that is, the bivariate correlation between X and Y equals the bivariate correlation between Y and Ŷ;

2. the observed variance in Y can be represented as the exact sum of the variances of the predicted scores and the residuals, or s²Y = s²Ŷ + s²(Y–Ŷ); and

3. r²XY = s²Ŷ / s²Y, which says that the squared correlation between X and Y equals the ratio of the variance of the predicted scores over the variance of the observed scores on Y.

The equality just stated is the basis for interpreting squared correlations as proportions of explained variance, and a squared correlation is the coefficient of determination. For the data in Table R.1, r²XY = .686² = .470, so we can say that X explains about 47.0% of the variance in Y, and vice versa. Exercise 3 asks you to verify the second and third equalities just described for the data in Table R.1.

When replication data are available, it is actually better to compare unstandardized regression coefficients, such as BX, across different samples than to compare standardized regression coefficients, such as rXY. This is especially true if those samples have different variances on X or Y. This is because the correlation rXY is standardized based on the variability in a particular sample. If variances in a second sample are not the same, then the basis of standardization is not constant over the first and second samples. In contrast, the metric of BX is that of the raw scores for variables X and Y, and these metrics are presumably constant over samples.

Unstandardized regression coefficients are also better when the scales of all variables are meaningful rather than arbitrary. Suppose that Y is the time to complete an athletic event and X is the number of hours spent in training. Assuming a negative covariance, the value of BX would indicate the predicted decrease in performance time for every additional hour of training. In contrast, standardized coefficients describe the effect of training on performance in standard deviation units, which discard the original—and meaningful—scales of X and Y. The assumptions of bivariate regression are essentially the same as those of multiple regression. They are considered in the next section.

MULTIPLE REGRESSION

The logic of multiple regression is considered next for the case of two continuous predictors, X and W, and a continuous criterion Y, but the same ideas apply if there are three or more predictors. The form of the unstandardized equation for regressing Y on both X and W is

Ŷ = BX X + BW W + AX,W   (R.8)

where BX and BW are the unstandardized partial regression coefficients and AX,W is the intercept. The coefficient BX estimates the change in Y, given a 1-point change in X while controlling for W. The coefficient BW has the analogous meaning for the other predictor. The intercept AX,W equals the predicted score on Y when the scores on both predictors are zero, or X = W = 0. If zero is not a valid score on either predictor, then Y can be regressed on centered scores (x = X – MX, w = W – MW) instead of the original scores. If so, then Ŷ = Ax,w, given X = MX and W = MW. As in bivariate regression, centering does not affect the values of the regression coefficients for each predictor in Equation R.8 (i.e., BX = Bx, BW = Bw).

The overall multiple correlation is actually just the Pearson correlation between the observed and predicted scores on the criterion, or RY·X,W = rYŶ. Unlike bivariate correlations, though, the range of R is 0–1.0. The statistic R² equals the proportion of variance explained in Y by both predictors X and W, controlling for their intercorrelation. For the data in Table R.1, the unstandardized regression equation is

Ŷ = 2.147X + 1.302W + 2.340
and the multiple correlation equals .759. Given these results, we can say that

1. a 1-point change in X predicts an increase in Y of 2.147 points, controlling for W;

2. a 1-point change in W predicts an increase in Y of 1.302 points, controlling for X;

3. Ŷ = 2.340, given X = W = 0; and

4. the predictors explain .759² = .576, or about 57.6% of the total variance in Y, after taking account of their intercorrelation (rXW = .272; Table R.1).

The regression equation just described defines a plane in three dimensions where the slope along the X-axis is 2.147, the slope along the W-axis is 1.302, and the Y-intercept for X = W = 0 is 2.340. This regression surface is plotted in Figure R.2 over the range of scores in Table R.1.

Equations for the unstandardized partial regression coefficients for each of two continuous predictors are

BX = bX (SDY / SDX) and BW = bW (SDY / SDW)   (R.9)

where bX and bW for X and W are, respectively, their standardized partial regression coefficients, also known as beta weights. Their formulas are listed next:

bX = (rXY – rWY rXW) / (1 – r²XW) and bW = (rWY – rXY rXW) / (1 – r²XW)   (R.10)

In the numerators of Equation R.10, the bivariate correlation of each predictor with the criterion is adjusted for the correlation of the other predictor with the criterion and for the correlation between the two predictors. The denominators in Equation R.10 adjust the total standardized variance by removing the proportion shared by the two predictors. If the values of rXY, rWY, and rXW vary over samples, then values of coefficients in Equations R.8–R.10 will also change.

Given three or more predictors, the formulas for the regression coefficients are more complicated but follow the same principles (see Cohen et al., 2003, pp. 636–642). If there is just a single predictor X, then bX = rXY.

The intercept in Equation R.8 can be expressed as a function of the unstandardized partial regression coefficients and the means of all three variables as follows:

AX,W = MY – BX MX – BW MW   (R.11)

The regression equation for standardized variables is

ẑY = bX zX + bW zW   (R.12)


FIGURE R.2. Unstandardized regression surface for predicting Y from X and W for the data in Table R.1.
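To connect Equations R.9–R.11 and R.13 to the numbers reported for Table R.1, here is a minimal sketch in plain Python that works only from the summary statistics in the table note (my illustration, not part of the chapter; small rounding differences from the reported values are expected because the summary statistics are themselves rounded):

# Summary statistics from Table R.1
M_x, M_w, M_y = 16.900, 49.400, 102.950
SD_x, SD_w, SD_y = 3.007, 2.817, 10.870
r_xy, r_xw, r_wy = .686, .272, .499

# Beta weights (Equation R.10)
b_x = (r_xy - r_wy * r_xw) / (1 - r_xw ** 2)
b_w = (r_wy - r_xy * r_xw) / (1 - r_xw ** 2)

# Unstandardized partial coefficients and intercept (Equations R.9 and R.11)
B_x = b_x * SD_y / SD_x
B_w = b_w * SD_y / SD_w
A_xw = M_y - B_x * M_x - B_w * M_w

# Squared multiple correlation and multiple correlation (Equation R.13)
R2 = b_x * r_xy + b_w * r_wy

print(round(b_x, 3), round(b_w, 3))        # .594 .337
print(round(B_x, 3), round(B_w, 3))        # near the reported 2.147 and 1.302
print(round(A_xw, 3))                      # near the reported 2.340
print(round(R2, 3), round(R2 ** .5, 3))    # .576 .759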
For the data in Table R.1, bX = .594, which says that the difference on Y is expected to be about .60 standard deviations large, given a difference on X of one full standard deviation, while we are controlling for W. The result bW = .337 has the analogous meaning except that X is now statistically controlled. Because all variables have the same metric in the standardized solution, we can directly compare values of bX with bW and correctly infer that the relative predictive power of X is about 1.76 times that of W because the ratio .594/.337 = 1.76. In general, values of b can be directly compared across different predictors within the same sample, but unstandardized coefficients (B) are preferred for comparing results for the same predictor over different samples.

The statistic R²Y·X,W can also be expressed as a function of the beta weights and bivariate correlations of the predictors with the criterion. With two predictors,

R²Y·X,W = bX rXY + bW rWY   (R.13)

The role of beta weights as corrections for predictor overlap is also apparent in this equation. Specifically, if rXW = 0 (the predictors are independent), then bX = rXY and bW = rWY (Equation R.10). This means that R²Y·X,W is just the sum of r²XY and r²WY. But if rXW ≠ 0 (the predictors covary), then bX and bW do not equal the corresponding bivariate correlations and R²Y·X,W is not the simple sum of r²XY and r²WY (it is less). Exercise 4 asks you to verify some of the facts about multiple regression just stated for the data in Table R.1.

Standard regression analyses do not require raw data files. This is because regression equations and values of R² can be calculated from summary statistics (e.g., Equation R.13), and many regression computer procedures read summary statistics as the input data. For example, the SPSS syntax listed next reads the summary statistics in Table R.1 and specifies the regression of Y on X and W. Four-decimal accuracy is recommended for matrix input:

comment table R.1, regress y on x, w.
matrix data variables=x w y/
  contents=mean sd n corr
  /format=lower nodiagonal.
begin data
16.9000 49.4000 102.9500
3.0070 2.8172 10.8699
20 20 20
.2721
.6858 .4991
end data.
regression matrix=in(*)/
  variables=x w y/
  dependent=y
  /enter.

A drawback to conducting regression analyses with summary statistics is that residuals cannot be calculated for individual cases.

Corrections for Bias

The statistic R² is a positively biased estimator of ρ² (rho-squared), the population proportion of explained variance. The degree of bias is greater in smaller samples or when the number of predictors is large relative to the number of cases. For example, if N = 2 in bivariate regression and there are no tied scores on X or Y, then r² must equal 1.0. Now suppose that N = 100 and k = 99, where k is the number of predictor variables. With so many predictors—in fact, the maximum number for N = 100—the value of R² must equal 1.0 because there can be no error variance with so many predictors, and this is true even for random numbers.

There are many corrections that downward adjust R² as a function of N and k. Perhaps the most familiar is Wherry's (1931) equation:

R̂² = 1 – (1 – R²)[(N – 1) / (N – k – 1)]   (R.14)

where R̂² is the shrinkage-corrected estimate of ρ². In small samples it can happen that R̂² < 0; if so, then R̂² is interpreted as though its value were zero. As the sample size increases for a constant number of predictors, values of R̂² and R² are increasingly similar, and in very large samples they are essentially equal; that is, it is unnecessary to correct for positive bias in very large samples. Exercise 5 asks you to apply the Wherry correction to the data in Table R.1.
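A minimal sketch of Wherry's correction in Equation R.14 (plain Python; the function name is mine, not from the chapter). Applied to the Table R.1 analysis with N = 20 and k = 2, it gives the value worked out in the answer to Exercise 5:

def wherry_adjusted_r2(r2, n, k):
    # Shrinkage-corrected estimate of rho-squared (Equation R.14);
    # negative corrected values are interpreted as zero
    adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return max(adj, 0.0)

print(round(wherry_adjusted_r2(.576, n=20, k=2), 3))   # .526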
Assumptions

The statistical and conceptual assumptions of regression are strict, probably more so than many researchers realize. They are summarized next:

1. Regression coefficients reflect unconditional linear relations only. The estimate for BX in Equation R.8 assumes that the linear relation between X and Y remains constant over all levels of (a) X itself, (b) the other measured predictor, W, and (c) all unmeasured predictors. But if the relation between X and Y is appreciably curvilinear or conditional, the value of BX could misrepresent predictive power. A conditional relation implies interaction, where the covariance between X and Y changes over the levels of at least one other predictor, measured or unmeasured. A curvilinear relation of X to Y is also conditional in the sense that the shape of the regression surface changes over the levels of X (e.g., Figure 7.7). How to represent curvilinear or interactive effects in regression analysis and SEM is considered in Chapter 7.

2. All predictors are perfectly reliable (no measurement error). This very strong assumption is necessary because there is no direct way in standard regression analysis to represent or control for less-than-perfect score reliability for the predictors. Consequences of minor violations of this requirement may not be critical, but more serious ones can result in substantial bias. This bias can affect not only the regression weights of predictors measured with error but also those of other predictors. It is difficult to anticipate the direction of this propagation of measurement error. Depending on sample intercorrelations, some absolute regression weights may be biased upward (too large), but others may be biased in the other direction (too small), or attenuation bias. There is no requirement that the criterion be measured without error, but the use of a psychometrically deficient measure of it can reduce the value of R². Note that measurement error in the criterion only affects the standardized regression coefficients, not the unstandardized ones. If the predictors are also measured with error, then these effects for the criterion could be amplified, diminished, or canceled out, but it is best not to hope for the absence of bias; see Williams et al. (2013) for more information about measurement error in regression analysis.

3. Significance tests in regression assume that the residuals are normally distributed and homoscedastic. The homoscedasticity assumption means that the residuals have constant variance across all levels of the predictors. Distributions of residuals can be heteroscedastic (the opposite of homoscedastic) or non-normal due to outliers, severe non-normality in the observed scores, more measurement error at some levels of the criterion or predictors, or a specification error. The residuals should always be inspected in regression analyses (see Cohen, Cohen, West, & Aiken, 2003, chap. 4). Reports of regression analyses without comment on the residuals are inadequate. Exercise 6 asks you to inspect the residuals for the multiple regression analysis of the data in Table R.1. Although there is no requirement in regression for normal distributions of the original scores, values of multiple correlations and absolute partial regression coefficients are reduced if the distributions for a predictor and the criterion have very different shapes, such as very positively skewed on one versus very negatively skewed on the other.

4. There are no causal effects among the predictors (i.e., there is a single equation). Because predictors and criteria are theoretically interchangeable in regression, such analyses can be viewed as strictly predictive. But sometimes the analysis is explicitly or implicitly motivated by causal hypotheses, where a researcher views the regression equation as a prototypical causal model with the predictors as causes and the criterion as their outcome (Cohen et al., 2003). If predictors in standard regression analyses are viewed as causal, then we must assume there are no causal effects among them. Specifically, standard regression analyses do not allow for indirect causal effects where one predictor, such as X, affects another, such as W, which in turn affects the criterion, Y. The indirect effect just described would be represented in SEM by the presumed causal order

X → W → Y

From a regression perspective, (1) variable W is both a predictor (of Y) and an outcome (of X), and (2) there are actually two equations, one for W and another for Y. But standard regression techniques analyze a single equation at a time, in this case for just Y, and thus yield estimates of direct effects only. If there are appreciable indirect effects but such effects are not explicitly represented in the analysis, then estimates of direct effects in standard regression analyses can be very wrong (Achen, 2005). The idea behind this type of bias is elaborated in Chapter 6, which concerns a graph-theoretic approach to causal inference.

5. There is no specification error. A few different kinds of potential mistakes involve specification error. These include the failure to estimate the correct functional form of relations between predictors and the criterion, such as assuming unconditional linear effects only when there are sizable curvilinear or interactive effects. Use of the incorrect estimation method is another kind of error. For example, OLS estimation is for continuous criteria, but dichotomous outcomes (e.g.,
pass–fail) generally require different methods, such as those used in logistic regression. Including predictors that are irrelevant in the population is a specification error. The concern is that an irrelevant predictor could in a particular sample relate to the criterion by sampling error alone, and this chance covariance may distort values of regression coefficients for other predictors. Omitting from the regression equation predictors that (1) account for some unique proportion of criterion variance and (2) covary with measured predictors is left-out variables error, described next.

LEFT-OUT VARIABLES ERROR

Left-out variables error—or more lightheartedly described as the "heartbreak of L.O.V.E." (Mauro, 1990)—is a potentially serious specification error. As covariances between measured (included) and unmeasured (excluded) predictors increase, results based on the included predictors only tend to become progressively more biased. Suppose that rXY = .40 and rWY = .60 for, respectively, predictors X and W. A researcher measures only X and specifies it as the sole predictor of Y in a bivariate regression. In this analysis for the included predictor, the standardized regression coefficient is rXY = .40. But if the researcher had the foresight to also measure W, the omitted predictor, and specify it along with X as predictors in a multiple regression analysis (e.g., Equation R.8), the beta weight for X in this analysis, bX, may not equal .40. If not, then rXY as a standardized regression coefficient with X as the sole predictor does not reflect the true relation of X to Y compared with bX derived with both predictors in the equation.

The difference between rXY and bX varies with rXW, the correlation between the included and omitted predictors. Specifically, if the included and omitted predictors are unrelated (rXW = 0), there is no difference, or rXY = bX = .40 in this example because there is no correction for correlated predictors. Specifically, given

rXY = .40, rWY = .60, and rXW = 0

you can verify, using Equations R.10 and R.13, that the multiple regression results with both predictors are

bX = .40, bW = .60, and R²Y·X,W = .52

So we conclude that rXY = bX = .40 regardless of whether or not W is included in the regression equation, given rXW = 0.

Now suppose that

rXY = .40, rWY = .60, and rXW = .60

Now we assume that the correlation between the included predictor X and the omitted predictor W is .60, not zero. In the bivariate analysis with X as the sole predictor, rXY = .40 (the same as before), but now the results of the multiple regression analysis are

bX = .06, bW = .56, and R²Y·X,W = .36

Here the value of bX is much lower than that of rXY, respectively, .06 versus .40. This happens because coefficient bX controls for rXW = .60, whereas rXY does not; thus, rXY overestimates the relation between X and Y compared with bX.

Omitting a predictor correlated with others in the equation does not always result in overestimation of the predictive power of an included predictor. For example, if X is the included predictor and W is the omitted predictor, it is also possible for the absolute value of rXY in the bivariate analysis to be less than that of bX when both predictors are included in the equation; that is, rXY underestimates the relation indicated by bX. It is also possible for rXY and bX to have different signs. Both cases just mentioned indicate suppression, described in more detail in the next section. But overestimation due to omission of a predictor may occur more often than underestimation (suppression). Also, the pattern of bias may be more complicated when there are several omitted variables (e.g., overestimation for some measured predictors, underestimation for others).

Predictors are typically excluded because they are not measured. This means that it is difficult to actually know by how much and in what direction(s) regression coefficients may be biased relative to what their values would be if all relevant predictors were included. But it is unrealistic to expect the researcher to know and be able to measure all relevant predictors. In this sense, all regression equations are probably misspecified to some degree. If omitted predictors are uncorrelated with included predictors, the consequences of left-out variables error may be slight; otherwise, the consequences may be more serious. Careful review of theory and research is the main way to avoid serious specification error by decreasing the potential number of left-out variables.
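The scenarios above can be verified directly from Equations R.10 and R.13. The following is a minimal sketch in plain Python (the helper name is mine); the last call previews the suppression example discussed in the next section:

def two_predictor_solution(r_xy, r_wy, r_xw):
    # Beta weights and R-squared from bivariate correlations (Equations R.10 and R.13)
    b_x = (r_xy - r_wy * r_xw) / (1 - r_xw ** 2)
    b_w = (r_wy - r_xy * r_xw) / (1 - r_xw ** 2)
    r2 = b_x * r_xy + b_w * r_wy
    return round(b_x, 2), round(b_w, 2), round(r2, 2)

# Omitted predictor W uncorrelated with X: bX equals rXY = .40
print(two_predictor_solution(.40, .60, .00))   # matches bX = .40, bW = .60, R2 = .52

# Omitted predictor W correlated .60 with X: rXY = .40 now overestimates bX
print(two_predictor_solution(.40, .60, .60))   # approximately bX = .06, bW = .56, R2 = .36

# The psychotherapy example in the next section (negative suppression)
print(two_predictor_solution(.19, .49, .70))   # approximately bX = -.30, bW = .70, R2 = .29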
SUPPRESSION

Perhaps the most general definition is that suppression occurs when either (1) the absolute value of a predictor's beta weight is greater than that of its bivariate correlation with the criterion or (2) the two have different signs (see also Shieh, 2006). So defined, suppression implies that the estimated relation between a predictor and a criterion while controlling for other predictors is a "surprise," given the bivariate correlations. Suppose that X is the amount of psychotherapy, W is the degree of depression, and Y is the number of prior suicide attempts. The bivariate correlations in a hypothetical sample are

rXY = .19, rWY = .49, and rXW = .70

Based on these results, it might seem that psychotherapy is harmful because of its positive association with suicide attempts (rXY = .19). When both predictors (psychotherapy and depression) are analyzed in multiple regression, however, the results are

bX = –.30, bW = .70, and R²Y·X,W = .29

The beta weight for psychotherapy (–.30) has the opposite sign of its bivariate correlation (.19), and the beta weight for depression (.70) exceeds its bivariate correlation (.49).

The results just described are due to controlling for other predictors. Here, people who are more depressed are more likely to be in psychotherapy (rXW = .70) and also more likely to try to harm themselves (rWY = .49). Correcting for these associations in multiple regression indicates that the relation of psychotherapy to suicide attempts is actually negative once depression is controlled. It is also true that the relation of depression to suicide is even stronger (here, more positive) once psychotherapy is controlled. Omit either psychotherapy or depression from the analysis—a left-out variables error—and the bivariate results with the remaining predictor are misleading.

The example just described concerns negative suppression, where the predictors have positive bivariate correlations with the criterion and with each other, but one receives a negative beta weight in the multiple regression analysis. A second type is classical suppression, where one predictor is uncorrelated with the criterion but receives a nonzero beta weight controlling for another predictor. For example, given the following correlations in a hypothetical sample,

rXY = 0, rWY = .60, and rXW = .50

the results of a multiple regression analysis are

bX = –.40, bW = .80, and R²Y·X,W = .48

This example of classical suppression (i.e., rXY = 0, bX = –.40) demonstrates that bivariate correlations of zero can mask true predictive relations once other variables are controlled. There is also reciprocal suppression, which can occur when two variables correlate positively with the criterion but negatively with each other. Some cases of suppression can be modeled in SEM as the result of inconsistent direct versus indirect effects of causally prior variables on outcome variables. These possibilities are explored later in the book.

PREDICTOR SELECTION AND ENTRY

An implication of suppression is that predictors should not be selected based on values of bivariate correlations with the criterion. These zero-order associations do not control for other predictors, so their values can be misleading compared with partial regression coefficients for the same variables. For the same reason, whether or not bivariate correlations with the criterion are statistically significant is also irrelevant concerning predictor selection. Although regression computer procedures make it easy to do so, researchers should avoid mindlessly dumping long lists of explanatory variables into regression equations in order to control for their effects (Achen, 2005). The risk is that even small but undetected nonlinearities or indirect effects among predictors can seriously bias partial regression coefficients. It is better to judiciously select the smallest number of predictors—those deemed essential based on extant theory or results of prior empirical studies.

Once selected, there are two basic ways to enter predictors into the equation: One is to enter all predictors at once, or simultaneous (direct) entry. The other is to enter them over a series of steps, or sequential entry. Entry order can be determined according to one of two different standards, theoretical (rational) or empirical (statistical). The rational standard corresponds to hierarchical regression, where you tell the computer a fixed order for entering the predictors. For example, sometimes demographic variables are entered at the first step, and then entered at the second step is a psychological variable of interest. This order not only controls for the demographic variables but also permits
evaluation of the predictive power of the psychological variable, over and beyond that of the simple demographic variables. The latter can be estimated as the increase in the squared multiple correlation, or ΔR², from that of step 1 with demographic predictors only to that of step 2 with all predictors in the equation.
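As a sketch of sequential entry and ΔR² (my illustration, assuming NumPy; this particular two-step model is not one the chapter itself fits), the raw scores in Table R.1 can be entered with W at step 1 and X added at step 2:

import numpy as np

# Raw scores from Table R.1
X = np.array([16, 14, 16, 12, 18, 18, 13, 16, 18, 22,
              18, 19, 16, 16, 22, 12, 20, 14, 21, 17], dtype=float)
W = np.array([48, 47, 45, 45, 46, 46, 47, 48, 49, 49,
              50, 51, 52, 52, 50, 51, 54, 53, 52, 53], dtype=float)
Y = np.array([100, 92, 88, 95, 98, 101, 97, 98, 110, 124,
              102, 115, 92, 102, 104, 85, 118, 105, 111, 122], dtype=float)

def r_squared(predictors, y):
    # Proportion of variance in y explained by an OLS fit to the predictors
    design = np.column_stack([np.ones_like(y)] + predictors)   # intercept + predictors
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coefs
    return 1 - ((y - fitted) ** 2).sum() / ((y - y.mean()) ** 2).sum()

r2_step1 = r_squared([W], Y)       # step 1: background predictor only
r2_step2 = r_squared([W, X], Y)    # step 2: add the focal predictor
print(round(r2_step1, 3), round(r2_step2, 3), round(r2_step2 - r2_step1, 3))
# roughly .249, .576, and .327 for these data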
An example of the statistical standard is stepwise regression, where the computer selects predictors for entry based solely on statistical significance; that is, which predictor, if entered into the equation, would have the smallest p value for the test of its partial regression coefficient? After selection, predictors at a later step can be removed from the equation according to p values (e.g., if p ≥ .05 for a predictor in the equation at a particular step). The stepwise process stops when there could be no statistically significant ΔR² by adding more predictors. Variations on stepwise regression include forward inclusion, where selected predictors are not later removed from the equation, and backward elimination, which begins with all predictors in the equation and then automatically removes them, but such methods are directed by the computer, not you.

Problems of stepwise and related methods are so severe that they are actually banned in some journals (Thompson, 1995), and for good reasons, too. One problem is extreme capitalization on chance. Because every result in these methods is determined by p values in a particular sample, the findings are unlikely to replicate. Another problem is that not all stepwise regression procedures report p values that are corrected for the total number of variables that were considered for inclusion. Consequently, p values in stepwise computer output are generally too low, and absolute values of test statistics are too high; that is, the computer's choices could actually be wrong. Even worse, such methods give the false impression that the researcher does not have to think about predictor selection. Stepwise and related methods are anachronisms in modern data analysis. Said more plainly, death to stepwise regression, think for yourself (e.g., hierarchical entry)—see Whittingham, Stephens, Bradbury, and Freckleton (2006) for more information.

Once a final set of rationally selected predictors has been entered into the equation, they should not be subsequently removed if their regression coefficients are not statistically significant. To paraphrase Loehlin (2004), the researcher should not feel compelled to drop every predictor that is not significant. In smaller samples, the power of significance tests may be low, and removing a nonsignificant predictor can substantially alter the solution. If you had good reason for including a predictor, then it is better to leave it in the equation until replication indicates that the predictor does not appreciably relate to the criterion.

PARTIAL AND PART CORRELATION

The concept of partial correlation concerns the idea of spuriousness: If the observed relation between two variables is wholly due to one or more common cause(s), their association is spurious. Consider these bivariate correlations between vocabulary breadth (Y), foot length (X), and age (W) in a hypothetical sample of elementary school children:

rXY = .50, rWY = .60, and rXW = .80

Although the correlation between foot length X and vocabulary breadth Y is fairly substantial (.50), it is hardly surprising because both are caused by a third variable, age W (i.e., maturation).

The first-order partial correlation rXY·W removes the influence of a third variable W from both X and Y. The formula is

rXY·W = (rXY – rXW rWY) / √[(1 – r²XW)(1 – r²WY)]   (R.15)

Applied to the hypothetical correlations just listed, the partial correlation between foot length and vocabulary breadth controlling for age is rXY·W = .043. (You should verify this result.) Because the association between X and Y disappears when W is controlled, their bivariate relation may be spurious. Presumed spurious associations due to common causes are readily represented in SEM.

Equation R.15 for partial correlation can be extended to control for two or more external variables. For example, the second-order partial correlation rXY·WZ estimates the association between X and Y controlling for both W and Z. There is a related coefficient called part correlation or semipartial correlation that controls for external variables out of either of two other variables, but not both. The formula for the first-order part correlation rY(X·W), for which the association between X and W is controlled but not the association between Y and W, is
rY(X·W) = (rXY – rWY rXW) / √(1 – r²XW)   (R.16)

Given the same bivariate correlations among these three variables reported earlier, the part correlation between vocabulary breadth (Y) and foot length (X) controlling only foot length for age (W) is rY(X·W) = .033. This result (.033) is somewhat smaller than the partial correlation for these data, or rXY·W = .043. In general, rXY·W ≥ rY(X·W); if rXW = 0, then rXY·W = rY(X·W).

Relations among the squares of the various correlations just described can be illustrated with a Venn-type diagram like the one in Figure R.3. The circles represent total standardized variances of the criterion Y and predictors X and W. The regions in the figure labeled a–d make up the total standardized variance of Y, so

a + b + c + d = 1.0

Areas a and b represent the proportions of variance in Y uniquely explained by, respectively, X and W, but area c represents the simultaneous overlap (redundancy) of the predictors with the criterion.¹ Area d represents the proportion of unexplained variance. The squared bivariate correlations of the predictors with the criterion and the overall squared multiple correlation can be expressed as sums of the areas a, b, c, or d in Figure R.3, as follows:

r²XY = a + c and r²WY = b + c

R²Y·X,W = a + b + c = 1.0 – d

The squared part correlations match up directly with the unique areas a and b in Figure R.3. Each of these areas also equals the increase in the total proportion of explained variance that occurs by adding a second predictor to the equation (i.e., ΔR²); that is,

r²Y(X·W) = a = R²Y·X,W – r²WY   (R.17)
r²Y(W·X) = b = R²Y·X,W – r²XY

The squared partial correlations correspond to areas a, b, and d in Figure R.3, and each estimates the proportion of variance in the criterion explained by one predictor but not the other. The formulas are

r²XY·W = a / (a + d) = (R²Y·X,W – r²WY) / (1 – r²WY)   (R.18)
r²WY·X = b / (b + d) = (R²Y·X,W – r²XY) / (1 – r²XY)

¹ Note that interpretation of the area c in Figure R.3 as a proportion of variance generally holds when all bivariate correlations are positive and there is no suppression. Otherwise, the value of c can be negative, but there is no such thing as a negative proportion of variance.

FIGURE R.3. Venn diagram for the standardized variances of predictors X and W and criterion Y.
For the data in Table R.1, r²Y(X·W) = .327 and r²XY·W = .435. In words, predictor X uniquely explains .327, or 32.7%, of the total variance of Y (squared part correlation). Of the variance in Y not already explained by W, predictor X accounts for .435, or 43.5%, of the remaining variance (squared partial correlation). Exercise 7 asks you to calculate and interpret the corresponding results for the other predictor, W, and the same data.
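Here is a minimal sketch in plain Python of Equations R.15 and R.16 (the function names are mine), applied to the foot-length example and to the Table R.1 correlations discussed just above:

from math import sqrt

def partial_r(r_xy, r_xw, r_wy):
    # First-order partial correlation r_XY.W (Equation R.15)
    return (r_xy - r_xw * r_wy) / sqrt((1 - r_xw ** 2) * (1 - r_wy ** 2))

def part_r(r_xy, r_xw, r_wy):
    # First-order part (semipartial) correlation r_Y(X.W) (Equation R.16)
    return (r_xy - r_wy * r_xw) / sqrt(1 - r_xw ** 2)

# Foot length (X), age (W), vocabulary breadth (Y)
print(round(partial_r(.50, .80, .60), 3), round(part_r(.50, .80, .60), 3))
# about .042 (reported as .043 above) and .033

# Table R.1: association of X with Y controlling for W, squared values
print(round(partial_r(.686, .272, .499) ** 2, 3), round(part_r(.686, .272, .499) ** 2, 3))
# about .435 and .327, matching the values just reported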
When predictors are correlated—which is just about always—beta weights, partial correlations, and part correlations are alternative ways to describe in standardized terms the relative explanatory power of each predictor controlling for the rest. None is more "correct" than the others because each gives a different perspective on the same data. Note that unstandardized regression coefficients (B) are preferred when comparing results for the same predictors and criterion across different samples.

OBSERVED VERSUS ESTIMATED CORRELATIONS

The Pearson correlation estimates the degree of linear association between two continuous variables. Its equation is

rXY = covXY / (SDX SDY) = (Σ zX zY) / df   (R.19)

where df = N – 1. Rodgers and Nicewander (1988) described a total of 11 other formulas, each of which represents a different conceptual or computational definition of r, but all of which yield the same result for the same data.
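A quick check of Equation R.19 with the Table R.1 scores (a sketch assuming NumPy); the covariance form and the z-score form give the same correlation:

import numpy as np

X = np.array([16, 14, 16, 12, 18, 18, 13, 16, 18, 22,
              18, 19, 16, 16, 22, 12, 20, 14, 21, 17], dtype=float)
Y = np.array([100, 92, 88, 95, 98, 101, 97, 98, 110, 124,
              102, 115, 92, 102, 104, 85, 118, 105, 111, 122], dtype=float)

df = len(X) - 1
cov_xy = ((X - X.mean()) * (Y - Y.mean())).sum() / df
z_x = (X - X.mean()) / X.std(ddof=1)
z_y = (Y - Y.mean()) / Y.std(ddof=1)

print(round(cov_xy / (X.std(ddof=1) * Y.std(ddof=1)), 3))   # .686
print(round((z_x * z_y).sum() / df, 3))                     # .686, the same value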
A continuous variable is one for which, theoretically, any value is possible within the limits of its score range. This includes values with decimals, such as 3.75 seconds or 13.60 kilograms. In practice, variables with a range of at least 15 points or so are usually considered as continuous even if their scores are discrete, or integers only (e.g., scores of 10, 11, 12, etc.). For example, the PRELIS program of LISREL—used for data preparation—automatically classifies a variable with less than 16 levels as ordinal.

The statistic r has a theoretical maximum absolute value of 1.0. But the practical upper limit for |r| is < 1.0 if the relation between X and Y is not unconditionally linear, there is measurement error in either X or Y, or distributions for X versus Y have different shapes. The amount of variation in samples (i.e., SDX and SDY in Equation R.19) also affects the value of r. In general, restriction of range on either X or Y through sampling or case selection (e.g., only cases with higher scores on X are studied) tends to reduce values of |r|, but not always (see Huck, 1992). The presence of outliers, or extreme scores, can also distort the value of r; see Goodwin and Leech (2006) for more information.

There are other forms of the Pearson correlation for observed variables that are either natural dichotomies, such as male versus female for chromosomal sex, or ordinal (ranks). For example:

1. The point-biserial correlation (rpb) estimates the association between a dichotomy and a continuous variable (e.g., treatment vs. control, weight).

2. The phi coefficient (ϕ̂) is for two dichotomies (e.g., treatment vs. control, survived vs. died).

3. Spearman's rank order correlation or Spearman's rho (ρ̂) is for two ranked variables (e.g., finish order in a race, rank by amount of training time).

Computational formulas for all these special forms are just rearrangements of Equation R.19 for r (e.g., Kline, 2013a, pp. 138, 166).

All forms of the Pearson correlation estimate associations between observed (measured) variables. Other, non-Pearson correlations assume that the underlying, or latent, variables are continuous and normally distributed. For example:

1. The biserial correlation (rbis) is for a naturally continuous variable, such as weight, and a dichotomy, such as recovered–not recovered, that theoretically represents a dichotomized continuous latent variable. For example, presumably degrees of recovery were collapsed when the observed dichotomy was created. The value of rbis estimates what the Pearson r would be if the dichotomized variable were continuous and normally distributed.

2. The polyserial correlation is the generalization of
rbis that does basically the same thing for a naturally continuous variable and a theoretically continuous-but-polytomized variable (i.e., categorized into three or more levels). Likert-type response scales for survey or questionnaire items, such as agree, undecided, or disagree, are examples of a polytomized response continuum about the degree of agreement.

3. The tetrachoric correlation (rtet) for two dichotomized variables estimates what r would be if both measured variables were continuous and normally distributed.

4. The polychoric coefficient is the generalization of the tetrachoric correlation that estimates r but for ordinal observed variables with two or more levels.

Computing polyserial or polychoric correlations is relatively complicated and requires special software, such as PRELIS in LISREL. These programs generally use a special form of maximum likelihood estimation that assumes normality of the latent continuous variables, and error variance tends to increase rapidly as the number of categories on the observed variables decreases from about five to two; that is, dichotomized continuous variables generate the greatest imprecision.

The PRELIS program can also analyze censored variables, for which values occur outside of the range of measurement. Suppose that a scale registers values of weight between 1 and 300 pounds only. For objects that weigh either less than 1 pound or more than 300 pounds, the scale tells us only that the measured weight is, respectively, at most 1 pound or at least 300 pounds. In this example, the hypothetical scale is both left censored and right censored because the values less than 1 or more than 300 are not registered on the scale. There are other possibilities for censoring, but scores on censored variables are either exactly known (e.g., weight = 250) or partially known in that they fall within an interval (e.g., weight ≥ 300). The technique of censored regression, better known in economics than in the behavioral sciences, analyzes censored outcomes.

In SEM, Pearson correlations are normally analyzed as part of analyzing covariances when outcome variables are continuous. But noncontinuous outcome variables can be analyzed in SEM, too. One option is to calculate polyserial or polychoric correlations from the raw data and then fit the model to these predicted Pearson correlations. Special methods for analyzing noncontinuous variables in SEM are considered later in Chapters 17 and 18.

In both regression and SEM, it is generally a bad idea to categorize predictors or outcomes that are continuous in order to form pseudo-groups (e.g., "low" vs. "high" based on a mean split). Categorization not only discards numerical information about individual differences in the original distribution but it also tends to reduce absolute values of sample correlations when population distributions are normal. The degree of this reduction is greater as the cutting point moves further away from the mean. But if population correlations are low and the sample size is small, then categorization can actually increase absolute sample correlations. Categorization can also create artifactual main or interactive effects, especially when cutting points are arbitrary. In general, it is better to analyze continuous variables as they are and without categorizing them—see Royston, Altman, and Sauerbrei (2006) for more information.
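The attenuation produced by categorization can be seen in a small simulation (a sketch assuming NumPy; the population correlation of .50 and the mean split are my choices, not values from the chapter):

import numpy as np

rng = np.random.default_rng(1)
rho = .50
n = 100_000

# Bivariate normal scores with population correlation rho
cov = [[1.0, rho], [rho, 1.0]]
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

x_split = (x >= x.mean()).astype(float)   # "low" vs. "high" pseudo-groups

print(round(np.corrcoef(x, y)[0, 1], 2))         # close to .50
print(round(np.corrcoef(x_split, y)[0, 1], 2))   # noticeably smaller, about .40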
LOGISTIC REGRESSION AND PROBIT REGRESSION

Some options to analyze dichotomous outcomes in SEM are based on logistic regression. Just as in standard multiple regression, the predictors in logistic regression can be either continuous or categorical. But the prediction equation in logistic regression is a logistic function, or a sigmoid function with an "S" shape. It is a type of link function, or a transformation that relates the observed outcomes to the predicted outcomes in a regression analysis. Each method of regression has its own special kind of link function. In standard multiple regression with continuous variables, the link function is the identity link, which says that observed scores on the criterion Y are in the same units as Ŷ, the predicted scores (e.g., Figure R.1). For noncontinuous outcomes, though, original and predicted scores are in different metrics. This is also true in logistic regression, where the link function is the logit link as explained next.

Suppose that a total of 32 patients with the same disorder are administered a daily treatment for a varying number of days (5–60). After treatment, the patients are rated as recovered (1) or not recovered (0). Presented in Table R.2 are the hypothetical raw data for this example. I used Statgraphics Centurion (Statgraphics Technologies, 1982–2022)² to plot the logistic function with

² https://www.statgraphics.com/centurion-overview
TABLE R.2. Example Data Set for Logistic Regression and Probit Regression

Status                 n    Number of days in treatment (X)
Not recovered (Y = 0)  16   6, 7, 9, 10, 11, 13, 15, 16, 18, 19, 23, 25, 26, 28, 30, 32
Recovered (Y = 1)      16   27, 30, 33, 35, 36, 39, 41, 42, 44, 46, 47, 49, 51, 53, 55, 56

95% confidence limits for these data that is presented in Figure R.4. This function generates π̂, the predicted probability of recovery, given the number of days treated, X. The confidence limits for these predictions are so wide because the sample size is small (see the figure). Because predicted probabilities are estimated from the data, they correspond to a latent continuous variable, and in this sense logistic regression (and probit regression, too) can be seen as a latent variable technique.

The estimation method in logistic regression is not OLS. Instead, it is usually a form of maximum likelihood estimation that is applied after transforming the dichotomous outcome variable into a logit, which is the natural logarithm (i.e., natural base e, or about 2.7183) of the odds of the target outcome, ω̂. The quantity ω̂ is the ratio of the probability for the target event, such as recovered, over the probability for the other event, such as not recovered. Suppose that 60% of patients recover after treatment, but the rest, or 40%, do not recover, or

π̂ = .60 and 1 – π̂ = .40

The odds of recovery are thus ω̂ = .60/.40, or 1.50; that is, the odds are 3:2 in favor of recovery. Odds are converted back to probabilities by dividing the odds by 1.0 plus the odds. For example, ω̂ = 1.50, so π̂ = 1.50/2.50 = .60, which is the probability of recovery.

Coefficients for predictors in logistic regression are calculated by the computer in a log metric, but each coefficient can be converted to an odds ratio, which

[Figure R.4 plots the logistic and probit curves for the predicted probability of recovery (π̂) against days in treatment (X), each with 95% confidence limits.]

FIGURE R.4. Predicted probability of recovery with 95% confidence limits for the data in Table R.2.
estimates the difference in the odds of the target outcome, given a 1-point increase in the predictor, controlling for all other predictors. I submitted the data in Table R.2 to the Logistic Regression procedure in Statgraphics Centurion. The prediction equation in a log metric is

logit(π̂) = ln[π̂ / (1 – π̂)] = ln(ω̂) = .455X – 13.701

where .455 is the coefficient for the predictor X, number of treatment days, and –13.701 is the intercept. Taking the antilogarithm of the coefficient for days in treatment, or

ln⁻¹(.455) = e^.455 = 1.576

gives us the odds ratio, or 1.576. This result says that for each additional day of treatment, the odds for recovery increase by 57.6%. But this rate of increase is not linear; instead, the rate at which a logistic curve ascends or descends changes according to values of the predictor. For these data, the greatest rate of change in predicted recovery occurs between 30 and 40 days of treatment. But at the extremes (X < 30 or X > 40), the rate of change in the probability of recovery is much less—see Figure R.4. The inverse logit function presented next generates the logistic curve plotted in the figure:

π̂ = logit⁻¹(.455X – 13.701) = e^(.455X – 13.701) / [1 + e^(.455X – 13.701)]

An alternative method is probit regression, which analyzes binary outcomes in terms of a probit function, where probit stands for "probability unit." Likewise, the link function in probit regression is the probit link. A probit model assumes that the observed dichotomy Y = 1 for the target outcome versus Y = 0 for other events is determined by a normal continuous latent variable Y* with a mean of zero and variance of 1.0 such that

Y = 1 if Y* ≥ 0; Y = 0 if Y* < 0   (R.20)

The equation in probit regression generates Ŷ* in the metric of normal deviates (z scores). Next, the computer uses the equation for the cumulative distribution function of the normal curve (Φ) to calculate predicted probabilities of the target outcome π̂ from values of Ŷ* for each case:

π̂ = Φ(Ŷ*)   (R.21)

Equation R.21 is known as the normal ogive model.³

I analyzed the data in Table R.2 using the Probit Analysis procedure in Statgraphics Centurion. The prediction equation is

Ŷ* = .268X – 8.072

The coefficient for X, .268, estimates in standard deviation units the amount of change in recovery, given a one-day increase in treatment. That is, the z-score for recovery increases by .268 for each additional day of treatment. Again, this rate of change is not constant because the overall relation is nonlinear (Figure R.4). Predicted probabilities of recovery for this example are generated by the probit function

π̂ = Φ(.268X – 8.072)

The 95% confidence limits for the probit function are somewhat different than those for the logistic function for the data in Table R.2—see Figure R.4.

Logistic regression and probit regression applied in the same large samples tend to give similar results but in different metrics for the coefficients. The scaling factor that converts results from the logistic model to the same metric as the normal ogive (probit) model is approximately 1.7. For example, the ratio of the coefficients for the predictor in, respectively, the logistic and probit analyses of the data in Table R.2 is .455/.268 = 1.698, or 1.7 at single-decimal accuracy. The two procedures may generate appreciably different results if there are many cases at the extremes (predicted probabilities are close to either 0 or 1.0) or if the sample is small. Probit regression is more computationally intensive than logistic regression, but this difference is relatively unimportant for modern microcomputers with fast processors and ample memory. It can happen that computer procedures for probit regression may fail to generate a solution in smaller samples. Agresti (2019) describes additional techniques for categorical data.

³ You can see the equation for Φ at https://en.wikipedia.org/wiki/Normal_distribution
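Both prediction equations reported above for the Table R.2 analysis can be evaluated with a few lines of plain Python (a sketch; the coefficients are simply those reported from the Statgraphics runs, not re-estimated here):

from math import exp, erf, sqrt

def inv_logit(v):
    # Inverse logit: converts a log-odds value to a probability
    return 1.0 / (1.0 + exp(-v))

def normal_cdf(z):
    # Cumulative standard normal distribution, Phi(z)
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

print(round(exp(.455), 3))   # odds ratio for one more day of treatment, about 1.576

for days in (20, 30, 35, 40, 50):
    p_logistic = inv_logit(.455 * days - 13.701)   # logistic prediction equation
    p_probit = normal_cdf(.268 * days - 8.072)     # probit prediction equation
    print(days, round(p_logistic, 3), round(p_probit, 3))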
SUMMARY

You should know about regression analysis before learning the basics of SEM. For both sets of techniques, the results are affected not only by what is measured (i.e., the data) but also by what is not measured, especially if omitted predictors covary with included predictors, which is a specification error. Accordingly, you should carefully select predictors after review of theory and results of prior studies in the area. In regression, those predictors should have adequate psychometric characteristics because there is no allowance for measurement error. The same restriction does not apply in SEM, but use of grossly inadequate measures in SEM can seriously bias the results, too. When selecting predictors, the role of judgment should be greater than that of significance testing, which can greatly capitalize on sample-specific variation.

LEARN MORE

The book by Cohen, Cohen, West, and Aiken (2003) is considered by many as a kind of "bible" for multiple regression. Royston, Altman, and Sauerbrei (2006) explain why categorizing predictor or outcome variables is a bad idea. Shieh (2006) describes suppression in more detail.

Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). New York: Routledge.

Royston, P., Altman, D. G., & Sauerbrei, W. (2006). Dichotomizing continuous predictors in multiple regression: A bad idea. Statistics in Medicine, 25, 127–141.

Shieh, G. (2006). Suppression situations in multiple linear regression. Educational and Psychological Measurement, 66, 435–447.

EXERCISES

All questions concern the data in Table R.1.

1. Calculate the unstandardized regression equation for predicting Y from X based on the descriptive statistics.

2. Show that centering scores on X does not change the value of the unstandardized regression coefficient for predicting Y but does affect the value of the intercept.

3. Show that s²Y = s²Ŷ + s²(Y–Ŷ) and r²XY = s²Ŷ / s²Y when X is the only predictor of Y.

4. Calculate the unstandardized regression equation and the standardized regression equation for predicting Y from both X and W. Also calculate R²Y·X,W.

5. Calculate R̂²Y·X,W.

6. Construct a histogram of the residuals for the regression of Y on both X and W.

7. Compute and interpret r²WY·X and r²Y(X·W).

ANSWERS

1. Given the descriptive statistics and with slight rounding error:

   BX = .686 (10.870/3.007) = 2.479
   AX = 102.950 – 2.479 (16.900) = 61.054

2. Given MX = 16.900, the mean-centered scores (x) are

   –.90, –2.90, –.90, –4.90, 1.10, 1.10, –3.90, –.90, 1.10, 5.10,
   1.10, 2.10, –.90, –.90, 5.10, –4.90, 3.10, –2.90, 4.10, .10

   and Mx = 0, SDx = 3.007, rxY = .686, so with slight rounding error

   Bx = .686 (10.870/3.007) = 2.479
   Ax = 102.950 – 2.479 (0) = 102.950

3. Given Ŷ = 2.479X + 61.054, the predicted scores Ŷ are

   100.719, 95.761, 100.719, 90.803, 105.677, 105.677, 93.282, 100.719, 105.677, 115.593,
   105.677, 108.156, 100.719, 100.719, 115.593, 90.803, 110.635, 95.761, 113.114, 103.198

   and the residual scores Y – Ŷ are

   –.719, –3.761, –12.719, 4.197, –7.677, –4.677, 3.718, –2.719, 4.323, 8.407,
   –3.677, 6.844, –8.719, 1.281, –11.593, –5.803, 7.365, 9.239, –2.114, 18.802

   With slight rounding error,

   s²Y = s²Ŷ + s²(Y–Ŷ) = 55.570 + 62.586 = 118.155
   r²XY = s²Ŷ / s²Y = 55.570/118.155 = .470, so rXY = .686

4. Given the descriptive statistics and with slight rounding error:

   bX = [.686 – .499 (.272)] / (1 – .272²) = .594 and BX = .594 (10.870/3.007) = 2.147
   bW = [.499 – .686 (.272)] / (1 – .272²) = .337 and BW = .337 (10.870/2.817) = 1.302
   AX,W = 102.950 – 2.147 (16.900) – 1.302 (49.400) = 2.340

   so the unstandardized equation is Ŷ = 2.147X + 1.302W + 2.340 and the standardized equation is ẑY = .594 zX + .337 zW. Also,

   R²Y·X,W = .594 (.686) + .337 (.499) = .576

5. For N = 20, k = 2, and R²Y·X,W = .576:

   R̂²Y·X,W = 1 – (1 – .576) [(20 – 1)/(20 – 2 – 1)] = .526

6. Presented next is the distribution of standardized residuals for the regression of Y on both X and W generated in SPSS with a superimposed normal curve:

   [Histogram of the standardized residuals, ranging from about –2.0 to 2.0, with a superimposed normal curve appears here.]

7. For rXY = .686, rWY = .499, rXW = .272, and R²Y·X,W = .576 with slight rounding error:

   r²Y(W·X) = .576 – .686² = [.499 – .686 (.272)]² / (1 – .272²) = .105
   r²WY·X = (.576 – .686²) / (1 – .686²) = [.499 – .686 (.272)]² / [(1 – .272²)(1 – .686²)] = .199

   Respectively, variable W uniquely explains about 10.5% of the total variance in Y, and of variance in Y not already explained by X, predictor W accounts for about 19.9% of the rest.
2.340

REFERENCES
Achen, C. H. (2005). Let's put garbage-can regressions and garbage-can probits where they belong. Conflict Management and Peace Science, 22(4), 327–339.

Agresti, A. (2019). An introduction to categorical data analysis (3rd ed.). Wiley.

Bollen, K. A. (1989). Structural equations with latent variables. Wiley.

Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Routledge.

Goodwin, L. D., & Leech, N. L. (2006). Understanding correlation: Factors that affect the size of r. Journal of Experimental Education, 74(3), 251–266.

Huck, S. W. (1992). Group heterogeneity and Pearson's r. Educational and Psychological Measurement, 52(2), 253–260.

Kenny, D. A., & Milan, S. (2012). Identification: A nontechnical discussion of a technical issue. In R. H. Hoyle (Ed.), Handbook of structural equation modeling (pp. 145–163). Guilford Press.

Kline, R. B. (2013). Beyond significance testing: Statistics reform in the behavioral sciences (2nd ed.). American Psychological Association.

Loehlin, J. C. (2004). Latent variable models: An introduction to factor, path, and structural equation analysis (4th ed.). Erlbaum.

Mauro, R. (1990). Understanding L.O.V.E. (left out variables error): A method for estimating the effects of omitted variables. Psychological Bulletin, 108(2), 314–329.

Rodgers, J. L., & Nicewander, W. A. (1988). Thirteen ways to look at the correlation coefficient. American Statistician, 42(1), 59–66.

Royston, P., Altman, D. G., & Sauerbrei, W. (2006). Dichotomizing continuous predictors in multiple regression: A bad idea. Statistics in Medicine, 25(1), 127–141.

Shieh, G. (2006). Suppression situations in multiple linear regression. Educational and Psychological Measurement, 66(3), 435–447.

Statgraphics Technologies, Inc. (1982–2013). Statgraphics Centurion (Version 19.4.01) [Computer software]. https://www.statgraphics.com/

Thompson, B. (1995). Stepwise regression and stepwise discriminant analysis need not apply here: A guidelines editorial. Educational and Psychological Measurement, 55(4), 525–534.

Wherry, R. J. (1931). A new formula for predicting the shrinkage of the coefficient of multiple correlation. Annals of Mathematical Statistics, 2(4), 440–451.

Whittingham, M. J., Stephens, P. A., Bradbury, R. B., & Freckleton, R. P. (2006). Why do we still use stepwise modelling in ecology and behaviour? Journal of Animal Ecology, 75(5), 1182–1189.

Williams, M. N., Grajales, C. A. G., & Kurkiewicz, D. (2013). Assumptions of multiple regression: Correcting two misconceptions. Practical Assessment, Research, and Evaluation, 18, Article 11.
