Logistic Reg in SPSS PDF
Jörg Matthes
University of Zurich, Zurich, Switzerland
Researchers often hypothesize moderated effects, in which the effect of an independent variable on an outcome
variable depends on the value of a moderator variable. Such an effect reveals itself statistically as an interaction
between the independent and moderator variables in a model of the outcome variable. When an interaction is
found, it is important to probe the interaction, for theories and hypotheses often predict not just interaction but a
specific pattern of effects of the focal independent variable as a function of the moderator. This article describes
the familiar pick-a-point approach and the much less familiar Johnson–Neyman technique for probing interactions
in linear models and introduces macros for SPSS and SAS to simplify the computations and facilitate the
probing of interactions in ordinary least squares and logistic regression. A script version of the SPSS macro is
also available for users who prefer a point-and-click user interface rather than command syntax.
Behavioral science researchers long ago moved beyond the business of theorizing about and testing simple bivariate cause and effect relationships, since few believe that any effects are independent of situational, contextual, or individual-difference factors. Furthermore, we understand some variable’s effect on another better when we understand what limits or enhances this relationship, or the boundary conditions of the effect: for whom or under what circumstances the effect exists and where and for whom it does not. Theoretical accounts of an effect can be tested and often are strengthened by the discovery of moderators of that effect. So testing for moderation of effects, also called interaction, is of fundamental importance to the behavioral sciences.

A moderated effect of some focal variable F on outcome variable Y is one in which its size or direction depends on the value of a third, moderator variable M. Analytically, moderated effects reveal themselves statistically as an interaction between F and M in a mathematical model of Y. In statistical models such as ordinary least squares (OLS) regression or logistic regression, moderation effects frequently are tested by including the product of the focal independent variable and the moderator as an additional predictor in the model. When an interaction is found, it should be probed in order to better understand the conditions (i.e., the values of the moderator) under which the relationship between the focal predictor and the outcome is strong versus weak, positive versus negative, and so forth.

One approach for probing interactions that we have seen used in the literature is the subgroup analysis or separate regressions approach, where the data file is split into various subsets defined by values of the moderator and the analysis is repeated on these subgroups. But this method does not properly represent how the focal predictor variable’s effect varies as a function of the moderator, especially when additional variables in the model are used as statistical controls. For details about the problems with this method (a method we do not recommend), see Newsom, Prigerson, Schulz, and Reynolds (2003) and Stone-Romero and Anderson (1994).

Fortunately, there are more rigorous and appropriate methods for probing interactions in linear models, two of which we will describe in this article. The first method we discuss, the pick-a-point approach, is one of the more commonly used. This approach involves selecting representative values (e.g., high, moderate, and low) of the moderator variable and then estimating the effect of the focal predictor at those values (see, e.g., Aiken & West, 1991; Cohen, Cohen, West, & Aiken, 2003; Jaccard & Turrisi, 2003). A difficulty with this approach is that, frequently, there are no nonarbitrary guidelines for picking the points at which to probe the interaction. An alternative is the Johnson–Neyman (J–N) technique (Johnson & Fay, 1950; Johnson & Neyman, 1936; Potthoff, 1964), which identifies regions in the range of the moderator variable
A. F. Hayes, [email protected]
where the effect of the focal predictor on the outcome is statistically significant and not significant. Although this method has been around for decades, it is rarely used, to our knowledge, probably due to a lack of researchers’ familiarity with the method and its lack of implementation in popular data analysis programs. Here, we will describe this method as applied to OLS and logistic regression and provide a way to easily implement the J–N technique, as well as the pick-a-point approach, in SPSS and SAS in the form of a macro that adds a new command, MODPROBE, to these two languages.1 Since an appreciation of the methods requires an understanding of how to interpret the regression coefficients in a linear model, we will start with a brief overview of some basic principles.

Testing for Interaction in a Linear Model: Fundamental Principles

In a linear model, a set of k predictor variables is used to model some outcome variable Y:

\[ \hat{Y} = a + b_1 F + b_2 M + \sum_{i=3}^{k} b_i W_i, \tag{1} \]

where Ŷ is the estimate of the outcome variable Y, F and M are the focal and moderator variables, respectively, in the discussion that follows below, and W is one or more additional predictor variables that are in the model for the purpose of statistical control. When Y is a continuous variable, the ordinary least squares criterion is typically used to derive weights for each of the k predictor variables (the k values of b in Equation 1) that produce a linear combination of the predictors that minimizes the sum of the squared differences between Y and Ŷ across all n cases in the analysis. In models of a binary outcome, which we will consider later, an iterative maximum-likelihood-based method is typically used to estimate weights that produce the best-fitting model of the probability of an arbitrarily defined event, such as whether a person responds yes to a question rather than no, or is a member of one group rather than another.

In OLS, b1 in Equation 1 estimates the expected difference in Y between two cases that differ by a single unit on F but are equal on M and all W variables. This model cannot be used to test whether M moderates the effect of focal predictor F, for it constrains the effect of F to be independent of the values of M, which is exactly the opposite of what a moderation hypothesis proposes. To test whether the effect of the focal predictor variable differs systematically as a function of a proposed moderator variable, this mathematical constraint must be eliminated. A widely used approach is to estimate Equation 1 with the addition of the arithmetic product of the focal predictor variable and the moderator variable (F × M) to the model:

\[ \hat{Y} = a + b_1 F + b_2 M + b_3 (F \times M) + \sum_{i=4}^{k+1} b_i W_i. \tag{2} \]

In this model, b3 estimates how the effect of F on Y changes as M changes by one unit, holding constant all k − 2 of the remaining variables (W) in the model. Questions about interaction usually focus on the size and significance of b3 in models such as Equation 2. By rearranging the terms in Equation 2, it can more easily be seen that the inclusion of the product of F and M makes F’s effect on Y a function of M:

\[ \hat{Y} = a + (b_1 + b_3 M) F + b_2 M + \sum_{i=4}^{k+1} b_i W_i. \tag{3} \]

Observe from Equation 3 how the expected difference in Y as a function of differences in F, quantified now as b1 + b3M, clearly depends on M. Typically, b3 will be some value other than zero when the coefficients are estimated using available data. A hypothesis test will allow the analyst to decide whether b3 is sufficiently different from zero to warrant the inclusion of the interaction term in the model.

Although it is tempting to interpret the coefficients and hypothesis tests for F and M (i.e., b1 and b2) in Equation 2 as main effects, such an interpretation is not generally justified or correct, except under limited conditions (see, e.g., Hayes, 2005, pp. 452–456; Irwin & McClelland, 2001; Jaccard & Turrisi, 2003, p. 24). These are conditional effects, not main effects as they are understood in the ANOVA. In Equation 2, b1 is interpreted as the expected difference in Y between two cases that differ by one unit in F but are at zero on M, with all W variables being held constant. Similarly, b2 is the expected difference in Y between two cases that differ by one unit on M but are at zero on F, with all the W variables being held constant. The t and p values for these coefficients are used to test the null hypothesis that these conditional effects are equal to zero. Although there is some interpretative value to centering M and F prior to calculating their product, which renders these effects as conditional at the other variable being at the sample mean rather than zero, the need to center predictor variables is more one of choice or interpretational convenience than one of necessity (see, e.g., Cronbach, 1987; Jaccard & Turrisi, 2003, pp. 27–28; Kromrey & Foster-Johnson, 1998). In all the discussions below, we will not mean center the predictors.2

In a model such as Equation 2, the interaction between F and M is quantified with a single regression coefficient. Thus, it is sometimes called a single-degree-of-freedom (df) interaction, because it requires only one df to estimate it. Interactions between two dichotomous predictors, between a dichotomous and a quantitative predictor, or between two quantitative predictors are all single-df interactions. In the remainder of this article, we will discuss how to deconstruct and interpret interactions of this sort by focusing the analysis on examining how the effect of the focal predictor varies depending on the moderator variable, starting first with an interaction between a dichotomous and a quantitative predictor. We will review the mathematics and describe macros that we have written for SPSS and SAS to simplify the computations in OLS regression. With the procedures described in that context, we then will extend these methods to interaction between two quantitative predictor variables and show how the same computational macros we have written can be used for this problem. In the discussion that follows, we will focus on OLS regression models. At the end, we will note the application of these principles to logistic regression and describe how our macros handle binary outcomes.
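The rearrangement in Equation 3 can be checked numerically. The following is a minimal Python sketch, not part of the macros: the coefficients are made-up values chosen only for illustration, the W covariates are omitted for brevity, and the function names are ours. It shows that the difference in Ŷ between two cases one unit apart on F equals b1 + b3M.

```python
# Hypothetical coefficients for Equation 2 (illustration only, not estimates
# from any data set discussed in this article).
A, B1, B2, B3 = 1.0, 0.5, -0.2, 0.3


def predict(f: float, m: float) -> float:
    """Y-hat from Equation 2 without covariates: a + b1*F + b2*M + b3*(F*M)."""
    return A + B1 * f + B2 * m + B3 * f * m


def conditional_effect(m: float) -> float:
    """Effect of a one-unit difference in F at moderator value m (Equation 3)."""
    return B1 + B3 * m


for m in (0.0, 1.0, 2.0):
    diff = predict(1.0, m) - predict(0.0, m)
    print(f"M = {m}: difference in Y-hat = {diff:.2f}, "
          f"b1 + b3*M = {conditional_effect(m):.2f}")
```

At M = 0 the two quantities both reduce to b1, which is why b1 in Equation 2 is a conditional effect rather than a main effect.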
equals some value θ, which we will denote b1 | (M = θ), can be derived (from Equation 3 above) as

\[ b_1 \mid (M = \theta) = b_1 + b_3\theta, \tag{5} \]

with standard error equal to

\[ s_{b_1 \mid M = \theta} = \sqrt{s^2_{b_1} + 2\theta s_{b_1 b_3} + \theta^2 s^2_{b_3}} \tag{6} \]

(see, e.g., Cohen et al., 2003, p. 273), where $s^2_{b_1}$ and $s^2_{b_3}$ are the variances (i.e., squared standard errors) of b1 and b3, respectively, and $s_{b_1 b_3}$ is the covariance between b1 and b3 (obtained as optional output by most statistical analysis programs). For this analysis, $s^2_{b_1}$ = 0.0153, $s^2_{b_3}$ = 0.0018, and $s_{b_1 b_3}$ = −0.0044. Although we do not recommend hand computation and, instead, advocate the use of the macros we will describe later, we will step through this one example by hand to illustrate the computations.

Suppose we want to know the expected difference in competence ratings between the experimental conditions among those who are “average” in their political conservatism, using the sample mean as our definition of “average.” In these data, the sample mean of M is 2.4435, so 2.4435 is our value for θ. Applying Equations 5 and 6 yields b1 | (M = 2.4435) = 0.1982, with standard error √[0.0153 + 2(2.4435)(−0.0044) + (2.4435)²(0.0018)] = 0.0674. Under the null hypothesis that the manipulation had no effect among those average in political conservatism, the ratio of b1 | (M = 2.4435) to its standard error is t distributed, with df equal to the residual df for the regression model. Here, df_residual = 244, and so t(244) = 0.1982/0.0674 = 2.9407, p < .01. Among those average in political conservatism, the experimental manipulation did have an effect, with those assigned to the conservative success condition perceiving the conservative candidate as more competent (by 0.1982 units) than did those assigned to the liberal success condition. A c% CI could be constructed in the usual way as

\[ (b_1 \mid M = \theta) \pm t_{(100-c)/2}\, s_{b_1 \mid M = \theta}, \tag{7} \]

where $t_{(100-c)/2}$ is the t value that cuts off the upper (100 − c)/2 percent of the t(df_residual) distribution from the rest of the distribution. Here, the 95% CI for the effect of the manipulation among those who are average in their conservatism is 0.1982 ± 1.9697(0.0674), or 0.0654 to 0.3310.

These computations are tedious to do by hand, and the potential for computational error is high. Fortunately, they can be done by computer by capitalizing on the interpretation of b1 in the regression model. Recall that b1 is the effect of F (the experimental manipulation here) when M = 0. What we would like is for b1 to estimate the effect of F when M = θ. This is simple enough to get by centering M around θ. To do so, subtract θ from all values of M in the data with the function M′ = M − θ and then reestimate Equation 4 above, substituting M′ for M throughout. In the resulting model, the coefficient for F is the estimated effect of F when M = θ, and the estimated standard error for this effect will be the same as that produced by Equation 6 (see Aiken & West, 1991, pp. 18–19; Darlington, 1990, pp. 325–326; Hayes, 2005, pp. 457–458; Irwin & McClelland, 2001; Jaccard & Turrisi, 2003, p. 27).

But how does one go about selecting a value of θ? Sometimes, choices of θ are easy to make because specific values of θ have some kind of important substantive or theoretical meaning. In the absence of clear practical or theoretical guidance on what values of θ to choose, one common strategy is to estimate the effect of the focal variable among those relatively low, moderate, and high on the moderator. Using this strategy, low is typically defined as one standard deviation (SD) below the sample mean, moderate as the sample mean, and high as one SD above the sample mean, although other definitions could be used. In these data, the sample mean is 2.4435 and SD = 1.5833, so low, moderate, and high values of ideology would be θ = 0.8602 (relatively liberal), θ = 2.4435 (somewhat liberal), and θ = 4.0269 (relatively conservative), respectively. Applying Equations 5 and 6 for values of the moderator low and high (moderate was computed above) yields b1 | (M = 0.8602) = 0.0364 and b1 | (M = 4.0269) = 0.3600. Coincidentally, the standard error for both conditional effects is 0.0952. So among those relatively liberal, the experimental manipulation had no effect on perceived competence of the conservative candidate [t(244) = 0.3824, p > .20]. Among those relatively conservative, the manipulation did have an effect [t(244) = 3.7815, p < .001], such that among these more conservative readers, the conservative candidate was perceived as higher in competence (by 0.3600 units) when he had raised more money, as compared with when he had raised less.

As was noted earlier, we do not recommend hand computation, and the centering approach, although easier than hand computation, is easy to misapply if the user is not comfortable with regression principles. As a computational aid, we have developed a macro for SPSS and SAS, called MODPROBE, that can be downloaded from www.comm.ohio-state.edu/ahayes/macros.htm and that makes the pick-a-point approach easy to implement. The macro produces the usual regression output, as well as estimates of the effect of the focal predictor variable at values of the moderator variable. Once the macro is activated, the MODPROBE command in SPSS

    MODPROBE y = comp/x = cond ideology.

yields the output in Appendix A corresponding to this example analysis. The syntax convention requires the predictor variables to be listed in a specific order, with the moderator variable listed last and the focal predictor variable listed second to last. (As will be discussed later, any other variables in the predictor variable list that precede the focal and moderator predictors are treated as statistical controls.) In the absence of an instruction from the user otherwise, the macro also estimates the effects of the focal variable at low (one SD below the mean), moderate (sample mean), and high (one SD above the mean) values of the moderator.

To assist in the visualizing of the interaction, the macro can produce a table containing Ŷ as a function of the focal predictor variable and moderator. This information is requested by specifying subcommand “est = 1.” Doing so produces the additional output shown in Table 2. This table could be input into a graphing program to produce a visual plot of the interaction, and the macro produces a data file containing the values in Table 2 to facilitate this.

Table 2
MODPROBE Macro Output for Ŷ As a Function of the Focal Predictor Variable and Moderator

Moderator values are the sample mean and plus/minus one SD from mean
Data for Visualizing Conditional Effect of Focal Predictor
    Cond   Ideology     Comp
   .0000      .8602   2.7402
  1.0000      .8602   2.7766
   .0000     2.4435   2.8461
  1.0000     2.4435   3.0443
   .0000     4.0269   2.9520
  1.0000     4.0269   3.3120
Note: Comp = Ŷ.

Suppose that you want to calculate the effect of the focal predictor variable at a specific value of the moderator other than low, moderate, or high as defined above and produced by default by the macro. The subcommand “modval = θ,” where θ is the value of the moderator variable for which the effect is desired, produces an estimate of the effect of the focal variable at moderator value θ. For example, the SPSS command MODPROBE y = comp/x = cond ideology/modval = 6 produces, after the regression model, the output shown in Table 3, which tells us that when ideology = 6 (highly conservative), the estimated effect of the manipulation is 0.5616 [t(244) = 3.3847, p < .001], with a 95% CI from 0.2348 to 0.8885.

With a little practice, syntax is easy to master, but it can be frustrating to the uninitiated. To simplify the use of our macro still further, we have produced an SPSS script (which can be downloaded from the same page noted above) that completes all the computations that we have described here and that follow, without the need to enter correctly formatted syntax. Running the script produces a Windows-style dialog box where the user sets up the problem and selects options. A screen shot of the dialog box can be seen in Figure 2.

Johnson–Neyman technique. The J–N technique was originally designed for the two-group ANCOVA problem when the homogeneity of regression assumption could not be justified (Johnson & Fay, 1950; Johnson & Neyman, 1936; Potthoff, 1964). Later, this method was generalized to the broader category of linear models (Bauer & Curran, 2005). It avoids the potential arbitrariness of the choice of θ in the pick-a-point approach by mathematically deriving the point or points along the continuum of the moderator where the effect of the focal predictor transitions between statistically significant and nonsignificant. Such points, if they exist, provide information about the range of values of the moderator where the focal predictor has a statistically significant effect and where it does not.

The pick-a-point approach will produce a statistically significant result for a chosen θ if the absolute value of the ratio of the conditional effect (Equation 5) to its standard error (Equation 6) exceeds the critical value of t(df_residual) for a hypothesis test at level of significance α. The J–N technique asks, at what values of θ does t equal or exceed the critical t so as to produce a p value for t no greater than α? This problem is solved by finding the values of θ, if they exist, where this ratio is equal to the critical t. These values define limits of the regions of significance for the focal predictor variable along the moderator variable continuum and are calculated as shown in Equation 8 (see Bauer & Curran, 2005, for more detail on the derivation). The application of Equation 8 will yield two values of θ that produce a ratio of Equations 5 and 6 exactly equal to the critical t (t_crit in Equation 8) for the null hypothesis test that the effect of the focal predictor on Y equals zero at moderator value θ at a chosen level of significance. However, not all values of θ from Equation 8 will be solutions in the range of the measurement of the moderator. For example, one or both could be imaginary numbers or be based on a projection of the pattern of interaction above the maximum or below the minimum possible measurement on the moderator variable. So the only θs worth interpreting are those that lie in the range of observation on the moderator variable. If no θs meet this criterion, this means that the effect of the focal variable is statistically significant across the entire observed range of the moderator or it never is. Any θ meeting this criterion marks a point of transition for the effect of the focal predictor as it changes from statistically significant to nonsignificant. It is possible for there to be two points of transition, meaning that as the moderator increases, the
Table 3
MODPROBE Macro Output After the Regression Model
Conditional Effect of Focal Predictor at Values of the
Moderator Variable
Ideology b se t p LLCI(b) ULCI(b)
6.0000 .5616 .1659 3.3847 .0008 .2348 .8885
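The hand computation for the conditional effect at the sample mean of ideology (Equations 5 through 7) can be sketched in Python. This is an illustration only, not the macro's implementation. The variances, covariance, conditional effect, and critical t are the values reported in the text; the function name is ours, and small differences from the published figures reflect rounding of those inputs.

```python
import math


def conditional_se(var_b1: float, var_b3: float, cov_b1b3: float, theta: float) -> float:
    """Standard error of the conditional effect b1 + b3*theta (Equation 6)."""
    return math.sqrt(var_b1 + 2.0 * theta * cov_b1b3 + theta ** 2 * var_b3)


# Values reported in the text for the first example.
theta = 2.4435    # sample mean of ideology
effect = 0.1982   # conditional effect at theta, as reported
t_crit = 1.9697   # critical t for a 95% CI with 244 residual df, as reported

se = conditional_se(var_b1=0.0153, var_b3=0.0018, cov_b1b3=-0.0044, theta=theta)
t_ratio = effect / se
ci = (effect - t_crit * se, effect + t_crit * se)  # Equation 7

print(f"se = {se:.4f}, t = {t_ratio:.2f}, 95% CI = [{ci[0]:.4f}, {ci[1]:.4f}]")
```

Calling `conditional_se` with θ = 0.8602 and the same variances gives a standard error close to the 0.0952 reported for the low value of the moderator.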
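\[
\theta = \frac{-2\left(t^2_{crit}\, s_{b_1 b_3} - b_1 b_3\right) \pm \sqrt{\left(2 t^2_{crit}\, s_{b_1 b_3} - 2 b_1 b_3\right)^2 - 4\left(t^2_{crit}\, s^2_{b_3} - b_3^2\right)\left(t^2_{crit}\, s^2_{b_1} - b_1^2\right)}}{2\left(t^2_{crit}\, s^2_{b_3} - b_3^2\right)} \tag{8}
\]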
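Equation 8 is the quadratic formula applied to the condition that the squared ratio of Equation 5 to Equation 6 equals the squared critical t. A minimal Python sketch of that computation follows; it is not the MODPROBE implementation. The function names are ours, the variances, covariance, and critical t are those reported for the first example, and b1 and b3 are hypothetical placeholders because the fitted coefficients are not reproduced in this section.

```python
import math


def jn_points(b1, b3, var_b1, var_b3, cov_b1b3, t_crit):
    """Real solutions of Equation 8: moderator values where |t| equals t_crit."""
    t2 = t_crit ** 2
    qa = t2 * var_b3 - b3 ** 2            # coefficient on theta^2
    qb = 2.0 * (t2 * cov_b1b3 - b1 * b3)  # coefficient on theta
    qc = t2 * var_b1 - b1 ** 2            # constant term
    if qa == 0.0:                         # degenerate case: the equation is linear
        return [] if qb == 0.0 else [-qc / qb]
    disc = qb ** 2 - 4.0 * qa * qc
    if disc < 0.0:
        return []                         # imaginary roots: no transition points
    root = math.sqrt(disc)
    return sorted([(-qb - root) / (2.0 * qa), (-qb + root) / (2.0 * qa)])


def t_ratio(b1, b3, var_b1, var_b3, cov_b1b3, theta):
    """Ratio of the conditional effect (Equation 5) to its standard error (Equation 6)."""
    se = math.sqrt(var_b1 + 2.0 * theta * cov_b1b3 + theta ** 2 * var_b3)
    return (b1 + b3 * theta) / se


# b1 and b3 are hypothetical placeholders; the other inputs are from the text.
args = dict(b1=-0.05, b3=0.10, var_b1=0.0153, var_b3=0.0018, cov_b1b3=-0.0044)
points = jn_points(t_crit=1.9697, **args)
for theta in points:
    print(f"theta = {theta:.4f}, |t| = {abs(t_ratio(theta=theta, **args)):.4f}")
```

Only solutions inside the observed range of the moderator should be interpreted, so a caller would normally filter `points` against the minimum and maximum of M. For logistic regression, the same function could be used by passing the square root of the critical χ²(1) as `t_crit`, which is equivalent to replacing the squared critical t in Equation 8 with the critical χ²(1).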
Table 4
MODPROBE Macro Output for Focal Predictor
at Values of Moderator Variable
Conditional Effect of Focal Predictor at Values of the
Moderator Variable
Cond b se t p LLCI(b) ULCI(b)
.0000 .0669 .0292 2.2891 .0229 .0093 .1245
1.0000 .1691 .0310 5.4543 .0000 .1080 .2302
derson, 1994). In this section, we will illustrate the use of our macro and also show how covariates are included in the macro command line in order to control for their potential influence on regression coefficient estimates.

The data for this example come from a telephone survey of 1,200 residents of Switzerland just prior to a national referendum about legal procedures required for immigrants to become Swiss citizens. The outcome variable, dangerous discussion, was responses of the participants to various questions about how frequently they engage in conversation with others who have an opinion different from their own about the referendum, scaled from 1 (not at all) to 5 (very frequently). They were also asked about the extent to which they believed that other people in their community agreed with their own opinion about the referendum, also scaled from 1 (not at all) to 5 (fully), a variable that we call perceived opinion climate. An additional set of questions was used to quantify the respondents’ certainty about their own attitude about the topic of the referendum, scaled from 1 (very uncertain) to 5 (very certain).

The goal of the analysis is to estimate the effect of the perceived opinion climate on frequency of dangerous discussion and how much, if at all, that effect depends on attitude certainty. Thus, perceived opinion climate is the focal predictor (F), and attitude certainty is the proposed moderator (M). To do so, F, M, and F × M are included as predictors in an OLS regression predicting dangerous discussion frequency. We also include four additional variables as statistical controls: respondent sex (W1: 1 = male, 0 = female), age (W2: in years), and the language the interview was conducted in (W3: 1 = German, 0 = French). The final control variable, general discussion frequency (W4), is a measure of frequency of discussion about the referendum with friends and family, scaled 1 (not at all) to 5 (very often). The model estimated is

\[ \hat{Y} = a + b_1 F + b_2 M + b_3 (F \times M) + b_4 W_1 + b_5 W_2 + b_6 W_3 + b_7 W_4. \]

Information pertinent to this model can be found in Table 5, and the command line and output from the SAS version of the macro can be found in Appendix B. In the command line, any variable listed prior to the focal predictor in the predictor variable list (recall that the focal predictor is listed second to last and the proposed moderator goes last) is treated as a covariate. As can be seen, the perceived opinion climate and attitude certainty do interact [b3 = −0.0727; t(1193) = −2.6409, p < .01]. The coefficient for the interaction means that as attitude certainty increases by one unit, the coefficient for opinion climate decreases (because the coefficient is negative) by 0.0727. Figure 3 plots this interaction graphically using the coefficients from this model, setting the covariates to their sample mean. As can be seen, when attitude certainty is near the bottom of the scale (i.e., closer to 1), the coefficient for perceived opinion climate is positive, whereas at higher levels of certainty (closer to 5), the coefficient is negative. So it seems that respondents who are relatively lacking in confidence about their attitude about the referendum report more frequent dangerous discussion when they perceive relatively greater support for their own opinion in the community. However, there is little relationship, or even a slightly negative one, between perceptions of support for one’s opinions and dangerous discussion among those who report greater confidence in their attitude.

With evidence that opinion climate and attitude certainty interact, we now will probe how the effect of the focal predictor, perceived opinion climate, varies as a function of the moderator, attitude certainty. Using Equation 5 or information from the macro printed by default, we could derive and interpret the coefficient for opinion climate when attitude certainty is equal to the mean (θ = 4.2581), as well as one SD above (θ = 5.2015) and below (θ = 3.3147) the sample mean:

    b1 | (M = 3.3147) = 0.2927 − 0.0727(3.3147) = 0.0517
    b1 | (M = 4.2581) = 0.2927 − 0.0727(4.2581) = −0.0169
    b1 | (M = 5.2015) = 0.2927 − 0.0727(5.2015) = −0.0855.

The section of the macro output in Table 6 provides these conditional effects by default. It would seem that only among those relatively high in attitude certainty is there a statistically significant negative relationship between perceived opinion climate and frequency of dangerous discussion [t(1193) = −2.3731, p < .02], with a 95% CI from −0.1563 to −0.0148. Such a claim would be mistaken, for observe that 5.2015 is beyond the upper bound

Table 5
OLS Regression Estimating Frequency of Dangerous Discussion From Perceived Opinion Climate, Attitude Certainty, and Their Interaction, With Various Statistical Controls

                      Coefficient       SE         t        p
a: constant                0.0670   0.4246    0.1579    .8746
b1: climate (F)            0.2927   0.1222    2.3958    .0167
b2: certainty (M)          0.2432   0.0929    2.6187    .0089
b3: F × M                 −0.0727   0.0275   −2.6409    .0084
b4: sex (W1)               0.1799   0.0593    3.0329    .0025
b5: age (W2)              −0.0021   0.0018   −1.1743    .2405
b6: language (W3)         −0.1815   0.0721   −2.5183    .0119
b7: discussion (W4)        0.5215   0.0257   20.2929   <.0001
Note: R = .5238, R² = .2744, F(7, 1193) = 64.4442, p < .0001.
Table 6
MODPROBE Macro Output For Focal Predictor
at Values of Moderator Variable
Conditional Effect of Focal Predictor at Values of the
Moderator Variable
CERTAIN b se t p LLCI(b) ULCI(b)
3.3147 0.0517 0.0387 1.3339 0.1825 -0.0243 0.1277
4.2581 -0.0169 0.0269 -0.6290 0.5295 -0.0698 0.0359
5.2015 -0.0855 0.0360 -2.3731 0.0178 -0.1563 -0.0148
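The default pick-a-point values in Table 6 can be reproduced from the Table 5 coefficients with Equation 5. The following Python sketch is an illustration only, not the macro's implementation. The SD of 0.9434 is inferred here as 5.2015 − 4.2581 rather than taken from the source, and the range check against the 1 to 5 response scale is our addition, to show how a default value of θ can fall outside the possible range of the moderator.

```python
# Coefficients from Table 5 and moderator summary values from the text.
B1, B3 = 0.2927, -0.0727         # climate (F) and the F x M product
MEAN, SD = 4.2581, 0.9434        # SD inferred as 5.2015 - 4.2581 (an assumption)
SCALE_MIN, SCALE_MAX = 1.0, 5.0  # attitude certainty is scaled from 1 to 5


def conditional_effect(theta: float) -> float:
    """Effect of the focal predictor at moderator value theta (Equation 5)."""
    return B1 + B3 * theta


for label, theta in (("low", MEAN - SD), ("moderate", MEAN), ("high", MEAN + SD)):
    in_range = SCALE_MIN <= theta <= SCALE_MAX
    note = "" if in_range else "  <- outside the 1-5 scale"
    print(f"{label:8s} theta = {theta:.4f}  effect = {conditional_effect(theta):+.4f}{note}")
```

The computed effects agree with the b column of Table 6 to within rounding of the coefficients, and the "high" value is flagged because one SD above the mean exceeds the scale maximum of 5.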
Table 7
MODPROBE Output: Estimating the Conditional Effect of Focal Predictor at Specific Values of the Moderator

%modprobe (data=immig, y=danger, x=lang discuss sex age climate certain, modval=1);

Conditional Effect of Focal Predictor at Values of the Moderator Variable
  CERTAIN        b       se        t        p  LLCI(b)  ULCI(b)
   1.0000   0.2200   0.0955   2.3035   0.0214   0.0326   0.4074

%modprobe (data=immig, y=danger, x=lang discuss sex age climate certain, modval=2);

Conditional Effect of Focal Predictor at Values of the Moderator Variable
  CERTAIN        b       se        t        p  LLCI(b)  ULCI(b)
   2.0000   0.1473   0.0695   2.1187   0.0343   0.0109   0.2837
change in the log odds of the event as F increases by one teraction between F and M varies across values of Z. A
unit itself changes as M increases by one unit, all Ws held statistically significant three-way interaction begs probing
constant. Raising e to the power of b3 yields a ratio of odds in the same way that a two-way interaction does. In this
ratios, which quantifies the factor change in the odds ratio case, focus would be on how the effect of F as a function
for F when M increases by one unit. of M (the two-way interaction between F and M) varies as
A significant interaction in logistic regression begs prob- a function of Z. If Z were dichotomous, this would involve
ing, just as in OLS regression, and the pick-a-point approach estimating the two-way interaction between F and M for
and J–N techniques can be used with very little modifica- the two values of Z. If Z were a quantitative dimension, the
tion to the procedures just described. In OLS regression, pick-a-point approach or the J–N technique could be used
the ratio of a variable’s regression coefficient to its standard to ascertain where on the Z continuum the two-way inter-
error is distributed as t under the null hypothesis of no effect action between F and M is large, small, positive, negative,
of that variable on the outcome. In logistic regression, this significant, and not significant.
ratio is typically treated as a standard normal variable or, Our macros can be used to probe single-df three-way in-
when squared, a chi-square statistic with one df (typically teractions. When the string of predictor variables is listed,
printed in software packages as the Wald statistic). The conditional effect of F when $M = \theta$ and its standard error can be derived using Equations 5 and 6, and a p value for their ratio calculated from the standard normal or (if squared) the $\chi^2(1)$ distribution. Alternatively, the J–N technique can be used by replacing $t^2_{\text{crit}}$ in Equation 8 with the critical $\chi^2(1)$ for a hypothesis test at the α level of significance.

The MODPROBE macro (and SPSS script) will automatically detect whether or not the outcome variable is binary, and if so, it estimates the model using logistic regression rather than OLS. The logistic regression coefficients are estimated using maximum likelihood and iterating to a solution with the Newton–Raphson method. The user can control the number of iterations and convergence criteria if desired (which default to 10,000 and .0000001). Space constraints preclude printing an example of the use of the macro with a binary outcome here. A worked example can be found where the macro can be downloaded, at www.comm.ohio-state.edu/ahayes/macros.htm.

Higher order interactions. Higher order interactions involving dichotomous or quantitative predictor variables can also be represented with a single regression coefficient. Consider, for instance, a three-way interaction between a focal predictor (F), a moderator (M) of the effect of F, and a third predictor variable (Z) proposed to moderate the size of the interaction between F and M. A model with a three-way interaction would typically be estimated as such:

$$\hat{Y} = a + b_1 F + b_2 M + b_3 Z + b_4 (M \times Z) + b_5 (F \times Z) + b_6 (F \times M) + b_7 (F \times M \times Z).$$

In this model, $b_7$ estimates the three-way interaction between F, M, and Z, that is, the extent to which the two-way in-

the macro automatically generates the product of the last (the moderator) and the second to last (the focal predictor) variables in the list prior to estimating the model. In this case, the focal predictor variable would be the product of F and M, and the moderator variable would be Z, so those should be listed second to last and last, respectively, in the “x =” section of the command. All the products representing the two-way interactions must be generated first, and those products not functioning as the focal predictor entered into the command as covariates.

Consider, for instance, an extension of the first example by including a three-way interaction between sex (sex: coded 0 = males, 1 = females), experimental condition, and political ideology. The following SPSS commands would accomplish the analysis:

compute sexXcond = sex*cond.
compute sexXideo = sex*ideology.
compute conXideo = ideology*cond.
modprobe y = comp/x = cond ideology sexXcond sexXideo conXideo sex.

Because sex is a dichotomous moderator in this example, the macro will automatically generate estimates of the three-way interaction as well as the two-way interaction between condition and ideology for the two values of the moderator variable (males and females). If the moderator variable were a quantitative variable, such as age, by default the macro would produce estimates of the two-way interaction between condition and ideology when age is at the sample mean as well as a standard deviation above and below the sample mean. The modval subcommand could be used to estimate the interaction between condition and ideology at specific values of the moderator, or the JN subcommand
could be used to ascertain the region of significance on the moderator continuum for the two-way interaction.

Simultaneous inference. The J–N technique allows one to claim that for any single point selected on the moderator continuum within the region of significance, the effect of the focal predictor is statistically significant at the chosen α level. However, it is not accurate to say that at the α level of significance, the effect of the focal predictor is statistically significant simultaneously at all values of the moderator within the region(s) of significance. The probability of a Type I error for this claim is larger than α. Potthoff (1964) recognized this as a problem similar to the one faced by the data analyst interested in post hoc pairwise comparisons between means in an ANOVA while maintaining the probability of a Type I error across the entire set of comparisons at a desired α level. His solution was to substitute $F(2, df_{\text{residual}})$ for $t^2_{\text{crit}}$ in Equation 9 when deriving regions of significance. With this procedure, the region(s) of significance will be smaller than with the J–N method described earlier. Our macro implements the Potthoff procedure for OLS regression (but not logistic regression) when the user specifies “jn = 2” at the end of the command line. Applied to the first example above, the point of transition on the ideology scale between a statistically significant and nonsignificant effect of the experimental manipulation was 2.158. In other words, the region of significance is smaller using this method (ideology ≥ 2.158, as compared with ≥ 1.8787 for the nonsimultaneous J–N technique). Potthoff (1964, p. 244) recognized that this procedure may be overly conservative for some tastes and suggested using a higher α level so as to ensure that the region of significance not be so small as to render the procedure largely useless.

Summary
In this article, we discussed two methods for probing interactions in linear models: the pick-a-point approach and the J–N technique. We provided two examples and illustrated the implementation of these methods using macros written for SPSS and SAS to ease the computational burden on the investigator. We hope that this article will serve as a useful aid to researchers and that the macros will enhance the likelihood of rigorous probing of interactions detected by investigators and reported in the research literature.

Author Note
Correspondence concerning this article should be addressed to A. F. Hayes, School of Communication, Ohio State University, 154 N. Oval Mall, 3016 Derby Hall, Columbus, OH 43210 (e-mail: [email protected]).

References
Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Thousand Oaks, CA: Sage.
Bauer, D. J., & Curran, P. J. (2005). Probing interactions in fixed and multilevel regression: Inferential and graphical techniques. Multivariate Behavioral Research, 40, 373-400.
Bissonnette, V., Ickes, W., Bernstein, I., & Knowles, E. (1990). Personality moderating variables: A warning about statistical artifact and a comparison of analytic techniques. Journal of Personality, 58, 567-587.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Mahwah, NJ: Erlbaum.
Cronbach, L. J. (1987). Statistical tests for moderator variables: Flaws in analyses recently proposed. Psychological Bulletin, 102, 414-417.
Darlington, R. B. (1990). Regression and linear models. New York: McGraw-Hill.
Hayes, A. F. (2005). Statistical methods for communication science. Mahwah, NJ: Erlbaum.
Irwin, J. R., & McClelland, G. H. (2001). Misleading heuristics and moderated multiple regression models. Journal of Marketing Research, 38, 100-109.
Jaccard, J., & Turrisi, R. (2003). Interaction effects in multiple regression (2nd ed.). Thousand Oaks, CA: Sage.
Johnson, P. O., & Fay, L. C. (1950). The Johnson–Neyman technique, its theory and application. Psychometrika, 15, 349-367.
Johnson, P. O., & Neyman, J. (1936). Tests of certain linear hypotheses and their application to some educational problems. Statistical Research Memoirs, 1, 57-93.
Karpman, M. B. (1983). The Johnson–Neyman technique using SPSS or BMDP. Educational & Psychological Measurement, 43, 137-147.
Karpman, M. B. (1986). Comparing two non-parallel regression lines with the parametric alternative to analysis of covariance using SPSS-X or SAS—the Johnson–Neyman technique. Educational & Psychological Measurement, 46, 639-644.
Kromrey, J. D., & Foster-Johnson, L. (1998). Mean centering in moderated multiple regression: Much ado about nothing. Educational & Psychological Measurement, 58, 42-67.
MacCallum, R. C., Zhang, S., Preacher, K. J., & Rucker, D. D. (2002). On the practice of dichotomization of quantitative variables. Psychological Methods, 7, 19-40.
Newsom, J. T., Prigerson, H. G., Schulz, R., & Reynolds, C. F., III (2003). Investigating moderator hypotheses in aging research: Statistical, methodological, and conceptual difficulties with comparing separate regressions. International Journal of Aging & Human Development, 57, 119-150.
O’Connor, B. P. (1998). SIMPLE: All-in-one programs for exploring interactions in moderated multiple regression. Educational & Psychological Measurement, 58, 836-840.
Potthoff, R. F. (1964). On the Johnson–Neyman technique and some extensions thereof. Psychometrika, 29, 241-256.
Preacher, K. J., Curran, P. J., & Bauer, D. J. (2006). Computational tools for probing interactions in multiple linear regression, multilevel modeling, and latent curve analysis. Journal of Educational & Behavioral Statistics, 31, 437-448.
Reineke, J. B., & Hayes, A. F. (2007, November). Reporting on campaign finance success: Effects on perceptions of political candidates. Paper presented at the annual meeting of the National Communication Association, Chicago.
Rogosa, D. (1980). Comparing nonparallel regression lines. Psychological Bulletin, 88, 307-321.
Stone-Romero, E. F., & Anderson, L. E. (1994). Relative power of moderated multiple regression and the comparison of subgroup correlation coefficients for detecting moderator effects. Journal of Applied Psychology, 79, 354-359.

Notes
1. Although we are not the first to produce SPSS code for probing interactions in OLS regression (O’Connor, 1998), including the J–N technique for the simple two-group ANCOVA problem (Karpman, 1983, 1986), the macros we describe here can be used for a variety of interactions, rather than requiring the investigator to use different programs for different types of interactions. None of the existing programs applies the J–N technique to probing interactions between two quantitative variables, none works for binary outcomes, and none of them automatically controls for additional variables in a model if the user desires such control. There is a Web-based tool that implements the J–N technique (Preacher, Curran, & Bauer, 2006) for OLS regression (but not logistic regression) that is somewhat tedious to use, in that it requires the investigator to plug various coefficients and variance estimates into the proper locations in a form, and the likelihood for error in use is high. O’Connor’s SPSS programs require more knowledge of syntax than we believe most users are likely to have and do not permit the addition of covariates in a model without first residualizing variables
manually. SPSS users not comfortable with macros can use our SPSS script instead, which uses an SPSS dialog box for setting up the model.
2. The usual justification for mean centering is to reduce the deleterious effects of multicollinearity on the estimation process. The correlations between the lower order and product variables are reduced by mean centering prior to computing the product, thereby removing nonessential multicollinearity (Cohen et al., 2003, pp. 202–203). This can stabilize the mathematics and reduce the likelihood that rounding error will creep into computations, which can be important when there are several interactions in a model. For those who prefer to mean center, the macros we describe do have an option for mean centering the focal and predictor variables prior to estimation of the model.
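
As a rough illustration of the point about mean centering in Note 2, the following Python sketch compares the correlation between a lower order variable and its product term before and after centering. The scores x and m are hypothetical values made up for this illustration (they are not data from the article), and the helper pearson is our own, not part of the macros.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (sd_u * sd_v)


# Hypothetical scores, chosen only for illustration (not from the article).
x = [1, 2, 3, 4, 5, 6, 7, 8]
m = [2, 1, 4, 3, 6, 5, 8, 7]

# Product term built from the raw variables.
raw_product = [a * b for a, b in zip(x, m)]
r_raw = pearson(x, raw_product)

# Product term built after centering each variable at its mean.
x_mean, m_mean = sum(x) / len(x), sum(m) / len(m)
xc = [a - x_mean for a in x]
mc = [b - m_mean for b in m]
centered_product = [a * b for a, b in zip(xc, mc)]
r_centered = pearson(xc, centered_product)

print(f"r(x, product), uncentered: {r_raw:.3f}")
print(f"r(x, product), centered:   {r_centered:.3f}")
```

With these symmetric made-up values the uncentered correlation is high and the centered one is essentially zero. Real data will rarely be this tidy, but the direction of the change is the one the note describes.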
Appendix A
Example SPSS Macro Output
MODPROBE y = comp/x = cond ideology.

Regression Summary
      R-sq         F       df1       df2         p
     .1478   14.1084    3.0000  244.0000     .0000
====================================================
                 b        se         t         p
constant    2.6826     .0873   30.7211     .0000
cond        -.0515     .1237    -.4163     .6775
ideology     .0669     .0292    2.2891     .0229
interact     .1022     .0426    2.3986     .0172

By adding “/jn = 1” to the end of the command, the J–N output is produced below the regression model:

Moderator Value(s) Defining Johnson-Neyman Significance Region(s)
    1.8787

Conditional Effect of Focal Predictor at Values of Moderator Variable
  Ideology         b        se         t         p   LLCI(b)   ULCI(b)
     .0000    -.0515     .1237    -.4163     .6775    -.2953     .1922
     .3000    -.0209     .1132    -.1842     .8540    -.2439     .2022
     .6000     .0098     .1032     .0949     .9245    -.1935     .2131
     .9000     .0405     .0939     .4309     .6669    -.1445     .2254
    1.2000     .0711     .0855     .8322     .4061    -.0972     .2394
    1.5000     .1018     .0782    1.3013     .1944    -.0523     .2558
    1.8000     .1324     .0725    1.8264     .0690    -.0104     .2753
    1.8787     .1405     .0713    1.9697     .0500     .0000     .2810
    2.1000     .1631     .0687    2.3725     .0184     .0277     .2985
    2.4000     .1937     .0672    2.8820     .0043     .0613     .3262
    2.7000     .2244     .0681    3.2942     .0011     .0902     .3586
    3.0000     .2551     .0713    3.5758     .0004     .1146     .3956
    3.3000     .2857     .0766    3.7316     .0002     .1349     .4365
    3.6000     .3164     .0834    3.7913     .0002     .1520     .4807
    3.9000     .3470     .0916    3.7884     .0002     .1666     .5275
Alpha level used for Johnson-Neyman method and confidence intervals: .05
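
The conditional effects listed in this output can be checked by hand from the printed coefficients, since the effect of cond at a given value of ideology is the coefficient for cond plus the interaction coefficient times that value. The Python sketch below is only such a check, not part of the macro. The function name conditional_effect is ours, the standard errors are copied from the output rather than recomputed (doing so would require the coefficient covariance matrix, which is not printed), and small discrepancies reflect rounding of the printed coefficients.

```python
# Coefficients copied from the Appendix A regression output.
b_cond = -0.0515      # effect of cond when ideology = 0
b_interact = 0.1022   # coefficient for the cond x ideology product


def conditional_effect(ideology):
    """Effect of the focal predictor (cond) at a given moderator value."""
    return b_cond + b_interact * ideology


# (ideology, effect printed by the macro, standard error printed by the macro)
printed_rows = [
    (0.0, -0.0515, 0.1237),
    (1.8787, 0.1405, 0.0713),  # boundary of the J-N region of significance
    (3.0, 0.2551, 0.0713),
    (3.9, 0.3470, 0.0916),
]

for ideology, printed_b, printed_se in printed_rows:
    b = conditional_effect(ideology)
    t = b / printed_se  # ratio of the effect to its printed standard error
    print(f"ideology={ideology:.4f}  b={b:.4f} (printed {printed_b:.4f})  t={t:.3f}")
```

At ideology = 1.8787 the ratio is close to 1.97, consistent with the p value of .0500 printed for that row.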
Appendix B
Example SAS Macro Output
%modprobe (data=immig, y=danger, x=lang discuss sex age climate certain);

Variables

Regression Summary
       R-sq          F        df1        df2          p          n
     0.2744    64.4442     7.0000  1193.0000     0.0000  1201.0000
====================================================
Model
                 b        se         t         p
CONSTANT    0.0639    0.4433    0.1441    0.8855
LANG        0.1815    0.0721    2.5183    0.0119
DISCUSS     0.5215    0.0257   20.2929    0.0000
SEX        -0.1799    0.0593   -3.0329    0.0025
AGE        -0.0021    0.0018   -1.1743    0.2405
CLIMATE     0.2927    0.1222    2.3958    0.0167
CERTAIN     0.2432    0.0929    2.6187    0.0089
INTERACT   -0.0727    0.0275   -2.6409    0.0084

INTERACT is defined as
CLIMATE X CERTAIN
====================================================
Conditional Effect of Focal Predictor at Values of the Moderator Variable
   CERTAIN         b        se         t         p   LLCI(b)   ULCI(b)
    3.3147    0.0517    0.0387    1.3339    0.1825   -0.0243    0.1277
    4.2581   -0.0169    0.0269   -0.6290    0.5295   -0.0698    0.0359
    5.2015   -0.0855    0.0360   -2.3731    0.0178   -0.1563   -0.0148

Moderator values are the sample mean and plus/minus one SD from mean
Warning: One SD above the mean is beyond the available data
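
The confidence limits in the table above can be approximated from the printed effects and standard errors. The Python sketch below makes one simplifying assumption that is ours and not the macro's: with 1,193 residual degrees of freedom, the critical t is replaced by the standard normal quantile (about 1.96). The function name interval is hypothetical, and agreement with the printed limits is only to about three decimals because the inputs are rounded.

```python
from statistics import NormalDist

# Coefficients copied from the Appendix B output.
b_climate = 0.2927     # effect of climate when certain = 0
b_interact = -0.0727   # coefficient for the climate x certain product

# Large-sample stand-in for the critical t with 1193 df (an assumption).
z_crit = NormalDist().inv_cdf(0.975)


def interval(certain, se):
    """Approximate 95% limits for the effect of climate at a value of certain."""
    effect = b_climate + b_interact * certain
    return effect - z_crit * se, effect + z_crit * se


# (certain, printed standard error, printed LLCI, printed ULCI)
printed_rows = [
    (3.3147, 0.0387, -0.0243, 0.1277),
    (4.2581, 0.0269, -0.0698, 0.0359),
    (5.2015, 0.0360, -0.1563, -0.0148),
]

for certain, se, printed_lo, printed_hi in printed_rows:
    lo, hi = interval(certain, se)
    print(f"certain={certain:.4f}  [{lo:.4f}, {hi:.4f}]  "
          f"printed [{printed_lo:.4f}, {printed_hi:.4f}]")
```

Only the interval at one standard deviation above the mean excludes zero, matching the single p value below .05 in the table.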