Chapter 1. Introduction and Review of Univariate General Linear Models
1. Some authors prefer the terms quantitative and qualitative to describe predictor variables that are continuous or categorical. In this volume, we use the term continuous to denote variables whose underlying metric is continuous or discrete, and we use the term categorical to denote nominal group structure that has no meaningful underlying metric except to identify categories.
In the univariate case, regression models are those models that are limited to a single criterion, response, dependent, or outcome variable.² Univariate regression models can be expressed mathematically as a regression function,

$$Y = \beta_0 + \beta_1 X_1 + \varepsilon, \qquad [1.1]$$
2. We use the terms dependent, criterion, response, and outcome interchangeably in this volume to describe the Y variable in models. The X variables in the model will be interchangeably referred to as predictor, explanatory, or independent variables. These terms appear throughout the literature on regression analysis. Some authors prefer to reserve the term dependent variable for experimental designs with manipulated conditions.
for a simple model with a single predictor variable. For a more complex model with multiple predictors, we may write³

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_q X_q + \varepsilon. \qquad [1.2]$$
3. We do not identify the response and explanatory variables Y or X with a subscript to indicate the serial order of the 1st through the nth observations. In this volume, all models are based on the full set of n observations, and the index of summation or multiplication is assumed to be across all n participants.
4. Coding schemes for categorical variables will be introduced at greater length in later sections.
In matrix form,⁵ the regression model of Equation 1.2 for all n observations is

$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} 1 & X_{11} & X_{12} & \cdots & X_{1q} \\ 1 & X_{21} & X_{22} & \cdots & X_{2q} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{n1} & X_{n2} & \cdots & X_{nq} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_q \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}. \qquad [1.3]$$
5. We use italics to represent scalars (e.g., X, Y, Z, β, ε), boldface lowercase letters to denote row or column vectors (e.g., a, b, y, x, β, ε), and boldface uppercase letters to denote matrices (e.g., X, Y, B, E). If a column or row vector is deliberately represented by a matrix symbol, its vector status will be made explicit by the order of the matrix, e.g., (n × 1) or (1 × p).
The same structure generalizes to p response variables, the multivariate model

$$\begin{bmatrix} Y_{11} & Y_{12} & \cdots & Y_{1p} \\ Y_{21} & Y_{22} & \cdots & Y_{2p} \\ \vdots & & & \vdots \\ Y_{n1} & Y_{n2} & \cdots & Y_{np} \end{bmatrix} = \begin{bmatrix} 1 & X_{11} & X_{12} & \cdots & X_{1q} \\ 1 & X_{21} & X_{22} & \cdots & X_{2q} \\ \vdots & & & \vdots \\ 1 & X_{n1} & X_{n2} & \cdots & X_{nq} \end{bmatrix} \begin{bmatrix} \beta_{01} & \beta_{02} & \cdots & \beta_{0p} \\ \beta_{11} & \beta_{12} & \cdots & \beta_{1p} \\ \vdots & & & \vdots \\ \beta_{q1} & \beta_{q2} & \cdots & \beta_{qp} \end{bmatrix} + \begin{bmatrix} \varepsilon_{11} & \varepsilon_{12} & \cdots & \varepsilon_{1p} \\ \varepsilon_{21} & \varepsilon_{22} & \cdots & \varepsilon_{2p} \\ \vdots & & & \vdots \\ \varepsilon_{n1} & \varepsilon_{n2} & \cdots & \varepsilon_{np} \end{bmatrix}. \qquad [1.4]$$
For the TMT-B example data of Table 1.1, with n = 40 observations and predictors age, education, and gender,⁴ the model matrices are

$$\mathbf{y}_{(40 \times 1)} = \begin{bmatrix} 72 \\ 115 \\ 117 \\ \vdots \\ 111 \end{bmatrix}, \quad \mathbf{X}_{(40 \times 4)} = \begin{bmatrix} 1 & 41 & 13 & 0 \\ 1 & 51 & 18 & 1 \\ 1 & 80 & 14 & 0 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & 59 & 10 & 0 \end{bmatrix}, \quad \boldsymbol{\beta}_{(4 \times 1)} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}, \quad \boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \vdots \\ \varepsilon_{40} \end{bmatrix}.$$
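Readers who wish to follow the computations numerically can assemble these arrays directly. The sketch below is a minimal NumPy illustration of ours, not code from the original text; the education and gender entries of the second row are assumptions where the source display is ambiguous.

```python
import numpy as np

# First rows of the Table 1.1 TMT-B data as displayed above; the rows elided
# by the vertical dots would complete the full set of 40 observations.
y = np.array([72.0, 115.0, 117.0, 111.0])   # TMT-B completion times (seconds)
X = np.array([
    [1, 41, 13, 0],   # unit vector, age, education, gender
    [1, 51, 18, 1],   # education and gender entries assumed for this row
    [1, 80, 14, 0],
    [1, 59, 10, 0],
], dtype=float)

# With the full data set, y would have shape (40,) and X shape (40, 4).
print(y.shape, X.shape)
```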
The expected value of Y given X is the regression function

$$E(Y|\mathbf{X}) = \mathbf{X}\boldsymbol{\beta} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_q X_q. \qquad [1.5]$$
[Figure 1.1. The Linear Regression Function With Expected Values (Means) of the Conditional Distributions of Y on X for the Data of Table 1.1. Age (X) is on the horizontal axis and TMT-B time (Y) on the vertical axis; the fitted line is E{Y} = 39 + .93X, with conditional means E{Y₁} = 75, E{Y₂} = 86, and E{Y₃} = 109 lying on the line.]
These expected values are the means of the conditional probability distributions of Y, say $\mu_{Y|X_j}$, for each of the values of $X_j$. The linear model specifying the relationship between Y and X requires that the conditional means of Y|X fall precisely on a straight line defined by the model, as illustrated in Figure 1.1 for a single predictor variable. Linear models with two predictors require that the regression surface defined by X be a two-dimensional plane, with partial slopes defining the X axes of the graph as shown in Figure 1.2. For the simple regression model of Equation 1.1, the parameter $\beta_0$ defines the expected value of Y|X = 0 and $\beta_1$ defines the expected rate of change in Y per unit change in X. From the example data of Table 1.1, the regression function of Y = TMT-B on X = Age would appear as in Figure 1.1, in which the conditional means of Y (time to completion of TMT-B) given three values of X = 40, 50, and 75, for example (i.e., E{Y₁}, E{Y₂}, E{Y₃}), lie precisely on the regression line to satisfy the assumption of linearity. Note that the values of the observations Y₁, Y₂, and Y₃ appear in the plane of their respective probability distributions but deviate from their conditional means. The vector of deviations, $\boldsymbol{\varepsilon} = \mathbf{y} - \mathbf{X}\boldsymbol{\beta}$, contains the error terms of the regression model in Equation 1.3.
[Figure 1.2. The fitted regression plane for two predictors: TMT-B as a function of Age and Education.]
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}. \qquad [1.6]$$
The differences between Y and the expected values of Y are the errors of prediction of the model,⁶

$$\boldsymbol{\varepsilon} = \mathbf{y} - E(\mathbf{y}|\mathbf{X}), \qquad [1.7]$$
6. The symbols $\hat{Y}$, $\hat{\mathbf{y}}$, $\bar{Y}$, and $\hat{\mu}_{(Y|X_1 X_2 \cdots X_q)}$ will denote sample estimates of the population $E(Y|\mathbf{X})$.
which are illustrated by the distance from each point to the two-dimensional plane in Figure 1.2. The closer all the observed values are to the fitted regression plane, the better the fit of the model to the data.

The criterion of least squares is used to estimate optimal values of $\boldsymbol{\beta}$ such that the discrepancies between the observations and the values predicted by the model are as small as possible. Using the differential calculus, the values of $\hat{\boldsymbol{\beta}}$⁷ are chosen to minimize the sum of the squared errors of prediction:

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}(\mathbf{X}'\mathbf{Y}). \qquad [1.8]$$
Applying Equation 1.8 to the example data of Table 1.1 gives the unstandardized parameter estimates of the regression of TMT-B on age, education, and gender,⁸

$$\hat{\boldsymbol{\beta}} = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \\ \hat{\beta}_3 \end{bmatrix} = \begin{bmatrix} 65.69 \\ 0.92 \\ -1.87 \\ -4.68 \end{bmatrix}. \qquad [1.9]$$
7. We will use the diacritic ^ over the symbol to denote a sample estimate of its population parameter.
8. $(\mathbf{X}'\mathbf{X})^{-1}$ is the inverse of the uncorrected raw score sum of squares and cross products (SSCP) matrix of X, and $(\mathbf{X}'\mathbf{Y})$ is the uncorrected raw score sum of cross products (SCP) between X and Y. The unstandardized regression coefficients of Equation 1.8 are identical to those obtained from mean corrected SSCP and SCP matrices. Details of the relationship between raw score and mean corrected SSCP and SCP matrices are given in Rencher (1998, pp. 269–271).
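As a computational note, Equation 1.8 can be evaluated directly from the raw score SSCP and SCP arrays just described. The following is a minimal NumPy sketch of ours, not code from the original text:

```python
import numpy as np

def ols_estimates(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least squares estimates, b = (X'X)^(-1) X'y (Equation 1.8)."""
    sscp = X.T @ X    # uncorrected raw score SSCP matrix of X
    scp = X.T @ y     # uncorrected raw score SCP between X and Y
    # Solving the normal equations is numerically safer than an explicit inverse.
    return np.linalg.solve(sscp, scp)

# Applied to the full (40 x 4) TMT-B design matrix, this reproduces the
# estimates reported in the text: [65.69, 0.92, -1.87, -4.68].
```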
Interpretations follow the usual rules: Each one-year increase in age is accompanied by an increase of approximately 9/10 of a second in the time to complete the TMT-B task; each additional year of education reduces the time to completion by about 2 seconds; and males and females differ by an average of about 4.7 seconds on the timed TMT-B, with females showing faster performance. The expected time to completion of the TMT-B for a 50-year-old woman with 12 years of education would be estimated at 85 seconds.
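Assuming females are coded 1 on the gender indicator (an assumption consistent with the reported direction of the gender effect), the 85-second estimate follows directly from the fitted equation:

$$\hat{Y} = 65.69 + 0.92(50) - 1.87(12) - 4.68(1) = 65.69 + 46.00 - 22.44 - 4.68 \approx 84.6 \approx 85 \text{ seconds}.$$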
It is occasionally useful to reparameterize the regression model to mean zero and unit variance (e.g., $Z_Y, Z_{X_1}, Z_{X_2}, Z_{X_3}$), in which case the particulars of the regression model in standard score form⁹ can be expressed in terms of correlation coefficients. The standard score regression model can be written in matrix form as

$$\mathbf{z}_Y = \mathbf{Z}_X\boldsymbol{\beta}^* + \boldsymbol{\varepsilon}, \qquad [1.10]$$

with estimates $\hat{\boldsymbol{\beta}}^* = \mathbf{R}_{XX}^{-1}\mathbf{R}_{XY}$.¹⁰ For the TMT-B example data,

$$\hat{\boldsymbol{\beta}}^* = \begin{bmatrix} \hat{\beta}^*_1 \\ \hat{\beta}^*_2 \\ \hat{\beta}^*_3 \end{bmatrix} = \begin{bmatrix} 0.61 \\ -0.15 \\ -0.07 \end{bmatrix}.$$
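A hedged NumPy sketch of the standard score solution (ours, under the same assumptions as the earlier snippets) z-scores the variables, forms the correlation arrays of footnote 10, and solves:

```python
import numpy as np

def standardized_betas(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Standard score estimates, b* = Rxx^(-1) Rxy; X excludes the unit vector."""
    n = len(y)
    Zx = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # predictor z-scores
    zy = (y - y.mean()) / y.std(ddof=1)                 # criterion z-scores
    Rxx = (Zx.T @ Zx) / (n - 1)   # sample size-adjusted SSCP (footnote 10)
    Rxy = (Zx.T @ zy) / (n - 1)   # sample size-adjusted SCP (footnote 10)
    return np.linalg.solve(Rxx, Rxy)

# For the TMT-B data this reproduces the reported values [0.61, -0.15, -0.07].
```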
9. The symbol β* will be used to denote parameters in standard score form, with the standardized estimates of the parameters denoted by $\hat{\beta}^*$.
10. $\mathbf{R}_{XX}$ and $\mathbf{R}_{XY}$ are the sample size–adjusted SSCP and SCP matrices in standard score form.
11. Standardized regression coefficients have little meaning for categorical predictor variables. The standard deviation of the numbers used to designate categories of a nominal grouping variable has no meaningful interpretation beyond the ability of the numerals to distinguish categories. In a later section, we note that the standardized version of a dichotomous predictor may have a useful interpretation when involved in a test of relative importance compared with other predictors in the model.
Several measures summarize the fit of the model: the estimated error variance,

$$\hat{\sigma}^2 = \frac{\sum (Y - \mathbf{X}\hat{\boldsymbol{\beta}})^2}{n - q_f - 1},$$

the sum of squared errors, $SS_{ERROR} = \sum (Y - \mathbf{X}\hat{\boldsymbol{\beta}})^2 = \sum \hat{\varepsilon}^2 = \hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}}$, and the squared multiple correlation coefficient ($R^2$). To achieve each of these measures requires that the variability in the response variable be partitioned into its constituent parts related to Equation 1.3. The partitioned SS is

$$\mathbf{y}'\mathbf{y} = \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} + \hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}}. \qquad [1.13]$$

The total SS corrected for the mean is $SS_{TOTAL} = \sum (Y - \bar{Y})^2 = \mathbf{y}'\mathbf{y} - \bar{\mathbf{y}}_n'\bar{\mathbf{y}}_n$, where $\bar{\mathbf{y}}_n$ is an (n × 1) vector of the mean of Y repeated n times. Redefining $\mathbf{y}'\mathbf{y} = (\mathbf{y}'\mathbf{y} - \bar{\mathbf{y}}_n'\bar{\mathbf{y}}_n)$ to be the mean corrected $SS_{TOTAL}$, and redefining $\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} = (\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - \bar{\mathbf{y}}_n'\bar{\mathbf{y}}_n)$ to represent the mean corrected $SS_{MODEL}$, the partition of the sums of squares of Equation 1.13 is¹²

$$\mathbf{y}'\mathbf{y} = \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} + \hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}}. \qquad [1.14]$$

From the partition, the squared multiple correlation coefficient is

$$R^2_{Y \cdot X_1 X_2 \cdots X_q} = 1 - \frac{\hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}}}{\mathbf{y}'\mathbf{y}}, \qquad [1.15]$$

or more commonly,

$$R^2_{Y \cdot X_1 X_2 \cdots X_q} = \frac{\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y}}{\mathbf{y}'\mathbf{y}}. \qquad [1.16]$$
For the TMT-B example data of Table 1.1, the mean corrected Total and Model SS are $\mathbf{y}'\mathbf{y} = 41875.33$ and $\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} = 17758.00$. The fit of the model is found to be

$$R^2_{Y \cdot X_1 X_2 X_3} = \frac{17758.00}{41875.33} = .424.$$
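The partition and the resulting $R^2$ can be verified with a short NumPy sketch (ours, not the original authors'):

```python
import numpy as np

def fit_r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """Mean corrected SS partition and R^2 (Equations 1.14 and 1.16)."""
    n = len(y)
    b = np.linalg.solve(X.T @ X, X.T @ y)
    correction = n * y.mean() ** 2            # ybar'ybar_n
    ss_total = y @ y - correction             # mean corrected SS_TOTAL
    ss_model = b @ (X.T @ y) - correction     # mean corrected SS_MODEL
    return ss_model / ss_total

# Text values for the TMT-B data: 17758.00 / 41875.33 = .424.
```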
12. The uncorrected sum of squares of Y, $\sum Y^2 = \mathbf{y}'\mathbf{y}$, contains both the SS associated with the predictor variables $(\beta_1, \beta_2, \ldots, \beta_q)$ and the SS associated with the intercept. The mean corrected SS, $\mathbf{y}'\mathbf{y} - \bar{\mathbf{y}}_n'\bar{\mathbf{y}}_n$, disaggregates these two quantities. Rencher (1998, Sect. 4.3–4.5) gives details of the relationships between uncorrected and mean-corrected SS.
computed from the extra sums of squares approach (Draper & Smith, 1998, pp. 149–160), which requires evaluating the difference between full and restricted model $R^2$s. Define the full model $R^2_{full}$ as the proportion of variation in Y accounted for by all $q_f$ predictors in the model, $X_1, X_2, \ldots, X_{q_f}$. Define a restricted model $R^2_{restricted}$ as the proportion of variability in Y accounted for by a subset of $q_r < q_f$ predictors, say $X_1, X_2, \ldots, X_{q_r}$. Since the full model $R^2_{full}$ documents the proportion of variability in Y accounted for by all the predictors and the restricted model $R^2_{restricted}$ represents the proportion of the variability in Y accounted for by the $q_r$ predictors, the difference between the full and restricted model $R^2$s must represent the unique incremental variation in Y accounted for by those predictors that are not contained in the restricted model. The difference $R^2_{full} - R^2_{restricted}$ is the squared semipartial correlation coefficient. Examples of squared semipartial correlations for the TMT-B example data are presented below.
The increment in variation accounted for is tested with

$$F_{(df_h,\,df_e)} = \frac{R^2_{full} - R^2_{restricted}}{1 - R^2_{full}} \cdot \frac{df_e}{df_h}. \qquad [1.18]$$
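Equation 1.18 is simple enough to evaluate directly; a minimal sketch of ours, using SciPy only for the p-value:

```python
import numpy as np
from scipy import stats

def extra_ss_f(r2_full: float, r2_restricted: float,
               n: int, q_f: int, q_r: int):
    """F-test of Equation 1.18 on the increment R^2_full - R^2_restricted."""
    df_h, df_e = q_f - q_r, n - q_f - 1
    f = ((r2_full - r2_restricted) / (1.0 - r2_full)) * (df_e / df_h)
    return f, stats.f.sf(f, df_h, df_e)

# Whole model test for TMT-B (restricted model = intercept only, R^2 = 0):
print(extra_ss_f(0.424, 0.0, n=40, q_f=3, q_r=0))
# approximately (8.84, .0002), matching F(3, 36) = 8.84, p < .001 in the text
```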
Let $q_f$ be the number of predictors in the full model (exclusive of the unit vector $X_0$), let $q_r$ be the number of predictors in the restricted model, and let $df_h = q_f - q_r$ and $df_e = n - q_f - 1$. If it is assumed that $\varepsilon_i \sim N(0, \sigma^2)$, then the test statistic in Equation 1.18 follows the F distribution with $q_f - q_r$ and $n - q_f - 1$ degrees of freedom. The numerator of the left-most ratio of the F-test is the definition of the squared semipartial correlation. The nature of $R^2_{restricted}$ will be dictated by the hypothesis to be tested, since the hypothesis dictates the constraints to be placed on the full model. If the hypothesis on the whole model, $H_0\!: \beta_1 = \beta_2 = \cdots = \beta_{q_f} = 0$, is desired,¹³ the restricted model will contain only $\beta_0$ with $R^2_{restricted} = 0$, leading to the numerator $R^2_{full} - R^2_{restricted} = R^2_{full}$. A test of a hypothesis on a single regression coefficient, say $H_0\!: \beta_j = k$,¹⁴ is given by the t statistic

$$t = \frac{\hat{\beta}_j - k}{\sqrt{\dfrac{MSE}{SS_{X_j}} \cdot \dfrac{1}{1 - R^2_{X_j \cdot other}}}},$$
13. This is equivalent to the hypothesis $H_0\!: \rho^2_{Y \cdot X_1 X_2 \cdots X_{q_f}} = 0$.

14. The value of k need not be hypothesized to be 0; any theoretically defensible value of k is permissible.
where

$$MSE = \frac{SS_{ERROR}}{n - q_f - 1},$$

$SS_{X_j}$ is the sum of squares of the predictor variable involved in the test, and $1/(1 - R^2_{X_j \cdot other})$ is the variance inflation factor (VIF) that adjusts for the multicollinearity among the predictor variables. For the TMT-B example, the F-test on the whole model $R^2 = .424$ is $F_{(3,36)} = 8.84, p < .001$. The tests of the significance of each of the individual partial regression coefficients for age, education, and gender yielded, respectively, $t_{(36)} = 4.74, p < .001$; $t_{(36)} = -1.15, p = .258$; and $t_{(36)} = .57, p = .575$. Only the age variable is uniquely related to TMT-B performance. The t-test statistics on the individual coefficients are the square roots of the F statistics that would have been obtained by the full- versus restricted-model approach of Equation 1.18. The results of the tests of hypotheses on the values of $\beta_j$ and on the values of their respective partial and semipartial correlations are identical.
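The t statistics can be assembled exactly as the text describes, from the MSE, the predictor SS, and the VIF. The following sketch is ours and assumes the same design matrix layout as the earlier snippets:

```python
import numpy as np

def coef_t_stats(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """t statistics for each partial regression coefficient, built from MSE,
    the predictor sum of squares, and the VIF. Column 0 of X is the unit vector."""
    n, k = X.shape                                  # k = q_f + 1
    b = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ b
    mse = (resid @ resid) / (n - k)                 # SS_ERROR / (n - q_f - 1)
    t = np.empty(k - 1)
    for j in range(1, k):
        xj = X[:, j]
        others = np.delete(X, j, axis=1)            # unit vector + other predictors
        g = np.linalg.solve(others.T @ others, others.T @ xj)
        e_j = xj - others @ g                       # Xj purged of the other predictors
        ss_xj = (xj - xj.mean()) @ (xj - xj.mean())
        r2_other = 1.0 - (e_j @ e_j) / ss_xj        # R^2 of Xj on the others
        vif = 1.0 / (1.0 - r2_other)
        t[j - 1] = b[j] / np.sqrt(mse / ss_xj * vif)
    return t

# Text values for the TMT-B data: t(36) = 4.74 (age), -1.15 (education),
# and .57 (gender).
```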
More general hypotheses on the parameters can be expressed through a contrast matrix L. The whole model hypothesis, for example, is

$$H_0\!: \mathbf{L}\boldsymbol{\beta} = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_{q_f} \end{bmatrix} = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_{q_f} \end{bmatrix} = \mathbf{0}. \qquad [1.20]$$
The hypothesis sum of squares for the general linear test is

$$SS_{HYPOTHESIS} = (\mathbf{L}\hat{\boldsymbol{\beta}})' \left[ \mathbf{L}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{L}' \right]^{-1} (\mathbf{L}\hat{\boldsymbol{\beta}}). \qquad [1.22]$$
Under the assumption that the errors of the model are normally distributed, F will follow the F distribution on $df_h = c$ and $df_e = n - q_f - 1$ degrees of freedom.
The Test of the Whole Model Hypothesis: $H_0\!: \beta_1 = \beta_2 = \beta_3 = 0$ and $\rho^2_{Y \cdot X_1 X_2 X_3} = 0$
For the TMT-B example data, we found the estimated regression coefficients to be

$$\hat{\boldsymbol{\beta}} = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \\ \hat{\beta}_3 \end{bmatrix} = \begin{bmatrix} 65.69 \\ 0.92 \\ -1.87 \\ -4.68 \end{bmatrix},$$
and we desire a test of the hypothesis that the parameters for $X_1$, $X_2$, and $X_3$ are simultaneously equal to 0, $H_0\!: \beta_1 = \beta_2 = \beta_3 = 0$. This statement is also a test of $H_0\!: \rho^2_{Y \cdot X_1 X_2 X_3} = 0$. The general linear test of the full model hypothesis is given by

$$\mathbf{L}\boldsymbol{\beta} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix} = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix} = \mathbf{0},$$

which ignores the intercept. For the contrast matrix L, the inverse of the sum of squares and cross-products matrix among the three predictor variables, $(\mathbf{X}'\mathbf{X})^{-1}$, and the estimates of the parameters $\hat{\boldsymbol{\beta}}$ give the hypothesis SS of Equation 1.22:
$$SS_{HYPOTHESIS} = \left( \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 65.69 \\ 0.92 \\ -1.87 \\ -4.68 \end{bmatrix} \right)' \left( \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 40 & 2{,}339 & 504 & 18 \\ 2{,}339 & 155{,}103 & 29{,}097 & 1{,}059 \\ 504 & 29{,}097 & 6{,}614 & 221 \\ 18 & 1{,}059 & 221 & 18 \end{bmatrix}^{-1} \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \right)^{-1} \left( \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 65.69 \\ 0.92 \\ -1.87 \\ -4.68 \end{bmatrix} \right),$$
which gives the basis for evaluating the $SS_{HYPOTHESIS}$ and the numerator of the F-test. In this model, age is the only significant contributor to the prediction of TMT-B. The test statistic on any unstandardized $\hat{\beta}_j$ is also the test of the significance of the standardized $\hat{\beta}^*_j$ and the semipartial correlation $r_{Y(X_j|X_1 X_2)}$. The tests of hypotheses on sets of predictors are likewise identical for unstandardized and standardized partial regression coefficients and the multiple semipartial correlations associated with each set. These equivalences no longer hold when more complex hypotheses are tested by the general linear test.
A contrast between the age and education coefficients provides an example.¹⁵ Substituting the estimates $\hat{\beta}_j$ into Equation 1.22, we find
$$\mathbf{L}\hat{\boldsymbol{\beta}} = \begin{bmatrix} 0 & 1 & -1 & 0 \end{bmatrix} \begin{bmatrix} 65.69 \\ 0.92 \\ 1.87 \\ -4.68 \end{bmatrix} = [-.95],$$
and $\left[\mathbf{L}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{L}'\right]^{-1} = 259.53$, with $\hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}} = 24117.33$. The F-test of Equation 1.23 is

$$F_{(1,36)} = \frac{234.71}{24117.33} \cdot \frac{36}{1} = 0.32, \quad p = .573.$$
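The general linear test of Equation 1.22 is compact enough to state in a few lines. The sketch below is our illustration, not the original authors' code:

```python
import numpy as np
from scipy import stats

def general_linear_test(X: np.ndarray, y: np.ndarray, L: np.ndarray):
    """SS_HYPOTHESIS of Equation 1.22 and the F-test of H0: L beta = 0."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    b = xtx_inv @ (X.T @ y)
    Lb = L @ b
    ss_hyp = Lb @ np.linalg.inv(L @ xtx_inv @ L.T) @ Lb
    e = y - X @ b
    df_h, df_e = L.shape[0], n - k
    f = (ss_hyp / df_h) / ((e @ e) / df_e)
    return ss_hyp, f, stats.f.sf(f, df_h, df_e)

# Age versus (reversed) education contrast from the text:
L = np.array([[0.0, 1.0, -1.0, 0.0]])
# The text reports L @ b = -.95, SS_HYPOTHESIS = 234.71, and F(1, 36) = 0.32.
```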
15. Age has a positive relationship to TMT-B; performance deteriorates with increasing age. Conversely, TMT-B has a negative relationship with increasing years of education. Contrasts between regression coefficients are sensitive to both magnitude and direction, and a choice must be made between testing differences in magnitude only, or testing differences in both magnitude and direction. Theoretical considerations based on substantive knowledge should be brought to bear to make this choice. For the age versus education comparison illustrated here, only the magnitude of the effect is of interest. Reversing the scoring of the education variable equates the sign of both age and education coefficients; hence the contrast is one of magnitude and not direction. If there is theoretical justification to leave the signs of the regression coefficients in the original scoring of age and education, then a test of both magnitude and direction would result. The F-test on this contrast is $F_{(1,36)} = 3.01, p = .091$, still a nonsignificant result.
[Figure 1.3. Comparison of Unstandardized Partial Slopes, Standardized Partial Slopes, and Semipartial Correlations. Left panel: TMT-B against Age adjusted for Education and Gender (b = .919, beta = .608, semipartial r = .599, semipartial r² = .359). Right panel: TMT-B against Education (reversed) adjusted for Age and Gender (b = −1.870, beta = −.148, semipartial r = −.145, semipartial r² = .021).]
16. The scoring of the education variable was also reversed in this analysis to constrain the sign of each standardized slope to a positive value. The contrast is therefore a test of the difference in magnitude of semipartial correlations.
For the standardized coefficients, the contrast¹⁶ is

$$\mathbf{L}\hat{\boldsymbol{\beta}}^* = \begin{bmatrix} 1 & -1 & 0 \end{bmatrix} \begin{bmatrix} 0.61 \\ 0.15 \\ -0.07 \end{bmatrix} = .460,$$

with $SS_{HYPOTHESIS} = 3.397$, $SS_{ERROR} = 22.461$,¹⁷ and $F_{(1,36)} = 5.44, p = .025$. The standardized parameter estimates differ significantly by the hypothesis test applied to standardized coefficients.¹⁸ The reason for the differing results is a consequence of the differences in the scales of measurement of the predictor variables; it can be shown that the jth standardized coefficient is the ratio of its semipartial correlation to the square root of the proportion of variation in $X_j$ not accounted for by the remaining predictors $X_{j'}$ (i.e., the tolerance) in the full model, that is,

$$\hat{\beta}^*_j = \frac{r_{Y(X_j|X_{j'})}}{\sqrt{1 - R^2_{j \cdot other}}}.$$
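This identity is easy to check numerically. In the sketch below (ours), the tolerances are back-solved from the Figure 1.3 values rather than given in the text, so they should be read as approximations:

```python
import numpy as np

def beta_star_from_semipartial(semipartial_r: float, r2_xj_other: float) -> float:
    """Standardized slope as semipartial correlation over sqrt(tolerance)."""
    tolerance = 1.0 - r2_xj_other   # variation in Xj not shared with the others
    return semipartial_r / np.sqrt(tolerance)

# Figure 1.3 values: a tolerance near .97 maps the age semipartial .599 to
# beta* = .608; a similar tolerance maps the education value -.145 to -.148.
```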
17. The error sum of squares in standard score form is $\mathbf{z}_Y'\mathbf{z}_Y - \hat{\boldsymbol{\beta}}^{*\prime}\mathbf{Z}_X'\mathbf{z}_Y = (n-1)(1 - R^2_{Y \cdot X_1 X_2 X_3})$.
18. The test of the difference between two standardized regression coefficients from a regression analysis is defined as

$$t = \frac{\hat{\beta}^*_1 - \hat{\beta}^*_2}{\sqrt{MSE\left(\mathbf{L}\mathbf{R}_{XX}^{-1}\mathbf{L}'\right)}}, \quad \text{where} \quad MSE = \frac{1 - R^2_{Y \cdot X_1 X_2 X_3}}{n - q_f - 1}.$$
Substituting the definitions

$$\hat{\beta}^*_1 = \frac{r_{Y(X_1|X_2)}}{\sqrt{1 - R^2_{1 \cdot 2}}} \quad \text{and} \quad \hat{\beta}^*_2 = \frac{r_{Y(X_2|X_1)}}{\sqrt{1 - R^2_{1 \cdot 2}}}$$

into t sets the numerator to

$$\frac{r_{Y(X_1|X_2)} - r_{Y(X_2|X_1)}}{\sqrt{1 - R^2_{1 \cdot 2}}},$$

while the denominator reduces to

$$\sqrt{\frac{MSE \cdot 2(1 + r_{12})}{1 - R^2_{1 \cdot 2}}}.$$

The $\sqrt{1 - R^2_{1 \cdot 2}}$ terms cancel, giving
$$t = \frac{r_{Y(X_1|X_2)} - r_{Y(X_2|X_1)}}{\sqrt{MSE\left(2(1 + r_{12})\right)}}.$$
Hence, the test of the hypothesis $\hat{\beta}^*_1 - \hat{\beta}^*_2 = 0$ is a test of the difference between semipartial correlation coefficients. In this interpretation, approximately 36% of the variance in TMT-B is accounted for by age, while about 2% of the variance in TMT-B is accounted for by education. The absolute values of the two correlations are significantly different from one another, while the absolute values of the two unstandardized slopes do not differ significantly. The difference between unstandardized rates of change is masked by differences in the variances of the predictors.
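The general form of the footnote 18 test, stated for an arbitrary contrast over the standardized estimates, can be sketched as follows (ours; Rxx is the predictor correlation matrix):

```python
import numpy as np

def std_beta_difference_t(beta_star: np.ndarray, Rxx: np.ndarray,
                          r2_full: float, n: int, contrast) -> float:
    """t-test of footnote 18 for H0: beta*_1 - beta*_2 = 0."""
    q_f = len(beta_star)
    mse = (1.0 - r2_full) / (n - q_f - 1)   # MSE in standard score form
    L = np.asarray(contrast, dtype=float)
    se = np.sqrt(mse * (L @ np.linalg.inv(Rxx) @ L))
    return (L @ beta_star) / se

# For the TMT-B contrast reported above (L beta* = .460, F(1, 36) = 5.44),
# t = sqrt(5.44), approximately 2.33 on 36 degrees of freedom.
```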
Recounting these details here sets the stage for the generalization of these same analytic concepts to those instances where more than one dependent variable is to be analyzed simultaneously. Models with p > 1 response variables are classified as multivariate models that can be treated with the same four-step process: the specification of the multivariate model, estimation of its parameters, identification of measures of strength of association, and definition of appropriate tests of significance. We pursue these topics in the chapters that follow.