Course Notes
Longitudinal Data Analysis with Discrete and Continuous Responses Course Notes was developed by
Mike Patetta. Additional contributions were made by Chris Daman and Jill Tao. Editing and production
support was provided by the Curriculum Development and Support Department.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks
of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and
product names are trademarks of their respective companies.
Longitudinal Data Analysis with Discrete and Continuous Responses Course Notes
Copyright © 2017 SAS Institute Inc. Cary, NC, USA. All rights reserved. Printed in the United States
of America. No part of this publication may be reproduced, stored in a retrieval system, or transmitted,
in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior
written permission of the publisher, SAS Institute Inc.
Book code E70979, course code LWLONG42/LONG42, prepared date 21Mar2017. LWLONG42_001
ISBN 978-1-63526-093-9
To learn more…
For information about other courses in the curriculum, contact the SAS
Education Division at 1-800-333-7660, or send e-mail to [email protected].
You can also find this information on the web at
http://support.sas.com/training/ as well as in the Training Course Catalog.
For a list of other SAS books that relate to the topics covered in these
course notes, USA customers can contact the SAS Publishing Department
at 1-800-727-3228 or send e-mail to [email protected]. Customers outside
the USA, please contact your local SAS office.
Also, see the SAS Bookstore on the web at
http://support.sas.com/publishing/ for a complete list of books and a
convenient order form.
Chapter 1  Introduction to Longitudinal Data Analysis
Copyright © 2017, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
1.1 Longitudinal Data Analysis Concepts
The objectives of longitudinal data analysis are to examine and compare responses over time.
The defining feature of a longitudinal data model is its ability to study changes over time within subjects
and changes over time between groups. For example, longitudinal models can estimate individual-level
(subject-specific) regression parameters and population-level regression parameters.
Longitudinal data sets differ from time series data sets because longitudinal data usually consist of a large
number of short series of time points, whereas time series data sets usually consist of a single, long
series of time points (Diggle, Heagerty, Liang, and Zeger 2002). For example, the monthly average
of the Dow Jones Industrials Index for several years is a time series data set, and the efficacy of a drug
treatment over time for several patients is a longitudinal data set.
Cross-Sectional Analysis
To illustrate the ability of longitudinal models to study changes over time, consider cross-sectional studies
in which a single outcome is measured for each subject. In the slide above where each point represents
one subject, blood pressure appears to be positively related to age. However, you can reach no
conclusions regarding blood pressure changes over time within subjects.
Longitudinal Analysis
Now expand the cross-sectional study of baseline data to a longitudinal study with repeated measurements
over time. The baseline data still show a positive relationship between blood pressure and age. However,
now you can distinguish changes over time within subjects from differences among subjects at their
baseline or initial starting values. Cross-sectional models cannot make this distinction (Diggle, Heagerty,
Liang, and Zeger 2002).
An example of a longitudinal study is the Baltimore Longitudinal Study of Aging (Shock et al. 1984).
This is a multidisciplinary observational study in which participants return approximately every two years
for three days of biomedical and psychological examinations. One objective of the study is to look for
markers that can detect prostate cancer at an early stage. One marker with this potential is the prostate-
specific antigen (PSA), which is an enzyme produced by both normal and cancerous prostate cells.
Its level is related to the volume of prostate tissue. However, an elevated PSA level is not necessarily an
indicator of prostate cancer because patients with benign prostatic hyperplasia can also have increased
PSA levels. Therefore, researchers have hypothesized that the rate of change in the PSA level might
be a more accurate method of detecting prostate cancer in the early stages of the disease. A longitudinal
model can address this hypothesis.
Another example of a longitudinal study is the Indonesian children’s health study (Sommer 1982).
In this study, more than 3000 children had quarterly medical exams for up to six visits to assess whether
they suffered from respiratory or diarrheal infection and xerophthalmia. One objective of the study was
to determine whether children who had a vitamin A deficiency were at increased risk of respiratory
infection.
Special methods of statistical analysis are needed for longitudinal data because the set of measurements
on one subject tends to be correlated, measurements on the same subject close in time tend to be more
highly correlated than measurements far apart in time, and the variances of longitudinal data often change
with time. These potential patterns of correlation and variation might combine to produce a complicated
covariance structure. This covariance structure must be taken into account to draw valid statistical
inferences. Therefore, standard regression and ANOVA models might produce invalid results because two
of the parametric assumptions (independent observations and equal variances) might not be valid.
Variance-Covariance Matrix for OLS Regression

Subject   X    Y
   1      4   10
   2      2    7
   3      6   12
   4      8   11

(Slide: the 4-by-4 variance-covariance matrix has σ² in each diagonal cell and 0 in each off-diagonal cell.)
To illustrate the differences between longitudinal data models and other types of models, consider
the variance-covariance matrix of the response variable for cross-sectional data. If you have four
observations, you would have a 4-by-4 variance-covariance matrix with the variances on the main
diagonal and the covariances (a measure of how two observations vary together) on the off-diagonals.
In linear regression with continuous responses, the assumptions are that the responses have equal
variances and are independent. Therefore, the variances along the diagonal are equal and the covariances
along the off-diagonals are 0.
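Although the course software is SAS, this matrix is easy to sketch outside SAS. The following Python fragment builds the 4-by-4 matrix under the OLS assumptions; the value σ² = 2 is arbitrary, chosen only for illustration:

```python
import numpy as np

sigma2 = 2.0                    # arbitrary common error variance (illustration only)
V = sigma2 * np.eye(4)          # 4-by-4: sigma^2 on the diagonal, 0 elsewhere

print(V[0, 0], V[0, 1])         # variance of obs 1, covariance of obs 1 and 2
```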
Longitudinal Data

Subject   X   Y (time 1)   Y (time 2)   Y (time 3)
   1      4       10            6            6
   2      2        7            5            3
   3      6       12            9            8
   4      8       11           14           16
With longitudinal data, there are now multiple measurements taken on each subject. You not only can
examine the differences between subjects, but you can also examine the change within subjects across
time. There are still only four subjects and the response is continuous. How does this change your
variance-covariance matrix?
Variance-Covariance Matrix for Longitudinal Data

Subject   Time   X    Y
   1       1     4   10
   1       2     4    6
   1       3     4    6
   2       1     2    7
   2       2     2    5
   2       3     2    3
   3       1     6   12
   3       2     6    9
   3       3     6    8
   4       1     8   11
   4       2     8   14
   4       3     8   16

(Slide: the 12-by-12 variance-covariance matrix is block-diagonal, with one 3-by-3 block per subject — V1, V2, V3, V4 — on the diagonal and zeros elsewhere.)
For longitudinal data models fit in the MIXED procedure, the number of observations is not the
number of subjects but rather the number of measurements taken on all the subjects. Because there are
three repeated measurements on each subject, you now have 12 observations and a 12-by-12 variance-
covariance matrix. For a simple longitudinal model, the matrix is now a block-diagonal matrix in which
the observations within each block (the block corresponds to a subject) are assumed to be correlated and
the observations outside of the blocks are assumed to be independent. In other words, the subjects are still
assumed to be independent of each other and the measurements within each subject are assumed to
be correlated.
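The block-diagonal structure can be sketched numerically. The following Python fragment uses a compound-symmetry block purely for illustration; the variance σ² = 4 and correlation ρ = 0.5 are arbitrary assumptions, not estimates from any data:

```python
import numpy as np

n_subjects, n_times = 4, 3
sigma2, rho = 4.0, 0.5          # arbitrary within-subject variance and correlation

# one subject's 3-by-3 block: equal variances, equal within-subject covariances
block = sigma2 * ((1 - rho) * np.eye(n_times) + rho * np.ones((n_times, n_times)))

# 12-by-12 block-diagonal matrix: correlated within subjects, zero across subjects
V = np.kron(np.eye(n_subjects), block)

print(V.shape)           # (12, 12)
print(V[0, 1], V[0, 3])  # within-subject covariance vs. across-subject zero
```

For ordinary least squares, the corresponding matrix would simply be `sigma2 * np.eye(12)`, with every off-diagonal cell equal to 0.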
Effect on Time-Independent Predictor Variables

(Slide: sampling distributions of the estimate β̂ when positive correlation is ignored versus accounted for.)
If the observations are positively correlated, which often occurs with longitudinal data, then the variances
of the time-independent predictor variables (variables that estimate the group effect or between-subject
effect such as gender, race, treatment, and so on) are underestimated if the data are analyzed
as if the observations are independent. In other words, the Type I error rate (rejecting the null hypothesis
when it is true, also known as a false positive) is inflated for these variables (Dunlop 1994).
Details

Dunlop (1994) shows that the variance of the time-independent predictor variable is (2σ²/n)(1 + ρ), where
ρ is the correlation between the errors within a subject. If the observations are positively correlated within
subject, then the variance of the time-independent predictor variable will be underestimated if the data are
analyzed as if all observations are independent.
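Dunlop's variance-inflation result for a group effect can be checked with a small Monte Carlo sketch. The values here (σ² = 1, ρ = 0.6, two measurements per subject, 200 subjects per group) are arbitrary illustration choices, not course data:

```python
import numpy as np

rng = np.random.default_rng(42)
sigma2, rho = 1.0, 0.6          # assumed error variance and within-subject correlation
m, reps = 200, 5000             # m subjects per group, 2 measurements each

def group_means():
    # compound symmetry via a shared subject effect plus independent noise
    b = rng.normal(0.0, np.sqrt(rho * sigma2), size=(reps, m, 1))
    e = rng.normal(0.0, np.sqrt((1 - rho) * sigma2), size=(reps, m, 2))
    return (b + e).mean(axis=(1, 2))

diff = group_means() - group_means()     # group-effect estimator, one value per replicate
empirical = diff.var()
naive = sigma2 / m                       # variance if all observations were independent
corrected = naive * (1 + rho)            # Dunlop: (2*sigma2/n)(1 + rho), n = 2m subjects

print(round(empirical / naive, 2))       # close to 1 + rho = 1.6, not 1
```

The empirical variance matches the corrected formula, so treating the observations as independent understates the group-effect variance by roughly the factor (1 + ρ).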
Effect on Time-Dependent Predictor Variables

(Slide: sampling distributions of the estimate β̂ when positive correlation is accounted for versus ignored.)
For time-dependent predictor variables (variables that measure the time effect or within-subject effect
such as how the measurements change over time), ignoring positive correlation leads to a variance
estimate that is too large. In other words, the Type II error rate (failing to reject the null hypothesis when
it is false, also known as a false negative) is inflated for these variables (Dunlop 1994). Because
the variances of the group effects will be underestimated and the variance of the time effects will
be overestimated if positive correlation is ignored, it is evident that correlated outcomes must
be addressed to obtain valid analyses.
Details

Dunlop (1994) shows that the variance of the time-dependent predictor variable is (2σ²/n)(1 − ρ), and
in this situation, ignoring the positive correlation will lead to a variance estimate that is too large.
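The mirror-image check for the time effect, under the same illustrative assumptions (σ² = 1, ρ = 0.6, two measurements per subject; arbitrary values, not course data):

```python
import numpy as np

rng = np.random.default_rng(7)
sigma2, rho = 1.0, 0.6          # assumed error variance and within-subject correlation
n, reps = 200, 5000             # n subjects, 2 measurements each

# compound symmetry: shared subject effect plus independent noise
b = rng.normal(0.0, np.sqrt(rho * sigma2), size=(reps, n, 1))
e = rng.normal(0.0, np.sqrt((1 - rho) * sigma2), size=(reps, n, 2))
y = b + e

time_effect = (y[:, :, 1] - y[:, :, 0]).mean(axis=1)   # mean within-subject change
empirical = time_effect.var()
naive = 2 * sigma2 / n          # variance if all observations were independent
corrected = naive * (1 - rho)   # Dunlop: (2*sigma2/n)(1 - rho)

print(round(empirical / naive, 2))   # close to 1 - rho = 0.4: the naive value is too large
```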
The linear mixed model fit by PROC MIXED will be used in this course. Its
strengths are as follows:
• Handles unbalanced data with unequally spaced time points and subjects
observed at different time points
• Uses all the complete time measurements in the analysis
• Directly models the covariance structure
• Provides valid standard errors and efficient statistical tests
The linear mixed model allows a very flexible approach to modeling longitudinal data. The data structure
has one observation per measurement, so the number of observations equals the total number of
measurements across all subjects. This means that the data do not have to be balanced.
An advantage of fitting linear mixed models is that PROC MIXED uses all the complete time
measurements in the analysis. This method differs from complete case analysis in which any observation
with a missing value across any of the time measurements is dropped from the analysis. The method
PROC MIXED uses, called a likelihood-based ignorable analysis, leads to a valid analysis when the
missing data are MAR (missing at random, which is a less restrictive assumption than missing completely
at random (MCAR)). If the probability of missing for a variable X is related to the values of X itself, even
after controlling for the other variables, then the value is not missing at random (NMAR). In other words,
the probability of missing depends on the unobserved values. The ignorable analysis is not valid and more
complex modeling is required (Verbeke and Molenberghs 1997).
PROC MIXED offers a wide variety of covariance structures. This enables the user to directly address
the within-subject correlation structure and incorporate it into a statistical model. By selecting a
parsimonious covariance model that adequately accounts for within-subject correlations, the user can
avoid the problems associated with univariate and multivariate ANOVA using PROC GLM (Littell,
Stroup, and Freund 2002).
A value is missing at random (MAR) if the probability that it is missing on a variable X is related to some
other measured variable (or variables) in the model but does not depend on any unobserved data after
controlling for the observed data. With the MAR assumption, a systematic relationship exists between one
or more measured variables and the probability of missing data. MAR is sometimes referred
to as ignorable missing since the missing data mechanism can be ignored and does not need to be taken
into account as part of the modeling process.
A value is missing completely at random (MCAR) if the probability that it is missing is independent
of the unobserved values. The formal definition of MCAR requires that the probability of missing data
on a variable X is unrelated to the values of X itself. In other words, the observed data values are a simple
random sample of the values that you would have observed if the data had been complete.
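These distinctions can be illustrated with a small simulation. Everything here is invented for illustration: the variable names, the sigmoid missingness models, and the parameter values are assumptions, not part of the course data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x_prev = rng.normal(size=n)              # earlier measurement, always observed
x = 0.8 * x_prev + rng.normal(size=n)    # current measurement, possibly missing

sigmoid = lambda z: 1 / (1 + np.exp(-z))
p_mcar = np.full(n, 0.3)                 # MCAR: unrelated to anything
p_mar = sigmoid(x_prev - 1)              # MAR: depends only on the observed x_prev
p_nmar = sigmoid(x - 1)                  # NMAR: depends on the unobserved x itself

obs_mcar = rng.random(n) >= p_mcar
obs_mar = rng.random(n) >= p_mar

# MCAR: the observed values are a simple random sample of the complete data
print(round(x[obs_mcar].mean(), 2))      # close to the full-data mean of 0
# MAR: the observed values are marginally biased, but the mechanism is
# ignorable because it is fully explained by the observed x_prev
print(round(x[obs_mar].mean(), 2))       # below 0
```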
Model-Building Strategies
The first step in any model-building process is to conduct a thorough exploratory data analysis.
For longitudinal data this involves plotting the individual measurements over time and fitting a smoothing
spline over time. Plotting different groups over time and illustrating cross-sectional and longitudinal
relationships are also important steps in exploratory data analysis.
The second step is to fit a complex mean model in PROC MIXED and output the ordinary least squares
residuals. These residuals can be used to create a sample variogram, and the pattern in the sample
variogram can be helpful in selecting a covariance structure.
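As a rough sketch of what the sample variogram computes, each pair of residuals within a subject contributes one point: half the squared residual difference plotted against the time lag. The residuals and subject structure below are simulated purely for illustration:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)

# hypothetical OLS residuals: 50 subjects, 6 irregularly spaced times each
subjects = [(np.sort(rng.uniform(0, 5, 6)), rng.normal(size=6)) for _ in range(50)]

lags, ordinates = [], []
for times, resid in subjects:
    for (t1, r1), (t2, r2) in combinations(zip(times, resid), 2):
        lags.append(t2 - t1)                   # time separation u
        ordinates.append(0.5 * (r2 - r1)**2)   # variogram ordinate v(u)

# smoothing the ordinates against the lags (for example, with binned means)
# reveals how within-subject covariance decays with time separation
print(len(lags))   # 50 subjects * C(6, 2) pairs = 750
```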
The third step is to fit the linear mixed model in PROC MIXED using the selected covariance structure.
Eliminating unnecessary terms and fitting a parsimonious model are important steps in the model-building
process. After a candidate model is selected, the final steps of the model-building process are to evaluate
model assumptions and to identify potential outliers.
1.2 Exploratory Data Analysis
Recommendations
The first step in any model-building process is exploratory data analysis. In this step you create graphs
that expose the patterns relevant to the scientific question. The recommendations on the slide above,
given by Diggle, Heagerty, Liang, and Zeger (2002), are used to produce the graphs in this section
and the section on diagnostics.
Individual Profiles
A scatter plot of the response versus time is a useful graph. Connecting the repeated measurements
for each subject over time shows you whether there is a discernible pattern common to most subjects.
These individual profiles can also provide some information about between-subject variability.
Individual Profiles
For example, the slide above is a graph of weight over time for several subjects. These individual profiles
illustrate several important patterns (Diggle, Heagerty, Liang, and Zeger 2002).
1. All of the subjects are gaining weight.
2. The subjects that are the heaviest at the beginning of the study tend to be the heaviest throughout
the study.
3. The variability of the measurements is smaller at the beginning of the study compared
to the end of the study.
Group Profiles
Besides plotting the response over time, it is also useful to include different subgroups on the same graph
to illustrate the relationship between the response and an explanatory variable over time. For example,
in the slide above, it appears that both males and females have decreasing blood pressure over time.
However, the slope for the males seems to be more pronounced than the slope for the females, which
might indicate an interaction between gender and time.
CD4+ Cell Numbers
Example: The human immunodeficiency virus (HIV) causes AIDS by attacking an immune cell called
the CD4+ cell, which facilitates the body’s ability to fight infection. An uninfected person has
approximately 1100 cells per milliliter of blood. Because CD4+ cells decrease in number from
the time of infection, a person’s CD4+ cell count can be used to monitor disease progression.
A subset of the Multicenter AIDS Cohort Study (Kaslow et al. 1987) was obtained for 369
infected men to examine CD4+ cell counts over time. The data are stored in a SAS data set
called long.aids.
These are the variables in the data set:
CD4          CD4+ cell count
time         time in years since seroconversion (time when HIV becomes detectable)
age          age in years relative to an arbitrary origin
cigarettes   packs of cigarettes smoked per day
drug         recreational drug use (1=yes, 0=no)
partners     number of partners relative to an arbitrary origin
depression   CES-D score (a depression scale)
id           subject identification number
The data were obtained with permission from Professor Peter Diggle’s website.
The researchers hope to characterize the typical time course of CD4+ cell depletion. This information can
clarify the relationship between HIV and the immune system, which might be helpful when counseling
infected men.
This observational longitudinal study has unbalanced data because the measurements can occur anytime
and the number of measurements can vary across subjects. The linear mixed model using PROC MIXED
is the model of choice for this analysis.
Example: Examine the data in long.aids by producing a line listing report and descriptive statistics for
the numeric variables. Also produce graphs of the individual profiles and the group profiles.
/* long01d01.sas */
options nodate;
proc print data=long.aids(obs=17);
var id cd4 time age cigarettes drug partners depression;
title 'Line Listing of CD4+ Data';
run;
The variable age is a time-independent variable and the variables cigarettes, drug, partners,
and depression are time-dependent variables. The data are unbalanced because the subjects are measured
at different time points and the number of measurements is different across subjects.
proc means data=long.aids n min max mean median std;
var cd4 time age cigarettes drug partners depression;
title 'Descriptive Statistics for CD4+ Data';
run;
The outcome variable is CD4 with a range of 10 to 3184. The variable time has a range of nearly 3 years
before seroconversion (time when HIV becomes detectable) to 5.5 years after seroconversion.
The variable age is in years relative to an arbitrary origin. The variable cigarettes measures the number
of packs smoked per day with a range of 0 (non-smokers) to 4 (heavy smokers). The variable drug
is a binary variable where 1 means the subject used recreational drugs since the time of the last CD4+
measurement. The mean of 0.76 shows that in 76% of the observations, the subject reported recreational
drug use since the last CD4+ cell count measurement. The variable partners measures
the number of partners relative to an arbitrary origin. The variable depression is a measure of depressive
symptoms where a higher score indicates greater depressive symptoms.
To compute descriptive statistics aggregated by subject, use the MEANS procedure to create an output
data set of subject-level means. A DATA step is then used to create a new variable called druguse that
indicates whether the subject used recreational drugs at any time during the study period.
proc means data=long.aids noprint nway;
class id;
var cd4 age cigarettes drug partners depression;
output out=subject mean=avgid_cd4 avgid_age
avgid_cigarettes avgid_drug avgid_partners avgid_depression;
run;
data subject;
set subject;
druguse=(avgid_drug gt 0);
run;
The _FREQ_ variable indicates that 369 subjects participated in the study. The average number
of repeated measures was 6.4 with a range of 1 to 12. The average of the average CD4+ cell count was
773.77 with one subject having the lowest average of 245.6 and another subject having the highest
average of 1979.5. The variable druguse indicates that 90% of the participants used recreational drugs
during the study period. The average of the average depression score was 2.5 with one subject having
the lowest average of –6.8 and another subject having the highest average of 40.5.
To create a graph of individual profiles, plot CD4 versus time by subject identification number.
proc sgplot data=long.aids nocycleattrs noautolegend;
series y=cd4 x=time / group=id
lineattrs=(color=blue pattern=1);
xaxis values=(-3 to 5.5 by 0.5) label='Years since Seroconversion';
yaxis values=(0 to 3500 by 500) label='CD4 Cell Counts';
title 'Individual Profiles of the CD4+ Data';
run;
Selected PROC SGPLOT statement options:
CYCLEATTRS | NOCYCLEATTRS specifies whether plots are drawn with unique attributes in the
graph. By default, the SGPLOT procedure automatically assigns
unique attributes in many situations, depending on the types of
plots that you specify. If the plots do not have unique attributes
by default, then the CYCLEATTRS option assigns unique
attributes to each plot in the graph. The NOCYCLEATTRS
option prevents the procedure from assigning unique attributes.
NOAUTOLEGEND disables automatic legends from being generated. By default,
legends are created automatically for some plots, depending
on their content. This option has no effect if you specify
a KEYLEGEND statement.
LINEATTRS= style-element specifies the appearance of the series line. You can specify the
appearance by using a style element or by using suboptions.
If you specify a style element, you can in addition specify
suboptions to override specific appearance attributes.
The individual profiles plot is essentially useless. This is a common problem when there are many
subjects in a data set.
A more meaningful plot is an overlay plot of the individual profiles and the average trend. Therefore,
a smoothed line is fitted using a penalized B-spline curve. The individual profiles serve as a light
background and the average trend is a dark line in the foreground. This strategy follows Tufte (1990),
who suggested that communication of statistical information is enhanced by adding detail in the background.
proc sgplot data=long.aids nocycleattrs noautolegend;
   series y=cd4 x=time / group=id transparency=0.5
          lineattrs=(color=cyan pattern=1);
   pbspline y=cd4 x=time;
run;
NKNOTS= specifies the number of evenly spaced internal knots. The default is 100.
Selected LINEATTRS= option:
THICKNESS= specifies the thickness of the line. You can also specify the unit of measure.
The default unit is pixels.
The average trend shows that the CD4+ cell count is fairly constant around 1000 but drops off around
the time of seroconversion. The rate of CD4+ cell loss seems to be more rapid immediately after
seroconversion. The relationship between CD4+ cell counts and time seems to be cubic in nature.
To highlight the group profiles, smoothed lines representing different subgroups were fitted. The first
graph is by recreational drug usage.
proc format;
value druggrp 0='no recreational drug use'
1='recreational drug use';
run;
There seems to be very little difference in the trends of the two recreational drug groups.
To define the cigarette usage subgroups, collapse the levels of cigarettes into three groups. Observations
with no cigarette usage are in group 1, observations with one to two packs smoked per day are in group 2,
and observations with three to four packs smoked per day are in group 3. The new variable ciggroup is
added to long.aids for further use.
data long.aids;
set long.aids;
ciggroup=1*(cigarettes=0)+2*(0<cigarettes<=2)+3*(2<cigarettes<=4);
run;
proc format;
value cgroup 1='non-smoker'
2='1 to 2 packs per day'
3='3 or more packs per day';
run;
There seems to be a difference between the cigarette usage subgroups. Heavy cigarette users seem to have
a much more rapid rate of CD4+ cell loss compared to non-smokers. The difference in the smoothed lines
might indicate a time by cigarette interaction.
To define the age subgroups, collapse the levels of age into four groups. Observations in the first quartile
are in group 1, observations in the second quartile are in group 2, observations in the third quartile are in
group 3, and observations in the fourth quartile are in group 4.
proc rank data=long.aids groups=4 out=ageranks;
var age;
ranks agegroup;
run;
proc format;
value quartile 0='1st quartile'
1='2nd quartile'
2='3rd quartile'
3='4th quartile';
run;
RANKS names the group indicators in the OUT= data set. If the RANKS statement is omitted,
then the group indicators replace the VAR variables in the OUT= data set.
There seems to be no difference between the four age groups with regard to the trend of CD4+ cell loss.
To define the number of partner subgroups, collapse the levels of partners into four groups.
There seems to be no difference between the four partner groups with regard to the trend of CD4+ cell
loss.
To define the depression subgroups, collapse the levels of depression into four groups.
proc rank data=long.aids groups=4 out=depressranks;
var depression;
ranks depressgroup;
run;
There seems to be no difference between the four depression groups with regard to the trend of CD4+ cell
loss.
One of the recommendations in exploratory data analysis is to display both the cross-sectional and
longitudinal relationships between the response variable and the time-dependent explanatory variables.
The cross-sectional relationship can be displayed by a scatter plot of the baseline (or initial value) CD4+
cell count versus the baseline explanatory variable values. The longitudinal relationship can be displayed
by a scatter plot of the change in CD4+ cell counts (Y at time t – Y at time 1) versus the change in the
explanatory variable values (X at time t – X at time 1). Fitting a smooth curve in the scatter plot can
indicate whether there is evidence of a relationship (Diggle, Heagerty, Liang, and Zeger 2002).
Example: Create five cross-sectional scatter plots of the baseline CD4+ cell counts versus the baseline
of age, recreational drug use, cigarettes, depression score, and number of partners. Also create
four longitudinal scatter plots of the change in CD4+ cell counts versus the change in
recreational drug use, cigarettes, depression score, and number of partners. Fit a penalized
B-spline curve in the plots with continuous covariates and a regression line in the plots with
binary covariates.
/* long01d02.sas */
data aids1 aids2;
   set long.aids;
   by id;
   retain basecd4 basedrug basedepress basecig basepart;
   if first.id then
      do;
         basecd4=cd4;
         basedrug=drug;
         basedepress=depression;
         basecig=cigarettes;
         basepart=partners;
         output aids1;
      end;
   if not first.id then
      do;
         chngcd4=cd4-basecd4;
         chngdrug=drug-basedrug;
         chngdepress=depression-basedepress;
         chngcig=cigarettes-basecig;
         chngpart=partners-basepart;
         output aids2;
      end;
run;
The first step is to create the baseline variables (in aids1) and the difference between time t and baseline
(in aids2). Because the data are sorted by subject id and time, BY-group processing is used to identify
the first observation for each subject id. The first observation is used to assign the baseline variable
values. The RETAIN statement retains the baseline values across iterations of the DATA step. These
baseline values are used to create the change in CD4+ cell count and the change in the covariates.
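The plots described above can be produced with PROC SGPLOT. A minimal sketch for one continuous covariate and one binary covariate follows; the titles and the choice of covariates are illustrative:

proc sgplot data=aids1;
   pbspline y=basecd4 x=basedepress;   /* penalized B-spline for a continuous covariate */
   title 'Baseline CD4+ Cell Count versus Baseline Depression';
run;

proc sgplot data=aids2;
   reg y=chngcd4 x=chngdrug;           /* regression line for a binary covariate */
   title 'Change in CD4+ Cell Count versus Change in Drug Use';
run;

The PBSPLINE and REG statements display the data markers along with the fitted curve or line by default.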
There seems to be a slight positive relationship between baseline CD4+ cell counts and the age of
the patient.
There seems to be a slight positive relationship between baseline CD4+ cell count and baseline
recreational drug use.
There seems to be a slight upward trend between the baseline values of CD4+ cell counts and the baseline
values of the depression scores.
The graph shows a positive relationship between the baseline values of CD4+ cell counts and the baseline
values of number of packs of cigarettes smoked per day. It seems that heavy smokers have higher CD4+
cell counts than non-smokers. This was also shown in the cigarette subgroup profile plot.
There is little evidence of a relationship between baseline CD4+ cell counts and baseline number
of partners. The uptick in baseline CD4+ at the far left of the plot is due to only one observation.
There seems to be no relationship between the change in CD4+ cell counts and the change in recreational
drug use. The only noticeable pattern is that the patients who had no change in their recreational drug use
had the smallest decreases in CD4+ cell counts.
There is some evidence that there is a negative relationship between the change in CD4+ cell counts
and the change in depression score. This implies that a decrease in CD4+ cell counts is associated with
an increase in depression.
There is some evidence that there is a positive relationship between the change in CD4+ cell counts
and the change in the number of packs smoked per day. This implies that a decrease in CD4+ cell counts
is associated with a decrease in smoking.
There seems to be a strong positive relationship between the change in CD4+ cell counts and the change
in the number of partners. This implies that a decrease in CD4+ cell counts is associated with a decrease
in the number of partners.
• There seems to be a cubic relationship between CD4+ cell count and time.
• The group profile plots show a time by cigarette usage interaction.
• The cross-sectional plots show a positive relationship between the baseline
CD4+ cell counts and the baseline cigarette usage.
• The longitudinal plots show a positive relationship between the change in
CD4+ cell counts and the change in the number of partners.
Careful exploratory data analysis might help you identify scientifically relevant variables to include in
your candidate model. In the candidate model for PROC MIXED, the exploratory results indicate that you
should at least include the quadratic and cubic effects of time and the time by cigarette interaction. The
results of the cross-sectional and longitudinal plots might help you understand the degree of heterogeneity
across men in the rate of CD4+ cell count depletion.
SAS Studio
SAS Studio is the new browser-based SAS programming environment that you can use for data
exploration and analysis. A tutorial on SAS Studio can be found at:
https://support.sas.com/training/tutorial/studio/get-started.html.
Interactive Mode
Some SAS procedures, such as PROC GLM, are interactive. That means they
remain active until you submit a QUIT statement, or until you submit a new
PROC or DATA step.
In SAS Studio, you can use the code editor to run these procedures, as well as
other SAS procedures, in interactive mode.
33
C o p yri gh t © SA S In sti tu te In c. A l l ri gh ts reserved .
Some procedures, such as GLM, are interactive, meaning that they remain active until you submit a QUIT
statement or a new PROC or DATA step. You can run these procedures interactively in SAS Studio using
the code editor. However, you must first enable interactive mode by using the icon shown above.
Running SAS Studio in interactive mode starts a new SAS session. This means that library references and
macro variables must be defined for each new session. More information can be found in the SAS Studio
documentation: http://support.sas.com/software/products/sasstudio/#s1=2
Exercises
hours      time point at which the heart rate was recorded (0.01667, 0.08333, 0.25000, 0.50000, 1.00000)
baseline   baseline heart rate.
a. Submit the program long00d01.sas. Print the first 25 observations in the data set long.heartrate.
Then create an output data set using PROC MEANS with the mean heart rate for each patient by
drug. Also include the baseline heart rate in the data set. Then generate descriptive statistics for
the mean patient heart rate and baseline by drug.
Special methods of statistical analysis are needed for longitudinal data because the set of measurements
on one subject tend to be correlated, measurements on the same subject close in time tend to be more
highly correlated than measurements far apart in time, and the variances of longitudinal data often change
with time. These potential patterns of correlation and variation might combine to produce a complicated
covariance structure. This covariance structure must be taken into account to draw valid statistical
inferences. Therefore, standard regression and ANOVA models might produce invalid results because two
of the parametric assumptions (independent observations, equal variances) might not hold.
If the observations are positively correlated, which often occurs with longitudinal data, then the variances
of the estimated coefficients for the time-independent predictor variables are underestimated if the data
are analyzed as if the observations were independent. In other words, the Type I error rate is inflated for
these variables. For time-dependent predictor variables, ignoring positive correlation leads to variance
estimates that are too large. In other words, the Type II error rate is inflated for these variables.
The linear mixed model allows a flexible approach to modeling longitudinal data. The linear mixed model
• handles unbalanced data with unequally spaced time points and subjects observed at different time points
• uses all the available data in the analysis
• directly models the covariance structure
• provides valid standard errors and efficient statistical tests.
1.3 Chapter Summary 1-47
The first step in any model-building process is exploratory data analysis. In this step you create graphs
that expose the patterns relevant to the scientific question. General recommendations are
• graph as much of the relevant raw data as possible
• highlight aggregate patterns of potential scientific interest
• identify both cross-sectional and longitudinal patterns
• identify unusual individuals or observations.
A meaningful plot is an overlay plot of the individual profiles and the average trend. A smoothed line
representing the average trend can be fitted using a spline routine. The individual profiles can serve as a
light background and the average trend can be a dark line in the foreground.
1.4 Solutions
Solutions to Exercises
1. Conducting an Exploratory Data Analysis
A pharmaceutical firm conducted a clinical trial to examine heart rates among patients. Each patient
was subjected to one of three possible drug treatment levels: drug a, drug b, and a placebo. A baseline
measurement was taken and the heart rates were recorded at five unequally spaced time intervals: 1
minute, 5 minutes, 15 minutes, 30 minutes, and 1 hour. The data are stored in the SAS data set
long.heartrate.
hours      time point at which the heart rate was recorded (0.01667, 0.08333, 0.25000, 0.50000, 1.00000)
baseline   baseline heart rate.
a. Submit the program long00d01.sas. Print the first 25 observations in the data set long.heartrate.
Then create an output data set using PROC MEANS with the mean heart rate for each patient by
drug. Also include the baseline heart rate in the data set. Then generate descriptive statistics for
the mean patient heart rate and baseline by drug.
proc print data=long.heartrate(obs=25);
   title 'Line Listing of Heart Rate Data';
run;
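The exercise's PROC MEANS steps can be sketched as follows. The variable names patient and heartrate and the output data set name meanheart are assumptions for illustration:

proc means data=long.heartrate noprint nway;
   class drug patient;
   var heartrate;
   id baseline;                              /* carry the baseline value into the output data set */
   output out=meanheart mean=avgpat_heart;   /* one row per patient with the mean heart rate */
run;

proc means data=meanheart n min max mean median std;
   class drug;
   var avgpat_heart baseline;
run;

The first step computes the mean heart rate for each patient within drug; the second summarizes the per-patient means and baselines by drug.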
1.4 Solutions 1-49
Obs   patient   drug   baseline   hours     heart rate
1     201       p      92         0.01667   76
2 201 p 92 0.08333 84
3 201 p 92 0.25000 88
4 201 p 92 0.50000 96
5 201 p 92 1.00000 84
6 202 b 54 0.01667 58
7 202 b 54 0.08333 60
8 202 b 54 0.25000 60
9 202 b 54 0.50000 60
10 202 b 54 1.00000 64
11 203 p 84 0.01667 86
12 203 p 84 0.08333 82
13 203 p 84 0.25000 84
14 203 p 84 0.50000 86
15 203 p 84 1.00000 82
16 204 a 72 0.01667 72
17 204 a 72 0.08333 68
18 204 a 72 0.25000 68
19 204 a 72 0.50000 78
20 204 a 72 1.00000 72
21 205 b 80 0.01667 84
22 205 b 80 0.08333 84
23 205 b 80 0.25000 96
24 205 b 80 0.50000 92
25 205 b 80 1.00000 72
                N
drug          Obs   Variable        N   Minimum      Maximum       Mean         Median
----------------------------------------------------------------------------------------
a               8   avgpat_heart    8   56.0000000    91.2000000   77.5000000   76.8000000
                    baseline        8   60.0000000   100.0000000   80.7500000   81.0000000
p               8   avgpat_heart    8   66.8000000    88.4000000   78.8500000   81.6000000
                    baseline        8   68.0000000   102.0000000   84.7500000   86.0000000
----------------------------------------------------------------------------------------

                N
drug          Obs   Variable        Std Dev
---------------------------------------------
a               8   avgpat_heart    12.1245913
                    baseline        12.8257553
b               8   avgpat_heart    10.5536723
                    baseline        14.9642431
p               8   avgpat_heart     7.9840913
                    baseline        11.2090525
---------------------------------------------
2) Patients on the placebo have the highest baseline mean while patients on drug b have the
highest mean heart rate. The differences are relatively small across treatment groups.
b. Generate an individual profiles plot with an average trend line using PROC SGPLOT. Use 50 as
the smoothing factor with 5 knots in the PBSPLINE statement.
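A sketch of the requested plot follows. The variable names patient and heartrate are assumptions; the SMOOTH= and NKNOTS= values come from the exercise statement:

proc sgplot data=long.heartrate noautolegend;
   series y=heartrate x=hours / group=patient
          lineattrs=(color=lightgray pattern=solid);   /* individual profiles in the background */
   pbspline y=heartrate x=hours / smooth=50 nknots=5 nomarkers
            lineattrs=(color=black thickness=3);       /* smoothed average trend in the foreground */
run;

Specifying an explicit color in LINEATTRS= makes all of the grouped profile lines the same light color so that the dark trend line stands out.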
Chapter 2 Longitudinal Data
Analysis with Continuous Responses
Exercises............................................................................................................. 2-84
Exercises........................................................................................................... 2-122
Exercises........................................................................................................... 2-148
2.1 General Linear Mixed Model 2-3
The general linear model can be written as

   y = Xβ + ε

where y is the vector of observed responses, X is the design matrix of the predictor variables, β is the
vector of unknown fixed-effect parameters, and ε is the vector of random errors.
2-4 Chapter 2 Longitudinal Data Analysis with Continuous Responses
The general linear mixed model is an extension of the general linear model. The standard linear
regression model, which is used in the GLM procedure, models the mean of the response variable
by using the regression parameters. The random errors are assumed to be independent and normally
distributed with a mean of 0 and a common variance. If the parametric assumptions are valid (other than
the normality assumption), then the estimated regression parameters are the best linear unbiased estimates
(BLUE).
The general linear mixed model adds random effects:

   y = Xβ + Zγ + ε

where Z is the design matrix of the random-effect variables and γ is the vector of unknown
random-effect parameters.
The general linear mixed model extends the general linear model by the addition of random effect
parameters and by allowing a more flexible specification of the covariance matrix of the random errors.
For example, general linear mixed models allow for both correlated error terms and error terms with
heterogeneous variances. The matrix Z can contain continuous or dummy predictor variables, just like
the matrix X. The name mixed model indicates that the model contains both fixed-effect parameters and
random-effect parameters.
In the longitudinal model proposed by Diggle, Heagerty, Liang, and Zeger (2002), it is assumed that
the error terms have a constant variance and can be decomposed as

   εi = ε(1)i + ε(2)i

where

ε(1)i   is the measurement error reflecting the variation added by the measurement process.

ε(2)i   is the error associated with the serial correlation, in which times closer together are more
        correlated than times farther apart.
If you assume that the measurement errors have an independent covariance structure (σ²I), then you
should concern yourself only with covariance structures that reflect the serial correlation.
Fixed Effects
Variable effects are either fixed or random depending on how the levels of the variables that appear in the
study are selected. For example, the above slide represents a clinical trial analyzing the effectiveness of
three drugs. If the three drugs are the only candidates for the clinical trial and the conclusions of the
clinical trial are restricted to just those three drugs, then the effect of the variable drug is a fixed effect.
Drug (levels A, B, C): Fixed Effect

Clinic (levels 7, 18, 23, 41): Random Effect
• Levels represent only a random sample of a larger set of potential levels.
• Interest is in drawing inferences that are valid for the complete population of levels.
However, suppose the clinical trial was performed in four clinics and the four clinics are a sample from a
larger population of clinics. The conclusions of the clinical trials are not only restricted to the four clinics
but rather to the population of clinics. The appropriate model in this study is a general linear mixed model
with drug as a fixed-effect variable and clinic as a random-effect variable.
MIXED Procedure
PROC MIXED is used to model linear mixed models. The procedure also provides you with
the flexibility of modeling not only the means of your data, but the variances and covariances as well.
Selected MIXED procedure statements:
CLASS specifies the classification variables to be used in the analysis. The CLASS statement
must precede the MODEL statement.
MODEL specifies the response variable (one and only one) and all the fixed effects, which
determine the X matrix of the mixed model. The MODEL statement is required and only
one is allowed with each invocation of PROC MIXED.
RANDOM defines the random effects, which determine the Z matrix of the mixed model.
The random effects can be categorical or numeric, and multiple RANDOM statements are
possible. When random intercepts are needed, you must specify INTERCEPT (or INT)
as a random effect. The covariance structure of the random effects corresponds
to the G matrix.
REPEATED specifies the R matrix in the mixed model. If no REPEATED statement is specified, then
R is assumed to have the independent covariance structure. The repeated effect defines
the ordering of the repeated measurements within each subject. If no repeated effect
is specified, then the repeated measures data must be similarly ordered for each subject.
All missing response variable values must be indicated with periods in the input data set
unless they all fall at the end of a subject’s repeated response profile. The repeated effect
must contain only classification variables. Furthermore, the levels of the repeated effect
must be different for each observation within a subject.
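Put together, a PROC MIXED call using these statements has the following shape. The data set, the variables, and the covariance types shown are placeholders for illustration, not the course's model:

proc mixed data=long.cd4;
   class id week;
   model cd4 = time cigarettes time*cigarettes / solution;  /* fixed effects: the X matrix */
   random intercept time / subject=id type=un;              /* random effects: Z and the G matrix */
   repeated week / subject=id type=ar(1);                   /* within-subject errors: the R matrix */
run;

The repeated effect (week) appears in the CLASS statement because the repeated effect must contain only classification variables.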
• Random effects and error terms are normally distributed with means of 0.
• Random effects and error terms are independent of each other.
• The relationship between the response variable and predictor variables is
linear.
A nonlinear mixed model can be used when modeling a process that follows a more general nonlinear
relationship. Nonlinear mixed models can be fit in the NLMIXED procedure. Note that polynomial
models are not nonlinear models; they are still linear in the parameters of the mean function. Nonlinear
models refer to a nonlinear relationship between the response variable and the fixed-effect parameters
(in other words, Y = β1·exp(β2·x1) + ε).
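A model with such a nonlinear mean could be sketched in PROC NLMIXED as follows. The data set and variable names are hypothetical, and a RANDOM statement would be added to make it a true mixed model:

proc nlmixed data=mydata;
   parms b1=1 b2=0.1 s2=1;        /* starting values for the parameters */
   pred = b1*exp(b2*x1);          /* mean is nonlinear in the parameter b2 */
   model y ~ normal(pred, s2);    /* normal response with variance s2 */
run;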
Estimation is more difficult in the mixed model than in the general linear model. Not only do you have
fixed effects as in the general linear model, but you also have to estimate the random effects,
the covariance structure of the random effects, and the covariance structure of the random errors.
Ordinary least squares is no longer the best method because the distributional assumptions regarding
the random error terms are too restrictive. In other words, the parameter estimates are no longer the best
linear unbiased estimates.
Notice that EGLS requires the knowledge of G and R. Because you rarely have this information, the goal
becomes finding a reasonable estimate for G and R.
The parameters of the covariance matrices G and R must be estimated. After they are estimated, they are
substituted in place of the true parameter values in G and R to compute estimates of β and of its
covariance matrix V(β̂).
The maximum likelihood estimation method finds the parameter estimates that are most likely to occur
given the data. The parameter estimates are derived by maximizing the likelihood function, which
is a mathematical expression that describes the joint probability of obtaining the data expressed
as a function of the parameter estimates.
PROC MIXED implements two likelihood-based methods, maximum likelihood (ML) and restricted
maximum likelihood (REML), to estimate the parameters in G and R. The difference between ML and
REML is the construction of the likelihood function. REML constructs the likelihood based on residuals
and obtains maximum likelihood estimates of the variance components from this restricted/residual
likelihood function. However, the two methods are asymptotically equivalent and often give very
similar results.
Details
PROC MIXED constructs an objective function associated with ML or REML and maximizes it over all
unknown parameters. The corresponding log likelihood functions are as follows:
ML:    l(G, R) = −(1/2) log|V| − (1/2) r′V⁻¹r − (n/2) log(2π)

REML:  lR(G, R) = −(1/2) log|V| − (1/2) log|X′V⁻¹X| − (1/2) r′V⁻¹r − ((n − p)/2) log(2π)

where r = y − X(X′V⁻¹X)⁻X′V⁻¹y, n is the number of observations, and p is the rank of X.
ML versus REML
• Both are based on the likelihood principle, which has the properties of
consistency, asymptotic normality, and efficiency.
• REML corrects for the downward bias in the ML parameters in G and R.
• REML handles strong correlations among the responses more effectively.
• REML is less sensitive to outliers in the data compared to ML.
• The differences between ML and REML estimation increase as the number
of fixed effects in the model increases and the number of subjects
decreases.
The distinction between ML and REML becomes important only when the number of fixed effects
is relatively large. In that case, the comparisons unequivocally favor REML. First, REML copes much
more effectively with strong correlations among the responses for the subjects than does ML. Second,
REML estimates do not have the downward bias that ML estimates have because REML estimators take
into account the degrees of freedom from the fixed effects in the model. Finally, REML estimators are
less sensitive to outliers in the data than ML estimators. In fact, when the estimates do vary substantially,
Diggle, Heagerty, Liang, and Zeger favor REML (2002).
There is also the noniterative MIVQUE0 method, which performs minimum variance quadratic unbiased
estimation of the covariance parameters. However, Swallow and Monahan (1984) present simulation
evidence favoring REML and ML over MIVQUE0. MIVQUE0 is generally not recommended except for
situations when the iterative REML and ML methods fail to converge and it is necessary to obtain
parameter estimates from a fitted model.
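In PROC MIXED the estimation method is selected with the METHOD= option; REML is the default. The model below is a placeholder to show where the option goes:

proc mixed data=long.cd4 method=reml;   /* method=ml or method=mivque0 are the alternatives */
   class id;
   model cd4 = time / solution;
   random intercept / subject=id;
run;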
Why is ordinary least squares not the preferred estimation method for fixed
effects in general linear mixed models?
a. Ordinary least squares does not support random effects.
b. Ordinary least squares does not support correlated error terms.
c. Ordinary least squares does not support nonnormal distribution of error
terms.
d. Both a and b.
      | R1   0    0    0  |
      | 0    R2   0    0  |
R  =  | 0    0    R3   0  |
      | 0    0    0    R4 |
PROC MIXED requires that the data be structured so that each observation represents the measurement
for a subject at only one moment in time. Therefore, if Subject A had five repeated measurements, Subject
A would have five observations. An ID variable is needed to link the repeated measurements to the
subjects, and a time variable is needed to order the repeated measurements within each subject.
With repeated measures data using the SUBJECT= option in the REPEATED statement, the matrix R has
a block-diagonal covariance structure where the block corresponds to the covariance structure for each
subject. The observations within each block can take on a variety of covariance structures while the
observations outside of the blocks are assumed to be independent. In PROC MIXED, the blocks must
have the same structure but can have different parameter estimates.
The validity of the statistical inference of the general linear mixed model depends on the covariance
structure that you select for R. Therefore, a large amount of time spent on building the model is spent
on choosing a reasonable covariance structure for R.
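Candidate structures for R are requested with the TYPE= option in the REPEATED statement. Only one REPEATED statement is allowed per model fit, so each line below represents a separate candidate model; the effect and subject names are placeholders:

repeated week / subject=id type=cs;             /* compound symmetry */
repeated week / subject=id type=ar(1);          /* first-order autoregressive */
repeated week / subject=id type=toep;           /* Toeplitz */
repeated week / subject=id type=un;             /* unstructured */
repeated / subject=id type=sp(pow)(month);      /* spatial power, for unequally spaced times */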
Independent (Variance Components)

                 Time Point
                1     2     3     4
         1  | σ₁²    0     0     0  |
Time     2  |  0    σ₂²    0     0  |
Point    3  |  0     0    σ₃²    0  |
         4  |  0     0     0    σ₄² |
The simplest covariance structure is the independent or variance component model, where the within-
subject error correlation is zero. This is the default structure for both the RANDOM and REPEATED
statements. For the between-subject errors, the simple covariance structure might be a reasonable
assumption. However, for the within-subject errors, the simple covariance structure might be a reasonable
choice if the repeated measurements occurred at long enough intervals so that the correlation is
effectively zero relative to other variation.
Compound Symmetry

                 Time Point
                1     2     3     4
         1  | 1.0    ρ     ρ     ρ  |
Time     2  |  ρ    1.0    ρ     ρ  |   × σ²
Point    3  |  ρ     ρ    1.0    ρ  |
         4  |  ρ     ρ     ρ    1.0 |
The covariance structure with the simplest correlation model is the compound symmetry structure.
It assumes that the correlation ( ) is constant regardless of the distance between the time points. This
is the assumption that univariate ANOVA makes, but it is usually not a reasonable choice in longitudinal
data analysis. However, this covariance structure might be reasonable when the repeated measurements
are not obtained over time. For example, the compound symmetry covariance structure might be a good
choice if the independent experimental units were classrooms and the responses obtained were from each
student in the classroom (Davis 2002).
Unstructured Covariance

                 Time Point
                1     2     3     4
         1  | σ₁²   σ₁₂   σ₁₃   σ₁₄ |
Time     2  | σ₁₂   σ₂²   σ₂₃   σ₂₄ |
Point    3  | σ₁₃   σ₂₃   σ₃²   σ₃₄ |
         4  | σ₁₄   σ₂₄   σ₃₄   σ₄² |
The unstructured covariance structure is parameterized directly in terms of variances and covariances
where the observations for each pair of times have their own unique correlations. The variances are
constrained to be nonnegative and the covariances are unconstrained. This is the covariance structure used
in multivariate ANOVA.
(Each covariance can also be written as a correlation times the standard deviations, for example
σ₁₂ = ρ₁₂σ₁σ₂.)

There are two potential problems with using the unstructured covariance. First, it requires the estimation
of a large number of variance and covariance parameters, t(t + 1)/2, where t is the number of time points.
This can lead to severe computational problems, especially with unbalanced data. Second, it does not
exploit the existence of trends in variances and covariances over time, and this can result in erratic
patterns of standard error estimates (Littell et al. 1998). If a simpler covariance structure is a reasonable
alternative, then the unstructured covariance structure wastes a great deal of information, which would
adversely affect efficiency and power.
Although the unstructured covariance structure does not require equal spacing among the time points,
the structure is not appropriate for the R matrix in the CD4+ cell count example because the spacing
between time points is different across subjects. For example, the time interval between the first and
second measurements for Subject 1 might be different from the time interval for Subject 2. The time
interval between measurements can be different within the subjects (time between first and second
measurements can be different from time between second and third measurements), but the time interval
between specific measurements (first and second, second and third, and so on) must be the same across all
subjects.
First-Order Autoregressive AR(1)

                 Time Point
                1     2     3     4
         1  | 1.0    ρ    ρ²    ρ³ |
Time     2  |  ρ    1.0    ρ    ρ² |   × σ²
Point    3  | ρ²     ρ    1.0    ρ |
         4  | ρ³    ρ²     ρ   1.0 |
The first-order autoregressive covariance structure takes into account a common trend in longitudinal
data; the correlation between observations is a function of the number of time points apart. In this
structure, the correlation between adjacent observations is ρ, regardless of whether the pair
of observations is the first and second pair, the second and third pair, and so on. The correlation is ρ²
for any pair of observations two units apart, and ρ^d for any pair of observations d units apart. Notice
that the AR(1) model requires estimates for just two parameters, σ² and ρ, whereas the unstructured
model requires estimates for T(1 + T)/2 parameters (where T is the number of time points). One
shortcoming is that the correlation decays very quickly as the spacing between measurements increases
(Davis 2002).
The assumption in the AR(1) model is that the longitudinal data are equally spaced (Littell et al. 1996).
This means that the distance between time 1 and 2 is the same as time 2 and 3, time 3 and 4, and
so on. The AR(1) structure also assumes that the correlation structure does not change appreciably over
time (Littell et al. 2002). Therefore, the AR(1) structure might not be appropriate for the CD4+ cell study
because the repeated measures are unequally spaced.
In some circumstances the AR(1) model might be justified empirically where the observations are not
evenly spaced. When the adjoining observations show similar covariances, despite unequal time periods,
with exponentially decreasing covariances for increasingly separated measurement time points, then
the AR(1) model might be warranted (Brown and Prescott 2001). However, these circumstances are
unlikely for the CD4+ cell study.
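These AR(1) properties are easy to verify numerically. The sketch below (Python used as a neutral illustration language; the ρ value and the number of time points are hypothetical) builds the AR(1) correlation matrix and contrasts its two-parameter cost with the unstructured structure:

```python
import numpy as np

def ar1_corr(n_times, rho):
    """AR(1) correlation matrix: corr(Y_j, Y_k) = rho**|j - k|."""
    idx = np.arange(n_times)
    return rho ** np.abs(idx[:, None] - idx[None, :])

R = ar1_corr(4, 0.5)   # hypothetical rho = 0.5
# Adjacent pairs share correlation rho; pairs d units apart have rho**d.
print(R[0, 1], R[0, 2], R[0, 3])   # 0.5 0.25 0.125

# Parameter counts: AR(1) needs 2 (sigma^2 and rho);
# the unstructured structure needs T*(T+1)/2.
T = 4
print(T * (T + 1) // 2)            # 10
```

With T = 4 the unstructured structure already needs ten parameters, which illustrates why AR(1) is so much more economical when its assumptions hold.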
2-18 Chapter 2 Longitudinal Data Analysis with Continuous Responses
Toeplitz

              Time Point
               1     2     3     4
Time     1    1.0    ρ₁    ρ₂    ρ₃
Point    2          1.0    ρ₁    ρ₂
         3                1.0    ρ₁
         4                      1.0
The Toeplitz covariance structure is similar to the AR(1) covariance structure in that pairs
of observations separated by a common distance share the same correlation. However, observations d
units apart have correlation ρ_d instead of ρ^d. The Toeplitz structure therefore requires the estimation
of T parameters: σ² and the T − 1 lag correlations ρ₁, …, ρ_{T−1}.
As with the AR(1) structure, the Toeplitz structure assumes that the observations are equally spaced
and the correlation structure does not change appreciably over time (Littell et al. 2002). Therefore,
the Toeplitz covariance structure is not an appropriate structure for the CD4+ cell study.
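A quick numerical sketch of the Toeplitz pattern (Python as a neutral illustration language; the lag correlations are hypothetical) shows that pairs a common distance apart share one correlation, and that T parameters suffice:

```python
import numpy as np

def toeplitz_corr(lag_corrs):
    """Toeplitz correlation matrix from lag correlations rho_1..rho_{T-1}.
    Pairs d units apart share correlation rho_d (not rho**d as in AR(1))."""
    T = len(lag_corrs) + 1
    rho = [1.0] + list(lag_corrs)
    return np.array([[rho[abs(j - k)] for k in range(T)] for j in range(T)])

R = toeplitz_corr([0.6, 0.4, 0.3])   # hypothetical lag correlations
print(R[0, 1], R[1, 3], R[0, 3])     # 0.6 0.4 0.3

# Parameters: sigma^2 plus T-1 lag correlations = T in total.
T = 4
print(1 + (T - 1))                   # 4
```

Unlike AR(1), each lag gets its own freely estimated correlation, so the decay need not be geometric.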
Spatial Power

              Time Point
               1          2          3          4
Time     1    1.0   ρ^|t₁−t₂|  ρ^|t₁−t₃|  ρ^|t₁−t₄|
Point    2               1.0   ρ^|t₂−t₃|  ρ^|t₂−t₄|
         3                          1.0   ρ^|t₃−t₄|
         4                                     1.0
Covariance structures that allow for unequal spacing are the spatial covariance structures. These
structures are mainly used in geostatistical models, but they are very useful for unequally spaced
longitudinal measurements where the correlations decline as a function of time. The connection between
geostatistics and longitudinal data is that the unequally spaced data can be viewed as a spatial process
in one dimension (Littell et al. 1996).
The spatial power structure is a direct generalization of the AR(1) structure from equally spaced
to unequally spaced data. Only two parameters are estimated (σ² and ρ).
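The sketch below (Python as a neutral illustration language; the measurement times and ρ are hypothetical) shows how the spatial power correlation uses the actual time values, and that it collapses to AR(1) when the times are equally spaced:

```python
import numpy as np

def sp_pow_corr(times, rho):
    """Spatial power correlation: corr(Y_j, Y_k) = rho**|t_j - t_k|,
    where t_j and t_k are the actual measurement times."""
    t = np.asarray(times, dtype=float)
    return rho ** np.abs(t[:, None] - t[None, :])

# Hypothetical, unequally spaced measurement times (in years):
times = [0.0, 0.3, 1.1, 2.0]
R = sp_pow_corr(times, 0.85)
print(np.round(R[0], 3))

# With equally spaced times, the structure reduces to AR(1):
R_eq = sp_pow_corr([0, 1, 2, 3], 0.85)
print(np.allclose(R_eq[0], [1, 0.85, 0.85**2, 0.85**3]))  # True
```

Because the exponent is the actual time separation, each subject's own measurement schedule produces its own correlation matrix, which is exactly what the CD4+ study needs.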
Spatial Gaussian

              Time Point
               1                  2                  3                  4
Time     1    1.0   exp(−(t₁−t₂)²/ρ²)  exp(−(t₁−t₃)²/ρ²)  exp(−(t₁−t₄)²/ρ²)
Point    2                      1.0    exp(−(t₂−t₃)²/ρ²)  exp(−(t₂−t₄)²/ρ²)
         3                                          1.0   exp(−(t₃−t₄)²/ρ²)
         4                                                             1.0
The spatial Gaussian structure is a frequently used covariance structure for unequally spaced
measurements. The difference between the spatial covariance structures is the assumptions made on how
the correlation between the error terms decreases as the length of the time interval increases. To determine
which correlation function is the best fit for your data, the sample variogram (which will be discussed
in a later section) could be used.
Other spatial structures used later in the course include spatial linear, spatial exponential,
and spatial spherical.
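The spatial structures differ only in the assumed decay function, which can be compared directly (a Python sketch; the range parameters are hypothetical):

```python
import numpy as np

# Hypothetical range parameters for each structure:
rho_pow, rho_rng = 0.85, 2.0
d = np.array([0.0, 0.5, 1.0, 2.0, 4.0])        # time separations

power_corr       = rho_pow ** d                 # spatial power
exponential_corr = np.exp(-d / rho_rng)         # spatial exponential
gaussian_corr    = np.exp(-(d / rho_rng) ** 2)  # spatial Gaussian

# The Gaussian curve is nearly flat close to d = 0 and then drops faster;
# comparing these shapes to the sample variogram suggests the best fit.
for name, c in [("power", power_corr), ("exp", exponential_corr),
                ("gau", gaussian_corr)]:
    print(name, np.round(c, 3))
```

All three curves start at 1 and decrease toward 0; the sample variogram indicates which decay shape matches the data.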
PROC MIXED provides the following methods for estimating the approximate denominator degrees
of freedom: containment, between-within, residual, Satterthwaite, and Kenward-Roger
(DDFM=KENWARDROGER). The Kenward-Roger method is considered by many to be the most
appropriate for longitudinal models.
Kenward-Roger DF Adjustment
The Kenward-Roger degrees of freedom adjustment uses an approximation that involves inflating
the estimated variance-covariance matrix of the fixed and random effects. Satterthwaite-type degrees
of freedom are then computed based on this adjustment. By default, the observed information matrix
of the covariance parameter estimates is used in the calculations.
The KENWARDROGER method uses more computer resources; it can take a long time and
require extensive memory for large data sets.
In a simulation study performed by Guerin and Stroup (2000), the Kenward-Roger degrees
of freedom adjustment was shown to be superior or, at worst, equal to the Satterthwaite and
default DDFM options. They strongly recommend the KR adjustment as the standard operating
procedure for longitudinal models.
Kenward-Roger DF Adjustment
For covariance structures that have nonzero second derivatives with respect to the covariance parameters,
the Kenward-Roger covariance matrix adjustment includes a second-order term. This term can result
in standard error shrinkage. Also, the resulting adjusted covariance matrix can then be indefinite
and is not invariant under reparameterization.
The following are examples of covariance structures that generally lead to nonzero second derivatives:
• First-order antedependence (TYPE=ANTE(1))
• First-order autoregressive (TYPE=AR(1))
• Heterogeneous AR(1) (TYPE=ARH(1))
• First-order autoregressive moving average (TYPE=ARMA(1,1))
• Heterogeneous compound symmetry (TYPE=CSH)
• Factor-analytic (TYPE=FA)
• No-diagonal factor-analytic (TYPE=FA0())
• Heterogeneous Toeplitz (TYPE=TOEPH)
• Unstructured correlations (TYPE=UNR)
• All spatial covariance structures (TYPE=SP())
DDFM=KR(FIRSTORDER)
The FIRSTORDER suboption of the DDFM=KR option is recommended for the spatial covariance
structures because these covariance structures generally lead to nonzero second derivatives.
Example: Fit a longitudinal model to the long.aids data set. Rescale the response variable by dividing
CD4 by 100. Include all the two-factor interactions with time and the time quadratic and cubic
effects. Use the Kenward-Roger degrees of freedom calculations and use the compound
symmetry covariance structure.
/* long02d01.sas */
data aids;
   set long.aids;
   cd4_scale=cd4/100;
run;
A common recommendation is to rescale the response and explanatory variables if they have relatively
large values compared to the other variables in the model. This creates a more stable model and decreases
the likelihood of convergence problems in PROC MIXED. Because the response variable CD4 has
relatively large values, a new rescaled variable was created. If time were measured in days, then that
variable would also be rescaled.
/* The program below assumes the data is sorted by id and time */
proc mixed data=aids;
   model cd4_scale=time age cigarettes drug partners
         depression time*age time*depression
         time*partners time*drug time*cigarettes
         time*time time*time*time
         / solution ddfm=kr;
   repeated / type=cs subject=id r rcorr;
   title 'Longitudinal Model with Compound Symmetry '
         'Covariance Structure';
run;
Selected MODEL statement options:
DDFM=KR performs the degrees of freedom calculations proposed by Kenward and Roger (1997).
SOLUTION requests estimates for all fixed effects in the model, together with the standard errors,
t-statistics, and p-values.
TYPE= specifies the covariance structure for the error components. The default structure is
the simple or variance components structure.
When the subject’s identification number is treated as continuous, PROC MIXED considers
a record to be from a new subject whenever the value of the identification number changes from
the previous record. Therefore, you should first sort the data by the values of the identification
number if they are not already sorted. The long.aids data set is sorted by ID. Using a continuous
ID variable reduces the execution time for models with a large number of subjects.
No repeated effects are specified in the REPEATED statement because the data are similarly
ordered within each subject and there are no missing time values. If the measurements were not
similarly ordered within subject, then the time variable would have to be used as the repeated
effect. If there were missing measurements, then you must indicate all missing response variable
values with periods in the data set unless they all fall at the end of the subject’s response profile.
This requirement is necessary in order to inform PROC MIXED of the proper location of the
observed repeated responses.
Repeated effects must be classification variables, so you could use two versions of the time
variable. A continuous time could be used in the MODEL statement as well as the RANDOM
statement, and a classification time could be used in the REPEATED statement.
Longitudinal Model with Compound Symmetry Covariance Structure
Model Information
The Model Information table shows the name of the data set, the dependent variable, the covariance
structure used in the model, the subject effect, the estimation method to compute the parameters for the
covariance structure, and the method to compute the degrees of freedom. The default estimation method
is REML. The METHOD= option can be used in the PROC MIXED statement to specify other estimation
methods.
There are four methods for handling the residual variance in the model. The profile method factors
the residual variance out of the optimization problem, whereas the fit method retains the residual variance
as a parameter in the optimization. The factor method keeps the residual variance fixed, and none is
displayed when a residual variance is not a part of the model. The NOPROFILE option in the PROC
MIXED statement requests that the residual variance be retained in the optimization (the fit method)
instead of being profiled out.
The fixed effects standard error method describes the method used to compute the approximate standard
errors for the fixed-effects parameter estimates and related functions of them. The default method can be
changed using the EMPIRICAL option in the PROC MIXED statement. This option requests robust
standard errors obtained from using the sandwich estimator, which has been shown to be consistent
as long as the mean model is correctly specified. However, if there are any missing observations, the
EMPIRICAL option provides only valid inferences for the fixed effects under the MCAR assumption.
The EMPIRICAL option is not used here because it cannot be used with the Kenward-Roger degrees
of freedom calculation.
Dimensions
Covariance Parameters 2
Columns in X 14
Columns in Z 0
Subjects 369
Max Obs per Subject 12
Number of Observations
The Dimensions table lists the sizes of the relevant matrices. This table can be useful in determining CPU
time and memory requirements. The Number of Observations table shows the number of observations
read, used, and not used. Because there are no missing observations, all the observations are used.
Iteration History

Iteration  Evaluations  -2 Res Log Like    Criterion
        0            1   12668.04910184
        1            2   11846.03145506   0.00000217
        2            1   11846.02324942   0.00000000
The Iteration History table describes the optimization of the residual log likelihood. The minimization
is performed using a ridge-stabilized Newton-Raphson algorithm, and the rows of the table describe
the iterations that this algorithm takes in order to minimize the objective function.
Estimated R Matrix for Subject 1
Because the R option is used in the REPEATED statement, the residual covariance matrix is displayed for
the first subject by default. The diagonal shows the variance while the off-diagonals show the covariances.
Estimated R Correlation
Matrix for Subject 1
The RCORR option displays the correlation matrix for the first subject. The estimated correlation among
the measurements is 0.4820. The correlations are the same regardless of which pair of measurements is
examined because the compound symmetry covariance structure was requested.
Covariance Parameter Estimates

Cov Parm   Subject   Estimate
CS         id          5.7939
Residual               6.2259
The Covariance Parameter Estimates table shows the parameter estimates for the compound symmetry
covariance structure. In this example, the estimated covariance is 5.7939 and the estimated residual
variance is 6.2259. Adding the values together gives the estimated variance (12.0198).
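The arithmetic behind the compound symmetry structure can be verified directly from these two estimates (a Python sketch; the 4×4 block size is illustrative only):

```python
import numpy as np

# Compound symmetry pieces from the Covariance Parameter Estimates table:
cs_cov = 5.7939   # common covariance (the CS parameter)
resid  = 6.2259   # residual variance
T = 4             # illustrative block size

total_var = cs_cov + resid                  # estimated variance
R = np.full((T, T), cs_cov)                 # off-diagonals = covariance
np.fill_diagonal(R, total_var)              # diagonal = variance

# Under compound symmetry every off-diagonal correlation is identical:
corr = cs_cov / total_var
print(round(total_var, 4), round(corr, 4))  # 12.0198 0.482
```

The computed correlation reproduces the 0.4820 shown in the R correlation matrix above.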
Fit Statistics
The Fit Statistics table provides information that you can use to select the most appropriate covariance
structure. Akaike’s Information Criterion (AIC) (Akaike 1974) penalizes the –2 residual log likelihood
by twice the number of covariance parameters in the model. The smaller the value is, the better the model
is. The finite-sample corrected version of the AIC (AICC) is also included; for small sample sizes,
the AICC is recommended over the AIC. Schwarz's Bayesian Information Criterion (BIC) (Schwarz
1978) also penalizes the –2 residual log likelihood, but the penalty is more severe. Therefore, BIC tends
to choose less complex models than AIC or AICC.
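Given the –2 residual log likelihood from the iteration history, the reported criteria can be reproduced (a Python sketch of the penalty arithmetic; under REML only the covariance parameters are counted, and BIC uses the number of subjects):

```python
import math

def aic_bic(neg2_res_loglike, q, n_subjects):
    """Information criteria as PROC MIXED reports them under REML:
    q = number of covariance parameters."""
    aic = neg2_res_loglike + 2 * q
    bic = neg2_res_loglike + q * math.log(n_subjects)
    return aic, bic

# Compound symmetry fit: -2 Res Log Like = 11846.0 (iteration history,
# rounded), q = 2 covariance parameters, 369 subjects.
aic_cs, bic_cs = aic_bic(11846.0, 2, 369)
print(round(aic_cs, 1))   # 11850.0
```

The result matches the AIC of 11850.0 cited later when this model is compared to the spatial power fit.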
Null Model Likelihood Ratio Test

DF   Chi-Square   Pr > ChiSq
 1       822.03       <.0001
The Null Model Likelihood Ratio Test table shows a test that determines whether it is necessary to model
the covariance structure of the data at all. The test statistic is –2 times the log likelihood from the null
model (model with an independent covariance structure) minus –2 times the log likelihood from the fitted
model. The p-value can be used to assess the significance of the model fit.
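Using the –2 residual log likelihoods from the iteration history, the test statistic can be reproduced (a Python sketch; the chi-square tail for df = 1 is computed with the complementary error function, so no extra packages are needed):

```python
import math

# Null model LRT: -2 log L(independence) minus -2 log L(fitted structure).
null_neg2ll = 12668.049          # independence model (iteration 0)
cs_neg2ll   = 11846.023          # compound symmetry fit (final iteration)
chi_sq = null_neg2ll - cs_neg2ll # df = 1 extra covariance parameter
print(round(chi_sq, 2))          # 822.03

# Chi-square survival function for df = 1: P(X > x) = erfc(sqrt(x/2)).
p_value = math.erfc(math.sqrt(chi_sq / 2))
print(p_value < 0.0001)          # True
```

The statistic matches the 822.03 in the table, and the tiny p-value confirms that modeling the covariance structure is necessary.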
Solution for Fixed Effects

Effect   Estimate   Standard Error   DF   t Value   Pr > |t|
The SOLUTION option in the MODEL statement requested a table for the fixed effects parameter
estimates. Notice that the quadratic and cubic time effects are significant (which agrees with the average
trend curve of the CD4+ cell count) and the time*age and time*cigarettes interactions are significant.
Type 3 Tests of Fixed Effects

Effect   Num DF   Den DF   F Value   Pr > F
The Type 3 Tests of Fixed Effects table shows the hypothesis tests for the significance of each of the fixed
effects. A p-value is computed from an F distribution with the numerator and denominator degrees
of freedom. You can use the HTYPE= option in the MODEL statement to obtain tables of Type I
(sequential) tests and Type II (adjusted) tests in addition to or instead of the table of Type III (partial)
tests. You can also use the CHISQ option to obtain Wald chi-square tests of the fixed effects.
Example: Fit a longitudinal model using the spatial power covariance structure and the FIRSTORDER
suboption of the Kenward-Roger degrees of freedom adjustment. Request the covariance
matrix and the correlation matrix for the 13th subject.
proc mixed data=aids;
   model cd4_scale=time age cigarettes drug partners depression
         time*age time*depression time*partners time*drug
         time*cigarettes time*time time*time*time
         / solution ddfm=kr(firstorder);
   repeated / type=sp(pow)(time) local subject=id r=13 rcorr=13;
   title 'Longitudinal Model with Spatial Power Covariance Structure';
run;
Selected REPEATED statement option:
LOCAL adds a measurement error component to the serial correlation component.
This option is useful when you model a time series covariance structure.
The variable time in the TYPE= option is used to calculate the time differences between repeated
measurements.
Model Information
Dimensions
Covariance Parameters 3
Columns in X 14
Columns in Z 0
Subjects 369
Max Obs per Subject 12
Number of Observations
Iteration History

Iteration  Evaluations  -2 Res Log Like    Criterion
        0            1   12668.04910184
        1            3   11883.08815296   0.32992483
        2            1   11881.79852820   0.00348677
        3            2   11864.84042331   0.10490545
        4            2   11801.90993395   2.88713335
        5            2   11734.85393060   0.00204795
        6            2   11731.57580732   0.00054912
        7            1   11729.33587289   0.00001849
        8            1   11729.26578521   0.00000003
        9            1   11729.26567357   0.00000000
Estimated R Matrix for Subject 13

Row   Col1   Col2   Col3   Col4   Col5   Col6   Col7   Col8   Col9

The Estimated R Matrix table shows the residual covariance matrix for the 13th subject, who had 12
repeated measurements.
Estimated R Correlation Matrix for Subject 13
Row Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9
The Estimated R Correlation Matrix table shows the correlation matrix for the 13th subject. Notice how
the correlation coefficients decrease as the time interval increases.
Covariance Parameter Estimates

Cov Parm   Subject   Estimate
Variance   id          7.8554
SP(POW)    id          0.8554
Residual               4.3300
The estimated correlation coefficient used in the spatial power covariance structure is 0.8554.
The LOCAL option adds an additional variance parameter (labeled “Variance”). The parameter labeled
“Residual” represents the measurement error.
Fit Statistics
The AIC and BIC values are lower than those for the model using the compound symmetry covariance
structure (11735.3 versus 11850.0 for the AIC).
Null Model Likelihood Ratio Test

DF   Chi-Square   Pr > ChiSq
 2       938.78       <.0001
The model with the spatial power covariance structure is significantly different from the model with
the independent covariance structure.
Solution for Fixed Effects

Effect   Estimate   Standard Error   DF   t Value   Pr > |t|

Type 3 Tests of Fixed Effects

Effect   Num DF   Den DF   F Value   Pr > F
The inferences for the fixed effects in the model using the spatial power covariance structure are similar
to the model using the compound symmetry covariance structure.
Exercises
A pharmaceutical firm conducted a clinical trial to examine heart rates among patients. Each patient was
subjected to one of three possible drug treatment levels: drug a, drug b, and a placebo. A baseline
measurement was taken and the heart rates were recorded at five unequally spaced time intervals:
1 minute, 5 minutes, 15 minutes, 30 minutes, and 1 hour. The data are stored in the SAS data set
long.heartrate.
These are the variables in the data set:
heartrate   heart rate
hours       time point at which the heart rate was recorded (0.01677, 0.08333, 0.25000, 0.5000, 1.000)
baseline    baseline heart rate
1. Fitting a General Linear Mixed Model
a. Fit a general linear mixed model with the three main effects, the three two-factor interactions,
and the quadratic and cubic effects of hours. Request the parameter estimates and the Kenward-
Roger method for computing the degrees of freedom. In the REPEATED statement, request
the unstructured covariance structure and the R matrix along with the correlations computed from
the R matrix.
1) Is the unstructured covariance structure legitimate in this example?
2) What does the R matrix represent?
3) What does the R correlation matrix represent? What is the general pattern among
the correlations?
4) Interpret the results of the null likelihood ratio test.
5) Are there any higher-order terms significant at the 0.05 level?
b. Fit the same model but with the compound symmetry covariance structure.
1) Is the compound symmetry covariance structure legitimate in this example?
2) Why is the AICC statistic much lower for the model with the compound symmetry covariance
structure compared to the model with the unstructured covariance structure?
3) Are there differences in the inferences for the fixed effects compared to the model with
the unstructured covariance structure? What is a possible reason for these differences?
c. Fit the same model but with the spatial power covariance structure. Because you are using
the spatial power covariance structure, add a measurement error component, and use
the FIRSTORDER suboption.
2) Why is the AICC statistic lower for this model compared to the model with compound
symmetry covariance structure and the model with the unstructured covariance structure?
3) Are there differences in the inferences for the fixed effects compared to the model with
the compound symmetry covariance structure? What is a possible reason for these
differences?
2.2 Evaluating Covariance Structures 2-37
Covariance structures
• model all the variability in the data that cannot be explained by the fixed effects
• represent the background variability that the fixed effects are tested against
• must be carefully selected to obtain valid inferences for the parameters of
the fixed effects.
Obtaining valid inferences in a mixed model is much more complex than in a general linear model.
For example, inferences are obtained in the GLM procedure by testing the fixed effects against the error
variance (residual variance). However, in PROC MIXED the inferences are obtained by testing the fixed
effects against the appropriate background variability, which is modeled by the covariance structure. This
background variability might consist of several sources of error, so selecting the appropriate covariance
structure is not a trivial task.
Sources of Error

Random Effects       Reflects how much subject-specific profiles deviate from the average profile
                     (the between-subject variability).

Serial Correlation   Is usually a decreasing function of the time separation between measurements;
                     represents the within-subject variability.

Measurement Error    For some measurements, there might be a certain level of variation
                     in the measurement process itself.
Longitudinal models usually have three sources of random variation. The between-subject variability
is represented by the random effects. The within-subject variability is represented by the serial
correlation. The correlation between the measurements within subject usually depends on the time
interval between the measurements and decreases as the length of the interval increases. A common
assumption is that the serial effect is a population phenomenon independent of the subject. Finally, there
is potentially also measurement error in the measurement process.
The covariance structure that is appropriate for your model is directly related to which component
of variability is the dominant component. For example, if the serial correlation among the measurements
is minimal, then the random effects will probably account for most of the variability in the data and
the remaining error components will have a very simple covariance structure. Diggle, Heagerty, Liang,
and Zeger (2002) believe that in most applications, the serial correlation is very often dominated by the
combination of random effects and measurement error. Furthermore, Chi and Reinsel (1989) found that
models with random effects and serial correlation might sometimes over-parameterize the covariance
structure because the random effects are often able to represent the serial correlations among the
measurements. They conclude that methods for determining the best combination of serial correlation
components and random effects are an important topic that deserves further consideration.
However, suppose the autocorrelation among the measurements is relatively large, and the between-
subject variability not explained by the fixed effects is relatively small. Then choosing the appropriate
serial correlation function in the covariance structure becomes important.
In this course, serial correlation will be used to describe correlation structures that allow
the correlations to change over time.
• Select a covariance structure that best fits the true covariance of the data.
• Create a scatter plot called the sample variogram.
• Use likelihood ratio tests to test whether adding parameters to the
covariance structure causes a statistically significant improvement in the
model.
• Compare models based on measures of fit that are adjusted for the number
of covariance parameters.
Because the covariance structure models the variability not explained by the fixed effects, selecting the
appropriate mean model is critical. For models dealing with data collected in an experiment, a saturated
model is usually recommended. However, for models dealing with observational data, saturated models
are not feasible. Therefore, it is important to include all the important main effects and interactions.
The choice of the covariance structure should be consistent with the empirical correlations. Examining a
plot of the autocorrelation function of the residuals might be useful for this purpose when you have
equally spaced data that are approximately stationary. (The residuals have constant mean and variance
and the correlations depend only on the length of the time interval.) However, the aids data set has
irregularly spaced data that might not be stationary. The variogram is an alternative function that
describes the association among repeated measurements and is easily estimated with irregular observation
times (Diggle 1990).
Likelihood ratio tests can be used to compare covariance structures provided that the same mean model is
fitted and the covariance parameters are nested. Nesting of covariance parameters occurs when the
covariance parameters in the simpler model can be obtained by restricting some of the parameters in the
more complex model. For example, a compound symmetry structure is nested within a Toeplitz structure,
but is not nested within an AR(1) structure. It is recommended to compare simple structures to more
complex structures, and the complex structures should be accepted only if they lead to a significant
improvement in the likelihood (Brown and Prescott 2001).
You can also use the information criteria (such as the AIC and BIC) produced by PROC MIXED as a tool
to help you select the most appropriate covariance structure. The smaller the information criterion value,
the better the model.
Sample Variogram

v_ijk = ½ (r_ij − r_ik)²

plotted against the corresponding time differences u_ijk = t_ij − t_ik.
The variogram is used extensively in the field of geostatistics, which is concerned primarily with the
estimation of spatial variation. In longitudinal data analysis, the empirical counterpart of the variogram
is called the sample variogram. The data values in the sample variogram are calculated from the observed
half-squared differences between pairs of residuals, where the residuals are ordinary least squares
residuals based on the mean model, and the corresponding time differences. The vertical axis
in the variogram represents the residual variability within subject over time.
The scatter plot also contains a smoothed nonparametric curve, which estimates the general pattern
in the sample variogram. This curve can be used to decide whether the mixed model should include serial
correlation. If a serial correlation component is warranted, then the fitted curve can be used in selecting
the appropriate serial correlation function.
Process Variance

σ̂² = average of ½ (r_ij − r_lk)² over all pairs with i ≠ l
The process variance, σ̂², is estimated as the average of all half-squared differences of the residuals,
½ (r_ij − r_lk)², with i ≠ l (i and l are subscripts for subjects, and j and k are subscripts for time points).
Autocorrelation Function

ρ̂(u) = 1 − v̂(u) / σ̂²
The autocorrelation function can be estimated from the sample variogram by the formula
ρ̂(u) = 1 − v̂(u)/σ̂², where v̂(u) is the average of the observed half-squared differences between pairs
of residuals corresponding to that particular value of u. With highly irregular sampling times, the averages
for the sample variogram might be estimated by fitting a nonparametric curve.
Example Calculations

Subject   Time   Response   Residual
      1      1          4          2
      1      3          5         -1
      1      4          9          1
To illustrate how the data values in a variogram are calculated, consider the table above. The variogram
value for the comparison of the first time point to the second time point is (2 − (−1))²/2 = 9/2 = 4.5,
with a time interval of 3 − 1 = 2.
Example Calculations

Subject   Time   Residual
      1      1          2
      1      3         -1
      1      4          1
      2      1         -2
      2      2          3
The value of the variance calculation that compares the first residual for Subject 1 to the first residual
for Subject 2 is (2 − (−2))²/2 = 8.
v(u) = σ_w² (1 − ρ(u)),   Var(ε) = σ_w²

[Plot: for a model with serial correlation only, the fitted variogram rises from 0 at a time interval
of u = 0 toward the process variance σ_w² as the time interval increases.]
The fitted nonparametric curve in the sample variogram can also be used to determine which error
components need to be addressed in the covariance structure. For example, suppose the variance
of the error terms is due only to within-subject variability. The corresponding variogram, v(u), with
u representing the time interval, would be based on the autocorrelation function. At a time interval
of 0, the autocorrelation is 1 and v(u) = 0. As the time interval approaches infinity, the autocorrelation
approaches 0 and v(u) approaches the process variance. Typically, v(u) is an increasing function
of u because the autocorrelation is positive and decreases as the time interval increases.
Sample variograms are better mechanisms than autocorrelation functions created with the CORR
procedure for examining serial correlation, because the nonparametric smoothing of the variogram
recognizes the scarcity of the data at the larger time intervals and incorporates information from the
sample variogram at smaller time intervals. In comparison, autocorrelation functions might become very
unstable with sparse data and give a misleading impression of the serial correlation. Furthermore,
the autocorrelation function is most effective for studying equally spaced data that are approximately
stationary. Autocorrelations are more difficult to estimate with irregularly spaced data unless you round
the observation times (for example, rounding the CD4+ observation time values to the nearest year)
(Diggle, Heagerty, Liang, and Zeger 2002).
In general, for a model with measurement error and serial correlation, the variogram is
γ(u) = τ² + σw²(1 − ρ(u)), where τ² is the measurement error variance.
(Slide: variogram for a model with serial correlation and measurement error; the labeled components
are the process variance, the serial correlation, and the measurement error, plotted against the time
interval.)
In some situations the measurement process introduces a component of random variation. Now
the variance of the error terms includes not only the within-subject variability, but also the measurement
error. A characteristic property of models with measurement error is that γ(u) does not tend
to 0 as u tends to 0. If the data include duplicate measurements at the same time, then you can estimate
the measurement error directly as one-half the average squared differences between such duplicates.
In the CD4+ example, there are no duplicate measurements within subject. Therefore, the estimation
of the measurement error involves the extrapolation of the nonparametric curve, and this estimate
of the measurement error might be strongly model-dependent (Diggle, Heagerty, Liang, and Zeger 2002).
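When duplicate measurements are available, the estimate described above is just one-half the average squared difference between duplicates. A minimal sketch, assuming a hypothetical data set dups with two replicate responses y1 and y2 per subject and time:

```sas
/* tau-squared estimate: half the mean squared difference between
   duplicate measurements (hypothetical data set and variables) */
data diffs;
   set dups;
   d2 = (y1 - y2)**2 / 2;
run;

proc means data=diffs mean;
   var d2;    /* the mean of d2 estimates the measurement error variance */
run;
```

In the CD4+ data no such duplicates exist, which is why the extrapolation of the nonparametric curve is needed instead.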
(Slide: variogram for a model with random effects, serial correlation, and measurement error; the
labeled components are the process variance, the random effects, the serial correlation, and the
measurement error, plotted against the time interval.)
In some situations the model might include all three components of error. Now the variance of the error
terms includes the within-subject variability, the between-subject variability, and the measurement error.
The corresponding variogram has the same form as the variogram for the model with serial correlation
and measurement error. However, as the time interval approaches infinity, γ(u) approaches a value less
than the variance of the error terms (which is approximately equal to the estimate of the process variance).
The difference between the plateau of the fitted line and the process variance is the error pertaining
to between-subject variability or random effects.
Therefore, the sample variogram can indicate whether the model fitted in PROC MIXED needs
the LOCAL option (to account for measurement error), a covariance structure that incorporates the serial
correlation, and/or a RANDOM statement to specify random effects. Although serial correlation would
appear to be a natural feature of any longitudinal model, in some situations the serial correlation might
be dominated by the combination of random effects and measurement error. The fitted nonparametric
curve in the sample variogram would have a slope near 0, which would indicate that a covariance
structure incorporating serial correlation would be an unnecessary refinement of the model (Diggle,
Heagerty, Liang, and Zeger 2002). A covariance structure such as compound symmetry would
be sufficient.
If serial correlation is evident in the sample variogram, two popular choices of covariance structures
for unequally spaced longitudinal data are the spatial exponential structure, which incorporates
the exponential serial correlation function, and the spatial Gaussian structure, which incorporates
the Gaussian serial correlation function. However, precise characterization of the serial correlation
function is extremely difficult in the presence of several random effects. You should not ignore
the possible presence of any serial correlation, because this might result in less efficient model-based
inferences.
Verbeke and Molenberghs (2000) suggest that including serial correlation, if present, is more
important than correctly specifying the serial correlation function. They recommend that your
efforts should be in the detection of serial correlation, rather than specifying the actual shape
of the serial correlation function, which seems to be of minor importance.
What can you conclude if the intercept of the fitted nonparametric curve in
the sample variogram has values much greater than 0?
a. Serial correlation error needs to be addressed in the covariance
structure.
b. Measurement error needs to be addressed in the covariance structure.
c. Random effects error needs to be addressed in the covariance structure.
d. It is irrelevant because the slope of the fitted nonparametric curve
determines the source of the error component.
Sample Variogram
Example: Create a sample variogram with the aids data set. First include the VARIOGRAM
and VARIANCE macros (programs long02d02a.sas and long02d02b.sas). Then use the
VARIOGRAM macro to create the data set varioplot and use the VARIANCE macro to
estimate the process variance. Use PROC SGPLOT to display the sample variogram with a
scatter plot of the variogram values by time interval values as the background and a penalized
B-spline curve in the foreground. Fit a horizontal reference line at the process variance.
/* long02d02.sas */
%include ".\long02d02a.sas";
%include ".\long02d02b.sas";
%variogram (data=aids,resvar=cd4_scale,clsvar=,
expvars=time age cigarettes drug partners
depression time*age time*cigarettes time*drug
time*partners time*depression time*time
time*time*time,id=id,time=time,maxtime=12);
%variance(data=aids,id=id,resvar=cd4_scale,clsvar=,
expvars=time age cigarettes drug partners
depression time*age time*cigarettes time*drug
time*partners time*depression time*time
time*time*time,subjects=369,maxtime=12);
Because the fitted penalized B-spline curve does not tend toward zero as the time interval tends to zero,
the sample variogram clearly shows that the model has some measurement error (error in the measurement
process itself). Furthermore, the fitted line does not have a slope of zero, which indicates that there is
serial correlation in the model (the within-subject correlation of the CD4+ cell counts changes with the
time interval). The serial correlation function appears to be relatively linear. Finally, because the fitted
line does not reach the process variance, some error due to random effects is evident in the model
(unexplained between-subject variability).
Example: Create a plot of the autocorrelation function using PROC SGPLOT.
data varioplot;
   set varioplot;
   /* autocorrelation = 1 - variogram / process variance (11.71) */
   autocorr=1-(variogram/11.71);
run;
The graph of the autocorrelation function shows that the correlation within subject decreases from
approximately 0.60 to 0.10 within the range of the data. Therefore, there is error associated with serial
correlation evident in the model and a structure that allows for this decreasing correlation should
be selected.
Information Criteria
In a simulation study conducted by Guerin and Stroup (2000), several information criteria were compared
in terms of their ability to choose the right covariance structure. In terms of Type I error control, assuming
that the Kenward-Roger (KR) adjustment is used, Guerin and Stroup showed that it is better to err
in the direction of a more complex covariance structure. More complex covariance structures tend to have
inflated Type I error rates only if you fail to use the KR adjustment, while excessively simple covariance
structures have inflated Type I error rates that the degrees of freedom adjustment cannot correct.
However, because complex covariance structures reduce power, erring too far in the direction
of complexity is also not recommended. Guerin and Stroup believe that the AIC is the most desirable
compromise in practice. However, if the sample size is relatively small, the finite-sample corrected
version of AIC, called AICC, might be the most desirable.
Information criteria provide only rules of thumb to discriminate between several models.
These criteria should never be used or interpreted as formal statistical tests of significance.
When comparing several models with the same mean model but with different covariance
structures, use REML as the estimation method.
Covariance Structures

Structure   Equally Spaced   Unequally Spaced   Different Time Points across Subjects
AR(1)       Yes              No                 No
Toeplitz    Yes              No                 No
Spatial     Yes              Yes                Yes
A common recommendation is to graph the information criteria by covariance structure. However, choose
only the covariance structures that make sense given the data. For example, because the aids data set has
unequally spaced time points and different time points across subjects, only compound symmetry
and the spatial covariance structures are appropriate covariance structures. If the time points were equally
spaced, then the AR(1) and Toeplitz covariance structures could have been examined. If the time points
were unequally spaced but had the same time points across subjects, then the unstructured covariance
structure could have been examined.
Information Criteria
Example: Calculate and plot the AIC, AICC, and BIC information criteria for models that use the
covariance structures compound symmetry, spatial power, spatial linear, spatial exponential,
spatial Gaussian, and spatial spherical. The ODS tables that contain the information criteria
are named FitStatistics.
/* long02d03.sas */
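The PROC MIXED runs that produce the data sets csmodel, powmodel, linmodel, expmodel, gaumodel, and sphmodel are not reproduced on this page. A sketch of one of them, for the compound symmetry structure and assuming the complex mean model used earlier in the chapter, might look like the following; the other five fits would differ only in the TYPE= value and the ODS OUTPUT data set name:

```sas
/* Illustrative sketch only: capture the FitStatistics table for
   the compound symmetry fit (REML, the default, per the note above) */
ods output FitStatistics=csmodel;
proc mixed data=aids;
   model cd4_scale=time age cigarettes drug partners depression
         time*age time*cigarettes time*drug time*partners
         time*depression time*time time*time*time;
   repeated / type=cs subject=id;
run;
```

The FitStatistics table contains the variables descr and value, which the DATA step below relies on.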
data model_fit;
length model $ 7 type $ 4;
set csmodel (in=cs)
powmodel (in=pow)
linmodel (in=lin)
expmodel (in=exp)
gaumodel (in=gau)
sphmodel (in=sph);
if substr(descr,1,1) in ('A','B');
if substr(descr,1,3) = 'AIC' then type='AIC';
if substr(descr,1,4) = 'AICC' then type='AICC';
if substr(descr,1,3) = 'BIC' then type='BIC';
if cs then model='CS';
if pow then model='SpPow';
if lin then model='SpLin';
if exp then model='SpExp';
if gau then model='SpGau';
if sph then model='SpSph';
run;
The IN= option in the DATA step detects whether the data set contributed to an observation when you
read multiple SAS data sets in one DATA step. The specified variable is a temporary numeric variable
with values of 0 (indicates that the data set did not contribute to the current observation) or 1 (indicates
that the data set did contribute to the current observation). The SUBSTR function extracts from the
variable descr the necessary information to put in the variable type that identifies the information criteria.
proc sgplot data=model_fit;
scatter y=value x=model / group=type;
xaxis label='Covariance Structure';
yaxis values=(11700 to 11900 by 20) label='Model Fit Values';
title 'Model Fit Statistics by Covariance Structure';
run;
The spatial exponential, spatial linear, spatial power, and spatial spherical covariance structures all seem
to have the best fit. The spatial Gaussian model fit statistics are somewhat higher than those of the other
spatial structures. The compound symmetry covariance structure is clearly inferior. The AIC and AICC
values are virtually identical for each covariance structure because of the large sample size. For small
sample sizes, the AICC model fit statistic might be useful.
In the simulation study performed by Guerin and Stroup (2000), most of the gain from modeling
the covariance structures comes from “getting close”. Therefore, there will probably be a trivial impact
on the analysis if any of the four spatial covariance structures with the smallest information criteria are
used. Their simulation study focused on the Type I error rates, where the effects of simplistic covariance
structures tend to be more obvious.
• Results from the sample variogram indicate that measurement error, serial
correlation, and error associated with random effects are evident in the
model.
• Spatial exponential, spatial linear, spatial power, and spatial spherical all
seem to have the best model fit statistics.
• Spatial power is the selected covariance structure.
In conclusion, the sample variogram is a useful graph in the selection of a covariance structure. It is
constructed from the ordinary least squares residuals from a complex mean model. For this model,
the results of the sample variogram clearly show that the LOCAL option is needed in the REPEATED
statement. This option adds an additional variance parameter to the R matrix. The results also show that
serial correlation is evident (meaning that the correlations change over time) and that the pattern seems
to be linear. However, the model fit statistics show that the spatial exponential, spatial linear, spatial
power, and spatial spherical covariance structures all seem to have the best fit. Although the spatial power
covariance structure will be selected, any of the other three spatial covariance structures would be
appropriate.
The results of the sample variogram also show that some error associated with random effects is evident
in the model. Therefore, a RANDOM statement might be needed. Models with RANDOM statements will
be examined in a later section.
Exercises
Objectives
• Illustrate how to specify heterogeneity in the residual covariance parameters.
• Fit a parsimonious mean model.
• Create an interaction plot.
2.3 Model Development and Interpretation 2-59
(Slide: Before and After. The covariance matrices have the same spatial power structure in both
periods, but with separate parameters: σB² and ρB before seroconversion, and σA² and ρA after
seroconversion.)
The linear mixed models presented thus far assume that the covariance parameters are the same across
subgroups of subjects. However, PROC MIXED has the flexibility of allowing heterogeneity in the
covariance parameters across subgroups of subjects. For example, suppose there is evidence that the
variance of the CD4+ cell counts is much greater before seroconversion compared to after seroconversion.
A better fitting model might have covariance parameters defined before seroconversion and after
seroconversion. The covariance structure still remains the same (in this example the spatial power
covariance structure), but the covariance parameters are allowed to change across the two subgroups.
GROUP= Option
PROC MIXED allows heterogeneity in the residual covariance parameters with the GROUP= option.
All observations having the same level of the GROUP effect have the same covariance parameters. Each
new level of the GROUP effect produces a new set of covariance parameters with the same structure as
the original group.
For a subject with two measurements before seroconversion (times T1, T2) and two after (times T3, T4),
the block for that subject is

        T1              T2              T3              T4
T1   σB²             σB²ρB^u12       0               0
T2   σB²ρB^u12       σB²             0               0
T3   0               0               σA²             σA²ρA^u34
T4   0               0               σA²ρA^u34       σA²

where u12 = |T1 − T2| and u34 = |T3 − T4|.
The covariance structure for repeated measurements is still a block-diagonal covariance structure where
the block corresponds to the covariance structure for each subject. However, in this example the
GROUP= option now subdivides the block based on the GROUP effect. For example, suppose one
subject had four measurements. Two measurements were before seroconversion and two were after
seroconversion. Furthermore, you define the GROUP effect as the time before and after seroconversion.
The covariance structure within the block for this subject now has variance and covariance parameter
estimates before seroconversion and after seroconversion. For two measurements where one is before
and one is after seroconversion, the covariance is 0. In this example, the GROUP= option indicates
a covariance structure such that observations within subject and with a different GROUP effect value are
assumed to be independent.
Example: Create a plot of the variance of CD4+ cells over time using PROC SGPLOT. Fit a penalized
B-spline curve with a smoothness of 50 and 25 knots. Then create the variable timegroup that
groups the observations into the appropriate time groups. Finally, fit a longitudinal model
in PROC MIXED that allows the covariance parameters to vary by timegroup.
/* long02d04.sas */
The graph shows that the variance of CD4+ cells is greater before seroconversion compared to after
seroconversion. This makes sense from a subject matter point of view because healthy people usually
have more variability in their immune cells than unhealthy people. This is a useful graph to create during
your initial data exploration.
data aids;
   set aids;
   /* timegroup: 1 = at or before seroconversion (time <= 0), 2 = after */
   timegroup=1*(time le 0)+2*(time gt 0);
run;
Model Information
Class       Levels   Values
timegroup        2   1 2
Dimensions
Covariance Parameters 5
Columns in X 14
Columns in Z 0
Subjects 369
Max Obs per Subject 12
There are now five covariance parameters being estimated rather than three.
Number of Observations
Iteration History
0 1 12668.04910184
1 2 11838.24404707 123.48030790
2 2 11731.67873760 20.79399338
3 2 11675.26706982 2.60777627
4 2 11623.53197659 0.00151927
5 2 11620.88033698 0.00038964
6 1 11619.31012203 0.00001388
7 1 11619.25830218 0.00000002
8 1 11619.25821833 0.00000000
Row Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9
1
2
3
4 4.9044 4.6554 4.4007
5 5.1944 4.9307 4.6610
6 5.4950 5.2161 4.9307
7 5.7992 5.5048 5.2037
8 6.1221 5.8113 5.4934
9 6.4764 6.1477 5.8113
10 9.6153 6.4899 6.1349
11 6.4899 9.6153 6.4629
12 6.1349 6.4629 9.6153
PROC MIXED estimates the variance and correlation coefficient for the subjects before seroconversion
and after seroconversion. The variance estimates (15.90 for time group 1 and 9.62 for time group 2) seem
to be quite different across time groups. With timegroup as the GROUP= variable, the measurements
before seroconversion are assumed to be independent of the measurements after seroconversion.
Estimated R Correlation Matrix for Subject 13

Row Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9
1
2
3
4 0.5101 0.4842 0.4577
5 0.5402 0.5128 0.4847
6 0.5715 0.5425 0.5128
7 0.6031 0.5725 0.5412
8 0.6367 0.6044 0.5713
9 0.6736 0.6394 0.6044
10 1.0000 0.6750 0.6380
11 0.6750 1.0000 0.6722
12 0.6380 0.6722 1.0000
The correlations of the measurements within subject after seroconversion are larger.
Fit Statistics
The AIC information criterion for this model (11629.3) is lower than that of the full model without the GROUP= effect (11735.3).
Null Model Likelihood Ratio Test

DF   Chi-Square   Pr > ChiSq
 4      1048.79       <.0001

Solution for Fixed Effects

Effect   Estimate   Standard Error   DF   t Value   Pr > |t|

Type 3 Tests of Fixed Effects

Effect   Num DF   Den DF   F Value   Pr > F

The higher-order terms for time and the time*age and time*cigarettes interactions are still
significant.
Model Development
Another approach is to compute a likelihood ratio test that compares two models, the full model with all
of the interactions and the reduced model with just a subset of terms. The difference between the –2 log
likelihoods for the full and reduced models is the value of the test statistic. The likelihood ratio test
comparing the full and reduced models is valid only under ML estimation.
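As a sketch, the test statistic and p-value can be computed in a short DATA step. The −2 log likelihood values below are placeholders, not results from this analysis:

```sas
/* Likelihood ratio test: difference in -2 log likelihoods (ML fits),
   compared to a chi-square with df = number of dropped terms */
data lrt;
   neg2ll_reduced = 11540.2;   /* placeholder values */
   neg2ll_full    = 11538.5;
   df             = 1;         /* one interaction dropped */
   chisq          = neg2ll_reduced - neg2ll_full;
   p_value        = 1 - probchi(chisq, df);
run;

proc print data=lrt;
run;
```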
ML versus REML
• Differences in the model fit statistics under REML reflect differences in the
covariance parameter estimates.
• Differences in the model fit statistics under ML reflect differences in all the
parameter estimates.
• When comparing different mean models, differences under ML are a better
reflection of the importance of the fixed effects than differences under
REML.
If you reduce the mean model simply by examining p-values, then either estimation method is
appropriate. However, if you reduce the mean model using model fit statistics such as AIC and BIC,
then the estimation method must be ML. Model fit statistics under REML are used to select the
covariance structure. Likelihood ratio tests under REML can be used to assess the importance
of the covariance parameter estimates.
Example: Using the spatial power covariance structure, fit the full model that allows the covariance
parameters to vary by timegroup with all of the main effects, the time by main effect
interactions, and the quadratic and cubic effects for time. Use the ML estimation method.
/* long02d05.sas */
proc mixed data=aids method=ml;
class timegroup;
model cd4_scale=time age cigarettes drug partners depression
time*age time*depression time*partners time*drug
time*cigarettes time*time time*time*time / solution
ddfm=kr(firstorder);
repeated / type=sp(pow)(time) local subject=id group=timegroup;
title 'Longitudinal Model with Heterogeneity in the '
'Spatial Power Covariance Parameters';
run;
Model Information
Class       Levels   Values
timegroup        2   1 2
Dimensions
Covariance Parameters 5
Columns in X 14
Columns in Z 0
Subjects 369
Max Obs per Subject 12
Number of Observations
Iteration History
0 1 12584.82997708
1 2 11760.17221976 125.93204406
2 2 11652.56198545 21.25711718
3 2 11595.75541363 3.10321895
4 2 11543.42016538 0.00176337
5 2 11540.49577186 0.00047876
6 1 11538.58522785 0.00001937
7 1 11538.51370640 0.00000004
8 1 11538.51355638 0.00000000
Fit Statistics
The model fit statistics are not comparable to the ones produced under REML.
Null Model Likelihood Ratio Test

DF   Chi-Square   Pr > ChiSq
 4      1046.32       <.0001
Effect   Estimate   Standard Error   DF   t Value   Pr > |t|
Effect   Num DF   Den DF   F Value   Pr > F
The time*drug, time*depression, and time*partners interactions are not significant. The first
interaction to eliminate is the least significant interaction, which in this case is time*depression.
Example: Refit the model without time*depression.
proc mixed data=aids method=ml;
class timegroup;
model cd4_scale=time age cigarettes drug partners depression
time*age time*drug time*partners time*cigarettes time*time
time*time*time / solution ddfm=kr(firstorder);
repeated / type=sp(pow)(time) local subject=id group=timegroup;
title 'Longitudinal Model with Heterogeneity in the '
'Spatial Power Covariance Parameters';
run;
Partial Output
Covariance Parameter Estimates
Fit Statistics
Null Model Likelihood Ratio Test

DF   Chi-Square   Pr > ChiSq
 4      1046.33       <.0001
The AIC information criterion decreased from 11576.5 to 11574.5, which indicates that this model is a
better fitting model.
Solution for Fixed Effects
Effect   Estimate   Standard Error   DF   t Value   Pr > |t|
Fit Statistics
Null Model Likelihood Ratio Test

DF   Chi-Square   Pr > ChiSq
 4      1046.89       <.0001
The AIC information criterion decreased from 11574.5 to 11572.7, which indicates that this model is a
better fitting model.
Solution for Fixed Effects
Effect   Estimate   Standard Error   DF   t Value   Pr > |t|
Residual 2.7846
Fit Statistics
Null Model Likelihood Ratio Test

DF   Chi-Square   Pr > ChiSq
 4      1046.33       <.0001
The AIC criterion decreased from 11572.7 to 11572.2, which indicates that this model is a better fitting
model. However, do not eliminate terms simply on the basis of the AIC information criterion. Variables
with subject matter importance should be kept in the model. Sometimes nonsignificant results are just as
important as significant results with regard to the importance to the field of research.
Solution for Fixed Effects
Effect   Estimate   Standard Error   DF   t Value   Pr > |t|
All of the interaction terms are now significant at the 0.05 significance level. The variable age should not
be eliminated because it is involved in an interaction.
Example: Refit the final model using the REML estimation.
proc mixed data=aids;
class timegroup;
model cd4_scale=time age cigarettes drug partners depression
time*age time*cigarettes time*time time*time*time / solution
ddfm=kr(firstorder);
repeated / type=sp(pow)(time) local subject=id group=timegroup;
title 'Longitudinal Model with Heterogeneity in the '
'Spatial Power Covariance Parameters';
run;
Partial Output
Covariance Parameter Estimates
Fit Statistics
Null Model Likelihood Ratio Test

DF   Chi-Square   Pr > ChiSq
 4      1049.09       <.0001
Notice the information criteria are quite different using REML versus ML.
Solution for Fixed Effects
Effect   Estimate   Standard Error   DF   t Value   Pr > |t|
The model with the six main effects, two interactions with time, the quadratic effect of time, and the
cubic effect of time is your final model. The results show that recreational drug use has a positive effect
on the CD4+ cell count. The number of partners also has a positive relationship. There is also a negative
relationship between depression and CD4+ cell count. Finally, time has a cubic relationship with CD4+
cell count, which is what you observed in the graph showing the average trend.
Illustrating Interactions
A useful way to explain significant interactions is to graph them. The steps below show how to visualize
the interaction between time and age.
1. Create a data set with plotting points. These points should include the median for each explanatory
variable not involved in the interaction, and the 5th, 25th, 50th, 75th, and 95th percentiles of time
and age.
2. Concatenate the plotting points data set with the aids data set.
3. Create an output data set in PROC MIXED with the predictions based on the estimated fixed effects.
4. Graph the predictions of the observations with the plotting points by time and age to illustrate how
the slope for time differs by the level of age.
Illustrating Interactions
Example: Illustrate the time*cigarettes and time*age interactions. Use all of the values of cigarettes.
/* long02d06.sas */
proc means data=aids noprint;
var time age;
output out=percentiles p5=time_p5 age_p5 p25=time_p25 age_p25
p50=time_p50 age_p50 p75=time_p75 age_p75 p95=time_p95
age_p95;
run;
The values of interest are the 5th, 25th, 50th, 75th, and 95th percentiles.
data _null_;
set percentiles;
call symput('time_p5',time_p5);
call symput('time_p25',time_p25);
call symput('time_p50',time_p50);
call symput('time_p75',time_p75);
call symput('time_p95',time_p95);
run;
Macro variables are created for the percentiles of interest.
proc means data=aids noprint;
var age drug partners depression;
output out=plot median=age drug partners depression;
run;
The MEANS procedure is used to create a data set with the medians of the numeric variables not involved
in the interaction.
data plot;
set plot;
do cigarettes = 0 to 4;
do time = &time_p5,&time_p25,&time_p50,&time_p75,&time_p95;
timegroup=1*(time le 0) + 2*(time gt 0);
id+1;
output;
end;
end;
run;
A DATA step with two DO loops creates a data set with the plotting points for the time by cigarette
interaction. The data points include the median for each explanatory variable not involved in the
interaction, the 5th, 25th, 50th, 75th, and 95th percentiles of time, all the values of cigarettes, and two values
of timegroup. An ID variable is also created with values 1 through 25.
data cigplot;
set aids plot;
run;
The observations with the plotting points will not be used when PROC MIXED fits the
model. However, the output data set will have predicted means for these observations.
ods select all;
proc sgplot data=cigpred;
   pbspline y=pred x=time / group=cigarettes;
   where id le 25;
   yaxis label="Predicted CD4+ Cell Counts in hundreds";
   xaxis label="Time since Seroconversion";
   keylegend / title="Packs of Cigarettes Smoked per Day";
   title 'Interaction Plot of Time by Cigarette Usage';
run;
Only the observations with the plotting points are plotted by using the WHERE statement.
The graph shows that heavier smokers have a more precipitous decline in CD4+ cell counts than light smokers or nonsmokers. Patients who smoked four packs or more a day had the highest predicted CD4+ cell counts before seroconversion. However, after four years the predicted CD4+ cell counts were nearly equal across the cigarette groups. These results agree with the individual profiles in the cigarette usage subgroups graph in the exploratory data analysis section.
data _null_;
   set percentiles;
   call symput('age_p5',age_p5);
   call symput('age_p25',age_p25);
   call symput('age_p50',age_p50);
   call symput('age_p75',age_p75);
   call symput('age_p95',age_p95);
run;
data plot1;
   set plot1;
   do age= &age_p5,&age_p25,&age_p50,&age_p75,&age_p95;
      do time = &time_p5,&time_p25,&time_p50,&time_p75,&time_p95;
         timegroup= 1*(time le 0) + 2*(time gt 0);
         id+1;
         output;
      end;
   end;
run;
The next set of programs creates the plotting points for the time*age interaction.
data ageplot;
   set aids plot1;
run;
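The SGPLOT step for the age interaction is not shown. A sketch mirroring the cigarette plot, assuming a data set AGEPRED of predicted means scored from AGEPLOT (the legend title is illustrative):

```sas
proc sgplot data=agepred;
   pbspline y=pred x=time / group=age;
   where id le 25;   /* plot only the 25 plotting points */
   yaxis label="Predicted CD4+ Cell Counts in hundreds";
   xaxis label="Time since Seroconversion";
   keylegend / title="Age at Seroconversion";
   title 'Interaction Plot of Time by Age';
run;
```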
The graph shows that older men have a more precipitous decline in CD4+ cell counts than younger men. It seems that beyond one year after seroconversion, older men have predicted CD4+ cell counts that are lower than those of younger men. These results are not consistent with the individual profiles in the age subgroups graph in the exploratory data analysis section. The difference is that the interaction plot is a multivariate plot, while the individual profiles plot is a univariate plot.
The model development phase found that the time by age interaction and time by cigarettes interaction
are significant. The interaction plot of time by cigarettes showed that heavier smokers have a more
precipitous decline in CD4+ cell counts than light or nonsmokers. The interaction plot of time by age
showed that older men have a more precipitous decline in CD4+ cell counts than younger men.
The model also validated the graph of the average trend line in the exploratory data analysis section. The
graph of the average trend showed that time appeared to have a cubic relationship with CD4+ cell counts.
The model showed that the cubic effect of time is significant.
There also seems to be some heterogeneity in the covariance structure. The group effect of time before
seroconversion and time after seroconversion improved the fit of the model. The covariance parameter
estimates showed that the variance of the measurements before seroconversion is much larger than the
variance of the measurements after seroconversion. However, for equally spaced time intervals, the
correlation of the measurements before seroconversion is lower than the correlation of the measurements
after seroconversion.
Exercises
1) Is the spatial exponential covariance structure still one of the best fits?
2) Which spatial covariance structure is a good fit for the complex mean model but a relatively
poor fit for the reduced model?
c. Refit the reduced model using the REML estimation method and the spatial exponential
covariance structure. Also request the correlations from the R matrix and the parameter estimates
for the fixed effects.
1) Interpret the parameter estimates and inferences for the fixed effects.
2.4 Random Coefficient Models
Objectives
Thus far, the longitudinal models in this course have all used the REPEATED statement. However, you should not conclude that the REPEATED statement should be used whenever you have longitudinal data. Some longitudinal models fit the data better using the RANDOM statement. Nevertheless, it is generally recommended that you start with the REPEATED statement rather than the RANDOM statement because this can reduce the computing time considerably.
When the autocorrelation plot shows an autocorrelation function that cannot be easily modeled using
the covariance structures in PROC MIXED, a longitudinal model using the RANDOM statement might
be useful. These models are called random coefficient models because the regression coefficients for one
or more covariates are assumed to be a random sample from some population of possible coefficients.
In longitudinal models, the random coefficients are the subject-specific parameter estimates. Random
coefficient models are useful for highly unbalanced data with many repeated measurements per subject
(Verbeke and Molenberghs 1997).
y_i = X_i β + Z_i b_i + ε_i

where β represents
• the population average
• parameters that are assumed to be the same for all subjects
and where b_i represents
• parameters that are allowed to vary over subjects
• subject-specific regression coefficients that reflect the natural heterogeneity in the population
The random coefficient model assumes that the vector of repeated measurements on each subject follows a linear regression model where some of the regression parameters are population-specific (fixed effects) but other parameters are subject-specific (random effects). The fixed-effect parameter estimates represent the population average. The subject-specific regression coefficients with time as a random effect reflect
how the response evolves over time for each subject. These subject-specific models can be very flexible,
but in practice polynomials involving time will often suffice. However, extensions of this flexibility, such
as fractional polynomial models or extended spline functions, can be considered as well (Verbeke and
Molenberghs 2000).
In random coefficient models, the covariance structure for the R matrix is the independent covariance
structure, which now accounts for the measurement error.
In random coefficient models, the random regression lines deviate from the population regression line. If
you specify the intercept as a random variable, then you enable the intercept for each subject to deviate
from the population intercept. If you specify the slope as a random variable, then you enable the slope for
each subject to deviate from the population slope. For example, if you specify time as a random effect
and a fixed effect in the longitudinal model for the CD4+ cell count data, then you stipulate that there is a
relationship between CD4+ cell counts and time and that this relationship can vary across subjects.
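A minimal sketch of such a model for the CD4+ data, with only time as a fixed effect (the full model appears later in this section):

```sas
/* Random intercept and random slope for time; TYPE=UN allows the
   intercepts and slopes to covary across subjects */
proc mixed data=aids;
   model cd4_scale=time / solution;
   random intercept time / type=un subject=id g;
run;
```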
Slide: each subject's fitted line equals the population intercept and slope plus a subject-specific deviation of intercept and slope.
In random coefficient models, the fixed effect parameter estimates represent the expected values of the
population of intercepts and slopes. The random effects for intercept represent the difference between the
intercept for the ith subject and the overall intercept. The random effects for slope represent the difference
between the slope for the ith subject and the overall slope. Random coefficient models also have a random
effect for the within-subject variation. Because there are not enough data on a single subject to estimate that subject's regression parameters, and to avoid theoretical obstacles, it is assumed that the random effects are normally distributed random variables. The random effects and the random errors are also assumed to be independent of each other.
Block-diagonal G matrix (two random effects, two subjects; each 2 x 2 block corresponds to a subject):

| G1  G2   0   0  |
| G3  G4   0   0  |
|  0   0  G1  G2  |
|  0   0  G3  G4  |
When you specify the SUBJECT= option in the RANDOM statement, a block-diagonal covariance matrix
with identical blocks is created in the G matrix. Complete independence is assumed across subjects.
The slide above shows two random effects and two subjects, where each block corresponds to a subject. Notice that there is complete independence across subjects. If a represents the intercept and b represents time, then the variance estimate for the intercept tells you how much the intercepts vary across subjects. The variance estimate for time represents how much the slopes for time vary across subjects. The covariance estimate between the intercept and time represents how the intercepts and the slopes for time vary together. In other words, it indicates whether the CD4+ cell count depletion over time is related to the subject’s CD4+ cell count at seroconversion.
In this example, the unstructured covariance structure is appropriate for the G matrix but not the R matrix
because the issue regarding unequal time intervals across subjects does not pertain to the G matrix.
The covariance structure for the G matrix models the error that represents the natural heterogeneity
between subjects. The within-subject variability, which is directly related to the spacing of measurements,
is modeled by the covariance structure in the R matrix.
A common misconception is that random coefficient models do not take into account the serial correlation
error within subject. However, when you specify the intercept and slope (in this example, time) in a
RANDOM statement, the V matrix enables the correlations within subject to change over time. The
unequal time intervals are taken into account because the Z matrix is used in the computation of the
V matrix. The difference between models with random intercepts and slopes and models with a spatial
covariance structure for the R matrix is that the random coefficient model indirectly models the serial
correlation within subject with the variances and covariances of the intercept and slope. The model with
the REPEATED statement directly models the serial correlation within subject by specifying a spatial
covariance structure for the R matrix.
When you build a random coefficient model, it is necessary to determine which random effects are needed
in the model. Examining the residual profile plots might be helpful, but with 369 subjects the plots can
be cumbersome. One recommended strategy is to include all the relevant random effects. This ensures
that the remaining variability is not due to any missing random effects. However, including high-dimensional random effects with an unstructured covariance matrix leads to complicated covariance structures and might result in non-convergence of the algorithms in PROC MIXED (Verbeke and Molenberghs 2000).
After a candidate model is selected, a likelihood ratio chi-square test can be computed by comparing
the candidate model with the reduced model. The mean structure of the model remains the same across
both models, but the number of random effects is reduced by one in the reduced model. Verbeke
and Molenberghs (2000) recommend using the REML estimation method because the REML test statistic
performed slightly better than the ML test statistic. A program illustrating the likelihood ratio test
is shown in an appendix.
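The appendix program is not reproduced here, but the computation can be sketched in a DATA step. Dropping one random effect from a TYPE=UN structure with four random effects removes four covariance parameters (one variance and three covariances). The -2 Res Log Likelihood values below are placeholders, not output from the course data:

```sas
data lrt;
   m2rll_reduced = 11745.2;   /* hypothetical: fit without the random cubic slope */
   m2rll_full    = 11704.0;   /* hypothetical: fit with the random cubic slope    */
   chisq = m2rll_reduced - m2rll_full;
   df    = 4;                 /* 1 variance + 3 covariances removed */
   p     = 1 - probchi(chisq, df);
   put chisq= df= p=;
run;
```

PROBCHI returns the chi-squared CDF, so 1 - PROBCHI gives the upper-tail p-value.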
The p-values computed for the likelihood ratio test in this scenario might be slightly biased
because the asymptotic null distribution for the likelihood ratio test statistic for testing hypotheses
regarding random effects is often a mixture of chi-squared distributions rather than the classical
single chi-squared distribution (Verbeke and Molenberghs 2000).
There is also a COVTEST option in PROC MIXED that produces asymptotic standard errors
and Wald Z-tests for the covariance parameter estimates. However, the sample size requirements
for these tests are excessive and often not met (approximately 400 or more subjects).
Example: Fit a random coefficient model with random intercepts and random linear, quadratic, and cubic slopes for time. Include all the two-factor interactions with time as the fixed effects. Specify the COVTEST, G, and GCORR options. Also specify the V and VCORR options for subject 13.
/* long02d07.sas */
proc mixed data=aids covtest;
   model cd4_scale=time age cigarettes drug partners depression
         time*age time*depression time*drug time*partners
         time*cigarettes time*time time*time*time / solution ddfm=kr;
   random intercept time time*time time*time*time / type=un subject=id
          g gcorr v=13 vcorr=13;
   title 'Random Coefficient Model with Cubic Effect of Time';
run;
Selected PROC MIXED statement option:
COVTEST produces asymptotic standard errors and Wald Z-tests for the covariance parameter
estimates.
Selected RANDOM statement options:
G requests that the estimated G matrix be displayed.
GCORR displays the correlation matrix corresponding to the estimated G matrix.
V requests that blocks of the estimated V matrix be displayed. Also, you can specify which
subject’s V matrix to display.
VCORR displays the correlation matrix corresponding to the blocks of the estimated V matrix. Also, you can specify which subject’s correlation matrix to display.
In the RANDOM statement, you must specify INTERCEPT (or INT) as a random effect to indicate the intercept. PROC MIXED does not include the intercept in the RANDOM statement by default as it does in the MODEL statement. Furthermore, the effects in the RANDOM statement, in combination with the SUBJECT= option, make these random effects deviations from the fixed-effect means. The random effects must also appear in the MODEL statement; otherwise, you implicitly assume that the corresponding fixed-effect parameter estimate is 0, which is a questionable assumption.
Random Coefficient Model with Cubic Effect of Time
Model Information
Dimensions
Covariance Parameters 11
Columns in X 14
Columns in Z per Subject 4
Subjects 369
Max Obs per Subject 12
There are four columns per subject in the Z matrix. These columns represent the intercept, time, time*time, and time*time*time random effects. The 14 columns in the X matrix represent the parameters in the mean model.
Number of Observations
Iteration History

Iteration  Evaluations  -2 Res Log Like    Criterion
    0          1        12668.04910184
    1          2        11906.71433484    598549.15664
    2          1        11826.06516534    1503342.7659
    3          1        11770.20234989    1238618.9132
    4          1        11760.73740046    0.01597010
    5          1        11735.44729489    0.00600725
    6          1        11710.08383953    0.00123057
    7          1        11704.71596333    0.00016956
    8          1        11704.02578248    0.00000604
    9          1        11704.00292721    0.00000001
   10          1        11704.00288278    0.00000000
Estimated G Matrix
The Estimated G Matrix table shows the estimated variances and covariances of the random effects.
For example, 7.1562 (row 1, column 1) is the variance of the intercepts. The value 0.8308 (row 2, column
2) is the variance of the linear slopes of time. The value -1.0342 (row 1, column 2) is the covariance
of the intercepts and the linear slopes of time.
Estimated G Correlation Matrix
The Estimated G Correlation Matrix table shows the correlations between random effects. The correlation
between the intercepts and the linear slopes of time is -0.4242 (row 1, column 2) while the correlation
between the linear slopes of time and the quadratic slopes of time is 0.6288 (row 2, column 3).
Estimated V Matrix for Subject 13
Row Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9
The Estimated V Matrix table shows the variances and covariances among the measurements (in this case, for subject 13). The V matrix is calculated by the formula V = ZGZ' + R. Because the Z matrix contains the time values, the variances and covariances estimated in the V matrix are based on the variances and covariances of the random effects along with the time values of the measurements. Notice that the variances along the diagonal are not equal. This was not the case in the model with the spatial power covariance structure (without the GROUP= option), and this illustrates the strength of the random coefficient models.
Estimated V Correlation Matrix for Subject 13

Row  Col1  Col2  Col3  Col4  Col5  Col6  Col7  Col8  Col9
The Estimated V Correlation Matrix table shows the correlations among the measurements (in this case, for subject 13). Because the Z matrix contains the time values, the correlations estimated from the V matrix are based on the variances and covariances of the random effects along with the time values of the measurements. The R matrix in this case has an independent covariance structure. Notice that the correlations do not have to decrease as the time interval increases. This was not the case in the model with the spatial power covariance structure, and this illustrates the flexibility of the random coefficient model.
If the intercept is the only random effect, then the correlations are equal across the measurements within a time group (a compound symmetry covariance structure).
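As an illustration of this note, a random-intercept-only model induces equal within-subject correlations (a minimal sketch; only time is kept as a fixed effect):

```sas
/* With only a random intercept, V = Z*G*Z' + R has a constant
   off-diagonal covariance, that is, compound symmetry */
proc mixed data=aids;
   model cd4_scale=time / solution;
   random intercept / subject=id v vcorr;
run;
```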
Covariance Parameter Estimates

Cov Parm   Subject   Estimate   Standard Error   Z Value   Pr Z
A total of 11 covariance parameters are estimated in this model. The values correspond to the values in the G matrix. The COVTEST option displays the standard error, Z value, and p-value. The results show that the variances and covariances of the random effects are significantly different from 0. The residual value of 4.9781 corresponds to the variance estimate in the R matrix. The inferences are unreliable for small sample sizes; with 369 subjects, the asymptotic results should be approximately valid, although the recommended sample size to meet the asymptotic requirement is 400 or more.
Fit Statistics

Null Model Likelihood Ratio Test
DF   Chi-Square   Pr > ChiSq
10     964.05       <.0001

The Null Model Likelihood Ratio Test compares the fitted model to a model with an independent V covariance structure (in this case, a model without a RANDOM statement).
Solution for Fixed Effects

Effect   Estimate   Standard Error   DF   t Value   Pr > |t|
Type 3 Tests of Fixed Effects

Effect   Num DF   Den DF   F Value   Pr > F

The inferences from the random coefficient model are very similar to those from the repeated-effects model. The time*age and time*cigarettes interactions and the quadratic and cubic effects of time are significant.
Example: To compare the random coefficient model with the last repeated effects model, fit a random
coefficients model without the time*drug, time*partners, and time*depression interactions
and with the GROUP= option.
proc mixed data=aids;
   class timegroup;
   model cd4_scale=time age cigarettes drug partners depression
         time*age time*cigarettes time*time time*time*time /
         solution ddfm=kr;
   random intercept time time*time time*time*time / type=un subject=id
          group=timegroup g gcorr v=13 vcorr=13;
   title 'Random Coefficients Final Model';
run;
Model Information
Class Level Information

Class      Levels   Values
timegroup     2     1 2
Dimensions
Covariance Parameters 21
Columns in X 11
Columns in Z per Subject 8
Subjects 369
Max Obs per Subject 12
Number of Observations
Iteration History

Iteration  Evaluations  -2 Res Log Like    Criterion
    0          1        12650.28067427
    1          2        11621.69179982    10525.543915
    2          1        11615.72160553    0.00765038
    3          1        11594.68913868    0.00372117
    4          1        11577.99517787    0.00112286
    5          1        11573.11206038    0.00019219
    6          1        11572.33492058    0.00000888
    7          1        11572.30167262    0.00000003
    8          1        11572.30155569    0.00000000
Estimated G Matrix (partial; rows 5–7, columns 6–8 shown)

Row   Col6      Col7      Col8
 5    -9.1612    3.4119   -0.3790
 6    13.0029   -4.7445    0.5156
 7    -4.7445    1.7910   -0.1968
The Estimated G Matrix shows the estimated variances and covariances of the random effects by time
groups. For example, 37.2267 (row 1, column 1) is the variance of the intercepts in time group 1 and
11.5272 (row 5, column 5) is the variance of the intercepts in time group 2. The value 246.90 (row 2,
column 2) is the variance of the linear slopes of time in time group 1 and 13.0029 (row 6, column 6)
is the variance of the linear slopes of time in time group 2. The value 77.9624 (row 1, column 2) is the
covariance of the intercepts and the linear slopes of time in time group 1 and -9.1612 (row 5, column 6)
is the covariance of the intercepts and the linear slopes of time in time group 2.
Estimated G Correlation Matrix (partial; rows 5–8, columns 6–8 shown)

Row   Col6      Col7      Col8
 5    -0.7483    0.7509   -0.7572
 6     1.0000   -0.9831    0.9700
 7    -0.9831    1.0000   -0.9976
 8     0.9700   -0.9976    1.0000
The Estimated G Correlation Matrix table shows the correlations between random effects by time groups.
The correlation between the intercepts and the linear slopes of time is 0.8132 (row 1, column 2) in time
group 1 and -0.7483 (row 5, column 6) in time group 2. This is an artifact of how time was coded. In time
group 1, the intercept is the last time point (time points are negative and the intercept is at time 0).
Therefore, negative slope coefficients lead to smaller intercepts. However, in time group 2, the intercept
is the first time point. Thus, negative slope coefficients lead to larger intercepts.
Estimated V Matrix for Subject 13 (partial; rows 4–12, columns 10–12 shown)

Row   Col10     Col11     Col12
 4     4.9821    5.2123    5.1246
 5     5.3898    5.4426    5.3558
 6     5.8390    5.8197    5.7516
 7     6.2852    6.2690    6.2312
 8     6.7203    6.7567    6.7578
 9     7.1265    7.2475    7.2936
10    10.5925    7.6528    7.7427
11     7.6528   11.0814    8.0615
12     7.7427    8.0615   11.3736
The Estimated V Matrix table shows the variances and covariances among the measurements by time
groups (in this case, for subject 13). For example, the variance of the first measurement in time group 1
is 15.2504 (row 1, column 1) and the covariance of the first and second measurements in time group 1
is 11.1374 (row 1, column 2). The variance of the first measurement in time group 2 is 11.3064 (row 4,
column 4) and the covariance of the first and second measurements in time group 2 is 5.9626 (row 4,
column 5). The model assumes that the measurements in time group 1 are independent of the
measurements in time group 2.
Estimated V Correlation Matrix for Subject 13 (partial; rows 4–12, columns 10–12 shown)

Row   Col10    Col11    Col12
 4    0.4552   0.4657   0.4519
 5    0.5667   0.5595   0.5434
 6    0.6139   0.5982   0.5836
 7    0.6409   0.6250   0.6132
 8    0.6670   0.6557   0.6473
 9    0.6900   0.6861   0.6815
10    1.0000   0.7064   0.7054
11    0.7064   1.0000   0.7181
12    0.7054   0.7181   1.0000
The Estimated V Correlation Matrix table shows the correlations among the measurements by time
groups (in this case, for subject 13). For example, the correlation between the first and second
measurements in time group 1 is 0.7325 (row 1, column 2) and the correlation between the first
and second measurements in time group 2 is 0.6068 (row 4, column 5). The correlation between
the measurements in time group 1 and time group 2 is 0.
Covariance Parameter Estimates
Fit Statistics

Null Model Likelihood Ratio Test
DF   Chi-Square   Pr > ChiSq
20    1077.98       <.0001
The AIC (11614.3) is very close to the AIC of the repeated-effects model (11611.2).
Solution for Fixed Effects

Effect   Estimate   Standard Error   DF   t Value   Pr > |t|

Type 3 Tests of Fixed Effects

Effect   Num DF   Den DF   F Value   Pr > F
The inferences from the random coefficients model are very similar to the inferences from the repeated-effects model.
When is the V matrix the same in the random coefficient model and a model
with the REPEATED statement and several time points?
a. Random coefficient model has a random intercept and slope, and the
repeated model has spatial power covariance structure.
b. Random coefficient model has a random intercept and slope, and the
repeated model has compound symmetry covariance structure.
c. Random coefficient model has only a random intercept, and the
repeated model has compound symmetry covariance structure.
d. Random coefficient model has only a random intercept, and the
repeated model has spatial power covariance structure.
• EBLUPs are predictions that take into account the residual variability and
between-subject variability.
• If the within-subject variability is large in comparison to between-subject
variability for an individual profile, then the response values are unreliable
and the predictions move toward the population mean.
• If the within-subject variability is small in comparison to between-subject
variability for an individual profile, then the response values are reliable and
the predictions move toward the observed data.
• This feature is useful for forecasting time series.
One objective in the AIDS study is to estimate the time course of CD4+ cell depletion for individual subjects. However, the individual profile plots showed that the observed CD4+ levels are highly variable over time. Part of the reason might be the large residual variability error component. Therefore, estimating individual profiles without taking into account the error associated with residual variability in CD4+ cell determinations might be unreliable.
In PROC MIXED, you can compute predicted response values that are empirical best linear unbiased predictions (EBLUPs). These predictions can be interpreted as a weighted mean of the population average profile and the observed data profile. The general formula is

ŷ_i = Z_i Ĝ Z_i' V̂_i⁻¹ y_i + (I − Z_i Ĝ Z_i' V̂_i⁻¹) X_i β̂

so the predicted profile for subject i weights the observed data profile y_i against the population average profile X_i β̂, with weights determined by the estimated covariance matrices.
PROC MIXED computes EBLUPs for the response variable in two ways:
• Using the RANDOM statement with the OUTP= option in the MODEL
statement.
• Using the REPEATED statement with the OUTP= option in the MODEL
statement and the SUBJECT= option in the REPEATED statement. Only
observations with missing response values will have EBLUPs.
PROC MIXED computes EBLUPs for the response variable in two ways. When you use the RANDOM statement with the OUTP= option in the MODEL statement, the predicted values for the original data are X_i β̂ + Z_i b̂_i. Predicted values for data points other than those observed can be obtained by including observations with missing dependent variable values in your input data set.
Another way to compute EBLUPs for the response variable is to use the OUTP= option in the MODEL statement along with the REPEATED statement and the SUBJECT= option. Simply concatenate the original data with observations that have missing response values. The predictions for these observations are EBLUPs. However, if the new observation is independent of the data used in fitting the model (the subject has no previous observations), then the EBLUP equals X_i β̂.
The standard errors for EBLUPs with the REPEATED statement are larger than those for the RANDOM
statement (unless you use the NEWOBS option in the MODEL statement when you have a RANDOM
statement). The reason for this discrepancy is not that one is more accurate than the other. If you think
of an observation as Y = Signal + Noise (with noise representing measurement error), the RANDOM
statement predicts the signal (unless you use the NEWOBS option) while the REPEATED statement
predicts the sum of signal and noise. The signal is predicted with greater precision than the sum of signal
and noise.
EBLUPs are not only useful for forecasting time series, but also in generating predictions based
on changes in the covariate patterns. For example, you can generate predictions on CD4+ cell counts
based on the changes in cigarette consumption.
Computing EBLUPs
Example: Compute EBLUPs and Xbetas with the random coefficients model with the cubic effect
of time. Forecast the CD4+ cell count for subject 10145 at time 5.30 and graph the individual
profile of subject 10145 along with the EBLUPs and Xbetas.
/* long02d08.sas */
data aids1;
   input time age cigarettes drug partners depression id timegroup;
   datalines;
5.30 4.4 0 1 -3 -7 10145 2
;
run;
The covariate values for subject 10145 were held fixed at their last observed values, and timegroup is set to 2.
data forecast;
set aids aids1;
run;
The degrees of freedom calculations are based on the residual method (DDFM=RES) to save on
computing time.
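The PROC MIXED step that creates PREDBLUP and PREDXBETA is not shown. A sketch, assuming the random coefficient model with the cubic effect of time from long02d07.sas, with OUTP= for the EBLUPs and OUTPM= for the Xbetas:

```sas
proc mixed data=forecast;
   model cd4_scale=time age cigarettes drug partners depression
         time*age time*depression time*drug time*partners
         time*cigarettes time*time time*time*time
         / solution ddfm=residual outp=predblup outpm=predxbeta;
   random intercept time time*time time*time*time / type=un subject=id;
run;
```

OUTP= writes X_i β̂ + Z_i b̂_i for each observation, while OUTPM= writes X_i β̂ only. Both output data sets contain a variable named Pred, so in practice the variables would presumably be renamed before the two data sets are merged.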
data predict;
merge predblup predxbeta;
run;
The EBLUPs follow the data values before seroconversion, indicating that the between-subject variability is much greater than the within-subject variability. However, the EBLUPs follow the Xbetas after seroconversion, indicating that the between-subject variability is much smaller than the within-subject variability. The EBLUP at time 5.3 seems to be very close to the Xbeta at 5.3.
Example: Compute EBLUPs and Xbetas with the model with the REPEATED statement that had
heterogeneity and spatial power covariance structure. Forecast the CD4+ cell count for subject
10145 at time 5.30 and graph the individual profile of subject 10145 along with the EBLUPs
and Xbetas.
ods select none;
proc mixed data=forecast;
class timegroup id;
Recall that when using the OUTP= option with the SUBJECT= option in the REPEATED statement, the
EBLUPs are only computed for time points that have missing response values. Therefore, the EBLUP is
only computed for time point 5.3. The EBLUPs and Xbetas are different from the previous graph because
the V matrices for the two models are different. If the random coefficients model has the same V matrix
and the same mean model as the model with the REPEATED statement, then the EBLUPs at time 5.3
would be the same.
These models
• take into account random effects, serial correlation, and measurement
error
• enable the user to fit a large variety of covariance structures
• often have estimation and convergence problems
• are not generally recommended as a longitudinal model
You can also fit a model in PROC MIXED with both the RANDOM and REPEATED statements.
However, this model is generally not recommended in practice. Diggle, Heagerty, Liang, and Zeger
(2002) argue that, in applications, the effect of serial correlation is very often dominated by the
combination of random effects and measurement error. They recommend against models that
simultaneously include serial correlation and random effects other than random intercepts. Verbeke
and Molenberghs (2000) also claim that models that include several random effects, serial correlation,
and measurement error often have estimation problems.
Convergence problems in PROC MIXED arise from estimating the covariance parameters in the model,
not the fixed effects. For example, when the covariance parameters are on different scales, the algorithm
in PROC MIXED might have trouble converging. Furthermore, if there is very little variability in the time
effects, the variance of the random slopes might approach 0, which can cause numerical difficulties.
When fitting complicated covariance structures, you often need to specify starting values (using the
PARMS statement) in order for PROC MIXED to converge. Requesting a grid search over several values
of these parameters is recommended. Sometimes it is also useful to use the Fisher scoring method, which
replaces the observed Hessian matrix (the second derivatives of the objective function with respect
to the covariance parameters) with its expected value.
Other recommendations can be found in the online SAS documentation in the Convergence
Problems section of PROC MIXED.
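For illustration only, a PARMS grid combined with Fisher scoring might look like the following. This is a toy random intercept model, not the course model, and the starting values are placeholders:

```sas
/* SCORING=5 applies Fisher scoring (expected Hessian) for the first 5 iterations */
proc mixed data=aids scoring=5;
   model cd4_scale=time / solution;
   random intercept / subject=id;
   /* first set: starting value for the intercept variance;
      second set: a grid of starting values for the residual variance */
   parms (0.2) (0.5 to 2.0 by 0.5);
run;
```

PROC MIXED evaluates the objective function over the grid and starts the iterations from the best point.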
Example: Fit a longitudinal model with the RANDOM and REPEATED statements. Specify the group
effect and only the interactions that were significant in the random coefficients model.
/* long02d09.sas */
proc mixed data=aids covtest;
class timegroup;
model cd4_scale=time age cigarettes drug partners depression
time*age time*cigarettes time*time time*time*time / solution
ddfm=kr(firstorder);
random intercept time time*time time*time*time / type=un subject=id;
repeated / type=sp(pow)(time) local subject=id group=timegroup;
title 'Longitudinal Model with Random Effects and '
'Serial Correlation';
run;
Model Information
The covariance structures are now an unstructured covariance structure and a spatial power covariance
structure.
Class Level Information

Class      Levels  Values
timegroup       2  1 2
Dimensions
Covariance Parameters 15
Columns in X 11
Columns in Z per Subject 4
Subjects 369
Max Obs per Subject 12
Number of Observations
Iteration History

Iteration  Evaluations  -2 Res Log Like  Criterion
        0            1   12650.28067427
1 2 12266.99621099 38594.655773
2 1 12189.83366851 74110.722854
3 1 12140.37942938 32264.055111
4 1 12095.41712550 642385.13078
5 1 12037.70974721 2649327.2638
6 1 11975.05255510 1356563.0471
7 3 11874.39052840 .
8 1 11825.73904305 .
9 1 11699.28700966 .
10 1 11594.57830896 .
11 3 11526.55157663 .
12 4 11511.03788261 .
13 3 11502.89876486 .
14 1 11496.21859975 .
15 1 11494.28586873 .
16 2 11493.40764086 0.00037097
17 2 11492.86168202 .
18 1 11491.90004455 0.00024427
19 2 11491.81327410 .
20 4 11490.43193681 .
21 2 11490.28644277 0.00000077
22 1 11490.28372598 0.00000000
Even with the complicated covariance structures, the model converged. However, a note in the log
indicates a potential problem.
Whenever the log shows the note that the estimated G matrix is not positive definite, you are likely
to see a zero variance component estimate. A zero variance component estimate can indicate
an inappropriate model, such as an over-parameterized model, and you might want to respecify
the model to make sure that you are not accounting for the same variance with different parameters.
Covariance Parameter Estimates

Cov Parm  Subject  Group  Estimate  Standard Error  Z Value  Pr Z
The variances for the intercepts and the linear effect of time are significant. However, the variance
estimate for the quadratic effect of time is 0. When a variance parameter estimate is 0, one
recommendation is to drop that random effect from the model. Because the quadratic effect of time will
be dropped, the cubic effect of time should also be dropped.
Fit Statistics

Null Model Likelihood Ratio Test
DF  Chi-Square  Pr > ChiSq
13     1160.00      <.0001
The AIC is the lowest of any model thus far. The random coefficients model had an AIC of 11614.3.
Solution for Fixed Effects

Effect  Estimate  Standard Error  DF  t Value  Pr > |t|

Type 3 Tests of Fixed Effects

Effect  Num DF  Den DF  F Value  Pr > F
The inferences for the fixed effects are similar to the random coefficients model.
Example: Refit the longitudinal model without the quadratic and cubic effects of time in the RANDOM
statement.
proc mixed data=aids covtest;
class timegroup;
model cd4_scale=time age cigarettes drug partners depression
time*age time*cigarettes time*time time*time*time / solution
ddfm=kr(firstorder);
random intercept time / type=un subject=id;
repeated / type=sp(pow)(time) local subject=id group=timegroup;
title 'Longitudinal Model with Random Effects and '
'Serial Correlation';
run;
Model Information

Class Level Information

Class      Levels  Values
timegroup       2  1 2
Dimensions
Covariance Parameters 8
Columns in X 11
Columns in Z per Subject 2
Subjects 369
Max Obs per Subject 12
Number of Observations
Iteration History

Iteration  Evaluations  -2 Res Log Like  Criterion
        0            1   12650.28067427
1 4 11860.81110568 196.02454031
2 1 11688.87754208 42.26586415
3 1 11566.25877989 .
4 2 11529.61932012 5.55286390
5 2 11519.58037978 2.07258899
6 2 11509.30705045 0.17852318
7 2 11505.84039590 0.00048014
8 2 11504.19719227 0.00023349
9 2 11503.52177894 0.00001620
10 1 11503.46258346 0.00000009
11 1 11503.46226161 0.00000000
Covariance Parameter Estimates

Cov Parm  Subject  Group  Estimate  Standard Error  Z Value  Pr Z
The results of the Covariance Parameter Estimates table show that the variance of the intercepts is
significant. This indicates that there is significant variation of the intercepts between subjects. The
variance of the linear effect of time is also significant. This indicates that there is significant variation in
the slopes of time between subjects. However, the covariance of the intercepts and the linear effect of
time is not significant. Therefore, the subject’s CD4+ cell count depletion over time is not affected by the
subject’s CD4+ cell count at seroconversion.
The results also show that the variances of the residuals in the R matrix for time group 1 and time group 2
are significant. The variances appear to be different from each other across time groups. Furthermore, for
equally spaced time intervals, the correlation among measurements in time group 1 is much smaller than
the correlation among measurements in time group 2.
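For reference, the spatial power structure used here implies that the within-subject error correlation decays with the time lag (the LOCAL option adds a separate measurement-error variance to the diagonal):

```latex
\operatorname{Corr}\!\left(\varepsilon_{ij},\,\varepsilon_{ik}\right) \;=\; \rho^{\,\lvert t_{ij}-t_{ik}\rvert}
```

so a smaller estimate of ρ in time group 1 means weaker correlation at equally spaced intervals than in time group 2.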
Fit Statistics
Null Model Likelihood Ratio Test
DF  Chi-Square  Pr > ChiSq
 7     1146.82      <.0001
The AIC is very close to that of the model with the random effects of time*time and time*time*time
(11518.3). However, the BIC is lower (11550.7 versus 11573.0) because the model with the four random
effects had seven more covariance parameters (15 versus 8).
Solution for Fixed Effects

Effect  Estimate  Standard Error  DF  t Value  Pr > |t|

Type 3 Tests of Fixed Effects

Effect  Num DF  Den DF  F Value  Pr > F
The inferences for the fixed effects are similar to those from the model with the four random effects.
In conclusion, random coefficient models can be useful for fitting longitudinal data, especially when
there is a large error component due to random effects. The model still enables the within-subject
correlations to change over time. However, the correlations are estimated from the variances and
covariances of the random effects along with the time values for the subjects.
The final CD4+ cell count model has both a RANDOM and a REPEATED statement, with an unstructured
G matrix and a spatial power covariance structure for the R matrix. The variances of the intercepts and
of the linear effects of time were significantly different from 0. This means that the CD4+ cell count values
at seroconversion vary across subjects and that the depletion of CD4+ cell counts over time varies across
subjects. Heterogeneity in the R matrix is also evident in the model.
Exercises
5) Interpret the parameter estimates for the random effects for Subject 1.
b. Fit a model with both the REPEATED and RANDOM statements. Specify a random intercept
and hours, and use the unstructured covariance structure. Print the G matrix, the correlation
matrix based on the V matrix, and the parameter estimates for the fixed effects. Specify the
spatial exponential covariance structure for the R matrix, add a measurement error component,
and use the FIRSTORDER suboption.
2.5 Model Assessment 2-123
What covariance structure does the R matrix have in the first random
coefficient model?
a. Unstructured
b. Independent
c. Compound symmetry
d. Spatial power
Objectives
• Explain the linear mixed model residual and influence diagnostic statistics.
• Examine how the violation of assumptions regarding the random effects
influences the inference of the model.
• Create residual and influence diagnostic plots.
The following are common questions that deal with mixed model
assessment.
• Are the model assumptions validated?
• Is the covariation of the observations modeled properly?
• Are the results sensitive to specific data points and clusters?
In ordinary least squares regression models, model assessment usually revolves around residual analysis,
overall measures of goodness-of-fit, and influence analysis. Model assessment is especially important
in linear mixed models because likelihood-based estimation methods are particularly sensitive to unusual
observations. After you detect these observations, you should examine them and determine whether they
are erroneous. If these observations are legitimate, then they might represent important new findings.
They also might indicate that your current model is inadequate.
• Standard residual and influence diagnostics for linear models can now be
extended to linear mixed models.
• Diagnostics in linear mixed models are complicated by the fact that the
estimates of the fixed effects depend on the estimates of the covariance
parameters.
• With longitudinal data, it is usually more important to measure the
influence of a set of observations on the analysis, not just the influence of
individual observations.
The differences between the influence and residual analysis in the ordinary least squares models
and the linear mixed model come from the fact that the estimates of the fixed effects and the predictions
of the random effects depend on the estimates of the covariance parameters. If there are no random effects
and the model uses an independent covariance structure, then the general linear mixed model reduces
to the ordinary least squares model and the residual and influence measures are well known.
Influence Diagnostics
The fixed effects are affected when you remove observations because of the change in covariance
parameters and the change in the regressor space.
Types of Residuals
• A marginal residual is the difference between the observed data and the
predicted marginal mean: r_mi = y_i − x_i'β̂.
• A conditional residual is the difference between the observed data and the
predicted value of the observation: r_ci = y_i − x_i'β̂ − z_i'b̂_i.
Conditional residuals are subject-specific residuals that are useful in detecting outlying subjects
and in determining whether the random effects are selected properly. If you choose the right random
effects, the conditional residuals should be small. For example, if you choose a random intercept but you
should have a random slope in the model, the subject-specific residuals show the model misspecification.
Marginal residuals are population-averaged residuals that are helpful in diagnosing whether the fixed
effect part of the model is selected properly. They are also helpful in diagnosing the fit of the model
averaged across all subjects. For example, if you were to predict the response of the next subject in your
study, the only way of measuring the quality of the prediction is by using the marginal residuals.
Types of Residuals
The raw residuals are usually not well suited for examining model assumptions and for detecting outliers
and influential observations. For example, if the variances of the observations differ, then a data point
with a smaller raw residual and a smaller variance might be more troublesome than a data point with
a larger residual and a larger variance. To account for the unequal variances of the residuals, various
studentizations are applied (Schabenberger 2004).
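As a sketch of the idea, following the usual definitions (r_i is a raw residual):

```latex
r_i^{\text{student}} = \frac{r_i}{\sqrt{\widehat{\operatorname{Var}}[r_i]}},
\qquad
r_i^{\text{Pearson}} = \frac{r_i}{\sqrt{\widehat{\operatorname{Var}}[y_i]}}
```

The studentized residual scales by the estimated variance of the residual itself; the Pearson residual scales by the estimated variance of the response.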
Residual Analysis
• Studentized residuals and the Pearson residuals are useful for detecting
potential outliers.
• Scaled residuals are useful for evaluating the appropriateness of the
covariance structure of your model.
Influence Diagnostics
The basic procedure for quantifying influence is shown above. It is important to note that influence
analyses are performed under the assumption that the chosen model is correct. Changing the model
structure can alter the conclusions (Schabenberger 2004).
Influence on the fitted and predicted values: PRESS residuals and DFFITS
Influence on the fixed-effects estimates: Cook's D or Multivariate DFFITS
Influence on the precision of the fixed effects: COVTRACE or COVRATIO
Influence on the covariance parameter estimates: Cook's D or Multivariate DFFITS
Influence on the precision of the covariance parameters: COVTRACE or COVRATIO
An overall influence statistic measures the change in the objective function being minimized. In ordinary
least squares regression, the residual sum of squares serves that purpose. In linear mixed models fit
by maximum likelihood or restricted maximum likelihood, an overall influence measure is the likelihood
distance. This statistic gives the amount by which the log likelihood of the full data changes if it is
evaluated at the reduced-data estimates.
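For a deletion set U, the likelihood distance can be sketched as follows (following Schabenberger 2004; ℓ is the full-data log likelihood and θ̂_(U) is the estimate computed without U):

```latex
\mathrm{LD}(U) \;=\; 2\left\{\,\ell(\hat{\theta}) - \ell(\hat{\theta}_{(U)})\,\right\}
```

A large LD(U) means that removing U moves the estimates to a region of much lower likelihood, so U is influential on the overall fit.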
The PRESS residual is the difference between the observed value and the predicted marginal mean, where
the predicted value is obtained without the observations in question. The sum of the squared PRESS
residuals is the PRESS statistic. The DFFITS statistic is the change in predicted values due to the removal
of a single data point, standardized by the externally estimated standard error of the predicted value
in the full data.
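In symbols, with β̂_(U) estimated without the observations in the deletion set U:

```latex
\hat{e}_{i(U)} \;=\; y_i - \mathbf{x}_i'\hat{\beta}_{(U)},
\qquad
\mathrm{PRESS}(U) \;=\; \sum_{i \in U} \hat{e}_{i(U)}^{\,2}
```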
The primary difference between Cook’s D and Multivariate DFFITS (MDFFITS) is that MDFFITS uses
an externalized estimate of the variance of the parameter estimates while Cook’s D does not. For both
statistics, you are concerned about large values, indicating that the change in the parameter estimate is
large relative to the variability of the estimate.
The benchmarks of no influence for the COVTRACE and COVRATIO statistics are 0 for
the covariance trace and 1 for the covariance ratio. The variance matrix that is used in the computation
of COVTRACE and COVRATIO for covariance parameters is obtained from the inverse Hessian matrix.
Influence diagnostics are performed by noniterative or iterative methods. The noniterative diagnostics
rely on recomputation formulas under the assumption that covariance parameters or their ratios remain
fixed. With the possible exception of a profiled (factored out) residual variance, no covariance parameters
are updated. This is the default behavior because of its computational efficiency. However, the impact of
an observation on the overall analysis can be underestimated if its effect on covariance parameters is not
assessed. Toward this end, iterative methods can be applied to gauge the overall impact of observations
and to obtain influence diagnostics for the covariance parameter estimates.
[Histogram of the random intercept estimates; x axis: Intercept Estimate, y axis: Percent]
When you use the SOLUTION option in the RANDOM statement, a table of the random effect parameter
estimates, which are deviations from the population parameter estimates, is produced. These estimates are
the empirical best linear unbiased predictors (EBLUPs). They can be interpreted as deviations from the
population average, which might be helpful for detecting subjects or groups of subjects that are having
a different time course. Furthermore, these estimates can be used in the prediction of subject-specific
profiles. If you use ODS and save the parameter estimates of the random effects to a SAS data set, you
can create histograms and scatter plots for diagnostic purposes.
The random effects for intercept represent the variability in subject-specific intercepts not explained
by the covariates included in the model. The distribution of the random effects is assumed to be normal.
You might be able to check this assumption by plotting the intercept parameter estimates.
However, both the residual error and the covariate structure play an important role in the shape
of the distribution of random effects. If the residual variability is large compared to the random effects
variability, then the observed distribution of the random effects might not reflect the true distributional
shape of the random effects. In fact, Verbeke and Molenberghs (2000) show that when the within-subject
variability is large in comparison to the between-subject variability, the histogram of random effect
parameter estimates shows less variability than is actually present in the population of random effects.
Therefore, these histograms might be misleading.
Verbeke and Molenberghs (2000) suggest that nonnormality of the random effects can be detected only
by comparing the results obtained under the normality assumption with results obtained from fitting
a linear mixed model with relaxed distributional assumptions for the random effects. This is not a trivial
task. So what are the consequences of ignoring the normality assumption of the random effects?
• Fixed effect parameter estimates and standard errors are robust with
respect to the misspecification of the random effects distribution.
• Violation of the normality assumption clearly affects the standard errors of
the random effects.
• Parameter estimates of the random effects are also affected by the
normality assumption.
If the model is correctly specified and the covariance structure is appropriate, then the violation
of the normality assumption of the random effects has little effect on the estimation of the fixed effect
parameter estimates and their standard errors. Verbeke and Lesaffre (1997) performed extensive
simulations comparing the corrected standard errors of the fixed effects (using an estimator that corrects
for possible nonnormality of the random effects) to the uncorrected standard errors of the fixed effects.
The results showed that the two standard errors were very similar regardless of the distribution. However,
when the normality assumption is violated, the corrected standard errors for the random effects are clearly
superior to the uncorrected ones. Whether the standard errors increase or decrease depends on the data.
Therefore, if interest is only in the inference of the fixed effects, then valid inferences are obtained even
when the random effects are incorrectly assumed to be normally distributed. If interest is in the inference
of the random effects, then you should explore whether the assumed normal distribution is appropriate.
Verbeke and Molenberghs (2000) suggest that if you are interested in detecting subgroups in the random
effects population, then you should take as many measurements as possible, at the beginning and
at the end of the study to obtain maximal spread of the time points.
Model Assessment
Example: Fit the model with the REPEATED and RANDOM statements. The model includes the six
main effects, two interactions, and the two polynomial terms for time. Specify the spatial
power covariance structure and the group effect of timegroup in the REPEATED statement.
Specify time and intercept in the RANDOM statement along with the unstructured
covariance structure. Specify plots of the likelihood distances, the PRESS statistics, influence
statistics, and residuals (raw, student, Pearson, and scaled). Use iterative analysis with
the maximum number of iterations set to 5 and use the FIRSTORDER suboption. Identify
potentially influential subjects and observations.
/* long02d10.sas */
ods graphics / imagemap=on tipmax=2400;
ods output influence=influence;
proc mixed data=aids noclprint plots=(distance(useindex)
press(useindex) influenceestplot(useindex) residualpanel(box)
studentpanel(box) pearsonpanel(box) vcirypanel(box));
class timegroup id;
model cd4_scale=time age cigarettes drug partners depression
time*age time*cigarettes time*time time*time*time
/ solution ddfm=kr(firstorder) influence(effect=id iter=5) vciry;
random intercept time / type=un subject=id;
repeated / type=sp(pow)(time) local subject=id group=timegroup;
title "Longitudinal Model with Random Effects and Serial "
"Correlation";
run;
Select ODS GRAPHICS statement options:
IMAGEMAP=ON|OFF controls data tips and drill down generation. Data tips are pieces
of explanatory text that appear when you hold the mouse pointer
over the data portions of a graph contained in an HTML page.
Compared to noniterative updates, the computations for iterative influence analysis are more
involved. In particular, for large data sets and/or a large number of random effects, iterative
updates require considerably more resources. A one-step (ITER=1) or two-step update might
be a good compromise. The output includes the number of iterations performed, which is less
than the maximum if the iterative analysis converges. If the process does not converge in the allotted
iterations, you should be careful in interpreting the results, especially if n is fairly large.
Partial Output
The scaled residuals appear normally distributed with a few outliers. The random scatter around the zero
reference line indicates no problems with the choice of the covariance structure.
The conditional residuals appear normally distributed with a few extreme outliers.
The conditional studentized residuals appear normally distributed with a few extreme outliers.
The conditional Pearson residuals appear normally distributed with a few extreme outliers.
Partial Output
Influence Diagnostics for Levels of id

id | Number of Observations in Level | Iterations | PRESS Statistic | Cook's D | MDFFITS | COVRATIO | COVTRACE | Cook's D CovParms
id | MDFFITS CovParms | COVRATIO CovParms | COVTRACE CovParms | RMSE without deleted level | Restricted Likelihood Distance
Because an iterative analysis was specified, the Influence Diagnostics for Levels of id table shows the
overall impact of each cluster (representing a subject) and the influence diagnostics for the covariance
parameter estimates. Because the maximum number of iterations was set to 5, the covariance parameters
were updated up to five times for each deletion set. Note that for every deletion set, PROC MIXED
converged in fewer than 5 iterations (the maximum was 3).
RMSE is an estimate of the root mean square error with the cluster deleted.
The plot of the restricted likelihood distance clearly shows several influential clusters. Cluster 30148 has
the largest restricted likelihood distance. You should examine influential clusters and determine whether
they are erroneous. If these clusters are legitimate, then they might represent important new findings.
They also might indicate that your current model is inadequate.
By viewing the tooltip information, you can see that patient 30148 had extremely large CD4+ cell
counts.
Several clusters have a large effect on the fixed effects and covariance parameters. These clusters warrant
further investigation. They can point to a model breakdown and lead to the development of a better model
(Schabenberger 2004).
The PRESS statistic measures the influence on the fitted and predicted values. The USEINDEX option
uses as the horizontal axis label the index of the effect level rather than the formatted value(s). Several
clusters appear influential.
The fixed effects deletion estimates plot gives a detailed picture of how the individual parameter
estimates react to the removal of each cluster. Some of the parameters clearly are affected.
The plot of the covariance parameter deletion estimates gives a detailed picture of how the individual
covariance parameters react to the removal of the clusters.
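The DATA step that creates the aids_inf data set is not shown. A plausible sketch, assuming the ODS influence table was saved under the name influence and contains the variables PRESS and RLD (these data set and variable names are assumptions):

```sas
data aids_inf;
   /* both data sets assumed sorted by id; the one-row-per-subject
      influence values are carried to every observation of that subject */
   merge aids influence;
   by id;
   if RLD > 1 or PRESS > 1000;   /* keep only potentially influential subjects */
run;
```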
data aids_id;
set aids_inf;
by id;
if first.id;
run;
The first DATA step merges the aids and influence data sets into the aids_inf data set and subsets to
observations with restricted likelihood distance statistics greater than 1 or PRESS statistics greater than
1000. The second DATA step creates the aids_id data set and retains only the first observation for each
subject.
Obs | id | cd4_scale | time | PRESS | RLD
Based on the PRESS statistic and the restricted likelihood distance, seven subjects might be considered
influential. Two subjects, 10171 and 10191, have both extremely high and extremely low cd4_scale
counts. Subject 10770 has no extreme counts.
In conclusion, model assessment is a critical part of model building. Residual and influence statistic plots
can indicate whether you have a misspecified model, and can assist you in detecting erroneous data
or important new findings. If the objectives of your study are to obtain accurate inferences of the fixed
effects in your model, then the normality assumptions regarding the random effects are not important.
Exercises
2.6 Chapter Summary 2-149
The general linear mixed model can easily be fitted to longitudinal data. The model assumes that
the vector of repeated measurements on each subject follows a linear regression model where some
of the regression parameters are population-specific (fixed effects) whereas other parameters are
subject-specific (random effects). The subject-specific regression coefficients reflect how the response evolves
over time for each subject.
Estimation is more difficult in the mixed model than in the general linear model. Not only do you have
fixed effects as in the general linear model, but you also have to estimate the covariance matrix
of the random effects, and the covariance matrix of the random errors. Ordinary least squares is no longer
the best method because the distributional assumptions regarding the random error terms are too
restrictive. Generalized least squares is used because it takes into account the covariance structures
of the random effects and random errors.
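A numeric sketch of the generalized least squares idea, in Python (the data, the variance values, and the compound-symmetry form of V below are illustrative assumptions, not the course data):

```python
import numpy as np

# Hypothetical toy data: 3 subjects, 4 measurements each, mean model y = b0 + b1*time
rng = np.random.default_rng(42)
n_sub, n_obs = 3, 4
time = np.tile(np.arange(n_obs, dtype=float), n_sub)
X = np.column_stack([np.ones(n_sub * n_obs), time])

# Assumed block-diagonal V with compound-symmetry blocks:
# between-subject variance 2.0 plus residual variance 1.0
block = 2.0 * np.ones((n_obs, n_obs)) + 1.0 * np.eye(n_obs)
V = np.kron(np.eye(n_sub), block)

y = X @ np.array([10.0, -0.5]) + rng.multivariate_normal(np.zeros(n_sub * n_obs), V)

# Generalized least squares weights by the inverse covariance matrix:
# beta_hat = (X' V^-1 X)^-1 X' V^-1 y
Vinv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
print(beta_gls)
```

In PROC MIXED, V itself is unknown, so the estimated covariance parameters are plugged in before solving for the fixed effects.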
PROC MIXED implements two likelihood-based methods to estimate the covariance parameters:
maximum likelihood (ML) and restricted maximum likelihood (REML). The difference between ML
and REML is the construction of the likelihood function. However, the two methods are asymptotically
equivalent and often give very similar results. The distinction between ML and REML becomes important
only when the number of fixed effects is relatively large. In that case, the comparisons unequivocally
favor REML.
When finding reasonable estimates for the covariance structures, if you choose a structure that is too
simple, then you risk increasing the Type I error rate. If you choose a structure that is too
complex, then you sacrifice power and efficiency.
The Kenward-Roger degrees of freedom adjustment is superior, or at worst equal, to the Satterthwaite
and default DDFM options. For the more complex covariance structures, the Type I error rate inflation is
extremely severe unless the KR adjustment is used. It is recommended that the KR adjustment be used
along with the FIRSTORDER suboption as the standard operating procedure for longitudinal models.
Longitudinal models usually have three sources of random variation. The between-subject variability
is represented by the random effects. The within-subject variability is represented by the serial
correlation. The correlation between the measurements within subject usually depends on the time
interval between the measurements and decreases as the length of the interval increases. Finally, there
is potentially also measurement error in the measurement process.
The covariance structure that is appropriate for your model is directly related to which component
of variability is the dominant component. For example, if the serial correlation among the measurements
is minimal, then the random effects probably account for most of the variability in the data
and the remaining error components have a very simple covariance structure.
After a candidate mean model is selected, fitting the model using ordinary least squares regression
and examining the residuals might help determine the appropriate covariance structure. A function
of the ordinary least squares residuals that describes the association among repeated measurements,
and that is easily estimated with irregular observation times, is the sample variogram.
The data values in the sample variogram are calculated from the observed half-squared differences
between pairs of residuals within individuals, where the residuals are ordinary least squares residuals
based on the mean model, and the corresponding time differences. The vertical axis in the variogram
represents the residual variability within subject over time. The scatter plot contains a smoothed
nonparametric curve, which estimates the general pattern in the sample variogram. This curve can be used
to decide whether the mixed model should include serial correlation. If a serial correlation component
is warranted, the fitted curve can be used in selecting the appropriate serial correlation function. The fitted
curve can also be used to determine whether measurement error and random effects are evident
in the model.
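The half-squared-difference calculation can be sketched in Python (the residuals and observation times below are hypothetical stand-ins for the OLS residuals and measurement times):

```python
import itertools
import numpy as np

def sample_variogram(residuals_by_subject, times_by_subject):
    """Return (time lag, half squared residual difference) pairs computed
    within each subject, as in the sample variogram."""
    lags, halfsq = [], []
    for res, t in zip(residuals_by_subject, times_by_subject):
        for (r1, t1), (r2, t2) in itertools.combinations(zip(res, t), 2):
            lags.append(abs(t2 - t1))                 # time difference
            halfsq.append(0.5 * (r1 - r2) ** 2)       # variogram ordinate
    return np.array(lags), np.array(halfsq)

# Two hypothetical subjects with irregular observation times
res = [[0.4, -0.1, 0.3], [-0.2, 0.5]]
tms = [[0.0, 1.0, 2.5], [0.5, 2.0]]
u, v = sample_variogram(res, tms)
print(u)  # time lags
print(v)  # half squared differences
```

Plotting v against u and smoothing, as the %variogram macro does, gives the curve used to judge serial correlation, measurement error, and random effects.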
You can also use the information criteria (such as the AIC and BIC) produced by PROC MIXED as a tool
to help you select the most appropriate covariance structure. The smaller the information criteria value,
the better the model. However, only choose the covariance structures that make sense given the data.
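As a sketch of how the criteria penalize covariance parameters (formulas follow the PROC MIXED fit statistics under REML, with d covariance parameters and, for BIC, the number of subjects; the -2 res log likelihood values are the ones reported for the unstructured and compound symmetry fits in the solutions, 736.675 and 754.426, with 24 subjects):

```python
import math

def aic(neg2ll, d):
    # AIC = -2 res log likelihood + 2 * (number of covariance parameters)
    return neg2ll + 2 * d

def bic(neg2ll, d, n_subjects):
    # BIC = -2 res log likelihood + d * log(number of subjects)
    return neg2ll + d * math.log(n_subjects)

# Unstructured: 15 covariance parameters; compound symmetry: 2 (24 subjects)
print(aic(736.675, 15), bic(736.675, 15, 24))
print(aic(754.426, 2), bic(754.426, 2, 24))
```

Despite its larger -2 res log likelihood, the compound symmetry fit wins on both criteria here because its penalty for 13 fewer parameters is so much smaller.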
For data with unequally spaced time points and different time points across subjects, only compound
symmetry and the spatial covariance structures are appropriate covariance structures. If the time points
are equally spaced, then the AR(1) and Toeplitz covariance structures could be examined. If the time
points are unequally spaced but the same across subjects, then the unstructured covariance structure
could be examined.
PROC MIXED allows heterogeneity in the residual covariance parameters with the GROUP= option.
All observations having the same level of the GROUP effect have the same covariance parameters. Each
new level of the GROUP effect produces a new set of covariance parameters with the same structure
as the original group.
After an appropriate covariance structure is selected, model-building efforts should be directed
at simplifying the mean structure of the model. Because the model should be hierarchically well
formulated, the first step is to evaluate the interactions. One recommended approach is to eliminate the
interactions one at a time, starting with the least significant interaction. If you use the model fit statistics
such as AIC, then you must use the ML estimation method. However, after the final model is chosen, refit
the model using REML because REML estimators are superior.
When the sample variogram clearly shows that the random effects error component is much larger than
the serial correlation error component, a longitudinal model using the RANDOM statement might
be useful. These models are called random coefficient models because the regression coefficients for one
or more covariates are assumed to be a random sample from some population of possible coefficients.
In longitudinal models, the random coefficients are the subject-specific parameter estimates. Random
coefficient models are useful for highly unbalanced data with many repeated measurements per subject.
In random coefficient models, the fixed effect parameter estimates represent the expected values
of the population of intercepts and slopes. The random effects for intercept represent the difference
between the intercept for the ith subject and the overall intercept. The random effects for slope represent
the difference between the slope for the ith subject and the overall slope. Random coefficient models also
have a random error term for the within-subject variation.
In PROC MIXED, you can compute predicted response values using empirical best linear unbiased
predictions (EBLUPs). These predictions can be interpreted as a weighted mean of the population average
profile and the observed data profile. If the residual variability is large in comparison to the between-
subject variability, more weight is given to the overall average profile compared to the observed data.
However, if the residual variability is small in comparison to the between-subject variability, more weight
will be given to the observed data profile.
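For the special case of a random-intercept model, this weighting can be sketched explicitly; the variance components and data below are hypothetical. The prediction is w * (subject mean) + (1 - w) * (population mean), with w = between-subject variance / (between-subject variance + residual variance / n), so a large residual variance shifts weight toward the population average, as described above.

```python
def eblup_subject_mean(ybar_i, n_i, mu, var_between, var_resid):
    """Shrinkage form of the EBLUP for a random-intercept model:
    a weighted mean of the subject's observed mean and the population mean."""
    w = var_between / (var_between + var_resid / n_i)
    return w * ybar_i + (1.0 - w) * mu

# Hypothetical subject: observed mean 80, population mean 70, 5 measurements.
# Small residual variance relative to between-subject variance:
# most weight goes to the observed data profile.
print(eblup_subject_mean(80.0, 5, 70.0, var_between=50.0, var_resid=10.0))
# Large residual variance: the prediction is pulled toward the population mean.
print(eblup_subject_mean(80.0, 5, 70.0, var_between=10.0, var_resid=200.0))
```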
You can also fit a model in PROC MIXED with both the RANDOM and REPEATED statements.
However, this model is generally not recommended in practice. These models tend to have convergence
and estimation problems, especially with complex covariance structures.
The purpose of model diagnostics is to compare the data with the fitted model to highlight any systematic
discrepancies. Conditional residual plots can be used to detect outliers and whether the random effects are
properly selected. Marginal residual plots can be used to diagnose whether you selected the fixed effect
part of the model properly. Model diagnostics are especially important in linear mixed models because
likelihood-based estimation methods are particularly sensitive to unusual observations.
If the model is correctly specified and the covariance structure is appropriate, then the violation
of the normality assumption of the random effects has little effect on the estimation of the fixed effect
parameter estimates and their standard errors. However, violation of the normality assumption
of the random effects clearly affects the standard errors and parameter estimates of the random effects.
General form of the MIXED procedure:

PROC MIXED DATA=SAS-data-set <options>;
   CLASS variables;
   MODEL response = <fixed-effects> </ options>;
   RANDOM random-effects </ options>;
   REPEATED <repeated-effect> </ options>;
RUN;
2.7 Solutions
Solutions to Exercises
1. Fitting a General Linear Mixed Model
a. Fit a general linear mixed model with the three main effects, the three two-factor interactions,
and the quadratic and cubic effects of hours. Request the parameter estimates and the Kenward-
Roger method for computing the degrees of freedom. In the REPEATED statement, request
the unstructured covariance structure and the R matrix along with the correlations computed from
the R matrix.
proc mixed data=long.heartrate;
class drug;
model heartrate=hours drug baseline hours*drug hours*baseline
drug*baseline hours*hours hours*hours*hours
/ solution ddfm=kr;
repeated / type=un subject=patient r rcorr;
title 'Longitudinal Model with Unstructured Covariance Structure';
run;
Model Information
Class Level Information
drug 3 a b p
Dimensions
Covariance Parameters 15
Columns in X 15
Columns in Z 0
Subjects 24
Max Obs per Subject 5
Number of Observations
Iteration History
Iteration   Evaluations   -2 Res Log Like   Criterion
0 1 810.76784735
1 2 736.70254878 0.00009683
2 1 736.67527615 0.00000056
3 1 736.67512447 0.00000000
Fit Statistics
Null Model Likelihood Ratio Test
DF   Chi-Square   Pr > ChiSq
14   74.09   <.0001
Solution for Fixed Effects
Effect   drug   Estimate   Standard Error   DF   t Value   Pr > |t|

Type 3 Tests of Fixed Effects
Effect   Num DF   Den DF   F Value   Pr > F
1) The unstructured covariance structure is legitimate for this example because the time intervals
are the same across patients.
2) The R matrix represents the residual covariance matrix. The value in row 1 and column 1
represents the variance of the first measurement. The value in row 2 and column 2 represents
the variance of the second measurement. The value in row 1 and column 2 represents
the covariance of the first and second measurements.
3) The R correlation matrix consists of the correlations of the measurements within patient. It
seems that the autocorrelations decrease over time, especially early in the clinical trial.
4) The null model likelihood ratio test compares the fitted model to a model with an independent
covariance structure. The test is significant, which indicates that the unstructured covariance
structure does a better job modeling the residual error compared to the independent
covariance structure.
Model Information
Class Level Information
drug 3 a b p
Dimensions
Covariance Parameters 2
Columns in X 15
Columns in Z 0
Subjects 24
Max Obs per Subject 5
Number of Observations
Iteration History
Iteration   Evaluations   -2 Res Log Like   Criterion
0 1 810.76784735
1 1 754.42562477 0.00000000
Covariance Parameter Estimates
Cov Parm   Subject   Estimate
CS         patient   49.5195
Residual             28.1857
Fit Statistics
Null Model Likelihood Ratio Test
DF   Chi-Square   Pr > ChiSq
1   56.34   <.0001
Solution for Fixed Effects
Effect   drug   Estimate   Standard Error   DF   t Value   Pr > |t|

Type 3 Tests of Fixed Effects
Effect   Num DF   Den DF   F Value   Pr > F
1) The compound symmetry covariance structure can be used with any longitudinal data because
the covariance structure assumes equal correlations regardless of the time interval. Therefore,
it can handle equally or unequally spaced time intervals. However, it is usually a poor choice
because the correlations usually decrease with an increasing time interval.
2) The AICC statistic is much lower for the model with the compound symmetry covariance
structure because the penalty is much less severe. The model with the unstructured
covariance structure is estimating 15 covariance parameters while the model with the
compound symmetry covariance structure is estimating only 2. Obviously the 13 extra
covariance parameters do not add much to the model fit.
3) The model with the compound symmetry covariance structure has two higher-order terms
significant at the .05 level and the model with the unstructured covariance structure had no
significant higher-order terms at this alpha level. The differences between the two models
regarding inference are due to the fact that the unstructured covariance structure is probably
too complex for the longitudinal data in this example. Therefore, you sacrifice power and
efficiency. However, the compound symmetry covariance structure is probably too simple for
the longitudinal data in this example. Thus, you increase the Type I error rate.
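The single within-patient correlation implied by the compound symmetry structure can be computed from the covariance parameter estimates shown in the output above (CS for patient = 49.5195, Residual = 28.1857):

```python
# Compound symmetry: every pair of measurements within a patient has
# corr = sigma_cs^2 / (sigma_cs^2 + sigma_residual^2), regardless of time lag
cs_var, resid_var = 49.5195, 28.1857
common_corr = cs_var / (cs_var + resid_var)
print(round(common_corr, 3))
```

That one constant (about 0.64) is applied at every time lag, which is why compound symmetry is usually too simple when correlations decay with increasing time intervals.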
c. Fit the same model but with the spatial power covariance structure. Also add a measurement error
component and use the FIRSTORDER suboption.
proc mixed data=long.heartrate;
class drug;
model heartrate=hours drug baseline hours*drug hours*baseline
drug*baseline hours*hours hours*hours*hours
/ solution ddfm=kr(firstorder);
repeated / type=sp(pow)(hours) local subject=patient r rcorr;
title 'Longitudinal Model with Spatial Power Covariance Structure';
run;
Model Information
Class Level Information
drug 3 a b p
Dimensions
Covariance Parameters 3
Columns in X 15
Columns in Z 0
Subjects 24
Max Obs per Subject 5
Number of Observations
Iteration History
Iteration   Evaluations   -2 Res Log Like   Criterion
0 1 810.76784735
1 2 774.20384832 1.92981521
2 4 761.08247978 1.03422286
3 4 759.04559900 2.98907683
4 1 758.17432643 0.31771101
5 1 757.90332970 0.01420522
6 2 756.59661708 0.03981076
7 2 754.17270042 0.18554890
8 2 750.40180026 0.11827784
9 2 748.22207185 0.00199477
10 2 747.61827563 0.00005801
11 1 747.60224474 0.00000001
Fit Statistics
Null Model Likelihood Ratio Test
DF   Chi-Square   Pr > ChiSq
2   63.17   <.0001
Solution for Fixed Effects
Effect   drug   Estimate   Standard Error   DF   t Value   Pr > |t|

Type 3 Tests of Fixed Effects
Effect   Num DF   Den DF   F Value   Pr > F
1) The Variance estimate plus the Residual estimate gives the total variance of the measurements.
The LOCAL option adds an additional variance parameter, which in general adds an
observational error to the time series structure. The spatial power parameter estimate becomes
a correlation coefficient when it is raised to the power of the value of the time interval.
2) The AICC statistic is lower because the spatial power covariance structure is a better fit
to the residual error. The unstructured covariance structure is too complex while the
compound symmetry covariance structure is too simple.
3) No higher-order terms are significant for this model. These inferences differ from the model
with the compound symmetry covariance structure because using the compound symmetry
covariance structure inflated the Type I error.
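The spatial power correlation described in item 1 can be sketched numerically; the parameter value below is hypothetical, not the estimate from this output:

```python
def sp_pow_corr(rho, dist):
    """SP(POW) correlation: the parameter estimate raised to the power
    of the time interval between two measurements."""
    return rho ** dist

rho = 0.6  # hypothetical SP(POW) parameter estimate
for d in [0.5, 1.0, 2.0, 4.0]:
    print(d, sp_pow_corr(rho, d))
```

Because the correlation decays smoothly with the length of the interval, SP(POW) handles unequally spaced measurement times naturally.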
%variogram(data=long.heartrate,resvar=heartrate,clsvar=drug,
expvars=hours drug baseline hours*drug hours*baseline
drug*baseline hours*hours hours*hours*hours,id=patient,
time=hours,maxtime=5);
%variance(data=long.heartrate,id=patient,resvar=heartrate,
clsvar=drug,expvars=hours drug baseline hours*drug
hours*baseline drug*baseline hours*hours hours*hours*hours,
subjects=24,maxtime=5);
1) The sample variogram clearly shows that the heart rate data have some measurement error,
a relatively small error component dealing with serial correlation, and a relatively large error
component dealing with the random effects.
2) The LOCAL option should be used along with a covariance structure that allows
the correlations to change over unequal time intervals. Random coefficients models might
be useful also.
b. Plot the autocorrelation function by time interval using PROC SGPLOT with the penalized
B-spline curve.
data varioplot;
set varioplot;
autocorr=1-(variogram/64.31);
run;
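The DATA step converts each variogram ordinate to an autocorrelation as 1 - variogram/variance, where 64.31 is assumed to be the process variance estimate reported by the %variance macro. A quick Python sketch of the same conversion:

```python
def autocorr_from_variogram(gamma_u, process_var=64.31):
    # Mirrors the DATA step: autocorr = 1 - (variogram / process variance)
    return 1.0 - gamma_u / process_var

# An ordinate of 0 means perfect correlation at that lag;
# an ordinate equal to the process variance means zero correlation.
print(autocorr_from_variogram(0.0))
print(autocorr_from_variogram(64.31))
print(autocorr_from_variogram(25.0))
```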
1) The autocorrelation function clearly decreases over time and it does not decrease to 0 within
the range of the data. This supports the recommendation that a covariance structure that
handles serial correlation is needed in the model.
c. Generate a graph of the model fit statistics by covariance structure. Select the following
covariance structures: compound symmetry, unstructured, spatial power, spatial exponential,
spatial Gaussian, spatial spherical, and spatial linear. Use ODS to save the model fit statistics
and graph the AIC, AICC, and BIC statistics.
ods select none;
proc mixed data=long.heartrate;
class drug;
model heartrate=hours drug baseline hours*drug hours*baseline
drug*baseline hours*hours hours*hours*hours;
repeated / type=cs subject=patient;
ods output fitstatistics=csmodel;
run;
data model_fit;
length model $ 7 type $ 4;
set csmodel (in=cs)
unstmodel (in=un)
powmodel (in=pow)
linmodel (in=lin)
gaumodel (in=gau)
expmodel (in=exp)
sphmodel (in=sph);
if substr(descr,1,1) in ('A','B');
if substr(descr,1,3) = 'AIC' then type='AIC';
if substr(descr,1,4) = 'AICC' then type='AICC';
if substr(descr,1,3) = 'BIC' then type='BIC';
if cs then model='CS';
if un then model='UNSTR';
if pow then model='SpPow';
if lin then model='SpLin';
if exp then model='SpExp';
if gau then model='SpGau';
if sph then model='SpSph';
run;
1) The spatial exponential, spatial power, and spatial spherical covariance structures appear
to model the residual error the best.
3. Developing and Interpreting Models
a. Reduce the mean model by eliminating unnecessary higher-order terms. Use the ML estimation
method and the spatial exponential covariance structure. Also add a measurement error
component and use the FIRSTORDER suboption. Use the p-values of the effects along with
the AICC statistic to decide which terms to eliminate. Do not eliminate the main effects.
proc mixed data=long.heartrate method=ml;
class drug;
model heartrate=hours drug baseline hours*drug hours*baseline
drug*baseline hours*hours hours*hours*hours
/ solution ddfm=kr(firstorder);
repeated / type=sp(exp)(hours) local subject=patient;
title 'Longitudinal Model with Spatial Exponential '
'Covariance Structure';
run;
Partial Output
Covariance Parameter Estimates
Fit Statistics
Null Model Likelihood Ratio Test
DF   Chi-Square   Pr > ChiSq
2   59.27   <.0001
Solution for Fixed Effects
Effect   drug   Estimate   Standard Error   DF   t Value   Pr > |t|
Type 3 Tests of Fixed Effects
Effect   Num DF   Den DF   F Value   Pr > F
Fit Statistics
The AICC statistic went down compared to the last model (807.2 versus 812.1).
Null Model Likelihood Ratio Test
DF   Chi-Square   Pr > ChiSq
2   60.36   <.0001
Solution for Fixed Effects
Effect   drug   Estimate   Standard Error   DF   t Value   Pr > |t|

Type 3 Tests of Fixed Effects
Effect   Num DF   Den DF   F Value   Pr > F
Fit Statistics
Null Model Likelihood Ratio Test
DF   Chi-Square   Pr > ChiSq
2   59.08   <.0001
The AICC continues to decrease from the last model (806.4 versus 807.2).
Solution for Fixed Effects
Effect   drug   Estimate   Standard Error   DF   t Value   Pr > |t|

Type 3 Tests of Fixed Effects
Effect   Num DF   Den DF   F Value   Pr > F
Fit Statistics
Null Model Likelihood Ratio Test
DF   Chi-Square   Pr > ChiSq
2   58.47   <.0001
Solution for Fixed Effects
Effect   drug   Estimate   Standard Error   DF   t Value   Pr > |t|

Type 3 Tests of Fixed Effects
Effect   Num DF   Den DF   F Value   Pr > F
Fit Statistics
The AICC statistic increases from the last model (806.9 versus 806.4). Although the AICC statistic
increases, continue and evaluate the main-effects-only model. If the AICC statistic is higher for the main
effects model compared to the model with the cubic effect of hours, then the quadratic and cubic effects
of hours will remain in the final model.
Solution for Fixed Effects
Effect   drug   Estimate   Standard Error   DF   t Value   Pr > |t|

Type 3 Tests of Fixed Effects
Effect   Num DF   Den DF   F Value   Pr > F
Partial Output
Covariance Parameter Estimates
Fit Statistics
Null Model Likelihood Ratio Test
DF   Chi-Square   Pr > ChiSq
2   56.42   <.0001
Solution for Fixed Effects
Effect   drug   Estimate   Standard Error   DF   t Value   Pr > |t|

Type 3 Tests of Fixed Effects
Effect   Num DF   Den DF   F Value   Pr > F
1) The AICC statistic is the smallest of any model (804.8 versus 806.4). Therefore, the main
effects model will be the final model. Furthermore, none of the higher-order terms are
significant in the reduced models.
b. For the reduced model, generate another graph of the model fit statistics by covariance structure.
Use the REML estimation method and only select the five spatial covariance structures.
ods select none;
proc mixed data=long.heartrate;
class drug;
model heartrate=hours drug baseline;
repeated / type=sp(pow)(hours) local subject=patient;
ods output fitstatistics=powmodel;
run;
data model_fit1;
length model $ 7 type $ 4;
set powmodel (in=pow)
linmodel (in=lin)
gaumodel (in=gau)
expmodel (in=exp)
sphmodel (in=sph);
if substr(descr,1,1) in ('A','B');
if substr(descr,1,3)='AIC' then type='AIC';
if substr(descr,1,4)='AICC' then type='AICC';
if substr(descr,1,3)='BIC' then type='BIC';
if pow then model='SpPow';
if lin then model='SpLin';
if exp then model='SpExp';
if gau then model='SpGau';
if sph then model='SpSph';
run;
1) The spatial exponential covariance structure is still one of the best fits.
2) The spatial power covariance structure appears to be a relatively poor fit in the reduced model
compared to its fit for the complex mean model.
c. Refit the reduced model using the REML estimation method and the spatial exponential
covariance structure. Also request the correlations from the R matrix and the parameter estimates
for the fixed effects.
proc mixed data=long.heartrate;
class drug;
model heartrate=hours drug baseline / solution ddfm=kr(firstorder);
repeated / type=sp(exp)(hours) local subject=patient rcorr;
title 'Reduced Model with Spatial Exponential '
'Covariance Structure';
run;
Model Information
Class Level Information
drug 3 a b p
Dimensions
Covariance Parameters 3
Columns in X 6
Columns in Z 0
Subjects 24
Max Obs per Subject 5
Number of Observations
Iteration History
Iteration   Evaluations   -2 Res Log Like   Criterion
0 1 837.26507957
1 3 785.92913529 0.02297363
2 1 785.86160625 0.00010724
3 1 785.86105917 0.00000068
4 1 785.85806809 0.00000064
5 1 785.82588950 0.00000074
6 2 784.63431676 0.01037039
7 2 782.40898562 0.03544906
8 2 779.55195994 0.08082419
9 2 777.85987594 0.00284395
10 1 776.84834470 0.00088709
11 1 776.54727936 0.00014216
12 1 776.50260059 0.00000542
13 1 776.50102828 0.00000001
Fit Statistics
Null Model Likelihood Ratio Test
DF   Chi-Square   Pr > ChiSq
2   60.76   <.0001
Solution for Fixed Effects
Effect   drug   Estimate   Standard Error   DF   t Value   Pr > |t|

Type 3 Tests of Fixed Effects
Effect   Num DF   Den DF   F Value   Pr > F
1) The parameter estimate for hours indicates that for every one-unit increase in hours, the heart
rate decreases 7.218. The linear effect of hours is significant at the .05 level. The parameter
estimates for drug are contrasts of drug a to the placebo and drug b to the placebo. Both
parameter estimates are not significant. Finally, the parameter estimate for baseline indicates
that for every one-unit increase in baseline, the heart rate increases 0.5594. The linear effect
of baseline is significant.
a. Fit a random coefficient model with a random intercept and hours. Specify the fixed effects as
hours, drug, and baseline. Use an unstructured covariance structure and print out the G matrix,
the correlation matrix based on the V matrix, the parameter estimates for the fixed effects, and the
parameter estimates for the random effects. Use the Kenward-Roger method for computing
degrees of freedom.
proc mixed data=long.heartrate;
class drug;
model heartrate=hours drug baseline / solution ddfm=kr;
random intercept hours / solution type=un subject=patient g vcorr;
title 'Random Coefficients Model for Heart Rate Data';
run;
Model Information
Class Level Information
drug 3 a b p
Dimensions
Covariance Parameters 4
Columns in X 6
Columns in Z per Subject 2
Subjects 24
Max Obs per Subject 5
Number of Observations
Iteration History
Iteration   Evaluations   -2 Res Log Like   Criterion
0 1 837.26507957
1 2 779.82841658 0.00000027
2 1 779.82833922 0.00000000
Estimated G Matrix
Fit Statistics
Null Model Likelihood Ratio Test
DF   Chi-Square   Pr > ChiSq
3   57.44   <.0001
Solution for Fixed Effects
Effect   drug   Estimate   Standard Error   DF   t Value   Pr > |t|

Solution for Random Effects
Effect   Subject   Estimate   Std Err Pred   DF   t Value   Pr > |t|
Type 3 Tests of Fixed Effects
Effect   Num DF   Den DF   F Value   Pr > F
1) The G matrix consists of the variances and covariances of the random effects. The value in
column 1 and row 1 represents the variance of the intercepts. The value in column 2 and row
2 represents the variance of the slopes for hours. The value in column 2 and row 1 represents
the covariance of the intercepts and the slopes of hours.
The information gleaned from the G matrix is that the intercepts and the slopes for hours are
negatively correlated.
2) The residual covariance estimate represents the error that remains after the fixed effects
and random effects are accounted for. This will be modeled by the R matrix, which has
an independent covariance structure.
3) The AICC statistic is slightly larger than the reduced model in the last exercise (788.2 versus
782.7).
4) The correlations from the V matrix appear to decrease at a slower rate when compared
to the correlations from the R matrix from the reduced model in the last exercise.
5) The parameter estimates for the random effects represent deviations from the fixed effects.
Therefore, subject 1 deviates –0.5824 from the population intercept and 6.2144 from
the population slope for hours. The equation for subject 1 (who is taking the placebo)
is Y = 35.46 – 0.8129 * hours + 0.5434 * baseline.
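The negative intercept-slope relationship noted in item 1 follows from converting the G-matrix covariance to a correlation; the entries below are hypothetical because the estimated G matrix values are not reproduced here:

```python
import math

# Hypothetical G matrix entries:
# var(intercept), cov(intercept, slope), var(slope for hours)
g11, g12, g22 = 40.0, -3.5, 1.2

# Correlation between the random intercepts and random slopes
corr_int_slope = g12 / math.sqrt(g11 * g22)
print(round(corr_int_slope, 3))
```

A negative off-diagonal element in G always translates to a negative intercept-slope correlation, which is the pattern described in the discussion above.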
b. Fit a model with both the REPEATED and RANDOM statements. Specify a random intercept
and hours, and use the unstructured covariance structure. Print the G matrix, the correlation
matrix based on the V matrix, and the parameter estimates for the fixed effects. Specify the
spatial exponential covariance structure for the R matrix, add a measurement error component,
and use the FIRSTORDER suboption.
proc mixed data=long.heartrate;
class drug;
model heartrate=hours drug baseline / solution ddfm=kr(firstorder);
random intercept hours / type=un subject=patient g vcorr;
repeated / type=sp(exp)(hours) local subject=patient;
title 'Model with REPEATED and RANDOM statements for '
'Heart Rate Data';
run;
Model with REPEATED and RANDOM statements for Heart Rate Data
Model Information
Class Level Information
drug 3 a b p
Dimensions
Covariance Parameters 6
Columns in X 6
Columns in Z per Subject 2
Subjects 24
Max Obs per Subject 5
Number of Observations
Iteration History
Iteration   Evaluations   -2 Res Log Like   Criterion
0 1 837.26507957
1 4 779.85976042 0.00000144
2 1 779.82838462 0.00000000
Estimated G Matrix
Fit Statistics
Null Model Likelihood Ratio Test
DF   Chi-Square   Pr > ChiSq
4   57.44   <.0001
Solution for Fixed Effects
Effect   drug   Estimate   Standard Error   DF   t Value   Pr > |t|

Type 3 Tests of Fixed Effects
Effect   Num DF   Den DF   F Value   Pr > F
1) The first covariance parameter estimate represents the variance of the intercepts. The second
covariance parameter estimate represents the covariance of the intercepts and the linear effect
of hours. The third covariance parameter estimate represents the variance of the linear effect
of hours. Adding the fourth and sixth estimates represents the variance of the residuals
in the spatial exponential covariance structure. Finally, the fifth estimate is the parameter
estimate in the spatial exponential covariance structure, which is used to compute the
correlations within subject.
2) The correlations in the V matrix show very little change from the random coefficients model.
3) Because the parameter estimate for the spatial exponential covariance structure is essentially
0, the REPEATED statement is not needed. The AICC statistic also increased from the
random coefficients model (790.4 versus 788.2). The final Hessian matrix is also not positive
definite. Therefore, this model is an inferior model.
4) The inferences for the fixed effects have not changed.
5. Assessing the Model
a. Fit a repeated measures model with the main effects, and use the spatial exponential covariance
structure with the LOCAL option. Specify plots of the likelihood distances, the PRESS statistics,
influence statistics, and marginal residuals (student, Pearson, and scaled) using the MARGINAL
and BOX residual plot options. Use iterative influence analysis, set the maximum number
of iterations to 5, and use the FIRSTORDER suboption.
proc mixed data=long.heartrate plots=(distance press
           studentpanel(marginal box) pearsonpanel(marginal box)
           vcirypanel(box));
   class drug patient;
   model heartrate=hours drug baseline / solution ddfm=kr(firstorder)
         influence(effect=patient iter=5) vciry;
   repeated / type=sp(exp)(hours) local subject=patient;
   title 'Reduced Model with Spatial Exponential Covariance Structure';
run;
[Partial PROC MIXED output: the CLASS variables are drug (3 levels: a, b, p) and patient (24 levels). Dimensions: 3 covariance parameters; 6 columns in X; 0 columns in Z; 24 subjects; a maximum of 5 observations per subject. The iteration history converges in 13 iterations at -2 Res Log Like = 776.5010.]
[The Fit Statistics table, tests of fixed effects (F = 60.76 with 2 numerator DF, p < .0001), residual and influence diagnostic plots, and the influence statistics tables (number of iterations, PRESS statistic, Cook's D, MDFFITS, COVRATIO, and COVTRACE for the fixed effects and covariance parameters, RMSE without the deleted level, and restricted likelihood distance by patient) are not reproduced here.]
Why is ordinary least squares not the preferred estimation method for fixed
effects in general linear mixed models?
a. Ordinary least squares does not support random effects.
b. Ordinary least squares does not support correlated error terms.
c. Ordinary least squares does not support nonnormal distribution of error
terms.
d. Both a and b.
What can you conclude if the intercept of the fitted nonparametric curve in
the sample variogram has values much greater than 0?
a. Serial correlation error needs to be addressed in the covariance
structure.
b. Measurement error needs to be addressed in the covariance structure.
c. Random effects error needs to be addressed in the covariance structure.
d. It is irrelevant because the slope of the fitted nonparametric curve
determines the source of the error component.
When is the V matrix the same in the random coefficient model and a model
with the REPEATED statement and several time points?
a. Random coefficient model has a random intercept and slope, and the
repeated model has spatial power covariance structure.
b. Random coefficient model has a random intercept and slope, and the
repeated model has compound symmetry covariance structure.
c. Random coefficient model has only a random intercept, and the
repeated model has compound symmetry covariance structure.
d. Random coefficient model has only a random intercept, and the
repeated model has spatial power covariance structure.
What covariance structure does the R matrix have in the first random
coefficient model?
a. Unstructured
b. Independent
c. Compound symmetry
d. Spatial power
Chapter 3 Longitudinal Data Analysis with Discrete Responses
Exercises............................................................................................................. 3-57
Demonstration: Fitting Generalized Linear Mixed Models with Splines .......................... 3-76
Exercises............................................................................................................. 3-81
Exercises............................................................................................................3-115
3.1 Generalized Linear Mixed Models
Longitudinal models fit in the MIXED procedure have the assumption that the conditional responses are
normally distributed. However, the normality assumption might not always be reasonable, especially
when the response variable is discrete. Therefore, generalized linear mixed models will be used to analyze
nonnormal responses. For example, longitudinal data with response variables that are binary or discrete
counts can now be modeled using these models.
Generalized linear mixed models have the flexibility to model random effects
and correlated errors for nonnormal data.
• A linear predictor can contain random effects.
• The random effects are normally distributed.
• The conditional mean relates to the linear predictor through a link function:
g(E(y | γ)) = Xβ + Zγ
Generalized linear mixed models can model data from an exponential family of distributions, as well
as models with random effects. In these models, you apply a link function to the conditional mean E(y|γ),
where γ are the random effects. The conditional distribution of y|γ plays the same role as the distribution
of y in the fixed-effects generalized linear model.
If there are no random effects, PROC GLIMMIX fits generalized linear models. In these models, PROC
GLIMMIX estimates the parameters by maximum likelihood, restricted maximum likelihood,
or quasi-likelihood. Maximum likelihood and restricted maximum likelihood have been discussed earlier.
Quasi-likelihood will be discussed in a later section.
g(E(yi)) = β0 + β1x1i + … + βkxki
• The model relates the expected value of the response variable to the linear
predictor through a link function.
• The variance of the response variable is a specified function of its mean.
• The distribution of the response variable can come from a family of
exponential distributions.
To understand generalized linear mixed models, you need to have an understanding of generalized linear
models. These models extend the general linear model in several ways.
1. The distribution of the response variable can come from a family of exponential distributions rather
than just the normal distribution. The exponential family comprises many of the elementary discrete
and continuous distributions.
2. The link function allows a wide variety of response variables to be modeled rather than just
continuous response variables. For example, if the mean of the data is naturally restricted to a range
of values such as a proportion, the appropriate link function will ensure that the predicted values are
within the appropriate range.
3. The variance can be a specified function of the mean rather than just being constant.
Generalized linear models can also be fit using the GENMOD procedure. This procedure will
be shown in a later section.
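For instance, a Poisson generalized linear model with its canonical log link might be fit in PROC GENMOD as sketched below (the data set and variable names are hypothetical):

```sas
/* Hypothetical data: seizure counts by treatment group.
   DIST= names the response distribution; LINK= names the link
   function (log is the canonical link for the Poisson). */
proc genmod data=work.seizures;
   class treatment;
   model count = treatment baseline / dist=poisson link=log type3;
run;
```

Omitting LINK= would still apply the log link here, because PROC GENMOD defaults to the canonical link of the specified distribution.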
Generalized linear models have three components (McCullagh and Nelder 1989):
random component identifies the response variable and its probability distribution
systematic component specifies the predictor variables used in a linear predictor
link function specifies the function of E(Y) that the model equates to the systematic
component.
For the general linear model, the link function is the identity link (modeling the mean), the response
variable is normally distributed, and the variance is constant. For logistic regression, the link function
is the logit link, g(μ) = log(μ / (1 − μ)), and the response variable follows a binomial distribution
(a common distribution for binary outcomes). For Poisson regression, the link function is the natural log
and the response variable follows the Poisson distribution.
Each distribution in the exponential family has a natural location parameter, θ. For each distribution, there
exists a link function to transform the linear predictor to θ. This link function is called the canonical link.
For example, in the normal distribution the natural location parameter is the mean. Models with canonical
links usually make the best sense on mathematical grounds, but you can choose other link functions
besides the canonical links.
The reason for restricting the distribution of the response variable to the family of exponential
distributions is that the same algorithm to compute maximum likelihood parameter estimates
applies to this entire family for any choice of monotonic and differentiable link function.
The slide above shows the relationships between the general linear model (GLM), general linear mixed
model (GLMM), generalized linear model (GzLM), and generalized linear mixed model (GzLMM).
General linear models assume normal data, and can be viewed as a special case of generalized linear
models, which can be used to model data from an exponential family of distributions.
Generalized linear models cannot accommodate random effects, and can be viewed as a special case
of generalized linear mixed models.
Generalized linear mixed models can model data from an exponential family of distributions,
as well as models with random effects.
GLIMMIX Procedure

PROC GLIMMIX <options>;
   CLASS variables;
   CONTRAST 'label' contrast-specification </ options>;
   COVTEST <'label'> <test-specification> </ options>;
   EFFECT effect-name = effect-type (var-list </ effect-options>);
   ESTIMATE 'label' contrast-specification </ options>;
   LSMESTIMATE fixed-effect <'label'> values <divisor=n> </ options>;
   MODEL response <(response options)> = <fixed-effects> </ options>;
   NLOPTIONS <options>;
   OUTPUT <OUT=SAS-data-set> <keyword> </ options>;
   PARMS (value-list) … </ options>;
   RANDOM random-effects </ options>;
   WEIGHT variable;
   programming statements;
RUN;
The CONTRAST, ESTIMATE, COVTEST, and RANDOM statements can appear multiple times.
All other statements can appear only once with the exception of programming statements. The PROC
GLIMMIX and MODEL statements are required, and the MODEL statement must appear after the
CLASS statement if a CLASS statement is included.
EFFECT The EFFECT statement enables you to construct special collections of columns for
design matrices. These collections are referred to as constructed effects to distinguish
them from the usual model effects that are formed from continuous or classification
variables. The name of the effect is specified after the EFFECT keyword. This name
can appear in only one EFFECT statement and cannot be the name of a variable
in the input data set. The effect-type is specified after an equal sign, followed by a list
of variables within parentheses, which are used in constructing the effect. Effect-
options that are specific to an effect-type can be specified after a slash (/) following
the variable list.
ESTIMATE provides a mechanism for obtaining custom hypothesis tests. As in the CONTRAST
statement, the basic element of the ESTIMATE statement is the contrast-
specification, which consists of MODEL and G-side RANDOM effects and their
coefficients.
LSMESTIMATE provides a mechanism for obtaining custom hypothesis tests among the least squares
means. In contrast to the hypotheses tested with the ESTIMATE or CONTRAST
statements, the LSMESTIMATE statement enables you to form linear combinations
of the least squares means, rather than linear combinations of fixed-effects parameter
estimates or random-effects solutions, or both. Multiple-row sets of coefficients are
permitted.
MODEL names the dependent variable and the fixed effects. In contrast to PROC GLM, you
do not specify random effects in the MODEL statement. The dependent variable can
be specified using either the response syntax or the events/trials syntax. The
events/trials syntax is specific to models for binomial data.
NLOPTIONS allows for the specification and control of the nonlinear optimization methods.
OUTPUT creates a data set that contains predicted values and residual diagnostics, computed
after fitting the model. By default, all variables in the original data set are included
in the output data set.
PARMS specifies initial values for the covariance or scale parameters, or it requests a grid
search over several values of these parameters in generalized linear mixed models.
RANDOM defines the Z matrix of the mixed model, the random effects in the γ vector, the
structure of G, and the structure of R. The random effects can be classification or
continuous effects, and multiple RANDOM statements are possible. The RANDOM
_RESIDUAL_ statement indicates a residual-type (R-side) random component that
defines the R matrix.
WEIGHT uses weights to account for the differential weighting of observations. Observations
with nonpositive or missing weights are not included in the resulting analysis.
If a WEIGHT statement is not included, all observations used in the analysis are
assigned a weight of 1.
Selected MODEL statement options:
DIST= specifies the built-in (conditional) probability distribution of the data. If you specify
the DIST= option and you do not specify a user-defined link function,
a default link function is chosen. If you do not specify a distribution, the GLIMMIX
procedure defaults to the normal distribution for continuous response variables and
to the multinomial distribution for classification or character variables, unless the
events/trials syntax is used in the MODEL statement. If you choose the events/trials
syntax, the GLIMMIX procedure defaults to the binomial distribution.
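As a sketch of these defaults (the data set and variable names below are hypothetical), the events/trials syntax implies the binomial distribution without an explicit DIST= option:

```sas
/* 'infected' events out of 'exposed' trials per herd.
   The events/trials syntax implies DIST=binomial; the default
   link for the binomial distribution is the logit. */
proc glimmix data=work.herds;
   class herd;
   model infected/exposed = dose;
   random intercept / subject=herd;  /* G-side random intercept */
run;
```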
As with PROC MIXED, PROC GLIMMIX has RANDOM statements, which allow for subject-specific
(conditional) inference. Other features of PROC GLIMMIX include the following:
• CONTRAST, ESTIMATE, LSMEANS, and LSMESTIMATE statements, which produce hypothesis
tests and estimable linear combinations of effects
• the NLOPTIONS statement, which enables you to exercise control over the numerical optimization
• the COVTEST statement, which enables you to obtain inferences for the covariance parameters
• computed variables with SAS programming statements inside PROC GLIMMIX (except for variables
listed in the CLASS statement); these computed variables can appear in the MODEL, RANDOM,
WEIGHT, or FREQ statements
• a choice of model-based variance-covariance estimators for the fixed effects or empirical (sandwich)
estimators to make the analysis robust against misspecification of the covariance structure and to adjust
for small-sample bias
• joint modeling for multivariate data
The GLIMMIX and MIXED procedures are closely related and have some common functionality.
However, there are important differences, such as the absence of a REPEATED statement in PROC
GLIMMIX. Furthermore, MODEL, WEIGHT, and FREQ variables, as well as variables specifying
RANDOM effects and SUBJECT= and GROUP= structures, do not have to be in the data set with PROC
GLIMMIX. They can be computed with programming statements in the procedure.
Notice that both the FREQ statement and the WEIGHT statement are available in PROC GLIMMIX.
The variable in the FREQ statement identifies a numeric variable that contains the frequency of
occurrence for each observation. PROC GLIMMIX treats each observation as if it appears f times,
where f is the value of the FREQ variable for the observation. The analysis that is produced using a FREQ
statement reflects the expanded number of observations. The WEIGHT statement replaces R with
W−1/2 RW−1/2, where W is a diagonal matrix containing the weights.
g(·) represents a differentiable monotonic link function, and g⁻¹(·) is its inverse.
A represents a diagonal matrix and contains the square root of the variance function
of the model. The variance function expresses the variance of a response as a function
of the mean.
R represents the variance-covariance matrix of the residual effects. The residual effects are
referred to as the R-side random effects. The R matrix is, by default, the scaled identity
matrix, R = φI, where φ is the scale parameter; φ is, by definition, 1 for some
distributions (for example, the binary, binomial, Poisson, and exponential distributions).
To specify a different R matrix, use the RANDOM statement with the _RESIDUAL_
keyword or the RESIDUAL option in the RANDOM statement.
The GLIMMIX procedure distinguishes two types of random effects. If the variance of the random effect
is contained in the matrix G, then it is called a G-side random effect. If the variance of the random effect
is contained in the matrix R, then it is called an R-side random effect. R-side effects are also called
residual effects. An R-side random effect in PROC GLIMMIX is equivalent to a REPEATED effect
in PROC MIXED. Models without G-side effects are also known as marginal (or population-averaged)
models. All random effects are specified through the RANDOM statement in PROC GLIMMIX.
The R matrix is by default the scaled identity matrix. To specify a different R matrix, use the RANDOM
statement with the _RESIDUAL_ keyword or the RESIDUAL option. To add a multiplicative
overdispersion parameter, use the _RESIDUAL_ keyword in a separate RANDOM statement.
If there are no repeated effects, use the RANDOM statement with the _RESIDUAL_ keyword to specify
the R-side random effects. The equivalent code in PROC MIXED is:
proc mixed data=long.aids;
model cd4_scale=time;
repeated / type=sp(pow)(time) subject=id;
run;
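The corresponding PROC GLIMMIX specification, a sketch assuming the same long.aids data, moves the repeated-measures specification into a RANDOM _RESIDUAL_ statement:

```sas
/* R-side spatial power structure specified through
   the _RESIDUAL_ keyword instead of a REPEATED statement */
proc glimmix data=long.aids;
   model cd4_scale = time;
   random _residual_ / type=sp(pow)(time) subject=id;
run;
```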
To specify that the time effect for each patient is an R-side effect with a spatial power covariance
structure, use the RESIDUAL option. Since continuous effects are not allowed in R-side random effects,
two versions of the time variable were created. A continuous time is used in the MODEL statement while
the classification time is used in the RANDOM statement. The equivalent code in PROC MIXED
is shown below.
proc mixed data=aids noclprint;
class timec;
model cd4_scale=time;
repeated timec / type=sp(pow)(time) subject=id;
run;
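A matching PROC GLIMMIX sketch uses the RESIDUAL option on the classification copy of time (timec), with the continuous time variable in the MODEL statement and in the structure's argument:

```sas
/* timec (CLASS) indexes the repeated measures as an R-side effect;
   the continuous time variable supplies the distances for sp(pow) */
proc glimmix data=aids;
   class timec;
   model cd4_scale = time;
   random timec / residual type=sp(pow)(time) subject=id;
run;
```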
Because the generalized linear mixed model is g(E(y | γ)) = Xβ + Zγ, the G-side random effects are fit
inside the link function. In other words, they are on the linked scale, which is similar to random effects
in linear mixed models. The correlations among the repeated measures on the linked scale are
accommodated by the G-side random effects (V = ZGZ′ + f(R)). G-side random effect models have subject-
specific interpretations. That is, G-side random effect models provide the model for each subject
identified in the G-side random effects.
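For example, a subject-specific logistic model with G-side random intercepts and slopes might be sketched as follows (the data set and variable names are hypothetical):

```sas
/* Each patient gets its own intercept and slope on the
   logit (linked) scale; TYPE=UN lets them covary */
proc glimmix data=work.resp method=quad;
   class patient;
   model status(event='1') = time / dist=binary solution;
   random intercept time / subject=patient type=un;
run;
```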
On the other hand, the R-side random effects are fit outside the link function. The correlations among
the repeated measures outside the link function are directly modeled as long as no G-side random effects
are present. R-side random effect models have population average interpretations if there are no G-side
random effects. That is, models with only R-side random effects provide predictions for the population.
The distributions can be specified using the DIST= option in the MODEL statement in PROC GLIMMIX.
Notice that the lognormal distribution is not using the likelihood function for a lognormal distribution.
Instead, it assumes that for the dependent variable Y, log(Y) follows a normal distribution N(μ, σ²),
and the likelihood function is for log(Y), not Y itself.
Combinations of distributions can be specified using the DIST=BYOBS(variable) option in the MODEL
statement in PROC GLIMMIX. This option enables you to model multivariate responses with different
distributions for each response variable. An example is Poisson distribution for one variable and normal
distribution for another response variable.
PROC GLIMMIX estimates the parameters by the pseudo-likelihood method by default for models with
discrete outcomes and random effects. Two maximum likelihood estimation methods based on integral
approximation are available in PROC GLIMMIX. The METHOD=QUAD option in the PROC
GLIMMIX statement requests that the GLIMMIX procedure approximate the marginal log likelihood
with adaptive Gauss-Hermite quadrature. The METHOD=LAPLACE option in the PROC GLIMMIX
statement requests that the GLIMMIX procedure approximate the marginal log likelihood by using
the Laplace method.
By default, models with normally distributed outcomes use REML, and models with discrete
outcomes and no random effects use ML.
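The estimation method is requested in the PROC GLIMMIX statement; a hypothetical sketch (data set and variable names are illustrative only):

```sas
/* Laplace approximation of the marginal log likelihood;
   METHOD=QUAD(QPOINTS=5) would request adaptive Gauss-Hermite
   quadrature with five nodes instead */
proc glimmix data=work.counts method=laplace;
   class site;
   model y = x / dist=poisson;
   random intercept / subject=site;
run;
```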
The challenge in fitting GzLMMs is how to obtain the marginal log-likelihood
function, which requires integrating out the random effects:
∫ p(y | x, β, γ) q(γ) dγ
This integral is difficult to express in a closed form.
Generalized linear mixed models are much more complex than linear mixed models because of the
difficulties in obtaining marginal log-likelihood functions. For all these models, parameter estimates are
obtained by maximizing the objective function, which is the marginal log-likelihood function. For linear
mixed models with normal errors and random effects, the marginal distribution of y over all possible
levels of random effects is simply normal with a mean of Xβ and a covariance V. The log-likelihood
is readily available based on this distribution. However, the marginal distribution of y for generalized
linear mixed models is not readily available for non-Gaussian distributions. Therefore, the challenge
in fitting a generalized linear mixed model is to obtain the marginal distribution of y, or the marginal log-
likelihood function to be maximized.
The first step in the pseudo-likelihood linearization method is achieved by taking the first-order Taylor
series expansions to linearize the generalized linear mixed model to linear mixed models. Taylor series
expansions enable you to use the derivatives of a function to approximate the function as a sum
of polynomials.
After the linearization, a linear mixed model P = Xβ + Zγ + ε can be fit. P is referred to as the pseudo-
response, β represents the fixed effects, γ represents the random effects, and ε represents the residuals
in the linear mixed model with the pseudo-response P. The residuals are assumed to be normally
distributed with mean 0 and variance var(ε) = var(P|γ) = Δ⁻¹ARAΔ⁻¹, where
• Δ is a diagonal matrix of derivatives of the conditional mean evaluated at the expansion locus.
• A represents a diagonal matrix of the square root of the variance function of the model.
• R is the variance-covariance matrix of the residual effects, or the R-side random effects.
The variance of y, conditional on the random effects γ, is var(y|γ) = ARA, and the marginal variance
in the linear mixed pseudo-model is V = ZGZ′ + Δ⁻¹ARAΔ⁻¹.
Benefits of Linearization
• Can be used to fit models where the joint distribution is difficult, if not
impossible, to ascertain.
• Can fit complex models such as models with correlated errors, a large
number of random effects, crossed random effects, and multiple types
of subjects.
The class of models to which pseudo-likelihood estimation can be applied is much larger than the class
of models to which maximum likelihood can be applied in PROC GLIMMIX.
Drawbacks of Linearization
Because the linearization approach approximates the generalized linear mixed models as linear mixed
models, the computed likelihood is for these linear mixed models, not the original model. It is not the true
likelihood of your problem. Likelihood ratio tests that compare nested models might not be
mathematically valid, and the model fit statistics (AIC, BIC) should not be used for model comparisons.
In addition, the normal assumption for the linearized model might not be appropriate. As a result, the
variance estimates for random effects might be biased. This is often the case when the pseudo-response
is far from normal, such as when a binary outcome has many non-events or few clusters (subjects).
PROC GLIMMIX includes the fixed effects and all covariance parameters in the optimization when you
choose METHOD=QUAD or METHOD=LAPLACE. Both produce maximum likelihood estimates.
Laplace estimates typically exhibit better asymptotic behavior and less small-sample bias than pseudo-
likelihood estimators. However, for both Laplace and quadrature methods, the class of models for which
the marginal log likelihood is available is much smaller compared to the class of models to which pseudo-
likelihood estimation can be applied.
The term quadrature is more or less a synonym for numerical integration, especially as applied
to one-dimensional integrals. Two-dimensional integration is sometimes described as cubature,
although this term is much less frequently used and the meaning of quadrature is understood for
higher dimensional integration, as well.
$\int_a^b f(x)\,dx \approx \sum_i w_i\, f(x_i)$
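As a small numeric illustration of this weighted-sum idea (sketched here in Python rather than SAS), a two-point Gauss-Legendre rule integrates any cubic exactly; the nodes and weights are the standard textbook values, not quantities computed by PROC GLIMMIX.

```python
import math

def gauss_legendre_2pt(f, a, b):
    """Approximate the integral of f over [a, b] with the two-point
    Gauss-Legendre rule: a weighted sum of function values w_i * f(x_i)."""
    mid, half = (a + b) / 2.0, (b - a) / 2.0
    t = 1.0 / math.sqrt(3.0)          # standard nodes +/- 1/sqrt(3) on [-1, 1]
    # both weights equal 1 on [-1, 1]; 'half' rescales the rule to [a, b]
    return half * (f(mid - half * t) + f(mid + half * t))

# exact for polynomials up to degree 3: the integral of x^3 on [0, 1] is 0.25
approx = gauss_legendre_2pt(lambda x: x**3, 0.0, 1.0)
```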
3-24 Chapter 3 Longitudinal Data Analysis with Discrete Responses
PROC GLIMMIX has a dedicated algorithm for METHOD=LAPLACE that accommodates a larger class
of models than Gauss-Hermite quadrature does, such as crossed random effects, random effects with no
SUBJECT= option, and subjects that do not have to be nested. It also allows the NOBOUND option.
As the number of random effects increases, the Laplace approximation becomes a computationally
more expedient alternative.
If you wonder whether METHOD=LAPLACE would present a viable alternative to a model that you can
fit with METHOD=QUAD, the “Optimization Information” table can provide some insights. The table
contains as its last entry the number of quadrature points determined by PROC GLIMMIX to yield
a sufficiently accurate approximation of the log likelihood (at the starting values). In many cases, a single
quadrature node is sufficient. In that case, the estimates are identical to those of METHOD=LAPLACE.
The COVTEST statement enables you to obtain statistical inferences for the covariance parameters.
You do the following:
• fit the model using PROC GLIMMIX
• specify hypotheses about the covariance parameters in the COVTEST statement
The procedure will do the following:
• refit the model under the restriction on the covariance parameters
• compare -2(restricted) log likelihoods
• make p-value adjustments for testing on the boundary, if possible and necessary
The COVTEST statement enables you to obtain statistical inferences for the covariance parameters
in a mixed model by likelihood-based tests comparing full and reduced models with respect to the
covariance parameters. The comparisons of the models are based on the log likelihood or restricted log
likelihood in models that are fit by maximum likelihood (ML) or restricted maximum likelihood (REML).
With pseudo-likelihood methods, the calculations are based on the final pseudo-data of the converged
optimization. Confidence limits and bounds are computed as Wald or likelihood ratio limits. You can
specify multiple COVTEST statements.
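Conceptually, the comparison that COVTEST performs reduces to differencing the two -2 log likelihoods and referring the result to a chi-square distribution. A minimal sketch in Python; the -2 log likelihood values below are hypothetical, not taken from this course's output:

```python
import math

def chi2_sf_df1(x):
    """Survival function of a chi-square with 1 df:
    P(chi2_1 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

# hypothetical -2 log likelihoods from a full and a reduced model
m2ll_reduced = 1080.85   # one covariance parameter constrained to zero
m2ll_full = 1070.85

lr_stat = m2ll_reduced - m2ll_full   # likelihood ratio chi-square
p_value = chi2_sf_df1(lr_stat)       # naive 1-df p-value (no boundary adjustment)
```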
The test-specification in the COVTEST statement draws on keywords that represent a particular null
hypothesis, lists or data sets of parameter values, or general contrast specifications. Valid keywords
are as follows:
GLM | INDEP tests the model against a null model of complete independence. All G-side
covariance parameters are eliminated and the R-side covariance structure is
reduced to a diagonal structure.
ZEROG tests whether the G matrix can be reduced to a zero matrix. This eliminates
all G-side random effects from the model.
Only a single keyword is permitted in a COVTEST statement. To test more complicated hypotheses,
you can formulate tests by providing the values for the reduced covariance parameters. For example,
the last example on the slide tests whether the last covariance parameter, which corresponds to the slope
variance, is zero.
• When the model is estimated by ML or REML, you will get the same results
from the COVTEST statement as you would from conducting a likelihood
ratio test using the full and reduced models.
• When the model is estimated by pseudo-likelihood, you will not get the
same results with the COVTEST statement as you would from a pseudo-
likelihood ratio test based on the full and reduced models.
The COVTEST statement not only works for models estimated by the maximum likelihood method, but
it also works for models estimated by the pseudo-likelihood (linearization) method. However, you do not
get true likelihood ratio tests from the COVTEST statement in the latter case.
When the model is estimated by pseudo-likelihood, PROC GLIMMIX takes the pseudo-data set from
the last iteration (the converged data set) of the full model and treats it as a linear mixed model for
the COVTEST operations. Therefore, there are no further data set updates. The log likelihood of the
constrained model is then always ordered properly, guaranteeing that the likelihood ratio test statistic
is nonnegative.
Time-Independent Predictor Variables
Example: Radial keratotomy is a form of surgery used to reduce myopia (nearsightedness). To evaluate
the long-term (10-year) efficacy and stability of the surgery, a longitudinal study of 362 adult
myopic patients was conducted. After surgery, patients were examined at 6 months and then
annually each year for 10 years. At each visit their refractive error was recorded. The concern
of the scientists is that the refractive error would continue to change over time and the patients
would become less and less nearsighted.
These are the variables in the data set:
patientid patient identification number.
visit time of follow-up visit (1=1 year, 4=4 years, 10=10 years).
unstable the outcome variable, coded 1 if there is a continuing effect of the surgery and 0
otherwise. For visit 1, a continuing effect was defined as a reduction in myopia
of 0.5 diopters or more between 6 months and 1 year after surgery. For visits 4 and 10,
a continuing effect was defined as a reduction in myopia of 1 diopter or more
between 6 months and 4 years after surgery and between 6 months and 10 years
after surgery, respectively.
diameter diameter of the clear zone during the surgery (in mm).
age patient age at baseline in years.
gender patient’s gender.
The radial keratotomy data were provided by Azhar Nizam, Senior Associate, Rollins School
of Public Health of Emory University. The data were modified based on published reports from
the NEI funded Prospective Evaluation of Radial Keratotomy Study (Waring et al. 1994)
to protect confidentiality.
Analysis Strategy
When building a longitudinal model with a discrete response, it is recommended to first do an exploratory
data analysis with contingency tables and logit plots. A useful contingency table would be the subject’s
identification number by the time or visit value to make sure no subject has multiple records with the
same time or visit value. Then fit a generalized linear mixed model and decide whether you want to use
R-side random effects or G-side random effects.
Estimated Logits

$\ln\left(\dfrac{m_i + 1}{M_i - m_i + 1}\right)$

where
m_i = number of events
M_i = number of cases
When the response variable is binary, it is common practice to transform the vertical axis of a scatter plot
to the logit scale and plot the logit by the continuous predictor variable. For continuous predictor
variables with a large number of unique values, binning the data (collapsing data values into groups) is
necessary to compute the logit.
A common approach in computing logits is to take the log of the odds. However, the logit is undefined for
any bin in which the outcome rate is 100% or 0%. To eliminate this problem and reduce the variability
of the logits, a common recommendation is to add a small constant to the numerator and denominator
of the formula that computes the logit (Duffy and Santner 1989).
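The adjusted logit can be sketched numerically; without the added constant of 1, a bin with zero events (or all events) would have an undefined logit. The bin counts below are made up for illustration:

```python
import math

def empirical_logit(events, cases):
    """Adjusted empirical logit in the Duffy-Santner style:
    ln((m_i + 1) / (M_i - m_i + 1))."""
    return math.log((events + 1.0) / (cases - events + 1.0))

# made-up bins as (number of events m_i, number of cases M_i);
# the first and last bins would break the raw log-odds formula
bins = [(0, 10), (3, 10), (10, 10)]
logits = [empirical_logit(m, M) for m, M in bins]
```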
Example: Generate a line listing of the keratotomy data and logit plot of age.
/* long03d01.sas */
proc print data=long.keratotomy(obs=20);
title 'Line Listing of Keratotomy Data';
run;
Notice that there is one observation per time point. Also notice that the variables age, diameter,
and gender are time-independent variables and visit is a time-dependent variable. Finally, notice that
the values of visit are in the proper order within each patient. If the values of visit are not in the proper
order, or if there are missing time points for some patients, then visit (or a copy of visit) must
be specified in the CLASS statement and as a repeated effect in the RANDOM statement
in PROC GLIMMIX.
data bins;
set bins;
logit=log((unstable+1)/(_freq_-unstable+1));
run;
There seems to be no relationship between the probability of the continuing effect of the surgery
and the age of the patient.
[Logit plot of unstable by diameter: logit on the vertical axis, diameter of the clear zone (mm) on the horizontal axis]
The variable diameter has a linear relationship with the logits. It seems that the patients with smaller
clear zones (they received more surgery) have a higher probability of having a continuing effect
of the surgery.
[Logit plot of unstable by visit: logit on the vertical axis, visit (1, 4, 10) on the horizontal axis]
The variable visit might have a linear relationship with the outcome; it is also possible that the
relationship might be curvilinear. It seems that the longer the follow-up time from the surgery, the higher
the probability of a continuing effect of the surgery. Therefore, the refractive error continues to change as
a result of the surgery and the patients become less and less nearsighted. This is of medical concern
because beyond a certain point, being less nearsighted means becoming farsighted. Because people tend
to become farsighted as they get older, the continuing effect of the surgery might be accelerating this
process.
Example: Fit a generalized linear mixed model to the long.keratotomy data. Specify R-side
random effects, a binary distribution, and use the unstructured covariance structure.
Specify the ODDSRATIO option in the MODEL statement and create customized odds
ratios comparing male to female for gender, 2 to 1 for visit, and 3 to 4 for diameter. Use
an optimization technique of Newton-Raphson with ridging, and create an ODDSRATIO
plot displaying the statistics and a box plot for gender.
/* long03d02.sas */
proc glimmix data=long.keratotomy noclprint=5
plots=(oddsratio(stats) boxplot(fixed));
class patientid gender;
model unstable(event='1') = age diameter gender visit
/ solution ddfm=kr dist=binary
or(diff=first at visit=1 diameter=4
unit diameter=-1);
random _residual_ / subject = patientid type=un;
nloptions tech=nrridg;
title 'Generalized Linear Mixed Model of Radial Keratotomy '
'Surgery';
run;
Selected PROC GLIMMIX statement option:
PLOTS= requests that the GLIMMIX procedure produce statistical graphics via the Output
Delivery System.
NOCLPRINT suppresses the display of the Class Level Information table. If you specify a number, only
levels with totals that are less than that number are listed in the table.
Selected plot options:
BOXPLOT requests box plots for the residuals in your model by the classification effects only.
The FIXED box plot option produces box plots for all fixed effects consisting entirely
of classification variables.
ODDSRATIO requests a display of odds ratios and their confidence limits when the link function
permits the computation of odds ratios. The STATS odds ratio plot option adds
the numeric values of the odds ratio and its confidence limits to the graphic.
Selected MODEL statement response variable option:
EVENT= specifies the event category for the binary response model.
Selected MODEL statement options:
SOLUTION requests that a solution for the fixed-effects parameters be produced.
DIST= specifies the built-in (conditional) probability distribution of the data.
DDFM= specifies the method for computing the denominator degrees of freedom for the tests
of fixed effects resulting from the MODEL, CONTRAST, ESTIMATE, LSMEANS, and
LSMESTIMATE statements. The keyword KR specifies the Kenward-Roger adjustment.
OR requests estimates of odds ratios and their confidence limits provided the link function
is either the logit, cumulative logit, or generalized logit.
Model Information
The Model Information table summarizes important information about the model that you fit and about
aspects of the estimation technique. The marginal variance matrix is block-diagonal, and observations
from the same PATIENTID form the blocks. The default estimation technique in generalized linear mixed
models is residual pseudo-likelihood, for distributions other than the normal.
The Class Level Information table lists the levels of the variables specified in the CLASS statement
and the ordering of the levels. The patientid levels have been suppressed because there are more than 5
patientid levels. The Number of Observations table displays the number of observations read and used
in the analysis. There are 362 patients in the study, so 6 patients were dropped because they had missing
values in every observation.
Response Profile

Ordered Value   unstable   Total Frequency
            1          0               634
            2          1               412
The Response Profile table shows the response variable values listed according to their ordered values.
By default, the response variable values are ordered alphanumerically and PROC GLIMMIX models
the probability of ordered value 1. Because you used the EVENT= option in this example, the model
is based on the probability of having a continuing effect of the surgery (unstable=1).
Dimensions
The Optimization Information table provides information about the methods and size of the optimization
problem.
Iteration History (columns: Iteration, Restarts, Subiterations, Objective Function, Change, Max Gradient)
The Iteration History table displays information about the progress of the optimization process. After the
initial optimization, PROC GLIMMIX performed 10 updates before the convergence criterion was met.
Fit Statistics
The ratio of the generalized chi-square statistic and its degrees of freedom is a measure of the residual
variability in the linearized pseudo model. It is not a useful measure for model assessment under pseudo-
likelihood estimation.
Covariance Parameter Estimates (columns: Cov Parm, Subject, Estimate, Standard Error)
The Covariance Parameter Estimates table displays estimates and asymptotic estimated standard errors for
all covariance parameters. Since R-side random effects are being used, the estimates represent the
variances and covariances of the measurements on the logit scale.
Solutions for Fixed Effects (columns: Effect, gender, Estimate, Standard Error, DF, t Value, Pr > |t|)
The Solutions for Fixed Effects table displays the parameter estimates for the fixed effects in the model.
The results show that diameter, gender, and visit are all significant at the 0.05 significance level.
The parameter estimates are on the logit scale.
Odds Ratio Estimates (columns: gender, age, diameter, visit, _gender, _age, _diameter, _visit, Estimate, DF, 95% Confidence Limits)
The Odds Ratio Estimates table lists the variables and their values that are used in the computation
of the odds ratio, the estimate of the odds ratio, the degrees of freedom, and the 95% confidence limits for
the estimate of the odds ratio. By default, the reference values for continuous variables are the average
values. The reference level for gender is female because the option DIFF=FIRST was used. The first row
of the table compares age 34.964 (value in the numerator for the odds ratio) to age 33.964 (average value)
holding gender, diameter, and visit constant. The odds ratio is 1.011 with a 95% confidence interval
of 0.987 to 1.035. The second row of the table compares diameter 3 to diameter 4 (the reference value
was specified in the odds ratio AT option and the one unit decrease was specified in the odds ratio UNIT
option) holding gender, visit, and age constant. The odds ratio is 3.374 with a 95% confidence interval
of 2.153 to 5.288. The third row compares visit 2 to visit 1 holding the other variables constant. The odds
ratio is 1.401 with a confidence interval of 1.339 to 1.466. Finally, the fourth row of the table compares
gender Male to gender Female holding the other variables constant. The odds ratio is 1.763 with
a confidence interval of 1.233 to 2.521.
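The odds ratios in this table follow directly from the fixed-effect estimates on the logit scale: the odds ratio for a c-unit change in a predictor is exp(c·b), holding the other predictors constant. A small sketch with a hypothetical coefficient b (not an estimate from this output):

```python
import math

# hypothetical logit-scale slope for a continuous predictor
b = 0.337

def odds_ratio(b, change=1.0):
    """Odds ratio for a 'change'-unit increase in the predictor,
    holding the other predictors constant: exp(change * b)."""
    return math.exp(change * b)

or_plus1 = odds_ratio(b)           # 1-unit increase
or_minus1 = odds_ratio(b, -1.0)    # 1-unit decrease, as with UNIT diameter = -1
```

Note that the odds ratios for a +1 and a -1 unit change are reciprocals of each other.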
The odds ratio plot illustrates the odds ratios from the last table along with the 95% confidence limits.
When the line segment representing the confidence interval crosses 1, the odds ratio is not significant.
If higher order terms were in the model such as interactions or polynomials, the odds ratios
computed with the ODDSRATIO option would take the higher order terms into account.
Type III Tests of Fixed Effects (columns: Effect, Num DF, Den DF, F Value, Pr > F)
The Type III Tests of Fixed Effects table displays significance tests for the fixed effects in the model.
The box plot for gender shows a few extreme Pearson residuals. Females exhibit more extreme positive
outliers while males exhibit more extreme negative outliers.
Example: Fit a generalized linear mixed model to the long.keratotomy data set using G-side random
effects, the method of adaptive Gauss-Hermite quadrature, and the between-within degrees of
freedom adjustment. Specify the intercept as the random effect and use a binary distribution.
Use an optimization technique of Newton-Raphson with ridging, use the COVTEST statement
to test whether the G matrix can be reduced to a zero matrix, and create an output data set with
the EBLUPs and XBETAs.
proc glimmix data=long.keratotomy noclprint=5 method=quad;
class patientid gender;
model unstable(event='1') = age diameter gender visit
/ solution dist=binary ddfm=bw;
random intercept / subject = patientid;
nloptions tech=nrridg;
covtest "H0: No random effects" zerog;
output out=predict pred(blup ilink)=eblup
pred(noblup ilink)=xbeta;
title 'Generalized Linear Mixed Model of Radial Keratotomy '
'Surgery';
run;
The default pseudo-likelihood estimation method for models containing random effects is RSPL.
This is the acronym for the residual subject-specific pseudo-likelihood method. The other three
methods are MSPL, RMPL, and MMPL. The first letter determines whether estimation is based
on a residual likelihood (R) or a maximum likelihood (M). The second letter identifies the
expansion locus for the linearization, which can be the vector of random effects solutions (S) or
the mean of the random effects (M).
In models for normal data with identity link, METHOD=RSPL and METHOD=RMPL are
equivalent to restricted maximum likelihood estimation, and METHOD=MSPL and
METHOD=MMPL are equivalent to maximum likelihood estimation.
Model Information
The between-within degrees of freedom method is used because the Kenward-Roger degrees of freedom
method cannot be used with the maximum likelihood estimation methods.
Class Level Information
Response Profile

Ordered Value   unstable   Total Frequency
            1          0               634
            2          1               412
Dimensions
Optimization Information
The Optimization Information table shows the number of quadrature points chosen by the procedure for
the numerical integration calculations. Based on the algorithm used, five nodes are determined by PROC
GLIMMIX to be sufficient for the quadrature. If you want to have a larger number of quadrature points,
you can use the QPOINTS= suboption in the METHOD=QUAD option.
Recall that quadrature provides an approximation of the definite integral of a function. This
is usually stated as a weighted sum of function values at specified points within the domain
of integration. These specified points are known as the quadrature points.
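For a Gaussian random effect, the quadrature points are those of Gauss-Hermite integration. A sketch of why the number of nodes matters, using standard textbook nodes and weights rather than values computed by PROC GLIMMIX: with one node, the variance of a standard normal is missed entirely, while two nodes recover it exactly.

```python
import math

def gh_expectation(f, nodes_weights):
    """Approximate E[f(Z)] for Z ~ N(0,1) by Gauss-Hermite quadrature:
    (1/sqrt(pi)) * sum of w_i * f(sqrt(2) * t_i)."""
    return sum(w * f(math.sqrt(2.0) * t)
               for t, w in nodes_weights) / math.sqrt(math.pi)

one_node = [(0.0, math.sqrt(math.pi))]                          # n = 1
two_node = [(-1.0 / math.sqrt(2.0), math.sqrt(math.pi) / 2.0),  # n = 2
            ( 1.0 / math.sqrt(2.0), math.sqrt(math.pi) / 2.0)]

ez2_one = gh_expectation(lambda z: z * z, one_node)  # too few nodes: 0
ez2_two = gh_expectation(lambda z: z * z, two_node)  # exact E[Z^2] = 1
```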
Iteration History

Iteration  Restarts  Evaluations  Objective Function       Change  Max Gradient
        0         0           11        1080.8513468            .      565.5695
        1         0           13        1077.5642995   3.28704729      1098.838
        2         0            9        1058.1407776  19.42352192      236.1398
        3         0            9        1051.1727897   6.96798789      63.87405
        4         0            9        1049.5098427   1.66294704      14.30124
        5         0            9        1049.3679612   0.14188149      1.414772
        6         0            9        1049.3664725   0.00148871       0.01517
        7         0            9        1049.3664723   0.00000020       0.00002
Fit Statistics
The Fit Statistics table lists information about the fitted model. PROC GLIMMIX computes various
information criteria, which typically apply a penalty to the (possibly restricted) log likelihood, log
pseudo-likelihood, or log quasi-likelihood: this penalty depends on the number of parameters or
the sample size, or both. The consistent AIC (CAIC) is an extension of the AIC and it was derived in
order to make the AIC asymptotically consistent and to penalize overparameterization more stringently.
The Hannan-Quinn information criterion (HQIC) has a penalty term that is between the AIC and the BIC.
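These criteria are simple penalized transforms of the -2 log likelihood. A sketch using the converged objective value from the iteration history above (-2 log L ≈ 1049.37); the parameter count k = 6 and the effective sample size n = 356 (the number of subjects used) are assumptions made for illustration, and the sample size that BIC and CAIC use is a convention choice:

```python
import math

m2ll = 1049.37   # converged -2 log likelihood (from the iteration history)
k = 6            # assumed number of parameters (5 fixed effects + 1 variance)
n = 356          # assumed effective sample size (number of subjects)

aic  = m2ll + 2 * k                          # Akaike
bic  = m2ll + k * math.log(n)                # Schwarz Bayesian
caic = m2ll + k * (math.log(n) + 1)          # consistent AIC
hqic = m2ll + 2 * k * math.log(math.log(n))  # Hannan-Quinn
```

With these values, the penalties order the criteria as AIC < HQIC < BIC < CAIC, reflecting the increasingly stringent penalties described above.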
Fit Statistics for Conditional Distribution
The fit statistics for the conditional distribution are useful for evaluating the fixed-effects model.
If the variance function, the model, and the random-effects structure are correctly specified, the Pearson
Chi-Square/DF value should be close to 1. Even under correct specification, there will be some
variation about the value 1. However, if this value is large, something about your model must be fixed:
the conditional distribution of the response variable, the fixed effects, or the random effects specified
in your model might need revising.
Covariance Parameter Estimates (columns: Cov Parm, Subject, Estimate, Standard Error)
Solutions for Fixed Effects (columns: Effect, gender, Estimate, Standard Error, DF, t Value, Pr > |t|)
Although the parameter estimates are different from the model fit by the pseudo-likelihood method,
the inferences are approximately the same.
Type III Tests of Fixed Effects (columns: Effect, Num DF, Den DF, F Value, Pr > F)
Common questions in mixed modeling are whether variance components are zero, whether random
effects are independent, and whether rows (columns) can be added or removed from an unstructured
covariance matrix. The likelihood ratio chi-square test indicates that the “no random effects model”
is rejected. The model with random effects fits your data better than the model without random effects.
When the parameters under the null hypothesis fall on the boundary of the parameter space, the
distribution of the likelihood ratio statistic can be a complicated mixture of distributions. In certain
situations it is known to be a relatively straightforward mixture of central chi-square distributions. When
the GLIMMIX procedure recognizes the model and hypothesis as a case for which the mixture is readily
available, the p-value of the likelihood ratio test is determined accordingly as a linear combination of
central chi-square probabilities. The Note column in the Likelihood Ratio Tests of Covariance Parameters
table, along with the table’s footnotes, informs you about when mixture distributions are used
in the calculation of p-values.
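For the ZEROG test of a single variance component, the null value lies on the boundary, and the reference distribution is the well-known 50:50 mixture of a chi-square with 0 df (a point mass at zero) and a chi-square with 1 df, so the naive 1-df p-value is halved. A sketch with a hypothetical test statistic:

```python
import math

def chi2_sf_df1(x):
    """P(chi-square with 1 df > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

def boundary_pvalue_one_variance(lr_stat):
    """p-value under the 50:50 mixture of chi2(0) and chi2(1);
    the chi2(0) component contributes nothing for a positive statistic."""
    return 0.5 * chi2_sf_df1(lr_stat)

lr_stat = 7.3                           # hypothetical likelihood ratio chi-square
p_naive = chi2_sf_df1(lr_stat)          # no boundary adjustment
p_mixed = boundary_pvalue_one_variance(lr_stat)  # half the naive p-value
```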
The output shows the best linear unbiased predictions and the values of the linear predictors for the first
twenty observations in the keratotomy study.
Example: Fit a generalized linear mixed model to the long.keratotomy data set using G-side random
effects, the method of adaptive Gauss-Hermite quadrature, and the between-within
degrees of freedom adjustment. Specify the intercept as the random effect and visit as a
categorical variable. Use the binary distribution and an optimization technique of
Newton-Raphson with ridging.
proc glimmix data=long.keratotomy noclprint=5 method=quad;
class patientid gender visit;
model unstable(event='1') = age diameter gender visit
/ solution dist=binary ddfm=bw;
random intercept / subject = patientid;
nloptions tech=nrridg;
title 'Generalized Linear Mixed Model of Radial Keratotomy '
'Surgery';
run;
Model Information
Response Profile

Ordered Value   unstable   Total Frequency
            1          0               634
            2          1               412
Dimensions
Optimization Information
Iteration History

Iteration  Restarts  Evaluations  Objective Function       Change  Max Gradient
        0         0           12        1076.1085758            .      620.4781
        1         0           13        1073.6473396   2.46123623      1313.735
        2         0           10        1052.8574819  20.78985766       284.293
        3         0           10        1045.3465254   7.51095646      77.94131
        4         0           10        1043.4636627   1.88286271      18.14176
        5         0           10         1043.285331   0.17833176      1.996121
        6         0           10        1043.2830093   0.00232169      0.027969
        7         0           10        1043.2830088   0.00000048      0.000026
Fit Statistics
The AIC of 1057.28 is lower than the AIC of the model that treated visit as a continuous variable
(AIC of 1061.37). Therefore, treating visit as a categorical variable led to a better fitting model.
Fit Statistics for Conditional Distribution
Covariance Parameter Estimates (columns: Cov Parm, Subject, Estimate, Standard Error)
Solutions for Fixed Effects (columns: Effect, gender, visit, Estimate, Standard Error, DF, t Value, Pr > |t|)
Type III Tests of Fixed Effects (columns: Effect, Num DF, Den DF, F Value, Pr > F)
Robust standard errors are derived by the sandwich estimator of the covariance matrix of the regression
coefficients. In general, the sandwich estimator uses a matrix with the diagonal elements equal to the
individual squared residuals to estimate the common variance (the square of any residual is an estimate
of the variance at that predictor variable value). This works because the average of a lot of poor
estimators (individual squared residuals) can be a good estimator of the common variance. In fact, Liang
and Zeger (1986) showed that the robust standard errors are robust to departures of the working
correlation matrix from the true correlation structure.
In the GLIMMIX procedure, robust standard errors can be obtained by using the EMPIRICAL option
in the PROC GLIMMIX statement. The EMPIRICAL option in models with random effects is valid only
when the model is processed by subjects. The robust standard errors computed in PROC GLIMMIX have
advantages over the robust standard errors computed in other procedures: the classical sandwich
estimator can be biased if the number of subjects (or clusters) is small, and the EMPIRICAL option
in PROC GLIMMIX has suboptions that produce bias-corrected sandwich estimators.
The name “sandwich” estimator stems from the layering of the estimator. An empirically based
estimate of the inverse variance of the parameter estimates (the “meat”) is wrapped by the model-
based variance estimate (the “bread”).
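The layering can be sketched with an ordinary least squares fit, where the bread is (X'X)^(-1) and the meat uses the diagonal matrix of squared residuals. This is the classical, uncorrected heteroskedasticity-robust form, not PROC GLIMMIX's subject-level computation, and the data are made up:

```python
# classical sandwich (robust) variance for a 2-parameter OLS fit:
# V = bread * meat * bread, with bread = (X'X)^(-1), meat = X' diag(e_i^2) X

def inv2(A):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mul2(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

x = [1.0, 2.0, 3.0, 4.0, 5.0]        # made-up predictor values
y = [1.2, 1.9, 3.2, 3.8, 5.1]        # made-up responses
X = [[1.0, xi] for xi in x]          # design matrix: intercept + slope

XtX = [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]
Xty = [sum(X[r][i] * y[r] for r in range(len(y))) for i in range(2)]
bread = inv2(XtX)
beta = [sum(bread[i][j] * Xty[j] for j in range(2)) for i in range(2)]

# individual squared residuals form the diagonal of the "meat"
resid = [y[r] - (beta[0] + beta[1] * x[r]) for r in range(len(y))]
meat = [[sum(e * e * X[r][i] * X[r][j] for r, e in enumerate(resid))
         for j in range(2)] for i in range(2)]

V_robust = mul2(mul2(bread, meat), bread)   # the "sandwich"
```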
Example: Fit a generalized linear mixed model to the long.keratotomy data and specify the likelihood-
based sandwich estimators. Specify R-side random effects, a binary distribution, and use the
unstructured covariance structure. Use an optimization technique of Newton-Raphson with
ridging, and request the covariance matrix diagnostics.
/* long03d03.sas */
proc glimmix data=long.keratotomy noclprint=5 empirical=mbn;
class patientid gender;
model unstable(event='1') = age diameter gender visit
/ solution dist=binary covb(details);
random _residual_ / subject = patientid type=un;
nloptions tech=nrridg;
title1 'Generalized Linear Mixed Model of Radial Keratotomy '
'Surgery';
title2 "with Sandwich Estimators";
run;
Selected PROC GLIMMIX options:
EMPIRICAL requests that the covariance matrix of the parameter estimates be computed
as one of the asymptotically consistent estimators, known as sandwich or
empirical estimators.
EMPIRICAL=MBN requests the likelihood-based MBN sandwich estimator. The MBN suboptions are
a sample size adjustment (applied when the DF suboption is in effect; the NODF
suboption suppresses this component of the adjustment) and the tuning parameters
r (the lower bound of the design parameter) and d (used in the computation
of Morel's parameter).
Selected MODEL statement options:
COVB produces the approximate variance-covariance matrix of the fixed-effects
parameter estimates.
COVB(DETAILS) enables you to obtain a table of statistics about the covariance matrix of the fixed
effects. If an adjusted estimator is used because of the EMPIRICAL= or
DDFM=KENWARDROGER option, the GLIMMIX procedure displays statistics
for the adjusted and unadjusted estimators as well as statistics comparing them.
This enables you to diagnose, for example, changes in rank (because of an
insufficient number of subjects for the empirical estimator) and to assess
the extent of the covariance adjustment. In addition, the GLIMMIX procedure
then displays the unadjusted (model-based) covariance matrix of the fixed-effects
parameter estimates.
Copyright © 2017, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.1 Generalized Linear Mixed Models 3-53
Model Information
The design-adjusted MBN estimator applies a bias correction to the classical sandwich estimator that rests
on an additive correction of the residual crossproducts and a sample size correction. The three default
suboptions are DF (the sample size adjustment is applied), R=1, and D=2 (the tuning parameters for the algorithm).
Besides good statistical properties in terms of Type I error rates in small sample size situations, the MBN
estimator also has the desirable property of recovering rank when the number of sampling units is small.
The Kenward-Roger degrees of freedom method is not available when you use the EMPIRICAL option.
Class Level Information
Response Profile
Ordered Total
Value unstable Frequency
1 0 634
2 1 412
Dimensions
3-54 Chapter 3 Longitudinal Data Analysis with Discrete Responses
Optimization Information
The optimization information is exactly the same as the information from the model with no EMPIRICAL
option with the R-side random effects. The EMPIRICAL option affects the standard errors and therefore
the inferences for the fixed effects. The optimization technique and size of the optimization problem
should not be affected.
Iteration History
Objective Max
Iteration Restarts Subiterations Function Change Gradient
The iteration history is also the same as the model with no EMPIRICAL option with the R-side random
effects.
Model Based Covariance Matrix for Fixed Effects (Unadjusted)
The model-based covariance matrix for the fixed effects shows the variances along the diagonal cells
and the covariances on the off-diagonal cells.
Comparing the unadjusted covariance matrix for the fixed effects with the empirical covariance matrix for
fixed effects, it appears the variance estimate for age decreased while the variance estimates for
diameter, gender, and visit increased with the adjustment. The model-based covariance matrix estimates
are based directly on the assumed covariance structure (in this example, the unstructured covariance
structure). The model-based standard errors are better estimates if the assumed model for the covariance
structure is correct, but worse if the assumed structure is incorrect. The empirical covariance matrix
estimates are robust to the choice of the covariance structure.
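The contrast between the two estimators can be sketched numerically. The following Python sketch (illustrative only, not part of the course code) fits a simple linear model whose errors are heteroscedastic, so the assumed iid covariance model is wrong, and computes both the model-based covariance and the classical sandwich estimator:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
# heteroscedastic errors: the assumed iid error covariance model is wrong here
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n) * (0.5 + np.abs(X[:, 1]))

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ (X.T @ y)
resid = y - X @ beta

# model-based covariance: rests entirely on the assumed error structure
sigma2 = resid @ resid / (n - X.shape[1])
cov_model = sigma2 * XtX_inv

# empirical (sandwich) covariance: bread * meat * bread, robust to that assumption
meat = X.T @ (resid[:, None] ** 2 * X)
cov_sandwich = XtX_inv @ meat @ XtX_inv
```

The "bread" is the model-based inverse crossproduct matrix and the "meat" is built from the observed squared residuals, which is why the sandwich estimator remains consistent when the assumed covariance structure is misspecified.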
Diagnostics for Covariance Matrices of Fixed Effects
                              Model-Based    Adjusted
Dimensions    Rows                      6           6
              Non-zero entries         25          25
Eigenvalues   > 0                       5           5
              = 0                       1           1
              max abs               0.869      0.9121
              min abs non-zero     629E-8       58E-7
              Condition number     138209      157305
This table, produced by the COVB(DETAILS) option in the MODEL statement, enables you to diagnose
and assess the extent of the covariance adjustment. Typically, the most important information in this table
is in the Summaries and Eigenvalues information. The trace is the sum of the diagonal elements. If the
adjustment raises the standard errors, then the trace of the adjusted COVB matrix should be larger than
the model-based COVB matrix. In this example, the trace of the adjusted COVB is larger than the model-
based, which means the adjustment raised the standard errors of the fixed effects. In addition, the number
of positive and zero eigenvalues should be the same between the unadjusted and adjusted covariance
matrices.
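As a quick sanity check on the eigenvalue rows, the condition number reported in the table is the ratio of the largest to the smallest nonzero absolute eigenvalue. A short Python check using the rounded values displayed above:

```python
# rounded eigenvalue extremes and condition numbers from the diagnostics table
max_abs = {"model-based": 0.869, "adjusted": 0.9121}
min_abs_nonzero = {"model-based": 629e-8, "adjusted": 58e-7}
reported = {"model-based": 138209, "adjusted": 157305}

for key, cond in reported.items():
    ratio = max_abs[key] / min_abs_nonzero[key]
    # the ratio reproduces the reported condition number up to display rounding
    assert abs(ratio - cond) / cond < 0.01
```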
Fit Statistics
The ratio of the generalized chi-square statistic and its degrees of freedom is the same as the model with
no EMPIRICAL option with the R-side random effects.
Covariance Parameter Estimates
Cov Standard
Parm Subject Estimate Error
The variance component estimate in the Covariance Parameter Estimates table is exactly the same
as the result from the model with no EMPIRICAL option with the R-side random effects. Since the
EMPIRICAL option affects only the standard error of the fixed effects and therefore the inference for
fixed effects (and not the random effects), the covariance parameter estimates should not be affected.
Solutions for Fixed Effects
Standard
Effect gender Estimate Error DF t Value Pr > |t|
Type III Tests of Fixed Effects
Num Den
Effect DF DF F Value Pr > F
The Type III Tests of Fixed Effects are computed based on the empirical estimates. The results are similar,
but not identical to the results from the model with no EMPIRICAL option with the R-side random
effects.
Exercises
A longitudinal study was undertaken to assess the health effects of air pollution on children. The data
contain repeated binary measures of wheezing status for each of 537 children from Steubenville, Ohio.
The measurements were taken at age 7, 8, 9, and 10 years. The smoking status of the mother
at the first year of the study was also recorded. The data are stored in a SAS data set called long.wheeze.
The data were obtained with permission from the OZDATA website. This website is a collection
of data sets and is maintained in Australia.
1) Interpret the odds ratio for age. Would the odds ratio change for a two-year decrease in age?
2) Interpret the Tests of Covariance Parameters table. Does the model with the random effects fit
the data better than the model without random effects?
The odds ratio for age in the exercise was 1.634. How can this be
interpreted?
a. A one-year decrease from age 10 results in a 63% increase in the odds of
wheezing.
b. A one-year decrease from any age results in a 63% increase in the odds
of wheezing.
c. A one-year increase from any age results in a 63% increase in the odds of
wheezing.
d. A one-year increase from age 10 results in a 63% increase in the odds of
wheezing.
3.2 Applications Using the GLIMMIX Procedure 3-59
In some situations with a continuous outcome, there is a restricted range of values because
of the limitations of the measuring techniques. This is a common feature in bioassay analyses. With
restricted ranges, there is usually a lower limit of quantification (LOQ) and an upper limit of
quantification. For example, suppose that the response variable had a lower LOQ of 300 and the upper
LOQ of 900 because of the limitations of the measuring device. Analyzing the response variable
as continuous might not be optimal given the truncated nature of the distribution. An alternative way
to analyze a continuous variable with a restricted range is to create ordered categories and fit an ordinal
logistic regression model.
Cumulative Logits
In ordinal logistic regression, the logit is now a cumulative logit. If k is the number of categories for
the outcome variable, then the number of cumulative logits is k-1. The GLIMMIX procedure models the
probabilities of levels of the response variable having lower ordered values in the Response Profile table.
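For a concrete sense of the construction, this small Python sketch (with hypothetical category probabilities, not values from any course data set) forms the k − 1 cumulative logits from the probabilities of k ordered categories:

```python
import math

def cumulative_logits(p):
    """Return the k-1 cumulative logits logit(P(Y <= j)), j = 1..k-1,
    for category probabilities p[0..k-1] of an ordinal response."""
    logits, cum = [], 0.0
    for pj in p[:-1]:      # the last cumulative probability is 1, so no logit for it
        cum += pj
        logits.append(math.log(cum / (1.0 - cum)))
    return logits

# k = 4 ordered categories -> 3 cumulative logits
logits = cumulative_logits([0.10, 0.35, 0.35, 0.20])
```

Because the cumulative probabilities increase with j, the cumulative logits are always ordered from smallest to largest.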
Logistic Models
PROC GLIMMIX estimates a separate intercept for each cumulative logit. However, PROC GLIMMIX
does not estimate a separate slope for each cumulative logit, but rather a common slope across the
cumulative logits for each predictor variable. This common slope is a weighted average across the logits.
Therefore, a parallel-lines regression model is fitted in which each curve that describes the cumulative
probabilities has the same shape. The only difference in the curves is the difference between the values
of the intercept parameters. This model is called a proportional odds model.
(Figure: logit of the cumulative probability plotted against age for each cumulative logit; the curves are parallel and differ only in their intercepts.)
The common effect of the predictor variable for different cumulative logits in the proportional odds model
can be motivated by assuming that a regression model holds when the response is measured more finely
(Anderson and Phillips 1981). For example, suppose there is an underlying continuous response variable
with ordered categories that is produced via cutoff points. The relationship between the predictor variable
and the outcome should not depend on the cutoff points. In other words, the effect parameters are
invariant to the choice of categories for the outcome variable. Only the intercept is affected by the cutoff
points.
Because there is a common slope for each predictor variable, the odds ratio is constant for all the
categories. The odds ratios can be interpreted as the effect of the predictor variable on the odds of being
in a lower rather than in a higher category, regardless of what cumulative logit you are examining
(the odds are cumulative odds). If you use the DESCENDING option in the MODEL statement, the odds
ratio is the effect of the predictor variable on the odds of being in a higher rather than a lower category.
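The constancy of the cumulative odds ratio follows directly from the common slope. A minimal Python check with hypothetical intercepts and slope:

```python
import math

alphas = [-2.0, 0.0, 1.5]   # separate intercepts, one per cumulative logit (hypothetical)
beta = 0.7                  # common slope shared across the cumulative logits

def cumulative_odds(x, alpha):
    # odds of being in a lower rather than a higher category at predictor value x
    return math.exp(alpha + beta * x)

# the odds ratio for a one-unit increase is exp(beta) for every cumulative logit
ratios = [cumulative_odds(1.0, a) / cumulative_odds(0.0, a) for a in alphas]
```

The intercepts cancel in each ratio, so every cumulative logit yields the same odds ratio, exp(beta).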
The proportional odds model is also invariant to the choice of the outcome categories. There is some loss
of efficiency when you collapse the ordinal categories, but when the observations are evenly spread
among the categories the efficiency loss is minor. However, the efficiency loss is large when you collapse
the ordinal categories to a binary response (Agresti 1996). Allison (1999) recommends at least
10 observations for each category of the response variable. As the number of categories increases,
ordinary least squares might be appropriate. However, Hastie et al. (1989) showed that ordinary least
squares methods could give misleading results with up to 13 categories of the response variable.
The proportional odds model also makes no assumptions about the distances between the categories.
Therefore, how you code the ordinal outcome variable has no effect on the odds ratios.
Which one of the following statements is true for proportional odds models?
a. The model fits separate intercepts.
b. The model fits separate slopes.
c. The cumulative logits compare each category to the last category.
d. The coding of the ordinal outcome affects the odds ratios.
CD4+ Cell Numbers
Example: The human immune deficiency virus (HIV) causes AIDS by attacking an immune cell called
the CD4+ cell, which facilitates the body’s ability to fight infection. An uninfected person has
approximately 1100 cells per milliliter of blood. Because CD4+ cells decrease in number from
the time of infection, a person’s CD4+ cell count can be used to monitor disease progression.
A subset of the Multicenter AIDS Cohort Study (Kaslow et al. 1987) was obtained for 369
infected men to examine CD4+ cell counts over time. The data are stored in a SAS data set
called long.cd4cat.
Example: Fit an ordinal logistic model to the CD4+ cell count data in long.cd4cat. Specify a random
intercept and time with an unstructured covariance structure. Specify the ODDSRATIO
option in the MODEL statement and create customized odds ratios specifying a reference
value of 0 for time, cigarettes, drug, partners, and depression. Use Newton-Raphson with
ridging, create an odds ratio plot displaying the statistics, and test whether the G-side random
effects are significant.
/* long03d04.sas */
proc glimmix data=long.cd4cat method=laplace plots=oddsratio(stats);
model cd4cat = time age cigarettes drug partners depression
time*age time*depression
time*partners time*drug time*cigarettes time*time
time*time*time
/ dist=multinomial link=cumlogit solution ddfm=bw
or(at time cigarettes drug partners depression =
0 0 0 0 0);
random intercept time / subject=id type=un;
nloptions tech=nrridg;
covtest "H0: No random effects" zerog;
title 'Ordinal Model of Aids Data';
run;
Selected MODEL statement option:
LINK= specifies the link function in the generalized linear mixed model.
DDFM The BW|BETWITHIN option divides the residual degrees of freedom into between-subject
and within-subject portions. It then determines whether a fixed effect changes within any
subject. If so, it assigns within-subject degrees of freedom to the effect. Otherwise, it
assigns the between-subject degrees of freedom to the effect. If the analysis is not processed
by subjects, the DDFM=BW option has no effect.
One exception to the preceding method is the case where you model only R-side covariation with an
unstructured covariance matrix (TYPE=UN). However, only G-side effects can be modeled with the
multinomial distribution. The cumulative logit link function is appropriate only for multinomial
distributions.
Ordinal Model of Aids Data
Model Information
The Kenward Roger degrees of freedom adjustment is not available for either of the maximum likelihood
estimation techniques.
Response Profile
Ordered Total
Value cd4cat Frequency
1 1 182
2 2 741
3 3 736
4 4 717
You can reverse the order of the response categories with the DESC option in the MODEL statement.
Dimensions
Optimization Information
Iteration History
Objective Max
Iteration Restarts Evaluations Function Change Gradient
0 0 24 4919.4503999 . 2955.587
1 0 22 4652.6684639 266.78193594 525.7503
2 0 22 4600.9405459 51.72791798 115.8175
3 0 22 4595.4042136 5.53633239 13.98589
4 0 22 4595.0664823 0.33773125 2.063348
5 0 22 4595.0623412 0.00414105 0.051142
6 0 22 4595.0623404 0.00000090 0.000035
Fit Statistics
Fit statistics are presented because the true likelihood, as opposed to a pseudo-likelihood, is computed.
The fit statistics for the conditional distribution are not useful for comparing marginal models with
different fixed effects.
Covariance Parameter Estimates
Cov Standard
Parm Subject Estimate Error
The estimated variance of the subject-specific intercepts is 2.9023, while the estimated variance
of the subject-specific slopes for time is 0.4361.
Solutions for Fixed Effects
Standard
Effect cd4cat Estimate Error DF t Value Pr > |t|
Notice that there are three intercepts for the model corresponding to the three cumulative logits. The first
one compares the log of the probability of CD4+ counts 300 or less to the probability of CD4+ cell counts
of 301 or higher. The second one compares the log of the probability of CD4+ counts 600 or less to the
probability of CD4+ cell counts of 601 or higher. Finally, the third one compares the log of the probability
of CD4+ counts of 900 or less to the probability of CD4+ cell counts of 901 or higher.
The results show that the cubic effect of time, cigarettes, drug, and the time*cigarettes interaction are
all significant at the 0.05 significance level. The negative coefficient for cigarettes indicates that patients
who smoke have lower probabilities of having lower-ordered values for the response variable. In other
words, patients who smoke have higher CD4+ cell counts.
Odds Ratio Estimates

(Each odds ratio compares a one-unit increase in the named effect; the remaining covariates are held
at the OR(AT ...) settings: time, cigarettes, drug, partners, and depression at 0, and age at 2.636.)

Effect        Comparison   Reference   Estimate   95% Confidence Limits
time               1            0        3.322        2.710   4.071
age              3.636        2.636      0.992
cigarettes         1            0        0.706
drug               1            0        0.630
partners           1            0        0.972
depression         1            0        1.024
The Odds Ratio Estimates table shows that the odds ratio for a one-unit increase in time is 3.322 with
a 95% confidence interval of 2.710 to 4.071. Note that the odds ratio takes into account the higher-order
terms that involve time. The odds ratio for a one-unit increase in age (3.636 in the numerator and 2.636
in the denominator) is 0.992. The odds ratio for a one-unit increase in cigarettes is 0.706 and for
a one-unit increase in drug is 0.630. The odds ratio for a one-unit increase in partners is 0.972 and for
a one-unit increase in depression is 1.024. These are usually easier to read from the odds ratio plot.
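Because time enters the model through quadratic, cubic, and interaction terms, its odds ratio must be computed from the full change in the linear predictor, not from the main-effect coefficient alone. The Python sketch below uses hypothetical coefficients (illustrative, not the fitted values) to show the calculation at the reference settings requested by the OR(AT ...) option:

```python
import math

# hypothetical coefficients for every model term that involves time
b = {"time": 1.0, "time*time": 0.15, "time*time*time": -0.02,
     "time*age": 0.01, "time*cigarettes": -0.10, "time*drug": 0.05,
     "time*partners": -0.03, "time*depression": 0.02}

def time_part(t, age, cigarettes, drug, partners, depression):
    """Contribution of all time-related terms to the linear predictor."""
    return (b["time"] * t + b["time*time"] * t**2 + b["time*time*time"] * t**3
            + b["time*age"] * t * age + b["time*cigarettes"] * t * cigarettes
            + b["time*drug"] * t * drug + b["time*partners"] * t * partners
            + b["time*depression"] * t * depression)

# odds ratio for time = 1 versus time = 0, holding age at 2.636 and the other
# covariates at 0, mirroring the OR(AT ...) settings shown in the table
at = dict(age=2.636, cigarettes=0, drug=0, partners=0, depression=0)
odds_ratio = math.exp(time_part(1, **at) - time_part(0, **at))
```

Only the terms involving time survive the difference, so changing the AT settings for the interacting covariates changes the resulting odds ratio.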
The odds ratio plot displays the odds ratios along with the confidence limits.
Type III Tests of Fixed Effects
Num Den
Effect DF DF F Value Pr > F
The results show the cubic and quadratic effects of time, the time by cigarettes interaction, cigarettes,
drug, and depression are significant at the 0.05 significance level.
The likelihood ratio chi-square test indicates that the no random effects model is rejected. The model with
random effects fits your data better than the model without random effects.
The note in the table indicates that this test of covariance parameters based on the likelihood
is a standard test with unadjusted p-values.
Regression Splines
PROC GLIMMIX has the functionality to include spline functions in the model. A spline function
is a piecewise polynomial function where the individual polynomials have the same degree and connect
smoothly at join points whose abscissa values, referred to as knots, are pre-specified. You can use spline
functions to fit curves to a wide variety of data.
(x − ki)+^d = (x − ki)^d   if x > ki
            = 0            if x ≤ ki
where d is the degree of the spline transformation and i is the knot number.
The name “truncated power function” is derived from the fact that these functions are shifted power
functions that are truncated to zero to the left of the knot. These functions are piecewise polynomial
functions whose function values and derivatives of all orders up to d − 1 are zero at the defining knot (ki).
Hence, these functions are splines of degree d. The final model consists of d + 1 polynomial terms
and the truncated power functions.
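Under these definitions, the truncated power function basis is straightforward to construct directly. A NumPy sketch (an illustrative helper with made-up knot values, not PROC GLIMMIX code):

```python
import numpy as np

def tpf_basis(x, knots, d):
    """Truncated power function basis of degree d without the intercept column:
    columns x, x**2, ..., x**d, followed by (x - k)_+**d for each knot k."""
    x = np.asarray(x, dtype=float)
    cols = [x**p for p in range(1, d + 1)]
    cols += [np.where(x > k, (x - k)**d, 0.0) for k in knots]
    return np.column_stack(cols)

# degree 2 with 4 knots -> 6 columns (linear, quadratic, and 4 truncated terms)
B = tpf_basis(np.linspace(-2, 3, 50), knots=[-0.75, 0.25, 1.23, 2.56], d=2)
```

Each truncated column is identically zero to the left of its knot, which is what lets the fitted polynomial change shape only as x crosses each knot.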
f(X) = β0 + β1X + β2(X − k1)                             when k1 ≤ X < k2
f(X) = β0 + β1X + β2(X − k1) + β3(X − k2)                when k2 ≤ X < k3
f(X) = β0 + β1X + β2(X − k1) + β3(X − k2) + β4(X − k3)   when X ≥ k3
The main advantage of the truncated power function basis is the simplicity of its construction and the ease
of interpreting the parameters in a model that corresponds to these basis functions.
(Figure: a spline function f(X) plotted against X, with knots at k1, k2, and k3.)
A spline of degree 0 is a step function with steps located at the knots (k1, k2, and k3). A spline of degree 1
is a piecewise linear function where the lines connect at the knots. A spline of degree 2 is a piecewise
quadratic curve whose values and slopes coincide at the knots. A spline of degree 3 is a piecewise cubic
curve whose values, slopes, and curvature coincide at the knots. Visually, a cubic spline is a smooth
curve, and it is the most commonly used spline when a smooth fit is desired. When no knots are used,
splines of degree d are simply polynomials of degree d.
The EFFECT statement enables you to construct special collections of columns for the X or Z matrices
in your model. These collections are referred to as constructed effects to distinguish them from the usual
model effects formed from continuous or classification variables.
In the EFFECT statement, the name of the effect is specified after the EFFECT keyword. This name can
appear in only one EFFECT statement and cannot be the name of a variable in the input data set.
The effect type is specified after an equal sign, followed by a list of variables used in constructing
the effect within parentheses. Effect-type-specific options can be specified after a slash (/) following
the variable list.
The following effect-types are available in the EFFECT statement:
COLLECTION is a collection effect defining one or more variables as a single effect with
multiple degrees of freedom. The variables in a collection are considered
as a unit for estimation and inference.
MULTIMEMBER | MM is a multimember classification effect whose levels are determined by one
or more variables that appear in a CLASS statement.
• Constructed effects
proc glimmix;
   class A B;
   effect spl=spline(x);
   model y = A B spl A*spl;
run;
A constructed effect is assigned through the EFFECT statement. In the slide above, the EFFECT
statement defines a constructed effect named spl. The columns of spl are formed from the data set
variable x as a cubic B-spline basis with three equally spaced interior knots (which is the default).
Each constructed effect corresponds to a collection of columns that are referred to by using the name that
you supply. You can specify multiple EFFECT statements, and all EFFECT statements must precede
the MODEL statement.
For more information about the B-spline basis, see the PROC GLIMMIX documentation.
Effect Type
EFFECT spl=SPLINE(x);
- This constructs spline effects from B-spline or truncated power function bases.
- Options give control over knot construction, number of knots, spline basis, and so on.
- This enables you to fit a spline model for certain terms while enjoying parametric model
capabilities.
There are many spline options in the EFFECT statement to give you control over the basis function
(B spline (the default) or the truncated power function), the degree of the spline function (default is 3),
the placement of the knots (default is EQUAL), and the number of knots (default is 3).
One of the advantages of using the constructed spline effects in the model is that you are able to model
some terms through a spline function, which is typically provided in nonparametric regression
procedures, while performing some tasks that are available only to parametric models, such as having
a mathematical form of the fitted model, performing comparisons involving the spline terms, and so on.
Example: Fit an ordinal logistic model to the CD4+ cell count data in long.cd4cat. Specify a random
intercept and time with an unstructured covariance structure. Create a constructed spline effect
of time specifying a truncated power function basis for the spline expansion excluding the
intercept column. Specify that the internal knots be placed at 4 equally spaced percentiles of
time and specify the degree of the spline transformation to be 2. Use Newton-Raphson with
ridging, the Laplace likelihood approximation, and the between-within degrees of freedom
adjustment.
/* long03d05.sas */
proc glimmix data=long.cd4cat method=laplace;
effect spl = spline(time / details basis=tpf(noint)
knotmethod=percentiles(4) degree=2);
model cd4cat = spl age cigarettes drug partners depression
time*age time*depression
time*partners time*drug time*cigarettes
/ dist=multinomial link=cumlogit solution ddfm=bw;
random intercept time / subject=id type=un;
nloptions tech=nrridg;
title 'Ordinal Model of Aids Data with a Spline for Time';
run;
DETAILS requests tables that show the knot locations and the knots associated with each
spline basis function.
BASIS=TPF specifies a truncated power function basis for the spline expansion. For splines
of degree d defined with n knots for a variable X, this basis consists of an
intercept, polynomials X, X2 ,X3 ,…,Xd and one truncated power function for each
of the n knots. The option NOINT excludes the intercept column.
KNOTMETHOD= specifies how to construct the knots for spline effects. The PERCENTILES(4)
method requests that internal knots be placed at 4 equally spaced percentiles
of the variable or variables named in the EFFECT statement.
DEGREE= specifies the degree of the spline transformation. The degree must be a
nonnegative integer. The degree is typically a small integer, such as 0, 1, 2, or 3.
The default is DEGREE=3.
Model Information
Knot
Number time
1 -0.75839
2 0.24914
3 1.22656
4 2.55715
The four knot values are shown, which are placed at 4 equally spaced percentiles of the variable time.
In most situations, there is no subject matter knowledge on where to place the knots. However, where
the knots are placed is usually not that important to the model fit. The number of knots is usually more
important. One criterion to use in deciding the number of knots is the use of the AIC goodness-of-fit
statistic.
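The percentile placement itself is simple to reproduce. Assuming (as the SAS documentation describes) that PERCENTILES(n) places knots at the 100·k/(n + 1) percentiles for k = 1, …, n, a NumPy sketch:

```python
import numpy as np

def percentile_knots(x, n):
    """n interior knots at equally spaced percentiles of x:
    the 100*k/(n+1) percentiles for k = 1..n, so n = 4 gives the
    20th, 40th, 60th, and 80th percentiles."""
    qs = [100.0 * k / (n + 1) for k in range(1, n + 1)]
    return np.percentile(np.asarray(x, dtype=float), qs)
```

Applied to the observed time values, this kind of rule yields data-driven knot locations like the four shown in the table above.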
Basis Details for Spline Effect spl
Column   Degree       Knot
   1        1
   2        2
   3        2      -0.75839
   4        2       0.24914
   5        2       1.22656
   6        2       2.55715
The model has six terms for the spline: a linear term, a quadratic term, and 4 truncated power basis
functions placed at the knot values.
Number of Observations Read 2376
Number of Observations Used 2376
Response Profile
Ordered Total
Value cd4cat Frequency
1 1 182
2 2 741
3 3 736
4 4 717
Dimensions
The constructed spline effect increased the number of columns in X compared to the last ordinal model.
Optimization Information
Iteration History
Objective Max
Iteration Restarts Evaluations Function Change Gradient
0 0 27 4879.3288001 . 868.2086
1 0 25 4600.9423456 278.38645445 202.6837
2 0 25 4547.0162638 53.92608183 49.48996
3 0 25 4541.7053663 5.31089750 11.9951
4 0 25 4541.4282046 0.27716173 1.293518
5 0 25 4541.4254663 0.00273824 0.015775
6 0 25 4541.425466 0.00000038 8.462E-6
Fit Statistics
The AIC is much lower compared to the last ordinal model (4585.43 versus 4633.06).
Covariance Parameter Estimates
Cov Standard
Parm Subject Estimate Error
The estimated variances and covariances of the random effects are similar but not identical to the last
ordinal model.
Solutions for Fixed Effects
Standard
Effect cd4cat spl Estimate Error DF t Value Pr > |t|
Notice there are six spline parameters in the model. They correspond to the linear term for time, the
quadratic term for time, and the four truncated power basis functions for time.
Type III Tests of Fixed Effects
Num Den
Effect DF DF F Value Pr > F
Exercises
Why are AIC and BIC model fit statistics not produced in the exercise
problem?
a. The use of the RANDOM statement always suppresses the AIC and BIC
statistics.
b. Because the response variable has a binomial distribution, the AIC and
BIC statistics are always suppressed.
c. Because the linearization method was used, the AIC and BIC statistics are
not produced because there is no true likelihood.
d. PROC GLIMMIX does not support AIC and BIC model fit statistics.
3.3 GEE Regression Models 3-83
• GEE models are useful in analyzing data that arise from a longitudinal or
clustered design
• GEE models are marginal models that model the effect of the predictor
variables on the population-averaged response
• GEE models are recommended when the inferences from the regression
equation are the principal interest and the correlation is regarded as a
nuisance.
Generalized estimating equations (GEE) were developed to accommodate correlated observations within
subjects. An estimating equation is simply the equation that you solve to calculate the parameter
estimates. The extra term generalized distinguishes the GEE as the estimating equations that
accommodate the correlation structure of the repeated measurements.
GEE are marginal models where the marginal expectation (average response for observations sharing the
same covariates) is modeled as a function of the predictor variables. The parameters in marginal models
can be interpreted as the influence of the covariates on the population-averaged response. These models
are appropriate when the scientific objectives are to characterize and contrast populations of subjects.
A useful feature of the GEE is that the parameter estimates along with the covariance matrix are
consistently estimated (the standard errors are consistent estimates of the true standard errors) even
if the correlation structure within subject is not known. Therefore, the variances along with the inferences
regarding the parameter estimates are asymptotically correct (Zeger and Liang 1986).
Provided that the mean model is correctly specified and the measurements between subjects are
independent, robust standard errors ensure consistent inferences from a GEE regression model. This
is true even if the chosen correlation structure is incorrect or if the strength of the correlation between
measurements varies from subject to subject. Although model-based standard errors are also produced,
they are consistent only if the specified correlation structure is correct. Consequently, the robust standard
errors (which are usually larger) are usually preferred especially when the number of subjects is large.
The desired number of subjects depends on the number of predictor variables in the model. If you have fewer than 5 predictor variables, approximately 25 subjects might be enough to use the robust standard errors. If you have 5 to 12 predictor variables, you would need at least 100 subjects, and if you want to be reasonably confident, around 200 subjects (Stokes, Davis, and Koch 2000).
However, when the number of subjects is very small (less than 20), the model-based standard errors might
have better properties even if the specified correlation structure is wrong (Prentice 1988). This is because
the robust standard errors are asymptotically unbiased, but could be highly biased when the number
of subjects is small.
Robust standard errors are derived by the sandwich estimator of the covariance matrix of the regression
coefficients. In general, the sandwich estimator uses a matrix with the diagonal elements equal to the
individual squared residuals to estimate the common variance (the square of any residual is an estimate
of the variance at that predictor variable value). This works because the average of a lot of poor
estimators (individual squared residuals) can be a good estimator of the common variance. In fact,
Liang and Zeger (1986) showed that the robust standard errors are robust to departures of the working
correlation matrix from the true correlation structure.
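In symbols, the idea can be sketched for the simplest independent-observations case (the notation here is illustrative, not the exact form computed by PROC GENMOD). With design vectors x_i and residuals e_i, the sandwich estimator is roughly

```latex
\[
\widehat{\operatorname{Var}}(\hat{\beta})
  = \Bigl(\sum_{i} x_i x_i^{\top}\Bigr)^{-1}
    \Bigl(\sum_{i} e_i^{2}\, x_i x_i^{\top}\Bigr)
    \Bigl(\sum_{i} x_i x_i^{\top}\Bigr)^{-1}
\]
```

The outer "bread" terms come from the model, and the middle "meat" term uses the individual squared residuals described above. In the GEE setting, the middle term is instead built from per-subject residual cross-products, which is why it remains valid when the working correlation is misspecified.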
In GEE regression models, the number of observations is not the number of subjects but rather
the number of measurements taken on all the subjects (similar to the layout for PROC MIXED).
The variance-covariance matrix is now a block-diagonal matrix in which the observations within each
block (the block corresponds to a subject) are assumed to be correlated and the observations outside
of the blocks are assumed to be independent. In other words, the subjects are still assumed to be
independent of each other and the measurements within each subject are assumed to be correlated.
Quasi-Likelihood Estimation
Generalized linear models use the likelihood function in statistical inference. However, the distribution
of the response variable must be specified. For repeated measures that are discrete outcomes, it might
be difficult to specify the appropriate theoretical probability distribution. Whereas generalized linear
mixed models use pseudo-likelihood or maximum likelihood methods of estimation, GEE regression
models use the quasi-likelihood method of estimation. This estimation method requires only that you
specify the relationships between the response mean and covariates and between the response mean
and variance. Quasi-likelihood estimation has many of the advantages of maximum likelihood estimation
without requiring full distributional assumptions. This is why the GEE approach is applicable to several
types of response variables (Zeger and Liang 1986).
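In symbols, quasi-likelihood estimation needs only the two moment specifications just described (a sketch in generic notation, with link function g, dispersion parameter φ, and variance function v):

```latex
\[
E(Y_{ij}) = \mu_{ij} = g^{-1}\!\bigl(x_{ij}^{\top}\beta\bigr),
\qquad
\operatorname{Var}(Y_{ij}) = \phi\, v(\mu_{ij})
\]
```

For a binomial response, for example, v(μ) = μ(1 − μ); no full probability distribution for the repeated measurements is assumed.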
QIC Statistic
The QIC statistic, which is based on the quasi-likelihood, is computed and can be used for model
assessment for GEE models. The QIC statistic was developed by Pan (2001) as a modification of the AIC
statistic. PROC GENMOD also computes an approximation to QIC defined by Pan (2001) called QICu.
QIC is appropriate for selecting regression models and working correlations, whereas QICu is appropriate
only for selecting regression models.
The process of fitting a GEE model can be summarized in a series of steps. First, a regression model
is fitted, which assumes independence and the Pearson standardized residuals are computed. These
residuals are then used to estimate the parameters of the correlation matrix, which characterizes
the correlation of the observations within subject. The correlation parameters are then incorporated into
the GEE estimating equations, which generates new values for the regression coefficients and new
Pearson residuals. These residuals are then used to re-estimate the correlation parameters. The cyclical
process continues until the parameter estimates stabilize and model convergence is achieved.
GENMOD Procedure
PROC GENMOD can be used to fit GEE models to longitudinal data. The layout of the data is similar
to PROC MIXED where the number of observations is equal to the number of measurements taken
on all the subjects. The variance-covariance matrix is a block diagonal matrix in which the observations
within each block are assumed to be correlated and the observations outside of the blocks are assumed
to be independent.
Selected GENMOD procedure statements:
CLASS specifies the classification variables to be used in the analysis. If the CLASS statement is
used, it must appear before the MODEL statement.
MODEL specifies the response variable and the predictor variables. You can specify the response
in the form of a single variable or in the form of a ratio of two variables called
events/trials. This form is applicable only to summarized binomial response data.
REPEATED invokes the GEE method, specifies the correlation structure, and controls the displayed
output from the longitudinal model.
ESTIMATE provides a means for obtaining a test for a specified hypothesis concerning the model
parameters. It can also be used to produce the odds ratio estimate along with the 95%
confidence limits.
ASSESS computes and plots, using ODS Graphics, model-checking statistics based on aggregates
of residuals.
OUTPUT creates a new SAS data set that contains all the variables in the input data set and,
optionally, the estimated linear predictors and their standard error estimates, the weights
for the Hessian matrix, predicted values of the mean, confidence limits for predicted
values, and residuals.
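Assembled in order, the statements above give the general shape of a GEE analysis in PROC GENMOD. The following sketch uses hypothetical data set and variable names (mydata, id, y, group, time):

```sas
proc genmod data=mydata desc;                /* hypothetical data set           */
   class id group;                           /* CLASS must precede MODEL        */
   model y=group time / dist=bin type3;      /* binomial response, logit link   */
   repeated subject=id / type=exch corrw;    /* REPEATED invokes the GEE method */
   estimate 'group 1 vs 2' group 1 -1 / exp; /* odds ratio with 95% limits      */
   output out=preds p=predprob;              /* predicted probabilities         */
run;
```

The REPEATED statement is what turns an ordinary generalized linear model fit into a GEE fit; without it, PROC GENMOD treats all observations as independent.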
PROC GLIMMIX
• can accommodate random effects
• fits unit-specific models and population-average models
• provides (bias-adjusted) sandwich estimators of the covariance matrix of
the fixed effects that are unbiased even when the number of clusters is small
PROC GENMOD
• cannot accommodate random effects
• fits only population-average models
• provides sandwich estimators that are unbiased only when the number of
clusters is large
The robust standard errors computed in PROC GLIMMIX have advantages over the robust standard
errors computed in PROC GENMOD because the classical sandwich estimator, as implemented in GEEs
in PROC GENMOD, tends to underestimate the variance of the fixed effects, particularly if the number
of subjects (or clusters) is small.
The subtle difference between the GEE-like estimates in PROC GLIMMIX and the GEE estimates
in PROC GENMOD is that the parameter estimates are obtained using the moment-based method
in PROC GENMOD, whereas the parameter estimates are obtained using the pseudo-likelihood method
in PROC GLIMMIX. Both approaches (PROC GENMOD and the EMPIRICAL option in PROC
GLIMMIX) assume that the missing data is missing completely at random (MCAR).
Effect Coding

                                  Design
                                  Variables
  Variable   Value   Label         1    2
  Income       1     Low           1    0
               2     Medium        0    1
               3     High         -1   -1
To obtain the odds ratio (the odds ratio compares the odds of outcome in one group to the odds of
outcome in another group) for a one-unit increase in the predictor variable, an ESTIMATE statement
along with the EXP option has to be used. However, you need to be able to define the coefficients
in the ESTIMATE statement to obtain the odds ratio. For odds ratios involving class variables, there are
several coding schemes available for the design variables created in the CLASS statement.
For effect coding (also called deviation from the mean coding), the number of design variables created is
the number of levels of the CLASS variable minus 1. For example, if the variable income has three
levels, only two design variables were created. By default, all the design variables have a value
of –1 for the last level of the CLASS variable. Parameter estimates of the CLASS main effects using this
coding scheme estimate the difference between the effect of each level and the average effect over all
levels.
For reference cell coding, the number of design variables created is the number of levels of the CLASS
variable minus 1 and the parameter estimates of the CLASS main effects estimate the difference between
the effect of each level and the last level. For example, the effect for the level low would estimate
the difference between low and high. You can choose the reference level with the REF= option.
GLM Coding

                                  Design
                                  Variables
  Variable   Value   Label         1    2    3
  Income       1     Low           1    0    0
               2     Medium        0    1    0
               3     High          0    0    1
GLM coding uses less than full rank parameterization for variables in a CLASS statement.
This parameterization constructs one design variable for each level of the predictor variable. Therefore,
income would have three design variables where the first design variable is 1 if low, 0 otherwise, the
second design variable is 1 if medium, 0 otherwise, and the third design variable is 1 if high, 0 otherwise.
The rank of a matrix is defined as the maximum number of linearly independent row vectors
in the matrix. If the model has a design matrix that is not full rank, there are an infinite number
of solutions for the parameter estimates.
To obtain coefficients for an odds ratio comparing low income to medium income for a logistic regression
model, first write out the equation for the odds for low income and the odds for medium income. For
reference cell coding, two coefficients are needed because there are two design variables.
Compute the odds ratio in terms of the odds for Low Income versus the odds for Medium Income. With reference cell coding, the two design variables take the values (1, 0) for Low and (0, 1) for Medium:

  Odds(Low)    = exp(β0 + β1(1) + β2(0))
  Odds(Medium) = exp(β0 + β1(0) + β2(1))

Compute the odds ratio as the odds of the group in the numerator divided by the odds of the group in the denominator:

  Odds(Low) / Odds(Medium) = exp(β1 − β2)

Solving the expression algebraically shows that the coefficients for the odds ratio comparing low income versus medium income are 1 and −1.
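In PROC GENMOD, those coefficients are supplied directly in an ESTIMATE statement. The sketch below assumes a hypothetical reference-coded class variable income whose design variables correspond to Low and Medium (High is the reference level), with hypothetical data set and variable names:

```sas
proc genmod data=mydata desc;                 /* hypothetical data set */
   class id income(param=ref ref='High');
   model y=income / dist=bin;
   repeated subject=id / type=exch;
   /* coefficients 1 and -1 contrast Low versus Medium; the EXP option */
   /* exponentiates the estimate to report the odds ratio and limits   */
   estimate 'low vs medium income' income 1 -1 / exp;
run;
```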
The model has time and the quadratic effect of time as predictor variables. To
estimate the odds ratio for a 3-unit increase in time (time 0 to 3), the
coefficients for the ESTIMATE statement would be which of the following?
a. time 3
b. time 3 time*time 9
c. time 3 time*time 3
d. time -3 time*time -9
• The user chooses one of the available working correlation matrices in PROC
GENMOD.
• It is recommended that you choose a working correlation matrix that
approximates the average dependence among repeated measurements
within subject.
• Choosing the correct structure might increase efficiency.
When fitting a GEE model in PROC GENMOD, you should decide what is a reasonable model for
the correlation between measurements within subject. PROC GENMOD offers several common structures
to use to model the working correlation matrix. The choice of the structure should be consistent with the
empirical correlations. Liang and Zeger (1986) showed that there could be important gains in efficiency
by correctly specifying the working correlation matrix. However, the loss of efficiency is inconsequential
when the number of clusters is large (Davis 2002).
The independent working correlation matrix for four time points:

  1.0    0      0      0
         1.0    0      0
                1.0    0
                       1.0
The independent correlation structure forces the off-diagonal correlations to be 0. Therefore, no working
correlation structure is estimated in this case. Under this constraint, the coefficients and model-based
standard errors (requested by the MODELSE option in the REPEATED statement) are the same as those
reported in the LOGISTIC procedure. However, PROC GENMOD, by default, computes robust standard
error estimates. These estimates take into account the correlations among the repeated measurements and
usually are different from the model-based standard errors assuming independence.
The independent correlation structure might be a good choice when you have a large number of subjects
with few measurements per subject. The correlation influence is often small enough to have little impact
on the regression coefficients, but the robust standard errors will give the correct inferences. This model
gives consistent estimates of the parameters and standard errors when the mean model is correctly
specified (Davis 2002).
  1.0    α1     0      0
         1.0    α1     0
                1.0    α1
                       1.0
In the 1-dependent correlation structure, measurements are correlated if they are one time point apart.
They are uncorrelated if they are two or more time points apart.
  1.0    α1     α2     0
         1.0    α1     α2
                1.0    α1
                       1.0
For the 2-dependent correlation structure, measurements are correlated if they are two or fewer time periods apart. Measurements that are one time period apart have different correlations than measurements that are two time periods apart.
These last two correlation structures are generally called m-dependent correlation structures. The m
represents how many time periods apart the measurements remain correlated. Therefore, a 5-dependent
correlation structure indicates that measurements are correlated if they are five or fewer time periods
apart. This correlation structure is similar to the banded Toeplitz structure in PROC MIXED.
The m-dependent correlation structure assumes equally spaced time points and the same time points
across subjects.
  1.0    α      α      α
         1.0    α      α
                1.0    α
                       1.0
The exchangeable correlation structure, which is similar to the compound symmetry structure in PROC
MIXED, assumes that the correlations are equal across time points. Although this structure might not
be justified in longitudinal studies, it is often reasonable in situations where the repeated measurements
are not obtained over time (Allison 1999). For example, the exchangeable correlation structure might
be a good choice if the independent experimental units were classrooms and the responses obtained were
from each student in the classroom (Davis 2002).
  1.0    α      α^2    α^3
         1.0    α      α^2
                1.0    α
                       1.0
The first order autoregressive structure specifies that the correlations be raised to the power of the number
of time points the measurements are apart. For example, if the measurements are three time points apart,
the correlation is α^3. The AR(1) model might be a good choice in a longitudinal model where
measurements are taken repeatedly over time. One shortcoming is that the correlation decays very quickly
as the spacing between measurements increases (Davis 2002).
The AR(1) correlation structure assumes equally spaced time points and the same time points across
subjects.
  1.0    α12    α13    α14
         1.0    α23    α24
                1.0    α34
                       1.0
Finally, the unstructured correlation structure is completely unspecified. Therefore, there are t(t−1)/2 parameters to be estimated. The unstructured working correlation structure is useful only when there are very few observation times. If there were many time points, you would probably want to impose some structure on the correlation matrix by selecting one of the other correlation structures (Allison 1999).
Furthermore, when there are missing values or a varying number of observations per subject, a
nonpositive definite matrix might occur, which would stop the parameter estimation process (Stokes,
Davis, and Koch 2000).
• The nature of the problem can suggest the choice of correlation structure.
• If the number of observations is small and there is an equal number of time
points per subject, unstructured is recommended.
• If repeated measurements are obtained over time, AR(1) or m-dependent is
recommended.
• If repeated measurements are not naturally ordered, exchangeable is
recommended.
• If the number of clusters is large and the number of measurements is small,
independent structure might suffice.
If you do not know which working correlation structure to choose, one recommendation is to compare
the parameter estimates and standard errors from several correlation structures. This might indicate
whether there is sensitivity to the misspecification of the correlation structure. PROC GENMOD also
enables you to choose a user-defined correlation matrix.
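One way to make that comparison is to refit the same mean model under several TYPE= values and compare the parameter estimates, robust standard errors, and QIC across fits. A sketch with hypothetical data set and variable names:

```sas
/* Refit with type=ind, type=mdep(1), type=exch, type=ar(1), and type=unstr, */
/* then compare parameter estimates, robust standard errors, and QIC.        */
proc genmod data=mydata desc;
   class id;
   model y=x time / dist=bin;
   repeated subject=id / type=exch corrw;   /* swap in each structure here */
run;
```

If the estimates and robust standard errors are stable across structures, the analysis is not sensitive to the choice of working correlation.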
If the estimation of the regression coefficients is the primary objective of your study and there are a large
number of clusters (approximately 200) and a small number of time points, then you should not spend
much time choosing a correlation structure. If the mean model is correctly specified, the GEE method for
the parameter estimates was designed to guarantee consistency of the parameter estimates under minimal
assumptions about the time dependence (Diggle, Heagerty, Liang, and Zeger 2002). Furthermore, the loss
of efficiency from an incorrect choice of the working correlation structure is inconsequential when the
number of subjects is large (Davis 2002).
If there are a small number of clusters, then you should spend time choosing a correlation structure. Both
the model and the correlation structure must be approximately correct to obtain valid inferences (Diggle,
Heagerty, Liang, and Zeger 2002). In this situation it is important to use the model-based standard errors
rather than the robust standard errors (Prentice 1988). Choosing the correct correlation structure will also
result in increased efficiency (Davis 2002).
Missing values that occur intermixed with nonmissing values are called intermittent missing values.
If these missing values are missing completely at random (MCAR), then the consistency results
established by Liang and Zeger (1986) hold. A simple check of MCAR is to divide the subjects into two
groups: those with a complete set of measurements and those with missing measurements. If the MCAR
assumption holds, then both groups (with their measurements) should be random samples of the same
population of measurements. In other words, the probability of missing is independent of the observed
measurements and the measurements that would have been available had they not been missing. The t-tests for location and more general tests of equality of distribution can be used to test the MCAR assumption (Little 1995). Tests of MCAR for repeatedly measured categorical data were discussed by
Park and Davis (1993).
Some intermittent missing values can arise due to censoring rules. For example, values outside a stated
range might be simply unreliable because of the limitations of the measuring techniques in use (Diggle,
Heagerty, Liang, and Zeger 2002). Methods for handling censored data in correlation data structures are
addressed in Laird (1988) and Hughes (1999). Intermittent missing values can also be related to the
outcome. For example, a patient might miss an appointment because of an adverse reaction to the
treatment. The fact that the subject remains in the study means that the investigator should have the
opportunity to ascertain the reason for the missing appointment and take corrective action accordingly
(Diggle, Heagerty, Liang, and Zeger 2002, Little 1995).
If all the missing values occur after a certain time point for a subject, then the missing values are called
dropouts. These are a more significant problem compared to intermittent missing values because usually
the subject is withdrawn for reasons directly or indirectly connected to the outcome and is lost to follow-up. If you treat the dropouts as MCAR when they are in fact informative dropouts, the parameter estimates will be biased (Diggle and Kenward 1994).
Diggle, Heagerty, Liang, and Zeger (2002) state that “An emerging consensus is that analysis of data with
potentially informative dropouts necessarily involves assumptions that are difficult, or even impossible,
to check from the observed data. This suggests that it would be unwise to rely on the precise conclusions
of an analysis based on a particular informative dropout model.” They recommend that a sensitivity
analysis be conducted on the informative dropout model. This provides some protection against the
possibility that conclusions reached from a random dropout model are critically dependent on the validity
of MCAR. Scharfstein et al. (1999) provide a discussion of how such sensitivity analyses might be conducted.
Because the GEE method is semiparametric (not nonparametric), the mean model and variance function
should be correctly specified. Thus, the consistency results of the GEE models depend on the correct
specification of the model for the mean. Furthermore, robust standard errors should be used only with
a large number of subjects.
Park (1993) compared GEE estimators with normal-theory maximum likelihood estimators and reported
that GEE estimators were more sensitive to the occurrence of missing data.
Several studies have shown that the bias and efficiency of the GEE method can depend on the number
of subjects, number of repeated measurements, magnitudes of the correlations among repeated
measurements, and number and type of covariates. Lipsitz et al. (1991) reported that the parameter
estimates for a binary GEE model were biased slightly upward and the bias increased as the magnitude
of the correlation increased. Paik (1988) reported that as the number of covariates increases, the number
of subjects needs to increase for the point estimates and confidence intervals to perform satisfactorily
(with 4 repeated measurements and 4 covariates, he recommended a sample size greater than 50).
One solution to the MCAR limitation is to use the MI procedure to impute the missing values.
PROC MI invokes the MAR assumption. Then fit the GEE model in PROC GENMOD on the
complete data.
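The two-step approach can be sketched as follows; the data set and variable names are hypothetical, and details such as the choice of imputation model depend on the data. Estimates from the completed data sets are then combined with PROC MIANALYZE:

```sas
/* Step 1: impute intermittent missing values under the MAR assumption */
proc mi data=mydata out=mi_out nimpute=5 seed=27513;
   var y x time;
run;

/* Step 2: fit the GEE model to each completed data set */
proc genmod data=mi_out desc;
   by _imputation_;
   class id;
   model y=x time / dist=bin;
   repeated subject=id / type=exch;
   ods output GEEEmpPEst=gee_parms;   /* empirical parameter estimates */
run;

/* Step 3: combine the results across imputations */
proc mianalyze parms=gee_parms;
   modeleffects Intercept x time;
run;
```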
When you have a large sample size, what condition is not necessary for the
GEE model to have consistent parameter estimates and standard errors?
a. Missing values are MCAR
b. Correct specification of the model for the mean
c. Correct specification of the variance function
d. Correct specification of the correlation structure
Example: Fit a GEE model on the long.keratotomy data set specifying the unstructured correlation
structure, reference cell coding for gender with female as the reference cell, and request the
Type 3 score statistics, the final working correlation matrix, the initial maximum likelihood
parameter estimates table, and the model-based standard errors. Also compute the odds ratio
for a one-unit decrease in diameter, a one-unit increase in visit, a ten-unit increase in age,
and an odds ratio comparing males to females.
/* long03d06.sas */
proc genmod data=long.keratotomy desc;
   class patientid gender(param=ref ref='Female');
   model unstable=age diameter gender visit / dist=bin type3;
   repeated subject=patientid / corrw modelse type=unstr printmle;
   estimate '10 year increase in age' age 10 / exp;
   estimate '1 mm decrease in diameter' diameter -1 / exp;
   estimate 'male vs. female' gender 1 / exp;
   estimate '1 year increase in followup' visit 1 / exp;
   title 'GEE Model of Radial Keratotomy Surgery';
run;
Selected PROC GENMOD statement option:
DESC reverses the sort order for the levels of the outcome variable.
Selected CLASS statement option:
PARAM= specifies the parameterization method for the classification variable or variables.
The default is PARAM=GLM.
REF= specifies the reference level for PARAM=EFFECT, PARAM=REF, and their
orthogonalizations. For an individual variable, you can specify the level of the
variable to use as the reference level. For a global or individual variable, you can
use one of the following keywords. The default is REF=LAST.
FIRST designates the first ordered level as the reference.
LAST designates the last ordered level as the reference.
Selected MODEL statement options:
DIST= specifies the built-in probability distribution to use in the model. The default link
function for the binomial distribution is the logit link function.
TYPE3 requests that Type 3 score statistics be computed for each effect that is specified
in the MODEL statement. Likelihood ratio statistics are produced for models
that are not GEE models.
Selected REPEATED statement options:
CORRW specifies that the final working correlation matrix be printed.
MODELSE displays an analysis of parameter estimates table using model-based standard
errors.
If the repeated measurements are not in the proper order or if there are missing time points for
some subjects, then the WITHIN= option in the REPEATED statement should be used. This
option names a variable that specifies the order of measurements within subjects. Variables used
in the WITHIN= option must also be listed in the CLASS statement.
GEE Model of Radial Keratotomy Surgery
Model Information
The Model Information table provides information about the data set and the model.
Class Level Information

                       Design
  Class     Value      Variables
  gender    Female     0
            Male       1
The Class Level Information table displays the levels of the class variables.
Response Profile

  Ordered               Total
    Value   unstable    Frequency
        1          1          412
        2          0          634
The Response Profile table displays the levels of the response variable. Notice PROC GENMOD shows
which value of the response variable is being modeled.
Parameter Information
Prm1 Intercept
Prm2 age
Prm3 diameter
Prm4 gender Male
Prm5 visit
Algorithm converged.
The Analysis of Initial Parameter Estimates table (displayed by the PRINTMLE option) shows the
parameter estimates when the observations are treated as independent. These parameter estimates are used
as the starting values for the GEE solution. The inferences from this table should be used only as a
comparison to the inferences from the GEE model.
GEE Model Information
Algorithm converged.
The GEE Model Information table displays information about the longitudinal model fit with GEE.
Because the TYPE=UNSTR option is requested, the unstructured correlation structure is used. Furthermore,
because there are 362 patients, there are 362 clusters. Notice that the data are not complete as 25 clusters
have missing values.
Working Correlation Matrix
Because the unstructured correlation structure is used, the correlations between time points are all
estimated. Because there is a relatively large number of clusters, the choice of correlation structure should
not substantially affect the results of the GEE model.
GEE Fit Criteria
QIC 1088.1506
QICu 1085.3915
The quasi-likelihood information criterion (QIC) is a modification of the Akaike information criterion
(AIC) that applies to models fit by GEEs. The QIC is appropriate for selecting both regression models and
working correlation structures; the lower the value, the better the fit of the model. PROC GENMOD also
computes an approximation to QIC called QICu, which is appropriate only for selecting regression
models.
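The idea behind QICu can be sketched with a small Python illustration (the responses and fitted probabilities below are made up; PROC GENMOD computes these quantities internally from the fitted GEE model):

```python
import math

def binary_quasi_lik(y, mu):
    """Quasi-likelihood for binary responses under the logit link:
    Q = sum of y*log(mu) + (1 - y)*log(1 - mu)."""
    return sum(yi * math.log(mi) + (1 - yi) * math.log(1 - mi)
               for yi, mi in zip(y, mu))

def qic_u(quasi_lik, n_params):
    """QICu = -2*Q + 2p: the simpler approximation that penalizes only
    the number of regression parameters, which is why it is suitable for
    comparing regression models but not working correlation structures."""
    return -2.0 * quasi_lik + 2.0 * n_params

# Made-up responses and fitted probabilities for four observations
y = [1, 0, 1, 1]
mu = [0.8, 0.3, 0.6, 0.9]
print(round(qic_u(binary_quasi_lik(y, mu), 2), 4))
```

The full QIC replaces the 2p penalty with a trace term involving the robust covariance estimate, which is why QIC (1088.1506) and QICu (1085.3915) differ slightly in the output above.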
Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
Because the MODELSE option is used, the Analysis Of GEE Parameter Estimates table shows both the
empirical standard error estimates and the model-based standard error estimates. The empirical standard
error estimates are robust estimates that do not depend on the correctness of the structure imposed on the
working correlation matrix. The model-based standard error estimates are based directly on the assumed
correlation structure. The model-based standard errors are better estimates if the assumed model for the
correlation structure is correct, but worse if the assumed model is incorrect (Allison 1999). Because the
sample size is large, the robust standard errors are generally preferred.
Score Statistics For Type 3 GEE Analysis
Chi-
Source DF Square Pr > ChiSq
Because the TYPE3 option is used, the Score Statistics For Type 3 GEE Analysis table is displayed. The
results based on the empirical standard errors, the model-based standard errors, and the Type 3 score
statistics are all very similar because of the large sample size. However, the Z statistics (from the tables
based on empirical and model-based standard errors) usually produce more liberal p-values than the score
statistics.
If you have a small sample size, the score statistic is the statistic of choice (Stokes, Davis, and
Koch 2000).
Contrast Estimate Results
L'Beta Chi-
Label Confidence Limits Square Pr > ChiSq
The Contrast Estimate Results table displays the results of the ESTIMATE statement. Because the EXP
option is used, the contrast results are exponentiated, which produces the odds ratio estimate. The odds
ratio for diameter (in the L’Beta Estimate column) shows that a one-millimeter decrease in diameter
multiplies the odds of a continuing effect of the surgery by 3.14. The 95% confidence bounds are
2.01 to 4.88.
There are several disadvantages of the Wald chi-square tests shown in the Contrast Estimate
Results table. One disadvantage is that the tests for individual parameters are dependent on the
measurement scale (they are not invariant to transformations). Another disadvantage of Wald tests
is that they require estimation of the covariance matrix of the vector of parameter estimates.
Estimates of variances and covariances might be unstable if the sample size is small. It is
recommended that you have around 200 clusters to provide a great deal of confidence in
assessments of statistical significance at the 0.05 level or smaller (Stokes, Davis,
and Koch 2000). With 362 subjects, the Wald tests should perform reasonably well.
The Mean Estimate column is the linear contrast L’Beta reflected through the inverse link
function (the associated probability). In this example, it is not meaningful.
Example: Using the ESTIMATE statement and the same model as the last demonstration, generate the
probability of a continuing effect of the surgery at a visit of 10 years for 49-year-old males
with a diameter of the clear zone of 3. Use the ODS SELECT statement to select only the table
of the contrast estimate results.
ods select estimates;
proc genmod data=long.keratotomy desc;
class patientid gender (param=ref ref='Female');
model unstable = age diameter gender visit / dist=bin;
repeated subject = patientid / type=unstr;
estimate 'Probability for age 49 diameter 3 gender male visit 10'
int 1 age 49 diameter 3 gender 1 visit 10;
title 'GEE Model of Radial Keratotomy Surgery';
run;
Label: Probability for age 49 diameter 3 gender male visit 10
   Mean Estimate               0.8936
   Mean Confidence Limits      0.8342   0.9335
   L'Beta Estimate             2.1283
   Standard Error              0.2616
   Alpha                       0.05
   L'Beta Confidence Limits    1.6157   2.6410
The estimated probability is 0.8936 with a 95% confidence interval of 0.8342 to 0.9335. You can change
the confidence limits with the ALPHA= option in the ESTIMATE statement. The L’Beta estimate is
the Xβ.
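The mapping from the L’Beta estimate to the reported probability can be checked with a quick inverse-logit calculation (a Python sketch for illustration; it is not part of the course's SAS code):

```python
import math

def inv_logit(x):
    """Inverse link for a logistic model: maps L'Beta (= Xb) to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# L'Beta estimate and 95% confidence limits reported by the ESTIMATE statement
lbeta, lower, upper = 2.1283, 1.6157, 2.6410
print(round(inv_logit(lbeta), 4))   # 0.8936
print(round(inv_logit(lower), 4))   # 0.8342
print(round(inv_logit(upper), 4))   # 0.9335
```

Applying the inverse link to the confidence limits of L’Beta reproduces the confidence interval for the probability.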
In this example, the number of subjects is very large. Therefore, there should be little difference in the
parameter estimates and the robust standard errors across the different correlation structures. The slide
above illustrates the robustness of the GEE methods with regard to obtaining consistent parameter
estimates and standard errors. Notice the standard errors all increased from the initial model to the GEE-
based models. This makes sense for age, diameter, and gender because these variables are all time-
independent. However, visit is a time-dependent variable, so the standard error should have decreased.
The negative correlations among the observations might have caused this anomaly.
Exercises
A longitudinal study was undertaken to assess the health effects of air pollution on children. The data
contain repeated binary measures of wheezing status for each of 537 children from Steubenville, Ohio.
The measurements were taken at age 7, 8, 9, and 10 years. The smoking status of the mother at the first
year of the study was also recorded. The data are stored in a SAS data set called long.wheeze.
1) For the GEE parameter estimates, which parameters are significant at the 0.05 level?
2) Explain the changes in the p-values and standard errors for smoker and age when comparing
the initial parameter estimates to the GEE parameter estimates.
Compared to the results of the initial parameter estimates, the p-value and
standard error of the GEE parameter estimate for smoker did which of the
following?
a. Went up because it is a time-dependent variable
b. Went up because it is a time-independent variable
c. Went down because it is a time-dependent variable
d. Went down because it is a time-independent variable
3.4 Chapter Summary
The GLIMMIX procedure uses the pseudo-likelihood (linearization) method to obtain the parameter
estimates and standard errors from a linearized model. The first step applies a first-order Taylor series
expansion to linearize the generalized linear mixed model into a linear mixed model. Because the
linearization approach approximates the GzLMM with a linear mixed model, the computed likelihood is
for that linear mixed model, not the original model. It is not the true likelihood of your problem.
Likelihood ratio tests that compare nested models might not be mathematically valid, and the
likelihood-based model fit statistics (AIC, BIC) should not be used for model comparisons.
There are two likelihood-based estimation methods. The METHOD=QUAD option in the PROC
GLIMMIX statement requests that the GLIMMIX procedure approximate the marginal log likelihood
with an adaptive Gauss-Hermite quadrature. The METHOD=LAPLACE option in the PROC GLIMMIX
statement requests that the GLIMMIX procedure approximate the marginal log likelihood by using the
Laplace method. Laplace estimates typically exhibit better asymptotic behavior and less small-sample
bias than pseudo-likelihood estimators. On the other hand, the class of models for which a Laplace
approximation of the marginal log likelihood is available is much smaller compared to the class of models
to which pseudo-likelihood estimation can be applied.
In the GLIMMIX procedure, robust standard errors can be obtained by using the EMPIRICAL option in
the PROC GLIMMIX statement. The subtle difference between this option and the GEE estimates in
PROC GENMOD is that the parameter estimates are obtained using the moment-based method in PROC
GENMOD, whereas the parameter estimates are obtained using the pseudo-likelihood method in PROC
GLIMMIX.
Models using the GEE method are marginal models that only estimate population average regression
coefficients and do not estimate subject-specific regression coefficients. These models are not flexible
enough to specify heterogeneity of the covariance parameters. However, fitting models using the GEE
approach has been shown to give consistent estimators of the regression coefficients and their variances
under weak assumptions about the actual correlation among a subject’s observations.
PROC GENMOD can be used to fit longitudinal data models with the GEE method. The layout
of the data is the same as for PROC GLIMMIX: the number of observations is the total number of
measurements taken on all the subjects. The variance-covariance matrix is a block diagonal matrix in
which the observations within each block are assumed to be correlated, while the observations outside of
the blocks are assumed to be independent.
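The block diagonal structure can be sketched as follows (illustrative Python; the exchangeable working correlation and the dimensions are hypothetical choices, not taken from the examples above):

```python
def exchangeable_block(n_times, rho, sigma2=1.0):
    """Within-subject covariance block: every pair of repeated
    measurements on the same subject shares the correlation rho."""
    return [[sigma2 if i == j else rho * sigma2 for j in range(n_times)]
            for i in range(n_times)]

def block_diag_cov(n_subjects, n_times, rho):
    """Stack one block per subject along the diagonal; entries outside
    the blocks stay 0 (different subjects are independent)."""
    size = n_subjects * n_times
    v = [[0.0] * size for _ in range(size)]
    for s in range(n_subjects):
        block = exchangeable_block(n_times, rho)
        for i in range(n_times):
            for j in range(n_times):
                v[s * n_times + i][s * n_times + j] = block[i][j]
    return v

V = block_diag_cov(n_subjects=3, n_times=4, rho=0.4)
print(V[0][1], V[0][5])   # within-subject vs. between-subject covariance
```

Observations 0 and 1 belong to the same subject (nonzero covariance); observations 0 and 5 belong to different subjects (zero covariance).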
If the estimation of the regression coefficients is the primary objective of your study and you have a large
number of subjects, then you should not spend much time choosing a correlation structure. If the
correlation among the measurements is of prime interest and you have a small number of subjects, then
you should spend time choosing a correlation structure. For this latter case, both the model and the
correlation structure must be approximately correct to obtain valid inferences.
3.5 Solutions
Solutions to Exercises
1. Generating Empirical Logit Plots
a. Generate a line listing of the wheezing data (first 20 observations) and logit plots of age.
proc print data=long.wheeze(obs=20);
title 'Line Listing of Wheezing Data';
run;
Obs  smoker  case  age  wheeze
1    No      1     7    0
2 No 1 8 0
3 No 1 9 0
4 No 1 10 0
5 No 2 7 0
6 No 2 8 0
7 No 2 9 0
8 No 2 10 0
9 No 3 7 0
10 No 3 8 0
11 No 3 9 0
12 No 3 10 0
13 No 4 7 0
14 No 4 8 0
15 No 4 9 0
16 No 4 10 0
17 No 5 7 0
18 No 5 8 0
19 No 5 9 0
20 No 5 10 0
1) The data are in the proper order (sorted by age within case).
proc means data=long.wheeze noprint nway;
class age;
var wheeze;
output out=bins sum(wheeze)=wheeze;
run;
data bins;
set bins;
logit=log((wheeze+1)/(_freq_-wheeze+1));
run;
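The smoothed empirical logit computed by the DATA step above can be expressed as a small function (a Python sketch for illustration; the example counts are hypothetical):

```python
import math

def empirical_logit(events, trials):
    """Smoothed empirical logit, matching the DATA step above:
    log((y + 1) / (n - y + 1)). The +1 terms keep the logit finite
    when y = 0 or y = n."""
    return math.log((events + 1) / (trials - events + 1))

# Hypothetical counts: 80 wheezing children out of 537 at one age
print(round(empirical_logit(80, 537), 4))
```

Without the +1 smoothing, an age group with no events (or all events) would produce an infinite logit.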
2) The plot shows that the logits possibly have a quadratic relationship with age.
2. Fitting Generalized Linear Mixed Models
a. Fit a generalized linear mixed model to the long.wheeze data set using G-side random effects, the
method of adaptive Gauss-Hermite quadrature, and the between-within degrees of freedom
adjustment. Specify wheeze as the response variable and smoker, age, and age*age as the
predictor variables. Model the probability that wheeze is equal to 1 with the EVENT= option.
Also, request that the solution for the fixed-effects parameters be produced. Specify the
optimization technique of Newton-Raphson with ridging, and compute the odds ratio for smoker
(No as the reference value) and for a one-year decrease in age (10 as the reference value). Create
an odds ratio plot and display the statistics and use the COVTEST statement to test whether the G
matrix can be reduced to a zero matrix.
Model Information
Response Profile
Ordered Total
Value wheeze Frequency
1 0 1822
2 1 326
Dimensions
Optimization Information
Iteration History
Objective Max
Iteration Restarts Evaluations Function Change Gradient
0 0 10 1664.4797484 . 10743.16
1 0 13 1600.6608663 63.81888214 1736.633
2 0 8 1593.7671408 6.89372550 288.9059
3 0 8 1591.5112399 2.25590087 78.47274
4 0 8 1590.9975586 0.51368136 13.79153
5 0 8 1590.729367 0.26819152 0.702447
6 0 8 1590.6477122 0.08165484 1.182501
7 0 8 1590.6477055 0.00000666 0.001201
Fit Statistics
Cov Standard
Parm Subject Estimate Error
Standard
Effect smoker Estimate Error DF t Value Pr > |t|
95% Confidence
smoker age _smoker _age Estimate DF Limits
Num Den
Effect DF DF F Value Pr > F
1) The odds ratio for age compares age 9 to age 10 (the odds of the event in age 9 in the
numerator and the odds of the event in age 10 in the denominator) taking into account the
polynomial term. The estimate of 1.634 means that a one-year decrease in age going from age
10 to age 9 results in a 63% increase ((1.634-1)*100) in the odds of wheezing. Since the
polynomial term is in the model, a two-year decrease in age would result in a different
odds ratio.
2) The results in the Tests of Covariance Parameters table indicate that the “no random effects
model” is rejected. Therefore, the model with random effects fits the data better than the
model without random effects.
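The age-dependence of the odds ratio in answer 1 can be illustrated with made-up coefficients (a Python sketch; b_age and b_age2 are hypothetical values, not the fitted estimates behind 1.634):

```python
import math

def odds_ratio(age_new, age_ref, b1, b2):
    """OR for moving from age_ref to age_new when the linear predictor
    contains b1*age + b2*age**2:
    exp(b1*(age_new - age_ref) + b2*(age_new**2 - age_ref**2))."""
    return math.exp(b1 * (age_new - age_ref)
                    + b2 * (age_new ** 2 - age_ref ** 2))

b_age, b_age2 = -2.0, 0.09   # hypothetical coefficients
print(round(odds_ratio(9, 10, b_age, b_age2), 3))   # one-year decrease from 10
print(round(odds_ratio(8, 9, b_age, b_age2), 3))    # same step size, different OR
```

Because of the quadratic term, the two one-year decreases give different odds ratios, which is why the interpretation must be anchored at a specific reference age (here, age 10).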
3. Fitting Generalized Linear Mixed Models with Splines
a. Fit a generalized linear mixed model to the long.wheeze data set but create a spline for age.
Specify a truncated power function basis for the spline expansion and use the NOPOWERS
option to exclude the intercept and polynomial columns. Use the knot method of list and list the
knot values as 8 and 9, specify a degree of spline expansion of 3, and request a table that shows
the knot locations and the knots associated with each spline basis function. Use R-side random
effects with an unstructured covariance structure and use an optimization technique of Newton-
Raphson with ridging.
proc glimmix data=long.wheeze noclprint=5;
class case smoker;
effect spl=spline(age / details basis=tpf(nopowers)
knotmethod=list(8 9) degree=3);
model wheeze(event='1') = smoker spl / solution dist=binary;
random _residual_ / type=un subject=case;
nloptions tech=nrridg;
title 'Generalized Linear Mixed Model of Wheezing among Children';
run;
Model Information
Knot
Number age
1 8.00000
2 9.00000
1 3 8.00000
2 3 9.00000
Response Profile
Ordered Total
Value wheeze Frequency
1 0 1822
2 1 326
Dimensions
Optimization Information
Iteration History
Objective Max
Iteration Restarts Subiterations Function Change Gradient
Fit Statistics
Cov Standard
Parm Subject Estimate Error
Standard
Effect spl smoker Estimate Error DF t Value Pr > |t|
Num Den
Effect DF DF F Value Pr > F
1) The two spline coefficients for age correspond to the truncated power basis functions. The first
spline coefficient for age corresponds to the first truncated power basis function. For the first
knot, if age is 8 or less, then the truncated power basis function is 0. If age is greater than 8, then
the truncated power basis function is (age − 8)³. For the second truncated power basis function,
if age is greater than 9, then the truncated power basis function is (age − 9)³.
2) Because the linearization method was used, the AIC and BIC statistics are not produced
because there is no true likelihood.
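The truncated power basis functions described in answer 1 can be sketched directly (illustrative Python mirroring the TPF expansion with knots at 8 and 9):

```python
def tpf_basis(age, knot, degree=3):
    """Truncated power function for one knot: 0 when age <= knot,
    (age - knot)**degree otherwise."""
    return 0.0 if age <= knot else float(age - knot) ** degree

# Basis values at the two knots (8 and 9) used in the exercise
for age in (7, 8, 9, 10):
    print(age, tpf_basis(age, 8), tpf_basis(age, 9))
```

Each basis function is zero up to its knot and then grows as a cubic, which is what lets the spline change shape at ages 8 and 9.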
Model Information
case 537 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37
38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87
...
smoker 2 No Yes
Response Profile
Ordered Total
Value wheeze Frequency
1 1 326
2 0 1822
Parameter Information
Prm1 Intercept
Prm2 smoker No
Prm3 smoker Yes
Prm4 age
Prm5 age*age
Algorithm converged.
Algorithm converged.
QIC 1828.2487
QICu 1825.1870
Chi-
Source DF Square Pr > ChiSq
1) The parameter age*age is significant at the 0.05 level for the empirical standard errors and
the model-based standard errors, but it is not significant for the score statistics. The reason for
this discrepancy is that the score statistics are more conservative than the Z statistics.
2) The p-value and standard error for the GEE parameter estimate for smoker went up when
compared to the initial parameter estimate because it is a time-independent variable while the
p-value and standard error for age went down because it is a time-dependent variable.
The odds ratio for age in the exercise was 1.634. How can this be
interpreted?
a. A one-year decrease from age 10 results in a 63% increase in the odds of
wheezing.
b. A one-year decrease from any age results in a 63% increase in the odds
of wheezing.
c. A one-year increase from any age results in a 63% increase in the odds of
wheezing.
d. A one-year increase from age 10 results in a 63% increase in the odds of
wheezing.
Which one of the following statements is true for proportional odds models?
a. The model fits separate intercepts.
b. The model fits separate slopes.
c. The cumulative logits compare each category to the last category.
d. The coding of the ordinal outcome affects the odds ratios.
Why are AIC and BIC model fit statistics not produced in the exercise
problem?
a. The use of the RANDOM statement always suppresses the AIC and BIC
statistics.
b. Because the response variable has a binomial distribution, the AIC and
BIC statistics are always suppressed.
c. Because the linearization method was used, the AIC and BIC statistics are
not produced because there is no true likelihood.
d. PROC GLIMMIX does not support AIC and BIC model fit statistics.
The model has time and the quadratic effect of time as predictor variables. To
estimate the odds ratio for a 3-unit increase in time (time 0 to 3), the
coefficients for the ESTIMATE statement would be which of the following?
a. time 3
b. time 3 time*time 9
c. time 3 time*time 3
d. time -3 time*time -9
Odds(time=3) = e^(β0 + β1·3 + β2·3²)
Odds(time=0) = e^(β0 + β1·0 + β2·0²) = e^(β0)
Odds(time=3) / Odds(time=0) = e^(β1·3 + β2·9)
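The cancellation of the intercept in this ratio can be checked numerically (a Python sketch; the coefficients are hypothetical):

```python
import math

b0, b1, b2 = -1.2, 0.5, -0.04   # hypothetical coefficients

odds_t3 = math.exp(b0 + b1 * 3 + b2 * 9)   # time = 3 (time*time = 9)
odds_t0 = math.exp(b0)                     # time = 0
ratio = odds_t3 / odds_t0

# The intercept cancels, leaving exp(b1*3 + b2*9) -- hence the
# ESTIMATE statement coefficients: time 3 time*time 9
print(abs(ratio - math.exp(b1 * 3 + b2 * 9)) < 1e-9)
```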
When you have a large sample size, what condition is not necessary for the
GEE model to have consistent parameter estimates and standard errors?
a. Missing values are MCAR
b. Correct specification of the model for the mean
c. Correct specification of the variance function
d. Correct specification of the correlation structure
Appendix A References
A.1 References
1. Agresti, A. (1996), An Introduction to Categorical Data Analysis, New York: John Wiley & Sons.
2. Akaike, H. (1974), “A New Look at the Statistical Model Identification,” IEEE Transactions
on Automatic Control, 19:716–723.
3. Albert, P.S. and McShane, L.M. (1995), “A generalized estimating equations approach for spatially
correlated binary data: Applications to the analysis of neuroimaging data,” Biometrics, 51: 627–638.
4. Allison, P. (1999), Logistic Regression Using the SAS System: Theory and Application, Cary, NC:
SAS Institute Inc.
5. Anderson, J.A. and Philips, P.R. (1981), “Regression, discrimination, and measurement models for
ordered categorical variables,” Applied Statistics, 30:22–31.
6. Barnhart, H.X. and Williamson, J.M. (1998), “Goodness-of-fit tests for GEE modeling with binary
data,” Biometrics, 54:720–729.
7. Breslow, N.E. and Clayton, D.G. (1993), “Approximate Inference in Generalized Linear Mixed
Models,” Journal of the American Statistical Association, 88:9–25.
8. Brown, H. and Prescott, R. (2001), Applied Mixed Models in Medicine, New York: John Wiley
& Sons Ltd.
9. Chi, E.M. and Reinsel, G.C. (1989), “Models for longitudinal data with random effects and AR(1)
errors,” Journal of the American Statistical Association, 84:452–459.
10. Cook, R.D. (1977), “Detection of influential observations in linear regression,” Technometrics,
19:15–28.
11. Cook, R.D. (1979), “Influential observations in linear regression,” Journal of the American
Statistical Association, 74:169–174.
12. Cook, R.D. (1986), Discussion of paper by S. Chatterjee and A.S. Hadi, Statistical Science,
1:393–397.
13. Davis, C.S. (2002), Statistical Methods for the Analysis of Repeated Measurements, New York:
Springer-Verlag.
14. Diggle, P.J. (1988), “An Approach to the Analysis of Repeated Measurements,” Biometrics, 44: 959–
971.
15. Diggle, P.J. (1990), Time Series: a Biostatistical Introduction, Oxford: Oxford University Press.
16. Diggle, P.J., Heagerty, P., Liang, K., and Zeger, S.L. (2002), Analysis of Longitudinal Data, 2nd
Edition, New York: Oxford University Press.
17. Diggle, P.J. and Kenward, M.G. (1994), “Informative dropout in longitudinal data analysis
(with discussion),” Applied Statistics, 43:49–73.
18. Duffy, T.J. and Santner, D.E. (1989), The Statistical Analysis of Discrete Data, New York: Springer-
Verlag.
19. Dunlop, D.D. (1994), “Regression for Longitudinal Data: A Bridge from Least Squares Regression,”
The American Statistician, 48:299–303.
20. Evans, S.R. (1998), Goodness of Fit in Two Models for Clustered Binary Data. Ph.D. dissertation,
University of Massachusetts.
21. Evans, S.R. and Hosmer, D.W. (2004), “Goodness of Fit Tests for Logistic GEE Models: Simulation
Results,” Communication in Statistics, 33:247–258.
22. Guerin, L. and Stroup, W.W. (2000), “A Simulation Study to Evaluate PROC MIXED Analysis
of Repeated Measures Data,” Proceedings of the Twelfth Annual Kansas State University
Conference on Applied Statistics in Agriculture, April 30 – May 2, 2000.
23. Hastie, T.J., Botha, J.L., and Schnitzler, C.M. (1989), “Regression with an ordered categorical
response,” Statistics in Medicine, 8: 785–794.
24. Heagerty, P.J. and Zeger, S.L. (1998), “Lorelogram: a regression approach to exploring dependence in
longitudinal categorical responses,” Journal of the American Statistical Association, 93:150–162.
25. Horton, N.J., Bebchuk, J.D., Jones, C.L., et al. (1999), “Goodness-of-fit for GEE: an example with
mental health service utilization,” Statistics in Medicine, 18:213–222.
26. Hosmer, D.W. and Lemeshow, S. (2000), Applied Logistic Regression, 2nd edition, New York: John
Wiley & Sons.
27. Hughes, J.P. (1999), “Mixed effects models with censored data with application to HIV RNA levels,”
Biometrics, 55:625–629.
28. Kaslow, R.A., et al. (1987), “The Multicenter AIDS Cohort Study: Rationale, Organization, and
Selected Characteristics of the Participants,” American Journal of Epidemiology, 126:310–318.
29. Kenward, M.G. and Roger, J.H. (1997), “Small Sample Inference for Fixed Effects from Restricted
Maximum Likelihood,” Biometrics, 53:983–997.
30. Kenward, M.G. and Roger, J.H. (2009), “An Improved Approximation to the Precision of Fixed
Effects from Restricted Maximum Likelihood,” Computational Statistics and Data Analysis, 53,
2583–2595.
31. Laird, N.M. (1988), “Missing data in longitudinal studies,” Statistics in Medicine, 7:305–315.
32. Liang, K.Y. and Zeger, S.L. (1986), “Longitudinal Data Analysis using Generalized Linear Models,”
Biometrika, 73:13–22.
33. Lin, D.Y., Wei, L.J., and Ying, Z. (2002), “Model-Checking Techniques Based on Cumulative
Residuals,” Biometrics, 58:1–12.
34. Lipsitz, S., Laird, N., and Harrington, D. (1991), “Generalized estimating equations for correlated
binary data: using odds ratios as a measure of association,” Biometrika, 78:153–160.
35. Littell, R.C., Henry, P.R., and Ammerman, C.B. (1998), “Statistical Analysis of Repeated Measures
Data Using SAS Procedures,” Journal of Animal Science, 76:1216–1231.
36. Littell, R.C., Milliken, G.A., Stroup, W.W., and Wolfinger, R.D. (1996), SAS® System for Mixed
Models, Cary, NC: SAS Institute Inc.
37. Littell, R.C., Stroup, W.W., and Freund, R.J. (2002), SAS® for Linear Models, Fourth Edition, Cary,
NC: SAS Institute Inc.
38. Little, R.J.A. and Rubin, D.B. (1987), Statistical Analysis with Missing Data, New York:
John Wiley & Sons.
39. Little, R.J.A. (1995), “Modeling the Drop-Out Mechanism in Repeated Measures Studies,” Journal of
the American Statistical Association, 90:1112–1121.
40. McCullagh, P. and Nelder, J.A. (1989), Generalized Linear Models, 2nd Edition, London: Chapman
and Hall.
41. Molenberghs, G., and Verbeke, G. (2005), Models for Discrete Longitudinal Data, New York:
Springer Science+Business Media, Inc.
42. Paik, M.C. (1988), “Repeated measurement analysis for nonnormal data in small samples,”
Communications in Statistics – Simulation and Computation, 17: 1155–1171.
43. Pan, W. (2001), “Akaike’s information criterion in generalized estimating equations,” Biometrics,
57:120–125.
44. Park, T. (1993), “A comparison of the generalized estimating equation approach with
the maximum likelihood approach for repeated measurements,” Statistics in Medicine, 12: 1723–
1732.
45. Park, T. and Davis, C.S. (1993), “A Test of the Missing Data Mechanism for Repeated Categorical
Data,” Biometrics, 49: 631–638.
46. Pepe, M.S. and Anderson, G.A. (1994), “A cautionary note on inference for marginal regression
models with longitudinal data and general correlated response data,” Communication in Statistics –
Simulation, 23:939–951.
47. Pregibon, D. (1981), “Logistic Regression Diagnostics,” The Annals of Statistics, 9:705–724.
48. Preisser, J.S. and Qaqish, B.F. (1996), “Deletion diagnostics for generalized estimating equations,”
Biometrika, 83: 551–562.
49. Prentice, R. L. (1988), “Correlated binary regression with covariates specific to each binary
observation,” Biometrics, 44: 1033–1048.
50. Ruppert, D., Wand, M.P., and Carroll, R.J., (2003), Semiparametric Regression, Cambridge:
Cambridge University Press.
51. SAS Institute Inc. (1995), Logistic Regression Examples Using the SAS® System, Version 6, First
Edition, Cary, N.C.: SAS Institute Inc.
52. SAS Institute Inc. (2005), The GLIMMIX Procedure, Cary, N.C.: SAS Institute Inc.
53. Schabenberger, O. (2004), “Mixed Model Influence Diagnostics,” SUGI 29 Proceedings.
54. Scharfstein, D.O., Rotnitzky, A. and Robins, J.M. (1999), “Adjusting for nonignorable dropout using
semiparametric non-response models (with Discussion),” Journal of the American Statistical
Association, 94:1096–1120.
55. Schwarz, G. (1978), “Estimating the Dimension of a Model,” Annals of Statistics, 6:461–464.
56. Shock, N.W., et al. (1984), Normal Human Aging: The Baltimore Longitudinal Study of Aging,
National Institutes of Health Publication 84–2450. Washington, DC: National Institutes of Health.
57. Sommer, A., Katz, J., and Tarwotjo, I. (1984), “Increased risk of respiratory infection
and diarrhea in children with pre-existing mild vitamin A deficiency,” American Journal
of Clinical Nutrition, 40:1090–1095.
58. Stokes, M.E., Davis, C.S., and Koch, G.G. (2000), Categorical Data Analysis Using
the SAS System, Second Edition, Cary, NC: SAS Institute Inc.
59. Swallow, W.H. and Monahan, J.F. (1984), “Monte Carlo Comparison of ANOVA, MIVQUE, REML,
and ML Estimators of Variance Components,” Technometrics, 28:47–57.
60. Tufte, E.R. (1990), Envisioning Information, Cheshire, CT: Graphics Press.
61. Verbeke, G. and Lesaffre, E. (1997), “The effect of misspecifying the random effects distribution in
linear mixed models for longitudinal data,” Computational Statistics and Data Analysis, 23:541–556.
62. Verbeke, G. and Molenberghs, G. (1997), Linear Mixed Models in Practice: A SAS-Oriented
Approach, New York: Springer-Verlag, Inc.
63. Verbeke, G. and Molenberghs, G. (2000), Linear Mixed Models for Longitudinal Data,
New York: Springer-Verlag, Inc.
64. Waring, G.O., Lynn, M.J., and McDonnell, P.J. (1994), “Results of the Prospective Evaluation of
Radial Keratotomy (PERK) Study 10 years after Surgery,” Arch Ophthalmol, 112: 1298–1308.
65. Zeger, S.L. and Liang, K. (1986), “Longitudinal Data Analysis for Discrete and Continuous
Outcomes,” Biometrics, 42:121–130.
66. Zeger, S.L. and Liang, K. (1992), “An Overview of Methods for the Analysis of Longitudinal Data,”
Statistics In Medicine, 11:1825–1839.
Appendix B Additional Resources
B.1 Programs
1. The VARIOGRAM macro creates the data set varioplot that contains the data values to construct
a sample variogram. The data must be sorted by the subject’s identification number.
%macro variogram(data=,resvar=,clsvar=,expvars=,id=,time=,maxtime=);
%mend variogram;
The DATA step does a one-to-one merge of the two data sets. This is appropriate because the data sets are
in identical order.
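The within-subject computation can be sketched outside SAS. The following Python function (illustrative only; the names are invented for this sketch) computes the sample variogram ordinates from residuals grouped by subject: each pair of residuals within a subject contributes an ordinate v = 0.5(r_j − r_k)² at time lag u = |t_j − t_k|.

```python
def variogram_ordinates(subjects):
    """Sample variogram ordinates, as in the VARIOGRAM macro's output.

    subjects maps a subject id to a list of (time, residual) pairs.
    Returns a list of (lag, ordinate) pairs, where
    ordinate = 0.5 * (r_j - r_k)**2 at lag |t_j - t_k|,
    taken over all within-subject pairs only.
    """
    ordinates = []
    for obs in subjects.values():
        for j in range(len(obs)):
            for k in range(j + 1, len(obs)):
                (tj, rj), (tk, rk) = obs[j], obs[k]
                ordinates.append((abs(tj - tk), 0.5 * (rj - rk) ** 2))
    return ordinates
```

Plotting these ordinates (or their smoothed means) against the lag gives the sample variogram.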
2. The VARIANCE macro computes the variogram-based estimate of the process variance. The data
must be sorted by the subject’s identification number.
%macro variance(data=,id=,resvar=,clsvar=,expvars=,
                subjects=,maxtime=);
The DATA step calculates the variance by comparing each subject’s time points to all other time points
for all other subjects. Three arrays are created. The time array contains the time points from the
varsubject data set. The timepts array contains the time points for all the subjects from the varsubject
data set. The diff array contains differences between one time point for one subject with all time points
for all the other subjects.
The first DO loop reads the time points into the timepts array.
After reading in the data set varsubject, populate the array containing differences between one time point
for one subject with all the time points for all the other subjects. In the DO group, the first DO loop is
the loop for one subject. The second DO loop is the loop for each time point for one subject. The third
DO loop cycles through the other subjects. The fourth DO loop cycles through all the time points for each
subject.
The last two DO loops clear out the difference values to avoid carrying over any residual values.
data average_variance(keep=average total nonmissing);
   array diff{%eval(&maxtime*&subjects)};
   set variance1 end=lastone;
   nonmissing+n(of diff1-diff%eval(&maxtime*&subjects));
   total+sum(of diff1-diff%eval(&maxtime*&subjects));
   if lastone=1 then
      do;
         average=total/nonmissing;
         output;
      end;
run;
The DATA step calculates the average of all differences. It accumulates the number of nonmissing
differences and the total of nonmissing differences. After reading all the difference values in the data set,
it calculates the average difference and writes the average to a data set.
proc print data=average_variance;
run;
%mend variance;
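The cross-subject computation can likewise be sketched. The Python function below (again illustrative, not part of the course code) averages the half squared differences between residuals from different subjects, which is the variogram-based estimate of the process variance:

```python
def process_variance(subjects):
    """Variogram-based process variance estimate, as in the VARIANCE macro.

    subjects maps a subject id to a list of (time, residual) pairs.
    Averages 0.5 * (r_a - r_b)**2 over all residual pairs taken from
    DIFFERENT subjects; within-subject pairs belong to the variogram
    itself and are excluded here.
    """
    ids = list(subjects)
    total = count = 0
    for a in range(len(ids)):
        for b in range(len(ids)):
            if a == b:
                continue  # skip within-subject comparisons
            for _, ra in subjects[ids[a]]:
                for _, rb in subjects[ids[b]]:
                    total += 0.5 * (ra - rb) ** 2
                    count += 1
    return total / count
```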
3. This program computes a likelihood ratio test that compares a random coefficients model with
the cubic effect of time with a random coefficients model with the quadratic effect of time for
the aids data set.
ods select none;
If one of the variance components is 0, then the NOBOUND option should be used when
computing the likelihood ratio test.
ods select all;
likelihood ratio test statistic            degrees of
comparing cubic model to quadratic model   freedom      p-value
23.9263                                    5            0.000224306
The test statistic is clearly significant. Therefore, the cubic effect for time should remain in the
RANDOM statement.
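The reported p-value can be cross-checked outside SAS. The Python sketch below (standard library only) implements the chi-square survival function through the usual recurrence on the regularized upper incomplete gamma function, reproducing 1-probchi(23.9263, 5):

```python
import math

def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with df degrees of freedom.

    Uses closed forms for df = 1 (erfc) and df = 2 (exp) and the
    standard recurrence Q(df+2) = Q(df) + y**(df/2) * exp(-y) / gamma(df/2 + 1)
    with y = x/2 to reach any integer df.
    """
    y = x / 2.0
    if df % 2 == 0:
        q, k = math.exp(-y), 2             # closed form for df = 2
    else:
        q, k = math.erfc(math.sqrt(y)), 1  # closed form for df = 1
    while k < df:
        q += y ** (k / 2.0) * math.exp(-y) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q
```

For the test above, chi2_sf(23.9263, 5) agrees with the tabled p-value to the printed precision.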
4. This program computes a likelihood ratio test for the fixed effects comparing the full model with a
model without the three nonsignificant interactions for the aids data set. The program computes
the degrees of freedom correctly only when there are no classification variables in the MODEL statement.
ods select none;
If a fixed effect is in the CLASS statement, then the degrees of freedom must be calculated
differently.
data _null_;
   set fulldim;
   if descr='Columns in X';
   call symput('dffull',value);
run;

data _null_;
   set subsetdim;
   if descr='Columns in X';
   call symput('dfsubset',value);
run;
data _null_;
   set fullfit;
   if descr='-2 Log Likelihood';
   call symput('fulllr',value);
run;
data _null_;
   set subsetfit;
   if descr='-2 Log Likelihood';
   call symput('subsetlr',value);
run;
The four DATA _NULL_ steps create four macro variables: the two fixed-effects parameter counts
(the columns in X) and the two –2 log likelihood values.
data results;
   testfull=&subsetlr-&fulllr;
   dffull=&dffull-&dfsubset;
   pvaluefull=1-probchi(testfull,dffull);
run;
The testfull variable contains the likelihood ratio test statistic, and the dffull variable contains its
degrees of freedom.
ods select all;
      likelihood ratio test statistic        degrees of
Obs   comparing full model to subset model   freedom      p-value
  1   2.37334                                3            0.49862
The likelihood ratio test is clearly not significant. Therefore, the three interaction terms should be
eliminated from the model.
B.2 Model Diagnostics for GEE Regression Models
Objectives
• PROC GENMOD now computes GEE diagnostics, which account for the
leverage and residuals in a set of observations to determine their influence
on regression parameter estimates and fitted values.
• PROC GENMOD also computes observation-deletion diagnostics and
cluster-deletion diagnostics.
• The diagnostics are generalizations of Cook’s D, dfbeta, and leverage for
general linear models and generalized linear models.
• There are no published cutoffs for the GEE diagnostic statistics.
Preisser and Qaqish (1996) proposed case-deletion regression diagnostics for the GEE methodology. The
diagnostics are an approximation to the difference in the estimated regression coefficients that one would
obtain upon deleting either one observation or one cluster. They proposed the computationally feasible
one-step approximation diagnostics, which are similar to the ones proposed by Pregibon (1981) for
generalized linear models. The authors also showed that the one-step diagnostic statistics were very good
approximations to their exact diagnostic counterparts and their computations were very fast. They
recommend that these diagnostic statistics be routinely used in data analysis for GEE models. However,
there are no published cutoffs for these diagnostic statistics.
Leverage
The leverage is simply the diagonal elements in the hat matrix, which corresponds to the amount
of influence the observation has on the fitted values. PROC GENMOD produces leverage values for both
observations and clusters.
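For intuition about what the leverage measures, the ordinary least squares case has a closed form. The Python sketch below (illustrative only; the GEE leverage additionally involves the working covariance weights) computes the hat-matrix diagonal for a simple linear regression with an intercept:

```python
def leverage_simple(x):
    """Hat-matrix diagonal for simple linear regression with intercept:
    h_i = 1/n + (x_i - xbar)**2 / Sxx.

    The h_i always sum to the number of model parameters (here 2), so
    points far from xbar carry the most leverage on the fitted values.
    """
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]
```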
Cook’s D Statistic
Cook’s D statistic (Cook 1977, 1979) measures the influence of observations on the estimated values
of the linear predictor. These diagnostic statistics measure the influence of a deleted observation or cluster
on the overall fit of the model. PROC GENMOD also produces these statistics for both observations and
clusters.
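The OLS analogue again gives useful intuition. The Python sketch below (illustrative only, not the GEE computation) evaluates Cook's D for a simple linear regression through the closed form D_i = e_i² h_i / (p s² (1 − h_i)²), which combines the residual and the leverage of each point:

```python
def cooks_d_simple(x, y):
    """Cook's D for each point of a simple linear regression.

    Uses the closed form D_i = e_i**2 * h_i / (p * s2 * (1 - h_i)**2),
    with p = 2 parameters, h_i the leverage, e_i the residual, and
    s2 the mean squared error.
    """
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx                  # slope
    a = ybar - b * xbar            # intercept
    e = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    h = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]
    s2 = sum(ei ** 2 for ei in e) / (n - 2)
    return [ei ** 2 * hi / (2 * s2 * (1 - hi) ** 2) for ei, hi in zip(e, h)]
```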
Another diagnostic statistic proposed by Preisser and Qaqish (1996) examines the lack of fit of a cluster.
The difference between this statistic and Cook’s D is that the cluster is deleted from both the numerator
and denominator. Cook’s D is useful in that the comparison of distances between clusters is meaningful
because they refer to the same metric. However, because the deleted cluster influences the estimate
of the variance, its inclusion might decrease the magnitude of the diagnostic and might hide influence.
Cook (1986) points out that the studentized diagnostic (Preisser lack-of-fit statistic) has a different
interpretation than the standardized version (Cook’s D). The studentized diagnostic has the interpretation
of the influence of the cluster on the parameter estimates and the variance estimate of the parameter
estimates simultaneously. Therefore, Preisser and Qaqish (1996) recommend that the question “influence
on what?” should be the determining factor in your choice of statistics.
Dfbetas
PROC GENMOD computes four dfbeta statistics. The statistic dfbetac is the effect of deleting a cluster,
dfbetacs is the standardized version of the cluster deletion statistic, dfbetao is the effect of deleting an
observation, and dfbetaos is the standardized version of the observation deletion statistic.
Example: Fit a GEE model on the keratotomy data, specifying the unstructured correlation structure
and reference cell coding for gender with female as the reference cell. Produce ODS
Statistical Graphics for the GEE diagnostics by requesting the cluster leverage plot, the
cluster Cook’s D plot, the cluster lack-of-fit plot, and the standardized cluster dfbeta
plot, labeling the observations by case number. Also create a SAS data set with the
diagnostic statistics for both the observations and the clusters.
/* long0bd01.sas */
proc genmod data=sasuser.keratotomy desc
            plots(clusterlabel)=(cleverage dcls mcls dfbetacs);
   class patientid gender (param=ref ref='Female');
   model unstable = age diameter gender visit / dist=bin;
   repeated subject=patientid / type=unstr;
   output out=diagnostics CLEVERAGE=cleverage CLUSTER=cluster
          CLUSTERCOOKSD=clustercooksd DFBETAC=_all_ DFBETACS=_all_
          CLUSTERDFIT=clusterdfit COOKSD=cooksd LEVERAGE=leverage;
   title 'GEE Model of Radial Keratotomy Surgery';
run;
Selected PROC GENMOD statement options:
PLOTS= specifies plots to be created using ODS Graphics. Many of the observational
statistics in the output data set can be plotted using this option. You are not
required to create an output data set in order to produce a plot.
Selected global plot option:
CLUSTERLABEL displays formatted levels of the SUBJECT= effect instead of plot symbols. This
option applies only to diagnostic statistics for models fit by GEEs that are plotted
against cluster number, and provides a way to identify cluster level names with
corresponding ordered cluster numbers.
Selected plot options:
CLEVERAGE plots the cluster leverage as a function of ordered cluster.
DCLS plots the cluster Cook’s distance statistic as a function of ordered cluster.
MCLS plots the studentized cluster Cook’s distance statistic as a function of ordered
cluster.
DFBETACS plots the standardized cluster deletion statistic as a function of ordered cluster for
each regression parameter in the model.
Selected OUTPUT statement keywords:
CLEVERAGE represents the leverage of a cluster.
CLUSTER represents the numerical cluster index, in order of sorted clusters.
CLUSTERCOOKSD represents the Cook distance-type statistic to measure the influence of deleting an
entire cluster on the overall model fit.
DFBETAC= represents the effect of deleting an entire cluster on parameter estimates. If you
specify the keyword _all_ after the equal sign, variables named
DFBETAC_ParameterName will be included in the output data set to contain the
values of the diagnostic statistic to measure the influence of deleting the cluster
on the individual parameter estimates. ParameterName is the name of the
regression model parameter formed from the input variable names concatenated
with the appropriate levels, if classification variables are involved.
DFBETACS= represents the effect of deleting an entire cluster on normalized parameter
estimates. If you specify the keyword _all_ after the equal sign, variables named
DFBETACS_ParameterName will be included in the output data set to contain the
values of the diagnostic statistic to measure the influence of deleting the cluster
on the individual parameter estimates, normalized by their standard errors.
CLUSTERDFIT represents the studentized Cook distance-type statistic to measure the influence
of deleting an entire cluster on the overall model fit.
COOKSD represents the Cook distance-type statistic to measure the influence of deleting a
single observation on the overall model fit.
LEVERAGE represents the leverage of a single observation.
Several clusters appear to have high GEE diagnostic statistics. You should examine influential clusters
and determine whether they are erroneous. If these clusters are legitimate, then they might represent
important new findings. They also might indicate that your current model is inadequate.
The observation labels are difficult to see because so many observations fall in the same space in the
diagnostic plot.
Example: Print the clusters that exceed the 99th percentile for the distribution of Cook’s D for the cluster
and the cluster lack of fit statistic.
proc rank data=diagnostics groups=100 out=percentiles;
   var clustercooksd clusterdfit;
   ranks percentilecooksd percentiledfit;
run;
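PROC RANK with GROUPS=100 bins each value into percentile groups 0 through 99, so the clusters of interest fall in group 99. The Python sketch below mimics that grouping; the formula floor(rank × groups / (n + 1)) with 1-based ranks reflects my reading of the GROUPS= documentation (an assumption worth verifying there), and the sketch ignores ties:

```python
def percentile_groups(values, groups=100):
    """Assign each value to a group 0..groups-1 in the style of PROC RANK's
    GROUPS= option: group = floor(rank * groups / (n + 1)), rank 1-based.

    Assumed formula -- check the PROC RANK documentation; ties are not
    handled here, whereas PROC RANK averages tied ranks by default.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0] * n
    for pos, i in enumerate(order, start=1):
        out[i] = pos * groups // (n + 1)
    return out
```

Clusters whose percentilecooksd or percentiledfit equals 99 are the ones printed next.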
The PROC PRINT output displays the columns patientid, unstable, gender, age, diameter, visit,
cooksd, clustercooksd, leverage, cleverage, and clusterdfit for the flagged clusters.
The output shows which clusters had a high cluster Cook’s D or a high cluster lack of fit statistic.