
Longitudinal Data Analysis with Discrete and Continuous Responses

Course Notes
Longitudinal Data Analysis with Discrete and Continuous Responses Course Notes was developed by
Mike Patetta. Additional contributions were made by Chris Daman and Jill Tao. Editing and production
support was provided by the Curriculum Development and Support Department.

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks
of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and
product names are trademarks of their respective companies.

Longitudinal Data Analysis with Discrete and Continuous Responses Course Notes

Copyright © 2017 SAS Institute Inc. Cary, NC, USA. All rights reserved. Printed in the United States
of America. No part of this publication may be reproduced, stored in a retrieval system, or transmitted,
in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior
written permission of the publisher, SAS Institute Inc.

Book code E70979, course code LWLONG42/LONG42, prepared date 21Mar2017. LWLONG42_001

ISBN 978-1-63526-093-9
For Your Information

Table of Contents

Chapter 1 Introduction to Longitudinal Data Analysis .................................. 1-1

1.1 Longitudinal Data Analysis Concepts .................................................................. 1-3

1.2 Exploratory Data Analysis ................................................................................1-13

Demonstration: Individual and Group Profiles ...............................................1-19

Demonstration: Cross-Sectional and Longitudinal Relationships.......................1-32

Exercises..................................................................................................1-45

1.3 Chapter Summary ...........................................................................................1-46

1.4 Solutions ........................................................................................................1-48

Chapter 2 Longitudinal Data Analysis with Continuous Responses ........... 2-1

2.1 General Linear Mixed Model.............................................................................. 2-3

Demonstration: Fitting a Longitudinal Model in PROC MIXED .......................2-25

Exercises..................................................................................................2-34

2.2 Evaluating Covariance Structures ......................................................................2-36

Demonstration: Sample Variogram ...............................................................2-47

Demonstration: Information Criteria.............................................................2-52

Exercises..................................................................................................2-57

2.3 Model Development and Interpretation...............................................................2-58

Demonstration: Heterogeneity in the Covariance Parameters ...........................2-61

Demonstration: Evaluating Fixed Effects ......................................................2-69

Demonstration: Illustrating Interactions ........................................................2-78

Exercises..................................................................................................2-84

2.4 Random Coefficient Models .............................................................................2-85

Demonstration: Random Coefficient Models .................................................2-94



Demonstration: Computing EBLUPs .......................................................... 2-108

Demonstration: Models with Random Effects and Serial Correlation .............. 2-115

Exercises................................................................................................ 2-122

2.5 Model Assessment......................................................................................... 2-123

Demonstration: Model Assessment ............................................................ 2-133

Exercises................................................................................................ 2-148

2.6 Chapter Summary ......................................................................................... 2-149

2.7 Solutions ...................................................................................................... 2-152

Chapter 3 Longitudinal Data Analysis with Discrete Responses ................ 3-1

3.1 Generalized Linear Mixed Models ....................................................................... 3-3

Demonstration: Exploratory Data Analysis Using Logit Plots ...........................3-31

Demonstration: Fitting Models with Binary Responses in PROC GLIMMIX .....3-36

Demonstration: Using the Sandwich Estimator in PROC GLIMMIX ................3-52

Exercises..................................................................................................3-57

3.2 Applications Using the GLIMMIX Procedure .....................................................3-58

Demonstration: Fitting Generalized Linear Mixed Models with an Ordinal


Response ...........................................................................3-64

Demonstration: Fitting Generalized Linear Mixed Models with Splines .............3-76

Exercises..................................................................................................3-81

3.3 GEE Regression Models...................................................................................3-82

Demonstration: Longitudinal Models Using GEE.......................................... 3-108

Exercises................................................................................................ 3-115

3.4 Chapter Summary ......................................................................................... 3-116

3.5 Solutions ...................................................................................................... 3-119



Appendix A References ......................................................................................A-1

A.1 References ...................................................................................................... A-3

Appendix B Additional Resources .................................................................... B-1

B.1 Programs ........................................................................................................ B-3

B.2 Model Diagnostics for GEE Regression Models ................................................. B-10

Demonstration: GEE Diagnostic Plots ......................................................... B-14



To learn more…

For information about other courses in the curriculum, contact the SAS
Education Division at 1-800-333-7660, or send e-mail to [email protected].
You can also find this information on the web at
http://support.sas.com/training/ as well as in the Training Course Catalog.

For a list of other SAS books that relate to the topics covered in these
course notes, USA customers can contact the SAS Publishing Department
at 1-800-727-3228 or send e-mail to [email protected]. Customers outside
the USA, please contact your local SAS office.
Also, see the SAS Bookstore on the web at
http://support.sas.com/publishing/ for a complete list of books and a
convenient order form.
Chapter 1 Introduction to
Longitudinal Data Analysis

1.1 Longitudinal Data Analysis Concepts....................................................................... 1-3

1.2 Exploratory Data Analysis....................................................................................... 1-13


Demonstration: Individual and Group Profiles ........................................................... 1-19

Demonstration: Cross-Sectional and Longitudinal Relationships .................................. 1-32


Exercises............................................................................................................. 1-45

1.3 Chapter Summary.................................................................................................... 1-46

1.4 Solutions ................................................................................................................. 1-48


Solutions to Exercises ........................................................................................... 1-48

Solutions to Student Activities (Polls/Quizzes) ........................................................... 1-52



Copyright © 2017, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.

1.1 Longitudinal Data Analysis Concepts

Objectives

• Understand how longitudinal data differ from cross-sectional data.


• Appreciate the merits of longitudinal data analysis.
• Explain the consequences of ignoring correlated observations.
• Document the strengths of the linear mixed model.


Longitudinal Data Analysis

• The defining feature is that repeated measurements are taken on a subject


through time.
• The models can distinguish changes over time within subjects from
differences between subjects at their baseline levels and can also study
changes over time between groups.
• The models can estimate individual-level (subject-specific) regression
parameters and population-level regression parameters.



The objectives of longitudinal data analysis are to examine and compare responses over time.
The defining feature of a longitudinal data model is its ability to study changes over time within subjects
and changes over time between groups. For example, longitudinal models can estimate individual-level
(subject-specific) regression parameters and population-level regression parameters.

Longitudinal data sets differ from time series data sets: longitudinal data usually consist of a large
number of short series of time points, whereas time series data usually consist of a single, long series
of time points (Diggle, Heagerty, Liang, and Zeger 2002). For example, the monthly average
of the Dow Jones Industrial Average over several years is a time series data set, and the efficacy of a drug
treatment over time for several patients is a longitudinal data set.

Cross-Sectional Analysis

[Figure: scatter plot of blood pressure versus age, one point per subject]

To illustrate the ability of longitudinal models to study changes over time, consider cross-sectional studies
in which a single outcome is measured for each subject. In the slide above, where each point represents
one subject, blood pressure appears to be positively related to age. However, you can reach no
conclusions about blood pressure changes over time within subjects.


Longitudinal Analysis

[Figure: blood pressure versus age, with repeated measurements connected within each subject]

Now expand the cross-sectional study of baseline data to a longitudinal study with repeated measurements
over time. The baseline data still show a positive relationship between blood pressure and age. However,
now you can distinguish changes over time within subjects from differences among subjects at their
baseline or initial starting values. Cross-sectional models cannot make this distinction (Diggle, Heagerty,
Liang, and Zeger 2002).

Examples of Longitudinal Studies

• Examine the rate of change in the prostate-specific antigen level to detect


prostate cancer in the early stages of the disease.
• Examine children over time to estimate the increase in risk of respiratory
infection for those who are vitamin A deficient while controlling for other
factors.



An example of a longitudinal study is the Baltimore Longitudinal Study of Aging (Shock et al. 1984).
This is a multidisciplinary observational study in which participants return approximately every two years
for three days of biomedical and psychological examinations. One objective of the study is to look for
markers that can detect prostate cancer at an early stage. One marker with this potential is the prostate-
specific antigen (PSA), an enzyme produced by both normal and cancerous prostate cells.
Its level is related to the volume of prostate tissue. However, an elevated PSA level is not necessarily an
indicator of prostate cancer, because patients with benign prostatic hyperplasia can also have increased
PSA levels. Therefore, researchers have hypothesized that the rate of change in the PSA level might
be a more accurate method of detecting prostate cancer in the early stages of the disease. A longitudinal
model can address this hypothesis.

Another example of a longitudinal study is the Indonesian children’s health study (Sommer 1982).
In this study more than 3000 children had quarterly medical exams for up to six visits to assess whether
they suffered from respiratory or diarrheal infection and xerophthalmia. One objective of the study was
to determine whether children who had a vitamin A deficiency were at increased risk of respiratory
infection.

Problems with OLS Regression

Models that assume the observations are independent might be


inappropriate for longitudinal data because
• measurements taken on the same subject tend to be more similar than
measurements taken on different subjects
• measurements taken close in time on the same subject tend to be more
similar than measurements taken far apart in time
• the variances of longitudinal data often change with time.


Special methods of statistical analysis are needed for longitudinal data because the set of measurements
on one subject tends to be correlated, measurements on the same subject close in time tend to be more
highly correlated than measurements far apart in time, and the variances of longitudinal data often change
with time. These potential patterns of correlation and variation might combine to produce a complicated
covariance structure. This covariance structure must be taken into account to draw valid statistical
inferences. Therefore, standard regression and ANOVA models might produce invalid results because two
of the parametric assumptions (independent observations and equal variances) might not be valid.


Variance-Covariance Matrix for OLS Regression

Subject   X    Y
   1      4   10
   2      2    7
   3      6   12
   4      8   11

With four independent observations, the 4-by-4 variance-covariance matrix of Y has sigma-squared on
each diagonal entry and 0 on every off-diagonal entry.

To illustrate the differences between longitudinal data models and other types of models, consider
the variance-covariance matrix of the response variable for cross-sectional data. If you have four
observations, you would have a 4-by-4 variance-covariance matrix with the variances on the main
diagonal and the covariances (a measure of how two observations vary together) on the off-diagonals.

In linear regression with continuous responses, the assumptions are that the responses have equal
variances and are independent. Therefore, the variances along the diagonal are equal and the covariances
along the off-diagonals are 0.


Longitudinal Data

Subject X Yt=1 Yt=2 Yt=3

1 4 10 6 6

2 2 7 5 3

3 6 12 9 8

4 8 11 14 16

With longitudinal data, there are now multiple measurements taken on each subject. You not only can
examine the differences between subjects, but you can also examine the change within subjects across
time. There are still only four subjects and the response is continuous. How does this change your
variance-covariance matrix?

Variance-Covariance Matrix for Longitudinal Data

Subject  Time  X    Y
   1      1    4   10
   1      2    4    6
   1      3    4    6
   2      1    2    7
   2      2    2    5
   2      3    2    3
   3      1    6   12
   3      2    6    9
   3      3    6    8
   4      1    8   11
   4      2    8   14
   4      3    8   16

The 12-by-12 variance-covariance matrix is block-diagonal, with one 3-by-3 block per subject
(V1, V2, V3, V4) on the diagonal and zeros everywhere else.


For longitudinal data models fit in the MIXED procedure, the number of observations is not the
number of subjects but rather the number of measurements taken on all the subjects. Because there are
three repeated measurements on each subject, you now have 12 observations and a 12-by-12 variance-
covariance matrix. For a simple longitudinal model, the matrix is now a block-diagonal matrix in which
the observations within each block (the block corresponds to a subject) are assumed to be correlated and
the observations outside of the blocks are assumed to be independent. In other words, the subjects are still
assumed to be independent of each other and the measurements within each subject are assumed to
be correlated.
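In PROC MIXED, this block-diagonal structure is requested through the SUBJECT= option of the REPEATED statement: each subject defines one block. A minimal sketch (the data set and variable names here are hypothetical, and compound symmetry is just one possible block structure):

```sas
/* Hypothetical sketch: y measured at three times on each subject.     */
/* SUBJECT=subject makes the covariance matrix block-diagonal, with    */
/* one block per subject; TYPE=CS fits a compound-symmetry block in    */
/* which all within-subject pairs share a common covariance.           */
proc mixed data=example;
   class subject time;
   model y = x time;
   repeated time / subject=subject type=cs r;
run;
```

The R option prints the estimated within-subject block so that you can inspect the fitted variances and covariances.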

Effect on Time-Independent Predictor Variables

[Figure: sampling distributions of a time-independent parameter estimate; the distribution that ignores
positive correlation is narrower than the one that accounts for it]

If the observations are positively correlated, which often occurs with longitudinal data, then the variances
of the time-independent predictor variables (variables that estimate the group effect or between-subject
effect such as gender, race, treatment, and so on) are underestimated if the data are analyzed
as if the observations are independent. In other words, the Type I error rate (rejecting the null hypothesis
when it is true, also known as a false positive) is inflated for these variables (Dunlop 1994).

Details
Dunlop (1994) shows that the variance of the time-independent predictor variable is
2*sigma^2*(1 + rho)/n, where rho is the correlation between the errors within the subject. If the
observations are positively correlated within subject, then the variance of the time-independent predictor
variable will be underestimated if the data are analyzed as if all observations are independent.


Effect on Time-Dependent Predictor Variables

[Figure: sampling distributions of a time-dependent parameter estimate; the distribution that ignores
positive correlation is wider than the one that accounts for it]

For time-dependent predictor variables (variables that measure the time effect or within-subject effect
such as how the measurements change over time), ignoring positive correlation leads to a variance
estimate that is too large. In other words, the Type II error rate (failing to reject the null hypothesis when
it is false, also known as a false negative) is inflated for these variables (Dunlop 1994). Because
the variances of the group effects will be underestimated and the variance of the time effects will
be overestimated if positive correlation is ignored, it is evident that correlated outcomes must
be addressed to obtain valid analyses.
Details
Dunlop (1994) shows that the variance of the time-dependent predictor variable is
2*sigma^2*(1 - rho)/n, and in this situation, ignoring the positive correlation leads to a variance estimate
that is too large.
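To see the direction of each bias numerically, you can plug illustrative values into the two Dunlop formulas. The values of sigma2, n, and rho below are arbitrary and chosen only to show that the naive variance 2*sigma^2/n falls between the two correct values:

```sas
/* Illustrative values only: sigma2, n, and rho are made up. */
data dunlop;
   sigma2 = 1;  n = 5;  rho = 0.4;
   v_naive   = 2*sigma2/n;           /* assumes independent observations */
   v_between = 2*sigma2*(1+rho)/n;   /* time-independent predictor       */
   v_within  = 2*sigma2*(1-rho)/n;   /* time-dependent predictor         */
run;

proc print data=dunlop noobs;
run;
```

With these values, the naive variance is 0.40, versus 0.56 for the time-independent case (the naive value is too small, inflating Type I error) and 0.24 for the time-dependent case (the naive value is too large, inflating Type II error).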


Statistical Method for Repeated Continuous


Responses

The linear mixed model fit by PROC MIXED will be used in this course. The
strengths are as follows:
• Handles unbalanced data with unequally spaced time points and subjects
observed at different time points
• Uses all the complete time measurements in the analysis
• Directly models the covariance structure
• Provides valid standard errors and efficient statistical tests


The linear mixed model allows a very flexible approach to modeling longitudinal data. The data are
structured so that the number of observations equals the total number of measurements across all subjects,
which means that the data do not have to be balanced.

An advantage of fitting linear mixed models is that PROC MIXED uses all the complete time
measurements in the analysis. This method differs from complete case analysis in which any observation
with a missing value across any of the time measurements is dropped from the analysis. The method
PROC MIXED uses, called a likelihood-based ignorable analysis, leads to a valid analysis when the
missing data are MAR (missing at random, which is a less restrictive assumption than missing completely
at random (MCAR)). If the probability of missing for a variable X is related to the values of X itself, even
after controlling for the other variables, then the value is not missing at random (NMAR). In other words,
the probability of missing depends on the unobserved values. The ignorable analysis is not valid and more
complex modeling is required (Verbeke and Molenberghs 1997).
PROC MIXED offers a wide variety of covariance structures. This enables the user to directly address
the within-subject correlation structure and incorporate it into a statistical model. By selecting a
parsimonious covariance model that adequately accounts for within-subject correlations, the user can
avoid the problems associated with univariate and multivariate ANOVA using PROC GLM (Littell,
Stroup, and Freund 2002).

A value is missing at random (MAR) if the probability that it is missing on a variable X is related to some
other measured variable (or variables) in the model but does not depend on any unobserved data after
controlling for the observed data. With the MAR assumption, a systematic relationship exists between one
or more measured variables and the probability of missing data. MAR is sometimes referred
to as ignorable missing since the missing data mechanism can be ignored and does not need to be taken
into account as part of the modeling process.


A value is missing completely at random (MCAR) if the probability that it is missing is independent
of the unobserved values. The formal definition of MCAR requires that the probability of missing data
on a variable X is unrelated to the values of X itself. In other words, the observed data values are a simple
random sample of the values that you would have observed if the data had been complete.

Model-Building Strategies

• Conduct an exploratory data analysis by illustrating the cross-sectional and


longitudinal relationships in the data.
• Fit a complex mean model and output OLS residuals.
• Use the OLS residuals to create a sample variogram to help select a
covariance structure.
• Eliminate unnecessary terms in the complex mean model.
• Evaluate model assumptions and identify potential outliers.


The first step in any model-building process is to conduct a thorough exploratory data analysis.
For longitudinal data this involves plotting the individual measurements over time and fitting a smoothing
spline over time. Plotting different groups over time and illustrating cross-sectional and longitudinal
relationships are also important steps in exploratory data analysis.

The second step is to fit a complex mean model in PROC MIXED and output the ordinary least squares
residuals. These residuals can be used to create a sample variogram, and the pattern in the sample
variogram can be helpful in selecting a covariance structure.
The third step is to fit the linear mixed model in PROC MIXED using the selected covariance structure.
Eliminating unnecessary terms and fitting a parsimonious model are important steps in the model building
process. After a candidate model is selected, the final steps of the model building process are to evaluate
model assumptions and to identify potential outliers.
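The second of these steps can be sketched in PROC MIXED: with no RANDOM or REPEATED statement, the fit reduces to ordinary least squares, and OUTPM= saves the residuals for the sample variogram. The data set and effect names below are hypothetical:

```sas
/* Hypothetical sketch: fit a complex (saturated) mean model with no   */
/* RANDOM or REPEATED statement, so the fit is ordinary least squares, */
/* and save the marginal predictions and residuals for a variogram.    */
proc mixed data=example;
   class group;
   model y = group time group*time / outpm=olsres;
run;
```

The OUTPM= data set contains a Resid variable, and those OLS residuals are the input to the sample variogram demonstrated in Chapter 2.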


1.01 Multiple Choice Poll

For time-independent predictor variables, if you ignore the positive correlation among the repeated
measurements, which of the following is true?

a. The parameter estimates are biased downward.
b. The parameter estimates are biased upward.
c. The standard errors are underestimated.
d. The standard errors are overestimated.

1.2 Exploratory Data Analysis

Objectives

• Graph individual and group profiles.


• Illustrate how to identify cross-sectional and longitudinal patterns.



Recommendations

• Graph as much of the relevant raw data as possible.


• Highlight aggregate patterns of potential scientific interest.
• Identify both cross-sectional and longitudinal patterns.
• Identify unusual individuals or observations.


The first step in any model-building process is exploratory data analysis. In this step you create graphs
that expose the patterns relevant to the scientific question. The recommendations on the slide above,
given by Diggle, Heagerty, Liang, and Zeger (2002), are used to produce the graphs in this section
and the section on diagnostics.

Individual Profiles

Plotting observed profiles over time


• helps identify general trends within subjects
• might detect nonlinear change over time
• provides information about the variability at given times.



A scatter plot of the response versus time is a useful graph. Connecting the repeated measurements
for each subject over time shows you whether there is a discernible pattern common to most subjects.
These individual profiles can also provide some information about between-subject variability.

Individual Profiles

[Figure: individual profiles of weight over time for several subjects]

For example, the slide above is a graph of weight over time for several subjects. These individual profiles
illustrate several important patterns (Diggle, Heagerty, Liang, and Zeger 2002).
1. All of the subjects are gaining weight.
2. The subjects that are the heaviest at the beginning of the study tend to be the heaviest throughout
the study.
3. The variability of the measurements is smaller at the beginning of the study compared
to the end of the study.


Group Profiles

[Figure: blood pressure over time for males and females]

Besides plotting the response over time, it is also useful to include different subgroups on the same graph
to illustrate the relationship between the response and an explanatory variable over time. For example,
in the slide above it appears that both males and females have decreasing blood pressures over time.
However, the slope for the males seems to be more pronounced than the slope for the females, which
might indicate an interaction between gender and time.
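A plot like the one described above can be produced with PROC SGPLOT: a SERIES statement draws one line per subject, and a LOESS statement overlays a smooth aggregate trend for each group. A sketch, assuming a hypothetical data set with variables id, gender, time, and bp:

```sas
/* Hypothetical sketch: individual profiles with a group smooth.   */
/* GROUP=id draws one line per subject; the LOESS fits summarize   */
/* the aggregate trend for each gender.                            */
proc sgplot data=example;
   series x=time y=bp / group=id lineattrs=(color=lightgray);
   loess  x=time y=bp / group=gender nomarkers;
run;
```

Drawing the individual profiles in a muted color keeps them visible as background context while the group smooths highlight the aggregate pattern of interest.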

CD4+ Cell Numbers Data Set

[Figure: CD4+ cell numbers versus years since seroconversion]


Example: The human immunodeficiency virus (HIV) causes AIDS by attacking an immune cell called
the CD4+ cell, which facilitates the body’s ability to fight infection. An uninfected person has
approximately 1100 cells per milliliter of blood. Because CD4+ cells decrease in number from
the time of infection, a person’s CD4+ cell count can be used to monitor disease progression.
A subset of the Multicenter AIDS Cohort Study (Kaslow et al. 1987) was obtained for 369
infected men to examine CD4+ cell counts over time. The data are stored in a SAS data set
called long.aids.
These are the variables in the data set:
CD4          CD4+ cell count.
time         time in years since seroconversion (time when HIV becomes detectable).
age          age in years relative to an arbitrary origin.
cigarettes   packs of cigarettes smoked per day.
drug         recreational drug use (1=yes, 0=no).
partners     number of partners relative to an arbitrary origin.
depression   CES-D score (a depression scale).
id           subject identification number.
Note: The data were obtained with permission from Professor Peter Diggle’s website.

Objectives of CD4+ Cell Numbers Study

• Estimate the average time course of CD4+ cell depletion.


• Estimate the time course for individual men.
• Characterize the degree of heterogeneity across men in the rate of
progression.
• Identify factors that predict CD4+ cell changes.



The researchers hope to characterize the typical time course of CD4+ cell depletion. This information can
clarify the relationship between HIV and the immune system, which might be helpful when counseling
infected men.
This observational longitudinal study has unbalanced data because the measurements can occur anytime
and the number of measurements can vary across subjects. The linear mixed model using PROC MIXED
is the model of choice for this analysis.


Individual and Group Profiles

Example: Examine the data in long.aids by producing a line listing report and descriptive statistics for
the numeric variables. Also produce graphs of the individual profiles and the group profiles.
/* long01d01.sas */
options nodate;
proc print data=long.aids(obs=17);
var id cd4 time age cigarettes drug partners depression;
title 'Line Listing of CD4+ Data';
run;

Line Listing of CD4+ Data

Obs id CD4 time age cigarettes drug partners depression

1 10002 548 -0.74196 6.57 0 0 5 8


2 10002 893 -0.24641 6.57 0 1 5 2
3 10002 657 0.24367 6.57 0 1 5 -1
4 10005 464 -2.72964 6.95 0 1 5 4
5 10005 845 -2.25051 6.95 0 1 5 -4
6 10005 752 -0.22177 6.95 0 1 5 -5
7 10005 459 0.22177 6.95 0 1 5 2
8 10005 181 0.77481 6.95 0 1 5 -3
9 10005 434 1.25667 6.95 0 1 5 -7
10 10029 846 -1.24025 2.64 0 1 5 18
11 10029 1102 -0.74196 2.64 0 1 5 18
12 10029 801 -0.25188 2.64 0 1 5 38
13 10029 824 0.25188 2.64 0 1 5 7
14 10029 866 0.76934 2.64 0 1 5 15
15 10029 704 1.41273 2.64 0 1 5 21
16 10029 757 1.80698 2.64 0 1 5 25
17 10029 726 2.42026 2.64 0 1 5 29

The variable age is a time-independent variable and the variables cigarettes, drug, partners,
and depression are time-dependent variables. The data are unbalanced because the subjects are measured
at different time points and the number of measurements is different across subjects.
proc means data=long.aids n min max mean median std;
var cd4 time age cigarettes drug partners depression;
title 'Descriptive Statistics for CD4+ Data';
run;

Descriptive Statistics for CD4+ Data

The MEANS Procedure

Variable              N        Minimum        Maximum           Mean         Median        Std Dev
--------------------------------------------------------------------------------------------------
CD4                2376     10.0000000        3184.00    765.1313131    701.5000000    399.3715606
time               2376     -2.9897330      5.4592740      0.8284246      0.7296370      1.8782282
age                2376    -11.2900000     29.0800000      2.6359512      1.5100000      7.5039253
cigarettes         2376              0      4.0000000      0.9890572              0      1.4389639
drug               2376              0      1.0000000      0.7558923      1.0000000      0.4296474
partners           2376     -5.0000000      5.0000000     -0.0340909     -1.0000000      3.6588315
depression         2376     -7.0000000     49.0000000      2.4957912              0      9.5863051
--------------------------------------------------------------------------------------------------

The outcome variable is CD4 with a range of 10 to 3184. The variable time has a range of nearly 3 years
before seroconversion (time when HIV becomes detectable) to 5.5 years after seroconversion.
The variable age is in years relative to an arbitrary origin. The variable cigarettes measures the number
of packs smoked per day with a range of 0 (non-smokers) to 4 (heavy smokers). The variable drug
is a binary variable where 1 means the subject used recreational drugs since the time of the last CD4+
measurement. The mean of 0.76 shows that recreational drug use since the last CD4+ cell count
measurement was reported in 76% of the observations. The variable partners measures
the number of partners relative to an arbitrary origin. The variable depression is a measure of depressive
symptoms where a higher score indicates greater depressive symptoms.
To compute descriptive statistics aggregated by subject, an output data set must be created in the MEANS
procedure aggregated by subject. A DATA step is used to create a new variable called druguse that
indicates whether the subject used recreational drugs at any time during the study period.
proc means data=long.aids noprint nway;
class id;
var cd4 age cigarettes drug partners depression;
output out=subject mean=avgid_cd4 avgid_age
avgid_cigarettes avgid_drug avgid_partners avgid_depression;
run;

data subject;
set subject;
druguse=(avgid_drug gt 0);
run;

proc means data=subject n min max mean median std;
   var _freq_ avgid_cd4 avgid_age avgid_cigarettes
       avgid_drug druguse avgid_partners avgid_depression;
   title 'Descriptive Statistics for CD4+ Data '
         'Aggregated by Subject';
run;
Selected PROC MEANS statement option:
NWAY causes the output data set to have only one observation for each level of the class
variable.
Descriptive Statistics for CD4+ Data Aggregated by Subject

The MEANS Procedure

Variable                  N        Minimum        Maximum           Mean         Median        Std Dev
------------------------------------------------------------------------------------------------------
_FREQ_                  369      1.0000000     12.0000000      6.4390244      6.0000000      2.7141327
avgid_cd4               369    245.6000000        1979.50    773.7727088    731.4000000    290.4222593
avgid_age               369    -11.2900000     29.0800000      2.3844444      1.3400000      7.4355005
avgid_cigarettes        369              0      4.0000000      1.0914632              0      1.3870508
avgid_drug              369              0      1.0000000      0.7728360      1.0000000      0.3373715
druguse                 369              0      1.0000000      0.9024390      1.0000000      0.2971230
avgid_partners          369     -4.5555556      5.0000000      0.1284397     -0.4000000      2.7544843
avgid_depression        369     -6.8333333     40.5000000      2.5154360      1.0000000      7.7397510
------------------------------------------------------------------------------------------------------

The _FREQ_ variable indicates that 369 subjects participated in the study. The average number
of repeated measures was 6.4 with a range of 1 to 12. The average of the average CD4+ cell count was
773.77 with one subject having the lowest average of 245.6 and another subject having the highest
average of 1979.5. The variable druguse indicates that 90% of the participants used recreational drugs
during the study period. The average of the average depression score was 2.5 with one subject having
the lowest average of –6.8 and another subject having the highest average of 40.5.
To create a graph of individual profiles, plot CD4 versus time by subject identification number.
proc sgplot data=long.aids nocycleattrs noautolegend;
series y=cd4 x=time / group=id
lineattrs=(color=blue pattern=1);
xaxis values=(-3 to 5.5 by 0.5) label='Years since Seroconversion';
yaxis values=(0 to 3500 by 500) label='CD4 Cell Counts';
title 'Individual Profiles of the CD4+ Data';
run;
Selected PROC SGPLOT statement options:
CYCLEATTRS | NOCYCLEATTRS specifies whether plots are drawn with unique attributes in the
graph. By default, the SGPLOT procedure automatically assigns
unique attributes in many situations, depending on the types of
plots that you specify. If the plots do not have unique attributes
by default, then the CYCLEATTRS option assigns unique
attributes to each plot in the graph. The NOCYCLEATTRS
option prevents the procedure from assigning unique attributes.
NOAUTOLEGEND disables automatic legends from being generated. By default,
legends are created automatically for some plots, depending
on their content. This option has no effect if you specify
a KEYLEGEND statement.

Selected SGPLOT procedure statements:


SERIES creates a line plot.
XAXIS specifies options for the X-axis.
YAXIS specifies options for the Y-axis.
Selected SERIES statement options:
X= variable specifies the variable for the X-axis.
Y= variable specifies the variable for the Y-axis.
GROUP= variable specifies a variable that is used to group the data. A separate
plot is created for each unique value of the grouping variable.
The plot elements for each group value are automatically
distinguished by different visual attributes.


LINEATTRS= style-element specifies the appearance of the series line. You can specify the
appearance by using a style element or by using suboptions.
If you specify a style element, you can in addition specify
suboptions to override specific appearance attributes.

Selected LINEATTRS= options:


COLOR= color specifies the color of the line.
PATTERN= line-pattern specifies the line pattern for the line.

Selected XAXIS statement option:


VALUES= ( value-1 < ...value-n >) specifies the values for the ticks on the axis.

The individual profiles plot is essentially useless. This is a common problem when there are many
subjects in a data set.
A more meaningful plot is an overlay plot of the individual profiles and the average trend. Therefore,
a smoothed line is fitted using a penalized B-spline curve. The individual profiles serve as a light
background and the average trend is a dark line in the foreground. This strategy was suggested by Tufte
(1990) where communication of statistical information is enhanced by adding detail in the background.
proc sgplot data=long.aids nocycleattrs noautolegend;
   series y=cd4 x=time / group=id transparency=0.5
          lineattrs=(color=cyan pattern=1);
   pbspline y=cd4 x=time / nomarkers smooth=50 nknots=5
            lineattrs=(color=blue pattern=1 thickness=3);
   xaxis values=(-3 to 5.5 by 0.5) label='Years since Seroconversion';
   yaxis values=(0 to 3500 by 500) label='CD4 Cell Counts';
   title 'Individual Profiles with the Average Trend Line';
run;
Selected SGPLOT procedure statement:
PBSPLINE creates a fitted penalized B-spline curve.
Selected SERIES statement option:
TRANSPARENCY= specifies the degree of transparency for the lines and markers. Specify a value
from 0.0 (completely opaque) to 1.0 (completely transparent). The default is 0.

Selected PBSPLINE statement options:


X= numeric-variable specifies the variable for the X-axis.
Y= numeric-variable specifies the variable for the Y-axis.
NOMARKERS removes the scatter markers from the plot.
SMOOTH= specifies a smoothing parameter value. If you do not specify this option, a
smoothing value is determined automatically.

NKNOTS= specifies the number of evenly spaced internal knots. The default is 100.
Selected LINEATTRS= option:
THICKNESS= specifies the thickness of the line. You can also specify the unit of measure.
The default unit is pixels.


The average trend shows that the CD4+ cell count is fairly constant around 1000 but drops off around
the time of seroconversion. The rate of CD4+ cell loss seems to be more rapid immediately after
seroconversion. The relationship between CD4+ cell counts and time seems to be cubic in nature.
To highlight the group profiles, smoothed lines representing different subgroups were fitted. The first
graph is by recreational drug usage.
proc format;
value druggrp 0='no recreational drug use'
1='recreational drug use';
run;

proc sgplot data=long.aids;
   pbspline y=cd4 x=time / group=drug nomarkers smooth=50 nknots=5
            lineattrs=(thickness=3) name="drug";
   xaxis values=(-3 to 5.5 by 0.5) label='Years since Seroconversion';
   yaxis values=(0 to 1500 by 500) label='CD4 Cell Counts';
   keylegend "drug";
   format drug druggrp.;
   title 'Drug Usage Subgroups';
run;


Selected SGPLOT procedure statement:


KEYLEGEND adds a legend to the plot. The argument in the statement (“name”) specifies the
names of one or more plots that you want to include in the legend. Each name that
you specify must correspond to a value that you entered for the NAME= option
in a plot statement. If you do not specify a name, then the legend contains
references to all of the plots in the graph.

Selected PBSPLINE statement option:


NAME= specifies a name for the plot. You can use the name to refer to this plot in other
statements.

There seems to be very little difference in the trends of the two recreational drug groups.
To define the cigarette usage subgroups, collapse the levels of cigarettes into three groups. Observations
with no cigarette usage are in group 1, observations with one to two packs smoked per day are in group 2,
and observations with three to four packs smoked per day are in group 3. The new variable ciggroup is
added to long.aids for further use.


data long.aids;
set long.aids;
ciggroup=1*(cigarettes=0)+2*(0<cigarettes<=2)+3*(2<cigarettes<=4);
run;

proc format;
   value cgroup 1='non-smoker'
                2='1 to 2 packs per day'
                3='3 or more packs per day';
run;

proc sgplot data=long.aids;
   pbspline y=cd4 x=time / group=ciggroup nomarkers smooth=50 nknots=5
            lineattrs=(thickness=3) name="cigarette";
   xaxis values=(-3 to 5.5 by 0.5) label='Years since Seroconversion';
   yaxis values=(0 to 1500 by 500) label='CD4 Cell Counts';
   keylegend "cigarette";
   format ciggroup cgroup.;
   title 'Cigarette Usage Subgroups';
run;


There seems to be a difference between the cigarette usage subgroups. Heavy cigarette users seem to have
a much more rapid rate of CD4+ cell loss compared to non-smokers. The difference in the smoothed lines
might indicate a time by cigarette interaction.
To define the age subgroups, collapse the levels of age into four groups. Observations in the first quartile
are in group 1, observations in the second quartile are in group 2, observations in the third quartile are in
group 3, and observations in the fourth quartile are in group 4.
proc rank data=long.aids groups=4 out=ageranks;
var age;
ranks agegroup;
run;

proc format;
value quartile 0='1st quartile'
1='2nd quartile'
2='3rd quartile'
3='4th quartile';
run;

proc sgplot data=ageranks;
   pbspline y=cd4 x=time / group=agegroup nomarkers smooth=50 nknots=5
            lineattrs=(thickness=3) name="age";
   xaxis values=(-3 to 5.5 by 0.5) label='Years since Seroconversion';
   yaxis values=(0 to 1500 by 500) label='CD4 Cell Counts';
   keylegend "age";
   format agegroup quartile.;
   title 'Age Subgroups';
run;
Selected PROC RANK statement option:
GROUPS=n bins the variables into n groups.
Selected RANK procedure statement:

RANKS names the group indicators in the OUT= data set. If the RANKS statement is omitted,
then the group indicators replace the VAR variables in the OUT= data set.


There seems to be no difference between the four age groups with regard to the trend of CD4+ cell loss.
To define the number of partner subgroups, collapse the levels of partners into four groups.


proc rank data=long.aids groups=4 out=partner_ranks;
   var partners;
   ranks partnergroup;
run;

proc sgplot data=partner_ranks;
   pbspline y=cd4 x=time / group=partnergroup nomarkers smooth=50
            nknots=5 lineattrs=(thickness=3) name="partner";
   xaxis values=(-3 to 5.5 by 0.5) label='Years since Seroconversion';
   yaxis values=(0 to 1500 by 500) label='CD4 Cell Counts';
   keylegend "partner";
   format partnergroup quartile.;
   title 'Partner Subgroups';
run;

There seems to be no difference between the four partner groups with regard to the trend of CD4+ cell
loss.


To define the depression subgroups, collapse the levels of depression into four groups.
proc rank data=long.aids groups=4 out=depressranks;
var depression;
ranks depressgroup;
run;

proc sgplot data=depressranks;
   pbspline y=cd4 x=time / group=depressgroup nomarkers smooth=50
            nknots=5 lineattrs=(thickness=3) name="depress";
   xaxis values=(-3 to 5.5 by 0.5) label='Years since Seroconversion';
   yaxis values=(0 to 1500 by 500) label='CD4 Cell Counts';
   keylegend "depress";
   format depressgroup quartile.;
   title 'Depression Subgroups';
run;

There seems to be no difference between the four depression groups with regard to the trend of CD4+ cell
loss.


Cross-Sectional versus Longitudinal Relationship

Scatter plot of cross-sectional relationship


• baseline Y versus baseline X

Scatter plot of longitudinal relationship


• change in Y versus change in X


One of the recommendations in exploratory data analysis is to display both the cross-sectional and
longitudinal relationships between the response variable and the time-dependent explanatory variables.
The cross-sectional relationship can be displayed by a scatter plot of the baseline (or initial value) CD4+
cell count versus the baseline explanatory variable values. The longitudinal relationship can be displayed
by a scatter plot of the change in CD4+ cell counts (Y at time t – Y at time 1) versus the change in the
explanatory variable values (X at time t – X at time 1). Fitting a smooth curve in the scatter plot can
indicate whether there is evidence of a relationship (Diggle, Heagerty, Liang, and Zeger 2002).


Cross-Sectional and Longitudinal Relationships

Example: Create five cross-sectional scatter plots of the baseline CD4+ cell counts versus the baseline
of age, recreational drug use, cigarettes, depression score, and number of partners. Also create
four longitudinal scatter plots of the change in CD4+ cell counts versus the change in
recreational drug use, cigarettes, depression score, and number of partners. Fit a penalized
B-spline curve in the plots with continuous covariates and a regression line in the plots with
binary covariates.
/* long01d02.sas */
data aids1 aids2;
set long.aids;
by id;
retain basecd4 basedrug basedepress basecig basepart;
if first.id then
do;
basecd4=cd4;
basedrug=drug;
basedepress=depression;
basecig=cigarettes;
basepart=partners;
output aids1;
end;
if not first.id then
do;
chngcd4=cd4-basecd4;
chngdrug=drug-basedrug;
chngdepress=depression-basedepress;
chngcig=cigarettes-basecig;
chngpart=partners-basepart;
output aids2;
end;
run;
The first step is to create the baseline variables (in aids1) and the difference from time t and baseline
variables (in aids2). Because the data are sorted by subject id and time, BY-group processing is used
to identify the first observation by subject id. The first observation is used to assign the baseline variable
values. The RETAIN statement is used to retain the baseline values across the executions of the DATA
step. These baseline values are used to create the change in CD4+ cell count and the change in the
covariates.


proc sgplot data=aids1 noautolegend;
   scatter y=basecd4 x=age / markerattrs=(color=cyan symbol=circle);
   pbspline y=basecd4 x=age / nomarkers smooth=50 nknots=5
            lineattrs=(color=blue pattern=1 thickness=3);
   xaxis values=(-12 to 30 by 2) label='Baseline Age';
   yaxis values=(0 to 3000 by 1000) label='Baseline CD4+';
   title 'Baseline CD4+ Cells vs. Baseline Age';
run;

There seems to be a slight positive relationship between baseline CD4+ cell counts and the age of
the patient.


proc sgplot data=aids1 noautolegend;
   scatter y=basecd4 x=basedrug / markerattrs=(color=cyan
           symbol=circle);
   reg y=basecd4 x=basedrug / nomarkers
       lineattrs=(color=blue pattern=1 thickness=3);
   xaxis values=(0 to 1 by .1) label='Baseline Recreational Drug Use';
   yaxis values=(0 to 3000 by 1000) label='Baseline CD4+';
   title 'Baseline CD4+ Cells vs. Baseline Recreational Drug Use';
run;
Selected SGPLOT procedure statement:
REG creates a fitted regression line or curve.

There seems to be a slight positive relationship between baseline CD4+ cell count and baseline
recreational drug use.


proc sgplot data=aids1 noautolegend;
   scatter y=basecd4 x=basedepress / markerattrs=(color=cyan
           symbol=circle);
   pbspline y=basecd4 x=basedepress / nomarkers smooth=50 nknots=5
            lineattrs=(color=blue pattern=1 thickness=3);
   xaxis values=(-10 to 50 by 10) label='Baseline Depression Score';
   yaxis values=(0 to 3000 by 1000) label='Baseline CD4+';
   title 'Baseline CD4+ Cells vs. Baseline Depression Score';
run;

There seems to be a slight upward trend between the baseline values of CD4+ cell counts and the baseline
values of the depression scores.


proc sgplot data=aids1 noautolegend;
   scatter y=basecd4 x=basecig / markerattrs=(color=cyan
           symbol=circle);
   pbspline y=basecd4 x=basecig / nomarkers smooth=50 nknots=5
            lineattrs=(color=blue pattern=1 thickness=3);
   xaxis values=(0 to 4 by 1) label='Baseline Cigarette Usage';
   yaxis values=(0 to 3000 by 1000) label='Baseline CD4+';
   title 'Baseline CD4+ Cells vs. Baseline Cigarette Usage';
run;

The graph shows a positive relationship between the baseline values of CD4+ cell counts and the baseline
values of number of packs of cigarettes smoked per day. It seems that heavy smokers have higher CD4+
cell counts than non-smokers. This was also shown in the cigarette subgroup profile plot.


proc sgplot data=aids1 noautolegend;
   scatter y=basecd4 x=basepart / markerattrs=(color=cyan
           symbol=circle);
   pbspline y=basecd4 x=basepart / nomarkers smooth=50 nknots=5
            lineattrs=(color=blue pattern=1 thickness=3);
   xaxis values=(-5 to 5 by 1) label='Baseline Partners';
   yaxis values=(0 to 3000 by 1000) label='Baseline CD4+';
   title 'Baseline CD4+ Cells vs. Baseline Partners';
run;

There is little evidence of a relationship between baseline CD4+ cell counts and baseline number
of partners. The uptick in baseline CD4+ at the far left of the plot is due to a single observation.


proc sgplot data=aids2 noautolegend;
   scatter y=chngcd4 x=chngdrug / markerattrs=(color=cyan
           symbol=circle);
   pbspline y=chngcd4 x=chngdrug / nomarkers smooth=50 nknots=3
            lineattrs=(color=blue pattern=1 thickness=3);
   refline 0;
   xaxis values=(-1 to 1 by .1) label='Change in Recreational Drug Use';
   yaxis values=(-2500 to 2500 by 500) label='Change CD4+';
   title 'Change CD4+ Cells vs. Change in Recreational Drug Use';
run;
Selected SGPLOT procedure statement:
REFLINE creates a horizontal or vertical reference line.

There seems to be no relationship between the change in CD4+ cell counts and the change in recreational
drug use. The only noticeable pattern is that the patients who had no change in their recreational drug use
had the smallest decreases in CD4+ cell counts.


proc sgplot data=aids2 noautolegend;
   scatter y=chngcd4 x=chngdepress / markerattrs=(color=cyan
           symbol=circle);
   pbspline y=chngcd4 x=chngdepress / nomarkers smooth=50 nknots=5
            lineattrs=(color=blue pattern=1 thickness=3);
   refline 0;
   xaxis values=(-50 to 50 by 10) label='Change in Depression Score';
   yaxis values=(-2500 to 2500 by 500) label='Change CD4+';
   title 'Change CD4+ Cells vs. Change in Depression Score';
run;

There is some evidence that there is a negative relationship between the change in CD4+ cell counts
and the change in depression score. This implies that a decrease in CD4+ cell counts is associated with
an increase in depression.


proc sgplot data=aids2 noautolegend;
   scatter y=chngcd4 x=chngcig / markerattrs=(color=cyan
           symbol=circle);
   pbspline y=chngcd4 x=chngcig / nomarkers smooth=50 nknots=5
            lineattrs=(color=blue pattern=1 thickness=3);
   refline 0;
   xaxis values=(-4 to 4 by 1) label='Change in Cigarette Usage';
   yaxis values=(-2500 to 2500 by 500) label='Change CD4+';
   title 'Change CD4+ Cells vs. Change in Cigarette Usage';
run;

There is some evidence that there is a positive relationship between the change in CD4+ cell counts
and the change in the number of packs smoked per day. This implies that a decrease in CD4+ cell counts
is associated with a decrease in smoking.


proc sgplot data=aids2 noautolegend;
   scatter y=chngcd4 x=chngpart / markerattrs=(color=cyan
           symbol=circle);
   pbspline y=chngcd4 x=chngpart / nomarkers smooth=50 nknots=5
            lineattrs=(color=blue pattern=1 thickness=3);
   refline 0;
   xaxis values=(-10 to 10 by 2) label='Change in Partners';
   yaxis values=(-2500 to 2500 by 500) label='Change CD4+';
   title 'Change CD4+ Cells vs. Change in Partners';
run;

There seems to be a strong positive relationship between the change in CD4+ cell counts and the change
in the number of partners. This implies that a decrease in CD4+ cell counts is associated with a decrease
in the number of partners.


Summary of Exploratory Data Analysis

• There seems to be a cubic relationship between CD4+ cell count and time.
• The group profile plots show a time by cigarette usage interaction.
• The cross-sectional plots show a positive relationship between the baseline
CD4+ cell counts and the baseline cigarette usage.
• The longitudinal plots show a positive relationship between the change in
CD4+ cell counts and the change in the number of partners.


Careful exploratory data analysis might help you identify scientifically relevant variables to include in
your candidate model. For the candidate model in PROC MIXED, the exploratory results suggest including
at least the quadratic and cubic effects of time and the time by cigarette interaction. The results
of the cross-sectional and longitudinal plots might help you understand the degree of heterogeneity across
men in the rate of CD4+ cell count depletion.

SAS Studio

SAS Studio is the new browser-based SAS programming environment.



SAS Studio is the new browser-based SAS programming environment that you can use for data
exploration and analysis. A tutorial on SAS Studio can be found at:
https://support.sas.com/training/tutorial/studio/get-started.html.

Interactive Mode

Some SAS procedures, such as PROC GLM, are interactive. That means they
remain active until you submit a QUIT statement, or until you submit a new
PROC or DATA step.

In SAS Studio, you can use the code editor to run these procedures, as well as
other SAS procedures, in interactive mode.

By default, SAS Studio does not run in interactive mode. An icon in SAS Studio toggles
interactive mode on and off.


Some procedures, such as GLM, are interactive, meaning that they remain active until you submit a QUIT
statement or a new PROC or DATA step. You can run these procedures interactively in SAS Studio using
the code editor. However, you must first enable interactive mode by using the icon shown above.


Considerations for Running in Interactive Mode

• Interactive mode starts a new SAS session.
• Librefs and macro variables used in the course must be defined for each
  new SAS session.

SAS Studio Documentation:
http://support.sas.com/software/products/sasstudio/#s1=2


Running SAS Studio in interactive mode starts a new SAS session. This means that library references and
macro variables must be defined for each new session. More information can be found in the SAS Studio
documentation: http://support.sas.com/software/products/sasstudio/#s1=2


Exercises

1. Conducting an Exploratory Data Analysis


A pharmaceutical firm conducted a clinical trial to examine heart rates among patients. Each patient
was subjected to one of three possible drug treatment levels: drug a, drug b, and a placebo. A baseline
measurement was taken and the heart rates were recorded at five unequally spaced time intervals:
1 minute, 5 minutes, 15 minutes, 30 minutes, and 1 hour. The data are stored in the SAS data set
long.heartrate.
These are the variables in the data set:

heartrate heart rate


patient patient identification number
drug drug treatment level (a, b, and p)

hours time point heart rate was recorded (0.01677, 0.08333, 0.25000, 0.5000, 1.000)
baseline baseline heart rate.
a. Submit the program long00d01.sas. Print the first 25 observations in the data set long.heartrate.
Then create an output data set using PROC MEANS with the mean heart rate for each patient by
drug. Also include the baseline heart rate in the data set. Then generate descriptive statistics for
the mean patient heart rate and baseline by drug.

1) Are the patients and measurements of the heart rates sorted?


2) Do there appear to be any differences in the means of baseline and the mean patient heart rate
by drug?
b. Generate an individual profiles plot with an average trend line using PROC SGPLOT. Use 50
as the smoothing factor with 5 knots in the PBSPLINE statement.
1) What is the general pattern of heartrate by hours?


1.02 Multiple Choice Poll

What is the general pattern of heart rate by hours?

a. The average heart rate appears to be increasing over time.


b. The average heart rate appears to be decreasing over time.
c. The average heart rate appears to have a quadratic relationship with
time.
d. The average heart rate appears to be constant over time.


1.3 Chapter Summary


The objectives of longitudinal data analysis are to examine and compare responses over time.
The defining feature of a longitudinal data model is its ability to study changes over time within subjects
and changes over time between groups.

Special methods of statistical analysis are needed for longitudinal data because the set of measurements
on one subject tend to be correlated, measurements on the same subject close in time tend to be more
highly correlated than measurements far apart in time, and the variances of longitudinal data often change
with time. These potential patterns of correlation and variation might combine to produce a complicated
covariance structure. This covariance structure must be taken into account to draw valid statistical
inferences. Therefore, standard regression and ANOVA models might produce invalid results because two
of the parametric assumptions (independent observations, equal variances) might not hold.

If the observations are positively correlated, which often occurs with longitudinal data, then the variances
of the coefficient estimates for the time-independent predictor variables are underestimated if the data are
analyzed as if the observations were independent. In other words, the Type I error rate is inflated for these variables.

For time-dependent predictor variables, ignoring positive correlation leads to a variance estimate that is
too large. In other words, the Type II error rate is inflated for these variables.
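The effect of ignoring positive correlation on time-independent predictors can be illustrated with a small simulation sketch. The data set, seed, and effect structure below are invented for illustration: a subject-level random intercept induces positive within-subject correlation, and there is no true group effect.

```sas
/* Simulate 100 subjects with 5 repeated measures each. A shared
   subject effect (b0) induces positive within-subject correlation. */
data sim;
   call streaminit(27513);
   do id=1 to 100;
      b0=rand('normal', 0, 2);          /* subject-level effect       */
      group=(id le 50);                 /* time-independent predictor */
      do t=1 to 5;
         y=10 + b0 + rand('normal');    /* no true group effect       */
         output;
      end;
   end;
run;

/* Naive analysis treats the 500 rows as independent, so the
   standard error for group is too small (inflated Type I error). */
proc glm data=sim;
   class group;
   model y=group;
run; quit;

/* The mixed model accounts for the within-subject correlation
   and gives a valid standard error for group. */
proc mixed data=sim;
   class id group;
   model y=group / solution;
   random intercept / subject=id;
run;
```

Comparing the p-values for group from the two analyses over repeated simulations would show the naive model rejecting the (true) null hypothesis far more often than the nominal rate.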
The linear mixed model allows a flexible approach to modeling longitudinal data. The linear mixed model
• handles unbalanced data with unequally spaced time points and subjects observed at different time points
• uses all the available data in the analysis
• directly models the covariance structure
• provides valid standard errors and efficient statistical tests.

Copyright © 2017, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.

The first step in any model-building process is exploratory data analysis. In this step you create graphs
that expose the patterns relevant to the scientific question. General recommendations are
• graph as much of the relevant raw data as possible
• highlight aggregate patterns of potential scientific interest
• identify both cross-sectional and longitudinal patterns
• identify unusual individuals or observations.

A meaningful plot is an overlay plot of the individual profiles and the average trend. A smoothed line
representing the average trend can be fitted using a spline routine. The individual profiles can serve as a
light background and the average trend can be a dark line in the foreground.


1.4 Solutions
Solutions to Exercises
1. Conducting an Exploratory Data Analysis
A pharmaceutical firm conducted a clinical trial to examine heart rates among patients. Each patient
was subjected to one of three possible drug treatment levels: drug a, drug b, and a placebo. A baseline
measurement was taken and the heart rates were recorded at five unequally spaced time intervals: 1
minute, 5 minutes, 15 minutes, 30 minutes, and 1 hour. The data are stored in the SAS data set
long.heartrate.

These are the variables in the data set:


heartrate   heart rate
patient     patient identification number
drug        drug treatment level (a, b, and p)
hours       time point heart rate was recorded (0.01667, 0.08333, 0.25000, 0.50000, 1.00000)
baseline    baseline heart rate.

a. Submit the program long00d01.sas. Print the first 25 observations in the data set long.heartrate.
Then create an output data set using PROC MEANS with the mean heart rate for each patient by
drug. Also include the baseline heart rate in the data set. Then generate descriptive statistics for
the mean patient heart rate and baseline by drug.
proc print data=long.heartrate(obs=25);
title 'Line Listing of Heart Rate Data';
run;


Line Listing of Heart Rate Data

Obs patient drug baseline hours heartrate

1 201 p 92 0.01667 76
2 201 p 92 0.08333 84
3 201 p 92 0.25000 88
4 201 p 92 0.50000 96
5 201 p 92 1.00000 84
6 202 b 54 0.01667 58
7 202 b 54 0.08333 60
8 202 b 54 0.25000 60
9 202 b 54 0.50000 60
10 202 b 54 1.00000 64
11 203 p 84 0.01667 86
12 203 p 84 0.08333 82
13 203 p 84 0.25000 84
14 203 p 84 0.50000 86
15 203 p 84 1.00000 82
16 204 a 72 0.01667 72
17 204 a 72 0.08333 68
18 204 a 72 0.25000 68
19 204 a 72 0.50000 78
20 204 a 72 1.00000 72
21 205 b 80 0.01667 84
22 205 b 80 0.08333 84
23 205 b 80 0.25000 96
24 205 b 80 0.50000 92
25 205 b 80 1.00000 72

1) The patients and the measurement times are sorted.

proc means data=long.heartrate noprint nway;
   id baseline;
   class patient drug;
   var heartrate;
   output out=subject mean=avgpat_heart;
run;

proc means data=subject n min max mean median std;
   class drug;
   var avgpat_heart baseline;
   title 'Descriptive Statistics for Heart Rate Data Aggregated by Subject';
run;


Descriptive Statistics for Heart Rate Data Aggregated by Subject

The MEANS Procedure

         N
drug   Obs  Variable         N      Minimum       Maximum         Mean       Median
-------------------------------------------------------------------------------------
a        8  avgpat_heart     8   56.0000000    91.2000000   77.5000000   76.8000000
            baseline         8   60.0000000   100.0000000   80.7500000   81.0000000

b        8  avgpat_heart     8   60.4000000    93.6000000   83.6500000   86.8000000
            baseline         8   54.0000000   104.0000000   83.2500000   88.0000000

p        8  avgpat_heart     8   66.8000000    88.4000000   78.8500000   81.6000000
            baseline         8   68.0000000   102.0000000   84.7500000   86.0000000
-------------------------------------------------------------------------------------

         N
drug   Obs  Variable          Std Dev
--------------------------------------
a        8  avgpat_heart   12.1245913
            baseline       12.8257553

b        8  avgpat_heart   10.5536723
            baseline       14.9642431

p        8  avgpat_heart    7.9840913
            baseline       11.2090525
--------------------------------------

2) Patients on the placebo have the highest baseline mean while patients on drug b have the
highest mean heart rate. The differences are relatively small across treatment groups.

b. Generate an individual profiles plot with an average trend line using PROC SGPLOT. Use 50 as
the smoothing factor with 5 knots in the PBSPLINE statement.


proc sgplot data=long.heartrate nocycleattrs noautolegend;
   series y=heartrate x=hours / group=patient
          lineattrs=(color=cyan pattern=1);
   pbspline y=heartrate x=hours / nomarkers smooth=50 nknots=5
            lineattrs=(color=blue pattern=1 thickness=3);
   xaxis values=(0 to 1 by 0.1)
         label='Time Point Heart Rate was Recorded in Hours';
   yaxis values=(40 to 120 by 10) label='Heart Rate';
   title 'Individual Profiles of the Heart Rate Data';
run;

1) The average heart rate appears to decrease over time.


Solutions to Student Activities (Polls/Quizzes)

1.01 Multiple Choice Poll – Correct Answer

For time-independent predictor variables, if you ignore the positive
correlation among the repeated measurements, which of the following is
true?

a. The parameter estimates are biased downward.
b. The parameter estimates are biased upward.
c. The standard errors are underestimated. (correct answer)
d. The standard errors are overestimated.

1.02 Multiple Choice Poll – Correct Answer

What is the general pattern of heart rate by hours?

a. The average heart rate appears to be increasing over time.
b. The average heart rate appears to be decreasing over time. (correct answer)
c. The average heart rate appears to have a quadratic relationship with time.
d. The average heart rate appears to be constant over time.

Chapter 2 Longitudinal Data
Analysis with Continuous Responses

2.1 General Linear Mixed Model ..................................................................................... 2-3


Demonstration: Fitting a Longitudinal Model in PROC MIXED ..................................... 2-25
Exercises............................................................................................................. 2-34

2.2 Evaluating Covariance Structures .......................................................................... 2-36


Demonstration: Sample Variogram .......................................................................... 2-47

Demonstration: Information Criteria ......................................................................... 2-52


Exercises............................................................................................................. 2-57

2.3 Model Development and Interpretation................................................................... 2-58


Demonstration: Heterogeneity in the Covariance Parameters ...................................... 2-61

Demonstration: Evaluating Fixed Effects .................................................................. 2-69


Demonstration: Illustrating Interactions .................................................................... 2-78

Exercises............................................................................................................. 2-84

2.4 Random Coefficient Models .................................................................................... 2-85


Demonstration: Random Coefficient Models ............................................................. 2-94
Demonstration: Computing EBLUPs ...................................................................... 2-108

Demonstration: Models with Random Effects and Serial Correlation ............................2-115

Exercises........................................................................................................... 2-122

2.5 Model Assessment ................................................................................................ 2-123


Demonstration: Model Assessment ....................................................................... 2-133

Exercises........................................................................................................... 2-148

2.6 Chapter Summary.................................................................................................. 2-149

2.7 Solutions ............................................................................................................... 2-152



Solutions to Exercises ......................................................................................... 2-152


Solutions to Student Activities (Polls/Quizzes) ......................................................... 2-194


2.1 General Linear Mixed Model

Objectives

• Learn the concepts regarding the general linear mixed model.


• Illustrate the various covariance structures available in the MIXED
procedure.
• Fit a general linear mixed model in PROC MIXED.


General Linear Model

y = Xβ + ε

where
   y is the vector of observed responses
   X is the design matrix of predictor variables
   β is the vector of regression parameters
   ε is the vector of random errors.


The general linear mixed model is an extension of the general linear model. The standard linear
regression model, which is used in the GLM procedure, models the mean of the response variable
by using the regression parameters. The random errors are assumed to be independent and normally
distributed with a mean of 0 and a common variance. If the parametric assumptions are valid (other than
the normality assumption), then the estimated regression parameters are the best linear unbiased estimates
(BLUE).

General Linear Mixed Model

y = Xβ + Zγ + ε

where
   Z is the design matrix of random variables
   γ is the vector of random-effect parameters
   ε is no longer required to be independent and homogeneous.

The general linear mixed model extends the general linear model by the addition of random effect
parameters and by allowing a more flexible specification of the covariance matrix of the random errors.
For example, general linear mixed models allow for both correlated error terms and error terms with
heterogeneous variances. The matrix Z can contain continuous or dummy predictor variables, just like
the matrix X. The name mixed model indicates that the model contains both fixed-effect parameters and
random-effect parameters.

In the longitudinal model proposed by Diggle, Heagerty, Liang, and Zeger (2002), it is assumed that
the error terms have a constant variance and can be decomposed as

εi = ε(1)i + ε(2)i

where

ε(1)i   is the measurement error reflecting the variation added by the measurement process.

ε(2)i   is the error associated with the serial correlation in which times closer together are more
        correlated than times farther apart.

i       denotes the subject.


If you assume that the measurement errors have an independent covariance structure (σ²I), then you
should concern yourself only with covariance structures that reflect the serial correlation.

Fixed Effects

The slide shows a drug variable with three levels: A, B, and C. For a fixed effect, the levels
in the study represent all possible levels of the variable, that is, all levels about which
inferences are to be made.

Variable effects are either fixed or random depending on how the levels of the variables that appear in the
study are selected. For example, the above slide represents a clinical trial analyzing the effectiveness of
three drugs. If the three drugs are the only candidates for the clinical trial and the conclusions of the
clinical trial are restricted to just those three drugs, then the effect of the variable drug is a fixed effect.

Fixed and Random Effects

The slide contrasts two variables. Drug, with levels A, B, and C, is a fixed effect. Clinic,
with levels 7, 18, 23, and 41, is a random effect: its levels represent only a random sample
of a larger set of potential levels, and the interest is in drawing inferences that are valid
for the complete population of levels.


However, suppose the clinical trial was performed in four clinics and the four clinics are a sample from a
larger population of clinics. The conclusions of the clinical trials are not only restricted to the four clinics
but rather to the population of clinics. The appropriate model in this study is a general linear mixed model
with drug as a fixed-effect variable and clinic as a random-effect variable.

MIXED Procedure

General form of the MIXED procedure:

PROC MIXED DATA=SAS-data-set <options>;
   CLASS variables;
   MODEL response=<fixed effects> </ options>;
   RANDOM random effects </ options>;
   REPEATED <repeated effect> </ options>;
RUN;

PROC MIXED fits linear mixed models. The procedure also provides you with the flexibility
of modeling not only the means of your data, but the variances and covariances as well.
Selected MIXED procedure statements:
CLASS specifies the classification variables to be used in the analysis. The CLASS statement
must precede the MODEL statement.

MODEL specifies the response variable (one and only one) and all the fixed effects, which
determine the X matrix of the mixed model. The MODEL statement is required and only
one is allowed with each invocation of PROC MIXED.

RANDOM defines the random effects, which determine the Z matrix of the mixed model.
The random effects can be categorical or numeric, and multiple RANDOM statements are
possible. When random intercepts are needed, you must specify INTERCEPT (or INT)
as a random effect. The covariance structure of the random effects corresponds
to the G matrix.


REPEATED specifies the R matrix in the mixed model. If no repeated statement is specified, then
R is assumed to have the independent covariance structure. The repeated effect defines
the ordering of the repeated measurements within each subject. If no repeated effect
is specified, then the repeated measures data must be similarly ordered for each subject.
All missing response variable values must be indicated with periods in the input data set
unless they all fall at the end of a subject’s repeated response profile. The repeated effect
must contain only classification variables. Furthermore, the levels of the repeated effect
must be different for each observation within a subject.
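To make the statement descriptions concrete, here is a minimal sketch of how they fit together, reusing the long.heartrate data from the Chapter 1 exercise. The choice of fixed effects and of a random intercept is illustrative only, not a recommended model for these data.

```sas
/* Illustrative sketch only: random patient intercepts (G matrix)
   with the default independent R matrix */
proc mixed data=long.heartrate;
   class drug patient;
   model heartrate = drug hours baseline / solution;
   random intercept / subject=patient;
run;
```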

Model Assumptions in PROC MIXED

• Random effects and error terms are normally distributed with means of 0.
• Random effects and error terms are independent of each other.
• The relationship between the response variable and predictor variables is
linear.


A nonlinear mixed model can be used when modeling a process that follows a more general nonlinear
relationship. Nonlinear mixed models can be fit in the NLMIXED procedure. Note that polynomial
models do not belong to nonlinear models. Polynomial models are still linear in the parameters
of the mean function. Nonlinear models refer to the nonlinear relationship between the response variable
and the fixed effect parameters (in other words, Y = β1*exp(β2*x1) + ε).


2.01 Multiple Choice Poll

Which of the following characteristics do general linear models and general
linear mixed models have in common?
a. Both models support fixed and random effects.
b. Both models can handle correlated error terms.
c. Both models assume that the error terms are normally distributed.
d. None of the above.

Estimation in Mixed Models for Fixed Effects

Variance-covariance matrix of the observations involves


• the covariance structure of the random effects, denoted as G
• the covariance structure of the random errors, denoted as R.
Ordinary least squares is no longer the best method.


Estimation is more difficult in the mixed model than in the general linear model. Not only do you have
fixed effects as in the general linear model, but you also have to estimate the random effects,
the covariance structure of the random effects, and the covariance structure of the random errors.
Ordinary least squares is no longer the best method because the distributional assumptions regarding
the random error terms are too restrictive. In other words, the parameter estimates are no longer the best
linear unbiased estimates.


Estimation in Mixed Models for Fixed Effects

Estimated generalized least squares (EGLS)

• takes into account the covariance structures G and R
• requires a reasonable estimate of G and R
• is the solution for fixed effects.

The formula for EGLS is

   β̂ = (X'V̂⁻¹X)⁻¹X'V̂⁻¹y

where V̂ = ZĜZ' + R̂.

Notice that EGLS requires the knowledge of G and R. Because you rarely have this information, the goal
becomes finding a reasonable estimate for G and R.

Estimation in General Linear Models versus Mixed Models

GLM           Assumes that errors are independent, normally distributed,
              and with common variance; estimates parameters and
              standard errors using OLS.

Mixed Model   Estimates G and R using a likelihood-based method;
              estimates parameters and standard errors using
              estimated GLS.

The parameters of the covariance matrices G and R must be estimated. After they are estimated, they are
substituted in place of the true parameter values in G and R to compute estimates of β and Var(β̂).


Maximum Likelihood Estimation


The maximum likelihood estimation method finds the parameter estimates that are most likely to occur
given the data. The parameter estimates are derived by maximizing the likelihood function, which
is a mathematical expression that describes the joint probability of obtaining the data expressed
as a function of the parameter estimates.

PROC MIXED implements two likelihood-based methods, maximum likelihood (ML) and restricted
maximum likelihood (REML), to estimate the parameters in G and R. The difference between ML and
REML is the construction of the likelihood function. REML constructs the likelihood based on residuals
and obtains maximum likelihood estimates of the variance components from this restricted/residual
likelihood function. However, the two methods are asymptotically equivalent and often give very
similar results.

Details
PROC MIXED constructs an objective function associated with ML or REML and maximizes it over all
unknown parameters. The corresponding log likelihood functions are as follows:
   ML:    l(G, R) = -(1/2) log|V| - (1/2) r'V⁻¹r - (n/2) log 2π

   REML:  l_R(G, R) = -(1/2) log|V| - (1/2) log|X'V⁻¹X| - (1/2) r'V⁻¹r - ((n - p)/2) log 2π

where r = y - X(X'V⁻¹X)⁻X'V⁻¹y and p = rank(X).


By default, PROC MIXED uses a ridge-stabilized Newton-Raphson algorithm to find the parameter
estimates that minimize –2 times the log likelihood functions.
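In PROC MIXED, the estimation method is selected with the METHOD= option in the PROC MIXED statement; METHOD=REML is the default. A sketch, with illustrative model terms:

```sas
/* METHOD=ML requests maximum likelihood; omit the option (or specify
   METHOD=REML) for restricted maximum likelihood, the default */
proc mixed data=long.heartrate method=ml;
   class drug patient;
   model heartrate = drug hours;
   repeated / type=cs subject=patient;
run;
```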


ML versus REML

• Both are based on the likelihood principle, which has the properties of
consistency, asymptotic normality, and efficiency.
• REML corrects for the downward bias in the ML parameters in G and R.
• REML handles strong correlations among the responses more effectively.
• REML is less sensitive to outliers in the data compared to ML.
• The differences between ML and REML estimation increase as the number
of fixed effects in the model increases and the number of subjects
decreases.


The distinction between ML and REML becomes important only when the number of fixed effects
is relatively large. In that case, the comparisons unequivocally favor REML. First, REML copes much
more effectively with strong correlations among the responses for the subjects than does ML. Second,
REML estimates do not have the downward bias that ML estimates have because REML estimators take
into account the degrees of freedom from the fixed effects in the model. Finally, REML estimators are
less sensitive to outliers in the data than ML estimators. In fact, when the estimates do vary substantially,
Diggle, Heagerty, Liang, and Zeger favor REML (2002).
There is also the noniterative MIVQUE0 method, which performs minimum variance quadratic unbiased
estimation of the covariance parameters. However, Swallow and Monahan (1984) present simulation
evidence favoring REML and ML over MIVQUE0. MIVQUE0 is generally not recommended except for
situations when the iterative REML and ML methods fail to converge and it is necessary to obtain
parameter estimates from a fitted model.


2.02 Multiple Choice Poll

Why is ordinary least squares not the preferred estimation method for fixed
effects in general linear mixed models?
a. Ordinary least squares does not support random effects.
b. Ordinary least squares does not support correlated error terms.
c. Ordinary least squares does not support nonnormal distribution of error
terms.
d. Both a and b.


Block-Diagonal Covariance Matrix

The slide shows a covariance matrix with blocks R1, R2, R3, and R4 along the diagonal and
zeros everywhere outside the blocks.

PROC MIXED requires that the data be structured so that each observation represents the measurement
for a subject at only one moment in time. Therefore, if Subject A had five repeated measurements, Subject
A would have five observations. An ID variable is needed to link the repeated measurements to the
subjects, and a time variable is needed to order the repeated measurements within each subject.


With repeated measures data using the SUBJECT= option in the REPEATED statement, the matrix R has
a block-diagonal covariance structure where the block corresponds to the covariance structure for each
subject. The observations within each block can take on a variety of covariance structures while the
observations outside of the blocks are assumed to be independent. In PROC MIXED, the blocks must
have the same structure but can have different parameter estimates.

Selecting the Appropriate Covariance Structure

When finding reasonable estimates for R,


• if you choose a structure that is too simple, then you risk increasing the
Type I error rate
• if you choose a structure that is too complex, then you sacrifice power and
efficiency.


The validity of the statistical inference of the general linear mixed model depends on the covariance
structure that you select for R. Therefore, a large amount of time spent on building the model is spent
on choosing a reasonable covariance structure for R.


Variance Component (VC) or Simple

               Time Point
           1      2      3      4
      | σ1²     0      0      0  |   1
      |  0     σ2²     0      0  |   2   Time
      |  0      0     σ3²     0  |   3   Point
      |  0      0      0     σ4² |   4

The simplest covariance structure is the independent or variance component model, where the within-
subject error correlation is zero. This is the default structure for both the RANDOM and REPEATED
statements. For the between-subject errors, the simple covariance structure might be a reasonable
assumption. However, for the within-subject errors, the simple covariance structure might be a reasonable
choice if the repeated measurements occurred at long enough intervals so that the correlation is
effectively zero relative to other variation.

Compound Symmetry

                Time Point
             1      2      3      4
        |  1.0    ρ      ρ      ρ   |   1
   σ² × |   ρ    1.0     ρ      ρ   |   2   Time
        |   ρ     ρ     1.0     ρ   |   3   Point
        |   ρ     ρ      ρ     1.0  |   4


The covariance structure with the simplest correlation model is the compound symmetry structure.
It assumes that the correlation (ρ) is constant regardless of the distance between the time points. This
is the assumption that univariate ANOVA makes, but it is usually not a reasonable choice in longitudinal
data analysis. However, this covariance structure might be reasonable when the repeated measurements
are not obtained over time. For example, the compound symmetry covariance structure might be a good
choice if the independent experimental units were classrooms and the responses obtained were from each
student in the classroom (Davis 2002).
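In PROC MIXED, compound symmetry is requested with TYPE=CS in the REPEATED statement. This sketch reuses the heart rate data purely to show the syntax; the R and RCORR options print the estimated R matrix and its correlation form for the first subject.

```sas
proc mixed data=long.heartrate;
   class drug patient;
   model heartrate = drug hours;
   /* TYPE=CS: one variance and one constant correlation for all
      pairs of time points within a patient */
   repeated / type=cs subject=patient r rcorr;
run;
```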

2.03 Multiple Choice Poll

Which one of the following statements is true regarding the restricted
maximum likelihood (REML) method?
a. REML handles strong correlations among the responses less effectively
   than maximum likelihood.
b. REML parameter estimates have a downward bias that the maximum
   likelihood parameter estimates do not have.
c. REML parameter estimates approximate maximum likelihood parameter
   estimates as the number of fixed effects becomes large.
d. REML parameter estimates are less sensitive to outliers in the data than
   maximum likelihood parameter estimates.



Unstructured Covariance

              Time Point
          1      2      3      4
     |  σ1²    σ12    σ13    σ14 |   1
     |  σ12    σ2²    σ23    σ24 |   2   Time
     |  σ13    σ23    σ3²    σ34 |   3   Point
     |  σ14    σ24    σ34    σ4² |   4

The unstructured covariance structure is parameterized directly in terms of variances and covariances
where the observations for each pair of times have their own unique correlations. The variances are
constrained to be nonnegative and the covariances are unconstrained. This is the covariance structure used
in multivariate ANOVA.

The correlation coefficient for row 1, column 2 is

   ρ12 = σ12 / sqrt(σ1² * σ2²)
There are two potential problems with using the unstructured covariance. First, it requires the estimation
of a large number of variance and covariance parameters ( t(t+1)/2 ). This can lead to severe computational
problems, especially with unbalanced data. Second, it does not exploit the existence of trends in variances
and covariances over time, and this can result in erratic patterns of standard error estimates (Littell et al.
1998). If a simpler covariance structure is a reasonable alternative, then the unstructured covariance
structure wastes a great deal of information, which would adversely affect efficiency and power.

Although the unstructured covariance structure does not require equal spacing among the time points,
the structure is not appropriate for the R matrix in the CD4+ cell count example because the spacing
between time points is different across subjects. For example, the time interval between the first and
second measurements for Subject 1 might be different from the time interval for Subject 2. The time
interval between measurements can be different within the subjects (time between first and second
measurements can be different from time between second and third measurements), but the time interval
between specific measurements (first and second, second and third, and so on) must be the same across all
subjects.
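When the spacing requirement is met, the unstructured R matrix is requested with TYPE=UN. A syntax sketch, with illustrative model terms:

```sas
/* TYPE=UN: a separate variance for each time point and a separate
   covariance for each pair of time points, t(t+1)/2 parameters in all */
proc mixed data=long.heartrate;
   class drug patient;
   model heartrate = drug hours;
   repeated / type=un subject=patient r;
run;
```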


First-Order Autoregressive AR(1)

                Time Point
             1      2      3      4
        |  1.0    ρ      ρ²     ρ³  |   1
   σ² × |   ρ    1.0     ρ      ρ²  |   2   Time
        |   ρ²    ρ     1.0     ρ   |   3   Point
        |   ρ³    ρ²     ρ     1.0  |   4

The first-order autoregressive covariance structure takes into account a common trend in longitudinal
data; the correlation between observations is a function of the number of time points apart. In this
structure, the correlation between adjacent observations is ρ, regardless of whether the pair
of observations is the first and second pair, the second and third pair, and so on. The correlation is ρ² for
any pair of observations two units apart, and ρ^d for any pair of observations d units apart. Notice that
the AR(1) model requires estimates for just two parameters, σ² and ρ, whereas the unstructured model
requires estimates for T(T+1)/2 parameters (where T is the number of time points). One shortcoming
is that the correlation decays very quickly as the spacing between measurements increases (Davis 2002).

The assumption in the AR(1) model is that the longitudinal data are equally spaced (Littell et al. 1996).
This means that the distance between time 1 and 2 is the same as time 2 and 3, time 3 and 4, and
so on. The AR(1) structure also assumes that the correlation structure does not change appreciably over
time (Littell et al. 2002). Therefore, the AR(1) structure might not be appropriate for the CD4+ cell study
because the repeated measures are unequally spaced.

In some circumstances the AR(1) model might be justified empirically where the observations are not
evenly spaced. When the adjoining observations show similar covariances, despite unequal time periods,
with exponentially decreasing covariances for increasingly separated measurement time points, then
the AR(1) model might be warranted (Brown and Prescott 2001). However, these circumstances are
unlikely for the CD4+ cell study.
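The structure is requested with TYPE=AR(1). A syntax-only sketch; recall that the heart rate measurements are unequally spaced, so AR(1) would need the kind of empirical justification described above before being used on those data.

```sas
/* TYPE=AR(1): correlation rho between adjacent measurements and
   rho**d between measurements d time points apart; only two
   covariance parameters (sigma**2 and rho) are estimated */
proc mixed data=long.heartrate;
   class drug patient;
   model heartrate = drug hours;
   repeated / type=ar(1) subject=patient rcorr;
run;
```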

Copyright © 2017, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
2-18 Chapter 2 Longitudinal Data Analysis with Continuous Responses

Toeplitz

For four time points, the Toeplitz covariance matrix is

$$
\sigma^2
\begin{bmatrix}
1      & \rho_1 & \rho_2 & \rho_3 \\
\rho_1 & 1      & \rho_1 & \rho_2 \\
\rho_2 & \rho_1 & 1      & \rho_1 \\
\rho_3 & \rho_2 & \rho_1 & 1
\end{bmatrix}
$$


The Toeplitz covariance structure is similar to the AR(1) covariance structure in that pairs
of observations separated by a common distance share the same correlation. However, observations d
units apart have correlation ρ_d instead of ρ^d. The Toeplitz structure therefore requires the estimation
of T parameters instead of just two.


You can also specify a banded Toeplitz structure in which you specify the number of bands within which
measurements are still correlated. For example, a TOEP(3) (Toeplitz with 3 bands) structure has the main
diagonal plus two off-diagonal bands, indicating that measurements are correlated if they are two or fewer
time points apart; if they are three or more time points apart, the correlation is zero.
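In PROC MIXED, the full and banded Toeplitz structures are requested with TYPE=TOEP and TYPE=TOEP(q), respectively. A minimal sketch (the data set and variable names are hypothetical):

```sas
/* Illustrative sketch (hypothetical data set and variables):
   banded Toeplitz; correlations beyond the specified bands are zero */
proc mixed data=growth;
   class id;
   model y = time / solution ddfm=kr;
   repeated / type=toep(3) subject=id rcorr;
run;
```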

As with the AR(1) structure, the Toeplitz structure assumes that the observations are equally spaced
and the correlation structure does not change appreciably over time (Littell et al. 2002). Therefore,
the Toeplitz covariance structure is not an appropriate structure for the CD4+ cell study.


Spatial Power

For four measurements taken at times t1, t2, t3, and t4, the spatial power covariance matrix is

$$
\sigma^2
\begin{bmatrix}
1 & \rho^{|t_1-t_2|} & \rho^{|t_1-t_3|} & \rho^{|t_1-t_4|} \\
\rho^{|t_1-t_2|} & 1 & \rho^{|t_2-t_3|} & \rho^{|t_2-t_4|} \\
\rho^{|t_1-t_3|} & \rho^{|t_2-t_3|} & 1 & \rho^{|t_3-t_4|} \\
\rho^{|t_1-t_4|} & \rho^{|t_2-t_4|} & \rho^{|t_3-t_4|} & 1
\end{bmatrix}
$$


The spatial covariance structures allow for unequal spacing. These structures are mainly used
in geostatistical models, but they are very useful for unequally spaced longitudinal measurements
where the correlations decline as a function of time. The connection between geostatistics and
longitudinal data is that the unequally spaced data can be viewed as a spatial process in one
dimension (Littell et al. 1996).

The spatial power structure provides a direct generalization of the AR(1) structure (which requires
equally spaced data) to arbitrary spacing. Only two parameters are estimated (σ² and ρ).


Spatial Gaussian

For four measurements taken at times t1, t2, t3, and t4, the spatial Gaussian covariance matrix is

$$
\sigma^2
\begin{bmatrix}
1 & e^{-(t_1-t_2)^2/\rho^2} & e^{-(t_1-t_3)^2/\rho^2} & e^{-(t_1-t_4)^2/\rho^2} \\
e^{-(t_1-t_2)^2/\rho^2} & 1 & e^{-(t_2-t_3)^2/\rho^2} & e^{-(t_2-t_4)^2/\rho^2} \\
e^{-(t_1-t_3)^2/\rho^2} & e^{-(t_2-t_3)^2/\rho^2} & 1 & e^{-(t_3-t_4)^2/\rho^2} \\
e^{-(t_1-t_4)^2/\rho^2} & e^{-(t_2-t_4)^2/\rho^2} & e^{-(t_3-t_4)^2/\rho^2} & 1
\end{bmatrix}
$$


The spatial Gaussian structure is a frequently used covariance structure for unequally spaced
measurements. The spatial covariance structures differ in the assumptions made about how the correlation
between the error terms decreases as the length of the time interval increases. To determine which
correlation function best fits your data, you can use the sample variogram (which will be discussed
in a later section).

Note: Other spatial structures used later in the course include spatial linear, spatial exponential,
and spatial spherical.
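In PROC MIXED, the spatial structures are requested with TYPE=SP(structure)(time-list), where the listed variable supplies the distances between measurements. A minimal sketch for the spatial Gaussian structure (the data set and variable names are hypothetical):

```sas
/* Illustrative sketch (hypothetical data set and variables):
   spatial Gaussian errors with distances computed from time */
proc mixed data=growth;
   model y = time / solution ddfm=kr(firstorder);
   repeated / type=sp(gau)(time) subject=id rcorr;
run;
```

Substituting sp(pow)(time), sp(exp)(time), or sp(sph)(time) requests the spatial power, exponential, or spherical structures instead.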


2.04 Multiple Choice Poll

Which one of the following covariance structures is not appropriate for unequally spaced time points
in a balanced design?
a. AR(1)
b. Compound Symmetry
c. Unstructured
d. Spatial Power


Issues with Degrees of Freedom Estimates

• For unbalanced data, the denominator degrees of freedom must be estimated from the data.
• The default degrees of freedom provided by the MIXED procedure might
not always be appropriate.
• The Kenward-Roger (KR) method of computing the denominator degrees of
freedom is recommended as the standard operating procedure for
longitudinal models.


PROC MIXED provides the following methods for estimating the approximate denominator degrees
of freedom: containment, between-within, residual, Satterthwaite, and Kenward-Roger (DDFM=KENWARDROGER).
The Kenward-Roger method is considered by many to be the most appropriate for longitudinal models.


Kenward-Roger DF Adjustment

The Kenward-Roger (KR) method of computing the denominator degrees of freedom
• adjusts the variance-covariance matrix of fixed and random effects
• was shown to be superior or, at worst, equal to the other degrees of
freedom adjustments in terms of Type I error control
• should be used for the more complex covariance structures because the
Type I error rate inflation was extremely severe without the KR adjustment


The Kenward-Roger degrees of freedom adjustment uses an approximation that involves inflating
the estimated variance-covariance matrix of the fixed and random effects. Satterthwaite-type degrees
of freedom are then computed based on this adjustment. By default, the observed information matrix
of the covariance parameter estimates is used in the calculations.

Note: The KENWARDROGER method uses more computer resources. It can require a long time and
extensive memory for large data sets.

Note: In a simulation study performed by Guerin and Stroup (2000), the Kenward-Roger degrees
of freedom adjustment was shown to be superior or, at worst, equal to the Satterthwaite and
default DDFM options. They strongly recommend the KR adjustment as the standard operating
procedure for longitudinal models.


Kenward-Roger DF Adjustment

The KR adjustment might have undesirable consequences when covariance matrices have nonzero
second derivatives.
• Adjustment can lead to shrinkage of standard errors.
• An adjusted covariance matrix might not be positive definite.
• Results are not invariant under reparameterization.


For covariance structures that have nonzero second derivatives with respect to the covariance parameters,
the Kenward-Roger covariance matrix adjustment includes a second-order term. This term can result
in standard error shrinkage. Also, the resulting adjusted covariance matrix can then be indefinite
and is not invariant under reparameterization.

Note: The following are examples of covariance structures that generally lead to nonzero second
derivatives: First-order antedependence (TYPE=ANTE(1)), First-order autoregressive
(TYPE=AR(1)), Heterogeneous AR(1) (TYPE=ARH(1)), First-order autoregressive
moving average (TYPE=ARMA(1,1)), Heterogeneous CS (TYPE=CSH), Factor-Analytic
(TYPE=FA), No Diagonal Factor-Analytic (TYPE=FA0( )), Heterogeneous Toeplitz
(TYPE=TOEPH), Unstructured Correlations (TYPE=UNR), and all Spatial covariance structures
(TYPE=SP()).


DDFM=KR(FIRSTORDER)

The FIRSTORDER suboption
• eliminates the second derivatives from the calculation of the covariance
matrix adjustment
• might be preferred for covariance structures that have nonzero second
derivatives such as the spatial covariance structures.


The FIRSTORDER suboption of the DDFM=KR option is recommended for the spatial covariance
structures because these covariance structures generally lead to nonzero second derivatives.


Fitting a Longitudinal Model in PROC MIXED

Example: Fit a longitudinal model to the long.aids data set. Rescale the response variable by dividing
CD4 by 100. Include all the two-factor interactions with time and the time quadratic and cubic
effects. Use the Kenward-Roger degrees of freedom calculations and use the compound
symmetry covariance structure.
/* long02d01.sas */
data aids;
   set long.aids;
   cd4_scale=cd4/100;
run;
A common recommendation is to rescale the response and explanatory variables if they have relatively
large values compared to the other variables in the model. This creates a more stable model and decreases
the likelihood of convergence problems in PROC MIXED. Because the response variable CD4 has
relatively large values, a new rescaled variable was created. If time were measured in days, then that
variable would also be rescaled.
/* The program below assumes the data is sorted by id and time */
proc mixed data=aids;
   model cd4_scale=time age cigarettes drug partners
         depression time*age time*depression
         time*partners time*drug time*cigarettes
         time*time time*time*time
         / solution ddfm=kr;
   repeated / type=cs subject=id r rcorr;
   title 'Longitudinal Model with Compound Symmetry '
         'Covariance Structure';
run;
Selected MODEL statement options:
DDFM=KR performs the degrees of freedom calculations proposed by Kenward and Roger (1997).
SOLUTION requests estimates for all fixed effects in the model, together with the standard errors,
t-statistics, and p-values.

Selected REPEATED statement options:
R requests that the residual covariance matrix (R matrix) be displayed. By default, the
covariance matrix for the first subject is displayed. You can also request covariance
matrices for specific subjects.
RCORR requests the correlation matrix corresponding to the blocks of the estimated
covariance matrix be displayed. By default, the correlation matrix for the first subject
is displayed. You can also request correlation matrices for specific subjects.
SUBJECT=ID identifies the subjects in the mixed model. This defines the block diagonality
of the covariance matrix. The identification variable can be continuous or categorical.


TYPE= specifies the covariance structure for the error components. The default structure is
the simple or variance components structure.

Note: When the subject’s identification number is treated as continuous, PROC MIXED considers
a record to be from a new subject whenever the value of the identification number changes from
the previous record. Therefore, you should first sort the data by the values of the identification
number if they are not already sorted. The long.aids data set is sorted by ID. Using a continuous
ID variable reduces the execution time for models with a large number of subjects.
No repeated effects are specified in the REPEATED statement because the data are similarly
ordered within each subject and there are no missing time values. If the measurements were not
similarly ordered within subject, then the time variable would have to be used as the repeated
effect. If there were missing measurements, then you must indicate all missing response variable
values with periods in the data set unless they all fall at the end of the subject’s response profile.
This requirement is necessary in order to inform PROC MIXED of the proper location of the
observed repeated responses.
Repeated effects must be classification variables, so you could use two versions of the time
variable. A continuous time could be used in the MODEL statement as well as the RANDOM
statement, and a classification time could be used in the REPEATED statement.
Longitudinal Model with Compound Symmetry Covariance Structure

The Mixed Procedure

Model Information

Data Set WORK.AIDS
Dependent Variable cd4_scale
Covariance Structure Compound Symmetry
Subject Effect id
Estimation Method REML
Residual Variance Method Profile
Fixed Effects SE Method Kenward-Roger
Degrees of Freedom Method Kenward-Roger

The Model Information table shows the name of the data set, the dependent variable, the covariance
structure used in the model, the subject effect, the estimation method to compute the parameters for the
covariance structure, and the method to compute the degrees of freedom. The default estimation method
is REML. The METHOD= option can be used in the PROC MIXED statement to specify other estimation
methods.

There are four methods for handling the residual variance in the model. The profile method factors
the residual variance out of the optimization problem, whereas the fit method retains the residual variance
as a parameter in the optimization. The factor method keeps the residual variance fixed, and none is
displayed when a residual variance is not part of the model. The NOPROFILE option in the PROC MIXED
statement changes the method, subject to the chosen covariance structure.

The fixed effects standard error method describes the method used to compute the approximate standard
errors for the fixed-effects parameter estimates and related functions of them. The default method can be
changed using the EMPIRICAL option in the PROC MIXED statement. This option requests robust
standard errors obtained by using the sandwich estimator, which has been shown to be consistent
as long as the mean model is correctly specified. However, if there are any missing observations, the
EMPIRICAL option provides valid inferences for the fixed effects only under the MCAR assumption.


The EMPIRICAL option is not used here because it cannot be used with the Kenward-Roger degrees
of freedom calculation.
Dimensions

Covariance Parameters 2
Columns in X 14
Columns in Z 0
Subjects 369
Max Obs per Subject 12

Number of Observations

Number of Observations Read 2376
Number of Observations Used 2376
Number of Observations Not Used 0

The Dimensions table lists the sizes of the relevant matrices. This table can be useful in determining CPU
time and memory requirements. The Number of Observations table shows the number of observations
read, used, and not used. Because there are no missing observations, all the observations are used.
Iteration History

Iteration Evaluations -2 Res Log Like Criterion

0 1 12668.04910184
1 2 11846.03145506 0.00000217
2 1 11846.02324942 0.00000000

Convergence criteria met.

The Iteration History table describes the optimization of the residual log likelihood. The minimization
is performed using a ridge-stabilized Newton-Raphson algorithm, and the rows of the table describe
the iterations that this algorithm takes in order to minimize the objective function.
Estimated R Matrix for Subject 1

Row Col1 Col2 Col3

1 12.0198 5.7939 5.7939
2 5.7939 12.0198 5.7939
3 5.7939 5.7939 12.0198

Because the R option is used in the REPEATED statement, the residual covariance matrix is displayed for
the first subject by default. The diagonal shows the variance while the off-diagonals show the covariances.
Estimated R Correlation
Matrix for Subject 1

Row Col1 Col2 Col3

1 1.0000 0.4820 0.4820
2 0.4820 1.0000 0.4820
3 0.4820 0.4820 1.0000

The RCORR option displays the correlation matrix for the first subject. The estimated correlation among
the measurements is 0.4820. The correlations are the same regardless of which pair of measurements is
examined because the compound symmetry covariance structure was requested.


Covariance Parameter Estimates

Cov Parm Subject Estimate

CS id 5.7939
Residual 6.2259

The Covariance Parameter Estimates table shows the parameter estimates for the compound symmetry
covariance structure. In this example, the estimated covariance is 5.7939 and the estimated residual
variance is 6.2259. Adding the values together gives the estimated variance (12.0198).
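Under compound symmetry, these two estimates also determine the constant within-subject correlation reported in the R correlation matrix:

```latex
\rho_{CS} = \frac{\sigma_{CS}^{2}}{\sigma_{CS}^{2} + \sigma^{2}}
          = \frac{5.7939}{5.7939 + 6.2259}
          = \frac{5.7939}{12.0198} \approx 0.4820
```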
Fit Statistics

-2 Res Log Likelihood 11846.0
AIC (Smaller is Better) 11850.0
AICC (Smaller is Better) 11850.0
BIC (Smaller is Better) 11857.8

The Fit Statistics table provides information that you can use to select the most appropriate covariance
structure. Akaike’s Information Criterion (AIC) (Akaike 1974) penalizes the –2 residual log likelihood
by twice the number of covariance parameters in the model; the smaller the value, the better the model.
The finite-sample corrected version of the AIC (AICC) is also included; for small sample sizes, the AICC
is recommended over the AIC. Schwarz’s Bayesian Information Criterion (BIC) (Schwarz 1978) also
penalizes the –2 residual log likelihood, but the penalty is more severe. Therefore, BIC tends to choose
less complex models than AIC or AICC.
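For this model, with q = 2 covariance parameters and m = 369 subjects (PROC MIXED bases the BIC penalty on the number of subjects when a SUBJECT= effect is specified), the reported values can be reproduced up to rounding:

```latex
\mathrm{AIC} = -2\,\mathrm{RLL} + 2q       = 11846.0 + 2(2)      = 11850.0 \\
\mathrm{BIC} = -2\,\mathrm{RLL} + q\,\ln m = 11846.0 + 2\ln(369) \approx 11857.8
```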
Null Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq

1 822.03 <.0001

The Null Model Likelihood Ratio Test table shows a test that determines whether it is necessary to model
the covariance structure of the data at all. The test statistic is –2 times the log likelihood from the null
model (model with an independent covariance structure) minus –2 times the log likelihood from the fitted
model. The p-value can be used to assess the significance of the model fit.
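The statistic in the table is the difference between the −2 residual log likelihoods of the null (independence) model and the fitted model, with degrees of freedom equal to the number of extra covariance parameters:

```latex
\chi^{2} = 12668.05 - 11846.02 = 822.03, \qquad df = 2 - 1 = 1
```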
Solution for Fixed Effects

Standard
Effect Estimate Error DF t Value Pr > |t|

Intercept 8.0594 0.2362 1076 34.13 <.0001
time -1.0430 0.08359 2249 -12.48 <.0001
age 0.01554 0.01885 375 0.82 0.4104
cigarettes 0.4605 0.07203 1328 6.39 <.0001
drug 0.1295 0.2017 2339 0.64 0.5209
partners 0.03450 0.02237 2360 1.54 0.1231
depression -0.02638 0.008662 2326 -3.05 0.0024
time*age -0.01598 0.004560 2258 -3.50 0.0005
time*depression 0.000784 0.003357 2234 0.23 0.8153
time*partners -0.00584 0.009560 2230 -0.61 0.5410
time*drug -0.04277 0.07641 2233 -0.56 0.5757
time*cigarettes -0.1520 0.02454 2244 -6.19 <.0001
time*time -0.1518 0.02400 2149 -6.32 <.0001
time*time*time 0.05458 0.006254 2119 8.73 <.0001


The SOLUTION option in the MODEL statement requested a table for the fixed effects parameter
estimates. Notice that the quadratic and cubic time effects are significant (which agrees with the average
trend curve of the CD4+ cell count) and the time*age and time*cigarettes interactions are significant.
Type 3 Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

time 1 2249 155.67 <.0001
age 1 375 0.68 0.4104
cigarettes 1 1328 40.88 <.0001
drug 1 2339 0.41 0.5209
partners 1 2360 2.38 0.1231
depression 1 2326 9.27 0.0024
time*age 1 2258 12.27 0.0005
time*depression 1 2234 0.05 0.8153
time*partners 1 2230 0.37 0.5410
time*drug 1 2233 0.31 0.5757
time*cigarettes 1 2244 38.37 <.0001
time*time 1 2149 39.99 <.0001
time*time*time 1 2119 76.17 <.0001

The Type 3 Tests of Fixed Effects table shows the hypothesis tests for the significance of each of the fixed
effects. A p-value is computed from an F distribution with the numerator and denominator degrees
of freedom. You can use the HTYPE= option in the MODEL statement to obtain tables of Type I
(sequential) tests and Type II (adjusted) tests in addition to or instead of the table of Type III (partial)
tests. You can also use the CHISQ option to obtain Wald chi-square tests of the fixed effects.

Example: Fit a longitudinal model using the spatial power covariance structure and use the FIRSTORDER
suboption in the Kenward-Roger degrees of freedom adjustment. Request the covariance
matrix and the correlation matrix for the 13th subject.
proc mixed data=aids;
   model cd4_scale=time age cigarettes drug partners depression
         time*age time*depression time*partners time*drug
         time*cigarettes time*time time*time*time
         / solution ddfm=kr(firstorder);
   repeated / type=sp(pow)(time) local subject=id r=13 rcorr=13;
   title 'Longitudinal Model with Spatial Power Covariance Structure';
run;
Selected REPEATED statement option:
LOCAL adds a measurement error component to the serial correlation component.
This option is useful when you model a time series covariance structure.

Selected DDFM= suboption:
FIRSTORDER eliminates the second derivatives from the calculation of the covariance matrix
adjustment.

Note: The variable time in the TYPE= option is used to calculate the time differences between repeated
measurements.


Longitudinal Model with Spatial Power Covariance Structure

The Mixed Procedure

Model Information

Data Set WORK.AIDS
Dependent Variable cd4_scale
Covariance Structure Spatial Power
Subject Effect id
Estimation Method REML
Residual Variance Method Profile
Fixed Effects SE Method Prasad-Rao-Jeske-Kackar-Harville
Degrees of Freedom Method Kenward-Roger

Dimensions

Covariance Parameters 3
Columns in X 14
Columns in Z 0
Subjects 369
Max Obs per Subject 12

Number of Observations

Number of Observations Read 2376
Number of Observations Used 2376
Number of Observations Not Used 0

Iteration History

Iteration Evaluations -2 Res Log Like Criterion

0 1 12668.04910184
1 3 11883.08815296 0.32992483
2 1 11881.79852820 0.00348677
3 2 11864.84042331 0.10490545
4 2 11801.90993395 2.88713335
5 2 11734.85393060 0.00204795
6 2 11731.57580732 0.00054912
7 1 11729.33587289 0.00001849
8 1 11729.26578521 0.00000003
9 1 11729.26567357 0.00000000

Convergence criteria met.


Estimated R Matrix for Subject 13

Row Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9

1 12.1853 7.2673 6.7060 6.2226 5.7297 5.2850 4.8914 4.5252 4.1739
2 7.2673 12.1853 7.2487 6.7261 6.1934 5.7126 5.2872 4.8914 4.5117
3 6.7060 7.2487 12.1853 7.2891 6.7118 6.1908 5.7297 5.3008 4.8893
4 6.2226 6.7261 7.2891 12.1853 7.2332 6.6717 6.1749 5.7126 5.2692
5 5.7297 6.1934 6.7118 7.2332 12.1853 7.2456 6.7060 6.2040 5.7224
6 5.2850 5.7126 6.1908 6.6717 7.2456 12.1853 7.2704 6.7261 6.2040
7 4.8914 5.2872 5.7297 6.1749 6.7060 7.2704 12.1853 7.2673 6.7032
8 4.5252 4.8914 5.3008 5.7126 6.2040 6.7261 7.2673 12.1853 7.2456
9 4.1739 4.5117 4.8893 5.2692 5.7224 6.2040 6.7032 7.2456 12.1853
10 3.8615 4.1739 4.5233 4.8747 5.2940 5.7396 6.2013 6.7032 7.2673
11 3.5831 3.8730 4.1972 4.5233 4.9124 5.3258 5.7543 6.2199 6.7434
12 3.3050 3.5724 3.8714 4.1722 4.5310 4.9124 5.3076 5.7371 6.2199

Estimated R Matrix for Subject 13

Row Col10 Col11 Col12

1 3.8615 3.5831 3.3050
2 4.1739 3.8730 3.5724
3 4.5233 4.1972 3.8714
4 4.8747 4.5233 4.1722
5 5.2940 4.9124 4.5310
6 5.7396 5.3258 4.9124
7 6.2013 5.7543 5.3076
8 6.7032 6.2199 5.7371
9 7.2673 6.7434 6.2199
10 12.1853 7.2891 6.7233
11 7.2891 12.1853 7.2456
12 6.7233 7.2456 12.1853

The Estimated R Matrix table shows the residual covariance matrix for the 13th subject, who had 12
repeated measurements.
Estimated R Correlation Matrix for Subject 13

Row Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9

1 1.0000 0.5964 0.5503 0.5107 0.4702 0.4337 0.4014 0.3714 0.3425
2 0.5964 1.0000 0.5949 0.5520 0.5083 0.4688 0.4339 0.4014 0.3703
3 0.5503 0.5949 1.0000 0.5982 0.5508 0.5080 0.4702 0.4350 0.4012
4 0.5107 0.5520 0.5982 1.0000 0.5936 0.5475 0.5067 0.4688 0.4324
5 0.4702 0.5083 0.5508 0.5936 1.0000 0.5946 0.5503 0.5091 0.4696
6 0.4337 0.4688 0.5080 0.5475 0.5946 1.0000 0.5967 0.5520 0.5091
7 0.4014 0.4339 0.4702 0.5067 0.5503 0.5967 1.0000 0.5964 0.5501
8 0.3714 0.4014 0.4350 0.4688 0.5091 0.5520 0.5964 1.0000 0.5946
9 0.3425 0.3703 0.4012 0.4324 0.4696 0.5091 0.5501 0.5946 1.0000
10 0.3169 0.3425 0.3712 0.4000 0.4345 0.4710 0.5089 0.5501 0.5964
11 0.2941 0.3178 0.3444 0.3712 0.4031 0.4371 0.4722 0.5104 0.5534
12 0.2712 0.2932 0.3177 0.3424 0.3718 0.4031 0.4356 0.4708 0.5104


Estimated R Correlation
Matrix for Subject 13

Row Col10 Col11 Col12

1 0.3169 0.2941 0.2712
2 0.3425 0.3178 0.2932
3 0.3712 0.3444 0.3177
4 0.4000 0.3712 0.3424
5 0.4345 0.4031 0.3718
6 0.4710 0.4371 0.4031
7 0.5089 0.4722 0.4356
8 0.5501 0.5104 0.4708
9 0.5964 0.5534 0.5104
10 1.0000 0.5982 0.5517
11 0.5982 1.0000 0.5946
12 0.5517 0.5946 1.0000

The Estimated R Correlation Matrix table shows the correlation matrix for the 13th subject. Notice how
the correlation coefficients decrease as the time interval increases.
Covariance Parameter Estimates

Cov Parm Subject Estimate

Variance id 7.8554
SP(POW) id 0.8554
Residual 4.3300

The estimated correlation coefficient used in the spatial power covariance structure is 0.8554.
The LOCAL option adds an additional variance parameter (labeled “Variance”). The parameter labeled
“Residual” represents the measurement error.
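With the LOCAL option, the implied covariance between measurements j and k on the same subject combines the serial and measurement-error components:

```latex
\mathrm{Cov}(Y_{ij}, Y_{ik}) = \sigma^{2}\rho^{\,|t_{ij}-t_{ik}|} + \sigma_{me}^{2}\,1\{j=k\}
                             = 7.8554\,(0.8554)^{\,|t_{ij}-t_{ik}|} + 4.3300\,1\{j=k\}
```

so the total variance is 7.8554 + 4.3300 ≈ 12.185, matching (up to rounding) the diagonal of the estimated R matrix.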
Fit Statistics

-2 Res Log Likelihood 11729.3
AIC (Smaller is Better) 11735.3
AICC (Smaller is Better) 11735.3
BIC (Smaller is Better) 11747.0

The AIC and BIC values are lower than those for the model using the compound symmetry covariance
structure (for the AIC, 11735.3 versus 11850.0).
Null Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq

2 938.78 <.0001

The model with the spatial power covariance structure is significantly different from the model with
the independent covariance structure.


Solution for Fixed Effects

Standard
Effect Estimate Error DF t Value Pr > |t|

Intercept 8.0939 0.2434 1100 33.25 <.0001
time -1.1385 0.1007 991 -11.30 <.0001
age 0.01736 0.01918 385 0.90 0.3661
cigarettes 0.4203 0.07447 1297 5.64 <.0001
drug 0.1522 0.2034 2331 0.75 0.4544
partners 0.04586 0.02291 2245 2.00 0.0454
depression -0.02620 0.008670 2338 -3.02 0.0025
time*age -0.01451 0.006072 617 -2.39 0.0172
time*depression 0.001513 0.003823 1644 0.40 0.6924
time*partners -0.01312 0.01060 1790 -1.24 0.2161
time*drug 0.01618 0.08757 1616 0.18 0.8535
time*cigarettes -0.1383 0.02984 1032 -4.63 <.0001
time*time -0.1753 0.02758 966 -6.35 <.0001
time*time*time 0.06103 0.006930 1114 8.81 <.0001

Type 3 Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

time 1 991 127.79 <.0001
age 1 385 0.82 0.3661
cigarettes 1 1297 31.85 <.0001
drug 1 2331 0.56 0.4544
partners 1 2245 4.01 0.0454
depression 1 2338 9.13 0.0025
time*age 1 617 5.71 0.0172
time*depression 1 1644 0.16 0.6924
time*partners 1 1790 1.53 0.2161
time*drug 1 1616 0.03 0.8535
time*cigarettes 1 1032 21.47 <.0001
time*time 1 966 40.38 <.0001
time*time*time 1 1114 77.57 <.0001

The inferences for the fixed effects in the model using the spatial power covariance structure are similar
to the model using the compound symmetry covariance structure.


Exercises

A pharmaceutical firm conducted a clinical trial to examine heart rates among patients. Each patient was
subjected to one of three possible drug treatment levels: drug a, drug b, and a placebo. A baseline
measurement was taken and the heart rates were recorded at five unequally spaced time intervals:
1 minute, 5 minutes, 15 minutes, 30 minutes, and 1 hour. The data are stored in the SAS data set
long.heartrate.
These are the variables in the data set:
heartrate  heart rate
patient    patient identification number
drug       drug treatment level (a, b, and p)
hours      time point heart rate was recorded (0.01677, 0.08333, 0.25000, 0.5000, 1.000)
baseline   baseline heart rate
1. Fitting a General Linear Mixed Model
a. Fit a general linear mixed model with the three main effects, the three two-factor interactions,
and the quadratic and cubic effects of hours. Request the parameter estimates and the Kenward-
Roger method for computing the degrees of freedom. In the REPEATED statement, request
the unstructured covariance structure and the R matrix along with the correlations computed from
the R matrix.
1) Is the unstructured covariance structure legitimate in this example?
2) What does the R matrix represent?
3) What does the R correlation matrix represent? What is the general pattern among
the correlations?
4) Interpret the results of the null likelihood ratio test.
5) Are there any higher-order terms significant at the 0.05 level?
b. Fit the same model but with the compound symmetry covariance structure.
1) Is the compound symmetry covariance structure legitimate in this example?

2) Why is the AICC statistic much lower for the model with the compound symmetry covariance
structure compared to the model with the unstructured covariance structure?
3) Are there differences in the inferences for the fixed effects compared to the model with
the unstructured covariance structure? What is a possible reason for these differences?
c. Fit the same model but with the spatial power covariance structure. Because you are using
the spatial power covariance structure, add a measurement error component, and use
the FIRSTORDER suboption.

1) Interpret the covariance parameter estimates.


2) Why is the AICC statistic lower for this model compared to the model with compound
symmetry covariance structure and the model with the unstructured covariance structure?
3) Are there differences in the inferences for the fixed effects compared to the model with
the compound symmetry covariance structure? What is a possible reason for these
differences?


2.05 Multiple Choice Poll

What happens to the inferences in the model when a covariance structure is too complex given
the relationships in the data?
a. The inferences have a larger Type I error rate.
b. The inferences have less power and efficiency.
c. The inferences are biased.
d. The inferences are not affected.


2.2 Evaluating Covariance Structures

Objectives

• Learn the concepts regarding the sample variogram.
• Create a plot of a sample variogram.
• Plot the goodness-of-fit statistics for the appropriate covariance structures.



Importance of Covariance Structures

Covariance structures
• model all the variability in the data, which cannot be explained by the fixed
effects
• represent the background variability that the fixed effects are tested against
• must be carefully selected to obtain valid inferences for the parameters of
the fixed effects.


Obtaining valid inferences in a mixed model is much more complex than in a general linear model.
For example, inferences are obtained in the GLM procedure by testing the fixed effects against the error
variance (residual variance). However, in PROC MIXED the inferences are obtained by testing the fixed
effects against the appropriate background variability, which is modeled by the covariance structure. This
background variability might consist of several sources of error, so selecting the appropriate covariance
structure is not a trivial task.

Sources of Error

Random Effects      Reflects how much subject-specific profiles deviate from the average
                    profile (the between-subject variability).
Serial Correlation  Usually a decreasing function of the time separation between
                    measurements (the within-subject variability).
Measurement Error   For some measurements, there might be a certain level of variation
                    in the measurement process itself.


Longitudinal models usually have three sources of random variation. The between-subject variability
is represented by the random effects. The within-subject variability is represented by the serial
correlation. The correlation between the measurements within subject usually depends on the time
interval between the measurements and decreases as the length of the interval increases. A common
assumption is that the serial effect is a population phenomenon independent of the subject. Finally, there
is potentially also measurement error in the measurement process.

The covariance structure that is appropriate for your model is directly related to which component
of variability is the dominant component. For example, if the serial correlation among the measurements
is minimal, then the random effects will probably account for most of the variability in the data and
the remaining error components will have a very simple covariance structure. Diggle, Heagerty, Liang,
and Zeger (2002) believe that in most applications, the serial correlation is very often dominated by the
combination of random effects and measurement error. Furthermore, Chi and Reinsel (1989) found that
models with random effects and serial correlation might sometimes over-parameterize the covariance
structure because the random effects are often able to represent the serial correlations among the
measurements. They conclude that methods for determining the best combination of serial correlation
components and random effects are an important topic that deserves further consideration.

However, suppose the autocorrelation among the measurements is relatively large, and the between-
subject variability not explained by the fixed effects is relatively small. Then choosing the appropriate
serial correlation function in the covariance structure becomes important.

 In this course, serial correlation will be used to describe correlation structures that allow
the correlations to change over time.

Selecting Appropriate Covariance Structures

• Select a covariance structure that best fits the true covariance of the data.
• Create a scatter plot called the sample variogram.
• Use likelihood ratio tests to test whether adding parameters to the
covariance structure causes a statistically significant improvement in the
model.
• Compare models based on measures of fit that are adjusted for the number
of covariance parameters.


Because the covariance structure models the variability not explained by the fixed effects, selecting the
appropriate mean model is critical. For models dealing with data collected in an experiment, a saturated
model is usually recommended. However, for models dealing with observational data, saturated models
are not feasible. Therefore, it is important to include all the important main effects and interactions.

The choice of the covariance structure should be consistent with the empirical correlations. Examining a
plot of the autocorrelation function of the residuals might be useful for this purpose when you have
equally spaced data that are approximately stationary. (The residuals have constant mean and variance
and the correlations depend only on the length of the time interval.) However, the aids data set has
irregularly spaced data that might not be stationary. The variogram is an alternative function that
describes the association among repeated measurements and is easily estimated with irregular observation
times (Diggle 1990).

Likelihood ratio tests can be used to compare covariance structures provided that the same mean model is
fitted and the covariance parameters are nested. Nesting of covariance parameters occurs when the
covariance parameters in the simpler model can be obtained by restricting some of the parameters in the
more complex model. For example, a compound symmetry structure is nested within a Toeplitz structure,
but is not nested within an AR(1) structure. It is recommended to compare simple structures to more
complex structures, and the complex structures should be accepted only if they lead to a significant
improvement in the likelihood (Brown and Prescott 2001).
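As a numeric sketch of such a test, the drop in −2 (restricted) log-likelihood is compared to a chi-square critical value with degrees of freedom equal to the difference in the number of covariance parameters. The −2LL values below are hypothetical, not output from the aids models:

```python
# Hypothetical -2 REML log-likelihoods from two PROC MIXED runs with the
# same mean model: compound symmetry (2 covariance parameters) nested in
# Toeplitz (4 covariance parameters for 4 equally spaced time points).
neg2ll_cs = 11850.4    # simpler structure (assumed value)
neg2ll_toep = 11842.1  # more complex structure (assumed value)

lrt = neg2ll_cs - neg2ll_toep  # likelihood ratio chi-square statistic
df = 4 - 2                     # difference in number of covariance parameters
critical_value = 5.991         # chi-square critical value, alpha=0.05, 2 df

# Accept the more complex structure only if the fit improves significantly.
prefer_toeplitz = lrt > critical_value
print(round(lrt, 1), df, prefer_toeplitz)
```

With these assumed values the improvement of 8.3 exceeds 5.991, so the Toeplitz structure would be retained.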

You can also use the information criteria (such as the AIC and BIC) produced by PROC MIXED as a tool
to help you select the most appropriate covariance structure. The smaller the information criteria value,
the better the model.

Data Values in Sample Variogram

v_ijk = ½ (r_ij − r_ik)²

plotted by

u_ijk = t_ij − t_ik



The variogram is used extensively in the field of geostatistics, which is concerned primarily with the
estimation of spatial variation. In longitudinal data analysis, the empirical counterpart of the variogram
is called the sample variogram. The data values in the sample variogram are calculated from the observed
half-squared differences between pairs of residuals, where the residuals are ordinary least squares
residuals based on the mean model, and the corresponding time differences. The vertical axis
in the variogram represents the residual variability within subject over time.

The scatter plot also contains a smoothed nonparametric curve, which estimates the general pattern
in the sample variogram. This curve can be used to decide whether the mixed model should include serial
correlation. If a serial correlation component is warranted, then the fitted curve can be used in selecting
the appropriate serial correlation function.

Process Variance

σ̂² = [ Σ ½ (r_ij − r_lk)² ] / (total number of comparisons), with i ≠ l


The process variance, σ̂², is estimated as the average of all half-squared differences of the residuals, ½(r_ij − r_lk)², with i ≠ l (i and l are subscripts for subject, and j and k are subscripts for time points).


Autocorrelation Function

ρ̂(u) = 1 − γ̂(u) / σ̂²


The autocorrelation function can be estimated from the sample variogram by the formula ρ̂(u) = 1 − γ̂(u)/σ̂², where γ̂(u) is the average of the observed half-squared differences between pairs of residuals corresponding to that particular value of u. With highly irregular sampling times, the averages for the sample variogram might be estimated by fitting a nonparametric curve.

2.06 Multiple Choice Poll

Which one of the following statements is true regarding serial correlation?


a. It represents the between-subject variability.
b. It is usually an increasing function of the time separation between
measurements.
c. It can be approximated by the autocorrelation function.
d. It can be accounted for by the compound symmetry covariance
structures.


Example Calculations
Subject   Time   Response   Residual
1         1      4           2
1         3      5          -1
1         4      9           1

Comparison    Variogram Value   Time Interval
T(1) – T(2)   4.5               2.0
T(1) – T(3)   0.5               3.0
T(2) – T(3)   2.0               1.0


To illustrate how the data values in a variogram are calculated, consider the above slide. The variogram value for the first comparison, of the first time point to the second time point, is (2 − (−1))² / 2 = 9/2 = 4.5, with a time interval of 3 − 1 = 2.
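The three comparisons on the slide can be checked directly from the definitions v_ijk = ½(r_ij − r_ik)² and u_ijk = t_ij − t_ik. Here is a quick arithmetic check in Python (used only to verify the numbers, not as part of the SAS analysis):

```python
from itertools import combinations

# Residuals and measurement times for Subject 1 from the slide
times = [1, 3, 4]
residuals = [2, -1, 1]

pairs = []
for (t1, r1), (t2, r2) in combinations(zip(times, residuals), 2):
    v = 0.5 * (r1 - r2) ** 2   # half-squared difference of the residuals
    u = t2 - t1                # time interval between the pair
    pairs.append((v, u))

print(pairs)  # [(4.5, 2), (0.5, 3), (2.0, 1)]
```

The three (variogram value, time interval) pairs match the slide's table exactly.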

Example Calculations
Subject   Time   Residual
1         1       2
1         3      -1
1         4       1
2         1      -2
2         2       3

Variance = (8 + 0.5 + 0.5 + 8 + 4.5 + 2) / 6 = 3.92


The value of the variance calculation that compares the first residual for Subject 1 to the first residual for Subject 2 is (2 − (−2))² / 2 = 8.
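The 3.92 on the slide can be reproduced by averaging the half-squared differences over all pairs of residuals taken from different subjects. Again, a plain Python check of the arithmetic:

```python
from itertools import product

# Residuals by subject from the slide
subject1 = [2, -1, 1]
subject2 = [-2, 3]

# Process variance: average half-squared difference over all pairs of
# residuals taken from *different* subjects (i != l)
diffs = [0.5 * (r1 - r2) ** 2 for r1, r2 in product(subject1, subject2)]
process_variance = sum(diffs) / len(diffs)

# diffs contain 8, 0.5, 0.5, 8, 4.5, and 2; their average is 23.5/6
print(sorted(diffs), round(process_variance, 2))
```

The six cross-subject comparisons average to 23.5 / 6 ≈ 3.92, matching the slide.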
2

Variogram with Serial Correlation Only

γ(u) = σ_w² (1 − ρ(u))
Var(ε) = σ_w²

[Figure: variogram plotted against the time interval; the fitted curve rises from 0 at u = 0 toward the process variance as u increases, the rise reflecting the serial correlation.]

The fitted nonparametric curve in the sample variogram can also be used to determine which error components need to be addressed in the covariance structure. For example, suppose the variance of the error terms is due only to within-subject variability. The corresponding variogram, γ(u), with u representing the time interval, would be based on the autocorrelation function. At a time interval of 0, the autocorrelation is 1 and γ(u) = 0. As the time interval approaches infinity, the autocorrelation approaches 0 and γ(u) approaches the process variance. Typically, γ(u) is an increasing function of u because the autocorrelation is positive and decreases as the time interval increases.

Sample variograms are better mechanisms for examining serial correlation than autocorrelation functions created from the CORR procedure because the nonparametric smoothing of the variogram recognizes the scarcity of the data at the longer time intervals and incorporates information from the sample variogram at shorter time intervals. In comparison, autocorrelation functions might become very unstable with sparse data and give a misleading impression about the serial correlation. Furthermore, the autocorrelation function is most effective for studying equally spaced data that are approximately stationary. Autocorrelations are more difficult to estimate with irregularly spaced data unless you round the observation times, in other words, round the CD4+ observation time values to the nearest year (Diggle, Heagerty, Liang, and Zeger 2002).

 As was seen earlier, the variogram is related to the autocorrelation function:

 ( )   2 (1   ( ))


Variogram with Serial Correlation and Measurement Error

γ(u) = τ² + σ_w² (1 − ρ(u))
Var(ε) = σ_w² + τ²

[Figure: variogram plotted against the time interval; the fitted curve starts at τ² (measurement error) rather than 0, rises through the serial correlation component, and levels off at the process variance.]

In some situations the measurement process introduces a component of random variation. Now
the variance of the error terms includes not only the within-subject variability, but also the measurement
error. A characteristic property of models with measurement error is that γ(u) does not tend
to 0 as u tends to 0. If the data include duplicate measurements at the same time, then you can estimate
the measurement error directly as one-half the average squared differences between such duplicates.
In the CD4+ example, there are no duplicate measurements within subject. Therefore, the estimation
of the measurement error involves the extrapolation of the nonparametric curve, and this estimate
of the measurement error might be strongly model-dependent (Diggle, Heagerty, Liang, and Zeger 2002).


Variogram with Serial Correlation, Measurement Error, and Random Effects

γ(u) = τ² + σ_w² (1 − ρ(u))
Var(ε) = σ_w² + τ² + ν²

[Figure: variogram plotted against the time interval; the fitted curve starts at τ², rises through the serial correlation component, and plateaus at τ² + σ_w², below the process variance; the remaining gap is the random-effects (between-subject) variability ν².]

In some situations the model might include all three components of error. Now the variance of the error
terms includes the within-subject variability, the between-subject variability, and the measurement error.
The corresponding variogram has the same form as the variogram for the model with serial correlation
and measurement error. However, as the time interval approaches infinity, γ(u) approaches a value less
than the variance of the error terms (which is approximately equal to the estimate of the process variance).
The difference between the plateau of the fitted line and the process variance is the error pertaining
to between-subject variability or random effects.
Therefore, the sample variogram can indicate whether the model fitted in PROC MIXED needs
the LOCAL option (to account for measurement error), a covariance structure that incorporates the serial
correlation, and/or a RANDOM statement to specify random effects. Although serial correlation would
appear to be a natural feature of any longitudinal model, in some situations the serial correlation might
be dominated by the combination of random effects and measurement error. The fitted nonparametric
curve in the sample variogram would have a slope near 0, which would indicate that a covariance
structure incorporating serial correlation would be an unnecessary refinement of the model (Diggle,
Heagerty, Liang, and Zeger 2002). A covariance structure such as compound symmetry would
be sufficient.
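These relationships can be made concrete with a small numeric sketch. The variance components and the exponential serial correlation function ρ(u) = exp(−φu) below are illustrative assumptions, not estimates from the CD4+ data:

```python
import math

# Illustrative variance components (assumed values, not from the CD4+ data):
tau2 = 2.0      # measurement error variance
sigma2_w = 6.0  # serial (within-subject) variance
nu2 = 4.0       # random-effects (between-subject) variance
phi = 1.5       # decay rate of an assumed exponential correlation function

def variogram(u):
    """gamma(u) = tau^2 + sigma_w^2 * (1 - rho(u)) with rho(u) = exp(-phi*u)."""
    return tau2 + sigma2_w * (1.0 - math.exp(-phi * u))

process_variance = tau2 + sigma2_w + nu2  # total Var(error)

intercept = variogram(0.0)        # measurement error component tau^2
plateau = variogram(50.0)         # plateau at large u: tau^2 + sigma_w^2
gap = process_variance - plateau  # random-effects component nu^2
print(intercept, round(plateau, 6), round(gap, 6))
```

With these values the fitted curve would start at τ² = 2, plateau at τ² + σ_w² = 8, and leave a gap of 4 up to the process variance of 12, the gap being the random-effects component.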


If serial correlation is evident in the sample variogram, two popular choices of covariance structures
for unequally spaced longitudinal data are the spatial exponential structure, which incorporates
the exponential serial correlation function, and the spatial Gaussian structure, which incorporates
the Gaussian serial correlation function. However, precise characterization of the serial correlation
function is extremely difficult in the presence of several random effects. You should not ignore
the possible presence of any serial correlation, because this might result in less efficient model-based
inferences.

 Verbeke and Molenberghs (2000) suggest that including serial correlation, if present, is more
important than correctly specifying the serial correlation function. They recommend that your
efforts should be in the detection of serial correlation, rather than specifying the actual shape
of the serial correlation function, which seems to be of minor importance.

2.07 Multiple Choice Poll

What can you conclude if the intercept of the fitted nonparametric curve in
the sample variogram has values much greater than 0?
a. Serial correlation error needs to be addressed in the covariance
structure.
b. Measurement error needs to be addressed in the covariance structure.
c. Random effects error needs to be addressed in the covariance structure.
d. It is irrelevant because the slope of the fitted nonparametric curve
determines the source of the error component.


Sample Variogram

Example: Create a sample variogram with the aids data set. First include the VARIOGRAM
and VARIANCE macros (programs long02d02a.sas and long02d02b.sas). Then use the
VARIOGRAM macro to create the data set varioplot and use the VARIANCE macro to
estimate the process variance. Use PROC SGPLOT to display the sample variogram with a
scatter plot of the variogram values by time interval values as the background and a penalized
B-spline curve in the foreground. Draw a horizontal reference line at the process variance.
/* long02d02.sas */

%include ".\long02d02a.sas";

%include ".\long02d02b.sas";

%variogram (data=aids,resvar=cd4_scale,clsvar=,
expvars=time age cigarettes drug partners
depression time*age time*cigarettes time*drug
time*partners time*depression time*time
time*time*time,id=id,time=time,maxtime=12);

%variance(data=aids,id=id,resvar=cd4_scale,clsvar=,
expvars=time age cigarettes drug partners
depression time*age time*cigarettes time*drug
time*partners time*depression time*time
time*time*time,subjects=369,maxtime=12);

Variogram-Based Estimate of the Process Variance

Obs nonmissing total average

1 2813683 32950045.18 11.7106

The variogram-based estimate of the process variance is 11.71.


proc sgplot data=varioplot noautolegend;
scatter y=variogram x=time_interval / markerattrs=(color=cyan
symbol=circle);
pbspline y=variogram x=time_interval / nomarkers smooth=50 nknots=5
lineattrs=(color=blue pattern=1 thickness=3);
refline 11.71 / label="Process Variance";
xaxis values=(0 to 6 by .5) label='Time Interval';
yaxis values=(0 to 30 by 2) label='Variogram Values';
title 'Sample Variogram of CD4+ Data';
run;
In the data set VARIOPLOT, the variogram values are in the variable variogram while the time interval
values are in the variable time_interval.

 The code for both macros is provided in an appendix.


Because the fitted penalized B-spline curve does not tend toward zero as the time interval tends to zero, the
sample variogram clearly shows that the model has some measurement error (error in the measurement
process itself). Furthermore, the fitted line does not have a slope of zero, which indicates that there is serial
correlation in the model (cd4 cell counts vary over time within subject). The serial correlation function
appears to be relatively linear. Finally, because the fitted line does not reach the process variance, some error
due to random effects is evident in the model (unexplained between-subject variability).
Example: Create a plot of the autocorrelation function using PROC SGPLOT.
data varioplot;
set varioplot;
autocorr=1-(variogram/11.71);
run;

proc sgplot data=varioplot noautolegend;


pbspline y=autocorr x=time_interval / nomarkers smooth=50 nknots=5
lineattrs=(color=blue pattern=1 thickness=3);
xaxis values=(0 to 6 by .5) label='Time Interval';
yaxis values=(0 to 1 by .1) label='Autocorrelation Values';
title 'Autocorrelation Plot of CD4+ Data';
run;


The graph of the autocorrelation function shows that the correlation within subject decreases from
approximately 0.60 to 0.10 within the range of the data. Therefore, there is error associated with serial
correlation evident in the model and a structure that allows for this decreasing correlation should
be selected.


Information Criteria

• Akaike Information Criterion (AIC) tends to choose more complex models.


• Schwarz’s Bayesian Information Criterion (BIC) tends to choose simpler
models.
• Because excessively simple models have inflated Type I error rates, AIC
appears to be the most desirable in practice.


In a simulation study conducted by Guerin and Stroup (2000), several information criteria were compared
in terms of their ability to choose the right covariance structure. In terms of Type I error control, assuming
that the Kenward-Roger (KR) adjustment is used, Guerin and Stroup showed that it is better to err
in the direction of a more complex covariance structure. More complex covariance structures tend to have
inflated Type I error rates only if you fail to use the KR adjustment, while excessively simple covariance
structures have inflated Type I error rates that the degrees of freedom adjustment cannot correct.
However, because complex covariance structures reduce power, erring too far in the direction
of complexity is also not recommended. Guerin and Stroup believe that the AIC is the most desirable
compromise in practice. However, if the sample size is relatively small, the finite-sample corrected
version of AIC, called AICC, might be the most desirable.

 Information criteria provide only rules of thumb to discriminate between several models.
These criteria should never be used or interpreted as formal statistical tests of significance.
When comparing several models with the same mean model but with different covariance
structures, use REML as the estimation method.
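As a sketch of the arithmetic behind these criteria, the usual smaller-is-better forms can be computed from a model's −2 log-likelihood, its number of covariance parameters, and a sample size. Note that PROC MIXED's exact choice of the sample-size term varies by criterion and estimation method, and the −2LL values below are hypothetical:

```python
import math

def smaller_is_better_ic(neg2ll, d, n):
    """Smaller-is-better information criteria from a -2 (restricted)
    log-likelihood, d covariance parameters, and an effective sample
    size n. These are the standard textbook forms; PROC MIXED's choice
    of n differs by criterion and estimation method."""
    aic = neg2ll + 2 * d
    aicc = neg2ll + 2 * d * n / (n - d - 1)  # finite-sample corrected AIC
    bic = neg2ll + d * math.log(n)
    return aic, aicc, bic

# Hypothetical fits with the same mean model: CS (2 parameters) versus
# spatial power with LOCAL (3 parameters); n = 369 subjects as in aids.
print(smaller_is_better_ic(11850.0, 2, 369))
print(smaller_is_better_ic(11710.0, 3, 369))
```

With a large n the AICC penalty is nearly the same as the AIC penalty, which is why the two curves overlap in the plot that follows; BIC penalizes each extra covariance parameter more heavily.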


Selecting Covariance Structures


                                Equal      Unequal    Different time points
                                Spacing    Spacing    across subjects
Compound Symmetry               Yes        Yes        Yes
Unstructured                    Yes        Yes        No
AR(1)                           Yes        No         No
Toeplitz                        Yes        No         No
Spatial Covariance Structures   Yes        Yes        Yes

A common recommendation is to graph the information criteria by covariance structure. However, choose
only the covariance structures that make sense given the data. For example, because the aids data set has
unequally spaced time points and different time points across subjects, only compound symmetry
and the spatial covariance structures are appropriate covariance structures. If the time points were equally
spaced, then the AR(1) and Toeplitz covariance structures could have been examined. If the time points
were unequally spaced but had the same time points across subjects, then the unstructured covariance
structure could have been examined.
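One reason the spatial structures remain usable with irregular times is that they define the correlation directly from the time lag. A small sketch of the spatial power form, rho raised to the absolute time difference (the ρ value is illustrative):

```python
# Spatial power correlation: rho ** |t_j - t_k|. With equally spaced integer
# times this reduces to the AR(1) pattern; with irregular times it still
# yields a valid, smoothly decaying correlation. Illustrative rho:
rho = 0.6

equal_times = [1, 2, 3, 4]
unequal_times = [0.1, 0.9, 2.5, 4.0]

def corr_matrix(times, rho):
    """Spatial power correlation matrix: rho raised to the absolute lag."""
    return [[rho ** abs(tj - tk) for tk in times] for tj in times]

ar1_like = corr_matrix(equal_times, rho)     # off-diagonals: rho, rho**2, ...
irregular = corr_matrix(unequal_times, rho)  # lags 0.8, 1.6, ... still work
print(ar1_like[0][1], irregular[0][1])
```

For the equally spaced times the first off-diagonal is ρ and the second is ρ², exactly the AR(1) pattern; for the irregular times the correlations simply decay with the actual lags.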


Information Criteria

Example: Calculate and plot the AIC, AICC, and BIC information criteria for models that use the covariance structures of compound symmetry, spatial power, spatial linear, spatial exponential, spatial Gaussian, and spatial spherical. The ODS table that contains the information criteria is FitStatistics.

/* long02d03.sas */

ods select none;


proc mixed data=aids;
model cd4_scale=time age cigarettes drug partners
depression time*age time*depression
time*partners time*drug time*cigarettes
time*time time*time*time;
repeated / type=cs subject=id;
ods output fitstatistics=csmodel;
run;

proc mixed data=aids;


model cd4_scale=time age cigarettes drug partners
depression time*age time*depression
time*partners time*drug time*cigarettes
time*time time*time*time;
repeated / type=sp(pow)(time) local subject=id;
ods output fitstatistics=powmodel;
run;

proc mixed data=aids;


model cd4_scale=time age cigarettes drug partners
depression time*age time*depression
time*partners time*drug time*cigarettes
time*time time*time*time;
repeated / type=sp(lin)(time) local subject=id;
ods output fitstatistics=linmodel;
run;

proc mixed data=aids;


model cd4_scale=time age cigarettes drug partners
depression time*age time*depression
time*partners time*drug time*cigarettes
time*time time*time*time;
repeated / type=sp(exp)(time) local subject=id;
ods output fitstatistics=expmodel;
run;
proc mixed data=aids;
model cd4_scale=time age cigarettes drug partners
depression time*age time*depression


time*partners time*drug time*cigarettes


time*time time*time*time;
repeated / type=sp(gau)(time) local subject=id;
ods output fitstatistics=gaumodel;
run;

proc mixed data=aids;


model cd4_scale=time age cigarettes drug partners
depression time*age time*depression
time*partners time*drug time*cigarettes
time*time time*time*time;
repeated / type=sp(sph)(time) local subject=id;
ods output fitstatistics=sphmodel;
run;

ods select all;

data model_fit;
length model $ 7 type $ 4;
set csmodel (in=cs)
powmodel (in=pow)
linmodel (in=lin)
expmodel (in=exp)
gaumodel (in=gau)
sphmodel (in=sph);
if substr(descr,1,1) in ('A','B');
if substr(descr,1,3) = 'AIC' then type='AIC';
if substr(descr,1,4) = 'AICC' then type='AICC';
if substr(descr,1,3) = 'BIC' then type='BIC';
if cs then model='CS';
if pow then model='SpPow';
if lin then model='SpLin';
if exp then model='SpExp';
if gau then model='SpGau';
if sph then model='SpSph';
run;
The IN= option in the DATA step detects whether the data set contributed to an observation when you
read multiple SAS data sets in one DATA step. The specified variable is a temporary numeric variable
with values of 0 (indicates that the data set did not contribute to the current observation) or 1 (indicates
that the data set did contribute to the current observation). The SUBSTR function extracts from the
variable descr the necessary information to put in the variable type that identifies the information criteria.
proc sgplot data=model_fit;
scatter y=value x=model / group=type;
xaxis label='Covariance Structure';
yaxis values=(11700 to 11900 by 20) label='Model Fit Values';
title 'Model Fit Statistics by Covariance Structure';
run;


The covariance structures, spatial exponential, spatial linear, spatial power, and spatial spherical, all seem
to have the best fit. The spatial Gaussian model fit statistics are somewhat higher than the other spatial
structures. The compound symmetry covariance structure is clearly inferior. The AIC and AICC values
are identical across covariance structures because of the large sample size. For small sample sizes,
the AICC model fit statistic might be useful.

In the simulation study performed by Guerin and Stroup (2000), most of the gain from modeling
the covariance structures comes from “getting close”. Therefore, there will probably be a trivial impact
on the analysis if any of the four spatial covariance structures with the smallest information criteria are
used. Their simulation study focused on the Type I error rates, where the effects of simplistic covariance
structures tend to be more obvious.


2.08 Multiple Choice Poll

Which of the following structures is not appropriate in an unbalanced design with unequally spaced time points and different time points across subjects?
a. Compound symmetry
b. Unstructured
c. Spatial power
d. Spatial Gaussian


Summary of Selecting Covariance Structures

• Results from the sample variogram indicate that measurement error, serial
correlation, and error associated with random effects are evident in the
model.
• Spatial exponential, spatial linear, spatial power, and spatial spherical all
seem to have the best model fit statistics.
• Spatial power is the selected covariance structure.



In conclusion, the sample variogram is a useful graph in the selection of a covariance structure. It is
constructed from the ordinary least squares residuals from a complex mean model. For this model,
the results of the sample variogram clearly show that the LOCAL option is needed in the REPEATED
statement. This option adds an additional variance parameter to the R matrix. The results also show that
serial correlation is evident (meaning that the correlations change over time) and that the pattern seems
to be linear. However, the model fit statistics show that the spatial exponential, spatial linear, spatial
power, and spatial spherical covariance structures all seem to have the best fit. Although the spatial power
covariance structure will be selected, any of the other three spatial covariance structures would be
appropriate.

The results of the sample variogram also show that some error associated with random effects is evident
in the model. Therefore, a RANDOM statement might be needed. Models with RANDOM statements will
be examined in a later section.


Exercises

2. Evaluating Covariance Structures


a. Include the VARIOGRAM and VARIANCE macros (programs long02d02a.sas and
long02d02b.sas). Pass the necessary information to the macros to create the varioplot data set
and to estimate the process variance. Specify as explanatory variables the three main effects,
the three two-factor interactions, and the quadratic and cubic effects of hours. Create a plot of the
sample variogram using PROC SGPLOT and fit a penalized B-spline curve with a smoothing
factor of 50 and 5 knots, draw a horizontal reference line at the estimate of the process variance,
specify a vertical axis of 0 to 100, and specify a horizontal axis of 0 to 1.
1) Interpret the graph. What sources of error are evident in the model?
2) What specifications in PROC MIXED might be useful in dealing with these sources of error?
b. Plot the autocorrelation function by time interval using PROC SGPLOT with the penalized
B-spline curve.
1) What information can be gleaned from this plot that might be useful in building a model in
PROC MIXED?
c. Generate a graph of the model fit statistics by covariance structure. Select the following
covariance structures: compound symmetry, unstructured, spatial power, spatial exponential,
spatial Gaussian, spatial spherical, and spatial linear. Use ODS to save the model fit statistics
and graph the AIC, AICC, and BIC statistics.
1) Which covariance structures appear to have the best fit?


2.09 Multiple Choice Poll

Which of the following is indicated if the fitted nonparametric curve has a slope of 0 in the sample variogram?
a. There is no measurement error component.
b. There is no random error component.
c. There is no serial correlation component (correlations do not change
over time).
d. Nothing is indicated, because the slope is irrelevant to the identification
of the source of error.


2.3 Model Development and Interpretation

Objectives
• Illustrate how to specify heterogeneity in the residual covariance parameters.
• Fit a parsimonious mean model.
• Create an interaction plot.



Heterogeneity in the Covariance Parameters


Time from Seroconversion

Before (σ_B²):                           After (σ_A²):

  1   ρ_b^T12   ρ_b^T13   ρ_b^T14         1   ρ_a^T12   ρ_a^T13   ρ_a^T14
      1         ρ_b^T23   ρ_b^T24             1         ρ_a^T23   ρ_a^T24
                1         ρ_b^T34                       1         ρ_a^T34
                          1                                       1

The linear mixed models presented thus far assume that the covariance parameters are the same across
subgroups of subjects. However, PROC MIXED has the flexibility of allowing heterogeneity in the
covariance parameters across subgroups of subjects. For example, suppose there is evidence that the
variance of the CD4+ cell counts is much greater before seroconversion compared to after seroconversion.
A better fitting model might have covariance parameters defined before seroconversion and after
seroconversion. The covariance structure still remains the same (in this example the spatial power
covariance structure), but the covariance parameters are allowed to change across the two subgroups.

GROUP= Option

The GROUP= option


• defines the effect specifying heterogeneity in the residual covariance parameters
• can result in strange covariance patterns; therefore, you must exercise caution when using the GROUP effect
• can greatly increase the number of estimated covariance parameters, which might adversely affect the optimization process.



PROC MIXED allows heterogeneity in the residual covariance parameters with the GROUP= option.
All observations having the same level of the GROUP effect have the same covariance parameters. Each
new level of the GROUP effect produces a new set of covariance parameters with the same structure as
the original group.

Example for One Subject

          T1               T2               T3               T4
   T1 |   σ²_B             σ²_B ρ_B^T12     0                0              |
   T2 |   σ²_B ρ_B^T12     σ²_B             0                0              |
   T3 |   0                0                σ²_A             σ²_A ρ_A^T34   |
   T4 |   0                0                σ²_A ρ_A^T34     σ²_A           |

The covariance structure for repeated measurements is still a block-diagonal covariance structure where
the block corresponds to the covariance structure for each subject. However, in this example the
GROUP= option now subdivides the block based on the GROUP effect. For example, suppose one
subject had four measurements. Two measurements were before seroconversion and two were after
seroconversion. Furthermore, you define the GROUP effect as the time before and after seroconversion.
The covariance structure within the block for this subject now has variance and covariance parameter
estimates before seroconversion and after seroconversion. For two measurements where one is before
and one is after seroconversion, the covariance is 0. In this example, the GROUP= option indicates
a covariance structure such that observations within subject and with a different GROUP effect value are
assumed to be independent.
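The independence across GROUP levels can be sketched numerically. A minimal Python illustration (the variance, correlation, and time values below are hypothetical stand-ins, not estimates from the course data) builds the within-subject block for a subject measured twice before and twice after seroconversion:

```python
import numpy as np

def subject_cov(times, groups, var, rho):
    """Within-subject covariance block under a grouped spatial power
    structure: sigma^2_g * rho_g^|t_i - t_j| within a GROUP level,
    zero covariance across GROUP levels."""
    n = len(times)
    R = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if groups[i] == groups[j]:
                g = groups[i]
                R[i, j] = var[g] * rho[g] ** abs(times[i] - times[j])
            # different GROUP levels: covariance stays 0 (independence)
    return R

# Hypothetical subject: measurements at T1, T2 before seroconversion (group "B")
# and at T3, T4 after seroconversion (group "A").
times = [-1.0, -0.5, 0.5, 1.0]
groups = ["B", "B", "A", "A"]
var = {"B": 13.0, "A": 6.8}   # sigma^2_B, sigma^2_A (illustrative values)
rho = {"B": 0.49, "A": 0.90}  # spatial power correlations (illustrative values)

R = subject_cov(times, groups, var, rho)
```

The off-diagonal 2 × 2 blocks of R are all zero, matching the block structure shown on the slide.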


Heterogeneity in the Covariance Parameters

Example: Create a plot of the variance of CD4+ cells over time using PROC SGPLOT. Fit a penalized
B-spline curve with a smoothness of 50 and 25 knots. Then create the variable timegroup that
groups the observations into the appropriate time groups. Finally, fit a longitudinal model
in PROC MIXED that allows the covariance parameters to vary by timegroup.
/* long02d04.sas */

proc rank data=aids groups=50 out=ranks;
   var time;
   ranks timegrp;
run;

proc means data=ranks nway noprint;
   var cd4_scale;
   class timegrp;
   output out=means var=var mean(time)=meantime;
run;

proc sgplot data=means noautolegend;
   pbspline y=var x=meantime / nomarkers smooth=50 nknots=25;
   yaxis values=(0 to 20 by 1) label="Variance";
   xaxis label="Mean Time";
   title 'Variance of Scaled CD4 by Time';
run;


The graph shows that the variance of CD4+ cells is greater before seroconversion compared to after
seroconversion. This makes sense from a subject matter point of view because healthy people usually
have more variability in their immune cells than unhealthy people. This is a useful graph to create during
your initial data exploration.
data aids;
set aids;
timegroup=1*(time le 0)+2*(time gt 0);
run;

proc mixed data=aids;
   class timegroup;
   model cd4_scale = time age cigarettes drug partners depression
         time*age time*cigarettes time*drug time*partners
         time*depression time*time time*time*time
         / ddfm=kr(firstorder) solution;
   repeated / type=sp(pow)(time) local subject=id group=timegroup
              r=13 rcorr=13;
   title 'Longitudinal Model with Heterogeneity in the '
         'Covariance Parameters';
run;


Selected REPEATED statement option:

GROUP=    defines an effect specifying heterogeneity in the residual covariance structure.
          Continuous variables are permitted as arguments to the GROUP= option. PROC MIXED
          does not sort by the values of the continuous variable; rather, it considers the data
          to be from a new subject or group whenever the value of the continuous variable changes
          from the previous observation.
Longitudinal Model with Heterogeneity in the Covariance Parameters

The Mixed Procedure

Model Information

Data Set WORK.AIDS


Dependent Variable cd4_scale
Covariance Structure Spatial Power
Subject Effect id
Group Effect timegroup
Estimation Method REML
Residual Variance Method Profile
Fixed Effects SE Method Prasad-Rao-Jeske-Kackar-Harville
Degrees of Freedom Method Kenward-Roger

The Model Information table shows the group effect is timegroup.


Class Level Information

Class Levels Values

timegroup 2 1 2

Dimensions

Covariance Parameters 5
Columns in X 14
Columns in Z 0
Subjects 369
Max Obs per Subject 12

There are now five covariance parameters being estimated rather than three: a variance and a spatial power correlation parameter for each of the two time groups, plus the common local residual variance.
Number of Observations

Number of Observations Read 2376


Number of Observations Used 2376
Number of Observations Not Used 0


Iteration History

Iteration Evaluations -2 Res Log Like Criterion

0 1 12668.04910184
1 2 11838.24404707 123.48030790
2 2 11731.67873760 20.79399338
3 2 11675.26706982 2.60777627
4 2 11623.53197659 0.00151927
5 2 11620.88033698 0.00038964
6 1 11619.31012203 0.00001388
7 1 11619.25830218 0.00000002
8 1 11619.25821833 0.00000000

Convergence criteria met.

Estimated R Matrix for Subject 13

Row Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9

1 15.8963 9.2300 6.4196


2 9.2300 15.8963 9.1237
3 6.4196 9.1237 15.8963
4 9.6153 6.4553 6.1021 5.7820 5.4771 5.1774
5 6.4553 9.6153 6.4629 6.1239 5.8010 5.4836
6 6.1021 6.4629 9.6153 6.4784 6.1367 5.8010
7 5.7820 6.1239 6.4784 9.6153 6.4764 6.1221
8 5.4771 5.8010 6.1367 6.4764 9.6153 6.4629
9 5.1774 5.4836 5.8010 6.1221 6.4629 9.6153
10 4.9044 5.1944 5.4950 5.7992 6.1221 6.4764
11 4.6554 4.9307 5.2161 5.5048 5.8113 6.1477
12 4.4007 4.6610 4.9307 5.2037 5.4934 5.8113

Estimated R Matrix for Subject 13

Row Col10 Col11 Col12

1
2
3
4 4.9044 4.6554 4.4007
5 5.1944 4.9307 4.6610
6 5.4950 5.2161 4.9307
7 5.7992 5.5048 5.2037
8 6.1221 5.8113 5.4934
9 6.4764 6.1477 5.8113
10 9.6153 6.4899 6.1349
11 6.4899 9.6153 6.4629
12 6.1349 6.4629 9.6153

PROC MIXED estimates the variance and correlation coefficient for the subjects before seroconversion
and after seroconversion. The variance estimates (15.90 for time group 1 and 9.62 for time group 2) seem
to be quite different across time groups. With timegroup as the GROUP= variable, the measurements
before seroconversion are assumed to be independent of the measurements after seroconversion.
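As a check on how these printed values arise, note that with the LOCAL option the diagonal of R is the group variance plus the residual (measurement-error) variance, and the off-diagonals are the group variance times ρ raised to the elapsed time. A quick Python sketch using the Covariance Parameter Estimates for time group 1, and assuming two measurements roughly 0.5 years apart (the actual visit spacing differs slightly, so the off-diagonal matches only approximately):

```python
# Covariance Parameter Estimates for time group 1 (from the PROC MIXED output)
var_g1 = 13.1181   # Variance, timegroup 1
rho_g1 = 0.4939    # SP(POW), timegroup 1
resid = 2.7783     # Residual (the LOCAL measurement-error variance)

# With LOCAL, the printed diagonal entry is sigma^2_g + sigma^2_resid.
diag = var_g1 + resid

# Off-diagonal for two measurements d years apart: sigma^2_g * rho_g^d.
# d = 0.5 is an assumed spacing, not the exact visit times.
offdiag = var_g1 * rho_g1 ** 0.5

# The printed correlation divides by the full diagonal variance.
corr = offdiag / diag
```

The diagonal reproduces the printed 15.8963 exactly, and the correlation is close to the printed 0.5806 for adjacent visits.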


Estimated R Correlation Matrix for Subject 13

Row Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9

1 1.0000 0.5806 0.4038


2 0.5806 1.0000 0.5739
3 0.4038 0.5739 1.0000
4 1.0000 0.6714 0.6346 0.6013 0.5696 0.5385
5 0.6714 1.0000 0.6722 0.6369 0.6033 0.5703
6 0.6346 0.6722 1.0000 0.6738 0.6382 0.6033
7 0.6013 0.6369 0.6738 1.0000 0.6736 0.6367
8 0.5696 0.6033 0.6382 0.6736 1.0000 0.6722
9 0.5385 0.5703 0.6033 0.6367 0.6722 1.0000
10 0.5101 0.5402 0.5715 0.6031 0.6367 0.6736
11 0.4842 0.5128 0.5425 0.5725 0.6044 0.6394
12 0.4577 0.4847 0.5128 0.5412 0.5713 0.6044

Estimated R Correlation
Matrix for Subject 13

Row Col10 Col11 Col12

1
2
3
4 0.5101 0.4842 0.4577
5 0.5402 0.5128 0.4847
6 0.5715 0.5425 0.5128
7 0.6031 0.5725 0.5412
8 0.6367 0.6044 0.5713
9 0.6736 0.6394 0.6044
10 1.0000 0.6750 0.6380
11 0.6750 1.0000 0.6722
12 0.6380 0.6722 1.0000

Covariance Parameter Estimates

Cov Parm Subject Group Estimate

Variance id timegroup 1 13.1181


SP(POW) id timegroup 1 0.4939
Variance id timegroup 2 6.8370
SP(POW) id timegroup 2 0.8970
Residual 2.7783

The within-subject correlations of the measurements after seroconversion are larger than those before seroconversion.
Fit Statistics

-2 Res Log Likelihood 11619.3


AIC (Smaller is Better) 11629.3
AICC (Smaller is Better) 11629.3
BIC (Smaller is Better) 11648.8

The AIC for this model (11629.3) is lower than the AIC for the corresponding model without the group effect (11735.3), indicating a better fit.
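These criteria can be reproduced by hand. Under REML, PROC MIXED counts only the covariance parameters q, so AIC = (−2 Res Log Likelihood) + 2q and BIC = (−2 Res Log Likelihood) + q·log(s), where s is the number of subjects; AICC adds a small-sample correction that is negligible here. Checking against the printed Fit Statistics (using the more precise −2 Res Log Likelihood from the iteration history):

```python
import math

neg2_res_ll = 11619.258  # -2 Res Log Likelihood (iteration history, final step)
q = 5                    # covariance parameters (Dimensions table)
s = 369                  # subjects (Dimensions table)

aic = neg2_res_ll + 2 * q            # compare to the printed 11629.3
bic = neg2_res_ll + q * math.log(s)  # compare to the printed 11648.8
```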


Null Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq

4 1048.79 <.0001

Solution for Fixed Effects

Standard
Effect Estimate Error DF t Value Pr > |t|

Intercept 7.5629 0.2212 1575 34.19 <.0001


time -0.8874 0.1054 1355 -8.42 <.0001
age 0.01604 0.01582 717 1.01 0.3109
cigarettes 0.4577 0.06747 1438 6.78 <.0001
drug 0.4348 0.2075 2001 2.10 0.0362
partners 0.05472 0.02357 2031 2.32 0.0204
depression -0.01769 0.008670 2091 -2.04 0.0415
time*age -0.01248 0.006109 595 -2.04 0.0415
time*cigarettes -0.09965 0.02990 1101 -3.33 0.0009
time*drug -0.03838 0.08663 1537 -0.44 0.6578
time*partners -0.01153 0.01049 1769 -1.10 0.2718
time*depression -0.00029 0.003729 1569 -0.08 0.9370
time*time -0.1040 0.03145 1103 -3.31 0.0010
time*time*time 0.03702 0.006842 1103 5.41 <.0001

Type 3 Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

time 1 1355 70.83 <.0001


age 1 717 1.03 0.3109
cigarettes 1 1438 46.01 <.0001
drug 1 2001 4.39 0.0362
partners 1 2031 5.39 0.0204
depression 1 2091 4.16 0.0415
time*age 1 595 4.17 0.0415
time*cigarettes 1 1101 11.11 0.0009
time*drug 1 1537 0.20 0.6578
time*partners 1 1769 1.21 0.2718
time*depression 1 1569 0.01 0.9370
time*time 1 1103 10.93 0.0010
time*time*time 1 1103 29.28 <.0001

The higher order terms for time and the time by age and time by cigarettes interactions are still
significant.


2.10 Multiple Choice Poll

Which one of the following statements is true regarding PROC MIXED?
a. Continuous variables are not permitted as arguments to the GROUP= option.
b. PROC MIXED has the flexibility of allowing the type of covariance
structure to change across subgroups of subjects within the same model.
c. Observations with a different GROUP effect value are assumed to be
independent even if they are within-subject observations.
d. The number of values for the effect in the GROUP= option does not
impact the number of estimated covariance parameters.


Model Development

• Use REML and a complex mean model to choose the appropriate covariance structure.
• Use ML to eliminate unnecessary terms one at a time from the complex mean model.
• Refit the final model using REML.


After an appropriate covariance structure is selected, model-building efforts should be directed at simplifying the mean structure of the model. Because the model should be hierarchically well-formulated, the first step is to evaluate the interactions. One recommended approach is to eliminate the interactions one at a time, starting with the least significant interaction. If you use the model fit statistics such as AIC, then you must use the ML estimation method. However, after the final model is chosen, refit the model using REML because REML estimators are usually preferred.


Another approach is to compute a likelihood ratio test that compares two models, the full model with all
of the interactions and the reduced model with just a subset of terms. The difference between the –2 log
likelihoods for the full and reduced models is the value of the test statistic. The likelihood ratio test
comparing the full and reduced models is valid only under ML estimation.
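A likelihood ratio test is easy to compute by hand from PROC MIXED output. As an illustration, the same arithmetic underlies the printed Null Model Likelihood Ratio Test of the REML fit above: the iteration history shows −2 Res Log Like of 12668.049 for the independence (null) covariance model and 11619.258 at convergence, with 4 covariance parameters dropped. A small Python sketch, using the closed-form chi-square tail probability for even degrees of freedom:

```python
import math

def chi2_sf_even_df(x, df):
    """Survival function of a chi-square with even df (closed form):
    P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!"""
    assert df % 2 == 0
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))

# LRT statistic: difference of the two -2 (Res) log likelihoods.
lrt = 12668.049 - 11619.258   # matches the printed chi-square of 1048.79
df = 4                        # parameters dropped under independence
p_value = chi2_sf_even_df(lrt, df)
```

The p-value is far below 0.0001, in agreement with the printed <.0001.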

ML versus REML

• Differences in the model fit statistics under REML reflect differences in the
covariance parameter estimates.
• Differences in the model fit statistics under ML reflect differences in all the
parameter estimates.
• When comparing different mean models, differences under ML are a better
reflection of the importance of the fixed effects than differences under
REML.


If you reduce the mean model simply by examining p-values, then either estimation method is
appropriate. However, if you reduce the mean model using model fit statistics such as AIC and BIC,
then the estimation method must be ML. Model fit statistics under REML are used to select the
covariance structure. Likelihood ratio tests under REML can be used to assess the importance
of the covariance parameter estimates.


Evaluating Fixed Effects

Example: Using the spatial power covariance structure, fit the full model that allows the covariance
parameters to vary by timegroup with all of the main effects, the time by main effect
interactions, and the quadratic and cubic effects for time. Use the ML estimation method.
/* long02d05.sas */
proc mixed data=aids method=ml;
class timegroup;
model cd4_scale=time age cigarettes drug partners depression
time*age time*depression time*partners time*drug
time*cigarettes time*time time*time*time / solution
ddfm=kr(firstorder);
repeated / type=sp(pow)(time) local subject=id group=timegroup;
title 'Longitudinal Model with Heterogeneity in the '
'Spatial Power Covariance Parameters';
run;

Longitudinal Model with Heterogeneity in the Spatial Power Covariance Parameters

The Mixed Procedure

Model Information

Data Set WORK.AIDS


Dependent Variable cd4_scale
Covariance Structure Spatial Power
Subject Effect id
Group Effect timegroup
Estimation Method ML
Residual Variance Method Profile
Fixed Effects SE Method Prasad-Rao-Jeske-Kackar-Harville
Degrees of Freedom Method Kenward-Roger

The estimation method is now maximum likelihood.


Class Level Information

Class Levels Values

timegroup 2 1 2

Dimensions

Covariance Parameters 5
Columns in X 14
Columns in Z 0
Subjects 369
Max Obs per Subject 12

Number of Observations


Number of Observations Read 2376


Number of Observations Used 2376
Number of Observations Not Used 0

Iteration History

Iteration Evaluations -2 Log Like Criterion

0 1 12584.82997708
1 2 11760.17221976 125.93204406
2 2 11652.56198545 21.25711718
3 2 11595.75541363 3.10321895
4 2 11543.42016538 0.00176337
5 2 11540.49577186 0.00047876
6 1 11538.58522785 0.00001937
7 1 11538.51370640 0.00000004
8 1 11538.51355638 0.00000000

Convergence criteria met.

Covariance Parameter Estimates

Cov Parm Subject Group Estimate

Variance id timegroup 1 13.0016


SP(POW) id timegroup 1 0.4925
Variance id timegroup 2 6.7492
SP(POW) id timegroup 2 0.8992
Residual 2.7836

Fit Statistics

-2 Log Likelihood 11538.5


AIC (Smaller is Better) 11576.5
AICC (Smaller is Better) 11576.8
BIC (Smaller is Better) 11650.8

The model fit statistics are not comparable to the ones produced under REML.
Null Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq

4 1046.32 <.0001

Solution for Fixed Effects

Standard
Effect Estimate Error DF t Value Pr > |t|

Intercept 7.5606 0.2204 1584 34.31 <.0001


time -0.8866 0.1049 1358 -8.45 <.0001
age 0.01604 0.01575 720 1.02 0.3088
cigarettes 0.4583 0.06720 1446 6.82 <.0001
drug 0.4354 0.2068 2012 2.11 0.0354
partners 0.05473 0.02350 2042 2.33 0.0199
depression -0.01767 0.008643 2102 -2.04 0.0410
time*age -0.01250 0.006069 588 -2.06 0.0399


time*depression -0.00033 0.003714 1562 -0.09 0.9299


time*partners -0.01150 0.01045 1763 -1.10 0.2713
time*drug -0.03889 0.08627 1528 -0.45 0.6522
time*cigarettes -0.09966 0.02975 1092 -3.35 0.0008
time*time -0.1034 0.03132 1110 -3.30 0.0010
time*time*time 0.03690 0.006810 1112 5.42 <.0001

Type 3 Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

time 1 1358 71.37 <.0001


age 1 720 1.04 0.3088
cigarettes 1 1446 46.52 <.0001
drug 1 2012 4.43 0.0354
partners 1 2042 5.43 0.0199
depression 1 2102 4.18 0.0410
time*age 1 588 4.24 0.0399
time*depression 1 1562 0.01 0.9299
time*partners 1 1763 1.21 0.2713
time*drug 1 1528 0.20 0.6522
time*cigarettes 1 1092 11.22 0.0008
time*time 1 1110 10.90 0.0010
time*time*time 1 1112 29.37 <.0001

The time*drug, time*depression, and time*partners interactions are not significant. The first
interaction to eliminate is the least significant interaction, which in this case is time*depression.
Example: Refit the model without time*depression.
proc mixed data=aids method=ml;
class timegroup;
model cd4_scale=time age cigarettes drug partners depression
time*age time*drug time*partners time*cigarettes time*time
time*time*time / solution ddfm=kr(firstorder);
repeated / type=sp(pow)(time) local subject=id group=timegroup;
title 'Longitudinal Model with Heterogeneity in the '
'Spatial Power Covariance Parameters';
run;
Partial Output
Covariance Parameter Estimates

Cov Parm Subject Group Estimate

Variance id timegroup 1 13.0027


SP(POW) id timegroup 1 0.4926
Variance id timegroup 2 6.7504
SP(POW) id timegroup 2 0.8991
Residual 2.7831

Fit Statistics

-2 Log Likelihood 11538.5


AIC (Smaller is Better) 11574.5
AICC (Smaller is Better) 11574.8


BIC (Smaller is Better) 11644.9

Null Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq

4 1046.33 <.0001

The AIC information criterion decreased from 11576.5 to 11574.5, which indicates that this model is a
better fitting model.
Solution for Fixed Effects

Standard
Effect Estimate Error DF t Value Pr > |t|

Intercept 7.5613 0.2203 1588 34.33 <.0001


time -0.8869 0.1049 1361 -8.46 <.0001
age 0.01600 0.01574 720 1.02 0.3098
cigarettes 0.4584 0.06718 1442 6.82 <.0001
drug 0.4357 0.2068 2011 2.11 0.0352
partners 0.05487 0.02344 2034 2.34 0.0194
depression -0.01809 0.007226 2155 -2.50 0.0124
time*age -0.01246 0.006052 582 -2.06 0.0400
time*drug -0.03903 0.08625 1531 -0.45 0.6509
time*partners -0.01157 0.01042 1758 -1.11 0.2673
time*cigarettes -0.09982 0.02969 1088 -3.36 0.0008
time*time -0.1035 0.03130 1109 -3.31 0.0010
time*time*time 0.03691 0.006811 1112 5.42 <.0001

The next interaction term to be eliminated is time*drug.


Example: Refit the model without time*drug.
proc mixed data=aids method=ml;
class timegroup;
model cd4_scale=time age cigarettes drug partners depression
time*age time*partners time*cigarettes time*time
time*time*time / solution ddfm=kr(firstorder);
repeated / type=sp(pow)(time) local subject=id group=timegroup;
title 'Longitudinal Model with Heterogeneity in the '
'Spatial Power Covariance Parameters';
run;
Partial Output
Covariance Parameter Estimates

Cov Parm Subject Group Estimate

Variance id timegroup 1 12.9929


SP(POW) id timegroup 1 0.4921
Variance id timegroup 2 6.7571
SP(POW) id timegroup 2 0.8986
Residual 2.7802


Fit Statistics

-2 Log Likelihood 11538.7


AIC (Smaller is Better) 11572.7
AICC (Smaller is Better) 11573.0
BIC (Smaller is Better) 11639.2

Null Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq

4 1046.89 <.0001

The AIC information criterion decreased from 11574.5 to 11572.7, which indicates that this model is a
better fitting model.
Solution for Fixed Effects

Standard
Effect Estimate Error DF t Value Pr > |t|

Intercept 7.6025 0.2009 1603 37.84 <.0001


time -0.9161 0.08322 1164 -11.01 <.0001
age 0.01553 0.01570 718 0.99 0.3232
cigarettes 0.4617 0.06684 1430 6.91 <.0001
drug 0.3807 0.1671 2097 2.28 0.0229
partners 0.05573 0.02335 2049 2.39 0.0171
depression -0.01811 0.007227 2155 -2.51 0.0123
time*age -0.01216 0.006021 581 -2.02 0.0439
time*partners -0.01245 0.01024 1728 -1.22 0.2243
time*cigarettes -0.1013 0.02954 1091 -3.43 0.0006
time*time -0.1042 0.03128 1110 -3.33 0.0009
time*time*time 0.03709 0.006806 1111 5.45 <.0001

The next interaction term to be eliminated is the time*partners interaction.


Example: Refit the model without time*partners.
proc mixed data=aids method=ml;
class timegroup;
model cd4_scale=time age cigarettes drug partners depression
time*age time*cigarettes time*time time*time*time / solution
ddfm=kr(firstorder);
repeated / type=sp(pow)(time) local subject=id group=timegroup;
title 'Longitudinal Model with Heterogeneity in the '
'Spatial Power Covariance Parameters';
run;
Partial Output
Covariance Parameter Estimates

Cov Parm Subject Group Estimate

Variance id timegroup 1 12.9299


SP(POW) id timegroup 1 0.4905
Variance id timegroup 2 6.7770
SP(POW) id timegroup 2 0.8984


Residual 2.7846

Fit Statistics

-2 Log Likelihood 11540.2


AIC (Smaller is Better) 11572.2
AICC (Smaller is Better) 11572.4
BIC (Smaller is Better) 11634.8

Null Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq

4 1046.33 <.0001

The AIC criterion decreased from 11572.7 to 11572.2, which indicates that this model is a better-fitting model. However, do not eliminate terms solely on the basis of the AIC information criterion. Variables with subject-matter importance should be kept in the model; sometimes nonsignificant results are just as important to the field of research as significant results.
Solution for Fixed Effects

Standard
Effect Estimate Error DF t Value Pr > |t|

Intercept 7.6235 0.2003 1602 38.06 <.0001


time -0.9199 0.08316 1167 -11.06 <.0001
age 0.01604 0.01569 716 1.02 0.3071
cigarettes 0.4607 0.06679 1425 6.90 <.0001
drug 0.3695 0.1670 2100 2.21 0.0270
partners 0.04155 0.02028 2157 2.05 0.0406
depression -0.01791 0.007229 2156 -2.48 0.0133
time*age -0.01234 0.006023 581 -2.05 0.0410
time*cigarettes -0.1007 0.02954 1089 -3.41 0.0007
time*time -0.09778 0.03077 1086 -3.18 0.0015
time*time*time 0.03619 0.006763 1080 5.35 <.0001

All of the interaction terms are now significant at the 0.05 significance level. The variable age should not
be eliminated because it is involved in an interaction.
Example: Refit the final model using the REML estimation.
proc mixed data=aids;
class timegroup;
model cd4_scale=time age cigarettes drug partners depression
time*age time*cigarettes time*time time*time*time / solution
ddfm=kr(firstorder);
repeated / type=sp(pow)(time) local subject=id group=timegroup;
title 'Longitudinal Model with Heterogeneity in the '
'Spatial Power Covariance Parameters';
run;


Partial Output
Covariance Parameter Estimates

Cov Parm Subject Group Estimate

Variance id timegroup 1 13.0227


SP(POW) id timegroup 1 0.4921
Variance id timegroup 2 6.8546
SP(POW) id timegroup 2 0.8969
Residual 2.7800

Fit Statistics

-2 Res Log Likelihood 11601.2


AIC (Smaller is Better) 11611.2
AICC (Smaller is Better) 11611.2
BIC (Smaller is Better) 11630.7

Null Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq

4 1049.09 <.0001

Notice the information criteria are quite different using REML versus ML.
Solution for Fixed Effects

Standard
Effect Estimate Error DF t Value Pr > |t|

Intercept 7.6252 0.2009 1594 37.95 <.0001


time -0.9200 0.08350 1161 -11.02 <.0001
age 0.01605 0.01575 713 1.02 0.3086
cigarettes 0.4600 0.06702 1419 6.86 <.0001
drug 0.3696 0.1674 2090 2.21 0.0274
partners 0.04155 0.02034 2148 2.04 0.0411
depression -0.01789 0.007246 2146 -2.47 0.0136
time*age -0.01233 0.006054 585 -2.04 0.0422
time*cigarettes -0.1007 0.02966 1094 -3.39 0.0007
time*time -0.09826 0.03087 1081 -3.18 0.0015
time*time*time 0.03628 0.006788 1073 5.34 <.0001

The model with the six main effects, two interactions with time, the quadratic effect of time, and the cubic effect of time is your final model. The results show that recreational drug use has a positive association with the CD4+ cell count, as does the number of partners. There is a negative association between depression and CD4+ cell count. Finally, time has a cubic relationship with CD4+ cell count, which is what you observed in the graph showing the average trend.


2.11 Multiple Choice Poll

Which one of the following statements is true regarding ML and REML estimation methods?
a. Use ML to choose the appropriate covariance structure.
b. Differences in the model fit statistics under REML reflect differences in
the covariance estimates.
c. The likelihood ratio test comparing the full and reduced mean models is
valid only under REML estimation.
d. You can use either estimation method when you reduce the mean model
using model fit statistics such as AIC and BIC.


Illustrating Interactions

The model has two significant interactions:
• time*age
• time*cigarettes

How do you interpret them?


A useful way to explain significant interactions is to graph them. The steps below show how to visualize
the interaction between time and age.
1. Create a data set with plotting points. These points should include the median for each explanatory variable not involved in the interaction, and the 5th, 25th, 50th, 75th, and 95th percentiles of time and age.


2. Concatenate the plotting points data set with the aids data set.

3. Create an output data set in PROC MIXED with the predicted means based on the estimated fixed effects, Xβ̂.
4. Graph the predictions of the observations with the plotting points by time and age to illustrate how
the slope for time differs by the level of age.
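The prediction step is simply Xβ̂ evaluated at the plotting points. A hypothetical Python re-implementation of that calculation, with the coefficients copied from the final REML model's Solution for Fixed Effects table (the covariate defaults of 0 are purely for illustration; the course uses the sample medians instead):

```python
# Predicted mean (X * beta-hat) from the final REML model. Coefficients are
# copied from the Solution for Fixed Effects table; default covariate values
# are illustrative assumptions, not the course's plotting medians.
def predicted_cd4(time, age=0.0, cigarettes=0.0, drug=0.0,
                  partners=0.0, depression=0.0):
    return (7.6252
            - 0.9200 * time
            + 0.01605 * age
            + 0.4600 * cigarettes
            + 0.3696 * drug
            + 0.04155 * partners
            - 0.01789 * depression
            - 0.01233 * time * age
            - 0.1007 * time * cigarettes
            - 0.09826 * time ** 2
            + 0.03628 * time ** 3)

# Heavier smokers start higher but decline faster (time*cigarettes < 0):
gap_before = predicted_cd4(-2, cigarettes=4) - predicted_cd4(-2, cigarettes=0)
gap_after = predicted_cd4(4, cigarettes=4) - predicted_cd4(4, cigarettes=0)
```

The positive cigarettes gap shrinks as time increases because the time*cigarettes coefficient is negative, which is exactly the pattern the interaction plot displays.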


Illustrating Interactions

Example: Illustrate the time*cigarettes and time*age interactions. Use all of the values of cigarettes.
/* long02d06.sas */
proc means data=aids noprint;
var time age;
output out=percentiles p5=time_p5 age_p5 p25=time_p25 age_p25
p50=time_p50 age_p50 p75=time_p75 age_p75 p95=time_p95
age_p95;
run;
The values of interest are the 5th, 25th, 50th, 75th, and 95th percentiles.
data _null_;
set percentiles;
call symput('time_p5',time_p5);
call symput('time_p25',time_p25);
call symput('time_p50',time_p50);
call symput('time_p75',time_p75);
call symput('time_p95',time_p95);
run;
Macro variables are created for the percentiles of interest.
proc means data=aids noprint;
var age drug partners depression;
output out=plot median=age drug partners depression;
run;
The MEANS procedure is used to create a data set with the medians of the numeric variables not involved
in the interaction.
data plot;
set plot;
do cigarettes = 0 to 4;
do time = &time_p5,&time_p25,&time_p50,&time_p75,&time_p95;
timegroup=1*(time le 0) + 2*(time gt 0);
id+1;
output;
end;
end;
run;
A DATA step with two DO loops creates a data set with the plotting points for the time by cigarette interaction. The data points include the median for each explanatory variable not involved in the interaction, the 5th, 25th, 50th, 75th, and 95th percentiles of time, all the values of cigarettes, and two values of timegroup. An ID variable is also created with values 1 through 25.
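The nested DO loops build a 5 × 5 grid of plotting points (five cigarette values by five time percentiles, 25 rows). A minimal Python analogue of the grid logic, with hypothetical stand-ins for the time percentile macro variables:

```python
from itertools import product

# Hypothetical stand-ins for the &time_p5 ... &time_p95 macro variables.
time_pcts = [-2.5, -1.0, 0.1, 1.5, 3.5]

rows = []
for idx, (cigarettes, time) in enumerate(product(range(5), time_pcts), start=1):
    timegroup = 1 if time <= 0 else 2   # mirrors 1*(time le 0) + 2*(time gt 0)
    rows.append({"id": idx, "cigarettes": cigarettes,
                 "time": time, "timegroup": timegroup})
```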
data cigplot;
set aids plot;
run;


ods select none;

proc mixed data=cigplot;
   class timegroup;
   model cd4_scale=time age cigarettes drug partners
         depression time*age time*cigarettes time*time time*time*time
         / outpm=cigpred;
   repeated / type=sp(pow)(time) local subject=id group=timegroup;
run;
Selected MODEL statement option:

OUTPM=    creates an output data set that contains predicted means and related quantities. The
          predicted means are based only on the fixed-effects portion of the model (Xβ̂).

Note: The observations with the plotting points will not be used when PROC MIXED fits the
model because they have missing values for the response. However, the output data set will
have predicted means for these observations.
ods select all;
proc sgplot data=cigpred;
pbspline y=pred x=time / group=cigarettes;
where id le 25;
yaxis label="Predicted CD4+ Cell Counts in hundreds";
xaxis label="Time since Seroconversion";
keylegend / title="Packs of Cigarettes Smoked per Day";
title 'Interaction Plot of Time by Cigarette Usage';
run;
Only the observations with the plotting points are plotted by using the WHERE statement.


The graph shows that heavier smokers have a more precipitous decline in CD4+ cell counts than light smokers or nonsmokers. Patients who smoked four packs or more a day had the highest predicted CD4+ cell counts before seroconversion. However, after four years the predicted CD4+ cell counts were nearly equal across the cigarette groups. These results agree with the individual profiles in the cigarette usage subgroups graph in the exploratory data analysis section.


data _null_;
set percentiles;
call symput('age_p5',age_p5);
call symput('age_p25',age_p25);
call symput('age_p50',age_p50);
call symput('age_p75',age_p75);
call symput('age_p95',age_p95);
run;

proc means data=aids noprint;
   var cigarettes drug partners depression;
   output out=plot1 median= cigarettes drug partners depression;
run;

data plot1;
set plot1;
do age= &age_p5,&age_p25,&age_p50,&age_p75,&age_p95;
do time = &time_p5,&time_p25,&time_p50,&time_p75,&time_p95;
timegroup= 1*(time le 0) + 2*(time gt 0);
id+1;
output;
end;
end;
run;
The next set of programs creates the plotting points for the time*age interaction.
data ageplot;
set aids plot1;
run;

ods select none;

proc mixed data=ageplot;
   class timegroup;
   model cd4_scale=time age cigarettes drug partners
         depression time*age time*cigarettes time*time time*time*time
         / outpm=agepred;
   repeated / type=sp(pow)(time) local subject=id group=timegroup;
run;

ods select all;

proc sgplot data=agepred;
   pbspline y=pred x=time / group=age;
   where id le 25;
   yaxis label="Predicted CD4+ Cell Counts in hundreds";
   xaxis label="Time since Seroconversion";
   keylegend / title="Age in Years relative to arbitrary origin";
   title 'Interaction Plot of Time by Age';
run;


The graph shows that older men have a more precipitous decline in CD4+ cell counts than younger men.
Beyond approximately one year after seroconversion, older men have lower predicted CD4+ cell counts
than younger men. These results are not consistent with the individual profiles in the age subgroups
graph in the exploratory data analysis section. The difference arises because the interaction plot is a
multivariate plot, whereas the individual profiles plot is univariate.


Summary of Model Development

• The interactions time*age and time*cigarettes are significant.


• The variable time appears to have a cubic relationship with CD4+ cell
counts.
• Measurements taken before seroconversion seem to have different
covariance parameter estimates (larger variances and smaller correlations)
than measurements taken after seroconversion (smaller variances and
larger correlations).


The model development phase found that the time by age interaction and time by cigarettes interaction
are significant. The interaction plot of time by cigarettes showed that heavier smokers have a more
precipitous decline in CD4+ cell counts than light or nonsmokers. The interaction plot of time by age
showed that older men have a more precipitous decline in CD4+ cell counts than younger men.

The model also validated the graph of the average trend line in the exploratory data analysis section. The
graph of the average trend showed that time appeared to have a cubic relationship with CD4+ cell counts.
The model showed that the cubic effect of time is significant.
There also seems to be some heterogeneity in the covariance structure. The group effect of time before
seroconversion and time after seroconversion improved the fit of the model. The covariance parameter
estimates showed that the variance of the measurements before seroconversion is much larger than the
variance of the measurements after seroconversion. However, for equally spaced time intervals, the
correlation of the measurements before seroconversion is lower than the correlation of the measurements
after seroconversion.


Exercises

3. Developing and Interpreting Models


a. Reduce the mean model by eliminating unnecessary higher-order terms. Use the ML estimation
method and the spatial exponential covariance structure. Also add a measurement error
component. Use the p-values of the effects along with the AICC statistic to decide which terms to
eliminate. Do not eliminate the main effects.

1) Which reduced model did you choose? Why?


b. For the reduced model, generate another graph of the model fit statistics by covariance structure.
Use the REML estimation method and select only the five spatial covariance structures.

1) Is the spatial exponential covariance structure still one of the best fits?
2) Which spatial covariance structure is a good fit for the complex mean model but a relatively
poor fit for the reduced model?

c. Refit the reduced model using the REML estimation method and the spatial exponential
covariance structure. Also request the correlations from the R matrix and the parameter estimates
for the fixed effects.
1) Interpret the parameter estimates and inferences for the fixed effects.


2.4 Random Coefficient Models

Objectives

• Explain the concepts behind the random coefficient models.


• Fit a random coefficient model in PROC MIXED.
• Compute empirical best linear unbiased predictions (EBLUP).
• Examine the common causes of nonconvergence in PROC MIXED and some
solutions.
• Fit a model with both repeated and random effects in PROC MIXED.


Models with Only the REPEATED Statement

• No random effects are included in the model.


• Covariance structure for the data is completely determined by the
covariance structure for the residual error.
• There is an R matrix but no G matrix.
• Usually, the model of choice when the longitudinal data are obtained at
fixed points in time and when the within-subject correlations are
adequately modeled using a specified covariance structure.



Thus far, the longitudinal models in this course have all used the REPEATED statement. However, you
should not conclude that the REPEATED statement is appropriate whenever you have longitudinal data;
some longitudinal models fit the data better using the RANDOM statement. It is generally recommended
that you start with the REPEATED statement rather than the RANDOM statement because this can reduce
the computing time considerably.

Random Coefficient Models

• Random effects representing natural heterogeneity between subjects are
used to describe the covariance structure of the data.
• The regression coefficients for one or more covariates are assumed to be a
random sample from some population of possible coefficients.
• There is an R and a G matrix.
• These models are useful for highly unbalanced data with many repeated
measurements per subject.
• Usually, these are the models of choice when the longitudinal data are not
obtained at fixed points in time and the within-subject correlations are not
adequately modeled by a specified covariance structure.


When the autocorrelation plot shows an autocorrelation function that cannot be easily modeled using
the covariance structures in PROC MIXED, a longitudinal model using the RANDOM statement might
be useful. These models are called random coefficient models because the regression coefficients for one
or more covariates are assumed to be a random sample from some population of possible coefficients.
In longitudinal models, the random coefficients are the subject-specific parameter estimates. Random
coefficient models are useful for highly unbalanced data with many repeated measurements per subject
(Verbeke and Molenberghs 1997).


Random Coefficient Model

y = Xβ + Zγ + ε
where β represents:
• the population average
• parameters that are assumed to be the same for all subjects
and where γ represents:
• parameters that are allowed to vary over subjects
• subject-specific regression coefficients that reflect the natural
heterogeneity in the population


The random coefficient model assumes that the vector of repeated measurements on each subject follows
a linear regression model where some of the regression parameters are population-specific (fixed-effects),
but other parameters are subject-specific (random-effects). The fixed effect parameter estimates represent
the population average. The subject-specific regression coefficients with time as a random effect reflect
how the response evolves over time for each subject. These subject-specific models can be very flexible,
but in practice polynomials involving time will often suffice. However, extensions of this flexibility, such
as fractional polynomial models or extended spline functions, can be considered as well (Verbeke and
Molenberghs 2000).

In random coefficient models, the covariance structure for the R matrix is the independent covariance
structure, which now accounts for the measurement error.


Random Coefficient Model


In random coefficient models, the random regression lines deviate from the population regression line. If
you specify the intercept as a random variable, then you enable the intercept for each subject to deviate
from the population intercept. If you specify the slope as a random variable, then you enable the slope for
each subject to deviate from the population slope. For example, if you specify time as a random effect
and a fixed effect in the longitudinal model for the CD4+ cell count data, then you stipulate that there is a
relationship between CD4+ cell counts and time and that this relationship can vary across subjects.

Random Coefficient Model

Yij = (β0 + β1xij) + (ai + bixij) + εij

where β0 + β1xij is the population intercept and slope, and ai + bixij is the
subject-specific deviation of the intercept and slope.



In random coefficient models, the fixed effect parameter estimates represent the expected values of the
population of intercepts and slopes. The random effects for intercept represent the difference between the
intercept for the ith subject and the overall intercept. The random effects for slope represent the difference
between the slope for the ith subject and the overall slope. Random coefficient models also have a random
effect for the within-subject variation. Because there is not enough data on a single subject to estimate its
regression parameters, and to avoid theoretical obstacles, it is assumed that the random effects are
normally distributed random variables. The random effects and random errors are also independent of
each other.
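The assumptions described above can be written compactly. The following sketch uses the intercept-and-slope notation from the preceding slide, with G denoting the covariance matrix of the random effects:

```latex
Y_{ij} = (\beta_0 + \beta_1 x_{ij}) + (a_i + b_i x_{ij}) + \varepsilon_{ij},
\qquad
\begin{pmatrix} a_i \\ b_i \end{pmatrix}
\sim N\!\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix},\;
G = \begin{pmatrix} \sigma_a^{2} & \sigma_{ab} \\
                    \sigma_{ab}  & \sigma_b^{2} \end{pmatrix}\right),
\qquad
\varepsilon_{ij} \sim N(0, \sigma^{2})
```

where the random effects (ai, bi) are independent of the errors εij and independent across subjects.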

Block-Diagonal Covariance Matrix

G = | G1   0    0    0  |
    | 0    G2   0    0  |
    | 0    0    G3   0  |
    | 0    0    0    G4 |


When you specify the SUBJECT= option in the RANDOM statement, a block-diagonal covariance matrix
with identical blocks is created in the G matrix. Complete independence is assumed across subjects.
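In symbols, with N subjects sharing a common within-subject block G0, the overall G matrix can be written as a Kronecker product:

```latex
G = I_N \otimes G_0 =
\begin{pmatrix}
G_0 &        &     \\
    & \ddots &     \\
    &        & G_0
\end{pmatrix}
```

The zero off-diagonal blocks reflect the assumed independence across subjects.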


Unstructured Covariance Matrix with Two Random Effects

         a1     b1     a2     b2
   a1   σ²a    σab     0      0
   b1   σab    σ²b     0      0
   a2    0      0     σ²a    σab
   b2    0      0     σab    σ²b


In longitudinal models, it is recommended that the unstructured covariance structure be specified
in the RANDOM statement. PROC MIXED estimates the variances of the intercepts and slopes along
with the covariance between the intercepts and slopes in the G matrix. Specifying the unstructured
covariance structure indicates that you do not want to impose any structure on the variances of the
intercepts and slopes, or on the covariance between the intercepts and slopes.

The slide above has two random effects and two subjects, where each block corresponds to a subject.
Notice there is complete independence across subjects. If a represents the intercept and b represents time,
then the variance estimate for the intercept tells you how much the intercepts vary across subjects.
The variance estimate for time represents how much the slopes for time vary across subjects. The
covariance estimate between the intercept and time represents how the change in the intercepts affects
the slopes of time. In other words, it indicates whether the CD4+ cell count depletion over time is affected
by the subject’s CD4+ cell count at seroconversion.
In this example, the unstructured covariance structure is appropriate for the G matrix but not the R matrix
because the issue regarding unequal time intervals across subjects does not pertain to the G matrix.
The covariance structure for the G matrix models the error that represents the natural heterogeneity
between subjects. The within-subject variability, which is directly related to the spacing of measurements,
is modeled by the covariance structure in the R matrix.


2.12 Multiple Choice Poll

Which one of the following statements is true regarding random coefficient
models in longitudinal data analysis?
a. The random effects and random errors are normally distributed and can
be correlated with each other.
b. The random coefficients are subject-specific deviations from the
population parameter estimates.
c. There is an R matrix but no G matrix.
d. There is a G matrix but no R matrix.


Random Coefficient Models versus Repeated Models
• Random intercept-only models do not enable correlations within subject to
change over time. The V matrix has compound symmetry structure.
• Models with random intercept and slope enable the correlations within
subject to change over time. The V matrix has a structure that enables
the correlations to change over time.



A common misconception is that random coefficient models do not take into account the serial correlation
error within subject. However, when you specify the intercept and slope (in this example, time) in a
RANDOM statement, the V matrix enables the correlations within subject to change over time. The
unequal time intervals are taken into account because the Z matrix is used in the computation of the
V matrix. The difference between models with random intercepts and slopes and models with a spatial
covariance structure for the R matrix is that the random coefficient model indirectly models the serial
correlation within subject with the variances and covariances of the intercept and slope. The model with
the REPEATED statement directly models the serial correlation within subject by specifying a spatial
covariance structure for the R matrix.
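For a model with a random intercept and a random slope for time, the serial correlation implied by V = ZGZ′ + R can be written out explicitly. Using σ²a, σ²b, and σab for the variances and covariance of the random intercept and slope, and σ² for the residual variance:

```latex
\mathrm{Cov}(Y_{ij}, Y_{ik})
= \sigma_a^{2} + \sigma_{ab}\,(t_{ij} + t_{ik})
+ \sigma_b^{2}\, t_{ij}\, t_{ik}
+ \sigma^{2}\,\mathbf{1}\{j = k\}
```

Because the covariance depends on the measurement times tij and tik, the within-subject correlation changes over time even though no serial covariance structure is specified in the R matrix.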

Eliminating Random Effects

• Include the relevant random effects.


• Delete one random effect from the model, in a hierarchical way, starting
from the highest-order effect.
• Construct a likelihood ratio test comparing the two models.
• Use the REML estimation method.


When you build a random coefficient model, it is necessary to determine which random effects are needed
in the model. Examining the residual profile plots might be helpful, but with 369 subjects the plots can
be cumbersome. One recommended strategy is to include all the relevant random effects. This ensures
that the remaining variability is not due to any missing random effects. However, including high
dimensional random effects with an unstructured covariance matrix leads to complicated covariance
structures and might result in non-convergence of the algorithms in PROC MIXED (Verbeke
and Molenberghs 2000).
After a candidate model is selected, a likelihood ratio chi-square test can be computed by comparing
the candidate model with the reduced model. The mean structure of the model remains the same across
both models, but the number of random effects is reduced by one in the reduced model. Verbeke
and Molenberghs (2000) recommend using the REML estimation method because the REML test statistic
performed slightly better than the ML test statistic. A program illustrating the likelihood ratio test
is shown in an appendix.
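As a sketch of that strategy (the full program appears in an appendix), the test can be carried out by fitting the two models with REML and differencing the -2 Res Log Likelihood values. The model effects below are simplified for illustration, and the data set names fit_full and fit_reduced are arbitrary:

```sas
/* Full model: random intercept, linear, and quadratic slopes for time */
proc mixed data=aids method=reml;
   model cd4_scale=time time*time / solution;
   random intercept time time*time / type=un subject=id;
   ods output FitStatistics=fit_full;
run;

/* Reduced model: same mean structure, highest-order random effect dropped */
proc mixed data=aids method=reml;
   model cd4_scale=time time*time / solution;
   random intercept time / type=un subject=id;
   ods output FitStatistics=fit_reduced;
run;

/* Likelihood ratio chi-square: difference in -2 Res Log Likelihood.
   Dropping the time*time random effect removes three covariance
   parameters: UN(3,1), UN(3,2), and UN(3,3). */
data lrt;
   merge fit_full(rename=(value=full)) fit_reduced(rename=(value=reduced));
   where descr='-2 Res Log Likelihood';
   chisq=reduced-full;
   p_value=1-probchi(chisq,3);
run;
```

As the accompanying note explains, the p-value computed this way can be slightly biased because the null distribution for variance components on the boundary is a mixture of chi-squared distributions.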


Note: The p-values computed for the likelihood ratio test in this scenario might be slightly biased
because the asymptotic null distribution for the likelihood ratio test statistic for testing hypotheses
regarding random effects is often a mixture of chi-squared distributions rather than the classical
single chi-squared distribution (Verbeke and Molenberghs 2000).

There is also a COVTEST option in PROC MIXED that produces asymptotic standard errors
and Wald Z-tests for the covariance parameter estimates. However, the sample size requirements
for these tests are excessive and often not met (approximately 400 or more subjects).


Random Coefficient Models

Example: Fit a random coefficient model with random intercepts, and random linear, quadratic, and
cubic slopes for time. Include all the two factor interactions with time as the fixed effects.
Specify the COVTEST, G, and GCORR options. Also specify the V and VCORR option for
subject 13.
/* long02d07.sas */
proc mixed data=aids covtest;
model cd4_scale=time age cigarettes drug partners depression
time*age time*depression time*drug time*partners
time*cigarettes time*time time*time*time / solution
ddfm=kr;
random intercept time time*time time*time*time / type=un subject=id
g gcorr v=13 vcorr=13;
title 'Random Coefficients Model with Cubic Effect of Time';
run;
Selected PROC MIXED statement option:
COVTEST produces asymptotic standard errors and Wald Z-tests for the covariance parameter
estimates.
Selected RANDOM statement options:
G requests that the estimated G matrix be displayed.
GCORR displays the correlation matrix corresponding to the estimated G matrix.
V requests that blocks of the estimated V matrix be displayed. Also, you can specify which
subject’s V matrix to display.
VCORR displays the correlation matrix corresponding to the blocks of the estimated V matrix.
Also, you can specify which subject’s correlation matrix to display.

In the RANDOM statement, you must specify INTERCEPT (or INT) as a random effect to indicate
the intercept. PROC MIXED does not include the intercept in the RANDOM statement by default
as it does in the MODEL statement. Furthermore, the effects in the RANDOM statement in combination
with the SUBJECT= option make these random effects deviations from the fixed means. The random
effects should also appear in the MODEL statement; otherwise, you implicitly assume that the
corresponding fixed-effect parameter estimate is 0, which is a questionable assumption.
Random Coefficients Model with Cubic Effect of Time

The Mixed Procedure

Model Information

Data Set WORK.AIDS


Dependent Variable cd4_scale
Covariance Structure Unstructured
Subject Effect id
Estimation Method REML
Residual Variance Method Profile
Fixed Effects SE Method Kenward-Roger
Degrees of Freedom Method Kenward-Roger


Dimensions

Covariance Parameters 11
Columns in X 14
Columns in Z per Subject 4
Subjects 369
Max Obs per Subject 12

There are a total of four columns in the Z matrix. These columns represent the intercept, time, time*time,
and time*time*time random effects. The 14 columns in the X matrix represent the parameters in the
mean model.
Number of Observations

Number of Observations Read 2376


Number of Observations Used 2376
Number of Observations Not Used 0

Iteration History

Iteration Evaluations -2 Res Log Like Criterion

0 1 12668.04910184
1 2 11906.71433484 598549.15664
2 1 11826.06516534 1503342.7659
3 1 11770.20234989 1238618.9132
4 1 11760.73740046 0.01597010
5 1 11735.44729489 0.00600725
6 1 11710.08383953 0.00123057
7 1 11704.71596333 0.00016956
8 1 11704.02578248 0.00000604
9 1 11704.00292721 0.00000001
10 1 11704.00288278 0.00000000

Convergence criteria met.

Estimated G Matrix

Row Effect Subject Col1 Col2 Col3 Col4

1 Intercept 1 7.1562 -1.0342 -0.2397 0.07635


2 time 1 -1.0342 0.8308 0.09795 -0.04079
3 time*time 1 -0.2397 0.09795 0.02921 -0.00973
4 time*time*time 1 0.07635 -0.04079 -0.00973 0.003274

The Estimated G Matrix table shows the estimated variances and covariances of the random effects.
For example, 7.1562 (row 1, column 1) is the variance of the intercepts. The value 0.8308 (row 2, column
2) is the variance of the linear slopes of time. The value -1.0342 (row 1, column 2) is the covariance
of the intercepts and the linear slopes of time.
Estimated G Correlation Matrix

Row Effect Subject Col1 Col2 Col3 Col4

1 Intercept 1 1.0000 -0.4242 -0.5242 0.4988


2 time 1 -0.4242 1.0000 0.6288 -0.7821
3 time*time 1 -0.5242 0.6288 1.0000 -0.9946


4 time*time*time 1 0.4988 -0.7821 -0.9946 1.0000

The Estimated G Correlation Matrix table shows the correlations between random effects. The correlation
between the intercepts and the linear slopes of time is -0.4242 (row 1, column 2) while the correlation
between the linear slopes of time and the quadratic slopes of time is 0.6288 (row 2, column 3).
Estimated V Matrix for Subject 13

Row Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9

1 14.5368 9.1018 8.3506 7.4716 6.3868 5.2946 4.3027 3.4378 2.7642


2 9.1018 13.7358 8.1322 7.3743 6.4232 5.4576 4.5785 3.8147 3.2285
3 8.3506 8.1322 12.6479 7.0842 6.3338 5.5635 4.8591 4.2477 3.7836
4 7.4716 7.3743 7.0842 11.6660 6.1635 5.6156 5.1102 4.6706 4.3393
5 6.3868 6.4232 6.3338 6.1635 10.8896 5.6332 5.3689 5.1361 4.9617
6 5.2946 5.4576 5.5635 5.6156 5.6332 10.6001 5.5967 5.5677 5.5459
7 4.3027 4.5785 4.8591 5.1102 5.3689 5.5967 10.7624 5.9367 6.0494
8 3.4378 3.8147 4.2477 4.6706 5.1361 5.5677 5.9367 11.2207 6.4688
9 2.7642 3.2285 3.7836 4.3393 4.9617 5.5459 6.0494 6.4688 11.7569
10 2.4129 2.9402 3.5668 4.1917 4.8898 5.5441 6.1075 6.5768 6.9241
11 2.4242 2.9887 3.6310 4.2545 4.9382 5.5707 6.1108 6.5592 6.8924
12 2.9024 3.4769 4.0667 4.5989 5.1515 5.6419 6.0498 6.3846 6.6371

Estimated V Matrix for Subject 13

Row Col10 Col11 Col12

1 2.4129 2.4242 2.9024


2 2.9402 2.9887 3.4769
3 3.5668 3.6310 4.0667
4 4.1917 4.2545 4.5989
5 4.8898 4.9382 5.1515
6 5.5441 5.5707 5.6419
7 6.1075 6.1108 6.0498
8 6.5768 6.5592 6.3846
9 6.9241 6.8924 6.6371
10 12.0659 7.0550 6.7739
11 7.0550 12.0148 6.7938
12 6.7739 6.7938 11.6555

The Estimated V matrix shows the variances and covariances among the measurements (in this case,
subject 13). The V matrix is calculated by the formula V = ZGZ′ + R. Since the Z matrix has the time values,
the variances and covariances estimated in the V matrix are based on the variances and covariances
of the random effects along with the time values of the measurements. Notice the variances along
the diagonal are not equal. This was not the case in the model with the spatial power covariance structure
(without the GROUP= option) and this illustrates the strength of the random coefficient models.
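The unequal diagonal entries follow directly from V = ZGZ′ + R. With a cubic time trend, the row of Z for the jth measurement on subject i is zij = (1, tij, tij², tij³)′, so

```latex
\mathrm{Var}(Y_{ij}) = z_{ij}'\, G\, z_{ij} + \sigma^{2}
```

which varies with the measurement time tij.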


Estimated V Correlation Matrix for Subject 13

Row Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9

1 1.0000 0.6441 0.6158 0.5737 0.5076 0.4265 0.3440 0.2692 0.2114


2 0.6441 1.0000 0.6170 0.5825 0.5252 0.4523 0.3766 0.3073 0.2541
3 0.6158 0.6170 1.0000 0.5832 0.5397 0.4805 0.4165 0.3566 0.3103
4 0.5737 0.5825 0.5832 1.0000 0.5468 0.5050 0.4561 0.4082 0.3705
5 0.5076 0.5252 0.5397 0.5468 1.0000 0.5243 0.4959 0.4646 0.4385
6 0.4265 0.4523 0.4805 0.5050 0.5243 1.0000 0.5240 0.5105 0.4968
7 0.3440 0.3766 0.4165 0.4561 0.4959 0.5240 1.0000 0.5402 0.5378
8 0.2692 0.3073 0.3566 0.4082 0.4646 0.5105 0.5402 1.0000 0.5632
9 0.2114 0.2541 0.3103 0.3705 0.4385 0.4968 0.5378 0.5632 1.0000
10 0.1822 0.2284 0.2887 0.3533 0.4266 0.4902 0.5360 0.5652 0.5813
11 0.1834 0.2326 0.2945 0.3594 0.4317 0.4936 0.5374 0.5649 0.5799
12 0.2230 0.2748 0.3349 0.3944 0.4573 0.5076 0.5402 0.5583 0.5670

Estimated V Correlation
Matrix for Subject 13

Row Col10 Col11 Col12

1 0.1822 0.1834 0.2230


2 0.2284 0.2326 0.2748
3 0.2887 0.2945 0.3349
4 0.3533 0.3594 0.3944
5 0.4266 0.4317 0.4573
6 0.4902 0.4936 0.5076
7 0.5360 0.5374 0.5402
8 0.5652 0.5649 0.5583
9 0.5813 0.5799 0.5670
10 1.0000 0.5859 0.5712
11 0.5859 1.0000 0.5741
12 0.5712 0.5741 1.0000

The Estimated V Correlation Matrix table shows the correlations among the measurements (in this case,
subject 13). Since the Z matrix has the time values, the correlations estimated from the V matrix are based
on the variances and covariances of the random effects along with the time values of the measurements.
The R matrix in this case has an independent covariance structure. Notice the correlations do not have
to decrease as the time interval increases. This was not the case in the model with the spatial power
covariance structure and this illustrates the flexibility of the random coefficient model.

Note: If intercept is the only random effect, then the correlations would be equal across the
measurements within time group (a compound symmetry covariance structure).
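The compound symmetry structure mentioned in the note follows from V = ZGZ′ + R with a random intercept only (Z a column of ones), using σ²a for the intercept variance and σ² for the residual variance:

```latex
\mathrm{Var}(Y_{ij}) = \sigma_a^{2} + \sigma^{2}, \qquad
\mathrm{Cov}(Y_{ij}, Y_{ik}) = \sigma_a^{2} \quad (j \neq k), \qquad
\mathrm{Corr}(Y_{ij}, Y_{ik}) = \frac{\sigma_a^{2}}{\sigma_a^{2} + \sigma^{2}}
```

The correlation is the same for every pair of measurements, regardless of how far apart in time they are.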


Covariance Parameter Estimates

Standard Z
Cov Parm Subject Estimate Error Value Pr Z

UN(1,1) id 7.1562 0.6927 10.33 <.0001


UN(2,1) id -1.0342 0.2428 -4.26 <.0001
UN(2,2) id 0.8308 0.1507 5.51 <.0001
UN(3,1) id -0.2397 0.09042 -2.65 0.0080
UN(3,2) id 0.09795 0.03897 2.51 0.0120
UN(3,3) id 0.02921 0.01680 1.74 0.0411
UN(4,1) id 0.07635 0.02241 3.41 0.0007
UN(4,2) id -0.04079 0.01253 -3.26 0.0011
UN(4,3) id -0.00973 0.004414 -2.20 0.0276
UN(4,4) id 0.003274 0.001367 2.40 0.0083
Residual 4.9781 0.1831 27.18 <.0001

A total of 11 covariance parameters are estimated in this model. The values correspond to the values
in the G matrix. The COVTEST option displays the standard error, Z value, and p-value. The results
show that the variances and covariances of the random effects are significantly different from 0.
The residual value of 4.9781 corresponds to the variance estimate in the R matrix. The inferences are
unreliable for small sample sizes. The recommended sample size to meet the asymptotic requirement
is approximately 400 or more subjects, so with 369 subjects the results should be interpreted with
some caution.
Fit Statistics

-2 Res Log Likelihood 11704.0


AIC (Smaller is Better) 11726.0
AICC (Smaller is Better) 11726.1
BIC (Smaller is Better) 11769.0

Null Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq

10 964.05 <.0001

The Null Model Likelihood Ratio Test compares the fitted model to a model with an independent
covariance structure for V (in this case, a model without a RANDOM statement).
Solution for Fixed Effects

Standard
Effect Estimate Error DF t Value Pr > |t|

Intercept 8.1186 0.2469 859 32.88 <.0001


time -1.1656 0.1072 523 -10.87 <.0001
age 0.01396 0.01951 327 0.72 0.4750
cigarettes 0.3640 0.07538 1001 4.83 <.0001
drug 0.1833 0.2063 1869 0.89 0.3744
partners 0.05915 0.02295 1940 2.58 0.0100
depression -0.02706 0.008799 2004 -3.08 0.0021
time*age -0.01401 0.006359 237 -2.20 0.0285
time*depression 0.002205 0.003901 688 0.57 0.5720
time*drug 0.007280 0.08934 699 0.08 0.9351
time*partners -0.01518 0.01063 715 -1.43 0.1539
time*cigarettes -0.1182 0.03107 506 -3.80 0.0002
time*time -0.1770 0.02806 186 -6.31 <.0001
time*time*time 0.06031 0.007156 146 8.43 <.0001


Type 3 Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

time 1 523 118.12 <.0001


age 1 327 0.51 0.4750
cigarettes 1 1001 23.32 <.0001
drug 1 1869 0.79 0.3744
partners 1 1940 6.64 0.0100
depression 1 2004 9.46 0.0021
time*age 1 237 4.85 0.0285
time*depression 1 688 0.32 0.5720
time*drug 1 699 0.01 0.9351
time*partners 1 715 2.04 0.1539
time*cigarettes 1 506 14.46 0.0002
time*time 1 186 39.79 <.0001
time*time*time 1 146 71.03 <.0001

The inferences from the random coefficients model are very similar to those from the repeated effects
model. The time*age and time*cigarettes interactions and the quadratic and cubic effects of time are significant.
Example: To compare the random coefficient model with the last repeated effects model, fit a random
coefficients model without the time*drug, time*partners, and time*depression interactions
and with the GROUP= option.
proc mixed data=aids;
class timegroup;
model cd4_scale=time age cigarettes drug partners depression
time*age time*cigarettes time*time time*time*time /
solution ddfm=kr;
random intercept time time*time time*time*time / type=un subject=id
group=timegroup g gcorr v=13 vcorr=13;
title 'Random Coefficients Final Model';
run;

Random Coefficients Final Model

The Mixed Procedure

Model Information

Data Set WORK.AIDS


Dependent Variable cd4_scale
Covariance Structure Unstructured
Subject Effect id
Group Effect timegroup
Estimation Method REML
Residual Variance Method Profile
Fixed Effects SE Method Kenward-Roger
Degrees of Freedom Method Kenward-Roger

Class Level Information

Class Levels Values


timegroup 2 1 2

Dimensions

Covariance Parameters 21
Columns in X 11
Columns in Z per Subject 8
Subjects 369
Max Obs per Subject 12

Number of Observations

Number of Observations Read 2376


Number of Observations Used 2376
Number of Observations Not Used 0

Iteration History

Iteration Evaluations -2 Res Log Like Criterion

0 1 12650.28067427
1 2 11621.69179982 10525.543915
2 1 11615.72160553 0.00765038
3 1 11594.68913868 0.00372117
4 1 11577.99517787 0.00112286
5 1 11573.11206038 0.00019219
6 1 11572.33492058 0.00000888
7 1 11572.30167262 0.00000003
8 1 11572.30155569 0.00000000

Convergence criteria met.

Estimated G Matrix

Row Effect timegroup Subject Col1 Col2 Col3 Col4 Col5

1 Intercept 1 1 37.2267 77.9624 59.0335 12.8130


2 time 1 1 77.9624 246.90 196.87 43.1341
3 time*time 1 1 59.0335 196.87 165.14 37.3923
4 time*time*time 1 1 12.8130 43.1341 37.3923 8.6589
5 Intercept 2 1 11.5272
6 time 2 1 -9.1612
7 time*time 2 1 3.4119
8 time*time*time 2 1 -0.3790

Estimated G Matrix

Row Col6 Col7 Col8

1
2
3
4
5 -9.1612 3.4119 -0.3790
6 13.0029 -4.7445 0.5156
7 -4.7445 1.7910 -0.1968


8 0.5156 -0.1968 0.02173

The Estimated G Matrix shows the estimated variances and covariances of the random effects by time
groups. For example, 37.2267 (row 1, column 1) is the variance of the intercepts in time group 1 and
11.5272 (row 5, column 5) is the variance of the intercepts in time group 2. The value 246.90 (row 2,
column 2) is the variance of the linear slopes of time in time group 1 and 13.0029 (row 6, column 6)
is the variance of the linear slopes of time in time group 2. The value 77.9624 (row 1, column 2) is the
covariance of the intercepts and the linear slopes of time in time group 1 and -9.1612 (row 5, column 6)
is the covariance of the intercepts and the linear slopes of time in time group 2.
Estimated G Correlation Matrix

Row Effect timegroup Subject Col1 Col2 Col3 Col4 Col5

1 Intercept 1 1 1.0000 0.8132 0.7529 0.7137


2 time 1 1 0.8132 1.0000 0.9750 0.9329
3 time*time 1 1 0.7529 0.9750 1.0000 0.9888
4 time*time*time 1 1 0.7137 0.9329 0.9888 1.0000
5 Intercept 2 1 1.0000
6 time 2 1 -0.7483
7 time*time 2 1 0.7509
8 time*time*time 2 1 -0.7572

Estimated G Correlation Matrix

Row Col6 Col7 Col8

1
2
3
4
5 -0.7483 0.7509 -0.7572
6 1.0000 -0.9831 0.9700
7 -0.9831 1.0000 -0.9976
8 0.9700 -0.9976 1.0000

The Estimated G Correlation Matrix table shows the correlations between random effects by time groups.
The correlation between the intercepts and the linear slopes of time is 0.8132 (row 1, column 2) in time
group 1 and -0.7483 (row 5, column 6) in time group 2. This is an artifact of how time was coded. In time
group 1, the intercept is the last time point (time points are negative and the intercept is at time 0).
Therefore, negative slope coefficients lead to smaller intercepts. However, in time group 2, the intercept
is the first time point. Thus, negative slope coefficients lead to larger intercepts.
Estimated V Matrix for Subject 13

Row Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9

1 15.2504 11.1374 8.6274


2 11.1374 15.1591 9.4204
3 8.6274 9.4204 18.9172
4 11.3069 5.9626 4.7703 4.3101 4.3199 4.6196
5 5.9626 8.5403 5.1209 5.0603 5.1282 5.2643
6 4.7703 5.1209 8.5408 5.5883 5.7265 5.8110
7 4.3101 5.0603 5.5883 9.0802 6.1443 6.2553
8 4.3199 5.1282 5.7265 6.1443 9.5835 6.6234
9 4.6196 5.2643 5.8110 6.2553 6.6234 10.0702
10 4.9821 5.3898 5.8390 6.2852 6.7203 7.1265


11 5.2123 5.4426 5.8197 6.2690 6.7567 7.2475


12 5.1246 5.3558 5.7516 6.2312 6.7578 7.2936

Estimated V Matrix for Subject 13

Row Col10 Col11 Col12

1
2
3
4 4.9821 5.2123 5.1246
5 5.3898 5.4426 5.3558
6 5.8390 5.8197 5.7516
7 6.2852 6.2690 6.2312
8 6.7203 6.7567 6.7578
9 7.1265 7.2475 7.2936
10 10.5925 7.6528 7.7427
11 7.6528 11.0814 8.0615
12 7.7427 8.0615 11.3736

The Estimated V Matrix table shows the variances and covariances among the measurements by time
groups (in this case, for subject 13). For example, the variance of the first measurement in time group 1
is 15.2504 (row 1, column 1) and the covariance of the first and second measurements in time group 1
is 11.1374 (row 1, column 2). The variance of the first measurement in time group 2 is 11.3069 (row 4,
column 4) and the covariance of the first and second measurements in time group 2 is 5.9626 (row 4,
column 5). The model assumes that the measurements in time group 1 are independent of the
measurements in time group 2.
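For a random coefficient model, the V matrix above is assembled as V = Z G Z' + σ²I, where each row of Z holds the polynomial time terms (here 1, t, t², t³) at one of the subject's observation times and σ² is the Residual covariance parameter (3.1488 in this output). A minimal sketch of that construction — with a made-up 2×2 G for just a random intercept and slope, not the fitted 4×4 estimates above:

```python
import numpy as np

# Hypothetical numbers for illustration only -- not the fitted estimates above.
G = np.array([[4.0, 0.5],      # var(intercept), cov(intercept, slope)
              [0.5, 0.25]])    # cov(intercept, slope), var(slope)
sigma2 = 3.0                   # residual variance (the "Residual" parameter)
times = np.array([-2.0, -1.0, 0.0])

# One row of Z per measurement: [1, t]
Z = np.column_stack([np.ones_like(times), times])

# V = Z G Z' + sigma^2 I
V = Z @ G @ Z.T + sigma2 * np.eye(len(times))
print(V.round(2))
# The diagonal entries differ and the off-diagonals change with the time lag,
# so the implied within-subject variances and correlations vary over time.
```

Dividing each off-diagonal entry of V by the product of the corresponding standard deviations yields the Estimated V Correlation Matrix reported next.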
Estimated V Correlation Matrix for Subject 13

Row Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9

1 1.0000 0.7325 0.5079


2 0.7325 1.0000 0.5563
3 0.5079 0.5563 1.0000
4 1.0000 0.6068 0.4854 0.4254 0.4150 0.4329
5 0.6068 1.0000 0.5996 0.5746 0.5669 0.5677
6 0.4854 0.5996 1.0000 0.6346 0.6330 0.6266
7 0.4254 0.5746 0.6346 1.0000 0.6587 0.6542
8 0.4150 0.5669 0.6330 0.6587 1.0000 0.6742
9 0.4329 0.5677 0.6266 0.6542 0.6742 1.0000
10 0.4552 0.5667 0.6139 0.6409 0.6670 0.6900
11 0.4657 0.5595 0.5982 0.6250 0.6557 0.6861
12 0.4519 0.5434 0.5836 0.6132 0.6473 0.6815


Estimated V Correlation
Matrix for Subject 13

Row Col10 Col11 Col12

1
2
3
4 0.4552 0.4657 0.4519
5 0.5667 0.5595 0.5434
6 0.6139 0.5982 0.5836
7 0.6409 0.6250 0.6132
8 0.6670 0.6557 0.6473
9 0.6900 0.6861 0.6815
10 1.0000 0.7064 0.7054
11 0.7064 1.0000 0.7181
12 0.7054 0.7181 1.0000

The Estimated V Correlation Matrix table shows the correlations among the measurements by time
groups (in this case, for subject 13). For example, the correlation between the first and second
measurements in time group 1 is 0.7325 (row 1, column 2) and the correlation between the first
and second measurements in time group 2 is 0.6068 (row 4, column 5). The correlation between
the measurements in time group 1 and time group 2 is 0.
Covariance Parameter Estimates

Cov Parm Subject Group Estimate

UN(1,1) id timegroup 1 37.2267


UN(2,1) id timegroup 1 77.9624
UN(2,2) id timegroup 1 246.90
UN(3,1) id timegroup 1 59.0335
UN(3,2) id timegroup 1 196.87
UN(3,3) id timegroup 1 165.14
UN(4,1) id timegroup 1 12.8130
UN(4,2) id timegroup 1 43.1341
UN(4,3) id timegroup 1 37.3923
UN(4,4) id timegroup 1 8.6589
UN(1,1) id timegroup 2 11.5272
UN(2,1) id timegroup 2 -9.1612
UN(2,2) id timegroup 2 13.0029
UN(3,1) id timegroup 2 3.4119
UN(3,2) id timegroup 2 -4.7445
UN(3,3) id timegroup 2 1.7910
UN(4,1) id timegroup 2 -0.3790
UN(4,2) id timegroup 2 0.5156
UN(4,3) id timegroup 2 -0.1968
UN(4,4) id timegroup 2 0.02173
Residual 3.1488

Fit Statistics

-2 Res Log Likelihood 11572.3


AIC (Smaller is Better) 11614.3
AICC (Smaller is Better) 11614.7
BIC (Smaller is Better) 11696.4


Null Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq

20 1077.98 <.0001

The AIC information criterion (11614.3) is very close to the AIC of the repeated-effects model (11611.2).
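The fit statistics follow directly from the −2 Res Log Likelihood. In a REML fit, PROC MIXED penalizes only the covariance parameters: here q = 21 (the 20 unstructured G entries across the two time groups plus the residual variance, matching the Covariance Parameter Estimates table), and BIC uses the number of subjects (369). A quick arithmetic check (Python used here only as a calculator):

```python
import math

neg2rll = 11572.3   # -2 Res Log Likelihood from the Fit Statistics table
q = 21              # covariance parameters: 2 x 10 unstructured + residual
n_subjects = 369    # BIC sample size is the number of subjects

aic = neg2rll + 2 * q
bic = neg2rll + q * math.log(n_subjects)

print(round(aic, 1))  # 11614.3, matching the reported AIC
print(round(bic, 1))  # 11696.4, matching the reported BIC
```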
Solution for Fixed Effects

Standard
Effect Estimate Error DF t Value Pr > |t|

Intercept 7.4522 0.2008 1360 37.11 <.0001


time -0.8090 0.08273 363 -9.78 <.0001
age 0.01739 0.01562 592 1.11 0.2663
cigarettes 0.4763 0.06558 1168 7.26 <.0001
drug 0.4204 0.1682 2004 2.50 0.0125
partners 0.03617 0.02052 2065 1.76 0.0781
depression -0.01847 0.007342 2002 -2.52 0.0119
time*age -0.01311 0.006324 241 -2.07 0.0393
time*cigarettes -0.1098 0.03013 584 -3.64 0.0003
time*time -0.1216 0.02755 175 -4.41 <.0001
time*time*time 0.03639 0.006627 133 5.49 <.0001

Type 3 Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

time 1 363 95.63 <.0001


age 1 592 1.24 0.2663
cigarettes 1 1168 52.76 <.0001
drug 1 2004 6.25 0.0125
partners 1 2065 3.11 0.0781
depression 1 2002 6.33 0.0119
time*age 1 241 4.30 0.0393
time*cigarettes 1 584 13.27 0.0003
time*time 1 175 19.48 <.0001
time*time*time 1 133 30.16 <.0001

The inferences in the random coefficients model are very similar to the inferences in the repeated-effects
model.


2.13 Multiple Choice Poll

When is the V matrix the same in the random coefficient model and a model
with the REPEATED statement and several time points?
a. Random coefficient model has a random intercept and slope, and the
repeated model has spatial power covariance structure.
b. Random coefficient model has a random intercept and slope, and the
repeated model has compound symmetry covariance structure.
c. Random coefficient model has only a random intercept, and the
repeated model has compound symmetry covariance structure.
d. Random coefficient model has only a random intercept, and the
repeated model has spatial power covariance structure.


Empirical Best Linear Unbiased Predictions

• EBLUPs are predictions that take into account the residual variability and
between-subject variability.
• If the within-subject variability is large in comparison to between-subject
variability for an individual profile, then the response values are unreliable
and the predictions move toward the population mean.
• If the within-subject variability is small in comparison to between-subject
variability for an individual profile, then the response values are reliable and
the predictions move toward the observed data.
• This feature is useful for forecasting time series.


One objective in the AIDS study is to estimate the time course of CD4+ cell depletion for individual
subjects. However, the individual profile plots showed that the observed CD4+ levels are highly variable
over time. Part of the reason might be due to the large residual variability error component. Therefore,
estimating individual profiles without taking account of the error associated with residual variability
in CD4+ cell determinations might be unreliable.


In PROC MIXED, you can compute predicted response values that are empirical best linear unbiased
predictions (EBLUPs). These predictions can be interpreted as a weighted mean of the population average
profile and the observed data profile. The general formula is

\hat{Y}_i = R_i V_i^{-1} X_i \hat{\beta} + \left( I_{n_i} - R_i V_i^{-1} \right) y_i

Notice that the numerator of R_i V_i^{-1} is the residual covariance matrix and the denominator is the
overall covariance matrix. Therefore, if the residual variability is large in comparison to the
between-subject variability, more weight is given to the overall average profile compared to the observed
data. However, if the residual variability is small in comparison to the between-subject variability, more
weight is given to the observed data profile (Verbeke and Molenberghs 2000).

Note: EBLUPs are also called empirical Bayes estimators.
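For intuition, in the simplest scalar case the EBLUP reduces to a weighted average with weight w = (residual variance)/(residual variance + between-subject variance) on the population mean. A toy illustration (all numbers invented for the example):

```python
def eblup_scalar(y_obs, pop_mean, resid_var, between_var):
    """Shrink an observed subject value toward the population mean.

    The weight on the population mean is the share of total variability
    that comes from residual (within-subject) error.
    """
    w = resid_var / (resid_var + between_var)
    return w * pop_mean + (1 - w) * y_obs

# Large residual variability: the prediction moves toward the population mean
print(round(eblup_scalar(y_obs=10.0, pop_mean=6.0, resid_var=9.0, between_var=1.0), 1))  # 6.4

# Small residual variability: the prediction stays near the observed value
print(round(eblup_scalar(y_obs=10.0, pop_mean=6.0, resid_var=1.0, between_var=9.0), 1))  # 9.6
```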

Empirical Best Linear Unbiased Predictions

PROC MIXED computes EBLUPs for the response variable in two ways:
• Using the RANDOM statement with the OUTP= option in the MODEL
statement.
• Using the REPEATED statement with the OUTP= option in the MODEL
statement and the SUBJECT= option in the REPEATED statement. Only
observations with missing response values will have EBLUPs.


PROC MIXED computes EBLUPs for the response variable in two ways. When you use the RANDOM
statement with the OUTP= option in the MODEL statement, the predicted values from the original data
are X_i \hat{\beta} + Z_i \hat{\gamma}_i. Predicted values for data points other than those observed can be obtained by using
missing dependent variables in your input data set.

Another way to compute EBLUPs for the response variable is to use the OUTP= option in the MODEL
statement with the REPEATED statement with the SUBJECT= option. Simply concatenate the original
data with the observations with missing response variable values. The predictions for these observations
are EBLUPs. However, if the new observation is independent of the data used in fitting the model (the
subject has no previous observations), then the EBLUP equals X_i \hat{\beta}.


The standard errors for EBLUPs with the REPEATED statement are larger than those for the RANDOM
statement (unless you use the NEWOBS option in the MODEL statement when you have a RANDOM
statement). The reason for this discrepancy is not that one is more accurate than the other. If you think
of an observation as Y = Signal + Noise (with noise representing measurement error), the RANDOM
statement predicts the signal (unless you use the NEWOBS option) while the REPEATED statement
predicts the sum of signal and noise. The signal is predicted with greater precision than the sum of signal
and noise.
EBLUPs are not only useful for forecasting time series, but also in generating predictions based
on changes in the covariate patterns. For example, you can generate predictions on CD4+ cell counts
based on the changes in cigarette consumption.


Computing EBLUPs

Example: Compute EBLUPs and Xbetas with the random coefficients model with the cubic effect
of time. Forecast the CD4+ cell count for subject 10145 at time 5.30 and graph the individual
profile of subject 10145 along with the EBLUPs and Xbetas.
/* long02d08.sas */
data aids1;
input time age cigarettes drug partners depression id
timegroup;
datalines;
5.30 4.4 0 1 -3 -7 10145 2
;
run;
The covariate values for subject 10145 were held fixed from their last observation period and timegroup
is set at 2.
data forecast;
set aids aids1;
run;

ods select none;


proc mixed data=forecast;
class timegroup id;
model cd4_scale=time age cigarettes drug partners
depression time*age time*cigarettes time*time
time*time*time / ddfm=res
outp=predblup(rename=(pred=eblup))
outpm=predxbeta(rename=(pred=xbeta));
random intercept time time*time time*time*time /
type=un subject=id group=timegroup;
run;
Selected MODEL statement options:
OUTP= specifies an output data set with EBLUPs.
OUTPM= specifies an output data set with Xbetas.

Note: The degrees of freedom calculations are based on the residual method (DDFM=RES) to save on
computing time.
data predict;
merge predblup predxbeta;
run;

ods select all;


options nolabel;
proc sgplot data=predict;
series y=cd4_scale x=time / markers;
series y=eblup x=time / markers;

series y=xbeta x=time / markers;
where id=10145;
yaxis label="Predicted CD4+ Cell Counts in hundreds";
xaxis label="Time since Seroconversion";
title 'Subject 10145 Response Profile';
title2 h=0.8 'with XBetas, Data Values, and EBLUPs';
title3 h=0.7 'Generated from Random Coefficients Model';
run;

The EBLUPs follow the data values before seroconversion, indicating that the between-subject variability
is much greater than the within-subject variability. However, the EBLUPs follow the Xbetas after
seroconversion, indicating that the between-subject variability is much smaller than the within-subject
variability. The EBLUP at time 5.3 seems to be very close to the Xbeta at 5.3.
Example: Compute EBLUPs and Xbetas with the model with the REPEATED statement that had
heterogeneity and spatial power covariance structure. Forecast the CD4+ cell count for subject
10145 at time 5.30 and graph the individual profile of subject 10145 along with the EBLUPs
and Xbetas.
ods select none;
proc mixed data=forecast;
class timegroup id;

model cd4_scale=time age cigarettes drug partners
depression time*age time*cigarettes
time*time time*time*time / ddfm=res
outp=predblup(rename=(pred=eblup))
outpm=predxbeta(rename=(pred=xbeta));
repeated / type=sp(pow)(time) local subject=id
group=timegroup;
run;
Selected MODEL statement option:
OUTP= specifies an output data set containing predicted values and related quantities.
Specifications that have a REPEATED statement with the SUBJECT= option and
missing response variables compute predicted values using EBLUPs.
data predict;
merge predblup predxbeta;
run;

ods select all;


proc sgplot data=predict;
series y=cd4_scale x=time / markers;
series y=eblup x=time / markers;
series y=xbeta x=time / markers;
where id=10145;
yaxis label="Predicted CD4+ Cell Counts in hundreds";
xaxis label="Time since Seroconversion";
title 'Subject 10145 Response Profile';
title2 h=0.8 'with XBetas, Data Values, and EBLUPs';
title3 h=0.7 'Generated from Model with Repeated Statement';
run;


Recall that when using the OUTP= option with the SUBJECT= option in the REPEATED statement, the
EBLUPs are only computed for time points that have missing response values. Therefore, the EBLUP is
only computed for time point 5.3. The EBLUPs and Xbetas are different from the previous graph because
the V matrices for the two models are different. If the random coefficients model has the same V matrix
and the same mean model as the model with the REPEATED statement, then the EBLUPs at time 5.3
would be the same.


2.14 Multiple Choice Poll

When computing EBLUPs for random coefficient models, if the within-subject


variability is large in comparison to the between-subject variability for an
individual profile, then which of the following is true?
a. The response values are unreliable and the predictions move toward the
population mean.
b. The response values are reliable and the predictions move toward the
observed data.
c. The predictions are the same as the population average because EBLUPs
do not take into account within-subject variability.
d. Only observations with missing response values will have EBLUPs.


Models with Repeated and Random Effects

These models
• take into account random effects, serial correlation, and measurement
error
• enable the user to fit a large variety of covariance structures
• often have estimation and convergence problems
• are not generally recommended as a longitudinal model



You can also fit a model in PROC MIXED with both the RANDOM and REPEATED statements.
However, this model is generally not recommended in practice. Diggle, Heagerty, Liang, and Zeger
(2002) argue that, in applications, the effect of serial correlation is very often dominated by the
combination of random effects and measurement error. They recommend that no models simultaneously
include serial correlation as well as random effects other than intercepts. Verbeke and Molenberghs
(2000) also claim that models that include several random effects, serial correlation, and measurement
error will often have estimation problems.

Common Causes of Nonconvergence

• Two of the covariance parameters are several orders of magnitude apart.


• Data values are extremely large or extremely small.
• There is little variability in time effects.
• There is not enough data to estimate the specified covariance structure.
• Linear dependencies exist among parameters.
• There is a misspecified model or violation of model assumptions.


Convergence problems in PROC MIXED arise from estimating the covariance parameters in the model,
not the fixed effects. For example, when the covariance parameters are on a different scale, the algorithm
in PROC MIXED might have trouble converging. Furthermore, if there is very little variability in the time
effects, the variance of the random slopes might approach 0, which can generate numerical difficulties.


Recommendations to Deal with Nonconvergence

• Use the PARMS statement to specify initial values.


• Rescale the data to improve stability.
• Specify the SCORING= option to invoke Fisher’s scoring estimation method.
• Tune the MAXITER= and MAXFUNC= options in the PROC MIXED statement.
• Make sure no observations from the same subject are producing identical
rows in the R or V matrix.
• Reduce the number of terms in the model.


When fitting complicated covariance structures, you often need to specify starting values (using the
PARMS statement) in order for PROC MIXED to converge. Requesting a grid search over several values
of these parameters is recommended. Sometimes it is useful to use the Fisher scoring method, which uses
the expected Hessian matrix (the matrix of second derivatives of the objective function with respect to the
covariance parameters) instead of the observed one.

Note: Other recommendations can be found in the online SAS documentation in the Convergence
Problems section of PROC MIXED.


Models with Random Effects and Serial Correlation

Example: Fit a longitudinal model with the RANDOM and REPEATED statements. Specify the group
effect and only the interactions that were significant in the random coefficients model.
/* long02d09.sas */
proc mixed data=aids covtest;
class timegroup;
model cd4_scale=time age cigarettes drug partners depression
time*age time*cigarettes time*time time*time*time / solution
ddfm=kr(firstorder);
random intercept time time*time time*time*time / type=un subject=id;
repeated / type=sp(pow)(time) local subject=id group=timegroup;
title 'Longitudinal Model with Random Effects and '
'Serial Correlation';
run;

Longitudinal Model with Random Effects and Serial Correlation

The Mixed Procedure

Model Information

Data Set WORK.AIDS


Dependent Variable cd4_scale
Covariance Structures Unstructured,
Spatial Power
Subject Effects id, id
Group Effect timegroup
Estimation Method REML
Residual Variance Method Profile
Fixed Effects SE Method Prasad-Rao-Jeske-
Kackar-Harville
Degrees of Freedom Method Kenward-Roger

The covariance structures are now an unstructured covariance structure and a spatial power covariance
structure.
Class Level Information

Class Levels Values

timegroup 2 1 2

Dimensions

Covariance Parameters 15
Columns in X 11
Columns in Z per Subject 4
Subjects 369
Max Obs per Subject 12


Number of Observations

Number of Observations Read 2376


Number of Observations Used 2376
Number of Observations Not Used 0

Iteration History

Iteration Evaluations -2 Res Log Like Criterion

0 1 12650.28067427
1 2 12266.99621099 38594.655773
2 1 12189.83366851 74110.722854
3 1 12140.37942938 32264.055111
4 1 12095.41712550 642385.13078
5 1 12037.70974721 2649327.2638
6 1 11975.05255510 1356563.0471
7 3 11874.39052840 .
8 1 11825.73904305 .
9 1 11699.28700966 .
10 1 11594.57830896 .
11 3 11526.55157663 .
12 4 11511.03788261 .
13 3 11502.89876486 .
14 1 11496.21859975 .
15 1 11494.28586873 .
16 2 11493.40764086 0.00037097
17 2 11492.86168202 .
18 1 11491.90004455 0.00024427
19 2 11491.81327410 .
20 4 11490.43193681 .
21 2 11490.28644277 0.00000077
22 1 11490.28372598 0.00000000

Convergence criteria met.

Even with the complicated covariance structures, the model converged. However, the note in the log
indicates a potential problem.

Whenever the Log window shows the note that the estimated G matrix is not positive definite, you are
most likely to see a zero variance component estimate. Sometimes a zero variance component estimate
can indicate an inappropriate model, such as an over-parameterized model, and you might want to
respecify the model to make sure you are not accounting for the same variance in different parameters.
Covariance Parameter Estimates

Standard Z
Cov Parm Subject Group Estimate Error Value Pr Z

UN(1,1) id 5.7569 0.6924 8.31 <.0001


UN(2,1) id -0.3687 0.2466 -1.50 0.1348
UN(2,2) id 0.4004 0.1692 2.37 0.0090
UN(3,1) id -0.1603 0.08312 -1.93 0.0537


UN(3,2) id 0.08452 0.03535 2.39 0.0168


UN(3,3) id 0 . . .
UN(4,1) id 0.03732 0.01723 2.17 0.0303
UN(4,2) id -0.02035 0.01032 -1.97 0.0486
UN(4,3) id -0.00358 0.003668 -0.98 0.3287
UN(4,4) id 0.001305 0.001048 1.25 0.1065
Variance id timegroup 1 8.1452 1.0360 7.86 <.0001
SP(POW) id timegroup 1 0.2262 0.08215 2.75 0.0059
Variance id timegroup 2 1.1327 0.6201 1.83 0.0339
SP(POW) id timegroup 2 0.4702 0.3246 1.45 0.1475
Residual 2.6566 0.3407 7.80 <.0001

The variances for the intercepts and the linear effect of time are significant. However, the variance
estimate for the quadratic effect of time is 0. When you have a variance parameter estimate of 0, one
recommendation is to drop that random effect from the model. Because the quadratic effect of time will
be dropped, the cubic effect of time should also be dropped.
Fit Statistics

-2 Res Log Likelihood 11490.3


AIC (Smaller is Better) 11518.3
AICC (Smaller is Better) 11518.5
BIC (Smaller is Better) 11573.0

Null Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq

13 1160.00 <.0001

The AIC information criterion is the lowest of any model thus far. The random coefficients model had
an AIC value of 11614.3.
Solution for Fixed Effects

Standard
Effect Estimate Error DF t Value Pr > |t|

Intercept 7.7053 0.2259 805 34.11 <.0001


time -1.0408 0.08028 357 -12.96 <.0001
age 0.01409 0.01895 316 0.74 0.4578
cigarettes 0.3505 0.07400 864 4.74 <.0001
drug 0.2686 0.1699 2011 1.58 0.1140
partners 0.04775 0.02055 2056 2.32 0.0202
depression -0.01715 0.007419 2013 -2.31 0.0209
time*age -0.01247 0.006231 233 -2.00 0.0465
time*cigarettes -0.1049 0.03024 518 -3.47 0.0006
time*time -0.09233 0.02818 381 -3.28 0.0011
time*time*time 0.03767 0.006723 186 5.60 <.0001

Type 3 Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

time 1 357 168.07 <.0001


age 1 316 0.55 0.4578
cigarettes 1 864 22.43 <.0001


drug 1 2011 2.50 0.1140


partners 1 2056 5.40 0.0202
depression 1 2013 5.34 0.0209
time*age 1 233 4.01 0.0465
time*cigarettes 1 518 12.04 0.0006
time*time 1 381 10.74 0.0011
time*time*time 1 186 31.39 <.0001

The inferences for the fixed effects are similar to those from the random coefficients model.
Example: Refit the longitudinal model without the quadratic and cubic effects of time in the RANDOM
statement.
proc mixed data=aids covtest;
class timegroup;
model cd4_scale=time age cigarettes drug partners depression
time*age time*cigarettes time*time time*time*time / solution
ddfm=kr(firstorder);
random intercept time / type=un subject=id;
repeated / type=sp(pow)(time) local subject=id group=timegroup;
title 'Longitudinal Model with Random Effects and '
'Serial Correlation';
run;

Longitudinal Model with Random Effects and Serial Correlation

The Mixed Procedure

Model Information

Data Set WORK.AIDS


Dependent Variable cd4_scale
Covariance Structures Unstructured,
Spatial Power
Subject Effects id, id
Group Effect timegroup
Estimation Method REML
Residual Variance Method Profile
Fixed Effects SE Method Prasad-Rao-Jeske-
Kackar-Harville
Degrees of Freedom Method Kenward-Roger

Class Level Information

Class Levels Values

timegroup 2 1 2

Dimensions

Covariance Parameters 8
Columns in X 11
Columns in Z per Subject 2
Subjects 369
Max Obs per Subject 12

The Z matrix has only two columns in this model.


Number of Observations

Number of Observations Read 2376


Number of Observations Used 2376
Number of Observations Not Used 0

Iteration History

Iteration Evaluations -2 Res Log Like Criterion

0 1 12650.28067427
1 4 11860.81110568 196.02454031
2 1 11688.87754208 42.26586415
3 1 11566.25877989 .
4 2 11529.61932012 5.55286390
5 2 11519.58037978 2.07258899
6 2 11509.30705045 0.17852318
7 2 11505.84039590 0.00048014
8 2 11504.19719227 0.00023349
9 2 11503.52177894 0.00001620
10 1 11503.46258346 0.00000009
11 1 11503.46226161 0.00000000

Convergence criteria met.

Covariance Parameter Estimates

Standard Z
Cov Parm Subject Group Estimate Error Value Pr Z

UN(1,1) id 5.1340 0.6045 8.49 <.0001


UN(2,1) id -0.1736 0.1366 -1.27 0.2036
UN(2,2) id 0.2149 0.06590 3.26 0.0006
Variance id timegroup 1 7.9399 0.8688 9.14 <.0001
SP(POW) id timegroup 1 0.1971 0.06965 2.83 0.0047
Variance id timegroup 2 1.5141 0.5168 2.93 0.0017
SP(POW) id timegroup 2 0.5059 0.2173 2.33 0.0199
Residual 2.6003 0.3190 8.15 <.0001

The results of the Covariance Parameter Estimates table show that the variance of the intercepts is
significant. This indicates that there is significant variation of the intercepts between subjects. The
variance of the linear effect of time is also significant. This indicates that there is significant variation in
the slopes of time between subjects. However, the covariance of the intercepts and the linear effect of
time is not significant. Therefore, the subject’s CD4+ cell count depletion over time is not affected by the
subject’s CD4+ cell count at seroconversion.

The results also show that the variances of the residuals in the R matrix for time group 1 and time group 2
are significant. The variances appear to be different from each other across time groups. Furthermore, for
equally spaced time intervals, the correlation among measurements in time group 1 is much smaller than
the correlation among measurements in time group 2.
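Under the spatial power structure, the serial correlation between two measurements taken d time units apart is ρ^d, so the two fitted SP(POW) estimates can be compared directly at several lags. A back-of-the-envelope check in Python (this ignores the random-effect and local measurement-error contributions to the total correlation):

```python
rho_group1, rho_group2 = 0.1971, 0.5059  # SP(POW) estimates from the output

# Serial correlation rho**d at a few time lags d
for d in (0.5, 1.0, 2.0):
    print(d, round(rho_group1 ** d, 3), round(rho_group2 ** d, 3))
# At every lag, the serial correlation decays far faster in time group 1.
```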
Fit Statistics

-2 Res Log Likelihood 11503.5


AIC (Smaller is Better) 11519.5
AICC (Smaller is Better) 11519.5
BIC (Smaller is Better) 11550.7


Null Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq

7 1146.82 <.0001

The AIC information criterion is very close to that of the model with the random effects of time*time and
time*time*time (11518.3). However, the BIC information criterion is lower (11550.7 versus 11573.0)
because the model with the four random effects had seven more covariance parameter estimates (15 versus 8).
Solution for Fixed Effects

Standard
Effect Estimate Error DF t Value Pr > |t|

Intercept 7.7302 0.2223 875 34.78 <.0001


time -1.0433 0.07757 664 -13.45 <.0001
age 0.01524 0.01902 338 0.80 0.4236
cigarettes 0.3562 0.07391 895 4.82 <.0001
drug 0.2702 0.1697 2038 1.59 0.1115
partners 0.04505 0.02050 2099 2.20 0.0281
depression -0.01811 0.007396 2079 -2.45 0.0144
time*age -0.01326 0.006234 237 -2.13 0.0344
time*cigarettes -0.1081 0.03009 550 -3.59 0.0004
time*time -0.08501 0.02921 1005 -2.91 0.0037
time*time*time 0.03698 0.006630 941 5.58 <.0001

Type 3 Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

time 1 664 180.88 <.0001


age 1 338 0.64 0.4236
cigarettes 1 895 23.22 <.0001
drug 1 2038 2.54 0.1115
partners 1 2099 4.83 0.0281
depression 1 2079 6.00 0.0144
time*age 1 237 4.53 0.0344
time*cigarettes 1 550 12.92 0.0004
time*time 1 1005 8.47 0.0037
time*time*time 1 941 31.12 <.0001

The inferences for the fixed effects are similar to those from the model with the four random effects.


Summary of Random Coefficient Models

• Random coefficient models are an alternative way to model longitudinal


data.
• Random coefficient models provide subject-specific parameter estimates.
• In the demonstration, the model fit statistics are similar for the random
coefficient model and the model with only the REPEATED statement.
• The model with both the RANDOM and REPEATED statements had the best
fit.
• The unstructured and spatial power covariance structures are selected with
intercept and time as the random effects.


In conclusion, the random coefficient models might be useful to fit longitudinal models, especially when
there is a large error component due to random effects. The model still enables the correlations within
subject to change over time. However, the correlations are estimated using the variances and covariances
of the random effects along with the time values for the subjects.

The final CD4+ cell count model has both a RANDOM and REPEATED statement with an unstructured
G matrix and a spatial power covariance structure for the R matrix. The variances of the intercepts and
linear effects of time were significantly different from 0. This means that the CD4+ cell count values
at seroconversion vary across subjects and the depletion of CD4+ cell counts over time vary across
subjects. Heterogeneity in the R matrix is also evident in the model.


Exercises

4. Fitting Random Coefficient Models


a. Fit a random coefficient model with a random intercept and hours. Specify the fixed effects
as hours, drug, and baseline. Use an unstructured covariance structure and print out the G
matrix, the correlation matrix based on the V matrix, the parameter estimates for the fixed effects,
and the parameter estimates for the random effects. Use the Kenward-Roger method for
computing degrees of freedom.
1) Interpret the G matrix. What conclusions can you reach regarding your random effects?

2) What does the residual covariance parameter estimate represent?


3) How does the AICC value compare to the reduced model fit in the last exercise?
4) How does the correlation matrix based on the V matrix compare to the correlation matrix
based on the R matrix for the reduced model in the last exercise?

5) Interpret the parameter estimates for the random effects for Subject 1.
b. Fit a model with both the REPEATED and RANDOM statements. Specify a random intercept
and hours, and use the unstructured covariance structure. Print the G matrix, the correlation
matrix based on the V matrix, and the parameter estimates for the fixed effects. Specify the
spatial exponential covariance structure for the R matrix, add a measurement error component,
and use the FIRSTORDER suboption.

1) Interpret the covariance parameter estimates.


2) Did the correlations based on the V matrix change?
3) Is this a better model than the random coefficients model?
4) Did the inferences from the fixed effects change from the random coefficients model?


2.15 Multiple Choice Poll

What covariance structure does the R matrix have in the first random
coefficient model?
a. Unstructured
b. Independent
c. Compound symmetry
d. Spatial power


2.5 Model Assessment

Objectives

• Explain the linear mixed model residual and influence diagnostic statistics.
• Examine how the violation of assumptions regarding the random effects
influences the inference of the model.
• Create residual and influence diagnostic plots.



Mixed Model Assessment

The following are common questions that deal with mixed model
assessment.
• Are the model assumptions validated?
• Is the covariation of the observations modeled properly?
• Are the results sensitive to specific data points and clusters?


In ordinary least squares regression models, model assessment usually revolves around residual analysis,
overall measures of goodness-of-fit, and influence analysis. Model assessment is especially important
in linear mixed models because likelihood-based estimation methods are particularly sensitive to unusual
observations. After you detect these observations, you should examine them and determine whether they
are erroneous. If these observations are legitimate, then they might represent important new findings.
They also might indicate that your current model is inadequate.

Mixed Model Diagnostics

• Standard residual and influence diagnostics for linear models can now be
extended to linear mixed models.
• Diagnostics in linear mixed models are complicated by the fact that the
estimates of the fixed effects depend on the estimates of the covariance
parameters.
• With longitudinal data, it is usually more important to measure the
influence of a set of observations on the analysis, not just the influence of
individual observations.



The differences between the influence and residual analysis in the ordinary least squares models
and the linear mixed model come from the fact that the estimates of the fixed effects and the predictions
of the random effects depend on the estimates of the covariance parameters. If there are no random effects
and the model uses an independent covariance structure, then the general linear mixed model reduces
to the ordinary least squares model and the residual and influence measures are well known.

Influence Diagnostics

Removing observations in linear mixed models can affect the following:


• the covariance parameters and their precision
• the fixed effects and their precision
• both covariance parameters and fixed effects.
To gauge the full impact of a set of observations on the analysis, covariance
parameters need to be updated, which requires refitting the model.


The fixed effects are affected when you remove observations because of the change in covariance
parameters and the change in the regressor space.


Types of Residuals

Models with random effects can produce two types of residuals:

• A marginal residual is the difference between the observed data and the
  estimated marginal mean:

      r_mi = y_i − X_i β̂

• A conditional residual is the difference between the observed data and the
  predicted value of the observation:

      r_ci = y_i − X_i β̂ − Z_i γ̂_i


Conditional residuals are subject-specific residuals that are useful in detecting outlying subjects
and in determining whether the random effects are selected properly. If you choose the right random
effects, the conditional residuals should be small. For example, if you choose a random intercept but you
should have a random slope in the model, the subject-specific residuals show the model misspecification.

Marginal residuals are population-averaged residuals that are helpful in diagnosing whether the fixed
effect part of the model is selected properly. They are also helpful in diagnosing the fit of the model
averaged across all subjects. For example, if you were to predict the response of the next subject in your
study, the only way of measuring the quality of the prediction is by using the marginal residuals.


Types of Residuals

• Studentized residuals are computed by dividing a residual by an estimate of
  its standard deviation.
• Internally studentized residuals use all the observations in the standard
  error computation.
• Externally studentized residuals exclude the observation in question when
  computing the standard error.
• Pearson-type residuals are computed by dividing a residual by the
  estimated standard deviation of the response variable.
• Scaled residuals are computed by multiplying the marginal residuals by the
  inverse Cholesky root of the marginal variance-covariance matrix.


The raw residuals are usually not well suited to examining model assumptions and detecting outliers
and influential observations. For example, if the variances of the observations differ, then a data point
with a smaller raw residual but a smaller variance might be more troublesome than a data point with
a larger residual and a larger variance. To account for the unequal variance of the residuals, various
studentizations are applied (Schabenberger 2004).
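As a sketch of how these residual types can be requested in PROC MIXED (the model below is simplified for illustration; the RESIDUAL option adds studentized and Pearson residuals to the output data sets, and the VCIRY option adds scaled residuals to the marginal output):

```sas
/* Sketch: obtain conditional and marginal residuals from PROC MIXED */
proc mixed data=aids noclprint;
   class id;
   model cd4_scale = time age
         / solution residual vciry
           outp=cond_resid    /* conditional (subject-specific) residuals */
           outpm=marg_resid;  /* marginal (population-averaged) residuals */
   random intercept time / type=un subject=id;
run;
```

The OUTP= data set contains Resid, StudentResid, and PearsonResid computed from the conditional predictions; the OUTPM= data set contains the marginal counterparts, plus ScaledResid because of the VCIRY option.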

Residual Analysis

• Studentized residuals and the Pearson residuals are useful for detecting
potential outliers.
• Scaled residuals are useful for evaluating the appropriateness of the
covariance structure of your model.



A common recommendation when detecting unusual observations is to use externally studentized
residuals with a benchmark value of plus or minus 2. Examination of the scaled residuals is also helpful
in diagnosing departures from normality (Schabenberger 2004).

Influence Diagnostics

The INFLUENCE option in the MODEL statement does the following:


1. Fits the model to the data and obtains estimates of all parameters
2. Removes one or more data points from the analysis and computes
updated estimates of model parameters
3. Contrasts quantities of interest to determine how the absence of the
observations changes the analysis, based on full- and reduced-data
estimates


The basic procedure for quantifying influence is shown above. It is important to note that influence
analyses are performed under the assumption that the chosen model is correct. Changing the model
structure can alter the conclusions (Schabenberger 2004).

The Nature of the Influence


The Observation is Influential on            Statistics
------------------------------------------   -------------------------------
the overall objective function               Likelihood distance
the fitted and predicted values              DFFITS and PRESS residuals
the estimates of the fixed effects           Cook's D or Multivariate DFFITS
the precision of the fixed effects           COVTRACE or COVRATIO
the estimates of the covariance parameters   Cook's D or Multivariate DFFITS
the precision of the covariance parameters   COVTRACE or COVRATIO


An overall influence statistic measures the change in the objective function being minimized. In ordinary
least squares regression, the residual sums of squares serves that purpose. In linear mixed models fit
by maximum likelihood or restricted maximum likelihood, an overall influence measure is the likelihood
distance. This statistic gives the amount by which the log-likelihood of the full data changes if one were
to evaluate it at the reduced-data estimates.
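In symbols (a sketch following the usual definition, with ℓ the restricted or full log likelihood of the full data, θ̂ the full-data estimates, and θ̂₍U₎ the estimates obtained with the observation set U removed):

```latex
LD(U) = 2\left[\ell\big(\hat{\theta}\big) - \ell\big(\hat{\theta}_{(U)}\big)\right]
```

A large LD(U) indicates that evaluating the full-data log likelihood at the reduced-data estimates would noticeably lower it, so the removed set U is influential on the overall analysis.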

The PRESS residual is the difference between the observed value and the predicted marginal mean, where
the predicted value is obtained without the observations in question. The sum of the squared PRESS
residuals is the PRESS statistic. The DFFITS statistic is the change in predicted values due to removal
of a single data point, standardized by the externally estimated standard error of the predicted value
in the full data.

The primary difference between Cook's D and Multivariate DFFITS (MDFFITS) is that MDFFITS uses
an externalized estimate of the variance of the parameter estimates, while Cook's D does not. For both
statistics, you are concerned about large values, which indicate that the change in the parameter estimate
is large relative to the variability of the estimate.

The benchmarks of no influence for the COVTRACE and COVRATIO statistics are 0 for
the covariance trace and 1 for the covariance ratio. The variance matrix that is used in the computation
of COVTRACE and COVRATIO for covariance parameters is obtained from the inverse Hessian matrix.

Iterative and Noniterative Influence Analysis


Iterative influence analysis
• refits the model and iteratively re-estimates the covariance parameters when
the observations in question are removed
• generally is a better approach but is computationally intensive.
Noniterative influence analysis
• relies on closed-form update formulas for the fixed effects without updating
the covariance parameters
• is computationally efficient and is the default analysis.


Influence diagnostics are performed by noniterative or iterative methods. The noniterative diagnostics
rely on recomputation formulas under the assumption that covariance parameters or their ratios remain
fixed. With the possible exception of a profiled (factored out) residual variance, no covariance parameters
are updated. This is the default behavior because of its computational efficiency. However, the impact of
an observation on the overall analysis can be underestimated if its effect on covariance parameters is not
assessed. Toward this end, iterative methods can be applied to gauge the overall impact of observations
and to obtain influence diagnostics for the covariance parameter estimates.
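In PROC MIXED, the two approaches differ only in the ITER= suboption of INFLUENCE; a minimal sketch (the model is abbreviated for illustration):

```sas
proc mixed data=aids noclprint;
   class id;
   model cd4_scale = time age / solution ddfm=kr(firstorder)
         influence(effect=id iter=5);  /* iterative: covariance parameters
                                          re-estimated (up to 5 iterations)
                                          for each deleted cluster         */
   /* influence(effect=id) alone gives the noniterative default:
      closed-form updates of the fixed effects with the covariance
      parameters held fixed                                                */
   random intercept time / type=un subject=id;
run;
```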


Assessing Normality of the Random Effects

[Histogram of the random intercept estimates: x axis "Intercept Estimate," y axis "Percent"]

When you use the SOLUTION option in the RANDOM statement, a table of the random effect parameter
estimates, which are deviations from the population parameter estimates, is produced. These estimates are
the empirical best linear unbiased predictors (EBLUPs). They can be interpreted as deviations from the
population average, which might be helpful for detecting subjects or groups of subjects that are having
a different time course. Furthermore, these estimates can be used in the prediction of subject-specific
profiles. If you use ODS and save the parameter estimates of the random effects to a SAS data set, you
can create histograms and scatter plots for diagnostic purposes.
The random effects for intercept represent the variability in subject-specific intercepts not explained
by the covariates included in the model. The distribution of the random effects is assumed to be normal.
You might be able to check this assumption by plotting the intercept parameter estimates.
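A sketch of this workflow follows (the SolutionR ODS table holds the random-effects solution; the model here is simplified for illustration):

```sas
/* Sketch: save the EBLUPs and plot the random intercept estimates */
proc mixed data=aids noclprint;
   class id;
   model cd4_scale = time age / solution;
   random intercept time / type=un subject=id solution;
   ods output SolutionR=eblups;   /* random effect parameter estimates */
run;

proc sgplot data=eblups(where=(effect='Intercept'));
   histogram estimate;            /* distribution of subject intercepts */
   density estimate / type=normal;
   title 'Random Intercept Estimates (EBLUPs)';
run;
```

The histogram can then be inspected for marked skewness or multimodality, subject to the caveats discussed next.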

However, both the residual error and the covariate structure play an important role in the shape
of the distribution of random effects. If the residual variability is large compared to the random effects
variability, then the observed distribution of the random effects might not reflect the true distributional
shape of the random effects. In fact, Verbeke and Molenberghs (2000) show that when the within-subject
variability is large in comparison to the between-subject variability, the histogram of random effect
parameter estimates shows less variability than is actually present in the population of random effects.
Therefore, these histograms might be misleading.

Verbeke and Molenberghs (2000) suggest that the nonnormality of the random effects can only
be detected by comparing the results obtained under the normality assumption with results obtained from
fitting a linear mixed model with relaxed distributional assumptions for the random effects. This will not
be a trivial task to accomplish. Therefore, what are the consequences of ignoring the normality
assumption of the random effects?


Violation of Random Effect Assumptions

• Fixed effect parameter estimates and standard errors are robust with
respect to the misspecification of the random effects distribution.
• Violation of the normality assumption clearly affects the standard errors of
the random effects.
• Parameter estimates of the random effects are also affected by the
normality assumption.


If the model is correctly specified and the covariance structure is appropriate, then the violation
of the normality assumption of the random effects has little effect on the estimation of the fixed effect
parameter estimates and their standard errors. Verbeke and Lesaffre (1997) performed extensive
simulations comparing the corrected standard errors of the fixed effects (using an estimator that corrects
for possible nonnormality of the random effects) to the uncorrected standard errors of the fixed effects.
The results showed that the two standard errors were very similar regardless of the distribution. However,
when the normality assumption is violated, the corrected standard errors for the random effects are clearly
superior to the uncorrected ones. Whether the standard errors increase or decrease is data dependent.

Therefore, if interest is only in the inference of the fixed effects, then valid inferences are obtained even
when the random effects are incorrectly assumed to be normally distributed. If interest is in the inference
of the random effects, then you should explore whether the assumed normal distribution is appropriate.
Verbeke and Molenberghs (2000) suggest that if you are interested in detecting subgroups in the random
effects population, then you should take as many measurements as possible, at the beginning and
at the end of the study to obtain maximal spread of the time points.


2.16 Multiple Choice Poll

Which one of the following statements is true regarding mixed model assessment?
a. A conditional residual is the difference between the observed data and
the estimated marginal mean.
b. Marginal residuals are useful in determining whether the random effects
are selected properly.
c. Violation of the normality assumption of the random effects will bias the
fixed effect parameter estimates and SEs.
d. Violation of the normality assumption of the random effects will bias the
random effect parameter estimates and SEs.



Model Assessment

Example: Fit the model with the REPEATED and RANDOM statements. The model includes the six
main effects, two interactions, and the two polynomial terms for time. Specify the spatial
power covariance structure and the group effect of timegroup in the REPEATED statement.
Specify time and intercept in the RANDOM statement along with the unstructured
covariance structure. Specify plots of the likelihood distances, the PRESS statistics, influence
statistics, and residuals (raw, student, Pearson, and scaled). Use iterative analysis with
the maximum number of iterations set to 5 and use the FIRSTORDER suboption. Identify
potentially influential subjects and observations.
/* long02d10.sas */
ods graphics / imagemap=on tipmax=2400;
ods output influence=influence;
proc mixed data=aids noclprint
      plots=(distance(useindex) press(useindex)
             influenceestplot(useindex) residualpanel(box)
             studentpanel(box) pearsonpanel(box) vcirypanel(box));
   class timegroup id;
   model cd4_scale=time age cigarettes drug partners depression
         time*age time*cigarettes time*time time*time*time
         / solution ddfm=kr(firstorder) influence(effect=id iter=5) vciry;
   random intercept time / type=un subject=id;
   repeated / type=sp(pow)(time) local subject=id group=timegroup;
   title "Longitudinal Model with Random Effects and Serial "
         "Correlation";
run;
Selected ODS GRAPHICS statement options:
IMAGEMAP=ON|OFF controls data tips and drill down generation. Data tips are pieces
of explanatory text that appear when you hold the mouse pointer
over the data portions of a graph contained in an HTML page.

TIPMAX=<n> specifies the maximum number of distinct mouse-over areas
allowed before data tips are disabled.
Selected PROC MIXED statement options:
NOCLPRINT<=number> suppresses the display of the “Class Level Information” table
if you do not specify number. If you do specify number, only
levels with totals that are less than number are listed in the table.

PLOTS= requests that the MIXED procedure produce statistical graphics
via the Output Delivery System.


Selected PLOTS= suboptions:


DISTANCE<(option)> requests a plot of the likelihood or restricted likelihood distance.
When influence diagnostics are requested with set selection
according to an effect, the USEINDEX option enables you
to replace the formatted tick values on the horizontal axis with
integer indices of the effect levels in order to reduce the space
taken up by the horizontal plot axis.

INFLUENCEESTPLOT<(options)> requests panels of the fixed effect deletion estimates
in an influence analysis, provided that the INFLUENCE option
is specified in the MODEL statement.
RESIDUALPANEL<(options)> requests a panel of raw residuals. By default, the conditional
residuals are produced.
STUDENTPANEL<(options)> requests a panel of studentized residuals. By default,
the conditional residuals are produced.
PEARSONPANEL<(options)> requests a panel of Pearson residuals. By default, the conditional
residuals are produced.

PRESS<(option)> requests a plot of PRESS residuals or PRESS statistics. These
are based on "leave-one-out" or "leave-set-out" prediction of the
marginal mean.

VCIRYPANEL<(options)> requests a panel of residual graphics based on the scaled
residuals.

Residual plot option:

BOX replaces the inset of summary statistics in the lower right corner
of the panel with a box plot of the residual.
Selected MODEL statement options:

INFLUENCE<(options)> specifies that influence and case deletion diagnostics are to be
computed.
VCIRY requests that responses and marginal residuals be scaled
by the inverse Cholesky root of the marginal variance-covariance
matrix. The variables ScaledDep and ScaledResid are added
to the OUTPM= data set.

Selected INFLUENCE suboptions:


EFFECT= specifies an effect according to which observations are grouped.
Observations sharing the same level of the effect are removed
from the analysis as a group. The effect must contain only
classification variables, but they do not need to be contained
in the model.


ITER=n controls the maximum number of additional iterations PROC
MIXED performs to update the fixed-effects and covariance
parameter estimates following data point removal. If you specify
a number greater than 0, then statistics such as DFFITS,
MDFFITS, and the likelihood distances measure the impact
of observation(s) on all aspects of the analysis.

Note: Compared to noniterative updates, the computations for iterative influence analysis are more
involved. In particular, for large data sets and/or a large number of random effects, iterative
updates require considerably more resources. A one-step (ITER=1) or two-step update might
be a good compromise. The output includes the number of iterations performed, which is less
than n if the iterations converge. If the process does not converge within n iterations, you should
be careful in interpreting the results, especially if n is fairly large.

Partial Output

The scaled residuals appear normally distributed with a few outliers. The random scatter around the zero
reference line indicates no problems with the choice of the covariance structure.


The conditional residuals appear normally distributed with a few extreme outliers.


The conditional studentized residuals appear normally distributed with a few extreme outliers.


The conditional Pearson residuals appear normally distributed with a few extreme outliers.
Partial Output
Influence Diagnostics for Levels of id

           Number of
         Observations                  PRESS   Cook's                                  Cook's D
   id        in Level  Iterations  Statistic        D  MDFFITS  COVRATIO  COVTRACE  Cov Parms

10092               4           2     573.55  0.01029  0.01045    0.8989    0.1052    0.26788
10131              10           2    1018.82  0.01221  0.01231    0.8906    0.1129    0.63267
10132               8           2      74.94  0.00159  0.00158    1.0311    0.0308    0.00568
10135               4           2      51.99  0.00530  0.00527    0.9918    0.0080    0.01967
10145              12           2      92.61  0.00083  0.00083    1.0536    0.0524    0.03892


Influence Diagnostics for Levels of id

                            COVRATIO   COVTRACE   RMSE without   Restricted
          MDFFITS Cov            Cov        Cov        deleted   Likelihood
   id           Parms         Parms      Parms           level     Distance

10092         0.28166        0.8069     0.1781         1.61803       0.3663
10131         0.69023        0.6142     0.4022         1.59015       0.8724
10132         0.00548        1.0643     0.0675         1.61062       0.0268
10135         0.02012        0.9588     0.0397         1.61083       0.0761
10145         0.03863        1.1020     0.0995         1.61888       0.0501

Since an iterative analysis was specified, the Influence Diagnostics for Levels of id table shows the
overall impact of each cluster representing a subject and the influence diagnostics for the covariance
parameter estimates. Because the maximum number of iterations was set to 5, the covariance parameters
were updated up to five times for each deletion set. It should be noted that for every deletion set,
PROC MIXED converged in fewer than 5 iterations (the maximum observed was 3).

Note: RMSE is an estimate of the root mean square error with the cluster deleted.


The plot of the restricted likelihood distance clearly shows several influential clusters. Cluster 30148 has
the largest restricted likelihood distance. You should examine influential clusters and determine whether
they are erroneous. If these clusters are legitimate, then they might represent important new findings.
They also might indicate that your current model is inadequate.

Note: By viewing the tooltip information, you can see that patient 30148 had extremely large CD4+ cell
counts.

Several clusters have a large effect on the fixed effects and covariance parameters. These clusters warrant
further investigation. They can point to a model breakdown and lead to the development of a better model
(Schabenberger 2004).


The PRESS statistic measures the influence on the fitted and predicted values. The USEINDEX option
uses as the horizontal axis label the index of the effect level rather than the formatted value(s). Several
clusters appear influential.


The fixed effects deletion estimates plot gives a detailed picture of how the individual parameter
estimates react to the removal of each cluster. Some of the parameters clearly are affected.


Some of the clusters clearly influenced the interactions in the model.


The plot of the covariance parameter deletion estimates gives a detailed picture of how the individual
covariance parameters react to the removal of the clusters.


Some of the clusters clearly influenced the covariance parameters.


Identify potentially influential clusters based on relatively extreme values of the PRESS and restricted
likelihood distance statistics.
data aids_inf(keep=id cd4_scale time age cigarettes
              drug partners depression press rld);
   merge aids influence;
   by id;
   if rld gt 1 | press gt 1000;
run;

data aids_id;
   set aids_inf;
   by id;
   if first.id;
run;
The first DATA step merges the aids and influence data sets into the aids_inf data set and subsets to
observations with restricted likelihood distance statistics greater than 1 or PRESS statistics greater than
1000. The second DATA step creates the aids_id data set and retains only the first observation for each
subject.


proc print data=aids_id;
   var id press rld;
   title2 'Potentially Influential Subjects';
run;

proc print data=aids_inf;
   where cd4_scale gt 20 | cd4_scale lt 5;
   var id cd4_scale time press rld;
   title2 'Potentially Influential Observations';
   title3 'With CD4_Scale Counts above 20 or below 5';
run;
title;
The first PROC PRINT prints potentially influential subjects and the second PROC PRINT prints
potentially influential observations with extreme CD4_scale counts.
PROC PRINT Output

Potentially Influential Subjects

Obs      id      PRESS      RLD

  1   10131    1018.82   0.8724
  2   10171     293.87   1.0724
  3   10191     540.85   1.4686
  4   10770     491.59   1.3524
  5   11165    1189.43   1.4748
  6   30148     629.13   3.8442
  7   31036     371.33   1.4496
_______________________________________________________________________________________________

Potentially Influential Observations
With CD4_Scale Counts above 20 or below 5

Obs      id   cd4_scale       time      PRESS      RLD

  2   10131       22.71    0.24914    1018.82   0.8724
 11   10171        2.18   -1.27036     293.87   1.0724
 12   10171       22.41   -0.73922     293.87   1.0724
 15   10171        4.99    0.70910     293.87   1.0724
 16   10171        3.82    1.74401     293.87   1.0724
 18   10191       31.84   -0.24367     540.85   1.4686
 23   10191        3.68    2.71595     540.85   1.4686
 33   11165       23.35   -1.22656    1189.43   1.4748
 38   11165       24.34    3.38672    1189.43   1.4748
 41   30148       30.15    0.26010     629.13   3.8442
 51   31036       27.02   -0.19165     371.33   1.4496

Based on the PRESS statistic and the restricted likelihood distance, seven subjects might be considered
influential. Two subjects, 10171 and 10191, have both extremely high and extremely low cd4_scale
counts. Subject 10770 has no extreme counts.


Summary of Model Assessment

• There are several influential clusters that should be investigated.


• There are no systematic trends in the residuals that indicate a misspecified
model.
• The distribution of residuals appears to be normally distributed.


In conclusion, model assessment is a critical part of model building. Residual and influence statistic plots
can indicate whether you have a misspecified model, and can assist you in detecting erroneous data
or important new findings. If the objectives of your study are to obtain accurate inferences of the fixed
effects in your model, then the normality assumptions regarding the random effects are not important.


Exercises

5. Assessing the Model


a. Fit a repeated measures model with the main effects and use the spatial exponential covariance
structure with the LOCAL option. Specify plots of the likelihood distances, the PRESS statistics,
influence statistics, and marginal residuals (student, Pearson, and scaled) using the MARGINAL
and BOX residual plot options. Use iterative analysis with the maximum number of iterations
set to 5, and use the FIRSTORDER suboption.
1) Do the residual plots indicate model misspecification?

2) Are there any patients that should be investigated?


2.6 Chapter Summary


The general linear mixed model extends the general linear model by the addition of random effect
parameters and by allowing a more flexible specification of the covariance matrix of the random errors.
For example, general linear mixed models allow for both correlated error terms and error terms with
heterogeneous variances.

The general linear mixed model can easily be fitted to longitudinal data. The model assumes that
the vector of repeated measurements on each subject follows a linear regression model where some
of the regression parameters are population-specific (fixed-effects) whereas other parameters are subject-
specific (random-effects). The subject-specific regression coefficients reflect how the response evolves
over time for each subject.

Estimation is more difficult in the mixed model than in the general linear model. Not only do you have
fixed effects as in the general linear model, but you also have to estimate the covariance matrix
of the random effects, and the covariance matrix of the random errors. Ordinary least squares is no longer
the best method because the distributional assumptions regarding the random error terms are too
restrictive. Generalized least squares is used because it takes into account the covariance structures
of the random effects and random errors.

PROC MIXED implements two likelihood-based methods to estimate the covariance parameters:
maximum likelihood (ML) and restricted maximum likelihood (REML). The difference between ML
and REML is the construction of the likelihood function. However, the two methods are asymptotically
equivalent and often give very similar results. The distinction between ML and REML becomes important
only when the number of fixed effects is relatively large. In that case, the comparisons unequivocally
favor REML.

When finding reasonable estimates for the covariance structures, if you choose a structure that is too
• simple, then you risk increasing the Type I error rate
• complex, then you sacrifice power and efficiency.
The Kenward-Roger degrees of freedom adjustment is superior, or at worst equal, to the Satterthwaite
and default DDFM options. For the more complex covariance structures, the Type I error rate inflation is
extremely severe unless the KR adjustment is used. It is recommended that the KR adjustment be used
along with the FIRSTORDER suboption as the standard operating procedure for longitudinal models.

Longitudinal models usually have three sources of random variation. The between-subject variability
is represented by the random effects. The within-subject variability is represented by the serial
correlation. The correlation between the measurements within subject usually depends on the time
interval between the measurements and decreases as the length of the interval increases. Finally, there
is potentially also measurement error in the measurement process.

The covariance structure that is appropriate for your model is directly related to which component
of variability is the dominant component. For example, if the serial correlation among the measurements
is minimal, then the random effects probably account for most of the variability in the data
and the remaining error components have a very simple covariance structure.
After a candidate mean model is selected, fitting the model using ordinary least squares regression
and examining the residuals might help determine the appropriate covariance structure. One function
of the ordinary least squares residuals that describes the association among repeated measurements,
and that is easily estimated even with irregular observation times, is the sample variogram.


The data values in the sample variogram are calculated from the observed half-squared differences
between pairs of residuals within individuals, where the residuals are ordinary least squares residuals
based on the mean model, and the corresponding time differences. The vertical axis in the variogram
represents the residual variability within subject over time. The scatter plot contains a smoothed
nonparametric curve, which estimates the general pattern in the sample variogram. This curve can be used
to decide whether the mixed model should include serial correlation. If a serial correlation component
is warranted, the fitted curve can be used in selecting the appropriate serial correlation function. The fitted
curve can also be used to determine whether measurement error and random effects are evident
in the model.
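This calculation can be sketched outside of SAS. The following is a minimal Python illustration (with made-up residuals and times, not actual OLS residuals from the course data): every pair of within-subject residuals contributes one half-squared difference, paired with that pair's time difference.

```python
from itertools import combinations

def variogram_pairs(times, residuals):
    """Half-squared differences between all pairs of OLS residuals within
    one subject, each paired with the corresponding time difference."""
    pairs = []
    for (t1, r1), (t2, r2) in combinations(zip(times, residuals), 2):
        pairs.append((abs(t2 - t1), 0.5 * (r1 - r2) ** 2))
    return pairs

# One hypothetical subject measured at four time points
pairs = variogram_pairs([0, 1, 2, 4], [1.0, 0.4, -0.2, -1.0])
# 4 measurements yield 6 pairs; the first pair contributes a time
# difference of 1 and a variogram value of 0.5 * (1.0 - 0.4) ** 2
```

Pooling these pairs over all subjects produces the scatter to which the smoothed variogram curve is fitted.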

You can also use the information criteria (such as the AIC and BIC) produced by PROC MIXED as a tool
to help you select the most appropriate covariance structure. The smaller the information criteria value,
the better the model. However, only choose the covariance structures that make sense given the data.
For data with unequally spaced time points and different time points across subjects, only compound
symmetry and the spatial covariance structures are appropriate covariance structures. If the time points
are equally spaced, then the AR(1) and Toeplitz covariance structures could be examined. If the time
points were unequally spaced but have the same time points across subjects, then the unstructured
covariance structure could be examined.
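As a quick arithmetic check (sketched in Python rather than SAS), the AIC values that PROC MIXED reports under REML can be reproduced by adding twice the number of covariance parameters to −2 Res Log Likelihood; the numbers below are taken from the unstructured and compound symmetry fits in this chapter's exercise solutions.

```python
def aic_reml(neg2_res_loglik, n_cov_parms):
    """AIC under REML: -2 Res Log Likelihood plus twice the number of
    covariance parameters (fixed effects are not counted)."""
    return neg2_res_loglik + 2 * n_cov_parms

# Unstructured: 15 covariance parameters; compound symmetry: 2
aic_un = aic_reml(736.7, 15)   # 766.7, matching the PROC MIXED output
aic_cs = aic_reml(754.4, 2)    # 758.4 -- smaller, so preferred by AIC
```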

PROC MIXED allows heterogeneity in the residual covariance parameters with the GROUP= option.
All observations having the same level of the GROUP effect have the same covariance parameters. Each
new level of the GROUP effect produces a new set of covariance parameters with the same structure
as the original group.
After an appropriate covariance structure is selected, model-building efforts should be directed
at simplifying the mean structure of the model. Because the model should be hierarchically well
formulated, the first step is to evaluate the interactions. One recommended approach is to eliminate the
interactions one at a time, starting with the least significant interaction. If you use the model fit statistics
such as AIC, then you must use the ML estimation method. However, after the final model is chosen, refit
the model using REML because REML estimators are superior.
When the sample variogram clearly shows that the random effects error component is much larger than
the serial correlation error component, a longitudinal model using the RANDOM statement might
be useful. These models are called random coefficient models because the regression coefficients for one
or more covariates are assumed to be a random sample from some population of possible coefficients.
In longitudinal models, the random coefficients are the subject-specific parameter estimates. Random
coefficient models are useful for highly unbalanced data with many repeated measurements per subject.

In random coefficient models, the fixed effect parameter estimates represent the expected values
of the population of intercepts and slopes. The random effects for intercept represent the difference
between the intercept for the ith subject and the overall intercept. The random effects for slope represent
the difference between the slope for the ith subject and the overall slope. Random coefficient models also
have a random error term for the within-subject variation.
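A small simulation makes this decomposition concrete. The sketch below (plain Python with hypothetical parameter values, not tied to any data set in this course) draws a subject-specific intercept and slope deviation around the population values and adds within-subject error:

```python
import random

random.seed(1)

beta0, beta1 = 60.0, -2.0   # hypothetical population intercept and slope

def subject_trajectory(times, sd_b0=5.0, sd_b1=0.5, sd_e=1.0):
    """Responses for one subject under a random coefficient model:
    fixed effects, plus this subject's intercept and slope deviations,
    plus within-subject random error."""
    b0 = random.gauss(0, sd_b0)   # deviation from the overall intercept
    b1 = random.gauss(0, sd_b1)   # deviation from the overall slope
    return [(beta0 + b0) + (beta1 + b1) * t + random.gauss(0, sd_e)
            for t in times]

y = subject_trajectory([0, 1, 2, 3])   # one simulated subject's profile
```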

In longitudinal models, it is recommended that the unstructured covariance structure be specified
in the RANDOM statement. PROC MIXED estimates the variances of the intercepts and slopes along
with the covariance between the intercepts and slopes in the G matrix. Specifying the unstructured
covariance structure indicates that you do not want to impose any structure on the variances for intercepts
and variances for slopes, and on the covariance between the intercepts and slopes.


In PROC MIXED, you can compute predicted response values using empirical best linear unbiased
predictions (EBLUPs). These predictions can be interpreted as a weighted mean of the population average
profile and the observed data profile. If the residual variability is large in comparison to the between-
subject variability, more weight is given to the overall average profile compared to the observed data.
However, if the residual variability is small in comparison to the between-subject variability, more weight
will be given to the observed data profile.
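For the special case of a random-intercept model this weighting has a simple closed form, which the following Python sketch illustrates with hypothetical variance components (it illustrates the shrinkage idea; it does not reproduce PROC MIXED output):

```python
def eblup_subject_mean(subject_mean, overall_mean, var_between, var_resid, n_i):
    """EBLUP of a subject's mean under a random-intercept model: a weighted
    mean of the observed subject mean and the overall (population) mean."""
    w = var_between / (var_between + var_resid / n_i)
    return w * subject_mean + (1 - w) * overall_mean

# Large residual variability: the prediction shrinks toward the overall mean
shrunk = eblup_subject_mean(80, 70, var_between=1, var_resid=100, n_i=5)
# Small residual variability: the prediction stays near the observed mean
close = eblup_subject_mean(80, 70, var_between=100, var_resid=1, n_i=5)
```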

You can also fit a model in PROC MIXED with both the RANDOM and REPEATED statements.
However, this model is generally not recommended in practice. These models tend to have convergence
and estimation problems, especially with complex covariance structures.
The purpose of model diagnostics is to compare the data with the fitted model to highlight any systematic
discrepancies. Conditional residual plots can be used to detect outliers and whether the random effects are
properly selected. Marginal residual plots can be used to diagnose whether you selected the fixed effect
part of the model properly. Model diagnostics are especially important in linear mixed models because
likelihood-based estimation methods are particularly sensitive to unusual observations.
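The distinction between the two residual types can be written out directly. In the toy Python sketch below (made-up numbers for a single subject with a random intercept; the estimates are hypothetical, not PROC MIXED output), the marginal residual subtracts only the fixed-effect fit, while the conditional residual also subtracts the predicted random effect:

```python
# One subject, model y = beta0 + beta1 * t plus a random intercept
t = [0.0, 1.0, 2.0]
y = [5.0, 6.5, 8.5]

beta0, beta1 = 5.0, 1.5   # hypothetical fixed-effect estimates
b0_hat = 0.4              # hypothetical EBLUP of this subject's intercept

# Marginal residuals: diagnose the fixed-effect specification
marginal = [yi - (beta0 + beta1 * ti) for yi, ti in zip(y, t)]
# Conditional residuals: also remove the predicted random effect;
# useful for outlier detection and checking the random effects
conditional = [m - b0_hat for m in marginal]
```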

If the model is correctly specified and the covariance structure is appropriate, then the violation
of the normality assumption of the random effects has little effect on the estimation of the fixed effect
parameter estimates and their standard errors. However, violation of the normality assumption
of the random effects clearly affects the standard errors and parameter estimates of the random effects.
General form of the MIXED procedure:

PROC MIXED DATA=SAS-data-set <options>;
CLASS variables;
MODEL response=<fixed effects></options>;
RANDOM random effects </options>;
REPEATED <repeated effect> </options>;
RUN;


2.7 Solutions
Solutions to Exercises
1. Fitting a General Linear Mixed Model

a. Fit a general linear mixed model with the three main effects, the three two-factor interactions,
and the quadratic and cubic effects of hours. Request the parameter estimates and the Kenward-
Roger method for computing the degrees of freedom. In the REPEATED statement, request
the unstructured covariance structure and the R matrix along with the correlations computed from
the R matrix.
proc mixed data=long.heartrate;
class drug;
model heartrate=hours drug baseline hours*drug hours*baseline
drug*baseline hours*hours hours*hours*hours
/ solution ddfm=kr;
repeated / type=un subject=patient r rcorr;
title 'Longitudinal Model with Unstructured Covariance Structure';
run;

Longitudinal Model with Unstructured Covariance Structure

The Mixed Procedure

Model Information

Data Set LONG.HEARTRATE
Dependent Variable heartrate
Covariance Structure Unstructured
Subject Effect patient
Estimation Method REML
Residual Variance Method None
Fixed Effects SE Method Kenward-Roger
Degrees of Freedom Method Kenward-Roger

Class Level Information

Class Levels Values

drug 3 a b p

Dimensions

Covariance Parameters 15
Columns in X 15
Columns in Z 0
Subjects 24
Max Obs per Subject 5

Number of Observations

Number of Observations Read 120
Number of Observations Used 120


Number of Observations Not Used 0

Iteration History

Iteration Evaluations -2 Res Log Like Criterion

0 1 810.76784735
1 2 736.70254878 0.00009683
2 1 736.67527615 0.00000056
3 1 736.67512447 0.00000000

Convergence criteria met.

Estimated R Matrix for Subject 1

Row Col1 Col2 Col3 Col4 Col5

1 90.3624 62.4418 52.6105 44.2552 34.4153
2 62.4418 74.0500 58.1545 52.3782 30.2378
3 52.6105 58.1545 91.3261 70.5043 44.9735
4 44.2552 52.3782 70.5043 80.2893 44.6240
5 34.4153 30.2378 44.9735 44.6240 54.6143

Estimated R Correlation Matrix for Subject 1

Row Col1 Col2 Col3 Col4 Col5

1 1.0000 0.7633 0.5791 0.5196 0.4899
2 0.7633 1.0000 0.7072 0.6793 0.4755
3 0.5791 0.7072 1.0000 0.8234 0.6368
4 0.5196 0.6793 0.8234 1.0000 0.6739
5 0.4899 0.4755 0.6368 0.6739 1.0000

Covariance Parameter Estimates

Cov Parm Subject Estimate

UN(1,1) patient 90.3624
UN(2,1) patient 62.4418
UN(2,2) patient 74.0500
UN(3,1) patient 52.6105
UN(3,2) patient 58.1545
UN(3,3) patient 91.3261
UN(4,1) patient 44.2552
UN(4,2) patient 52.3782
UN(4,3) patient 70.5043
UN(4,4) patient 80.2893
UN(5,1) patient 34.4153
UN(5,2) patient 30.2378
UN(5,3) patient 44.9735
UN(5,4) patient 44.6240
UN(5,5) patient 54.6143

Fit Statistics

-2 Res Log Likelihood 736.7


AIC (Smaller is Better) 766.7
AICC (Smaller is Better) 771.9
BIC (Smaller is Better) 784.3

Null Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq

14 74.09 <.0001

Solution for Fixed Effects

Standard
Effect drug Estimate Error DF t Value Pr > |t|

Intercept 14.5068 24.6470 21.5 0.59 0.5623
hours -1.5366 20.7420 40.1 -0.07 0.9413
drug a 15.9716 30.4845 18.4 0.52 0.6066
drug b 18.4247 29.1195 18.4 0.63 0.5347
drug p 0 . . . .
baseline 0.7847 0.2879 21.4 2.73 0.0125
hours*drug a -8.1887 4.4701 20 -1.83 0.0819
hours*drug b -4.5604 4.4364 20 -1.03 0.3162
hours*drug p 0 . . . .
hours*baseline -0.2567 0.1477 20 -1.74 0.0977
baseline*drug a -0.1430 0.3623 18 -0.39 0.6977
baseline*drug b -0.1327 0.3409 18 -0.39 0.7016
baseline*drug p 0 . . . .
hours*hours 65.4685 39.6707 23 1.65 0.1125
hours*hours*hours -46.4064 25.2000 23 -1.84 0.0785

Type 3 Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

hours 1 39.3 0.08 0.7784
drug 2 18.5 0.21 0.8088
baseline 1 16.1 17.36 0.0007
hours*drug 2 20 1.69 0.2106
hours*baseline 1 20 3.02 0.0977
baseline*drug 2 18 0.10 0.9090
hours*hours 1 23 2.72 0.1125
hours*hours*hours 1 23 3.39 0.0785

1) The unstructured covariance structure is legitimate for this example because the time intervals
are the same across patients.
2) The R matrix represents the residual covariance matrix. The value in row 1 and column 1
represents the variance of the first measurement. The value in row 2 and column 2 represents
the variance of the second measurement. The value in row 1 and column 2 represents
the covariance of the first and second measurements.

3) The R correlation matrix consists of the correlations of the measurements within patient.
The correlations decrease as the time interval increases, especially early in the clinical trial.


4) The null model likelihood ratio test compares the fitted model to a model with an independent
covariance structure. The test is significant, which indicates that the unstructured covariance
structure does a better job modeling the residual error compared to the independent
covariance structure.

5) No higher-order terms are significant at the .05 level.


b. Fit the same model but with the compound symmetry covariance structure.
proc mixed data=long.heartrate;
class drug;
model heartrate=hours drug baseline hours*drug hours*baseline
drug*baseline hours*hours hours*hours*hours
/ solution ddfm=kr;
repeated / type=cs subject=patient r rcorr;
title 'Longitudinal Model with Compound Symmetry Covariance '
'Structure';
run;

Longitudinal Model with Compound Symmetry Covariance Structure

The Mixed Procedure

Model Information

Data Set LONG.HEARTRATE
Dependent Variable heartrate
Covariance Structure Compound Symmetry
Subject Effect patient
Estimation Method REML
Residual Variance Method Profile
Fixed Effects SE Method Kenward-Roger
Degrees of Freedom Method Kenward-Roger

Class Level Information

Class Levels Values

drug 3 a b p

Dimensions

Covariance Parameters 2
Columns in X 15
Columns in Z 0
Subjects 24
Max Obs per Subject 5

Number of Observations

Number of Observations Read 120
Number of Observations Used 120
Number of Observations Not Used 0


Iteration History

Iteration Evaluations -2 Res Log Like Criterion

0 1 810.76784735
1 1 754.42562477 0.00000000

Convergence criteria met.

Estimated R Matrix for Subject 1

Row Col1 Col2 Col3 Col4 Col5

1 77.7052 49.5195 49.5195 49.5195 49.5195
2 49.5195 77.7052 49.5195 49.5195 49.5195
3 49.5195 49.5195 77.7052 49.5195 49.5195
4 49.5195 49.5195 49.5195 77.7052 49.5195
5 49.5195 49.5195 49.5195 49.5195 77.7052

Estimated R Correlation Matrix for Subject 1

Row Col1 Col2 Col3 Col4 Col5

1 1.0000 0.6373 0.6373 0.6373 0.6373
2 0.6373 1.0000 0.6373 0.6373 0.6373
3 0.6373 0.6373 1.0000 0.6373 0.6373
4 0.6373 0.6373 0.6373 1.0000 0.6373
5 0.6373 0.6373 0.6373 0.6373 1.0000

Covariance Parameter Estimates

Cov Parm Subject Estimate

CS patient 49.5195
Residual 28.1857

Fit Statistics

-2 Res Log Likelihood 754.4
AIC (Smaller is Better) 758.4
AICC (Smaller is Better) 758.5
BIC (Smaller is Better) 760.8

Null Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq

1 56.34 <.0001


Solution for Fixed Effects

Standard
Effect drug Estimate Error DF t Value Pr > |t|

Intercept 22.3320 21.7045 19.1 1.03 0.3164
hours -1.2646 18.7375 90 -0.07 0.9463
drug a -2.8043 27.8947 18.1 -0.10 0.9210
drug b 13.4716 26.6388 18.1 0.51 0.6192
drug p 0 . . . .
baseline 0.6901 0.2538 19 2.72 0.0136
hours*drug a -8.5539 3.3601 90 -2.55 0.0126
hours*drug b -4.3841 3.3348 90 -1.31 0.1920
hours*drug p 0 . . . .
hours*baseline -0.2631 0.1110 90 -2.37 0.0199
baseline*drug a 0.08656 0.3326 18 0.26 0.7976
baseline*drug b -0.07400 0.3129 18 -0.24 0.8157
baseline*drug p 0 . . . .
hours*hours 66.4745 45.5629 90 1.46 0.1481
hours*hours*hours -47.2876 30.9261 90 -1.53 0.1298

Type 3 Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

hours 1 90 0.09 0.7642
drug 2 18.1 0.26 0.7701
baseline 1 21.9 26.95 <.0001
hours*drug 2 90 3.24 0.0437
hours*baseline 1 90 5.62 0.0199
baseline*drug 2 18 0.16 0.8572
hours*hours 1 90 2.13 0.1481
hours*hours*hours 1 90 2.34 0.1298

1) The compound symmetry covariance structure can be used with any longitudinal data because
the covariance structure assumes equal correlations regardless of the time interval. Therefore,
it can handle equally or unequally spaced time intervals. However, it is usually a poor choice
because the correlations usually decrease with an increasing time interval.
2) The AICC statistic is much lower for the model with the compound symmetry covariance
structure because the penalty is much less severe. The model with the unstructured
covariance structure is estimating 15 covariance parameters while the model with the
compound symmetry covariance structure is estimating only 2. Obviously the 13 extra
covariance parameters do not add much to the model fit.
3) The model with the compound symmetry covariance structure has two higher-order terms
significant at the .05 level and the model with the unstructured covariance structure had no
significant higher-order terms at this alpha level. The differences between the two models
regarding inference are due to the fact that the unstructured covariance structure is probably
too complex for the longitudinal data in this example. Therefore, you sacrifice power and
efficiency. However, the compound symmetry covariance structure is probably too simple for
the longitudinal data in this example. Thus, you increase the Type I error.
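The constant correlation in the compound symmetry output above can be reproduced directly from the two covariance parameter estimates; a quick check in Python:

```python
# Covariance parameter estimates from the compound symmetry fit above
cs, residual = 49.5195, 28.1857

variance = cs + residual      # common variance of each measurement
correlation = cs / variance   # constant within-patient correlation

# variance reproduces the 77.7052 on the R matrix diagonal, and
# correlation reproduces the 0.6373 in the R correlation matrix
```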


c. Fit the same model but with the spatial power covariance structure. Because you are using
the spatial power covariance structure, add a measurement error component and use the
FIRSTORDER suboption.
proc mixed data=long.heartrate;
class drug;
model heartrate=hours drug baseline hours*drug hours*baseline
drug*baseline hours*hours hours*hours*hours
/ solution ddfm=kr(firstorder);
repeated / type=sp(pow)(hours) local subject=patient r rcorr;
title 'Longitudinal Model with Spatial Power Covariance Structure';
run;

Longitudinal Model with Spatial Power Covariance Structure

The Mixed Procedure

Model Information

Data Set LONG.HEARTRATE
Dependent Variable heartrate
Covariance Structure Spatial Power
Subject Effect patient
Estimation Method REML
Residual Variance Method Profile
Fixed Effects SE Method Prasad-Rao-Jeske-
Kackar-Harville
Degrees of Freedom Method Kenward-Roger

Class Level Information

Class Levels Values

drug 3 a b p

Dimensions

Covariance Parameters 3
Columns in X 15
Columns in Z 0
Subjects 24
Max Obs per Subject 5

Number of Observations

Number of Observations Read 120
Number of Observations Used 120
Number of Observations Not Used 0


Iteration History

Iteration Evaluations -2 Res Log Like Criterion

0 1 810.76784735
1 2 774.20384832 1.92981521
2 4 761.08247978 1.03422286
3 4 759.04559900 2.98907683
4 1 758.17432643 0.31771101
5 1 757.90332970 0.01420522
6 2 756.59661708 0.03981076
7 2 754.17270042 0.18554890
8 2 750.40180026 0.11827784
9 2 748.22207185 0.00199477
10 2 747.61827563 0.00005801
11 1 747.60224474 0.00000001

Convergence criteria met.

Estimated R Matrix for Subject 1

Row Col1 Col2 Col3 Col4 Col5

1 75.5796 57.8958 52.1462 44.5744 32.5696
2 57.8958 75.5796 54.3741 46.4789 33.9611
3 52.1462 54.3741 75.5796 51.6036 37.7057
4 44.5744 46.4789 51.6036 75.5796 44.1106
5 32.5696 33.9611 37.7057 44.1106 75.5796

Estimated R Correlation Matrix for Subject 1

Row Col1 Col2 Col3 Col4 Col5

1 1.0000 0.7660 0.6900 0.5898 0.4309
2 0.7660 1.0000 0.7194 0.6150 0.4493
3 0.6900 0.7194 1.0000 0.6828 0.4989
4 0.5898 0.6150 0.6828 1.0000 0.5836
5 0.4309 0.4493 0.4989 0.5836 1.0000

Covariance Parameter Estimates

Cov Parm Subject Estimate

Variance patient 60.3694
SP(POW) patient 0.5339
Residual 15.2102

Fit Statistics

-2 Res Log Likelihood 747.6
AIC (Smaller is Better) 753.6
AICC (Smaller is Better) 753.8
BIC (Smaller is Better) 757.1


Null Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq

2 63.17 <.0001

Solution for Fixed Effects

Standard
Effect drug Estimate Error DF t Value Pr > |t|

Intercept 18.8832 21.3867 22.1 0.88 0.3868
hours 0.2408 19.1330 78.8 0.01 0.9900
drug a 3.9404 26.7200 18.9 0.15 0.8843
drug b 15.7498 25.5218 18.9 0.62 0.5445
drug p 0 . . . .
baseline 0.7283 0.2498 21.9 2.92 0.0081
hours*drug a -8.8914 4.5933 29.4 -1.94 0.0626
hours*drug b -4.3778 4.5587 29.4 -0.96 0.3447
hours*drug p 0 . . . .
hours*baseline -0.2700 0.1518 29.4 -1.78 0.0857
baseline*drug a 0.01147 0.3178 18.6 0.04 0.9716
baseline*drug b -0.1019 0.2990 18.6 -0.34 0.7371
baseline*drug p 0 . . . .
hours*hours 64.4389 39.1773 69.4 1.64 0.1045
hours*hours*hours -46.0162 26.6628 70.8 -1.73 0.0887

Type 3 Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

hours 1 81 0.05 0.8242
drug 2 18.9 0.23 0.7936
baseline 1 29.7 24.45 <.0001
hours*drug 2 29.4 1.87 0.1715
hours*baseline 1 29.4 3.16 0.0857
baseline*drug 2 18.6 0.10 0.9019
hours*hours 1 69.4 2.71 0.1045
hours*hours*hours 1 70.8 2.98 0.0887

1) The variance plus the residual is an estimate of the variance of the measurements.
The LOCAL option adds an additional variance parameter, which in general adds an
observational error to the time series structure. The spatial power parameter estimate becomes
a correlation coefficient when it is raised to the power of the value of the time interval.
2) The AICC statistic is lower because the spatial power covariance structure is a better fit
to the residual error. The unstructured covariance structure is too complex while the
compound symmetry covariance structure is too simple.
3) No higher-order terms are significant for this model. These inferences differ from the model
with the compound symmetry covariance structure because using the compound symmetry
covariance structure inflated the Type I error.
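The interpretation in answer 1 can be checked numerically. With the LOCAL option, the covariance between two measurements d time units apart is the spatial power variance times the correlation parameter raised to the power d, while the measurement-error variance appears only on the diagonal. A small Python sketch using the covariance parameter estimates above:

```python
# Covariance parameter estimates from the spatial power fit above
variance, rho, resid = 60.3694, 0.5339, 15.2102

total = variance + resid   # 75.5796, the diagonal of the R matrix

def corr(d):
    """Within-patient correlation at time lag d under SP(POW) + LOCAL."""
    return variance * rho ** d / total

# Even as d -> 0 the correlation between two distinct measurements stays
# below 1, because the measurement-error component does not carry over
limit = corr(0)
```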


2. Evaluating Covariance Structures

a. Include the VARIOGRAM and VARIANCE macros (programs long02d02a.sas and


long02d02b.sas). Pass the necessary information to the macros to create the varioplot data set
and to estimate the process variance. Specify as explanatory variables the three main effects,
the three two-factor interactions, and the quadratic and cubic effects of hours. Create a plot
of the sample variogram using PROC SGPLOT and fit a penalized B-spline curve with a
smoothing factor of 50 and 5 knots, add a horizontal reference line at the estimate of the process
variance, specify a vertical axis of 0 to 100, and specify a horizontal axis of 0 to 1.
%include ".\long02d02a.sas";
%include ".\long02d02b.sas";

%variogram(data=long.heartrate,resvar=heartrate,clsvar=drug,
expvars=hours drug baseline hours*drug hours*baseline
drug*baseline hours*hours hours*hours*hours,id=patient,
time=hours,maxtime=5);

%variance(data=long.heartrate,id=patient,resvar=heartrate,
clsvar=drug,expvars=hours drug baseline hours*drug
hours*baseline drug*baseline hours*hours hours*hours*hours,
subjects=24,maxtime=5);

Variogram-Based Estimate of the Process Variance

Obs nonmissing total average
1 6900 443706.95 64.3054

proc sgplot data=varioplot noautolegend;
scatter y=variogram x=time_interval / markerattrs=(color=cyan
symbol=circle);
pbspline y=variogram x=time_interval / nomarkers smooth=50 nknots=5
lineattrs=(color=blue pattern=1 thickness=3);
refline 64.3 / label="Process Variance";
xaxis values=(0 to 1 by .1) label='Time Interval';
yaxis values=(0 to 100 by 10) label='Variogram Values';
title 'Sample Variogram of Heart Rate Data';
run;


1) The sample variogram clearly shows that the heart rate data have some measurement error,
a relatively small error component dealing with serial correlation, and a relatively large error
component dealing with the random effects.

2) The LOCAL option should be used along with a covariance structure that allows
the correlations to change over unequal time intervals. Random coefficients models might
be useful also.

b. Plot the autocorrelation function by time interval using PROC SGPLOT with a penalized
B-spline curve.
data varioplot;
set varioplot;
autocorr=1-(variogram/64.31);
run;

proc sgplot data=varioplot noautolegend;
pbspline y=autocorr x=time_interval / nomarkers smooth=50 nknots=5
lineattrs=(color=blue pattern=1 thickness=3);
xaxis values=(0 to 1 by .1) label='Time Interval';
yaxis values=(0 to 1 by .1) label='Autocorrelation Values';
title 'Autocorrelation Plot of Heart Rate Data';
run;


1) The autocorrelation function clearly decreases over time and it does not decrease to 0 within
the range of the data. This supports the recommendation that a covariance structure that
handles serial correlation is needed in the model.
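The conversion coded in the DATA step above is simply one minus the variogram value divided by the process variance; sketched in Python (64.31 is the variogram-based estimate of the process variance computed earlier):

```python
process_variance = 64.31   # variogram-based estimate from the VARIANCE macro

def autocorr(variogram_value):
    """Autocorrelation implied by a variogram value, as in the DATA step."""
    return 1 - variogram_value / process_variance

# A variogram value of 0 implies perfect correlation; a value equal to the
# process variance implies no correlation at that time lag
```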

c. Generate a graph of the model fit statistics by covariance structure. Select the following
covariance structures: compound symmetry, unstructured, spatial power, spatial exponential,
spatial Gaussian, spatial spherical, and spatial linear. Use ODS to save the model fit statistics
and graph the AIC, AICC, and BIC statistics.
ods select none;
proc mixed data=long.heartrate;
class drug;
model heartrate=hours drug baseline hours*drug hours*baseline
drug*baseline hours*hours hours*hours*hours;
repeated / type=cs subject=patient;
ods output fitstatistics=csmodel;
run;

proc mixed data=long.heartrate;
class drug;
model heartrate=hours drug baseline hours*drug hours*baseline
drug*baseline hours*hours hours*hours*hours;
repeated / type=un subject=patient;


ods output fitstatistics=unstmodel;
run;

proc mixed data=long.heartrate;
class drug;
model heartrate=hours drug baseline hours*drug hours*baseline
drug*baseline hours*hours hours*hours*hours;
repeated / type=sp(pow)(hours) local subject=patient;
ods output fitstatistics=powmodel;
run;

proc mixed data=long.heartrate;
class drug;
model heartrate=hours drug baseline hours*drug hours*baseline
drug*baseline hours*hours hours*hours*hours;
repeated / type=sp(lin)(hours) local subject=patient;
ods output fitstatistics=linmodel;
run;

proc mixed data=long.heartrate;
class drug;
model heartrate=hours drug baseline hours*drug hours*baseline
drug*baseline hours*hours hours*hours*hours;
repeated / type=sp(gau)(hours) local subject=patient;
ods output fitstatistics=gaumodel;
run;

proc mixed data=long.heartrate;
class drug;
model heartrate=hours drug baseline hours*drug hours*baseline
drug*baseline hours*hours hours*hours*hours;
repeated / type=sp(exp)(hours) local subject=patient;
ods output fitstatistics=expmodel;
run;

proc mixed data=long.heartrate;
class drug;
model heartrate=hours drug baseline hours*drug hours*baseline
drug*baseline hours*hours hours*hours*hours;
repeated / type=sp(sph)(hours) local subject=patient;
ods output fitstatistics=sphmodel;
run;

ods select all;

data model_fit;
length model $ 7 type $ 4;
set csmodel (in=cs)
unstmodel (in=un)
powmodel (in=pow)
linmodel (in=lin)


gaumodel (in=gau)
expmodel (in=exp)
sphmodel (in=sph);
if substr(descr,1,1) in ('A','B');
if substr(descr,1,3) = 'AIC' then type='AIC';
if substr(descr,1,4) = 'AICC' then type='AICC';
if substr(descr,1,3) = 'BIC' then type='BIC';
if cs then model='CS';
if un then model='UNSTR';
if pow then model='SpPow';
if lin then model='SpLin';
if exp then model='SpExp';
if gau then model='SpGau';
if sph then model='SpSph';
run;

proc sgplot data=model_fit;
scatter y=value x=model / group=type;
xaxis label='Covariance Structure';
yaxis values=(750 to 800 by 10) label='Model Fit Values';
title 'Model Fit Statistics by Covariance Structure';
run;


1) The spatial exponential, spatial power, and spatial spherical covariance structures appear
to model the residual error the best.
3. Developing and Interpreting Models

a. Reduce the mean model by eliminating unnecessary higher-order terms. Use the ML estimation
method and the spatial exponential covariance structure. Also add a measurement error
component and use the FIRSTORDER suboption. Use the p-values of the effects along with
the AICC statistic to decide which terms to eliminate. Do not eliminate the main effects.
proc mixed data=long.heartrate method=ml;
class drug;
model heartrate=hours drug baseline hours*drug hours*baseline
drug*baseline hours*hours hours*hours*hours
/ solution ddfm=kr(firstorder);
repeated / type=sp(exp)(hours) local subject=patient;
title 'Longitudinal Model with Spatial Exponential '
'Covariance Structure';
run;
Partial Output
Covariance Parameter Estimates

Cov Parm Subject Estimate

Variance patient 45.1236
SP(EXP) patient 1.4965
Residual 15.6998

Fit Statistics

-2 Log Likelihood 777.5
AIC (Smaller is Better) 807.5
AICC (Smaller is Better) 812.1
BIC (Smaller is Better) 825.2

Null Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq

2 59.27 <.0001

Solution for Fixed Effects

Standard
Effect drug Estimate Error DF t Value Pr > |t|

Intercept 19.1258 18.6882 29.7 1.02 0.3144
hours -0.04792 18.2905 89.1 -0.00 0.9979
drug a 3.4651 23.2660 24.9 0.15 0.8828
drug b 15.6648 22.2231 25 0.70 0.4874
drug p 0 . . . .
baseline 0.7259 0.2182 29.4 3.33 0.0024
hours*drug a -8.8224 4.2468 30.6 -2.08 0.0462
hours*drug b -4.3803 4.2148 30.6 -1.04 0.3068
hours*drug p 0 . . . .


hours*baseline -0.2688 0.1403 30.6 -1.92 0.0649


baseline*drug a 0.01602 0.2767 24.6 0.06 0.9543
baseline*drug b -0.1007 0.2603 24.6 -0.39 0.7021
baseline*drug p 0 . . . .
hours*hours 64.8515 38.5571 69.4 1.68 0.0971
hours*hours*hours -46.2738 26.2275 70.8 -1.76 0.0820

Type 3 Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

hours 1 91.2 0.06 0.8049


drug 2 25 0.31 0.7334
baseline 1 39.4 31.53 <.0001
hours*drug 2 30.6 2.16 0.1328
hours*baseline 1 30.6 3.67 0.0649
baseline*drug 2 24.6 0.14 0.8688
hours*hours 1 69.4 2.83 0.0971
hours*hours*hours 1 70.8 3.11 0.0820

The first term to eliminate is baseline*drug.


proc mixed data=long.heartrate method=ml;
class drug;
model heartrate=hours drug baseline hours*drug hours*baseline
hours*hours hours*hours*hours
/ solution ddfm=kr(firstorder);
repeated / type=sp(exp)(hours) local subject=patient;
title 'Longitudinal Model with Spatial Exponential '
'Covariance Structure';
run;
Partial Output
Covariance Parameter Estimates

Cov Parm Subject Estimate

Variance patient 45.6064


SP(EXP) patient 1.5098
Residual 15.6751

Fit Statistics

-2 Log Likelihood 777.8


AIC (Smaller is Better) 803.8
AICC (Smaller is Better) 807.2
BIC (Smaller is Better) 819.1

The AICC statistic went down compared to the last model (807.2 versus 812.1).
Null Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq

2 60.36 <.0001


Solution for Fixed Effects

Standard
Effect drug Estimate Error DF t Value Pr > |t|

Intercept 22.4088 10.7112 40.3 2.09 0.0428


hours -0.04391 18.2957 89 -0.00 0.9981
drug a 4.6062 3.7021 39.9 1.24 0.2207
drug b 7.2213 3.6742 39.9 1.97 0.0564
drug p 0 . . . .
baseline 0.6872 0.1223 39.9 5.62 <.0001
hours*drug a -8.8234 4.2516 30.7 -2.08 0.0464
hours*drug b -4.3803 4.2195 30.7 -1.04 0.3073
hours*drug p 0 . . . .
hours*baseline -0.2688 0.1405 30.7 -1.91 0.0651
hours*hours 64.8465 38.5412 69.6 1.68 0.0969
hours*hours*hours -46.2707 26.2169 71 -1.76 0.0819

Type 3 Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

hours 1 91.1 0.06 0.8051


drug 2 39.9 1.98 0.1520
baseline 1 39.9 31.55 <.0001
hours*drug 2 30.7 2.15 0.1333
hours*baseline 1 30.7 3.66 0.0651
hours*hours 1 69.6 2.83 0.0969
hours*hours*hours 1 71 3.11 0.0819

The next term to eliminate is hours*drug.


proc mixed data=long.heartrate method=ml;
class drug;
model heartrate=hours drug baseline hours*baseline hours*hours
hours*hours*hours / solution ddfm=kr(firstorder);
repeated / type=sp(exp)(hours) local subject=patient;
title 'Longitudinal Model with Spatial Exponential '
'Covariance Structure';
run;
Partial Output
Covariance Parameter Estimates

Cov Parm Subject Estimate

Variance patient 48.2952


SP(EXP) patient 1.2718
Residual 14.8558

Fit Statistics

-2 Log Likelihood 781.9


AIC (Smaller is Better) 803.9
AICC (Smaller is Better) 806.4


BIC (Smaller is Better) 816.9

Null Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq

2 59.08 <.0001

The AICC continues to decrease from the last model (806.4 versus 807.2).
Solution for Fixed Effects

Standard
Effect drug Estimate Error DF t Value Pr > |t|

Intercept 25.9221 10.7828 40.9 2.40 0.0208


hours -7.3938 18.3709 93.4 -0.40 0.6883
drug a 0.4899 3.1441 25 0.16 0.8774
drug b 5.1518 3.1204 25 1.65 0.1112
drug p 0 . . . .
baseline 0.6695 0.1247 41.3 5.37 <.0001
hours*baseline -0.2310 0.1464 38.3 -1.58 0.1228
hours*hours 64.3983 38.8316 72.8 1.66 0.1015
hours*hours*hours -45.9905 26.4297 74.2 -1.74 0.0860

Type 3 Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

hours 1 93.4 0.16 0.6883


drug 2 25 1.66 0.2095
baseline 1 41.3 28.83 <.0001
hours*baseline 1 38.3 2.49 0.1228
hours*hours 1 72.8 2.75 0.1015
hours*hours*hours 1 74.2 3.03 0.0860

The next term to eliminate is hours*baseline.


proc mixed data=long.heartrate method=ml;
class drug;
model heartrate=hours drug baseline hours*hours hours*hours*hours
/ solution ddfm=kr(firstorder);
repeated / type=sp(exp)(hours) local subject=patient;
title 'Longitudinal Model with Spatial Exponential '
'Covariance Structure';
run;
Partial Output
Covariance Parameter Estimates

Cov Parm Subject Estimate

Variance patient 50.0626


SP(EXP) patient 1.1649
Residual 14.3339

Fit Statistics


-2 Log Likelihood 784.3


AIC (Smaller is Better) 804.3
AICC (Smaller is Better) 806.4
BIC (Smaller is Better) 816.1

Null Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq

2 58.47 <.0001

The AICC statistic remains unchanged.


Solution for Fixed Effects

Standard
Effect drug Estimate Error DF t Value Pr > |t|

Intercept 35.0154 9.1766 26 3.82 0.0008


hours -26.4309 13.8591 75.8 -1.91 0.0603
drug a 0.4797 3.1547 25.2 0.15 0.8804
drug b 5.1328 3.1309 25.2 1.64 0.1135
drug p 0 . . . .
baseline 0.5598 0.1043 25.2 5.37 <.0001
hours*hours 64.1033 39.0188 74.1 1.64 0.1046
hours*hours*hours -45.8060 26.5669 75.5 -1.72 0.0888

Type 3 Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

hours 1 75.8 3.64 0.0603


drug 2 25.2 1.64 0.2133
baseline 1 25.2 28.84 <.0001
hours*hours 1 74.1 2.70 0.1046
hours*hours*hours 1 75.5 2.97 0.0888

The next term to eliminate is the cubic effect of hours.


proc mixed data=long.heartrate method=ml;
class drug;
model heartrate=hours drug baseline hours*hours
/ solution ddfm=kr(firstorder);
repeated / type=sp(exp)(hours) local subject=patient;
title 'Longitudinal Model with Spatial Exponential '
'Covariance Structure';
run;
Partial Output
Covariance Parameter Estimates

Cov Parm Subject Estimate

Variance patient 50.0158


SP(EXP) patient 1.1361
Residual 14.9260


Fit Statistics

-2 Log Likelihood 787.3


AIC (Smaller is Better) 805.3
AICC (Smaller is Better) 806.9
BIC (Smaller is Better) 815.9

Null Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq


2 56.54 <.0001

The AICC statistic increases from the last model (806.9 versus 806.4). Even though the AICC statistic
increased, continue the elimination and evaluate the main effects model. If the AICC statistic for the
main effects model is higher than the AICC for the model with the cubic effect of hours, then the
quadratic and cubic effects of hours will remain in the final model.
Solution for Fixed Effects

Standard
Effect drug Estimate Error DF t Value Pr > |t|

Intercept 34.1108 9.1472 25.8 3.73 0.0010


hours -4.5579 5.6500 62.8 -0.81 0.4229
drug a 0.4782 3.1488 25.2 0.15 0.8805
drug b 5.1382 3.1251 25.2 1.64 0.1126
drug p 0 . . . .
baseline 0.5601 0.1041 25.2 5.38 <.0001
hours*hours -2.5806 5.1737 65.2 -0.50 0.6196

Type 3 Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

hours 1 62.8 0.65 0.4229


drug 2 25.2 1.65 0.2114
baseline 1 25.2 28.97 <.0001
hours*hours 1 65.2 0.25 0.6196

The next term to eliminate is the quadratic effect of hours.


proc mixed data=long.heartrate method=ml;
class drug;
model heartrate=hours drug baseline / solution ddfm=kr(firstorder);
repeated / type=sp(exp)(hours) local subject=patient;
title 'Longitudinal Model with Spatial Exponential '
'Covariance Structure';
run;


Partial Output
Covariance Parameter Estimates

Cov Parm Subject Estimate

Variance patient 50.1438


SP(EXP) patient 1.1235
Residual 14.8717

Fit Statistics

-2 Log Likelihood 787.5


AIC (Smaller is Better) 803.5
AICC (Smaller is Better) 804.8
BIC (Smaller is Better) 813.0

Null Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq

2 56.42 <.0001

Solution for Fixed Effects

Standard
Effect drug Estimate Error DF t Value Pr > |t|

Intercept 34.3564 9.1304 25.7 3.76 0.0009


hours -7.2144 1.8866 40.3 -3.82 0.0004
drug a 0.4770 3.1472 25.2 0.15 0.8807
drug b 5.1364 3.1235 25.2 1.64 0.1125
drug p 0 . . . .
baseline 0.5600 0.1040 25.2 5.38 <.0001

Type 3 Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

hours 1 40.3 14.62 0.0004


drug 2 25.2 1.65 0.2112
baseline 1 25.2 28.99 <.0001

1) The AICC statistic is the smallest of any model (804.8 versus 806.4). Therefore, the main
effects model will be the final model. Furthermore, none of the higher-order terms are
significant in the reduced models.
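As a sanity check on this model-selection path, the AICC values that drove it can be reproduced from the printed -2 log likelihoods. Under ML, SAS computes AICC = -2LL + 2*d*n/(n - d - 1), where d counts both fixed-effect and covariance parameters and n is the number of observations (120 here). A quick check in Python (a sketch; the model labels are ours, and the AICC values are transcribed from the outputs above):

```python
def aicc_ml(neg2ll, d, n):
    # SAS AICC under ML: -2LL + 2*d*n/(n - d - 1),
    # where d = number of fixed-effect plus covariance parameters
    return neg2ll + 2 * d * n / (n - d - 1)

# Full mean model: 12 fixed-effect + 3 covariance parameters
full_aicc = round(aicc_ml(777.5, 15, 120), 1)   # 812.1, matching the output
# Main-effects model: 5 fixed-effect + 3 covariance parameters
main_aicc = round(aicc_ml(787.5, 8, 120), 1)    # 804.8, matching the output

# AICC at each step of the backward elimination
aicc = {'full model': 812.1, 'drop baseline*drug': 807.2,
        'drop hours*drug': 806.4, 'drop hours*baseline': 806.4,
        'drop cubic hours': 806.9, 'main effects': 804.8}
best = min(aicc, key=aicc.get)                  # 'main effects'
```

Because d includes the fixed effects under ML, dropping terms lowers the penalty as well as changing the likelihood, which is why the main effects model can have the smallest AICC despite its larger -2 log likelihood.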
b. For the reduced model, generate another graph of the model fit statistics by covariance structure.
Use the REML estimation method and only select the five spatial covariance structures.
ods select none;
proc mixed data=long.heartrate;
class drug;
model heartrate=hours drug baseline;
repeated / type=sp(pow)(hours) local subject=patient;
ods output fitstatistics=powmodel;
run;


proc mixed data=long.heartrate;
class drug;
model heartrate=hours drug baseline;
repeated / type=sp(lin)(hours) local subject=patient;
ods output fitstatistics=linmodel;
run;

proc mixed data=long.heartrate;
class drug;
model heartrate=hours drug baseline;
repeated / type=sp(gau)(hours) local subject=patient;
ods output fitstatistics=gaumodel;
run;
proc mixed data=long.heartrate;
class drug;
model heartrate=hours drug baseline;
repeated / type=sp(exp)(hours) local subject=patient;
ods output fitstatistics=expmodel;
run;

proc mixed data=long.heartrate;
class drug;
model heartrate=hours drug baseline;
repeated / type=sp(sph)(hours) local subject=patient;
ods output fitstatistics=sphmodel;
run;
ods select all;

data model_fit1;
length model $ 7 type $ 4;
set powmodel (in=pow)
linmodel (in=lin)
gaumodel (in=gau)
expmodel (in=exp)
sphmodel (in=sph);
if substr(descr,1,1) in ('A','B');
if substr(descr,1,3)='AIC' then type='AIC';
if substr(descr,1,4)='AICC' then type='AICC';
if substr(descr,1,3)='BIC' then type='BIC';
if pow then model='SpPow';
if lin then model='SpLin';
if exp then model='SpExp';
if gau then model='SpGau';
if sph then model='SpSph';
run;

proc sgplot data=model_fit1;
scatter y=value x=model / group=type;
xaxis label='Covariance Structure';
yaxis values=(750 to 820 by 10) label='Model Fit Values';


title 'Model Fit Statistics by Covariance Structure';
run;

1) The spatial exponential covariance structure is still one of the best fits.
2) The spatial power covariance structure appears to be a relatively poor fit in the reduced model
compared to its fit for the complex mean model.
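One note on the relatively poor showing of the spatial power structure: SP(POW), with correlation rho**d, and SP(EXP), with correlation exp(-d/theta), are reparameterizations of one another via rho = exp(-1/theta), so a noticeably different fit usually points to optimization behavior (starting values or a local optimum) rather than a genuinely different correlation model. A quick check in Python (theta = 1.3084 is the SP(EXP) estimate reported in the next step; the identity holds for any positive theta):

```python
import math

theta = 1.3084                  # SP(EXP) range parameter from the REML fit below
rho = math.exp(-1.0 / theta)    # the equivalent SP(POW) correlation parameter

# Both structures give the same correlation at every time lag d (in hours)
for d in [0.1, 0.25, 0.5, 1.0, 2.0]:
    assert abs(rho ** d - math.exp(-d / theta)) < 1e-12
```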

c. Refit the reduced model using the REML estimation method and the spatial exponential
covariance structure. Also request the correlations from the R matrix and the parameter estimates
for the fixed effects.
proc mixed data=long.heartrate;
class drug;
model heartrate=hours drug baseline / solution ddfm=kr(firstorder);
repeated / type=sp(exp)(hours) local subject=patient rcorr;
title 'Reduced Model with Spatial Exponential '
'Covariance Structure';
run;


Reduced Model with Spatial Exponential Covariance Structure

The Mixed Procedure

Model Information

Data Set LONG.HEARTRATE


Dependent Variable heartrate
Covariance Structure Spatial Exponential
Subject Effect patient
Estimation Method REML
Residual Variance Method Profile
Fixed Effects SE Method Prasad-Rao-Jeske-
Kackar-Harville
Degrees of Freedom Method Kenward-Roger

Class Level Information

Class Levels Values

drug 3 a b p

Dimensions

Covariance Parameters 3
Columns in X 6
Columns in Z 0
Subjects 24
Max Obs per Subject 5

Number of Observations

Number of Observations Read 120


Number of Observations Used 120
Number of Observations Not Used 0

Iteration History

Iteration Evaluations -2 Res Log Like Criterion

0 1 837.26507957
1 3 785.92913529 0.02297363
2 1 785.86160625 0.00010724
3 1 785.86105917 0.00000068
4 1 785.85806809 0.00000064
5 1 785.82588950 0.00000074
6 2 784.63431676 0.01037039
7 2 782.40898562 0.03544906
8 2 779.55195994 0.08082419
9 2 777.85987594 0.00284395
10 1 776.84834470 0.00088709
11 1 776.54727936 0.00014216
12 1 776.50260059 0.00000542
13 1 776.50102828 0.00000001

Convergence criteria met.


Estimated R Correlation Matrix for Subject 1

Row Col1 Col2 Col3 Col4 Col5

1 1.0000 0.7564 0.6659 0.5501 0.3754


2 0.7564 1.0000 0.7007 0.5789 0.3950
3 0.6659 0.7007 1.0000 0.6575 0.4487
4 0.5501 0.5789 0.6575 1.0000 0.5431
5 0.3754 0.3950 0.4487 0.5431 1.0000

Covariance Parameter Estimates

Cov Parm Subject Estimate

Variance patient 58.1964


SP(EXP) patient 1.3084
Residual 14.9215

Fit Statistics

-2 Res Log Likelihood 776.5


AIC (Smaller is Better) 782.5
AICC (Smaller is Better) 782.7
BIC (Smaller is Better) 786.0

Null Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq

2 60.76 <.0001

Solution for Fixed Effects

Standard
Effect drug Estimate Error DF t Value Pr > |t|

Intercept 34.4030 9.9663 21.2 3.45 0.0024


hours -7.2180 1.9221 37.4 -3.76 0.0006
drug a 0.4884 3.4374 20.9 0.14 0.8884
drug b 5.1223 3.4115 20.9 1.50 0.1482
drug p 0 . . . .
baseline 0.5594 0.1136 20.9 4.92 <.0001

Type 3 Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

hours 1 37.4 14.10 0.0006


drug 2 20.9 1.38 0.2744
baseline 1 20.9 24.25 <.0001


1) The parameter estimate for hours indicates that for every one-unit increase in hours, the heart
rate decreases by 7.218. The linear effect of hours is significant at the .05 level. The parameter
estimates for drug are contrasts of drug a versus the placebo and drug b versus the placebo.
Neither parameter estimate is significant. Finally, the parameter estimate for baseline indicates
that for every one-unit increase in baseline, the heart rate increases by 0.5594. The linear effect
of baseline is significant.
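With the LOCAL option, the correlations printed in the Estimated R Correlation Matrix have the form sigma^2 * exp(-d/theta) / (sigma^2 + sigma_1^2), where sigma_1^2 is the measurement-error (nugget) variance. The first row of that matrix can be reproduced from the covariance parameter estimates; a sketch in Python, assuming the five measurements were taken at 1, 5, 15, 30, and 60 minutes (the actual times are not shown in this output, so treat them as an assumption):

```python
import math

sigma2, theta, nugget = 58.1964, 1.3084, 14.9215  # Variance, SP(EXP), Residual
times = [1/60, 5/60, 15/60, 30/60, 1.0]           # assumed measurement times (hours)

# Correlation of measurement 1 with measurements 2-5 under SP(EXP) plus nugget
row1 = [sigma2 * math.exp(-abs(t - times[0]) / theta) / (sigma2 + nugget)
        for t in times[1:]]
print([round(r, 4) for r in row1])  # [0.7564, 0.6659, 0.5501, 0.3754]
```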

4. Fitting Random Coefficient Models

a. Fit a random coefficient model with a random intercept and hours. Specify the fixed effects as
hours, drug, and baseline. Use an unstructured covariance structure and print out the G matrix,
the correlation matrix based on the V matrix, the parameter estimates for the fixed effects, and the
parameter estimates for the random effects. Use the Kenward-Roger method for computing
degrees of freedom.
proc mixed data=long.heartrate;
class drug;
model heartrate=hours drug baseline / solution ddfm=kr;
random intercept hours / solution type=un subject=patient g vcorr;
title 'Random Coefficients Model for Heart Rate Data';
run;

Random Coefficients Model for Heart Rate Data

The Mixed Procedure

Model Information

Data Set LONG.HEARTRATE


Dependent Variable heartrate
Covariance Structure Unstructured
Subject Effect patient
Estimation Method REML
Residual Variance Method Profile
Fixed Effects SE Method Kenward-Roger
Degrees of Freedom Method Kenward-Roger

Class Level Information

Class Levels Values

drug 3 a b p

Dimensions

Covariance Parameters 4
Columns in X 6
Columns in Z per Subject 2
Subjects 24
Max Obs per Subject 5

Number of Observations

Number of Observations Read 120


Number of Observations Used 120
Number of Observations Not Used 0


Iteration History

Iteration Evaluations -2 Res Log Like Criterion

0 1 837.26507957
1 2 779.82841658 0.00000027
2 1 779.82833922 0.00000000

Convergence criteria met.

Estimated G Matrix

Row Effect Subject Col1 Col2

1 Intercept 1 61.5600 -28.6486


2 hours 1 -28.6486 40.9628

Estimated V Correlation Matrix for Subject 1

Row Col1 Col2 Col3 Col4 Col5

1 1.0000 0.7063 0.6814 0.6218 0.4307


2 0.7063 1.0000 0.6803 0.6279 0.4508
3 0.6814 0.6803 1.0000 0.6387 0.5010
4 0.6218 0.6279 0.6387 1.0000 0.5700
5 0.4307 0.4508 0.5010 0.5700 1.0000

Covariance Parameter Estimates

Cov Parm Subject Estimate

UN(1,1) patient 61.5600


UN(2,1) patient -28.6486
UN(2,2) patient 40.9628
Residual 24.3608

Fit Statistics

-2 Res Log Likelihood 779.8


AIC (Smaller is Better) 787.8
AICC (Smaller is Better) 788.2
BIC (Smaller is Better) 792.5

Null Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq

3 57.44 <.0001


Solution for Fixed Effects

Standard
Effect drug Estimate Error DF t Value Pr > |t|

Intercept 36.0465 10.7606 20.4 3.35 0.0031


hours -7.0273 1.8179 23 -3.87 0.0008
drug a -0.4524 3.7153 20 -0.12 0.9043
drug b 4.9365 3.6873 20 1.34 0.1957
drug p 0 . . . .
baseline 0.5434 0.1228 20 4.43 0.0003

Solution for Random Effects

Std Err
Effect Subject Estimate Pred DF t Value Pr > |t|

Intercept 1 -0.5824 3.9258 47.8 -0.15 0.8827


hours 1 6.2144 5.1155 19.1 1.21 0.2392
Intercept 2 -9.3745 5.1353 28.8 -1.83 0.0783
hours 2 6.9608 5.0798 20.1 1.37 0.1857
Intercept 3 3.6104 3.8344 50.4 0.94 0.3509
hours 3 1.9287 5.1178 19 0.38 0.7104
Intercept 4 -2.6542 3.9673 46.7 -0.67 0.5068
hours 4 5.4088 5.1144 19.1 1.06 0.3034
Intercept 5 4.6528 3.8521 49.9 1.21 0.2328
hours 5 -3.2198 5.1173 19 -0.63 0.5367
Intercept 6 -3.0242 4.3040 39.2 -0.70 0.4864
hours 6 -0.4872 5.1054 19.4 -0.10 0.9250
Intercept 7 9.0722 4.4444 36.8 2.04 0.0484
hours 7 -13.8765 5.1014 19.5 -2.72 0.0134
Intercept 8 -9.6469 4.5356 35.4 -2.13 0.0405
hours 8 2.6089 5.0987 19.6 0.51 0.6146
Intercept 9 8.7755 3.9258 47.8 2.24 0.0301
hours 9 -4.9120 5.1155 19.1 -0.96 0.3489
Intercept 10 -8.8075 3.9673 46.7 -2.22 0.0313
hours 10 4.1213 5.1144 19.1 0.81 0.4303
Intercept 11 -6.1570 3.8733 49.3 -1.59 0.1183
hours 11 2.3786 5.1168 19.1 0.46 0.6473
Intercept 12 -1.8738 4.3308 38.7 -0.43 0.6677
hours 12 3.5793 5.1046 19.4 0.70 0.4915
Intercept 13 12.1104 3.8521 49.9 3.14 0.0028
hours 13 -5.6917 5.1173 19 -1.11 0.2799
Intercept 14 -1.4728 4.5356 35.4 -0.32 0.7473
hours 14 -2.7726 5.0987 19.6 -0.54 0.5927
Intercept 15 -11.7795 4.0524 44.6 -2.91 0.0057
hours 15 -0.5029 5.1122 19.2 -0.10 0.9227
Intercept 16 -0.8941 3.9258 47.8 -0.23 0.8208
hours 16 3.4455 5.1155 19.1 0.67 0.5087
Intercept 17 6.7537 3.9673 46.7 1.70 0.0953
hours 17 -0.1474 5.1144 19.1 -0.03 0.9773
Intercept 18 12.4426 4.0524 44.6 3.07 0.0036
hours 18 -5.7778 5.1122 19.2 -1.13 0.2723
Intercept 19 1.0815 3.8733 49.3 0.28 0.7812
hours 19 2.3698 5.1168 19.1 0.46 0.6485
Intercept 20 -1.9984 4.1126 43.2 -0.49 0.6295
hours 20 1.9023 5.1106 19.2 0.37 0.7138


Intercept 21 6.5694 3.8733 49.3 1.70 0.0962


hours 21 0.1229 5.1168 19.1 0.02 0.9811
Intercept 22 2.6070 3.8733 49.3 0.67 0.5041
hours 22 -2.4711 5.1168 19.1 -0.48 0.6346
Intercept 23 -7.3873 3.8521 49.9 -1.92 0.0609
hours 23 -3.1270 5.1173 19 -0.61 0.5484
Intercept 24 -2.0229 3.8468 50 -0.53 0.6013
hours 24 1.9448 5.1175 19 0.38 0.7081

Type 3 Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

hours 1 23 14.94 0.0008


drug 2 20 1.32 0.2903
baseline 1 20 19.59 0.0003

1) The G matrix consists of the variances and covariances of the random effects. The value in
column 1 and row 1 represents the variance of the intercepts. The value in column 2 and row
2 represents the variance of the slopes for hours. The value in column 2 and row 1 represents
the covariance of the intercepts and the slopes of hours.
The information gleaned from the G matrix is that the intercepts and the slopes for hours are
negatively correlated.
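The strength of that negative relationship can be quantified by converting the G matrix entries to a correlation (values transcribed from the Estimated G Matrix above):

```python
import math

var_int, var_slope = 61.5600, 40.9628   # diagonal of the G matrix
cov_int_slope = -28.6486                # off-diagonal of the G matrix

corr = cov_int_slope / math.sqrt(var_int * var_slope)
print(round(corr, 2))  # -0.57
```

So patients with higher-than-average intercepts tend to have lower-than-average slopes for hours.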
2) The residual covariance estimate represents the error that remains after the fixed effects
and random effects are accounted for. This will be modeled by the R matrix, which has
an independent covariance structure.
3) The AICC statistic is slightly larger than the reduced model in the last exercise (788.2 versus
782.7).
4) The correlations from the V matrix appear to decrease at a slower rate when compared
to the correlations from the R matrix from the reduced model in the last exercise.
5) The parameter estimates for the random effects represent deviations from the fixed effects.
Therefore, subject 1 deviates –0.5824 from the population intercept and 6.2144 from
the population slope for hours. The equation for subject 1 (who is taking the placebo)
is Y = 35.46 – 0.8129 * hours + 0.5434 * baseline.
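The subject-specific coefficients in that equation are just the fixed-effect estimates plus the corresponding random-effect deviations; a quick check for subject 1 (on the placebo, so the drug terms are zero):

```python
fixed = {'intercept': 36.0465, 'hours': -7.0273, 'baseline': 0.5434}
subj1 = {'intercept': -0.5824, 'hours': 6.2144}  # Solution for Random Effects

b0 = fixed['intercept'] + subj1['intercept']   # subject 1's intercept
b1 = fixed['hours'] + subj1['hours']           # subject 1's slope for hours
print(round(b0, 2), round(b1, 4))  # 35.46 -0.8129
```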

b. Fit a model with both the REPEATED and RANDOM statements. Specify a random intercept
and hours, and use the unstructured covariance structure. Print the G matrix, the correlation
matrix based on the V matrix, and the parameter estimates for the fixed effects. Specify the
spatial exponential covariance structure for the R matrix, add a measurement error component,
and use the FIRSTORDER suboption.
proc mixed data=long.heartrate;
class drug;
model heartrate=hours drug baseline / solution ddfm=kr(firstorder);
random intercept hours / type=un subject=patient g vcorr;
repeated / type=sp(exp)(hours) local subject=patient;
title 'Model with REPEATED and RANDOM statements for '
'Heart Rate Data';
run;


Model with REPEATED and RANDOM statements for Heart Rate Data

The Mixed Procedure

Model Information

Data Set LONG.HEARTRATE


Dependent Variable heartrate
Covariance Structures Unstructured, Spatial
Exponential
Subject Effects patient, patient
Estimation Method REML
Residual Variance Method Profile
Fixed Effects SE Method Prasad-Rao-Jeske-
Kackar-Harville
Degrees of Freedom Method Kenward-Roger

Class Level Information

Class Levels Values

drug 3 a b p

Dimensions

Covariance Parameters 6
Columns in X 6
Columns in Z per Subject 2
Subjects 24
Max Obs per Subject 5

Number of Observations

Number of Observations Read 120


Number of Observations Used 120
Number of Observations Not Used 0

Iteration History

Iteration Evaluations -2 Res Log Like Criterion

0 1 837.26507957
1 4 779.85976042 0.00000144
2 1 779.82838462 0.00000000

Convergence criteria met but final hessian is not positive definite.

Estimated G Matrix

Row Effect Subject Col1 Col2

1 Intercept 1 61.5916 -28.7131


2 hours 1 -28.7131 41.1190


Estimated V Correlation Matrix for Subject 1

Row Col1 Col2 Col3 Col4 Col5

1 1.0000 0.7065 0.6815 0.6218 0.4301


2 0.7065 1.0000 0.6804 0.6279 0.4503
3 0.6815 0.6804 1.0000 0.6388 0.5007
4 0.6218 0.6279 0.6388 1.0000 0.5700
5 0.4301 0.4503 0.5007 0.5700 1.0000
Covariance Parameter Estimates

Cov Parm Subject Estimate

UN(1,1) patient 61.5916


UN(2,1) patient -28.7131
UN(2,2) patient 41.1190
Variance patient 23.3153
SP(EXP) patient 0
Residual 1.0317

Fit Statistics

-2 Res Log Likelihood 779.8


AIC (Smaller is Better) 789.8
AICC (Smaller is Better) 790.4
BIC (Smaller is Better) 795.7

Null Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq

4 57.44 <.0001

Solution for Fixed Effects

Standard
Effect drug Estimate Error DF t Value Pr > |t|

Intercept 36.0411 10.7621 20.4 3.35 0.0031


hours -7.0273 1.8194 22.9 -3.86 0.0008
drug a -0.4507 3.7158 20 -0.12 0.9047
drug b 4.9374 3.6878 20 1.34 0.1957
drug p 0 . . . .
baseline 0.5435 0.1228 20 4.43 0.0003

Type 3 Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

hours 1 22.9 14.92 0.0008


drug 2 20 1.32 0.2904
baseline 1 20 19.59 0.0003


1) The first covariance parameter estimate represents the variance of the intercepts. The second
covariance parameter estimate represents the covariance of the intercepts and the linear effect
of hours. The third covariance parameter estimate represents the variance of the linear effect
of hours. Adding the fourth and sixth estimates represents the variance of the residuals
in the spatial exponential covariance structure. Finally, the fifth estimate is the parameter
estimate in the spatial exponential covariance structure, which is used to compute the
correlations within subject.
2) The correlations in the V matrix show very little change from the random coefficients model.
3) Because the parameter estimate for the spatial exponential covariance structure is essentially
0, the REPEATED statement is not needed. The AICC statistic also increased from the
random coefficients model (790.4 versus 788.2). The final Hessian matrix is also not positive
definite. Therefore, this model is an inferior model.
4) The inferences for the fixed effects have not changed.
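Point 1) can be checked numerically: the fourth ('Variance') and sixth ('Residual') estimates sum to the total within-subject residual variance, and with the SP(EXP) parameter estimated as 0, this total is essentially the residual variance (24.3608) from the random coefficients model in part a:

```python
partial_sill, nugget = 23.3153, 1.0317   # 'Variance' and 'Residual' estimates
total_residual = partial_sill + nugget
print(round(total_residual, 3))  # 24.347
```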
5. Assessing the Model

a. Fit a repeated measures model with the main effects and use the spatial exponential covariance
structure with the LOCAL option. Specify plots of the likelihood distances, the PRESS statistics,
influence statistics, and marginal residuals (student, Pearson, and scaled) using the MARGINAL
and BOX residual plot options. Use iterative influence analysis, set the maximum number
of iterations to 5, and use the FIRSTORDER suboption.
proc mixed data=long.heartrate plots=(distance press
studentpanel(marginal box) pearsonpanel(marginal box)
vcirypanel(box));
class drug patient;
model heartrate=hours drug baseline / solution ddfm=kr(firstorder)
influence(effect=patient iter=5) vciry;
repeated / type=sp(exp)(hours) local subject=patient;
title 'Reduced Model with Spatial Exponential Covariance '
'Structure';
run;

Reduced Model with Spatial Exponential Covariance Structure

The Mixed Procedure

Model Information

Data Set LONG.HEARTRATE


Dependent Variable heartrate
Covariance Structure Spatial Exponential
Subject Effect patient
Estimation Method REML
Residual Variance Method Profile
Fixed Effects SE Method Prasad-Rao-Jeske-
Kackar-Harville
Degrees of Freedom Method Kenward-Roger

Class Level Information


Class Levels Values

drug 3 a b p
patient 24 201 202 203 204 205 206 207
208 209 210 211 212 214 215
216 217 218 219 220 221 222
223 224 232

Dimensions

Covariance Parameters 3
Columns in X 6
Columns in Z 0
Subjects 24
Max Obs per Subject 5

Number of Observations

Number of Observations Read 120


Number of Observations Used 120
Number of Observations Not Used 0

Iteration History

Iteration Evaluations -2 Res Log Like Criterion

0 1 837.26507957
1 3 785.92913529 0.02297363
2 1 785.86160625 0.00010724
3 1 785.86105917 0.00000068
4 1 785.85806809 0.00000064

Iteration History

Iteration Evaluations -2 Res Log Like Criterion

5 1 785.82588950 0.00000074
6 2 784.63431676 0.01037039
7 2 782.40898562 0.03544906
8 2 779.55195994 0.08082419
9 2 777.85987594 0.00284395
10 1 776.84834470 0.00088709
11 1 776.54727936 0.00014216
12 1 776.50260059 0.00000542
13 1 776.50102828 0.00000001

Convergence criteria met.

Covariance Parameter Estimates

Cov Parm Subject Estimate

Variance patient 58.1964


SP(EXP) patient 1.3084
Residual 14.9215


Fit Statistics

-2 Res Log Likelihood 776.5


AIC (Smaller is Better) 782.5
AICC (Smaller is Better) 782.7
BIC (Smaller is Better) 786.0

Null Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq

2 60.76 <.0001

Solution for Fixed Effects

Standard
Effect drug Estimate Error DF t Value Pr > |t|

Intercept 34.4030 9.9663 21.2 3.45 0.0024


hours -7.2180 1.9221 37.4 -3.76 0.0006
drug a 0.4884 3.4374 20.9 0.14 0.8884
drug b 5.1223 3.4115 20.9 1.50 0.1482
drug p 0 . . . .
baseline 0.5594 0.1136 20.9 4.92 <.0001

Type 3 Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

hours 1 37.4 14.10 0.0006


drug 2 20.9 1.38 0.2744
baseline 1 20.9 24.25 <.0001

[Residual diagnostic plot panels (studentized, Pearson, and scaled residuals) appear here in the original notes.]

Influence Diagnostics for Levels of patient

Cook's
Number of D
Observations PRESS Cook's Cov
patient in Level Iterations Statistic D MDFFITS COVRATIO COVTRACE Parms

201 5 3 346.69 0.02039 0.02080 1.3429 0.3241 0.41778


202 5 2 582.64 0.12426 0.08446 1.6021 0.5888 0.05447
203 5 2 218.93 0.02589 0.02258 1.2847 0.2645 0.05289
204 5 2 145.46 0.00992 0.00965 1.4847 0.4236 0.04256
205 5 3 349.72 0.00541 0.00530 1.4103 0.3747 0.39051
206 5 1 79.88 0.01172 0.00903 1.5991 0.5143 0.08795
207 5 5 597.44 0.09034 0.11970 1.2459 0.3203 5.85585
208 5 2 846.67 0.14161 0.11394 1.0878 0.1169 0.07529
209 5 3 614.78 0.05875 0.05214 1.0983 0.1041 0.45325
210 5 2 473.65 0.04646 0.04045 1.1797 0.1783 0.03734
211 5 2 230.49 0.02326 0.02004 1.3174 0.2920 0.06368
212 5 2 113.60 0.00692 0.00652 1.6346 0.5410 0.03482
214 5 3 802.36 0.07050 0.06577 0.9321 0.0587 0.06502
215 5 2 153.51 0.02356 0.01823 1.6191 0.5411 0.02485
216 5 3 1492.50 0.18322 0.19012 0.5220 0.5858 0.38449
217 5 2 138.92 0.00775 0.00688 1.4847 0.4213 0.03722
218 5 3 369.64 0.04765 0.04158 1.1761 0.1769 0.05811
219 5 3 1098.13 0.13886 0.13179 0.7626 0.2470 0.15530
220 5 1 75.92 0.01136 0.00977 1.4442 0.3896 0.08556
221 5 2 51.80 0.00340 0.00294 1.5859 0.4979 0.07743
222 5 2 459.32 0.03493 0.03072 1.1915 0.1869 0.02644
223 5 2 78.40 0.00227 0.00197 1.5093 0.4375 0.07651
224 5 3 787.31 0.07198 0.06782 0.8914 0.1042 0.20543
232 5 2 123.15 0.00580 0.00496 1.4723 0.4114 0.04848


Influence Diagnostics for Levels of patient

RMSE
MDFFITS without Restricted
Cov COVRATIO COVTRACE deleted Likelihood
patient Parms CovParms CovParms level Distance

201 0.18358 2.136 1.268 3.96249 0.4057


202 0.04808 1.525 0.467 4.00411 0.6873
203 0.05022 1.184 0.183 3.91143 0.1808
204 0.03515 1.632 0.563 3.94414 0.0913
205 0.19130 1.851 0.989 3.93270 0.2971
206 0.07894 1.289 0.272 3.91769 0.1402
207 0.53799 14.927 18.106 4.33641 2.3829
208 0.07591 0.988 0.005 3.93340 0.8413
209 0.52075 0.600 0.437 3.45189 0.8352
210 0.03392 1.200 0.193 3.93848 0.2693
211 0.06232 1.167 0.175 3.89764 0.1788
212 0.03094 1.168 0.168 3.81900 0.0658
214 0.07918 0.756 0.210 3.75219 0.4530
215 0.02229 1.260 0.242 3.88864 0.1391
216 0.56515 0.484 0.545 3.79909 1.6522
217 0.03398 1.150 0.153 3.80802 0.0723
218 0.07114 0.945 0.015 3.85246 0.3108
219 0.19650 0.571 0.453 3.74474 0.9723
220 0.07547 1.375 0.338 3.94864 0.1359
221 0.07225 1.099 0.125 3.78695 0.0896
222 0.02713 0.909 0.074 3.75556 0.1994
223 0.07063 1.082 0.111 3.76707 0.0865
224 0.21509 0.630 0.406 3.61617 0.6081
232 0.04247 1.545 0.495 3.95526 0.0735

[Influence diagnostic plots (likelihood distance and PRESS statistics) appear here in the original notes.]

1) None of the residual plots indicate model misspecification.


2) Patients 207, 216, and 219 warrant investigation.


Solutions to Student Activities (Polls/Quizzes)

2.01 Multiple Choice Poll – Correct Answer

Which of the following characteristics do general linear models and general
linear mixed models have in common?
a. Both models support fixed and random effects.
b. Both models can handle correlated error terms.
c. Both models assume that the error terms are normally distributed.
d. None of the above.


2.02 Multiple Choice Poll – Correct Answer

Why is ordinary least squares not the preferred estimation method for fixed
effects in general linear mixed models?
a. Ordinary least squares does not support random effects.
b. Ordinary least squares does not support correlated error terms.
c. Ordinary least squares does not support nonnormal distribution of error
terms.
d. Both a and b.


2.03 Multiple Choice Poll – Correct Answer

Which one of the following statements is true regarding the restricted
maximum likelihood (REML) method?
a. REML handles strong correlations among the responses less effectively
than maximum likelihood.
b. REML parameter estimates have a downward bias that the maximum
likelihood parameter estimates do not have.
c. REML parameter estimates approximate maximum likelihood parameter
estimates as the number of fixed effects becomes large.
d. REML parameter estimates are less sensitive to outliers in the data than
maximum likelihood parameter estimates.


2.04 Multiple Choice Poll – Correct Answer

Which one of the following covariance structures is not appropriate for
unequally spaced time points in a balanced design?
a. AR(1)
b. Compound Symmetry
c. Unstructured
d. Spatial Power


2.05 Multiple Choice Poll – Correct Answer

What happens to the inferences in the model when a covariance structure is
too complex given the relationships in the data?
a. The inferences have a larger Type I error rate.
b. The inferences have less power and efficiency.
c. The inferences are biased.
d. The inferences are not affected.


2.06 Multiple Choice Poll – Correct Answer

Which one of the following statements is true regarding serial correlation?


a. It represents the between-subject variability.
b. It is usually an increasing function of the time separation between
measurements.
c. It can be approximated by the autocorrelation function.
d. It can be accounted for by the compound symmetry covariance
structures.


2.07 Multiple Choice Poll – Correct Answer

What can you conclude if the intercept of the fitted nonparametric curve in
the sample variogram has values much greater than 0?
a. Serial correlation error needs to be addressed in the covariance
structure.
b. Measurement error needs to be addressed in the covariance structure.
c. Random effects error needs to be addressed in the covariance structure.
d. It is irrelevant because the slope of the fitted nonparametric curve
determines the source of the error component.


2.08 Multiple Choice Poll – Correct Answer

Which of the following structures is not appropriate in an unbalanced design
with unequally spaced time points and different time points across subjects?
a. Compound symmetry
b. Unstructured
c. Spatial power
d. Spatial Gaussian


2.09 Multiple Choice Poll – Correct Answer

Which of the following is indicated if the fitted nonparametric curve has a
slope of 0 in the sample variogram?
a. There is no measurement error component.
b. There is no random error component.
c. There is no serial correlation component (correlations do not change
over time).
d. Nothing is indicated, because the slope is irrelevant to the identification
of the source of error.


2.10 Multiple Choice Poll – Correct Answer

Which one of the following statements is true regarding PROC MIXED?
a. Continuous variables are not permitted as arguments to the GROUP= option.
b. PROC MIXED has the flexibility of allowing the type of covariance
structure to change across subgroups of subjects within the same model.
c. Observations with a different GROUP effect value are assumed to be
independent even if they are within-subject observations.
d. The number of values for the effect in the GROUP= option does not
impact the number of estimated covariance parameters.


2.11 Multiple Choice Poll – Correct Answer

Which one of the following statements is true regarding ML and REML
estimation methods?
a. Use ML to choose the appropriate covariance structure.
b. Differences in the model fit statistics under REML reflect differences in
the covariance estimates.
c. The likelihood ratio test comparing the full and reduced mean models is
valid only under REML estimation.
d. You can use either estimation method when you reduce the mean model
using model fit statistics such as AIC and BIC.


2.12 Multiple Choice Poll – Correct Answer

Which one of the following statements is true regarding random coefficient
models in longitudinal data analysis?
a. The random effects and random errors are normally distributed and can
be correlated with each other.
b. The random coefficients are subject-specific deviations from the
population parameter estimates.
c. There is an R matrix but no G matrix.
d. There is a G matrix but no R matrix.


2.13 Multiple Choice Poll – Correct Answer

When is the V matrix the same in the random coefficient model and a model
with the REPEATED statement and several time points?
a. Random coefficient model has a random intercept and slope, and the
repeated model has spatial power covariance structure.
b. Random coefficient model has a random intercept and slope, and the
repeated model has compound symmetry covariance structure.
c. Random coefficient model has only a random intercept, and the
repeated model has compound symmetry covariance structure.
d. Random coefficient model has only a random intercept, and the
repeated model has spatial power covariance structure.


2.14 Multiple Choice Poll – Correct Answer

When computing EBLUPs for random coefficient models, if the within-subject
variability is large in comparison to the between-subject variability for an
individual profile, then which of the following is true?
a. The response values are unreliable and the predictions move toward the
population mean.
b. The response values are reliable and the predictions move toward the
observed data.
c. The predictions are the same as the population average because EBLUPs
do not take into account within-subject variability.
d. Only observations with missing response values will have EBLUPs.


2.15 Multiple Choice Poll – Correct Answer

What covariance structure does the R matrix have in the first random
coefficient model?
a. Unstructured
b. Independent
c. Compound symmetry
d. Spatial power


2.16 Multiple Choice Poll – Correct Answer

Which one of the following statements is true regarding mixed model
assessment?
a. A conditional residual is the difference between the observed data and
the estimated marginal mean.
b. Marginal residuals are useful in determining whether the random effects
are selected properly.
c. Violation of the normality assumption of the random effects will bias the
fixed effect parameter estimates and SEs.
d. Violation of the normality assumption of the random effects will bias the
random effect parameter estimates and SEs.


Chapter 3 Longitudinal Data
Analysis with Discrete Responses

3.1 Generalized Linear Mixed Models............................................................................. 3-3


Demonstration: Exploratory Data Analysis Using Logit Plots ........................................ 3-31
Demonstration: Fitting Models with Binary Responses in PROC GLIMMIX .................... 3-36

Demonstration: Using the Sandwich Estimator in PROC GLIMMIX............................... 3-52

Exercises............................................................................................................. 3-57

3.2 Applications Using the GLIMMIX Procedure .......................................................... 3-58


Demonstration: Fitting Generalized Linear Mixed Models with an Ordinal Response ....... 3-64

Demonstration: Fitting Generalized Linear Mixed Models with Splines .......................... 3-76

Exercises............................................................................................................. 3-81

3.3 GEE Regression Models ......................................................................................... 3-82


Demonstration: Longitudinal Models Using GEE...................................................... 3-108

Exercises............................................................................................................3-115

3.4 Chapter Summary.................................................................................................. 3-116

3.5 Solutions ............................................................................................................... 3-119


Solutions to Exercises ..........................................................................................3-119
Solutions to Student Activities (Polls/Quizzes) ......................................................... 3-131

3.1 Generalized Linear Mixed Models

Objectives

• Explain the concepts behind generalized linear mixed models.


• Describe the GLIMMIX procedure syntax.
• Explain the estimation methods in PROC GLIMMIX.
• Illustrate the COVTEST statement.
• Fit a longitudinal data model with a binary response in the GLIMMIX
procedure.


Limitations of Linear Mixed Models

• Normality assumptions limit general linear mixed models to continuous
responses.
• Different methodology must be used when the responses are discrete and
nonnormal.
• Generalized linear mixed models provide a practical method to analyze
nonnormal responses.


Longitudinal models fit in the MIXED procedure have the assumption that the conditional responses are
normally distributed. However, the normality assumption might not always be reasonable, especially
when the response variable is discrete. Therefore, generalized linear mixed models will be used to analyze
nonnormal responses. For example, longitudinal data with response variables that are binary or discrete
counts can now be modeled using these models.

Generalized Linear Mixed Models

Generalized linear mixed models have the flexibility to model random effects
and correlated errors for nonnormal data.
• A linear predictor can contain random effects.
• The random effects are normally distributed.
• The conditional mean relates to the linear predictor through a link function:

g (( y |  ))    

• The conditional distribution (given γ) of the data belongs to the exponential
family of distributions.


Generalized linear mixed models can model data from an exponential family of distributions, as well
as models with random effects. In these models, you apply a link function to the conditional mean E(y|γ),
where γ are the random effects. The conditional distribution of y|γ plays the same role as the distribution
of y in the fixed-effects generalized linear model.


The GLIMMIX Procedure

• Performs estimation and statistical inference for generalized linear mixed
models (GzLMM)
• Extends the SAS mixed model tools by modeling data from non-Gaussian
distributions
• Extends the generalized linear model (GzLM) by incorporating normally
distributed random effects


If there are no random effects, PROC GLIMMIX fits generalized linear models. In these models, PROC
GLIMMIX estimates the parameters by maximum likelihood, restricted maximum likelihood,
or quasi-likelihood. Maximum likelihood and restricted maximum likelihood have been discussed earlier.
Quasi-likelihood will be discussed in a later section.

Generalized Linear Models

g(E(yi)) = β0 + β1x1i + … + βkxki
• The model relates the expected value of the response variable to the linear
predictor through a link function.
• The variance of the response variable is a specified function of its mean.
• The distribution of the response variable can come from a family of
exponential distributions.


To understand generalized linear mixed models, you need to have an understanding of generalized linear
models. These models extend the general linear model in several ways.
1. The distribution of the response variable can come from a family of exponential distributions rather
than just the normal distribution. The exponential family comprises many of the elementary discrete
and continuous distributions.

2. The link function allows a wide variety of response variables to be modeled rather than just
continuous response variables. For example, if the mean of the data is naturally restricted to a range
of values such as a proportion, the appropriate link function will ensure that the predicted values are
within the appropriate range.

3. The variance can be a specified function of the mean rather than just being constant.

 Generalized linear models can also be fit using the GENMOD procedure. This procedure will
be shown in a later section.

Examples of Generalized Linear Models


Response      Distribution    Link Function
continuous    normal          identity
binary        binomial        logit
ordinal       multinomial     cumulative logit
count         Poisson         natural log


Generalized linear models have three components (McCullagh and Nelder 1989):
random component identifies the response variable and its probability distribution
systematic component specifies the predictor variables used in a linear predictor
link function specifies the function of E(Y) that the model equates to the systematic
component.
For the general linear model, the link function is the identity link (modeling the mean), the response
variable is normally distributed, and the variance is constant. For logistic regression, the link function

is the logit link ( g(μ) = log( μ / (1 − μ) ) ) and the response variable follows a binomial distribution
(a common distribution for binary outcomes). For Poisson regression, the link function is the natural log
and the response variable follows the Poisson distribution.
Each distribution in the exponential family has a natural location parameter, θ. For each distribution, there
exists a link function to transform the linear predictor to θ. This link function is called the canonical link.
For example, in the normal distribution the natural location parameter is the mean. Models with canonical
links usually make the best sense on mathematical grounds, but you can choose other link functions
besides the canonical links.

 The reason for restricting the distribution of the response variable to the family of exponential
distributions is that the same algorithm to compute maximum likelihood parameter estimates
applies to this entire family for any choice of monotonic and differentiable link function.
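As a quick numeric check of the canonical logit link and its inverse (an illustrative DATA step sketch, not part of the course programs):

```sas
data _null_;
   mu  = 0.8;                   /* a mean (probability) in (0, 1)    */
   eta = log(mu / (1 - mu));    /* logit link: log(0.8/0.2) = log(4) */
   mu2 = 1 / (1 + exp(-eta));   /* inverse link maps eta back to 0.8 */
   put eta= mu2=;
run;
```

Because the inverse link maps any linear-predictor value back into (0, 1), the predicted mean always stays in the valid range for a probability.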

GzLMMs, GzLMs, GLMMs, and GLMs


Models PROCs
GLM – general linear model GLM or REG
GLMM – general linear mixed model MIXED
GzLM – generalized linear model GENMOD
GzLMM – generalized linear mixed model GLIMMIX

GLM ⊂ GzLM ⊂ GzLMM

GLM ⊂ GLMM ⊂ GzLMM


The slide above shows the relationships between the general linear model (GLM), general linear mixed
model (GLMM), generalized linear model (GzLM), and generalized linear mixed model (GzLMM).
 General linear models assume normal data, and can be viewed as a special case of generalized linear
models, which can be used to model data from an exponential family of distributions.
 Generalized linear models cannot accommodate random effects, and can be viewed as a special case
of generalized linear mixed models.
 Generalized linear mixed models can model data from an exponential family of distributions,
as well as models with random effects.

On the other hand,


 General linear models can also be viewed as a special case of general linear mixed models, which
model normal response, plus random effects.
 When a response comes from an exponential family of distributions, you can fit a generalized linear
mixed model, which is more general than linear mixed models.


3.01 Multiple Choice Poll

Which one of the following statements is true regarding generalized linear
mixed models?
a. The variance of the response variable is assumed to be constant.
b. The conditional distribution of the data, given the random effects,
belongs to the exponential family of distributions.
c. The distribution of the random effects can belong to any of the
exponential family of distributions.
d. The response values are assumed to be normally distributed.


GLIMMIX Procedure

PROC GLIMMIX <options>;
   CLASS variables;
   CONTRAST 'label' contrast-specification </ options>;
   COVTEST <'label'> <test-specification> </ options>;
   EFFECT effect-name = effect-type (var-list </ effect-options>);
   ESTIMATE 'label' contrast-specification </ options>;
   LSMESTIMATE fixed-effect <'label'> values <divisor=n> </ options>;
   MODEL response <(response options)> = <fixed-effects> </ options>;
   NLOPTIONS <options>;
   OUTPUT <OUT=SAS-data-set> <keyword> </ options>;
   PARMS (value-list) … </ options>;
   RANDOM random-effects </ options>;
   WEIGHT variable;
   Programming statements …
RUN;

The CONTRAST, ESTIMATE, COVTEST, and RANDOM statements can appear multiple times.
All other statements can appear only once with the exception of programming statements. The PROC
GLIMMIX and MODEL statements are required, and the MODEL statement must appear after the
CLASS statement if a CLASS statement is included.


Selected GLIMMIX procedure statements:


CLASS names the classification variables to be used in the analysis. If the CLASS statement
is used, it must appear before the MODEL statement.
CONTRAST provides a mechanism for obtaining custom hypothesis tests. It is patterned after
the CONTRAST statement in PROC MIXED and enables you to select an
appropriate inference space.
COVTEST provides a mechanism to obtain statistical inferences for the covariance parameters.
Significance tests are based on the ratio of (residual) likelihoods or pseudo-
likelihoods. Confidence limits and bounds are computed as Wald or likelihood ratio
limits. You can specify multiple COVTEST statements.

EFFECT The EFFECT statement enables you to construct special collections of columns for
design matrices. These collections are referred to as constructed effects to distinguish
them from the usual model effects that are formed from continuous or classification
variables. The name of the effect is specified after the EFFECT keyword. This name
can appear in only one EFFECT statement and cannot be the name of a variable
in the input data set. The effect-type is specified after an equal sign, followed by a list
of variables within parentheses, which are used in constructing the effect. Effect-
options that are specific to an effect-type can be specified after a slash (/) following
the variable list.
ESTIMATE provides a mechanism for obtaining custom hypothesis tests. As in the CONTRAST
statement, the basic element of the ESTIMATE statement is the contrast-
specification, which consists of MODEL and G-side RANDOM effects and their
coefficients.

LSMESTIMATE provides a mechanism for obtaining custom hypothesis tests among the least squares
means. In contrast to the hypotheses tested with the ESTIMATE or CONTRAST
statements, the LSMESTIMATE statement enables you to form linear combinations
of the least squares means, rather than linear combination of fixed-effects parameter
estimates or random-effects solutions, or both. Multiple-row sets of coefficients are
permitted.

MODEL names the dependent variable and the fixed effects. In contrast to PROC GLM, you
do not specify random effects in the MODEL statement. The dependent variable can
be specified using either the response syntax or the events/trials syntax. The
events/trials syntax is specific to models for binomial data.
NLOPTIONS allows for the specification and control of the nonlinear optimization methods.
OUTPUT creates a data set that contains predicted values and residual diagnostics, computed
after fitting the model. By default, all variables in the original data set are included
in the output data set.

PARMS specifies initial values for the covariance or scale parameters, or it requests a grid
search over several values of these parameters in generalized linear mixed models.


RANDOM defines the Z matrix of the mixed model, the random effects in the γ vector, the
structure of G, and the structure of R. The random effects can be classification or
continuous effects, and multiple RANDOM statements are possible. The RANDOM
_RESIDUAL_ statement indicates a residual-type (R-side) random component that
defines the R matrix.
WEIGHT uses weights to account for the differential weighting of observations. Observations
with nonpositive or missing weights are not included in the resulting analysis.
If a WEIGHT statement is not included, all observations used in the analysis are
assigned a weight of 1.
Selected MODEL statement options:
DIST= specifies the built-in (conditional) probability distribution of the data. If you specify
the DIST= option and you do not specify a user-defined link function,
a default link function is chosen. If you do not specify a distribution, the GLIMMIX
procedure defaults to the normal distribution for continuous response variables and
to the multinomial distribution for classification or character variables, unless the
events/trial syntax is used in the MODEL statement. If you choose the events/trial
syntax, the GLIMMIX procedure defaults to the binomial distribution.

LINK= specifies the link function in a generalized linear mixed model.
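As a minimal sketch of these two options together (the data set and the variable names y, x, and id are placeholders, not from the course data):

```sas
/* Hedged sketch: DIST= names the conditional distribution of the
   response and LINK= the link function; a binary response with a
   logit link and a random intercept per subject is shown. */
proc glimmix data=mydata;
   class id;
   model y = x / dist=binary link=logit solution;
   random intercept / subject=id;
run;
```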


The GLIMMIX procedure distinguishes two types of random effects. Depending
on whether the variance of the random effect is contained in G or in R, these are referred to as G-side
and R-side random effects. R-side effects are also named residual effects. Simply, if a random effect is an
element of G, it is a G-side effect. Otherwise, it is an R-side effect. Models without G-side effects are also
known as marginal (or population-averaged) models. Models fit with the GLIMMIX procedure can have
none, one, or more of each type of effect.

 An R-side effect in the GLIMMIX procedure is equivalent to a REPEATED effect


in the MIXED procedure. In the GLIMMIX procedure, all random effects are specified through
the RANDOM statement. Various statistical analyses using PROC GLIMMIX are shown in
the “Statistical Analysis with the GLIMMIX Procedure” course.


Basic Features of PROC GLIMMIX

• Incorporates random effects in the model, and therefore it allows for
subject-specific and population-averaged inference
• Allows covariance parameter heterogeneity
• Has flexible covariance structures for random and residual random effects
including the spatial structures
• Enables you to compute variables with SAS programming statements inside
the procedure
• Fits multinomial models for ordinal and nominal outcomes


As with PROC MIXED, PROC GLIMMIX has RANDOM statements, which allow for subject-specific
(conditional) inference. Other features of PROC GLIMMIX include
 CONTRAST, ESTIMATE, LSMEANS, and LSMESTIMATE statements, which produce hypothesis
tests and estimable linear combinations of effects.
 NLOPTIONS statement, which enables you to exercise control over the numerical optimization.
 COVTEST statement, which enables you to obtain inferences for the covariance parameters.
 computed variables with SAS programming statements inside PROC GLIMMIX (except for variables
listed in the CLASS statement). These computed variables can appear in the MODEL, RANDOM,
WEIGHT, or FREQ statements.
 choice of model-based variance-covariance estimators for the fixed effects or empirical (sandwich)
estimators to make the analysis robust against misspecification of the covariance structure and to adjust
for small-sample bias.
 joint modeling for multivariate data.


PROC GLIMMIX versus PROC MIXED


Functionality                              PROC MIXED    PROC GLIMMIX
Models nonnormal data                      No            Yes
Allows programming statements              No            Yes
Uses the RANDOM statement to model
R-side random effects                      No            Yes
Uses the REPEATED statement to model
R-side random effects                      Yes           No


The GLIMMIX and MIXED procedures are closely related and have some common functionality.
However, there are important differences, such as the absence of a REPEATED statement in PROC GLIMMIX.
Furthermore, MODEL, WEIGHT, and FREQ variables, as well as variables specifying RANDOM effects,
SUBJECT= and GROUP= structures, do not have to be in the data set with PROC GLIMMIX. They can
be computed with programming statements in the procedure.
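For example, a predictor can be computed with a programming statement inside the procedure, which PROC MIXED does not allow. This sketch assumes the aids data set used later in this section, with the variables time, cd4_scale, and id:

```sas
proc glimmix data=sasuser.aids;
   sqrttime = sqrt(time);   /* computed inside the procedure, no DATA step */
   model cd4_scale = sqrttime;
   random _residual_ / type=sp(pow)(time) subject=id;
run;
```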


Syntax: PROC GLIMMIX versus PROC MIXED

PROC GLIMMIX      PROC MIXED
BY                BY
CLASS             CLASS
CONTRAST          CONTRAST
EFFECT
ESTIMATE          ESTIMATE
FREQ
ID                ID
LSMEANS           LSMEANS
LSMESTIMATE
MODEL             MODEL
NLOPTIONS
OUTPUT
PARMS             PARMS
RANDOM            RANDOM
                  REPEATED
WEIGHT            WEIGHT

Notice that both the FREQ statement and the WEIGHT statement are available in PROC GLIMMIX.
The variable in the FREQ statement identifies a numeric variable that contains the frequency of
occurrence for each observation. PROC GLIMMIX treats each observation as if it appears f times,
where f is the value of the FREQ variable for the observation. The analysis that is produced using a FREQ
statement reflects the expanded number of observations. The WEIGHT statement replaces R with
W−1/2 RW−1/2, where W is a diagonal matrix containing the weights.

GzLMM Formulation and PROC GLIMMIX

g(E(y | γ)) = Xβ + Zγ
   g: LINK= option     Xβ: MODEL statement     Zγ: RANDOM statement

y | γ ~ exponential family            DIST= option

var(γ) = G                            Options in the RANDOM statement

var(y | γ) = A^(1/2) R A^(1/2)        RANDOM _RESIDUAL_ statement


Y represents the (n × 1) vector of observed data.

γ represents an (r × 1) vector of random effects.

g() represents a differentiable monotonic link function and g-1 () is its inverse.

X represents an (n × p) design matrix for the fixed effects with rank k.

Z represents an (n × r) design matrix for the random effects.


G represents the variance-covariance matrix for random effects. The random effects are
assumed to be normally distributed with mean 0 and covariance matrix G. Random
effects modeled through the G matrix are referred to as G-side random effects.

E(y | γ) represents the conditional expected value of Y.

A represents a diagonal matrix and contains the square root of the variance function
of the model. The variance function expresses the variance of a response as a function
of the mean.
R represents the variance-covariance matrix of the residual effects. The residual effects are
referred to as the R-side random effects. The R matrix is, by default, the scaled identity
matrix, R=I, where  is the scale parameter, and is, by definition, 1 for some
distributions (for example, binary, binomial, Poisson, and exponential distribution).
To specify a different R matrix, use the RANDOM statement with the _RESIDUAL_
keyword or the RESIDUAL option in the RANDOM statement.

G-side and R-side Random Effects

• There is no REPEATED statement in PROC GLIMMIX.


• All random effects are specified through the RANDOM statement.
• If a random effect is an element of the G matrix, it is a G-side effect.
• To model the R matrix, use the RANDOM statement with the _RESIDUAL_
keyword or the RESIDUAL option.

3.1 Generalized Linear Mixed Models 3-15

The GLIMMIX procedure distinguishes two types of random effects. If the variance of the random effect
is contained in the matrix G, then it is called a G-side random effect. If the variance of the random effect
is contained in the matrix R, then it is called an R-side random effect. R-side effects are also called
residual effects. An R-side random effect in PROC GLIMMIX is equivalent to a REPEATED effect
in PROC MIXED. Models without G-side effects are also known as marginal (or population-averaged)
models. All random effects are specified through the RANDOM statement in PROC GLIMMIX.

The R matrix is by default the scaled identity matrix. To specify a different R matrix, use the RANDOM
statement with the _RESIDUAL_ keyword or the RESIDUAL option. To add a multiplicative
overdispersion parameter, use the _RESIDUAL_ keyword in a separate RANDOM statement.
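For example, a multiplicative overdispersion parameter could be added to a Poisson model with a bare RANDOM _RESIDUAL_ statement. This sketch uses a hypothetical data set counts with variables events and x:

```sas
proc glimmix data=counts;
   model events = x / dist=poisson solution;
   random _residual_;   /* adds a multiplicative overdispersion (scale) parameter */
run;
```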

R-side Random Effects


proc glimmix data=sasuser.aids;
   model cd4_scale = time;
   random _residual_ / type=sp(pow)(time) subject=id;
run;


If there are no repeated effects, use the RANDOM statement with the _RESIDUAL_ keyword to specify
the R-side random effects. The equivalent code in PROC MIXED is:
proc mixed data=long.aids;
   model cd4_scale=time;
   repeated / type=sp(pow)(time) subject=id;
run;


R-side Random Effects


data aids;
   set sasuser.aids;
   timec=time;
run;

proc glimmix data=aids noclprint;
   class timec;
   model cd4_scale=time;
   random timec / type=sp(pow)(time) subject=id residual;
run;


To specify that the time effect for each patient is an R-side effect with a spatial power covariance
structure, use the RESIDUAL option. Because continuous effects are not allowed as R-side random
effects, two versions of the time variable are created: the continuous version (time) is used in the
MODEL statement, and the classification version (timec) is used in the RANDOM statement.
The equivalent code in PROC MIXED is shown below.

proc mixed data=aids noclprint;
   class timec;
   model cd4_scale=time;
   repeated timec / type=sp(pow)(time) subject=id;
run;

Copyright © 2017, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.1 Generalized Linear Mixed Models 3-17

G-side Random Effects

G-side random effects accomplish the following:


• model the random effects within the link function
• provide subject-specific interpretations of the model
• indirectly model the correlations among the repeated measurements


Because the generalized linear mixed model is g(E(y|γ)) = Xβ + Zγ, the G-side random effects are fit
inside the link function. In other words, they are on the linked scale, which is similar to random effects
in linear mixed models. The correlations among the repeated measures on the linked scale are
accommodated by the G-side random effects (V = ZGZ' + f(R)). G-side random effect models have
subject-specific interpretations. That is, G-side random effect models provide the model for each subject
identified in the G-side random effects.
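For contrast with the R-side examples above, a G-side version of the AIDS model might place a random intercept and slope per patient inside the link function. This is a sketch, not code from the course materials:

```sas
proc glimmix data=sasuser.aids;
   model cd4_scale = time / solution;
   /* G-side: random intercept and slope per patient, unstructured G */
   random intercept time / subject=id type=un;
run;
```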

R-Side Random Effects

R-side random effects accomplish the following:


• model the random effects outside the link function
• provide population-averaged interpretations of the model if no G-side random effects are present
• directly model the correlations among the repeated measurements


On the other hand, the R-side random effects are fit outside the link function. The correlations among
the repeated measures outside the link function are directly modeled as long as no G-side random effects
are present. R-side random effect models have population average interpretations if there are no G-side
random effects. That is, models with only R-side random effects provide predictions for the population.

3.02 Multiple Choice Poll

Which of the following statements is true regarding G-side and R-side random effects?

a. R-side random effects provide subject-specific interpretations of the model if no G-side
   random effects are present.
b. All random effects are specified through the RANDOM statement.
c. Continuous effects are allowed in R-side random effects.
d. R-side random effects are modeled by the REPEATED statement in PROC GLIMMIX.


Distributions Supported by PROC GLIMMIX

Discrete Response                    Continuous Response
Binary                               Beta
Binomial                             Normal
Poisson                              "Lognormal"
Geometric                            Gamma
Negative Binomial                    Exponential
Multinomial (nominal and ordinal)    Inverse Gaussian
+ combinations                       Shifted T
                                     + combinations

The distributions can be specified using the DIST= option in the MODEL statement in PROC GLIMMIX.
Note that the lognormal specification does not use the likelihood function of a lognormal distribution.


Instead, it assumes that for the dependent variable Y, log(Y) follows a normal distribution N(μ, σ²),
and the likelihood function is for log(Y), not Y itself.
Combinations of distributions can be specified using the DIST=BYOBS(variable) option in the MODEL
statement in PROC GLIMMIX. This option enables you to model multivariate responses with different
distributions for each response variable, for example, a Poisson distribution for one response variable
and a normal distribution for another.
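A sketch of the DIST=BYOBS syntax (the stacked data set multi and its variables are hypothetical): each row's value of the variable dist names the distribution for that row's response.

```sas
/* dist contains the value 'POISSON' or 'NORMAL' on each row */
proc glimmix data=multi;
   class id;
   model y = x / dist=byobs(dist);
   random intercept / subject=id;
run;
```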

Parameter Estimation Methods for Discrete Responses with Random Effects

Pseudo-likelihood linearization method:
• uses a first-order Taylor series to approximate the model as a series of linear mixed models
Maximum likelihood methods:
• METHOD=QUAD (adaptive Gauss-Hermite quadrature)
• METHOD=LAPLACE (Laplace approximation)


PROC GLIMMIX estimates the parameters by the pseudo-likelihood method by default for models with
discrete outcomes and random effects. Two maximum likelihood estimation methods based on integral
approximation are available in PROC GLIMMIX. The METHOD=QUAD option in the PROC
GLIMMIX statement requests that the GLIMMIX procedure approximate the marginal log likelihood
with an adaptive Gauss-Hermite quadrature. The METHOD=LAPLACE option in the PROC GLIMMIX
statement requests that the GLIMMIX procedure approximate the marginal log likelihood by using
the Laplace method.

Note: Models with normally distributed outcomes use REML by default, and models with discrete
outcomes and no random effects use ML by default.
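The estimation method is chosen in the PROC GLIMMIX statement. As a hedged sketch based on the keratotomy model fit later in this section (the random intercept here is illustrative, not the course's final model):

```sas
proc glimmix data=long.keratotomy method=laplace;  /* or method=quad */
   class patientid;
   model unstable(event='1') = diameter visit / dist=binary solution;
   random intercept / subject=patientid;  /* G-side only; R-side effects are not allowed */
run;
```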


Pseudo-Likelihood versus Maximum Likelihood

The challenge in fitting GzLMMs is obtaining the marginal log-likelihood function:

   ∫ p(y|x, β, γ) q(γ) dγ

which is difficult to obtain in closed form. Two approaches are available:

• Compute the marginal log-likelihood for a different but similar model (a linear mixed model),
  whose marginal log-likelihood has a closed-form solution: pseudo-likelihood (linearization).
• Approximate the integral numerically to obtain the approximated marginal log-likelihood:
  maximum likelihood with quadrature or the Laplace approximation.

Generalized linear mixed models are much more complex than linear mixed models because of the
difficulties in obtaining marginal log-likelihood functions. For all these models, parameter estimates are
obtained by maximizing the objective function, which is the marginal log-likelihood function. For linear
mixed models with normal errors and random effects, the marginal distribution of y over all possible
levels of random effects is simply normal with mean Xβ and covariance V. The log-likelihood
is readily available based on this distribution. However, the marginal distribution of y for generalized
linear mixed models is not readily available for non-Gaussian distributions. Therefore, the challenge
in fitting a generalized linear mixed model is to obtain the marginal distribution of y, or the marginal
log-likelihood function to be maximized.

Pseudo-likelihood (linearization) and maximum likelihood with adaptive quadrature or Laplace
approximation are two different techniques for obtaining the marginal log-likelihood for generalized
linear mixed models. Because a closed-form integral for the marginal log-likelihood is difficult to
obtain, an approximated marginal log-likelihood must be used. Linearization approximates the marginal
log-likelihood by using an approximated model (a linear mixed model), whose marginal log-likelihood
has a closed-form solution and therefore is easy to obtain. The maximum likelihood methods use
numerical techniques to approximate the integral and obtain the approximated marginal log-likelihood.


Pseudo-Likelihood Linearization Method

1. Take a first-order Taylor series to linearize the model.
2. Obtain a linear mixed model with a pseudo-response.
3. Fit the linear mixed model on the pseudo-response.
4. Update the linearization with the new estimates, and repeat until convergence.

The first step in the pseudo-likelihood linearization method is achieved by taking the first-order Taylor
series expansions to linearize the generalized linear mixed model to linear mixed models. Taylor series
expansions enable you to use the derivatives of a function to approximate the function as a sum
of polynomials.

After the linearization, a linear mixed model P = Xβ + Zγ + ε can be fit. P is referred to as the pseudo-
response, β represents the fixed effects, γ represents the random effects, and ε represents the residuals
in the linear mixed model with the pseudo-response P. The residuals are assumed to be normally
distributed with mean 0 and variance var(ε) = var(P|γ) = Δ^(−1)A^(1/2)RA^(1/2)Δ^(−1), where

Δ is a diagonal matrix of derivatives of the conditional mean evaluated at the expansion locus.
A^(1/2) represents a diagonal matrix of the square root of the variance function of the model.
R is the variance-covariance matrix of the residual effects, or the R-side random effects.

The variance of y, conditional on the random effects γ, is var(y|γ) = A^(1/2)RA^(1/2), and the marginal
variance in the linear mixed pseudo-model is V = ZGZ' + Δ^(−1)A^(1/2)RA^(1/2)Δ^(−1).


Benefits of Linearization

• Can be used to fit models where the joint distribution is difficult, if not impossible, to ascertain.
• Can fit complex models, such as models with correlated errors, a large number of random effects,
  crossed random effects, and multiple types of subjects.


The class of models to which pseudo-likelihood estimation can be applied is much larger than the class
of models to which maximum likelihood can be applied in PROC GLIMMIX.

Drawbacks of Linearization

• The “likelihood” is that of an approximated linear mixed model, so there is no true likelihood.
• The normal assumption for the linearized and transformed residuals in a GzLMM
  might not be appropriate.
• Variance estimates for random effects, especially for a binary outcome with
  a rare event or a small number of clusters, might be biased.


Because the linearization approach approximates the generalized linear mixed model as a linear mixed
model, the computed likelihood is for that linear mixed model, not the original model. It is not the true
likelihood of your problem. Likelihood ratio tests that compare nested models might not be
mathematically valid, and the model fit statistics (AIC, BIC) should not be used for model comparisons.


In addition, the normal assumption for the linearized model might not be appropriate. As a result, the
variance estimates for random effects might be biased. This is often the case when the pseudo-response
is far from normal, such as when a binary outcome has many non-events or few clusters (subjects).

Maximum Likelihood Estimation Method

• Use numerical techniques to approximate the integral to obtain the marginal log-likelihood.
• The log-likelihood of the data is computed, so model comparisons are possible based
  on information criteria.
• The pseudo-likelihood bias is avoided.
• This method enables likelihood-based inference about covariance parameters
  (the COVTEST statement).
• Estimation includes fixed effects and covariance parameters.


PROC GLIMMIX includes the fixed effects and all covariance parameters in the optimization when you
choose METHOD=QUAD or METHOD=LAPLACE. Both produce the maximum likelihood estimations.

Laplace estimates typically exhibit better asymptotic behavior and less small-sample bias than pseudo-
likelihood estimators. However, for both Laplace and quadrature methods, the class of models for which
the marginal log likelihood is available is much smaller compared to the class of models to which pseudo-
likelihood estimation can be applied.

Note: The term quadrature is more or less a synonym for numerical integration, especially as applied
to one-dimensional integrals. Two-dimensional integration is sometimes described as cubature,
although this term is much less frequently used and the meaning of quadrature is understood for
higher-dimensional integration as well.

Note: A quadrature integration rule is a method of numerical approximation of the definite integral
of a function, particularly as a weighted sum of function values at quadrature points within
the domain of integration:

   ∫_a^b f(x) dx ≈ Σ_i w_i f(x_i)


Model Limitations for Maximum Likelihood

For METHOD=QUAD and METHOD=LAPLACE:
• models cannot have R-side random effects or covariance structures
• the Kenward-Roger degrees of freedom adjustment cannot be used
For METHOD=QUAD:
• G-side random effects must be processed by subjects (use the SUBJECT= option)


PROC GLIMMIX has a dedicated algorithm for METHOD=LAPLACE, which enables a larger class
of models, such as crossed random effects, random effects with no SUBJECT= option, or subjects that
do not have to be nested as compared to the Gauss-Hermite quadrature. It also allows the NOBOUND
option. As the number of random effects increases, Laplace approximation presents a computationally
more expedient alternative.
If you wonder whether METHOD=LAPLACE would present a viable alternative to a model that you can
fit with METHOD=QUAD, the “Optimization Information” table can provide some insights. The table
contains as its last entry the number of quadrature points determined by PROC GLIMMIX to yield
a sufficiently accurate approximation of the log likelihood (at the starting values). In many cases, a single
quadrature node is sufficient. In that case, the estimates are identical to those of METHOD=LAPLACE.
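The number of quadrature points can also be fixed rather than determined adaptively, using the QPOINTS= suboption of METHOD=QUAD. A sketch (the model is illustrative):

```sas
proc glimmix data=long.keratotomy method=quad(qpoints=5);
   class patientid;
   model unstable(event='1') = diameter visit / dist=binary;
   random intercept / subject=patientid;
run;
```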


The COVTEST Statement

The COVTEST statement enables you to obtain statistical inferences for the covariance
parameters. You do the following:
• fit the model using PROC GLIMMIX
• specify hypotheses about the covariance parameters in the COVTEST statement
The procedure then does the following:
• refits the model under the restriction on the covariance parameters
• compares −2 (restricted) log likelihoods
• makes p-value adjustments for testing on the boundary, if possible and necessary


The COVTEST statement enables you to obtain statistical inferences for the covariance parameters
in a mixed model by likelihood-based tests comparing full and reduced models with respect to the
covariance parameters. The comparisons of the models are based on the log likelihood or restricted log
likelihood in models that are fit by maximum likelihood (ML) or restricted maximum likelihood (REML).
With pseudo-likelihood methods, the calculations are based on the final pseudo-data of the converged
optimization. Confidence limits and bounds are computed as Wald or likelihood ratio limits. You can
specify multiple COVTEST statements.

The COVTEST Statement


COVTEST <'label'> <test-specification> </ options>;

covtest 'Ho: common variance' homogeneity;
covtest 'Ho: no random effects' GLM;
covtest 'Ho: independent random effects' diagG;
covtest 'Ho: no slope variance' . . 0;



The test-specification in the COVTEST statement draws on keywords that represent a particular null
hypothesis, lists or data sets of parameter values, or general contrast specifications. Valid keywords
are as follows:
GLM | INDEP tests the model against a null model of complete independence. All G-side
covariance parameters are eliminated and the R-side covariance structure is
reduced to a diagonal structure.

DIAGG tests for a diagonal G matrix by constraining off-diagonal elements in G to zero.
The R-side structure is not modified.
DIAGR | CINDEP tests for conditional independence by reducing the R-side covariance structure
to diagonal form. The G-side structure is not modified.
HOMOGENEITY tests homogeneity of covariance parameters across groups by imposing equality
constraints.
START | INITIAL compares the final estimates to the starting values of the covariance parameter
estimates. This option is useful if, for example, you supply starting values in the
PARMS statement and want to test whether the optimization produced
significantly better values. In GLMMs based on pseudo-data, the likelihoods that
use the starting and the final values are based on the final pseudo-data.

ZEROG tests whether the G matrix can be reduced to a zero matrix. This eliminates
all G-side random effects from the model.
Only a single keyword is permitted in the COVTEST statement. To test more complicated hypotheses,
you can formulate tests by providing the values for the reduced covariance parameters. For example,
the last example on the slide tests if the last covariance parameter, which corresponds to the slope
variance, is zero.
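In context, COVTEST statements follow the model specification. A sketch mirroring the slide's slope-variance test, using a hypothetical random intercept-and-slope model for the AIDS data:

```sas
proc glimmix data=long.aids;
   model cd4_scale = time / solution;
   random intercept time / subject=id type=un;
   /* constrain the last covariance parameter (the slope variance) to zero */
   covtest 'Ho: no slope variance' . . 0;
run;
```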

A Note in the COVTEST Statement

• When the model is estimated by ML or REML, you will get the same results
from the COVTEST statement as you would from conducting a likelihood
ratio test using the full and reduced models.
• When the model is estimated by pseudo-likelihood, you will not get the
same results with the COVTEST statement as you would from a pseudo-
likelihood ratio test based on the full and reduced models.



The COVTEST statement not only works for models estimated by the maximum likelihood method, but
it also works for models estimated by the pseudo-likelihood (linearization) method. However, you do not
get true likelihood ratio tests from the COVTEST statement in the latter case.
When the model is estimated by pseudo-likelihood, PROC GLIMMIX takes the pseudo-data set from
the last iteration (the converged data set) from the full model, and treats it as a linear mixed model for
the COVTEST operations. Therefore, there are no more data set updates. The log likelihood of the
constrained model is then always ordered properly, guaranteeing that the likelihood ratio test statistic
is nonnegative.

3.03 Multiple Choice Poll

Which of the following statements is true regarding the pseudo-likelihood linearization method?

a. The method cannot be used with the Kenward-Roger degrees of freedom adjustment.
b. The method might produce biased variance estimates for the random effects.
c. Model comparisons are possible based on information criteria such as the AIC and BIC.
d. There is no distributional assumption for the linearized and transformed residuals.



Radial Keratotomy Study

[Figure: data layout for the study, showing an outcome measured at 3 time points,
time-dependent predictor variables, and time-independent predictor variables.]

Example: Radial keratotomy is a form of surgery used to reduce myopia (nearsightedness). To evaluate
the long-term (10-year) efficacy and stability of the surgery, a longitudinal study of 362 adult
myopic patients was conducted. After surgery, patients were examined at 6 months and then
annually each year for 10 years. At each visit their refractive error was recorded. The concern
of the scientists is that the refractive error would continue to change over time and the patients
would become less and less nearsighted.
These are the variables in the data set:
patientid patient identification number.
visit time of follow-up visit (1=1 year, 4=4 years, 10=10 years).
unstable the outcome variable, coded as 1 if there is a continuing effect of the surgery and 0
otherwise. For visit 1, a continuing effect was defined as a reduction in myopia of
0.5 diopters or more between 6 months and 1 year after surgery. For visits 4 and 10,
a continuing effect was defined as a reduction in myopia of 1 diopter or more between
6 months and 4 years after surgery (visit 4) or between 6 months and 10 years after
surgery (visit 10).
diameter diameter of the clear zone during the surgery (in mm).
age patient age at baseline in years.
gender patient’s gender.
Note: The radial keratotomy data were provided by Azhar Nizam, Senior Associate, Rollins School
of Public Health of Emory University. The data were modified based on published reports from
the NEI funded Prospective Evaluation of Radial Keratotomy Study (Waring et al. 1994)
to protect confidentiality.


Analysis Strategy

• Perform univariate analysis with contingency tables.
• Perform exploratory data analysis with logit plots to examine the relationship between
  the response variable and the explanatory variables.
• Fit a longitudinal model in PROC GLIMMIX.


When building a longitudinal model with a discrete response, it is recommended to first do an exploratory
data analysis with contingency tables and logit plots. A useful contingency table would be the subject’s
identification number by the time or visit value to make sure no subject has multiple records with the
same time or visit value. Then fit a generalized linear mixed model and decide whether you want to use
R-side random effects or G-side random effects.


Estimated Logits

   logit_i = ln( (m_i + 1) / (M_i − m_i + 1) )

where
   m_i = number of events
   M_i = number of cases

When the response variable is binary, it is common practice to transform the vertical axis of a scatter plot
to the logit scale and plot the logit by the continuous predictor variable. For continuous predictor
variables with a large number of unique values, binning the data (collapsing data values into groups) is
necessary to compute the logit.
A common approach in computing logits is to take the log of the odds. However, the logit is undefined for
any bin in which the outcome rate is 100% or 0%. To eliminate this problem and reduce the variability
of the logits, a common recommendation is to add a small constant to the numerator and denominator
of the formula that computes the logit (Duffy and Santner 1989).


Exploratory Data Analysis Using Logit Plots

Example: Generate a line listing of the keratotomy data and logit plot of age.
/* long03d01.sas */
proc print data=long.keratotomy(obs=20);
   title 'Line Listing of Keratotomy Data';
run;

Line Listing of Keratotomy Data

Obs patientid age diameter gender visit unstable

1 1 44.9117 3.0 Male 1 1


2 1 44.9117 3.0 Male 4 1
3 1 44.9117 3.0 Male 10 1
4 2 27.6413 3.5 Female 1 0
5 2 27.6413 3.5 Female 4 0
6 2 27.6413 3.5 Female 10 1
7 3 38.8337 3.5 Male 1 0
8 3 38.8337 3.5 Male 4 1
9 3 38.8337 3.5 Male 10 1
10 4 33.4292 4.0 Female 1 0
11 4 33.4292 4.0 Female 4 0
12 4 33.4292 4.0 Female 10 1
13 5 35.9480 3.0 Male 1 0
14 5 35.9480 3.0 Male 4 0
15 5 35.9480 3.0 Male 10 1
16 6 38.2669 4.0 Female 1 0
17 6 38.2669 4.0 Female 4 0
18 6 38.2669 4.0 Female 10 1
19 7 37.5962 4.0 Female 1 0
20 7 37.5962 4.0 Female 4 0

Notice that there is one observation per time point. Also notice that the variables age, diameter,
and gender are time-independent variables and visit is a time-dependent variable. Finally, notice that
the values of visit are in the proper order within each patient. If the values of visit are not in the proper
order or if there are missing time points for some patients, then visit (or a copy of visit) will have
to be specified in the CLASS statement and specified in the RANDOM statement as a repeated effect
in PROC GLIMMIX.


proc rank data=long.keratotomy groups=20 out=ranks;
   var age;
   ranks bin;
run;

proc means data=ranks noprint nway;
   class bin;
   var unstable age;
   output out=bins sum(unstable)=unstable mean(age)=age;
run;

data bins;
   set bins;
   logit=log((unstable+1)/(_freq_-unstable+1));
run;

proc sgplot data=bins;
   scatter y=logit x=age / markerattrs=(color=blue size=10px
                                        symbol=circlefilled);
   xaxis label="Patient Age at Baseline in Years";
   yaxis label="Estimated Logit";
   title "Estimated Logit Plot of Patient's Age";
run;
Selected PROC RANK statement option:
GROUPS=n bins the variables into n groups.
Selected RANK procedure statement:
RANKS names the group indicators in the OUT= data set. If the RANKS statement is omitted,
then the group indicators replace the VAR variables in the OUT= data set.

Selected PROC MEANS statement option:


NWAY causes the output data set to have only one observation for each level of the class variable.


There seems to be no relationship between the probability of the continuing effect of the surgery
and the age of the patient.


Logit Plot of Diameter

[Figure: estimated logits (about −1.0 to 0.0) plotted against clear zone diameter
(3.0, 3.5, and 4.0 mm), showing a decreasing linear trend.]

The variable diameter has a linear relationship with the logits. It seems that the patients with smaller
clear zones (they received more surgery) have a higher probability of having a continuing effect
of the surgery.

Logit Plot of Visit

[Figure: estimated logits (about −2.0 to 1.0) plotted against visit (1, 4, and 10),
showing an increasing trend.]


The variable visit might have a linear relationship with the outcome; it is also possible that the
relationship is curvilinear. It seems that the longer the follow-up time after the surgery, the higher
the probability of a continuing effect of the surgery. Therefore, the refractive error continues to change
as a result of the surgery, and the patients become less and less nearsighted. This is of medical concern
because, beyond a certain point, being less nearsighted means becoming farsighted. Because people tend
to become farsighted as they get older, the continuing effect of the surgery might be accelerating this
process.


Fitting Models with Binary Responses in PROC GLIMMIX

Example: Fit a generalized linear mixed model to the long.keratotomy data. Specify R-side
random effects, a binary distribution, and use the unstructured covariance structure.
Specify the ODDSRATIO option in the MODEL statement and create customized odds
ratios comparing male to female for gender, 2 to 1 for visit, and 3 to 4 for diameter. Use
an optimization technique of Newton-Raphson with ridging, and create an ODDSRATIO
plot displaying the statistics and a box plot for gender.
/* long03d02.sas */
proc glimmix data=long.keratotomy noclprint=5
             plots=(oddsratio(stats) boxplot(fixed));
   class patientid gender;
   model unstable(event='1') = age diameter gender visit
         / solution ddfm=kr dist=binary
           or(diff=first
              at visit diameter = 1 4
              units diameter = -1);
   random _residual_ / subject=patientid type=un;
   nloptions tech=nrridg;
   title 'Generalized Linear Mixed Model of Radial Keratotomy Surgery';
run;
Selected PROC GLIMMIX statement option:
PLOTS= requests that the GLIMMIX procedure produce statistical graphics via the Output
Delivery System.
NOCLPRINT suppresses the display of the Class Level Information table. If you specify a number, only
levels with totals that are less than that number are listed in the table.
Selected plot options:

BOXPLOT requests box plots for the residuals in your model by the classification effects only.
The FIXED box plot option produces box plots for all fixed effects consisting entirely
of classification variables.
ODDSRATIO requests a display of odds ratios and their confidence limits when the link function
permits the computation of odds ratios. The STATS odds ratio plot option adds
the numeric values of the odds ratio and its confidence limits to the graphic.
Selected MODEL statement response variable option:
EVENT= specifies the event category for the binary response model.
Selected MODEL statement options:
SOLUTION requests that a solution for the fixed-effects parameters be produced.
DIST= specifies the built-in (conditional) probability distribution of the data.


DDFM= specifies the method for computing the denominator degrees of freedom for the tests
of fixed effects resulting from the MODEL, CONTRAST, ESTIMATE, LSMEANS, and
LSMESTIMATE statements. The keyword KR specifies the Kenward-Roger adjustment.
OR requests estimates of odds ratios and their confidence limits provided the link function
is either the logit, cumulative logit, or generalized logit.

Selected odds ratio options:


DIFF<=diff-type> controls the type of differences for classification main effects. By default,
odds ratios compare the odds of a response for level j of a factor to the
odds of the response for the last level of that factor (DIFF=LAST).
The DIFF=FIRST option compares the levels against the first level,
DIFF=ALL produces odds ratios based on all pairwise differences,
and DIFF=NONE suppresses odds ratios for classification main effects.
AT var-list =value-list specifies the reference values for continuous variables in the model.
By default, the average value serves as the reference.
UNIT var-list =value-list specifies the units in which the effects of continuous variables in the model
are assessed. By default, odds ratios are computed for a change of one unit
from the average.

Selected RANDOM statement options:


SUBJECT= identifies the subjects in your generalized linear mixed model.
TYPE= specifies the covariance structures of G for G-side effects and of R for
R-side effects.

Selected NLOPTIONS statement option:


TECH= specifies the optimization technique. The value of nrridg performs
a Newton-Raphson optimization with ridging.
Generalized Linear Mixed Model of Radial Keratotomy Surgery

The GLIMMIX Procedure

Model Information

Data Set LONG.KERATOTOMY


Response Variable unstable
Response Distribution Binary
Link Function Logit
Variance Function Default
Variance Matrix Blocked By patientid
Estimation Technique Residual PL
Degrees of Freedom Method Kenward-Roger
Fixed Effects SE Adjustment Kenward-Roger

The Model Information table summarizes important information about the model that you fit and about
aspects of the estimation technique. The marginal variance matrix is block-diagonal, and observations
from the same PATIENTID form the blocks. The default estimation technique in generalized linear mixed
models is residual pseudo-likelihood, for distributions other than the normal.


Class Level Information

Class Levels Values

patientid 362 not printed


gender 2 Female Male

Number of Observations Read 1086


Number of Observations Used 1046

The Class Level Information table lists the levels of the variables specified in the CLASS statement
and the ordering of the levels. The patientid levels have been suppressed because there are more than 5
patientid levels. The Number of Observations table displays the number of observations read and used
in the analysis. There are 362 patients in the study, but only 356 subjects appear in the Dimensions
table, so 6 patients were dropped because they had missing values in every observation.
Response Profile

Ordered Total
Value unstable Frequency

1 0 634
2 1 412

The GLIMMIX procedure is modeling the probability that unstable='1'.

The Response Profile table shows the response variable values listed according to their ordered values.
By default, the response variable values are ordered alphanumerically and PROC GLIMMIX models
the probability of ordered value 1. Because you used the EVENT= option in this example, the model
is based on the probability of having a continuing effect of the surgery (unstable=1).
Dimensions

R-side Cov. Parameters 6


Columns in X 6
Columns in Z per Subject 0
Subjects (Blocks in V) 356
Max Obs per Subject 3

The Dimensions table lists the size of the relevant matrices.


Optimization Information

Optimization Technique Newton-Raphson with Ridging


Parameters in Optimization 6
Lower Boundaries 3
Upper Boundaries 0
Fixed Effects Profiled
Starting From Data

The Optimization Information table provides information about the methods and size of the optimization
problem.


Iteration History

Objective Max
Iteration Restarts Subiterations Function Change Gradient

0 0 5 4727.8145847 0.48011446 0.00006


1 0 4 4895.6716029 0.14618701 8.303E-8
2 0 3 4935.9159763 0.03409445 6.841E-8
3 0 2 4942.8430009 0.00623447 4.293E-6
4 0 2 4943.8071483 0.00088555 1.709E-9
5 0 1 4943.9593606 0.00014143 0.000013
6 0 1 4943.9812554 0.00002048 2.784E-7
7 0 1 4943.9846473 0.00000317 6.743E-9
8 0 1 4943.9851462 0.00000047 1.46E-10
9 0 1 4943.9852224 0.00000007 3.77E-12
10 0 0 4943.9852337 0.00000000 3.454E-6

Convergence criterion (PCONV=1.11022E-8) satisfied.

The Iteration History table displays information about the progress of the optimization process. After the
initial optimization, PROC GLIMMIX performed 10 updates before the convergence criterion was met.
Fit Statistics

-2 Res Log Pseudo-Likelihood 4943.99


Generalized Chi-Square 1041.00
Gener. Chi-Square / DF 1.00

The ratio of the generalized chi-square statistic and its degrees of freedom is a measure of the residual
variability in the linearized pseudo model. It is not a useful measure for model assessment under pseudo-
likelihood estimation.
Covariance Parameter Estimates

Cov Standard
Parm Subject Estimate Error

UN(1,1) patientid 1.2278 0.09278


UN(2,1) patientid 0.3029 0.05955
UN(2,2) patientid 0.8653 0.06813
UN(3,1) patientid -0.08470 0.06373
UN(3,2) patientid 0.2806 0.05617
UN(3,3) patientid 1.1019 0.08384

The Covariance Parameter Estimates table displays estimates and asymptotic estimated standard errors for
all covariance parameters. Since R-side random effects are being used, the estimates represent the
variances and covariances of the measurements on the logit scale.
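Because TYPE=UN estimates a full covariance matrix, the estimates above can be converted to within-patient correlations on the logit scale. The following sketch, which assumes the rounded values printed in the Covariance Parameter Estimates table, computes the three pairwise correlations:

```python
import math

# Unstructured (UN) covariance estimates from the table above (rounded)
un11, un21, un22 = 1.2278, 0.3029, 0.8653
un31, un32, un33 = -0.08470, 0.2806, 1.1019

# Correlation = covariance / product of the two standard deviations
r21 = un21 / math.sqrt(un11 * un22)   # visits 1 and 2
r31 = un31 / math.sqrt(un11 * un33)   # visits 1 and 3
r32 = un32 / math.sqrt(un22 * un33)   # visits 2 and 3

print(round(r21, 3), round(r31, 3), round(r32, 3))
```

The logit-scale correlation is strongest between adjacent visits and weakly negative between the first and last visit, consistent with measurements taken further apart in time being less alike.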
Solutions for Fixed Effects

Standard
Effect gender Estimate Error DF t Value Pr > |t|

Intercept 1.9131 0.9165 359 2.09 0.0375


age 0.01074 0.01227 362.1 0.88 0.3820
diameter -1.2162 0.2284 363.4 -5.32 <.0001
gender Female -0.5671 0.1819 358.1 -3.12 0.0020
gender Male 0 . . . .
visit 0.3372 0.02321 372.8 14.53 <.0001


The Solutions for Fixed Effects table displays the parameter estimates for the fixed effects in the model.
The results show that diameter, gender, and visit are all significant at the 0.05 significance level.
The parameter estimates are on the logit scale.
Odds Ratio Estimates

gender age diameter visit _gender _age _diameter _visit Estimate DF

34.964 4 1 33.964 4 1 1.011 362.1


33.964 3 1 33.964 4 1 3.374 363.4
33.964 4 2 33.964 4 1 1.401 372.8
Male 33.964 4 1 Female 33.964 4 1 1.763 358.1

Odds Ratio Estimates

95% Confidence
gender age diameter visit _gender _age _diameter _visit Limits

34.964 4 1 33.964 4 1 0.987 1.035


33.964 3 1 33.964 4 1 2.153 5.288
33.964 4 2 33.964 4 1 1.339 1.466
Male 33.964 4 1 Female 33.964 4 1 1.233 2.521

The Odds Ratio Estimates table lists the variables and their values that are used in the computation
of the odds ratio, the estimate of the odds ratio, the degrees of freedom, and the 95% confidence limits for
the estimate of the odds ratio. By default, the reference values for continuous variables are the average
values. The reference level for gender is female because the option DIFF=FIRST was used. The first row
of the table compares age 34.964 (value in the numerator for the odds ratio) to age 33.964 (average value)
holding gender, diameter, and visit constant. The odds ratio is 1.011 with a 95% confidence interval
of 0.987 to 1.035. The second row of the table compares diameter 3 to diameter 4 (the reference value
was specified in the odds ratio AT option and the one unit decrease was specified in the odds ratio UNIT
option) holding gender, visit, and age constant. The odds ratio is 3.374 with a 95% confidence interval
of 2.153 to 5.288. The third row compares visit 2 to visit 1 holding the other variables constant. The odds
ratio is 1.401 with a confidence interval of 1.339 to 1.466. Finally, the fourth row of the table compares
gender Male to gender Female holding the other variables constant. The odds ratio is 1.763 with
a confidence interval of 1.233 to 2.521.
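The odds ratios above are simply the fixed-effect estimates exponentiated over the requested change in each variable. The following sketch, which assumes the rounded estimates from the Solutions for Fixed Effects table, reproduces all four point estimates:

```python
import math

# Rounded fixed-effect estimates on the logit scale (from the table above)
b_age, b_diameter, b_visit = 0.01074, -1.2162, 0.3372
b_female = -0.5671                       # Male is the reference level (estimate 0)

or_age      = math.exp(b_age * 1)        # age + 1 unit from the average
or_diameter = math.exp(b_diameter * -1)  # diameter 3 versus 4 (UNIT = -1)
or_visit    = math.exp(b_visit * 1)      # visit 2 versus 1
or_gender   = math.exp(0 - b_female)     # Male versus Female (DIFF=FIRST)

print(round(or_age, 3), round(or_diameter, 3),
      round(or_visit, 3), round(or_gender, 3))
```

For a main-effects-only model like this one, each odds ratio is exp(estimate × change); with interactions or polynomials present, the ODDSRATIO option instead combines all of the involved terms.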


The odds ratio plot illustrates the odds ratios from the last table along with the 95% confidence limits.
When the line segment representing the confidence interval crosses 1, the odds ratio is not significant.

 If higher order terms were in the model such as interactions or polynomials, the odds ratios
computed with the ODDSRATIO option would take the higher order terms into account.
Type III Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

age 1 362.1 0.77 0.3820


diameter 1 363.4 28.35 <.0001
gender 1 358.1 9.72 0.0020
visit 1 372.8 211.14 <.0001

The Type III Tests of Fixed Effects table displays significance tests for the fixed effects in the model.


The box plot for gender shows a few extreme Pearson residuals. Females exhibit more extreme positive
outliers while males exhibit more extreme negative outliers.
Example: Fit a generalized linear mixed model to the long.keratotomy data set using G-side random
effects, the method of adaptive Gauss-Hermite quadrature, and the between-within degrees of
freedom adjustment. Specify the intercept as the random effect and use a binary distribution.
Use an optimization technique of Newton-Raphson with ridging, use the COVTEST statement
to test whether the G matrix can be reduced to a zero matrix, and create an output data set with
the EBLUPs and XBETAs.
proc glimmix data=long.keratotomy noclprint=5 method=quad;
class patientid gender;
model unstable(event='1') = age diameter gender visit
/ solution dist=binary ddfm=bw;
random intercept / subject = patientid;
nloptions tech=nrridg;
covtest "H0: No random effects" zerog;
output out=predict pred(blup ilink)=eblup
pred(noblup ilink)=xbeta;
title 'Generalized Linear Mixed Model of Radial Keratotomy '
'Surgery';
run;


Selected PROC GLIMMIX statement option:


METHOD= specifies the estimation method in a generalized linear mixed model. The choices among
discrete outcomes with random effects include several pseudo-likelihood techniques,
maximum likelihood with Laplace approximation, and maximum likelihood with
adaptive quadrature.

Selected COVTEST statement keyword:


ZEROG tests whether the G matrix can be reduced to a zero matrix. This eliminates all G-side
random effects from the model.

Selected OUTPUT statement keywords:


BLUP uses the random effects in computing the statistic.
ILINK computes the statistic on the scale of the data.
NOBLUP does not use the random effects in computing the statistic.
The METHOD=QUAD option in the PROC GLIMMIX statement requests that the GLIMMIX procedure
approximate the marginal log likelihood with an adaptive Gauss-Hermite quadrature. If you do not
specify the number of quadrature points with the suboptions of the METHOD option, the GLIMMIX
procedure attempts to determine a sufficient number of points adaptively. Note that the number of
random effects in the quadrature places serious limitations on computational performance and on
memory requirements.

 The term quadrature is more or less a synonym for numerical integration, especially
as applied to one-dimensional integrals. Two-dimensional integration is sometimes described
as cubature, although this term is much less frequently used and the meaning of quadrature
is understood for higher dimensional integration as well.

 The default pseudo-likelihood estimation method for models containing random effects is RSPL.
This is the acronym for the residual subject-specific pseudo-likelihood method. The other three
methods are MSPL, RMPL, and MMPL. The first letter determines whether estimation is based
on a residual likelihood (R) or a maximum likelihood (M). The second letter identifies the
expansion locus for the linearization, which can be the vector of random effects solutions (S) or
the mean of the random effects (M).

 In models for normal data with identity link, METHOD=RSPL and METHOD=RMPL are
equivalent to restricted maximum likelihood estimation, and METHOD=MSPL and
METHOD=MMPL are equivalent to maximum likelihood estimation.


Generalized Linear Mixed Model of Radial Keratotomy Surgery

The GLIMMIX Procedure

Model Information

Data Set LONG.KERATOTOMY


Response Variable unstable
Response Distribution Binary
Link Function Logit
Variance Function Default
Variance Matrix Blocked By patientid
Estimation Technique Maximum Likelihood
Likelihood Approximation Gauss-Hermite Quadrature
Degrees of Freedom Method Between-Within

The between-within degrees of freedom method is used because the Kenward-Roger degrees of freedom method
cannot be used with the maximum likelihood estimation methods.
Class Level Information

Class Levels Values

patientid 362 not printed


gender 2 Female Male

Number of Observations Read 1086


Number of Observations Used 1046

Response Profile

Ordered Total
Value unstable Frequency

1 0 634
2 1 412

The GLIMMIX procedure is modeling the probability that unstable='1'.

Dimensions

G-side Cov. Parameters 1


Columns in X 6
Columns in Z per Subject 1
Subjects (Blocks in V) 356
Max Obs per Subject 3

Optimization Information

Optimization Technique Newton-Raphson with Ridging


Parameters in Optimization 6
Lower Boundaries 1
Upper Boundaries 0
Fixed Effects Not Profiled
Starting From GLM estimates
Quadrature Points 5


The Optimization Information table shows the number of quadrature points chosen by the procedure for
the numerical integration calculations. Based on the algorithm used, five nodes are determined by PROC
GLIMMIX to be sufficient for the quadrature. If you want to have a larger number of quadrature points,
you can use the QPOINTS= suboption in the METHOD=QUAD option.

 Recall that quadrature provides an approximation of the definite integral of a function. This
is usually stated as a weighted sum of function values at specified points within the domain
of integration. These specified points are known as the quadrature points.
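To make the idea concrete, the sketch below approximates a marginal probability of the form ∫ logistic(β0 + u) φ(u; 0, σ²) du with a 5-point Gauss-Hermite rule, the same number of points the procedure chose here. The β0 and σ values are arbitrary illustrations (not the fitted estimates), and the hardcoded nodes and weights are the standard 5-point values for the weight function exp(−x²):

```python
import math

# Standard 5-point Gauss-Hermite nodes and weights (weight function exp(-x^2))
nodes   = [-2.020182870456086, -0.958572464613819, 0.0,
            0.958572464613819,  2.020182870456086]
weights = [ 0.019953242059046,  0.393619323152241, 0.945308720482942,
            0.393619323152241,  0.019953242059046]

def logistic(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def npdf(u, s):
    return math.exp(-u * u / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

b0, sigma = 0.5, 1.0   # illustrative values, not the fitted estimates

# Gauss-Hermite: substitute u = sqrt(2)*sigma*x to absorb the normal kernel
gh = sum(w * logistic(b0 + math.sqrt(2.0) * sigma * x)
         for x, w in zip(nodes, weights)) / math.sqrt(math.pi)

# Brute-force check: midpoint rule over a wide range of u
n, lo, hi = 4000, -8.0 * sigma, 8.0 * sigma
h = (hi - lo) / n
brute = sum(logistic(b0 + u) * npdf(u, sigma) * h
            for u in (lo + (i + 0.5) * h for i in range(n)))

print(round(gh, 4), round(brute, 4))
```

Five points already agree closely with the brute-force integral here; adaptive quadrature additionally recenters and rescales the nodes for each subject so that accuracy holds up across subjects.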
Iteration History

Objective Max
Iteration Restarts Evaluations Function Change Gradient

0 0 11 1080.8513468 . 565.5695
1 0 13 1077.5642995 3.28704729 1098.838
2 0 9 1058.1407776 19.42352192 236.1398
3 0 9 1051.1727897 6.96798789 63.87405
4 0 9 1049.5098427 1.66294704 14.30124
5 0 9 1049.3679612 0.14188149 1.414772
6 0 9 1049.3664725 0.00148871 0.01517
7 0 9 1049.3664723 0.00000020 0.00002

Convergence criterion (GCONV=1E-8) satisfied.

Fit Statistics

-2 Log Likelihood 1049.37


AIC (smaller is better) 1061.37
AICC (smaller is better) 1061.45
BIC (smaller is better) 1084.62
CAIC (smaller is better) 1090.62
HQIC (smaller is better) 1070.61

The Fit Statistics table lists information about the fitted model. PROC GLIMMIX computes various
information criteria, which typically apply a penalty to the (possibly restricted) log likelihood, log
pseudo-likelihood, or log quasi-likelihood: this penalty depends on the number of parameters or
the sample size, or both. The consistent AIC (CAIC) is an extension of the AIC and it was derived in
order to make the AIC asymptotically consistent and to penalize overparameterization more stringently.
The Hannan-Quinn information criterion (HQIC) has a penalty term that is between the AIC and the BIC.
Fit Statistics for Conditional Distribution

-2 log L(unstable | r. effects) 748.60


Pearson Chi-Square 721.00
Pearson Chi-Square / DF 0.69

The fit statistics for conditional distribution are useful for evaluating the fixed effect model.
If the variance function, the model, and the random effects structure are correctly specified, the Pearson
Chi-Square/DF value should be close to 1. Even under correct specification, there will be some
variation about the value 1. However, if this value is large, then something in the model needs
to be fixed: the conditional distribution of the response variable, the fixed effects, or the random
effects specified in your model might need revising.


Covariance Parameter Estimates

Standard
Cov Parm Subject Estimate Error

Intercept patientid 1.5405 0.4752

Solutions for Fixed Effects

Standard
Effect gender Estimate Error DF t Value Pr > |t|

Intercept 2.2635 1.0983 352 2.06 0.0400


age 0.01508 0.01475 352 1.02 0.3072
diameter -1.4759 0.2787 352 -5.29 <.0001
gender Female -0.6843 0.2192 352 -3.12 0.0019
gender Male 0 . . . .
visit 0.4044 0.03232 689 12.51 <.0001

Although the parameter estimates are different from the model fit by the pseudo-likelihood method,
the inferences are approximately the same.
Type III Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

age 1 352 1.05 0.3072


diameter 1 352 28.04 <.0001
gender 1 352 9.75 0.0019
visit 1 689 156.52 <.0001

Tests of Covariance Parameters


Based on the Likelihood

Label DF -2 Log Like ChiSq Pr > ChiSq Note

H0: No random effects 1 1074.75 25.38 <.0001 MI

MI: P-value based on a mixture of chi-squares.

Common questions in mixed modeling are whether variance components are zero, whether random
effects are independent, and whether rows (columns) can be added or removed from an unstructured
covariance matrix. The likelihood ratio chi-square test indicates that the “no random effects model”
is rejected. The model with random effects fits your data better than the model without random effects.
When the parameters under the null hypothesis fall on the boundary of the parameter space, the
distribution of the likelihood ratio statistic can be a complicated mixture of distributions. In certain
situations it is known to be a relatively straightforward mixture of central chi-square distributions. When
the GLIMMIX procedure recognizes the model and hypothesis as a case for which the mixture is readily
available, the p-value of the likelihood ratio test is determined accordingly as a linear combination of
central chi-square probabilities. The Note column in the Tests of Covariance Parameters
table, along with the table’s footnotes, informs you about when mixture distributions are used
in the calculation of p-values.
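For this test the arithmetic can be checked directly. Because a single variance component sitting on the boundary of the parameter space is being tested, the mixture p-value is an equal-weight mixture of a chi-square with 0 degrees of freedom and a chi-square with 1 degree of freedom. A sketch using the printed, rounded −2 log likelihoods:

```python
import math

# -2 log likelihoods, without and with the G-side random intercept
m2ll_null, m2ll_full = 1074.75, 1049.37

lr = m2ll_null - m2ll_full          # likelihood ratio statistic

# P(chi-square_1 > x) via the complementary error function,
# then halved for the 50:50 chi-square_0 / chi-square_1 mixture
p_chi1 = math.erfc(math.sqrt(lr / 2.0))
p_mix = 0.5 * p_chi1

print(round(lr, 2), p_mix)
```

The halving makes the boundary-corrected p-value smaller than the naive chi-square p-value, so ignoring the boundary issue is conservative; either way, the random intercept is clearly needed here.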


proc print data=predict(obs=20);
   title "EBLUPs and XBETAs from the Keratotomy Study";
run;

EBLUPs and XBETAs from the Keratotomy Study

Obs patientid age diameter gender visit unstable eblup xbeta

1 1 44.9117 3.0 Male 1 1 0.51185 0.25303


2 1 44.9117 3.0 Male 4 1 0.77913 0.53263
3 1 44.9117 3.0 Male 10 1 0.97557 0.92805
4 2 27.6413 3.5 Female 1 0 0.06249 0.05923
5 2 27.6413 3.5 Female 4 0 0.18317 0.17480
6 2 27.6413 3.5 Female 10 1 0.71736 0.70567
7 3 38.8337 3.5 Male 1 0 0.21388 0.12874
8 3 38.8337 3.5 Male 4 1 0.47789 0.33205
9 3 38.8337 3.5 Male 10 1 0.91197 0.84909
10 4 33.4292 4.0 Female 1 0 0.04265 0.03180
11 4 33.4292 4.0 Female 4 0 0.13035 0.09951
12 4 33.4292 4.0 Female 10 1 0.62915 0.55571
13 5 35.9480 3.0 Male 1 0 0.14358 0.22834
14 5 35.9480 3.0 Male 4 0 0.36063 0.49888
15 5 35.9480 3.0 Male 10 1 0.86457 0.91848
16 6 38.2669 4.0 Female 1 0 0.04456 0.03413
17 6 38.2669 4.0 Female 4 0 0.13561 0.10624
18 6 38.2669 4.0 Female 10 1 0.63973 0.57364
19 7 37.5962 4.0 Female 1 0 0.01679 0.03380
20 7 37.5962 4.0 Female 4 0 0.05434 0.10529

The output shows the best linear unbiased predictions and the values of the linear predictors for the first
twenty observations in the keratotomy study.
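The xbeta column can be reproduced by hand: apply the fixed-effects estimates to a row of data and back-transform with the inverse logit link. The sketch below, which assumes the rounded estimates printed in the Solutions for Fixed Effects table, recovers xbeta for observations 1 and 4; reproducing eblup would additionally require each patient's estimated random intercept.

```python
import math

# Rounded fixed-effect estimates from the quadrature fit above
b0, b_age, b_diam, b_female, b_visit = 2.2635, 0.01508, -1.4759, -0.6843, 0.4044

def xbeta(age, diameter, female, visit):
    """Linear predictor without the random effect, through the inverse logit."""
    eta = b0 + b_age * age + b_diam * diameter + b_female * female + b_visit * visit
    return 1.0 / (1.0 + math.exp(-eta))

p1 = xbeta(44.9117, 3.0, 0, 1)   # observation 1: patient 1, Male, visit 1
p4 = xbeta(27.6413, 3.5, 1, 1)   # observation 4: patient 2, Female, visit 1

print(round(p1, 5), round(p4, 5))
```

The eblup column adds each patient's predicted random intercept to the linear predictor before back-transforming, which pulls the prediction toward that patient's own observed history.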

Example: Fit a generalized linear mixed model to the long.keratotomy data set using G-side random
effects, the method of adaptive Gauss-Hermite quadrature, and the between-within
degrees of freedom adjustment. Specify the intercept as the random effect and visit as a
categorical variable. Use the binary distribution and an optimization technique of
Newton-Raphson with ridging.
proc glimmix data=long.keratotomy noclprint=5 method=quad;
class patientid gender visit;
model unstable(event='1') = age diameter gender visit
/ solution dist=binary ddfm=bw;
random intercept / subject = patientid;
nloptions tech=nrridg;
title 'Generalized Linear Mixed Model of Radial Keratotomy '
'Surgery';
run;


Generalized Linear Mixed Model of Radial Keratotomy Surgery

Model Information

Data Set LONG.KERATOTOMY


Response Variable unstable
Response Distribution Binary
Link Function Logit
Variance Function Default
Variance Matrix Blocked By patientid
Estimation Technique Maximum Likelihood
Likelihood Approximation Gauss-Hermite Quadrature
Degrees of Freedom Method Between-Within

Class Level Information

Class Levels Values

patientid 362 not printed


gender 2 Female Male
visit 3 1 4 10

Number of Observations Read 1086


Number of Observations Used 1046

Response Profile

Ordered Total
Value unstable Frequency

1 0 634
2 1 412

The GLIMMIX procedure is modeling the probability that unstable='1'.

Dimensions

G-side Cov. Parameters 1


Columns in X 8
Columns in Z per Subject 1
Subjects (Blocks in V) 356
Max Obs per Subject 3

Optimization Information

Optimization Technique Newton-Raphson with Ridging


Parameters in Optimization 7
Lower Boundaries 1
Upper Boundaries 0
Fixed Effects Not Profiled
Starting From GLM estimates
Quadrature Points 5


Iteration History

Objective Max
Iteration Restarts Evaluations Function Change Gradient

0 0 12 1076.1085758 . 620.4781
1 0 13 1073.6473396 2.46123623 1313.735
2 0 10 1052.8574819 20.78985766 284.293
3 0 10 1045.3465254 7.51095646 77.94131
4 0 10 1043.4636627 1.88286271 18.14176
5 0 10 1043.285331 0.17833176 1.996121
6 0 10 1043.2830093 0.00232169 0.027969
7 0 10 1043.2830088 0.00000048 0.000026

Convergence criterion (GCONV=1E-8) satisfied.

Fit Statistics

-2 Log Likelihood 1043.28


AIC (smaller is better) 1057.28
AICC (smaller is better) 1057.39
BIC (smaller is better) 1084.41
CAIC (smaller is better) 1091.41
HQIC (smaller is better) 1068.07

The AIC of 1057.28 is lower than the AIC of the model that treated visit as a continuous variable
(AIC of 1061.37). Therefore, treating visit as a categorical variable led to a better-fitting model.
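The information criteria follow directly from the −2 log likelihood and the number of parameters in the optimization (non-aliased fixed effects plus covariance parameters). A quick check of the printed values, assuming that BIC uses the number of subjects (356) as its sample size for these subject-based models:

```python
import math

# -2 log likelihoods and parameter counts from the two quadrature fits
m2ll_cont, k_cont = 1049.37, 6   # visit continuous: 5 fixed effects + 1 variance
m2ll_cat,  k_cat  = 1043.28, 7   # visit categorical: 6 fixed effects + 1 variance
n_subjects = 356

aic_cont = m2ll_cont + 2 * k_cont
aic_cat  = m2ll_cat  + 2 * k_cat
bic_cont = m2ll_cont + k_cont * math.log(n_subjects)
bic_cat  = m2ll_cat  + k_cat  * math.log(n_subjects)

print(aic_cont, aic_cat, round(bic_cont, 2), round(bic_cat, 2))
```

AIC penalizes each extra parameter by 2, while BIC penalizes by log(356) ≈ 5.87, so BIC favors the categorical-visit model by a much smaller margin (1084.41 versus 1084.62).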
Fit Statistics for Conditional Distribution

-2 log L(unstable | r. effects) 735.86


Pearson Chi-Square 686.69
Pearson Chi-Square / DF 0.66

Covariance Parameter Estimates

Standard
Cov Parm Subject Estimate Error

Intercept patientid 1.6148 0.4899

Solutions for Fixed Effects

Standard
Effect gender visit Estimate Error DF t Value Pr > |t|

Intercept 6.4449 1.1815 352 5.45 <.0001


age 0.01587 0.01496 352 1.06 0.2898
diameter -1.4932 0.2827 352 -5.28 <.0001
gender Female -0.6947 0.2226 352 -3.12 0.0019
gender Male 0 . . . .
visit 1 -3.5227 0.2924 688 -12.05 <.0001
visit 4 -2.8055 0.2572 688 -10.91 <.0001
visit 10 0 . . . .


Type III Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

age 1 352 1.12 0.2898


diameter 1 352 27.91 <.0001
gender 1 352 9.74 0.0019
visit 2 688 79.00 <.0001

The contrasts of visit 1 to 10 and visit 4 to 10 are both highly significant.


The “Sandwich” Estimator

• “Sandwich” or empirical estimators are the covariance matrix of the


parameter estimates that are computed based on a “sandwich” formula.
• They are asymptotically consistent estimators and therefore are useful for
obtaining inferences that are not sensitive to the choice of the covariance
model.
• PROC GLIMMIX can produce sandwich estimators that are unbiased even
when the number of clusters is small.


Robust standard errors are derived by the sandwich estimator of the covariance matrix of the regression
coefficients. In general, the sandwich estimator uses a matrix with the diagonal elements equal to the
individual squared residuals to estimate the common variance (the square of any residual is an estimate
of the variance at that predictor variable value). This works because the average of many poor
estimators (individual squared residuals) can be a good estimator of the common variance. In fact, Liang
and Zeger (1986) showed that the robust standard errors are robust to departures of the working
correlation matrix from the true correlation structure.

In the GLIMMIX procedure robust standard errors can be obtained by using the EMPIRICAL option
in the PROC GLIMMIX statement. The EMPIRICAL option in models with random effects is valid only
when the model is processed by subjects. The robust standard errors computed in PROC GLIMMIX have
advantages over the robust standard errors computed in other procedures because the classical sandwich
estimator can be biased if the number of subjects (or clusters) is small. However, the EMPIRICAL option
in PROC GLIMMIX has some suboptions that produce bias-corrected sandwich estimators.

 The name “sandwich” estimator stems from the layering of the estimator. An empirically based
estimate of the inverse variance of the parameter estimates (the “meat”) is wrapped by the model-
based variance estimate (the “bread”).
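The layering can be illustrated on an ordinary least squares slope, where the classical (HC0) sandwich variance has a closed scalar form. This is a generic sketch of the sandwich idea on made-up data, not the bias-corrected MBN estimator that PROC GLIMMIX computes:

```python
# Toy illustration of a sandwich (HC0) variance for a simple regression slope.
# The data are made up; this is the generic estimator, not GLIMMIX's MBN version.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
slope = sum((xi - xbar) * yi for xi, yi in zip(x, y)) / sxx
intercept = ybar - slope * xbar
resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# The "bread" is 1/Sxx on each side; the "meat" uses the squared residuals
meat = sum(((xi - xbar) ** 2) * (ei ** 2) for xi, ei in zip(x, resid))
var_robust = meat / sxx ** 2

# Model-based comparison: a common variance estimate spread over Sxx
s2 = sum(ei ** 2 for ei in resid) / (n - 2)
var_model = s2 / sxx

print(slope, var_robust, var_model)
```

The two variance estimates need not agree: the robust version weights each squared residual by its distance from the mean of x, so it responds to heteroskedasticity that the model-based version averages away.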


Using the Sandwich Estimator in PROC GLIMMIX

Example: Fit a generalized linear mixed model to the long.keratotomy data and specify the likelihood-
based sandwich estimators. Specify R-side random effects, a binary distribution, and use the
unstructured covariance structure. Use an optimization technique of Newton-Raphson with
ridging, and request the covariance matrix diagnostics.
/* long03d03.sas */
proc glimmix data=long.keratotomy noclprint=5 empirical=mbn;
class patientid gender;
model unstable(event='1') = age diameter gender visit
/ solution dist=binary covb(details);
random _residual_ / subject = patientid type=un;
nloptions tech=nrridg;
title1 'Generalized Linear Mixed Model of Radial Keratotomy '
'Surgery';
title2 "with Sandwich Estimators";
run;
Selected PROC GLIMMIX options:
EMPIRICAL requests that the covariance matrix of the parameter estimates be computed
as one of the asymptotically consistent estimators, known as sandwich or
empirical estimators.
EMPIRICAL=MBN requests the likelihood-based MBN sandwich estimator. The MBN suboptions are
a sample size adjustment (applied when the DF suboption is in effect; the
NODF suboption suppresses this component of the adjustment) and the tuning
parameters r (the lower bound of the design parameter) and d (used in the
computation of Morel’s parameter).
Selected MODEL statement options:
COVB produces the approximate variance-covariance matrix of the fixed-effects
parameter estimates.
COVB(DETAILS) enables you to obtain a table of statistics about the covariance matrix of the fixed
effects. If an adjusted estimator is used because of the EMPIRICAL= or
DDFM=KENWARDROGER option, the GLIMMIX procedure displays statistics
for the adjusted and unadjusted estimators as well as statistics comparing them.
This enables you to diagnose, for example, changes in rank (because of an
insufficient number of subjects for the empirical estimator) and to assess
the extent of the covariance adjustment. In addition, the GLIMMIX procedure
then displays the unadjusted (model-based) covariance matrix of the fixed-effects
parameter estimates.


Generalized Linear Mixed Model of Radial Keratotomy Surgery


with Sandwich Estimators

Model Information

Data Set LONG.KERATOTOMY


Response Variable unstable
Response Distribution Binary
Link Function Logit
Variance Function Default
Variance Matrix Blocked By patientid
Estimation Technique Residual PL
Degrees of Freedom Method Between-Within
Fixed Effects SE Adjustment Sandwich - MBN(df,r=1,d=2)

The design-adjusted MBN estimator applies a bias correction of the classical sandwich estimator that rests
on an additive correction of the residual crossproducts and a sample size correction. The three default
suboptions are df (the sample size adjustment is applied) and r=1 and d=2 (tuning parameters for the algorithm).
Besides good statistical properties in terms of Type I error rates in small sample size situations, the MBN
estimator also has the desirable property of recovering rank when the number of sampling units is small.

The Kenward-Roger degrees of freedom method is not available when you use the EMPIRICAL option.
Class Level Information

Class Levels Values

patientid 362 not printed


gender 2 Female Male

Number of Observations Read 1086


Number of Observations Used 1046

Response Profile

Ordered Total
Value unstable Frequency

1 0 634
2 1 412

The GLIMMIX procedure is modeling the probability that unstable='1'.

Dimensions

R-side Cov. Parameters 6


Columns in X 6
Columns in Z per Subject 0
Subjects (Blocks in V) 356
Max Obs per Subject 3


Optimization Information

Optimization Technique Newton-Raphson with Ridging


Parameters in Optimization 6
Lower Boundaries 3
Upper Boundaries 0
Fixed Effects Profiled
Starting From Data

The optimization information is exactly the same as that from the R-side random-effects model without
the EMPIRICAL option. The EMPIRICAL option affects the standard errors, and therefore the inferences,
for the fixed effects; the optimization technique and the size of the optimization problem are not
affected.
Iteration History

Objective Max
Iteration Restarts Subiterations Function Change Gradient

0 0 5 4727.8145847 0.48011446 0.00006


1 0 4 4895.6716029 0.14618701 8.303E-8
2 0 3 4935.9159763 0.03409445 6.841E-8
3 0 2 4942.8430009 0.00623447 4.293E-6
4 0 2 4943.8071483 0.00088555 1.709E-9
5 0 1 4943.9593606 0.00014143 0.000013
6 0 1 4943.9812554 0.00002048 2.784E-7
7 0 1 4943.9846473 0.00000317 6.742E-9
8 0 1 4943.9851462 0.00000047 1.46E-10
9 0 1 4943.9852224 0.00000007 3.89E-12
10 0 0 4943.9852337 0.00000000 3.454E-6

Convergence criterion (PCONV=1.11022E-8) satisfied.

The iteration history is also the same as the model with no EMPIRICAL option with the R-side random
effects.
Model Based Covariance Matrix for Fixed Effects (Unadjusted)

Effect gender Row Col1 Col2 Col3 Col4 Col5 Col6

Intercept 1 0.8296 -0.00596 -0.1792 -0.00604 0.001271


age 2 -0.00596 0.000149 0.000239 6.265E-6 0.000011
diameter 3 -0.1792 0.000239 0.05154 -0.00180 -0.00127
gender Female 4 -0.00604 6.265E-6 -0.00180 0.03267 -0.00058
gender Male 5
visit 6 0.001271 0.000011 -0.00127 -0.00058 0.000535

The model-based covariance matrix for the fixed effects shows the variances along the diagonal cells
and the covariances on the off-diagonal cells.


Empirical Covariance Matrix for Fixed Effects

Effect gender Row Col1 Col2 Col3 Col4 Col5 Col6

Intercept 1 0.8693 -0.00580 -0.1910 0.01175 0.002261


age 2 -0.00580 0.000139 0.000290 -0.00020 2.732E-6
diameter 3 -0.1910 0.000290 0.05398 -0.00471 -0.00148
gender Female 4 0.01175 -0.00020 -0.00471 0.03362 -0.00057
gender Male 5
visit 6 0.002261 2.732E-6 -0.00148 -0.00057 0.000544

Comparing the unadjusted covariance matrix for the fixed effects with the empirical covariance matrix for
fixed effects, it appears the variance estimate for age decreased while the variance estimates for
diameter, gender, and visit increased with the adjustment. The model-based covariance matrix estimates
are based directly on the assumed covariance structure (in this example, the unstructured covariance
structure). The model-based standard errors are better estimates if the assumed model for the covariance
structure is correct, but worse if the assumed structure is incorrect. The empirical covariance matrix
estimates are robust to the choice of the covariance structure.
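As a quick numeric check outside SAS (a Python sketch; the variances are copied from the empirical matrix above, omitting the singular gender Male cell), the standard errors reported in the Solutions for Fixed Effects table are simply the square roots of these diagonal entries:

```python
import math

# Diagonal (variance) entries of the empirical covariance matrix shown above
emp_diag = {
    "Intercept": 0.8693,
    "age": 0.000139,
    "diameter": 0.05398,
    "gender Female": 0.03362,
    "visit": 0.000544,
}

# Empirical standard errors: square roots of the sandwich variances.
# These match the Standard Error column in the Solutions for Fixed Effects table.
emp_se = {name: math.sqrt(v) for name, v in emp_diag.items()}
for name, se in emp_se.items():
    print(f"{name:14s} {se:.5f}")
```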
Diagnostics for Covariance Matrices of Fixed Effects

Model-
Based Adjusted

Dimensions Rows 6 6
Non-zero entries 25 25

Summaries Trace 0.9145 0.9576


Log determinant -27.67 -27.72

Eigenvalues > 0 5 5
= 0 1 1
max abs 0.869 0.9121
min abs non-zero 629E-8 58E-7
Condition number 138209 157305

Norms Frobenius 0.8697 0.9128


Infinity 1.0221 1.0801

Comparisons Concordance correlation 0.9921


Discrepancy function 0.0727
Frobenius norm of difference 0.0501
Trace(Adjusted Inv(MBased)) 5.0203

Determinant and inversion results apply to the nonsingular partitions of the covariance matrices.

This table, produced by the COVB(DETAILS) option in the MODEL statement, enables you to diagnose
and assess the extent of the covariance adjustment. Typically, the most important information in this table
is in the Summaries and Eigenvalues information. The trace is the sum of the diagonal elements. If the
adjustment raises the standard errors, then the trace of the adjusted COVB matrix should be larger than
the model-based COVB matrix. In this example, the trace of the adjusted COVB is larger than the model-
based, which means the adjustment raised the standard errors of the fixed effects. In addition, the number
of positive and zero eigenvalues should be the same between the unadjusted and adjusted covariance
matrices.
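The trace comparison can be verified directly from the diagonal entries of the two covariance matrices printed earlier. A quick check in Python (the diagonals are copied from the output; the singular gender Male cell contributes zero to both traces):

```python
# Diagonals of the model-based and empirical COVB matrices shown above
mb_diag  = [0.8296, 0.000149, 0.05154, 0.03267, 0.000535]
emp_diag = [0.8693, 0.000139, 0.05398, 0.03362, 0.000544]

trace_mb  = sum(mb_diag)
trace_emp = sum(emp_diag)
print(round(trace_mb, 4), round(trace_emp, 4))   # 0.9145 0.9576

# A larger adjusted trace means the sandwich adjustment inflated
# the fixed-effects standard errors overall
assert trace_emp > trace_mb
```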


Fit Statistics

-2 Res Log Pseudo-Likelihood 4943.99


Generalized Chi-Square 1041.00
Gener. Chi-Square / DF 1.00

The ratio of the generalized chi-square statistic and its degrees of freedom is the same as the model with
no EMPIRICAL option with the R-side random effects.
Covariance Parameter Estimates

Cov Standard
Parm Subject Estimate Error

UN(1,1) patientid 1.2278 0.09278


UN(2,1) patientid 0.3029 0.05955
UN(2,2) patientid 0.8653 0.06813
UN(3,1) patientid -0.08470 0.06373
UN(3,2) patientid 0.2806 0.05617
UN(3,3) patientid 1.1019 0.08384

The covariance parameter estimates in the Covariance Parameter Estimates table are exactly the same
as the results from the model with no EMPIRICAL option with the R-side random effects. Because the
EMPIRICAL option affects only the standard errors of the fixed effects, and therefore the inference for
the fixed effects (and not the random effects), the covariance parameter estimates should not be affected.
Solutions for Fixed Effects

Standard
Effect gender Estimate Error DF t Value Pr > |t|

Intercept 1.9131 0.9324 352 2.05 0.0409


age 0.01074 0.01179 352 0.91 0.3633
diameter -1.2162 0.2323 352 -5.23 <.0001
gender Female -0.5671 0.1834 352 -3.09 0.0021
gender Male 0 . . . .
visit 0.3372 0.02332 352 14.46 <.0001

Type III Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

age 1 352 0.83 0.3633


diameter 1 352 27.40 <.0001
gender 1 352 9.57 0.0021
visit 1 352 209.08 <.0001

The Type III Tests of Fixed Effects are computed based on the empirical estimates. The results are
similar to, but not identical to, the results from the model with no EMPIRICAL option with the R-side
random effects.


Exercises

A longitudinal study was undertaken to assess the health effects of air pollution on children. The data
contain repeated binary measures of wheezing status for each of 537 children from Steubenville, Ohio.
The measurements were taken at age 7, 8, 9, and 10 years. The smoking status of the mother
at the first year of the study was also recorded. The data are stored in a SAS data set called long.wheeze.

These are the variables in the data set:


case patient identification number
wheeze wheezing status of child (1=yes, 0=no)
age age of child when measurement was taken (in years)
smoker smoking status of mother (Yes versus No).

 The data were obtained with permission from the OZDATA website. This website is a collection
of data sets and is maintained in Australia.

1. Generating Empirical Logit Plots


a. Generate a line listing of the wheezing data (first 20 observations) and logit plots of age.
1) Are the data in the proper order?
2) Describe the logit plot for age.
2. Fitting Generalized Linear Mixed Models
a. Fit a generalized linear mixed model to the long.wheeze data set using G-side random effects, the
method of adaptive Gauss-Hermite quadrature, and the between-within degrees of freedom
adjustment. Specify wheeze as the response variable and smoker, age, and age*age as the
predictor variables. Model the probability that wheeze is equal to 1 with the EVENT= option.
Also, request that the solution for the fixed-effects parameters be produced. Specify the
optimization technique of Newton-Raphson with ridging, and compute the odds ratio for smoker
(No as the reference value) and for a one-year decrease in age (10 as the reference value). Create
an odds ratio plot and display the statistics and use the COVTEST statement to test whether the G
matrix can be reduced to a zero matrix.

1) Interpret the odds ratio for age. Would the odds ratio change for a two-year decrease in age?
2) Interpret the Tests of Covariance Parameters table. Does the model with the random effects fit
the data better than the model without random effects?


3.04 Multiple Choice Poll

The odds ratio for age in the exercise was 1.634. How can this be
interpreted?
a. A one-year decrease from age 10 results in a 63% increase in the odds of
wheezing.
b. A one-year decrease from any age results in a 63% increase in the odds
of wheezing.
c. A one-year increase from any age results in a 63% increase in the odds of
wheezing.
d. A one-year increase from age 10 results in a 63% increase in the odds of
wheezing.


3.2 Applications Using the GLIMMIX


Procedure

Objectives

• Define the concepts of ordinal logistic regression.


• Illustrate how to build a regression spline using the EFFECT statement.
• Fit a generalized linear mixed model with an ordinal response in PROC
GLIMMIX.



Ordinal Logistic Regression

• Models are used when the response variable is ordinal.


• Models can also be used when the response variable has a restricted range
due to limitations of the measuring device.
• Models use a link function of the cumulative logits.
• Only the G-side random effects are available for ordinal logistic regression
models in PROC GLIMMIX.


In some situations with a continuous outcome, there is a restricted range of values because
of the limitations of the measuring techniques. This is a common feature in bioassay analyses. With
restricted ranges, there is usually a lower limit of quantification (LOQ) and an upper limit of
quantification. For example, suppose that the response variable had a lower LOQ of 300 and the upper
LOQ of 900 because of the limitations of the measuring device. Analyzing the response variable
as continuous might not be optimal given the truncated nature of the distribution. An alternative way
to analyze a continuous variable with a restricted range is to create ordered categories and fit an ordinal
logistic regression model.


Cumulative Logits

If an ordinal outcome has k levels with proportions


(p1, p2, p3, ..., pk),

then the cumulative logits are


log[ p1 / (p2 + ... + pk) ],  log[ (p1 + p2) / (p3 + ... + pk) ],  ...,  log[ (p1 + ... + p(k-1)) / pk ]

In ordinal logistic regression, the logit is now a cumulative logit. If k is the number of categories for
the outcome variable, then the number of cumulative logits is k-1. The GLIMMIX procedure models the
probabilities of levels of the response variable having lower ordered values in the Response Profile table.
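To make the definition concrete, the following Python sketch computes the k-1 cumulative logits for a hypothetical set of k=4 category proportions (the proportions are illustrative, not from the CD4+ data):

```python
import math

# Hypothetical proportions for a k=4 ordinal response (illustrative only)
p = [0.10, 0.30, 0.35, 0.25]

# j-th cumulative logit: log of P(Y <= j) divided by P(Y > j), j = 1..k-1
cum_logits = []
for j in range(1, len(p)):
    lower, upper = sum(p[:j]), sum(p[j:])
    cum_logits.append(math.log(lower / upper))

print([round(c, 3) for c in cum_logits])   # [-2.197, -0.405, 1.099]
```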

Logistic Models

Binary Logistic Model    logit = β0 + β1X1

Ordinal Logistic Model   logit(i) = β0i + β1X1

where i indexes the cumulative logits.

PROC GLIMMIX estimates a separate intercept for each cumulative logit. However, PROC GLIMMIX
does not estimate a separate slope for each cumulative logit, but rather a common slope across the
cumulative logits for each predictor variable. This common slope is a weighted average across the logits.

Copyright © 2017, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED.
3.2 Applications Using the GLIMMIX Procedure 3-61

Therefore, a parallel-lines regression model is fitted in which each curve that describes the cumulative
probabilities has the same shape. The only difference in the curves is the difference between the values
of the intercept parameters. This model is called a proportional odds model.
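A small sketch (Python, with hypothetical intercepts and slope) shows what parallel lines means here: on the cumulative-logit scale, the vertical distance between any two curves is constant in the predictor, so the curves differ only through their intercepts:

```python
# Proportional odds sketch with hypothetical numbers: one intercept per
# cumulative logit, but a single slope shared across all of them
alphas = [-2.0, 0.5, 1.8]   # hypothetical intercepts, one per cumulative logit
beta = 0.7                  # hypothetical common slope

def cum_logit(i, x):
    return alphas[i] + beta * x

# On the logit scale the curves are parallel: the vertical gap between any
# two of them is the difference of intercepts, no matter the value of x
gap_at_0 = cum_logit(1, 0.0) - cum_logit(0, 0.0)
gap_at_5 = cum_logit(1, 5.0) - cum_logit(0, 5.0)
print(gap_at_0, gap_at_5)   # both 2.5
```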

Proportional Odds Model

[Figure: the cumulative logits, Logit(cum P), plotted against Age as a set of parallel curves that share a common slope and differ only in their intercepts]

The common effect of the predictor variable for different cumulative logits in the proportional odds model
can be motivated by assuming that a regression model holds when the response is measured more finely
(Anderson and Phillips 1981). For example, suppose the ordered categories are produced by applying
cutoff points to an underlying continuous response variable. The relationship between the predictor
variable and the outcome should not depend on the cutoff points. In other words, the effect parameters
are invariant to the choice of categories for the outcome variable. Only the intercepts are affected
by the cutoff points.
Because there is a common slope for each predictor variable, the odds ratio is constant for all the
categories. The odds ratios can be interpreted as the effect of the predictor variable on the odds of being
in a lower rather than in a higher category, regardless of what cumulative logit you are examining
(the odds are cumulative odds). If you use the DESCENDING option in the MODEL statement, the odds
ratio is the effect of the predictor variable on the odds of being in a higher rather than a lower category.
The proportional odds model is also invariant to the choice of the outcome categories. There is some loss
of efficiency when you collapse the ordinal categories, but when the observations are evenly spread
among the categories the efficiency loss is minor. However, the efficiency loss is large when you collapse
the ordinal categories to a binary response (Agresti 1996). Allison (1999) recommends having
at least 10 observations for each category of the response variable. As the number of categories increases,
ordinary least squares might be appropriate. However, Hastie et al. (1989) showed that ordinary least
squares methods could give misleading results with up to 13 categories of the response variable.
The proportional odds model also makes no assumptions about the distances between the categories.
Therefore, how you code the ordinal outcome variable has no effect on the odds ratios.


3.05 Multiple Choice Poll

Which one of the following statements is true for proportional odds models?
a. The model fits separate intercepts.
b. The model fits separate slopes.
c. The cumulative logits compare each category to the last category.
d. The coding of the ordinal outcome affects the odds ratios.


CD4+ Cell Numbers Data Set

[Figure: CD4+ cell numbers plotted against years since seroconversion]

Example: The human immune deficiency virus (HIV) causes AIDS by attacking an immune cell called
the CD4+ cell, which facilitates the body’s ability to fight infection. An uninfected person has
approximately 1100 cells per milliliter of blood. Because CD4+ cells decrease in number from
the time of infection, a person’s CD4+ cell count can be used to monitor disease progression.
A subset of the Multicenter AIDS Cohort Study (Kaslow et al. 1987) was obtained for 369
infected men to examine CD4+ cell counts over time. The data are stored in a SAS data set
called long.cd4cat.


The variables in the data set are


cd4cat      CD4+ cell count category (1=0-300, 2=301-600, 3=601-900, 4=901+)
time        time in years since seroconversion (time when HIV becomes detectable)
age         age in years relative to an arbitrary origin
cigarettes  packs of cigarettes smoked per day
drug        recreational drug use (1=yes, 0=no)
partners    number of partners relative to an arbitrary origin
depression  CES-D score (a depression scale)
id          subject identification number


Fitting Generalized Linear Mixed Models with an Ordinal


Response

Example: Fit an ordinal logistic model to the CD4+ cell count data in long.cd4cat. Specify a random
intercept and time with an unstructured covariance structure. Specify the ODDSRATIO
option in the MODEL statement and create customized odds ratios specifying a reference
value of 0 for time, cigarettes, drug, partners, and depression. Use Newton-Raphson with
ridging, create an odds ratio plot displaying the statistics, and test whether the G-side random
effects are significant.
/* long03d04.sas */
proc glimmix data=long.cd4cat method=laplace plots=oddsratio(stats);
model cd4cat = time age cigarettes drug partners depression
time*age time*depression
time*partners time*drug time*cigarettes time*time
time*time*time
/ dist=multinomial link=cumlogit solution ddfm=bw
or(at time cigarettes drug partners depression =
0 0 0 0 0);
random intercept time / subject=id type=un;
nloptions tech=nrridg;
covtest "H0: No random effects" zerog;
title 'Ordinal Model of Aids Data';
run;
Selected MODEL statement option:
LINK= specifies the link function in the generalized linear mixed model.
DDFM= The BW|BETWITHIN option divides the residual degrees of freedom into between-subject
and within-subject portions. It then determines whether a fixed effect changes within any
subject. If so, it assigns within-subject degrees of freedom to the effect. Otherwise, it
assigns the between-subject degrees of freedom to the effect. If the analysis is not processed
by subjects, the DDFM=BW option has no effect.
One exception to the preceding method is the case where you model only R-side covariation with an
unstructured covariance matrix (TYPE=UN). However, only G-side effects can be modeled with the
multinomial distribution. The cumulative logit link function is appropriate only for multinomial
distributions.
Ordinal Model of Aids Data

The GLIMMIX Procedure

Model Information

Data Set LONG.CD4CAT


Response Variable cd4cat
Response Distribution Multinomial (ordered)
Link Function Cumulative Logit
Variance Function Default
Variance Matrix Blocked By id


Estimation Technique Maximum Likelihood


Likelihood Approximation Laplace
Degrees of Freedom Method Between-Within

Number of Observations Read 2376


Number of Observations Used 2376

The Kenward-Roger degrees of freedom adjustment is not available for either of the maximum likelihood
estimation techniques.
Response Profile

Ordered Total
Value cd4cat Frequency

1 1 182
2 2 741
3 3 736
4 4 717

The GLIMMIX procedure is modeling the probabilities of levels of cd4cat having lower Ordered Values
in the Response Profile table.

You can reverse the order of the response categories with the DESC option in the MODEL statement.
Dimensions

G-side Cov. Parameters 3


Columns in X 16
Columns in Z per Subject 2
Subjects (Blocks in V) 369
Max Obs per Subject 12

Optimization Information

Optimization Technique Newton-Raphson with Ridging


Parameters in Optimization 19
Lower Boundaries 2
Upper Boundaries 0
Fixed Effects Not Profiled
Starting From GLM estimates

Iteration History

Objective Max
Iteration Restarts Evaluations Function Change Gradient

0 0 24 4919.4503999 . 2955.587
1 0 22 4652.6684639 266.78193594 525.7503
2 0 22 4600.9405459 51.72791798 115.8175
3 0 22 4595.4042136 5.53633239 13.98589
4 0 22 4595.0664823 0.33773125 2.063348
5 0 22 4595.0623412 0.00414105 0.051142
6 0 22 4595.0623404 0.00000090 0.000035

Convergence criterion (GCONV=1E-8) satisfied.


Fit Statistics

-2 Log Likelihood 4595.06


AIC (smaller is better) 4633.06
AICC (smaller is better) 4633.38
BIC (smaller is better) 4707.37
CAIC (smaller is better) 4726.37
HQIC (smaller is better) 4662.58

Fit Statistics for Conditional


Distribution

-2 log L(cd4cat | r. effects) 3296.75

Fit statistics are presented because the true likelihood, as opposed to a pseudo-likelihood, is computed.
The fit statistics for the conditional distribution are not useful for comparing marginal models with
different fixed effects.
Covariance Parameter Estimates

Cov Standard
Parm Subject Estimate Error

UN(1,1) id 2.9023 0.3755


UN(2,1) id 0.3939 0.1122
UN(2,2) id 0.4361 0.08270

The estimated variance of the subject-specific intercepts is 2.9023, while the estimated variance
of the subject-specific slopes for time is 0.4361.
Solutions for Fixed Effects

Standard
Effect cd4cat Estimate Error DF t Value Pr > |t|

Intercept 1 -5.3113 0.2643 365 -20.09 <.0001


Intercept 2 -1.0839 0.1990 365 -5.45 <.0001
Intercept 3 1.5393 0.1981 365 7.77 <.0001
time 1.0521 0.1018 1995 10.34 <.0001
age -0.00793 0.01436 365 -0.55 0.5812
cigarettes -0.3486 0.06064 1995 -5.75 <.0001
drug -0.4616 0.1757 1995 -2.63 0.0087
partners -0.02847 0.02012 1995 -1.42 0.1572
depression 0.02362 0.007548 1995 3.13 0.0018
time*age 0.009228 0.006691 1995 1.38 0.1680
time*depression 0.001744 0.003733 1995 0.47 0.6405
time*partners -0.00344 0.01056 1995 -0.33 0.7445
time*drug -0.04463 0.08600 1995 -0.52 0.6039
time*cigarettes 0.06200 0.03060 1995 2.03 0.0429
time*time 0.1680 0.02568 1995 6.54 <.0001
time*time*time -0.04387 0.006282 1995 -6.98 <.0001

Notice that there are three intercepts for the model, corresponding to the three cumulative logits. The first
is the log odds of a CD4+ cell count of 300 or less versus 301 or higher. The second is the log odds
of a CD4+ cell count of 600 or less versus 601 or higher. Finally, the third is the log odds of a CD4+
cell count of 900 or less versus 901 or higher.
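Because the model is on the cumulative logit scale, each intercept converts directly to a baseline cumulative probability through the inverse (logistic) link. The following Python sketch applies it to the three fitted intercepts with every predictor held at zero (a convenient reference here because age, partners, and several other covariates are coded relative to arbitrary origins):

```python
import math

def logistic(x):
    # inverse of the cumulative logit link
    return 1.0 / (1.0 + math.exp(-x))

# The three fitted intercepts from the Solutions for Fixed Effects table
intercepts = [-5.3113, -1.0839, 1.5393]

# Baseline cumulative probabilities P(cd4cat <= j) with all predictors at 0
cum_probs = [round(logistic(a), 4) for a in intercepts]
print(cum_probs)   # [0.0049, 0.2528, 0.8234]
```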


The results show that the cubic effect of time, cigarettes, drug, and the time*cigarettes interaction are
all significant at the 0.05 significance level. The negative coefficient for cigarettes indicates that patients
who smoke have lower probabilities of having lower-ordered values for the response variable. In other
words, patients who smoke have higher CD4+ cell counts.
Odds Ratio Estimates

Each odds ratio compares a one-unit offset in the listed variable, with the
remaining variables held at the reference values time=0, age=2.636,
cigarettes=0, drug=0, partners=0, and depression=0.

                                               95% Confidence
Comparison                  Estimate     DF        Limits

time          1 vs 0           3.322   1995    2.710   4.071
age       3.636 vs 2.636       0.992    365    0.964   1.021
cigarettes    1 vs 0           0.706   1995    0.627   0.795
drug          1 vs 0           0.630   1995    0.447   0.890
partners      1 vs 0           0.972   1995    0.934   1.011
depression    1 vs 0           1.024   1995    1.009   1.039

Effects of continuous variables are assessed as unit offsets from
the reference value. The UNIT suboption modifies the offsets.

The Odds Ratio Estimates table shows that the odds ratio for a one-unit increase in time is 3.322 with
a 95% confidence interval of 2.710 to 4.071. Note that the odds ratio takes into account the higher-order
terms that involve time. The odds ratio for a one-unit increase in age (3.636 in the numerator and 2.636
in the denominator) is 0.992. The odds ratio for a one-unit increase in cigarettes is 0.706 and for
a one-unit increase in drug is 0.630. The odds ratio for a one-unit increase in partners is 0.972 and for
a one-unit increase in depression is 1.024. These are usually easier to read from the odds ratio plot.
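The odds ratio of 3.322 for time can be reproduced by hand from the fixed-effects estimates: the log odds ratio sums the contributions of every term involving time, evaluated at time 1 versus 0 with age held at 2.636 and the other interacting covariates at 0. A Python check using the coefficients printed above:

```python
import math

# Fixed-effects estimates involving time, from the table above
b_time     = 1.0521      # time
b_time_age = 0.009228    # time*age
b_time2    = 0.1680      # time*time
b_time3    = -0.04387    # time*time*time

# Log odds ratio for time = 1 versus time = 0, age held at 2.636; the
# remaining interaction terms drop out because their covariates are 0
log_or = (b_time * (1 - 0)
          + b_time_age * 2.636 * (1 - 0)
          + b_time2 * (1**2 - 0**2)
          + b_time3 * (1**3 - 0**3))

print(round(math.exp(log_or), 3))   # 3.322
```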


The odds ratio plot displays the odds ratios along with the confidence limits.
Type III Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

time 1 1995 106.84 <.0001


age 1 365 0.30 0.5812
cigarettes 1 1995 33.05 <.0001
drug 1 1995 6.90 0.0087
partners 1 1995 2.00 0.1572
depression 1 1995 9.79 0.0018
time*age 1 1995 1.90 0.1680
time*depression 1 1995 0.22 0.6405
time*partners 1 1995 0.11 0.7445
time*drug 1 1995 0.27 0.6039
time*cigarettes 1 1995 4.10 0.0429
time*time 1 1995 42.79 <.0001
time*time*time 1 1995 48.77 <.0001

The results show the cubic and quadratic effects of time, the time by cigarettes interaction, cigarettes,
drug, and depression are significant at the 0.05 significance level.


Tests of Covariance Parameters


Based on the Likelihood

Label DF -2 Log Like ChiSq Pr > ChiSq Note

H0: No random effects 3 5406.08 811.02 <.0001 --

--: Standard test with unadjusted p-values.

The likelihood ratio chi-square test indicates that the no random effects model is rejected. The model with
random effects fits your data better than the model without random effects.

 The note in the table indicates that this test of covariance parameters based on the likelihood
is a standard test with unadjusted p-values.


Regression Splines

• A regression spline consists of piecewise polynomial segments joined at


knots (with varying continuity and smoothness constraints).
• The number and degrees of the polynomial segments and the number and
position of the knots will vary in different situations.
• The simplest spline function is a linear spline function.


PROC GLIMMIX has the functionality to include spline functions in the model. A spline function
is a piecewise polynomial function where the individual polynomials have the same degree and connect
smoothly at join points whose abscissa values, referred to as knots, are pre-specified. You can use spline
functions to fit curves to a wide variety of data.

Linear Spline Function with Three Knots


and a Truncated Power Basis
f(X) = β0 + β1X + β2(X - k1)+ + β3(X - k2)+ + β4(X - k3)+

where the truncated power function is defined as

(x - ki)+^d = (x - ki)^d   if x > ki
            = 0            if x ≤ ki

where d is the degree of the spline transformation and i is the knot number.


The name “truncated power function” is derived from the fact that these functions are shifted power
functions that are truncated to zero to the left of the knot. These functions are piecewise polynomial
functions whose function values and derivatives of all orders up to d - 1 are zero at the defining knot (ki).
Hence, these functions are splines of degree d. The final model consists of d + 1 polynomial terms
and the truncated power functions.

Linear Spline Function with Three Knots
and a Truncated Power Basis

f(X) = β0 + β1X                                            when X ≤ k1

f(X) = β0 + β1X + β2(X - k1)                               when k1 < X ≤ k2

f(X) = β0 + β1X + β2(X - k1) + β3(X - k2)                  when k2 < X ≤ k3

f(X) = β0 + β1X + β2(X - k1) + β3(X - k2) + β4(X - k3)     when X > k3

The main advantage of the truncated power function basis is the simplicity of its construction and the ease
of interpreting the parameters in a model that corresponds to these basis functions.


Linear Spline Function with Three Knots
and a Truncated Power Basis

[Figure: a piecewise linear function f(X) plotted against X, with the line segments joined at the knots k1, k2, and k3]

A spline of degree 0 is a step function with steps located at the knots (k1, k2, and k3). A spline of degree 1
is a piecewise linear function where the lines connect at the knots. A spline of degree 2 is a piecewise
quadratic curve whose values and slopes coincide at the knots. A spline of degree 3 is a piecewise cubic
curve whose values, slopes, and curvature coincide at the knots. Visually, a cubic spline is a smooth
curve, and it is the most commonly used spline when a smooth fit is desired. When no knots are used,
splines of degree d are simply polynomials of degree d.
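A short Python sketch (with hypothetical knots and coefficients) evaluates a degree-1 truncated power basis and confirms that, between two knots, the basis expansion collapses to the corresponding piecewise linear formula shown on the earlier slide:

```python
# A degree-1 (linear) truncated power basis with three hypothetical knots
# and hypothetical coefficients b0..b4, illustrative values only
knots = [2.0, 5.0, 8.0]
b = [1.0, 0.5, -0.3, 0.8, -0.6]   # b0, b1, b2, b3, b4

def trunc(x, k, d=1):
    # (x - k)_+^d : the shifted power function, truncated to zero left of the knot
    return (x - k) ** d if x > k else 0.0

def f(x):
    return b[0] + b[1] * x + sum(bi * trunc(x, k) for bi, k in zip(b[2:], knots))

# Between k1 and k2 only the first truncated term is active, so f reduces
# to the piecewise form b0 + b1*x + b2*(x - k1)
x = 3.5
print(f(x))
assert abs(f(x) - (b[0] + b[1] * x + b[2] * (x - knots[0]))) < 1e-12
```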

The EFFECT Statement


General form of the EFFECT statement:
EFFECT effect-name = effect-type (var-list < / effect-options >) ;

• Specifies special constructed effects


- spline effects
- collection effects
- multimember effects
- polynomial effects
• Defines sets of columns for X and Z matrices that are different from classical
effects
• Allows multiple EFFECT statements



The EFFECT statement enables you to construct special collections of columns for the X or Z matrices
in your model. These collections are referred to as constructed effects to distinguish them from the usual
model effects formed from continuous or classification variables.
In the EFFECT statement, the name of the effect is specified after the EFFECT keyword. This name can
appear in only one EFFECT statement and cannot be the name of a variable in the input data set.
The effect type is specified after an equal sign, followed by a list of variables used in constructing
the effect within parentheses. Effect-type specific options can be specified after a slash (/) following
the variable list.
The following effect-types are available in the EFFECT statement:
COLLECTION is a collection effect defining one or more variables as a single effect with
multiple degrees of freedom. The variables in a collection are considered
as a unit for estimation and inference.
MULTIMEMBER | MM is a multimember classification effect whose levels are determined by one
or more variables that appear in a CLASS statement.

POLYNOMIAL | POLY is a multivariate polynomial effect in the specified numeric variables.


SPLINE is a regression spline effect whose columns are univariate spline expansions
of one or more variables. A spline expansion replaces the original variable with
an expanded or larger set of new variables.

The EFFECT Statement

• Constructed effects
proc glimmix;
class A B;
effect spl=spline(x);
model y = A B spl A*spl;
run;

• The EFFECT statement requests construction of columns in a design matrix for spl as a B-spline
in x, where x is a variable in the data set.
• The constructed effect spl is specified the same way in the MODEL statement as the other effects.


A constructed effect is assigned through the EFFECT statement. In the slide above, the EFFECT
statement defines a constructed effect named spl. The columns of spl are formed from the data set
variable x as a cubic B-spline basis with three equally spaced interior knots (which is the default).

3.2 Applications Using the GLIMMIX Procedure 3-75

Each constructed effect corresponds to a collection of columns that are referred to by using the name that
you supply. You can specify multiple EFFECT statements, and all EFFECT statements must precede
the MODEL statement.

 For more information about the B-spline basis, see the PROC GLIMMIX documentation.

Effect Type

• SPLINE effect type

EFFECT spl=SPLINE(x);

- This constructs spline effects from B-spline or truncated power function bases.
- Options give control over knot construction, number of knots, spline basis, and so on.
- This enables you to fit a spline model for certain terms while enjoying parametric model
capabilities.


There are many spline options in the EFFECT statement to give you control over the basis function
(B spline (the default) or the truncated power function), the degree of the spline function (default is 3),
the placement of the knots (default is EQUAL), and the number of knots (default is 3).

One of the advantages of using the constructed spline effects in the model is that you are able to model
some terms through a spline function, which is typically provided in nonparametric regression
procedures, while performing some tasks that are available only to parametric models, such as having
a mathematical form of the fitted model, performing comparisons involving the spline terms, and so on.


Fitting Generalized Linear Mixed Models with Splines

Example: Fit an ordinal logistic model to the CD4+ cell count data in long.cd4cat. Specify a random
intercept and time with an unstructured covariance structure. Create a constructed spline effect
of time specifying a truncated power function basis for the spline expansion excluding the
intercept column. Specify that the internal knots be placed at 4 equally spaced percentiles of
time and specify the degree of the spline transformation to be 2. Use Newton-Raphson with
ridging, the Laplace likelihood approximation, and the between-within degrees of freedom
adjustment.
/* long03d05.sas */
proc glimmix data=long.cd4cat method=laplace;
effect spl = spline(time / details basis=tpf(noint)
knotmethod=percentiles(4) degree=2);
model cd4cat = spl age cigarettes drug partners depression
time*age time*depression
time*partners time*drug time*cigarettes
/ dist=multinomial link=cumlogit solution ddfm=bw;
random intercept time / subject=id type=un;
nloptions tech=nrridg;
title 'Ordinal Model of Aids Data with a Spline for Time';
run;

Selected EFFECT statement options:

DETAILS requests tables that show the knot locations and the knots associated with each
spline basis function.

BASIS=TPF specifies a truncated power function basis for the spline expansion. For splines
of degree d defined with n knots for a variable X, this basis consists of an
intercept, the polynomials X, X^2, X^3, …, X^d, and one truncated power function
for each of the n knots. The NOINT option excludes the intercept column.

KNOTMETHOD= specifies how to construct the knots for spline effects. The PERCENTILES(4)
method requests that internal knots be placed at 4 equally spaced percentiles
of the variable or variables named in the EFFECT statement.

DEGREE= specifies the degree of the spline transformation. The degree must be a
nonnegative integer. The degree is typically a small integer, such as 0, 1, 2, or 3.
The default is DEGREE=3.


Ordinal Model of Aids Data with a Spline for Time

Model Information

Data Set LONG.CD4CAT


Response Variable cd4cat
Response Distribution Multinomial (ordered)
Link Function Cumulative Logit
Variance Function Default
Variance Matrix Blocked By id
Estimation Technique Maximum Likelihood
Likelihood Approximation Laplace
Degrees of Freedom Method Between-Within

Knots for Spline Effect spl

Knot
Number time

1 -0.75839
2 0.24914
3 1.22656
4 2.55715

The four knot values are shown, which are placed at 4 equally spaced percentiles of the variable time.
In most situations, there is no subject matter knowledge on where to place the knots. However, where
the knots are placed is usually not that important to the model fit. The number of knots is usually more
important. One criterion to use in deciding the number of knots is the use of the AIC goodness-of-fit
statistic.
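As a rough illustration of what KNOTMETHOD=PERCENTILES(4) computes, the following Python sketch (not SAS, and using simulated values rather than the actual time variable in long.cd4cat) places four internal knots at the 20th, 40th, 60th, and 80th percentiles of the variable.

```python
import numpy as np

# Simulated stand-in for the time variable (hypothetical values).
rng = np.random.default_rng(0)
time = rng.normal(loc=0.5, scale=1.2, size=500)

# Four equally spaced percentiles: 20, 40, 60, and 80.
knots = np.percentile(time, [20, 40, 60, 80])
print(knots)
```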
Basis Details for Spline Effect spl

Column Power Break Knot

1 1
2 2
3 2 -0.75839
4 2 0.24914
5 2 1.22656
6 2 2.55715

The model has six terms for the spline: a linear term, a quadratic term, and four truncated power basis
functions placed at the knot values.
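To make those six columns concrete, this Python sketch (an illustration, not SAS output) builds the degree-2 truncated power function basis, without the intercept column, from the four reported knot values.

```python
import numpy as np

# Knot values reported by PROC GLIMMIX for time.
knots = [-0.75839, 0.24914, 1.22656, 2.55715]

def tpf_basis(x, knots, degree=2):
    """Truncated power function basis without the intercept column
    (BASIS=TPF(NOINT)): the polynomial terms x, ..., x^degree plus
    one truncated power term per knot."""
    x = np.asarray(x, dtype=float)
    cols = [x**d for d in range(1, degree + 1)]
    cols += [np.where(x > k, (x - k)**degree, 0.0) for k in knots]
    return np.column_stack(cols)

B = tpf_basis(np.linspace(-2, 4, 7), knots)
print(B.shape)  # 7 observations by 6 spline columns
```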
Number of Observations Read 2376
Number of Observations Used 2376

Response Profile

Ordered Total
Value cd4cat Frequency

1 1 182
2 2 741
3 3 736
4 4 717


The GLIMMIX procedure is modeling the probabilities of levels of cd4cat having lower Ordered Values
in the Response Profile table.

Dimensions

G-side Cov. Parameters 3


Columns in X 19
Columns in Z per Subject 2
Subjects (Blocks in V) 369
Max Obs per Subject 12

The constructed spline effect increased the number of columns in X compared to the last ordinal model.
Optimization Information

Optimization Technique Newton-Raphson with Ridging


Parameters in Optimization 22
Lower Boundaries 2
Upper Boundaries 0
Fixed Effects Not Profiled
Starting From GLM estimates

Iteration History

Objective Max
Iteration Restarts Evaluations Function Change Gradient

0 0 27 4879.3288001 . 868.2086
1 0 25 4600.9423456 278.38645445 202.6837
2 0 25 4547.0162638 53.92608183 49.48996
3 0 25 4541.7053663 5.31089750 11.9951
4 0 25 4541.4282046 0.27716173 1.293518
5 0 25 4541.4254663 0.00273824 0.015775
6 0 25 4541.425466 0.00000038 8.462E-6

Convergence criterion (GCONV=1E-8) satisfied.

Fit Statistics

-2 Log Likelihood 4541.43


AIC (smaller is better) 4585.43
AICC (smaller is better) 4585.86
BIC (smaller is better) 4671.46
CAIC (smaller is better) 4693.46
HQIC (smaller is better) 4619.60

The AIC is much lower compared to the last ordinal model (4585.43 versus 4633.06).


Fit Statistics for Conditional Distribution

-2 log L(cd4cat | r. effects) 3224.82

Covariance Parameter Estimates

Cov Standard
Parm Subject Estimate Error

UN(1,1) id 3.1126 0.3989


UN(2,1) id 0.3712 0.1159
UN(2,2) id 0.4383 0.08264

The estimated variances and covariances of the random effects are similar but not identical to the last
ordinal model.
Solutions for Fixed Effects

Standard
Effect cd4cat spl Estimate Error DF t Value Pr > |t|

Intercept 1 -6.5264 0.4142 365 -15.76 <.0001


Intercept 2 -2.2980 0.3678 365 -6.25 <.0001
Intercept 3 0.4161 0.3591 365 1.16 0.2472
spl 1 -0.06992 0.5300 1517 -0.13 0.8951
spl 2 0.04774 0.1747 1517 0.27 0.7847
spl 3 1.4939 0.4023 1517 3.71 0.0002
spl 4 -2.8854 0.4828 1517 -5.98 <.0001
spl 5 1.5830 0.3961 1517 4.00 <.0001
spl 6 -0.3349 0.2632 1517 -1.27 0.2033
age -0.00936 0.01480 365 -0.63 0.5276
cigarettes -0.3494 0.06208 1992 -5.63 <.0001
drug -0.4935 0.1790 1992 -2.76 0.0059
partners -0.01078 0.02055 1992 -0.52 0.6000
depression 0.02290 0.007709 1992 2.97 0.0030
age*time 0.01016 0.006723 1992 1.51 0.1308
depression*time 0.001313 0.003767 1992 0.35 0.7275
partners*time -0.01174 0.01068 1992 -1.10 0.2718
drug*time -0.05325 0.08663 1992 -0.61 0.5389
cigarettes*time 0.06077 0.03085 1992 1.97 0.0490

Notice there are six spline parameters in the model. They correspond to the linear term for time, the
quadratic term for time, and the four truncated power basis functions for time.


Type III Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

spl 6 1517 29.67 <.0001


age 1 365 0.40 0.5276
cigarettes 1 1992 31.68 <.0001
drug 1 1992 7.60 0.0059
partners 1 1992 0.28 0.6000
depression 1 1992 8.82 0.0030
age*time 1 1992 2.29 0.1308
depression*time 1 1992 0.12 0.7275
partners*time 1 1992 1.21 0.2718
drug*time 1 1992 0.38 0.5389
cigarettes*time 1 1992 3.88 0.0490

The constructed spline effect is highly significant.


Exercises

3. Fitting Generalized Linear Mixed Models with Splines


a. Fit a generalized linear mixed model to the long.wheeze data set but create a spline for age.
Specify a truncated power function basis for the spline expansion and use the NOPOWERS
option to exclude the intercept and polynomial columns. Use the knot method of list and list
the knot values as 8 and 9, specify a degree of spline expansion of 3, and request a table that
shows the knot locations and the knots associated with each spline basis function. Use R-side
random effects with an unstructured covariance structure and use an optimization technique
of Newton-Raphson with ridging.
1) Interpret the spline coefficients for age.
2) Why are AIC and BIC model fit statistics not produced?


3.06 Multiple Choice Poll

Why are AIC and BIC model fit statistics not produced in the exercise
problem?
a. The use of the RANDOM statement always suppresses the AIC and BIC
statistics.
b. Because the response variable has a binomial distribution, the AIC and
BIC statistics are always suppressed.
c. Because the linearization method was used, the AIC and BIC statistics are
not produced because there is no true likelihood.
d. PROC GLIMMIX does not support AIC and BIC model fit statistics.


3.3 GEE Regression Models

Objectives

• Explain the concepts of generalized estimating equations (GEE) models.


• Show the available correlation structures in the GENMOD procedure.
• Fit a longitudinal data model in PROC GENMOD.



GEE Regression Models

• GEE models are useful in analyzing data that arise from a longitudinal or clustered design.
• GEE models are marginal models that model the effect of the predictor variables on the
population-averaged response.
• GEE models are recommended when the inferences from the regression equation are the principal
interest and the correlation is regarded as a nuisance.


Generalized estimating equations (GEE) were developed to accommodate correlated observations within
subjects. An estimating equation is simply the equation that you solve to calculate the parameter
estimates. The extra term generalized distinguishes the GEE as the estimating equations that
accommodate the correlation structure of the repeated measurements.
GEE are marginal models where the marginal expectation (average response for observations sharing the
same covariates) is modeled as a function of the predictor variables. The parameters in marginal models
can be interpreted as the influence of the covariates on the population-averaged response. These models
are appropriate when the scientific objectives are to characterize and contrast populations of subjects.

A useful feature of the GEE is that the parameter estimates along with the covariance matrix are
consistently estimated (the standard errors are consistent estimates of the true standard errors) even
if the correlation structure within subject is not known. Therefore, the variances along with the inferences
regarding the parameter estimates are asymptotically correct (Zeger and Liang 1986).


GEE Regression Models

GEE models extend generalized linear models by allowing the following:


• the correlation of outcomes within an experimental unit to be estimated
and taken into account when estimating the regression coefficients and
their standard errors
• the calculation of the robust standard errors of the regression coefficients


Provided that the mean model is correctly specified and the measurements between subjects are
independent, robust standard errors ensure consistent inferences from a GEE regression model. This
is true even if the chosen correlation structure is incorrect or if the strength of the correlation between
measurements varies from subject to subject. Although model-based standard errors are also produced,
they are consistent only if the specified correlation structure is correct. Consequently, the robust standard
errors (which are usually larger) are usually preferred especially when the number of subjects is large.
The desired number of subjects depends on the number of predictor variables in the model. If you have
fewer than 5 predictor variables, approximately 25 subjects might be enough to use the robust standard
errors. If you have 5 to 12 predictor variables, you would need at least 100 subjects, and if you want
to be reasonably confident, you would need around 200 subjects (Stokes, Davis, and Koch 2000).
However, when the number of subjects is very small (less than 20), the model-based standard errors might
have better properties even if the specified correlation structure is wrong (Prentice 1988). This is because
the robust standard errors are asymptotically unbiased, but could be highly biased when the number
of subjects is small.


Robust standard errors are derived by the sandwich estimator of the covariance matrix of the regression
coefficients. In general, the sandwich estimator uses a matrix with the diagonal elements equal to the
individual squared residuals to estimate the common variance (the square of any residual is an estimate
of the variance at that predictor variable value). This works because the average of a lot of poor
estimators (individual squared residuals) can be a good estimator of the common variance. In fact,
Liang and Zeger (1986) showed that the robust standard errors are robust to departures of the working
correlation matrix from the true correlation structure.
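The sandwich construction is easiest to see in a simplified setting. The Python sketch below (a conceptual illustration for a linear model under an independence working structure, with simulated data, not what PROC GENMOD literally computes) forms the bread-meat-bread product from the individual squared residuals.

```python
import numpy as np

# Simulated data with heteroscedastic errors (all values hypothetical):
# model-based OLS standard errors are then wrong, but the sandwich
# estimator remains consistent.
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n) * (1 + np.abs(X[:, 1]))

beta = np.linalg.solve(X.T @ X, X.T @ y)
r = y - X @ beta

bread = np.linalg.inv(X.T @ X)
# "Meat": X' diag(r_i^2) X -- each squared residual is a poor estimate
# of the variance at that observation, but averaging many of them
# gives a usable covariance estimate.
meat = X.T @ (X * (r**2)[:, None])
robust_cov = bread @ meat @ bread
robust_se = np.sqrt(np.diag(robust_cov))
print(robust_se)
```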

Variance-Covariance Matrix for GEE Models


Subject   Time   X   Y   Block
   1        1    4   1
   1        2    4   0    V1
   1        3    4   0
   2        1    2   0
   2        2    2   0    V2
   2        3    2   0
   3        1    6   1
   3        2    6   0    V3
   3        3    6   0
   4        1    8   1
   4        2    8   1    V4
   4        3    8   1

All entries of the variance-covariance matrix outside the blocks V1 through V4 are 0.

In GEE regression models, the number of observations is not the number of subjects but rather
the number of measurements taken on all the subjects (similar to the layout for PROC MIXED).
The variance-covariance matrix is now a block-diagonal matrix in which the observations within each
block (the block corresponds to a subject) are assumed to be correlated and the observations outside
of the blocks are assumed to be independent. In other words, the subjects are still assumed to be
independent of each other and the measurements within each subject are assumed to be correlated.
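The block-diagonal assembly can be sketched directly. This Python fragment (an illustration, not SAS) stacks one covariance block per subject, matching the four-subject, three-measurement layout above; the exchangeable correlation value is hypothetical.

```python
import numpy as np

def block_diagonal(blocks):
    """Assemble per-subject covariance blocks into the block-diagonal
    matrix V; entries outside the blocks are zero because different
    subjects are assumed independent."""
    size = sum(b.shape[0] for b in blocks)
    V = np.zeros((size, size))
    start = 0
    for b in blocks:
        m = b.shape[0]
        V[start:start + m, start:start + m] = b
        start += m
    return V

# Four subjects with three measurements each, as in the layout above;
# each block is exchangeable with a hypothetical correlation of 0.4.
rho = 0.4
block = np.full((3, 3), rho) + (1 - rho) * np.eye(3)
V = block_diagonal([block] * 4)
print(V.shape)  # (12, 12)
```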


Quasi-Likelihood Estimation

• Maximum likelihood estimation requires the specification of the


distribution of the response variable.
• For repeated discrete outcomes, it might be difficult to specify the
distribution.
• GEE regression models use the method of quasi-likelihood estimation.
• This method does not require the specification of the distribution of the
response variable.
• No log-likelihood is calculated for the GEE model.


Generalized linear models use the likelihood function in statistical inference. However, the distribution
of the response variable must be specified. For repeated measures that are discrete outcomes, it might
be difficult to specify the appropriate theoretical probability distribution. Whereas generalized linear
mixed models use pseudo-likelihood or maximum likelihood methods of estimation, GEE regression
models use the quasi-likelihood method of estimation. This estimation method requires only that you
specify the relationships between the response mean and covariates and between the response mean
and variance. Quasi-likelihood estimation has many of the advantages of maximum likelihood estimation
without requiring full distributional assumptions. This is why the GEE approach is applicable to several
types of response variables (Zeger and Liang 1986).
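A small example of specifying only a mean-variance relationship: for overdispersed counts, a quasi-likelihood analysis might assume Var(Y) = φμ and estimate φ by the method of moments, with no full distribution ever specified. The Python sketch below uses simulated data (all values hypothetical, not from the course data sets).

```python
import numpy as np

# Overdispersed counts: a negative binomial draw whose variance is
# about twice its mean, so the Poisson assumption Var(Y) = mu fails.
rng = np.random.default_rng(3)
mu = np.full(300, 4.0)                          # fitted means (hypothetical)
y = rng.negative_binomial(4, 4.0 / (4.0 + mu))  # variance = 2 * mu here

# Method-of-moments estimate of the dispersion phi under
# Var(Y) = phi * mu: Pearson chi-square divided by its degrees
# of freedom.
phi_hat = np.sum((y - mu)**2 / mu) / (len(y) - 1)
print(round(phi_hat, 2))
```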


QIC Statistic

• The quasi-likelihood information criterion (QIC) is a modification of the


Akaike information criterion (AIC) to apply to models fit by GEEs.
• The likelihood is replaced by the quasi-likelihood and the penalty term takes
a modified form.
• The QIC can be used for choosing the working correlation matrix in the
estimating equation and for selecting predictor variables.


The QIC statistic, which is based on the quasi-likelihood, is computed and can be used for model
assessment for GEE models. The QIC statistic was developed by Pan (2001) as a modification of the AIC
statistic. PROC GENMOD also computes an approximation to QIC defined by Pan (2001) called QICu.
QIC is appropriate for selecting regression models and working correlations, whereas QICu is appropriate
only for selecting regression models.


GEE Fitting Algorithm

1. Fit a generalized linear model assuming independence.


2. Compute the parameter estimates of the working correlation matrix
based on the Pearson standardized residuals, the assumed structure of
the correlation matrix, and the parameter estimates from the mean
model.
3. Refit the regression model using an algorithm that incorporates the
parameters from the working correlation matrix.
4. Keep alternating between steps 2 and 3 until model convergence is
achieved.


The process of fitting a GEE model can be summarized in a series of steps. First, a regression model
is fitted, which assumes independence and the Pearson standardized residuals are computed. These
residuals are then used to estimate the parameters of the correlation matrix, which characterizes
the correlation of the observations within subject. The correlation parameters are then incorporated into
the GEE estimating equations, which generates new values for the regression coefficients and new
Pearson residuals. These residuals are then used to re-estimate the correlation parameters. The cyclical
process continues until the parameter estimates stabilize and model convergence is achieved.
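The steps above can be sketched for the simplest case. This Python fragment (a conceptual illustration for an identity-link model with an exchangeable working correlation and simulated data; real GEE software handles general links and variance functions) alternates between re-estimating the coefficients and re-estimating the correlation parameter from the residuals.

```python
import numpy as np

def gee_linear_exchangeable(X, y, groups, n_iter=10):
    """Minimal sketch of the GEE cycle: step 1 fits assuming
    independence; steps 2 and 3 then alternate until stable."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)      # step 1: independence fit
    ids = np.unique(groups)
    rho = 0.0
    for _ in range(n_iter):
        r = y - X @ beta
        sigma2 = np.mean(r**2)
        # Step 2: moment estimate of the common within-subject
        # correlation from products of residual pairs.
        num, den = 0.0, 0
        for i in ids:
            ri = r[groups == i]
            num += ri.sum()**2 - ri @ ri          # sum over pairs j != k
            den += len(ri) * (len(ri) - 1)
        rho = num / (den * sigma2)
        # Step 3: refit with the working correlation folded in.
        p = X.shape[1]
        A, b = np.zeros((p, p)), np.zeros(p)
        for i in ids:
            Xi, yi = X[groups == i], y[groups == i]
            m = len(yi)
            Wi = np.linalg.inv(np.full((m, m), rho) + (1 - rho) * np.eye(m))
            A += Xi.T @ Wi @ Xi
            b += Xi.T @ Wi @ yi
        beta = np.linalg.solve(A, b)              # step 4: repeat until stable
    return beta, rho

# Hypothetical data: 50 subjects, 4 correlated measurements each.
rng = np.random.default_rng(2)
groups = np.repeat(np.arange(50), 4)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = (X @ np.array([1.0, 0.5])
     + np.repeat(rng.normal(size=50), 4)         # shared subject effect
     + 0.5 * rng.normal(size=200))
beta, rho = gee_linear_exchangeable(X, y, groups)
print(beta, rho)
```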

3.07 Multiple Choice Poll

Which one of the following statements is true regarding GEE regression


models?
a. GEE models can estimate subject-specific regression coefficients.
b. The robust standard errors are useful when the number of subjects is
small.
c. The quasi-likelihood estimation method does not require the
specification of the distribution of the response variable.
d. The likelihood-ratio test can be used to test the significance of predictor
variables.



GENMOD Procedure

General form of the GENMOD procedure:

PROC GENMOD DATA=SAS-data-set <options>;
   CLASS variables </ options>;
   MODEL response=predictors </ options>;
   REPEATED SUBJECT=subject-effect </ options>;
   ESTIMATE 'label' effect values … </ options>;
   ASSESS VAR=(effect) | LINK </ options>;
   OUTPUT <OUT=SAS-data-set>
      <keyword=name … keyword=name>;
RUN;


PROC GENMOD can be used to fit GEE models to longitudinal data. The layout of the data is similar
to PROC MIXED where the number of observations is equal to the number of measurements taken
on all the subjects. The variance-covariance matrix is a block diagonal matrix in which the observations
within each block are assumed to be correlated and the observations outside of the blocks are assumed
to be independent.
Selected GENMOD procedure statements:
CLASS specifies the classification variables to be used in the analysis. If the CLASS statement is
used, it must appear before the MODEL statement.
MODEL specifies the response variable and the predictor variables. You can specify the response
in the form of a single variable or in the form of a ratio of two variables called
events/trials. This form is applicable only to summarized binomial response data.

REPEATED invokes the GEE method, specifies the correlation structure, and controls the displayed
output from the longitudinal model.

ESTIMATE provides a means for obtaining a test for a specified hypothesis concerning the model
parameters. It can also be used to produce the odds ratio estimate along with the 95%
confidence limits.
ASSESS computes and plots, using ODS Graphics, model-checking statistics based on aggregates
of residuals.
OUTPUT creates a new SAS data set that contains all the variables in the input data set and,
optionally, the estimated linear predictors and their standard error estimates, the weights
for the Hessian matrix, predicted values of the mean, confidence limits for predicted
values, and residuals.


Selected CLASS statement options:


PARAM= specifies the parameterization method for the classification variable or variables. Design
matrix columns are created from CLASS variables according to the following coding
schemes:
EFFECT specifies effect coding.
GLM specifies less than full rank, reference-cell coding. This coding is the default.
ORDINAL specifies the cumulative parameterization for an ordinal CLASS variable.
POLY specifies polynomial coding.
REF specifies reference cell coding.
ORTHEFFECT orthogonalizes PARAM=EFFECT.
ORTHORDINAL orthogonalizes PARAM=ORDINAL.
ORTHPOLY orthogonalizes PARAM=POLY.
ORTHREF orthogonalizes PARAM=REF.
REF= specifies the reference cell for PARAM=EFFECT, PARAM=REF, and their
orthogonalizations.

Selected REPEATED statement options:


SUBJECT= identifies subjects in the input data set. This is a required option and the variables used
in defining the subjects must be listed in the CLASS statement. The input data set does
not need to be sorted by subject.
TYPE= specifies the structure of the working correlation matrix used to model the correlation
of responses from subjects. The default working correlation type is the independent
correlation structure.


PROC GLIMMIX versus PROC GENMOD

PROC GLIMMIX
• can accommodate random effects
• fits unit-specific models and population-average models
• provides (bias-adjusted) sandwich estimators of the covariance matrix of
the fixed effect that are unbiased even when the number of clusters is small
PROC GENMOD
• cannot accommodate random effects
• fits only population-average models
• provides sandwich estimators that are unbiased only when the number of
clusters is large

The robust standard errors computed in PROC GLIMMIX have advantages over the robust standard
errors computed in PROC GENMOD because the classical sandwich estimator, as implemented in GEEs
in PROC GENMOD, tends to underestimate the variance of the fixed effects, particularly if the number
of subjects (or clusters) is small.
The subtle difference between the GEE-like estimates in PROC GLIMMIX and the GEE estimates
in PROC GENMOD is that the parameter estimates are obtained using the moment-based method
in PROC GENMOD, whereas the parameter estimates are obtained using the pseudo-likelihood method
in PROC GLIMMIX. Both approaches (PROC GENMOD and the EMPIRICAL option in PROC
GLIMMIX) assume that the missing data is missing completely at random (MCAR).


Effect Coding
Design
Variables
Variable Value Label 1 2
Income 1 Low 1 0
2 Medium 0 1
3 High -1 -1


To obtain the odds ratio (the odds ratio compares the odds of outcome in one group to the odds of
outcome in another group) for a one-unit increase in the predictor variable, an ESTIMATE statement
along with the EXP option has to be used. However, you need to be able to define the coefficients
in the ESTIMATE statement to obtain the odds ratio. For odds ratios involving class variables, there are
several coding schemes available for the design variables created in the CLASS statement.
For effect coding (also called deviation from the mean coding), the number of design variables created is
the number of levels of the CLASS variable minus 1. For example, if the variable income has three
levels, only two design variables are created. By default, all the design variables have a value
of –1 for the last level of the CLASS variable. Parameter estimates of the CLASS main effects using this
coding scheme estimate the difference between the effect of each level and the average effect over all
levels.


Reference Cell Coding


Design
Variables
Variable Value Label 1 2
Income 1 Low 1 0
2 Medium 0 1
3 High 0 0


For reference cell coding, the number of design variables created is the number of levels of the CLASS
variable minus 1 and the parameter estimates of the CLASS main effects estimate the difference between
the effect of each level and the last level. For example, the effect for the level low would estimate
the difference between low and high. You can choose the reference level with the REF= option.


GLM Coding
Design
Variables
Variable Value Label 1 2 3
Income 1 Low 1 0 0
2 Medium 0 1 0
3 High 0 0 1


GLM coding uses less than full rank parameterization for variables in a CLASS statement.
This parameterization constructs one design variable for each level of the predictor variable. Therefore,
income would have three design variables where the first design variable is 1 if low, 0 otherwise, the
second design variable is 1 if medium, 0 otherwise, and the third design variable is 1 if high, 0 otherwise.

 The rank of a matrix is defined as the maximum number of linearly independent row vectors
in the matrix. If the model has a design matrix that is not full rank, there are an infinite number
of solutions for the parameter estimates.
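The three parameterizations can be reproduced in a few lines. This Python sketch (an illustration, not SAS) builds the design-variable rows for the three-level income variable under effect, reference cell, and GLM coding, with High (the last level) as the reference for the first two schemes.

```python
levels = ["Low", "Medium", "High"]

def design_rows(coding):
    """Design-variable rows for a three-level CLASS variable under
    the coding schemes shown above."""
    rows = {}
    for i, level in enumerate(levels):
        if coding == "GLM":          # one indicator column per level
            rows[level] = [int(j == i) for j in range(3)]
        elif coding == "REF":        # reference level gets all zeros
            rows[level] = [int(j == i) for j in range(2)]
        elif coding == "EFFECT":     # reference level gets all -1s
            rows[level] = [-1, -1] if i == 2 else [int(j == i) for j in range(2)]
    return rows

for coding in ("EFFECT", "REF", "GLM"):
    print(coding, design_rows(coding))
```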


Computing Coefficients to Estimate Odds Ratios Using


Reference Cell Coding

1. State the odds ratio to estimate.


Odds ratio for Low Income versus Medium Income

2. Write out the equations for the odds of Low Income and the odds of Medium Income.

   Odds(Low) = e^(β0 + β1*1 + β2*0)
   Odds(Medium) = e^(β0 + β1*0 + β2*1)

To obtain coefficients for an odds ratio comparing low income to medium income for a logistic regression
model, first write out the equation for the odds for low income and the odds for medium income. For
reference cell coding, two coefficients are needed because there are two design variables.

Computing Coefficients to Estimate Odds Ratios Using


Reference Cell Coding

3. Compute the odds ratio in terms of the odds for Low Income versus the
odds for Medium Income.
   Odds(Low) / Odds(Medium) = e^(β0 + β1*1 + β2*0) / e^(β0 + β1*0 + β2*1)

                            = e^((β0 + β1*1 + β2*0) − (β0 + β1*0 + β2*1))

                            = e^(β1*1 + β2*(−1))

4. Identify the coefficients for the effects.

   Income 1 –1;

Compute the odds ratio in terms of the odds of the group in the numerator and the odds of the group
in the denominator. Solving the expression algebraically shows that the coefficients for the odds ratio
comparing low income versus medium income are 1 and –1.
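Numerically, applying those contrast coefficients and exponentiating is all the EXP option does. This Python sketch uses made-up coefficient values (hypothetical, for illustration only) for the two income design variables under reference cell coding.

```python
import math

# Hypothetical fitted coefficients for the two income design variables
# (reference cell coding, High as the reference level).
beta_low, beta_medium = 0.62, 0.25

# 'Low vs. Medium' contrast: coefficients 1 and -1, then exponentiate
# (what ESTIMATE ... Income 1 -1 / exp; reports as the odds ratio).
log_odds_ratio = 1 * beta_low + (-1) * beta_medium
odds_ratio = math.exp(log_odds_ratio)
print(round(odds_ratio, 3))
```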


The ESTIMATE Statement

ESTIMATE 'Low vs. Medium' Income 1 -1 / exp;
ESTIMATE 'Low vs. High' Income 1 0 / exp;
ESTIMATE 'Medium vs. High' Income 0 1 / exp;


The ESTIMATE statement consists of the following components:


label identifies the estimate in the output. A label is required for every estimate specified
and it must be enclosed in quotation marks.
effect identifies an effect that appears in the MODEL statement. You do not need to include
all the effects that are included in the MODEL statement.
values identifies the coefficients associated with the effect. To correctly specify your estimate,
it is crucial to know the ordering of the parameters within each effect and the variable levels
associated with each parameter.
If an effect is not specified in the ESTIMATE statement, all of its coefficients are set to 0. If too many
values are specified for an effect, the extra ones are ignored. If too few values are specified, the remaining
ones are set to 0. The EXP option requests that the exponentiated contrast be computed, thus, providing
the odds ratio.


3.08 Multiple Choice Poll

The model has time and the quadratic effect of time as predictor variables. To
estimate the odds ratio for a 3-unit increase in time (time 0 to 3), the
coefficients for the ESTIMATE statement would be which of the following?
a. time 3
b. time 3 time*time 9
c. time 3 time*time 3
d. time -3 time*time -9


Working Correlation Matrix

• The user chooses one of the available working correlation matrices in PROC
GENMOD.
• It is recommended that you choose a working correlation matrix that
approximates the average dependence among repeated measurements
within subject.
• Choosing the correct structure might increase efficiency.


When fitting a GEE model in PROC GENMOD, you should decide what is a reasonable model for
the correlation between measurements within subject. PROC GENMOD offers several common structures
to use to model the working correlation matrix. The choice of the structure should be consistent with the
empirical correlations. Liang and Zeger (1986) showed that there could be important gains in efficiency
by correctly specifying the working correlation matrix. However, the loss of efficiency is inconsequential
when the number of clusters is large (Davis 2002).


Independent Correlation Structure

Time     1     2     3     4
  1     1.0    0     0     0
  2           1.0    0     0
  3                 1.0    0
  4                       1.0


The independent correlation structure forces the off-diagonal correlations to be 0. Therefore, no working
correlation structure is estimated in this case. Under this constraint, the coefficients and model-based
standard errors (requested by the MODELSE option in the REPEATED statement) are the same as those
reported in the LOGISTIC procedure. However, PROC GENMOD, by default, computes robust standard
error estimates. These estimates take into account the correlations among the repeated measurements and
usually are different from the model-based standard errors assuming independence.
The independent correlation structure might be a good choice when you have a large number of subjects
with few measurements per subject. The correlation influence is often small enough to have little impact
on the regression coefficients, but the robust standard errors will give the correct inferences. This model
gives consistent estimates of the parameters and standard errors when the mean model is correctly
specified (Davis 2002).
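As a sketch of the syntax (borrowing the keratotomy data set used later in this section, with a reduced set of predictors), the independent structure with both robust and model-based standard errors could be requested as follows:

```sas
proc genmod data=long.keratotomy desc;
   class patientid;
   model unstable = age diameter visit / dist=bin;
   /* TYPE=IND: no working correlations are estimated; MODELSE  */
   /* prints the model-based standard errors alongside the      */
   /* default robust (empirical) ones for comparison            */
   repeated subject=patientid / type=ind modelse;
run;
```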


1-Dependent Correlation Structure

1.0 1 0 0
1.0 1 0
1.0 1

1.0


In the 1-dependent correlation structure, measurements are correlated if they are one time point apart.
They are uncorrelated if they are two or more time points apart.

2-Dependent Correlation Structure

1.0 1 2 0
1.0 1 2

1.0 1

1.0



For the 2-dependent correlation structure, measurements are correlated if they are two or fewer time periods
apart. Measurements that are one time period apart have a different correlation from measurements that are
two time periods apart.
These last two correlation structures are generally called m-dependent correlation structures. The m
represents how many time periods apart the measurements remain correlated. Therefore, a 5-dependent
correlation structure indicates that measurements are correlated if they are five or fewer time periods
apart. This correlation structure is similar to the banded Toeplitz structure in PROC MIXED.

The m-dependent correlation structure assumes equally spaced time points and the same time points
across subjects.
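In PROC GENMOD, an m-dependent working correlation is requested with TYPE=MDEP(m) in the REPEATED statement. A minimal sketch for a 2-dependent structure (the data set and variable names here are placeholders, not from the course data):

```sas
proc genmod data=long.study desc;
   class id;
   model y = time / dist=bin;
   /* TYPE=MDEP(2): measurements two or fewer time points apart */
   /* are correlated; more distant pairs are uncorrelated       */
   repeated subject=id / type=mdep(2);
run;
```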

Exchangeable Correlation Structure

1.0   ρ     ρ     ρ
      1.0   ρ     ρ
            1.0   ρ
                  1.0


The exchangeable correlation structure, which is similar to the compound symmetry structure in PROC
MIXED, assumes that the correlations are equal across time points. Although this structure might not
be justified in longitudinal studies, it is often reasonable in situations where the repeated measurements
are not obtained over time (Allison 1999). For example, the exchangeable correlation structure might
be a good choice if the independent experimental units were classrooms and the responses obtained were
from each student in the classroom (Davis 2002).
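For the classroom scenario just described, a hedged sketch (the data set and variable names here are hypothetical) would make the classroom the subject (cluster) effect:

```sas
proc genmod data=classrooms desc;
   class classroom;
   model pass = method / dist=bin;
   /* TYPE=EXCH: every pair of students within a classroom      */
   /* shares the same correlation, since the measurements have  */
   /* no natural time ordering                                  */
   repeated subject=classroom / type=exch;
run;
```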


Autoregressive AR(1) Correlation Structure

1.0   ρ     ρ²    ρ³
      1.0   ρ     ρ²
            1.0   ρ
                  1.0


The first-order autoregressive structure specifies that the correlation be raised to the power of the number
of time points the measurements are apart. For example, if the measurements are three time points apart,
the correlation is ρ³. The AR(1) model might be a good choice in a longitudinal model where
measurements are taken repeatedly over time. One shortcoming is that the correlation decays very quickly
as the spacing between measurements increases (Davis 2002).

The AR(1) correlation structure assumes equally spaced time points and the same time points across
subjects.


Unstructured Correlation Structure

1.0 12 13 14

1.0  23  24
1.0  34
1.0


Finally, the unstructured correlation structure is completely unspecified. Therefore, there are
t(t − 1)/2 parameters to be estimated. The unstructured working correlation structure is useful only when
there are very few observation times. If there were many time points, you would probably want to impose
some structure on the correlation matrix by selecting one of the other correlation structures (Allison 1999).
Furthermore, when there are missing values or a varying number of observations per subject, a
nonpositive definite matrix might occur, which would stop the parameter estimation process (Stokes,
Davis, and Koch 2000).


Choice of Working Correlation Structure

• The nature of the problem can suggest the choice of correlation structure.
• If the number of observations is small and there is an equal number of time
points per subject, unstructured is recommended.
• If repeated measurements are obtained over time, AR(1) or m-dependent is
recommended.
• If repeated measurements are not naturally ordered, exchangeable is
recommended.
• If the number of clusters is large and the number of measurements is small,
independent structure might suffice.


If you do not know which working correlation structure to choose, one recommendation is to compare
the parameter estimates and standard errors from several correlation structures. This might indicate
whether there is sensitivity to the misspecification of the correlation structure. PROC GENMOD also
enables you to choose a user-defined correlation matrix.
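One way to carry out such a comparison is simply to refit the same model under several TYPE= values and compare the parameter estimates, robust standard errors, and QIC (discussed later in this section) across fits. A minimal sketch using the keratotomy example from later in this section, with only the TYPE= value changing between runs:

```sas
proc genmod data=long.keratotomy desc;
   class patientid;
   model unstable = age diameter visit / dist=bin;
   /* rerun with type=ind, type=exch, type=ar(1), and type=unstr, */
   /* then compare estimates, standard errors, and QIC            */
   repeated subject=patientid / type=exch;
run;
```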


What If the Assumed Correlation Structure Is Wrong?

(Figure: grid of repeated measurements arranged by subjects and time points)

If the estimation of the regression coefficients is the primary objective of your study and there are a large
number of clusters (approximately 200) and a small number of time points, then you should not spend
much time choosing a correlation structure. If the mean model is correctly specified, the GEE method for
the parameter estimates was designed to guarantee consistency of the parameter estimates under minimal
assumptions about the time dependence (Diggle, Heagerty, Liang, and Zeger 2002). Furthermore, the loss
of efficiency from an incorrect choice of the working correlation structure is inconsequential when the
number of subjects is large (Davis 2002).

If there are a small number of clusters, then you should spend time choosing a correlation structure. Both
the model and the correlation structure must be approximately correct to obtain valid inferences (Diggle,
Heagerty, Liang, and Zeger 2002). In this situation it is important to use the model-based standard errors
rather than the robust standard errors (Prentice 1988). Choosing the correct correlation structure will also
result in increased efficiency (Davis 2002).


Missing Values for GEE Models

• Missing data mechanism must be missing completely at random (MCAR).


• Intermittent missing values are not as problematic as dropouts.
• Avoid dropouts by taking energetic steps to retain subjects in the study.
• Collect covariates that are useful for predicting missing values.
• Collect as much information as possible about the reasons for missing
values.
• Conduct a sensitivity analysis when you have informative dropouts.


Missing values that occur intermixed with nonmissing values are called intermittent missing values.
If these missing values are missing completely at random (MCAR), then the consistency results
established by Liang and Zeger (1986) hold. A simple check of MCAR is to divide the subjects into two
groups: those with a complete set of measurements and those with missing measurements. If the MCAR
assumption holds, then both groups (with their measurements) should be random samples of the same
population of measurements. In other words, the probability of missing is independent of the observed
measurements and the measurements that would have been available had they not been missing. The t-
tests for location and more general tests of equality of distribution can be used to test the MCAR
assumption (Little 1995). Tests of MCAR for repeatedly measured categorical data were discussed by
Park and Davis (1993).
Some intermittent missing values can arise due to censoring rules. For example, values outside a stated
range might be simply unreliable because of the limitations of the measuring techniques in use (Diggle,
Heagerty, Liang, and Zeger 2002). Methods for handling censored data in correlation data structures are
addressed in Laird (1988) and Hughes (1999). Intermittent missing values can also be related to the
outcome. For example, a patient might miss an appointment because of an adverse reaction to the
treatment. The fact that the subject remains in the study means that the investigator should have the
opportunity to ascertain the reason for the missing appointment and take corrective action accordingly
(Diggle, Heagerty, Liang, and Zeger 2002, Little 1995).
If all the missing values occur after a certain time point for a subject, then the missing values are called
dropouts. Dropouts are a more significant problem than intermittent missing values because the subject
is usually withdrawn for reasons directly or indirectly connected to the outcome and is lost to
follow-up. If you treat the dropouts as MCAR when they are in fact informative dropouts, the parameter
estimates will be biased (Diggle and Kenward 1994).


Diggle, Heagerty, Liang, and Zeger (2002) state that “An emerging consensus is that analysis of data with
potentially informative dropouts necessarily involves assumptions that are difficult, or even impossible,
to check from the observed data. This suggests that it would be unwise to rely on the precise conclusions
of an analysis based on a particular informative dropout model.” They recommend that a sensitivity
analysis be conducted on the informative dropout model. This provides some protection against the
possibility that conclusions reached from a random dropout model are critically dependent on the validity
of MCAR. Scharfstein et al. (1999) provide a discussion of how such sensitivity analyses might
be conducted.

Problems with GEE Models

• The correct specification of marginal mean and variance is required.


• Missing data cannot depend on the observed or unobserved responses.
• A moderate to large number of independent subjects is required.


Because the GEE method is semiparametric (not nonparametric), the mean model and variance function
should be correctly specified. Thus, the consistency results of the GEE models depend on the correct
specification of the model for the mean. Furthermore, robust standard errors should be used only with
a large number of subjects.

Park (1993) compared GEE estimators with normal-theory maximum likelihood estimators and reported
that GEE estimators were more sensitive to the occurrence of missing data.

Several studies have shown that the bias and efficiency of the GEE method can depend on the number
of subjects, number of repeated measurements, magnitudes of the correlations among repeated
measurements, and number and type of covariates. Lipsitz et al. (1991) reported that the parameter
estimates for a binary GEE model were biased slightly upward and the bias increased as the magnitude
of the correlation increased. Paik (1988) reported that as the number of covariates increases, the number
of subjects needs to increase for the point estimates and confidence intervals to perform satisfactorily
(with 4 repeated measurements and 4 covariates, he recommended a sample size greater than 50).

Note: One solution to the MCAR limitation is to use the MI procedure to impute the missing values;
PROC MI invokes the less restrictive MAR assumption. Then fit the GEE model in PROC GENMOD on the
completed data.
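A hedged sketch of that workflow follows (variable choices are illustrative, and the FCS logistic method for the binary response is one reasonable option; a full analysis would also combine results across imputations, for example with PROC MIANALYZE):

```sas
/* Step 1: impute the missing binary responses under the MAR assumption */
proc mi data=long.keratotomy out=keratotomy_mi seed=27513 nimpute=5;
   class unstable;
   fcs logistic(unstable);   /* logistic imputation model for the binary outcome */
   var age diameter visit unstable;
run;

/* Step 2: fit the GEE model to each completed data set */
proc genmod data=keratotomy_mi desc;
   by _Imputation_;
   class patientid;
   model unstable = age diameter visit / dist=bin;
   repeated subject=patientid / type=unstr;
run;
```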


3.09 Multiple Choice Poll

When you have a large sample size, what condition is not necessary for the
GEE model to have consistent parameter estimates and standard errors?
a. Missing values are MCAR
b. Correct specification of the model for the mean
c. Correct specification of the variance function
d. Correct specification of the correlation structure



Longitudinal Models Using GEE

Example: Fit a GEE model on the long.keratotomy data set specifying the unstructured correlation
structure, reference cell coding for gender with female as the reference cell, and request the
Type 3 score statistics, the final working correlation matrix, the initial maximum likelihood
parameter estimates table, and the model-based standard errors. Also compute the odds ratio
for a one-unit decrease in diameter, a one-unit increase in visit, a ten-unit increase in age,
and an odds ratio comparing males to females.
/* long03d06.sas */
proc genmod data=long.keratotomy desc;
   class patientid gender (param=ref ref='Female');
   model unstable=age diameter gender visit / dist=bin type3;
   repeated subject=patientid / corrw modelse type=unstr printmle;
   estimate '10 year increase in age' age 10 / exp;
   estimate '1 mm decrease in diameter' diameter -1 / exp;
   estimate 'male vs. female' gender 1 / exp;
   estimate '1 year increase in followup' visit 1 / exp;
   title 'GEE Model of Radial Keratotomy Surgery';
run;
Selected PROC GENMOD statement option:
DESC reverses the sort order for the levels of the outcome variable.
Selected CLASS statement option:
PARAM= specifies the parameterization method for the classification variable or variables.
The default is PARAM=GLM.
REF= specifies the reference level for PARAM=EFFECT, PARAM=REF, and their
orthogonalizations. For an individual variable, you can specify the level of the
variable to use as the reference level. For a global or individual variable, you can
use one of the following keywords. The default is REF=LAST.
FIRST designates the first ordered level as the reference.
LAST designates the last ordered level as the reference.
Selected MODEL statement options:
DIST= specifies the built-in probability distribution to use in the model. The default link
function for the binomial distribution is the logit link function.
TYPE3 requests that Type 3 score statistics be computed for each effect that is specified
in the MODEL statement. Likelihood ratio statistics are produced for models
that are not GEE models.
Selected REPEATED statement options:
CORRW specifies that the final working correlation matrix be printed.
MODELSE displays an analysis of parameter estimates table using model-based standard
errors.


PRINTMLE displays an analysis of maximum likelihood parameter estimates table.


Selected ESTIMATE statement option:
EXP requests that the exponentiated contrast, its standard error, and the confidence
bounds be computed.

Note: If the repeated measurements are not in the proper order or if there are missing time points for
some subjects, then the WITHIN= option in the REPEATED statement should be used. This
option names a variable that specifies the order of measurements within subjects. Variables used
in the WITHIN= option must also be listed in the CLASS statement.
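A sketch of that usage (visitnum is a hypothetical classification variable recording the measurement occasion; it is not part of the course data set):

```sas
proc genmod data=long.keratotomy desc;
   class patientid visitnum;
   model unstable = age diameter visit / dist=bin;
   /* WITHIN= aligns measurements by occasion so that the       */
   /* working correlation is estimated for the correct pairs    */
   repeated subject=patientid / type=unstr within=visitnum;
run;
```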
GEE Model of Radial Keratotomy Surgery

The GENMOD Procedure

Model Information

Data Set LONG.KERATOTOMY


Distribution Binomial
Link Function Logit
Dependent Variable unstable

Number of Observations Read 1086


Number of Observations Used 1046
Number of Events 412
Number of Trials 1046
Missing Values 40

The Model Information table provides information about the data set and the model.
Class Level Information

Design
Class Value Variables

gender Female 0
Male 1

The Class Level Information table displays the levels of the class variables.
Response Profile

Ordered Total
Value unstable Frequency

1 1 412
2 0 634

PROC GENMOD is modeling the probability that unstable='1'.

The Response Profile table displays the levels of the response variable. Notice PROC GENMOD shows
which value of the response variable is being modeled.


Parameter Information

Parameter Effect gender

Prm1 Intercept
Prm2 age
Prm3 diameter
Prm4 gender Male
Prm5 visit

Algorithm converged.

The Parameter Information table displays the names of the parameters.


Analysis Of Initial Parameter Estimates

Standard Wald 95% Confidence Wald


Parameter DF Estimate Error Limits Chi -Square Pr > ChiSq

Intercept 1 1.3844 0.7778 -0.1400 2.9088 3.17 0.0751


age 1 0.0105 0.0103 -0.0096 0.0306 1.04 0.3068
diameter 1 -1.1957 0.1914 -1.5709 -0.8204 39.00 <.0001
gender Male 1 0.5611 0.1524 0.2624 0.8598 13.55 0.0002
visit 1 0.3188 0.0213 0.2770 0.3605 223.92 <.0001
Scale 0 1.0000 0.0000 1.0000 1.0000

NOTE: The scale parameter was held fixed.

The Analysis of Initial Parameter Estimates table (displayed by the PRINTMLE option) shows the
parameter estimates when the observations are treated as independent. These parameter estimates are used
as the starting values for the GEE solution. The inferences from this table should be used only as a
comparison to the inferences from the GEE model.
GEE Model Information

Correlation Structure Unstructured


Subject Effect patientid (362 levels)
Number of Clusters 362
Clusters With Missing Values 25
Correlation Matrix Dimension 3
Maximum Cluster Size 3
Minimum Cluster Size 0

Algorithm converged.

The GEE Model Information table displays information about the longitudinal model fit with GEE.
Because the TYPE=UNSTR option is requested, the unstructured correlation structure is used. Furthermore,
because there are 362 patients, there are 362 clusters. Notice that the data are not complete, as 25 clusters
have missing values.
Working Correlation Matrix

Col1 Col2 Col3

Row1 1.0000 0.2753 -0.0796


Row2 0.2753 1.0000 0.2621
Row3 -0.0796 0.2621 1.0000


Because the unstructured correlation structure is used, the correlations between time points are all
estimated. Because there are a relatively large number of clusters, the choice of correlation structures will
not significantly affect the results of the GEE model.
GEE Fit Criteria

QIC 1088.1506
QICu 1085.3915

The quasi-likelihood information criterion (QIC) is a modification of the Akaike information criterion
(AIC) that applies to models fit by GEEs. The QIC is appropriate for selecting both regression models and
working correlation structures; lower values indicate a better-fitting model. PROC GENMOD also
computes an approximation to QIC called QICu, which is appropriate only for selecting regression
models.
Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates

Standard 95% Confidence


Parameter Estimate Error Limits Z Pr > |Z|

Intercept 1.1199 0.9390 -0.7205 2.9602 1.19 0.2330


age 0.0133 0.0113 -0.0089 0.0354 1.18 0.2393
diameter -1.1429 0.2258 -1.5855 -0.7004 -5.06 <.0001
gender Male 0.5210 0.1785 0.1711 0.8710 2.92 0.0035
visit 0.3265 0.0225 0.2823 0.3706 14.49 <.0001

Analysis Of GEE Parameter Estimates


Model-Based Standard Error Estimates

Standard 95% Confidence


Parameter Estimate Error Limits Z Pr > |Z|

Intercept 1.1199 0.8826 -0.6100 2.8498 1.27 0.2045


age 0.0133 0.0117 -0.0096 0.0362 1.14 0.2550
diameter -1.1429 0.2167 -1.5677 -0.7182 -5.27 <.0001
gender Male 0.5210 0.1729 0.1822 0.8599 3.01 0.0026
visit 0.3265 0.0219 0.2836 0.3694 14.91 <.0001
Scale 1.0000 . . . . .
NOTE: The scale parameter was held fixed.

Because the MODELSE option is used, the Analysis Of GEE Parameter Estimates table shows both the
empirical standard error estimates and the model-based standard error estimates. The empirical standard
error estimates are robust estimates that do not depend on the correctness of the structure imposed on the
working correlation matrix. The model-based standard error estimates are based directly on the assumed
correlation structure. The model-based standard errors are better estimates if the assumed model for the
correlation structure is correct, but worse if the assumed model is incorrect (Allison 1999). Because the
sample size is large, the robust standard errors are generally preferred.
Score Statistics For Type 3 GEE Analysis

Chi-
Source DF Square Pr > ChiSq

age 1 1.41 0.2343


diameter 1 24.35 <.0001
gender 1 8.73 0.0031
visit 1 166.70 <.0001


Because the TYPE3 option is used, the Score Statistics For Type 3 GEE Analysis table is displayed. The
results based on the empirical standard errors, the model-based standard errors, and the Type 3 score
statistics are all very similar because of the large sample size. However, the Z statistics (from the tables
based on empirical and model-based standard errors) usually produce more liberal p-values than the score
statistics.

Note: If you have a small sample size, the score statistic is the statistic of choice (Stokes, Davis, and
Koch 2000).
Contrast Estimate Results

Mean Mean L'Beta Standard


Label Estimate Confidence Limits Estimate Error Alpha

10 year increase in age 0.5332 0.4779 0.5877 0.1330 0.1130 0.05


Exp(10 year increase in age) 1.1422 0.1291 0.05
1 mm decrease in diameter 0.7582 0.6683 0.8300 1.1429 0.2258 0.05
Exp(1 mm decrease in diameter) 3.1360 0.7081 0.05
male vs. female 0.6274 0.5427 0.7049 0.5210 0.1785 0.05
Exp(male vs. female) 1.6838 0.3006 0.05
1 year increase in followup 0.5809 0.5701 0.5916 0.3265 0.0225 0.05
Exp(1 year increase in followup) 1.3861 0.0312 0.05

Contrast Estimate Results

L'Beta Chi-
Label Confidence Limits Square Pr > ChiSq

10 year increase in age -0.0885 0.3545 1.38 0.2393


Exp(10 year increase in age) 0.9153 1.4255
1 mm decrease in diameter 0.7004 1.5855 25.62 <.0001
Exp(1 mm decrease in diameter) 2.0145 4.8818
male vs. female 0.1711 0.8710 8.52 0.0035
Exp(male vs. female) 1.1866 2.3892
1 year increase in followup 0.2823 0.3706 209.98 <.0001
Exp(1 year increase in followup) 1.3262 1.4486

The Contrast Estimate Results table displays the results of the ESTIMATE statement. Because the EXP
option is used, the contrast results are exponentiated, which produces the odds ratio estimate. The odds
ratio for diameter (in the L'Beta Estimate column) shows that the odds of a continuing effect of the
surgery are 3.14 times higher for patients with a one-millimeter decrease in diameter. The 95% confidence
bounds are 2.01 to 4.88.

Note: There are several disadvantages to the Wald chi-square tests shown in the Contrast Estimate
Results table. One disadvantage is that the tests for individual parameters depend on the
measurement scale (they are not invariant to transformations). Another disadvantage of Wald tests
is that they require estimation of the covariance matrix of the vector of parameter estimates, and
estimates of variances and covariances might be unstable if the sample size is small. It is
recommended that you have around 200 clusters to provide a great deal of confidence concerning
assessments of statistical significance at the 0.05 level or smaller (Stokes, Davis,
and Koch 2000). With 362 subjects, the Wald tests should perform reasonably well.

Note: The Mean Estimate column is the linear contrast L'Beta reflected through the inverse link
function (the associated probability). In this example, it is not meaningful.


Example: Using the ESTIMATE statement and the same model as the last demonstration, generate the
probability of a continuing effect of the surgery at a visit of 10 years for 49-year-old males
with a diameter of the clear zone of 3. Use the ODS SELECT statement to select only the table
of the contrast estimate results.
ods select estimates;
proc genmod data=long.keratotomy desc;
   class patientid gender (param=ref ref='Female');
   model unstable = age diameter gender visit / dist=bin;
   repeated subject = patientid / type=unstr;
   estimate 'Probability for age 49 diameter 3 gender male visit 10'
            int 1 age 49 diameter 3 gender 1 visit 10;
   title 'GEE Model of Radial Keratotomy Surgery';
run;

Contrast Estimate Results

Mean Mean L'Beta


Label Estimate Confidence Limits Estimate

Probability for age 49 diameter 3 gender male visit 10 0.8936 0.8342 0.9335 2.1283

Standard L'Beta
Label Error Alpha Confidence Limits

Probability for age 49 diameter 3 gender male visit 10 0.2616 0.05 1.6157 2.6410

Chi-
Label Square Pr > ChiSq

Probability for age 49 diameter 3 gender male visit 10 66.21 <.0001

The estimated probability is 0.8936 with a 95% confidence interval of 0.8342 to 0.9335. You can change
the confidence limits with the ALPHA= option in the ESTIMATE statement. The L’Beta estimate is
the Xβ.
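You can verify the reported probability by applying the inverse logit link to the L'Beta estimate by hand. A quick DATA step sketch:

```sas
data _null_;
   xbeta = 2.1283;                        /* L'Beta estimate from the output above */
   p = 1 / (1 + exp(-xbeta));             /* inverse logit link */
   put 'Estimated probability: ' p 6.4;   /* about 0.8936 */
run;
```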


Parameter Estimates and Standard Errors across Correlation Structures

                    Initial      EXCH       UNSTR        AR
AGE                  .0105       .0109       .0133       .0121
                    (.0103)     (.0114)     (.0113)     (.0113)
DIAMETER           -1.1957     -1.1904     -1.1429     -1.1689
                    (.1914)     (.2216)     (.2258)     (.2236)
GENDER (Female)     -.5611      -.5576      -.5210      -.5376
                    (.1524)     (.1760)     (.1785)     (.1770)
VISIT                .3188       .3190       .3265       .3232
                    (.0213)     (.0227)     (.0225)     (.0226)

(Standard errors in parentheses.)


In this example, the number of subjects is very large. Therefore, there should be little difference in the
parameter estimates and the robust standard errors across the different correlation structures. The slide
above illustrates the robustness of the GEE methods with regard to obtaining consistent parameter
estimates and standard errors. Notice the standard errors all increased from the initial model to the GEE-
based models. This makes sense for age, diameter, and gender because these variables are all time-
independent. However, visit is a time-dependent variable, so the standard error should have decreased.
The negative correlations among the observations might have caused this anomaly.


Exercises

A longitudinal study was undertaken to assess the health effects of air pollution on children. The data
contain repeated binary measures of wheezing status for each of 537 children from Steubenville, Ohio.
The measurements were taken at age 7, 8, 9, and 10 years. The smoking status of the mother at the first
year of the study was also recorded. The data are stored in a SAS data set called long.wheeze.

These are the variables in the data set:

case      patient identification number
wheeze    wheezing status of child (1=yes, 0=no)
age       age of child when measurement was taken (in years)
smoker    smoking status of mother (Yes versus No)
4. Fitting Binary GEE Models
a. Fit a GEE model on the wheezing data set using PROC GENMOD and specify wheeze as the
response variable and smoker, age, and age*age as the predictor variables. Use the DESC option
in the PROC GENMOD statement to model the probability of wheezing. Also request the
unstructured correlation structure, the type3 score statistics, the model-based standard errors, the
initial maximum likelihood parameter estimates table, and the working correlation matrix.

1) For the GEE parameter estimates, which parameters are significant at the 0.05 level?
2) Explain the changes in the p-values and standard errors for smoker and age when comparing
the initial parameter estimates to the GEE parameter estimates.


3.10 Multiple Choice Poll

Compared to the results of the initial parameter estimates, the p-value and
standard error of the GEE parameter estimate for smoker did which of the
following?
a. Went up because it is a time-dependent variable
b. Went up because it is a time-independent variable
c. Went down because it is a time-dependent variable
d. Went down because it is a time-independent variable


3.4 Chapter Summary


Longitudinal models fit in the MIXED procedure have the assumption that the conditional responses are
normally distributed. However, the normality assumption might not always be reasonable, especially
when the response variable is discrete. Therefore, the generalized linear mixed models are widely used as
a way to deal with correlated discrete response data. These models have the flexibility to specify random
effects, spatial covariance structures, heterogeneity in the covariance parameters, and also to generate
subject-specific parameter estimates.
The GLIMMIX procedure distinguishes two types of random effects. If the variance of the random effect
is contained in the matrix G, then it is called a G-side random effect. If the variance of the random effect
is contained in the matrix R, then it is called an R-side random effect. R-side effects are also called
residual effects. An R-side random effect in PROC GLIMMIX is equivalent to a REPEATED effect in
PROC MIXED.


The GLIMMIX procedure uses the pseudo-likelihood (linearization) method to obtain parameter estimates and standard errors from a linearized model. The first step is a first-order Taylor series expansion that linearizes the generalized linear mixed model into an approximating linear mixed model. Because the linearization approach approximates the GzLMM with a linear mixed model, the computed likelihood is for that linear mixed model, not the original model; it is not the true likelihood of your problem. Likelihood ratio tests that compare nested models might not be mathematically valid, and fit statistics such as the AIC and BIC should not be used for model comparisons.
There are two likelihood-based estimation methods. The METHOD=QUAD option in the PROC
GLIMMIX statement requests that the GLIMMIX procedure approximate the marginal log likelihood
with an adaptive Gauss-Hermite quadrature. The METHOD=LAPLACE option in the PROC GLIMMIX
statement requests that the GLIMMIX procedure approximate the marginal log likelihood by using the
Laplace method. Laplace estimates typically exhibit better asymptotic behavior and less small-sample
bias than pseudo-likelihood estimators. On the other hand, the class of models for which a Laplace
approximation of the marginal log likelihood is available is much smaller compared to the class of models
to which pseudo-likelihood estimation can be applied.
In the GLIMMIX procedure, robust standard errors can be obtained by using the EMPIRICAL option in
the PROC GLIMMIX statement. The subtle difference between this option and the GEE estimates in
PROC GENMOD is that the parameter estimates are obtained using the moment-based method in PROC
GENMOD, whereas the parameter estimates are obtained using the pseudo-likelihood method in PROC
GLIMMIX.
Models using the GEE method are marginal models that only estimate population average regression
coefficients and do not estimate subject-specific regression coefficients. These models are not flexible
enough to specify heterogeneity of the covariance parameters. However, fitting models using the GEE
approach has been shown to give consistent estimators of the regression coefficients and their variances
under weak assumptions about the actual correlation among a subject’s observations.
PROC GENMOD can be used to fit longitudinal data models with the use of the GEE method. The layout
of the data is similar to PROC GLIMMIX where the number of observations is the number of
measurements taken on all the subjects. The variance-covariance matrix is a block diagonal matrix in
which the observations within each block are assumed to be correlated while the observations outside of
the blocks are assumed to be independent.
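The block-diagonal layout described above can be sketched numerically. The following Python code is an illustrative sketch only (not part of the course software); the subject counts and the 0.35 exchangeable correlation value are made-up assumptions, not values from the course data.

```python
def exchangeable_block(n_times, rho):
    # Within-subject working correlation: 1 on the diagonal, rho elsewhere
    return [[1.0 if i == j else rho for j in range(n_times)]
            for i in range(n_times)]

def block_diagonal_correlation(n_subjects, n_times, rho):
    # Observations within a subject's block are correlated; observations
    # from different subjects (outside the blocks) are independent (zero)
    total = n_subjects * n_times
    v = [[0.0] * total for _ in range(total)]
    for s in range(n_subjects):
        block = exchangeable_block(n_times, rho)
        for i in range(n_times):
            for j in range(n_times):
                v[s * n_times + i][s * n_times + j] = block[i][j]
    return v

# three subjects, each measured four times
V = block_diagonal_correlation(n_subjects=3, n_times=4, rho=0.35)
```

Entries such as V[0][1] (same subject) are 0.35, while entries such as V[0][4] (different subjects) are 0, mirroring the blocked variance-covariance structure that PROC GENMOD assumes.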
If the estimation of the regression coefficients is the primary objective of your study and you have a large
number of subjects, then you should not spend much time choosing a correlation structure. If the
correlation among the measurements is of prime interest and you have a small number of subjects, then
you should spend time choosing a correlation structure. For this latter case, both the model and the
correlation structure must be approximately correct to obtain valid inferences.


General form of the GLIMMIX procedure:

PROC GLIMMIX <options>;
   CLASS variables;
   CONTRAST 'label' contrast-specification </options>;
   COVTEST <'label'> <test-specification> </options>;
   ESTIMATE 'label' contrast-specification </options>;
   LSMEANS fixed-effects </options>;
   LSMESTIMATE fixed-effect <'label'> values </options>;
   MODEL response<(response options)>=<fixed-effects> </options>;
   NLOPTIONS <options>;
   OUTPUT <OUT=SAS-data-set> <keyword<(keyword-options)><=name>>... </options>;
   PARMS (value-list)... </options>;
   RANDOM random-effects </options>;
   WEIGHT variable;
RUN;

General form of the GENMOD procedure:

PROC GENMOD DATA=SAS-data-set <options>;
   CLASS variables </options>;
   MODEL response=predictors </options>;
   REPEATED SUBJECT=subject-effect </options>;
   ESTIMATE 'label' effect values ... <options>;
RUN;


3.5 Solutions
Solutions to Exercises
1. Generating Empirical Logit Plots
a. Generate a line listing of the wheezing data (first 20 observations) and logit plots of age.
proc print data=long.wheeze(obs=20);
title 'Line Listing of Wheezing Data';
run;

Line Listing of Wheezing Data

Obs smoker case age wheeze

1 No 1 7 0
2 No 1 8 0
3 No 1 9 0
4 No 1 10 0
5 No 2 7 0
6 No 2 8 0
7 No 2 9 0
8 No 2 10 0
9 No 3 7 0
10 No 3 8 0
11 No 3 9 0
12 No 3 10 0
13 No 4 7 0
14 No 4 8 0
15 No 4 9 0
16 No 4 10 0
17 No 5 7 0
18 No 5 8 0
19 No 5 9 0
20 No 5 10 0

1) The data are in the proper order (sorted by age within case).
proc means data=long.wheeze noprint nway;
class age;
var wheeze;
output out=bins sum(wheeze)=wheeze;
run;

data bins;
set bins;
logit=log((wheeze+1)/(_freq_-wheeze+1));
run;


proc sgplot data=bins;
   scatter y=logit x=age / markerattrs=(color=blue size=10px
                                        symbol=circlefilled);
   xaxis label="Age of Child in Years";
   yaxis label="Estimated Logit";
   title "Estimated Logit Plot of Age of Child";
run;

2) The plot shows that the logits possibly have a quadratic relationship with age.
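The adjusted empirical logit computed in the DATA step above, log((wheeze+1)/(_freq_-wheeze+1)), can also be sketched in Python. This is an illustrative sketch only; the counts below are invented, not taken from the long.wheeze data. Adding 1 to each cell keeps the logit defined even when an age group has zero events or all events.

```python
import math

def empirical_logit(events, total):
    # Adjusted empirical logit: the +1 in numerator and denominator
    # keeps the logit finite when events == 0 or events == total
    return math.log((events + 1) / (total - events + 1))

# hypothetical example: 80 of 537 children wheezing at one age
print(round(empirical_logit(80, 537), 4))
```

With a zero count, the unadjusted logit log(0/n) would be undefined, while the adjusted version returns a finite value.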
2. Fitting Generalized Linear Mixed Models

a. Fit a generalized linear mixed model to the long.wheeze data set using G-side random effects, the
method of adaptive Gauss-Hermite quadrature, and the between-within degrees of freedom
adjustment. Specify wheeze as the response variable and smoker, age, and age*age as the
predictor variables. Model the probability that wheeze is equal to 1 with the EVENT= option.
Also, request that the solution for the fixed-effects parameters be produced. Specify the
optimization technique of Newton-Raphson with ridging, and compute the odds ratio for smoker
(No as the reference value) and for a one-year decrease in age (10 as the reference value). Create
an odds ratio plot that displays the statistics, and use the COVTEST statement to test whether the G
matrix can be reduced to a zero matrix.


proc glimmix data=long.wheeze method=quad noclprint=5
             plots=oddsratio(stats);
   class case smoker;
   model wheeze(event='1') = smoker age age*age / solution dist=binary
         ddfm=bw or(diff=first at age = 10 unit age = -1);
   random intercept / subject=case;
   nloptions tech=nrridg;
   covtest "H0: No random effects" zerog;
   title 'Generalized Linear Mixed Model of Wheezing among Children';
run;

Generalized Linear Mixed Model of Wheezing among Children

The GLIMMIX Procedure

Model Information

Data Set LONG.WHEEZE


Response Variable wheeze
Response Distribution Binary
Link Function Logit
Variance Function Default
Variance Matrix Blocked By case
Estimation Technique Maximum Likelihood
Likelihood Approximation Gauss-Hermite Quadrature
Degrees of Freedom Method Between-Within

Class Level Information

Class Levels Values

case 537 not printed


smoker 2 No Yes

Number of Observations Read 2148


Number of Observations Used 2148

Response Profile

Ordered Total
Value wheeze Frequency

1 0 1822
2 1 326

The GLIMMIX procedure is modeling the probability that wheeze='1'.

Dimensions

G-side Cov. Parameters 1


Columns in X 5
Columns in Z per Subject 1
Subjects (Blocks in V) 537
Max Obs per Subject 4


Optimization Information

Optimization Technique Newton-Raphson with Ridging


Parameters in Optimization 5
Lower Boundaries 1
Upper Boundaries 0
Fixed Effects Not Profiled
Starting From GLM estimates
Quadrature Points 7

Iteration History

Objective Max
Iteration Restarts Evaluations Function Change Gradient

0 0 10 1664.4797484 . 10743.16
1 0 13 1600.6608663 63.81888214 1736.633
2 0 8 1593.7671408 6.89372550 288.9059
3 0 8 1591.5112399 2.25590087 78.47274
4 0 8 1590.9975586 0.51368136 13.79153
5 0 8 1590.729367 0.26819152 0.702447
6 0 8 1590.6477122 0.08165484 1.182501
7 0 8 1590.6477055 0.00000666 0.001201

Convergence criterion (GCONV=1E-8) satisfied.

Fit Statistics

-2 Log Likelihood 1590.65


AIC (smaller is better) 1600.65
AICC (smaller is better) 1600.68
BIC (smaller is better) 1622.08
CAIC (smaller is better) 1627.08
HQIC (smaller is better) 1609.03

Fit Statistics for Conditional


Distribution

-2 log L(wheeze | r. effects) 930.52


Pearson Chi-Square 851.45
Pearson Chi-Square / DF 0.40

Covariance Parameter Estimates

Cov Standard
Parm Subject Estimate Error

UN(1,1) case 4.7395 0.7838


Solutions for Fixed Effects

Standard
Effect smoker Estimate Error DF t Value Pr > |t|

Intercept -11.9408 5.3748 535 -2.22 0.0267


smoker No -0.4023 0.2749 535 -1.46 0.1439
smoker Yes 0 . . . .
age 2.4197 1.2825 1609 1.89 0.0594
age*age -0.1532 0.07564 1609 -2.03 0.0430

Odds Ratio Estimates

95% Confidence
smoker age _smoker _age Estimate DF Limits

Yes 10 No 10 1.495 535 0.871 2.566


9 10 1.634 1609 1.168 2.286


Type III Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

smoker 1 535 2.14 0.1439


age 1 1609 3.56 0.0594
age*age 1 1609 4.10 0.0430

Tests of Covariance Parameters


Based on the Likelihood

Label DF -2 Log Like ChiSq Pr > ChiSq Note

H0: No random effects 1 1817.18 226.53 <.0001 MI

MI: P-value based on a mixture of chi-squares.

1) The odds ratio for age compares age 9 to age 10 (the odds of the event in age 9 in the
numerator and the odds of the event in age 10 in the denominator) taking into account the
polynomial term. The estimate of 1.634 means that a one-year decrease in age going from age
10 to age 9 results in a 63% increase ((1.634-1)*100) in the odds of wheezing. Since the
polynomial term is in the model, a two-year decrease in age would result in a different
odds ratio.
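As a quick numeric check, the 1.634 estimate follows directly from the age and age*age coefficients reported in the Solutions for Fixed Effects table. The Python sketch below reproduces the calculation (illustrative only; the coefficient values are copied from the output above).

```python
import math

b_age = 2.4197    # coefficient for age (from the fixed-effects table)
b_age2 = -0.1532  # coefficient for age*age

# odds ratio comparing age 9 to age 10 with a quadratic age term:
# exp(b_age*(9 - 10) + b_age2*(9**2 - 10**2))
odds_ratio = math.exp(b_age * (9 - 10) + b_age2 * (9 ** 2 - 10 ** 2))
print(round(odds_ratio, 3))  # agrees with the 1.634 reported by PROC GLIMMIX
```

Repeating the calculation for a decrease from age 9 to age 8 gives a different odds ratio, which is why the odds ratio is not constant when the polynomial term is in the model.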

2) The results in the Tests of Covariance Parameters table indicate that the “no random effects
model” is rejected. Therefore, the model with random effects fits the data better than the
model without random effects.
3. Fitting Generalized Linear Mixed Models with Splines
a. Fit a generalized linear mixed model to the long.wheeze data set, but create a spline for age.
Specify a truncated power function basis for the spline expansion and use the NOPOWERS
option to exclude the intercept and polynomial columns. Use the knot method of list and list the
knot values as 8 and 9, specify a degree of spline expansion of 3, and request a table that shows
the knot locations and the knots associated with each spline basis function. Use R-side random
effects with an unstructured covariance structure and use an optimization technique of Newton-
Raphson with ridging.
proc glimmix data=long.wheeze noclprint=5;
class case smoker;
effect spl=spline(age / details basis=tpf(nopowers)
knotmethod=list(8 9) degree=3);
model wheeze(event='1') = smoker spl / solution dist=binary;
random _residual_ / type=un subject=case;
nloptions tech=nrridg;
title 'Generalized Linear Mixed Model of Wheezing among Children';
run;


Generalized Linear Mixed Model of Wheezing among Children

The GLIMMIX Procedure

Model Information

Data Set LONG.WHEEZE


Response Variable wheeze
Response Distribution Binary
Link Function Logit
Variance Function Default
Variance Matrix Blocked By case
Estimation Technique Residual PL
Degrees of Freedom Method Between-Within

Class Level Information

Class Levels Values

case 537 not printed


smoker 2 No Yes

Knots for Spline Effect spl

Knot
Number age

1 8.00000
2 9.00000

Basis Details for Spline Effect spl

Column Power Break Knot

1 3 8.00000
2 3 9.00000

Number of Observations Read 2148


Number of Observations Used 2148

Response Profile

Ordered Total
Value wheeze Frequency

1 0 1822
2 1 326

The GLIMMIX procedure is modeling the probability that wheeze='1'.

Dimensions

R-side Cov. Parameters 10


Columns in X 5
Columns in Z per Subject 0
Subjects (Blocks in V) 537
Max Obs per Subject 4


Optimization Information

Optimization Technique Newton-Raphson with Ridging


Parameters in Optimization 10
Lower Boundaries 4
Upper Boundaries 0
Fixed Effects Profiled
Starting From Data

Iteration History

Objective Max
Iteration Restarts Subiterations Function Change Gradient

0 0 6 10077.035719 0.63377064 6.022E-7


1 0 4 10207.415785 0.11357921 6.212E-9
2 0 2 10216.577937 0.00749397 0.00002
3 0 1 10216.567153 0.00001965 6.094E-7
4 0 1 10216.567263 0.00000010 8.46E-12
5 0 0 10216.567266 0.00000000 7.543E-7

Convergence criterion (PCONV=1.11022E-8) satisfied.

Fit Statistics

-2 Res Log Pseudo-Likelihood 10216.57


Generalized Chi-Square 2144.00
Gener. Chi-Square / DF 1.00

Covariance Parameter Estimates

Cov Standard
Parm Subject Estimate Error

UN(1,1) case 0.9934 0.06072


UN(2,1) case 0.3542 0.04598
UN(2,2) case 1.0138 0.06194
UN(3,1) case 0.3057 0.04513
UN(3,2) case 0.4440 0.04763
UN(3,3) case 1.0039 0.06134
UN(4,1) case 0.3235 0.04528
UN(4,2) case 0.3296 0.04578
UN(4,3) case 0.3818 0.04634
UN(4,4) case 1.0002 0.06111

Solutions for Fixed Effects

Standard
Effect spl smoker Estimate Error DF t Value Pr > |t|

Intercept -1.4541 0.1448 535 -10.04 <.0001


smoker No -0.2569 0.1775 535 -1.45 0.1483
smoker Yes 0 . . . .
spl 1 -0.06043 0.1130 535 -0.53 0.5931
spl 2 0.08091 0.8687 535 0.09 0.9258


Type III Tests of Fixed Effects

Num Den
Effect DF DF F Value Pr > F

smoker 1 535 2.10 0.1483


spl 2 535 4.91 0.0077

1) The two spline coefficients for age correspond to the two truncated power basis functions.
For the first knot, if age is 8 or less, the first truncated power basis function is 0; if age is
greater than 8, it is (age-8)³. Similarly, for the second knot, if age is 9 or less, the second
truncated power basis function is 0; if age is greater than 9, it is (age-9)³.
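These truncated power basis functions can be written out in a short Python sketch (illustrative only, mirroring the knot values 8 and 9 and the degree of 3 from the EFFECT statement):

```python
def tpf(x, knot, degree=3):
    # Truncated power function: (x - knot)^degree above the knot, 0 otherwise
    return (x - knot) ** degree if x > knot else 0.0

ages = [7, 8, 9, 10]
spl1 = [tpf(a, 8) for a in ages]  # first basis column (knot at 8)
spl2 = [tpf(a, 9) for a in ages]  # second basis column (knot at 9)
print(spl1, spl2)
```

For the four observed ages, the first column is nonzero only at ages 9 and 10, and the second column only at age 10, which is what the two spl coefficients in the fixed-effects table multiply.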

2) Because the linearization method was used, the AIC and BIC statistics are not produced
because there is no true likelihood.

4. Fitting Binary GEE Models


a. Fit a GEE model on the wheezing data set using PROC GENMOD and specify wheeze
as the response variable and smoker, age, and age*age as the predictor variables. Use the DESC
option in the PROC GENMOD statement to model the probability of wheezing. Also request
the unstructured correlation structure, the type3 score statistics, the model-based standard errors,
the initial maximum likelihood parameter estimates table, and the working correlation matrix.
proc genmod data=long.wheeze desc;
class case smoker;
model wheeze=smoker age age*age / dist=bin type3;
repeated subject=case / corrw modelse type=unstr printmle;
title 'Longitudinal Model of Wheezing among Children';
run;

Longitudinal Model of Wheezing among Children

The GENMOD Procedure

Model Information

Data Set LONG.WHEEZE


Distribution Binomial
Link Function Logit
Dependent Variable wheeze

Number of Observations Read 2148


Number of Observations Used 2148
Number of Events 326
Number of Trials 2148


Class Level Information

Class Levels Values

case 537 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37
38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87
...
smoker 2 No Yes

Response Profile

Ordered Total
Value wheeze Frequency

1 1 326
2 0 1822

PROC GENMOD is modeling the probability that wheeze='1'.

Parameter Information

Parameter Effect smoker

Prm1 Intercept
Prm2 smoker No
Prm3 smoker Yes
Prm4 age
Prm5 age*age

Algorithm converged.

Analysis Of Initial Parameter Estimates

Standard Wald 95% Confidence Wald


Parameter DF Estimate Error Limits Chi-Square Pr > ChiSq

Intercept 1 -7.6106 4.2978 -16.0342 0.8130 3.14 0.0766


smoker No 1 -0.2725 0.1235 -0.5146 -0.0303 4.86 0.0274
smoker Yes 0 0.0000 0.0000 0.0000 0.0000 . .
age 1 1.5735 1.0278 -0.4409 3.5879 2.34 0.1258
age*age 1 -0.0996 0.0606 -0.2185 0.0192 2.70 0.1003
Scale 0 1.0000 0.0000 1.0000 1.0000

NOTE: The scale parameter was held fixed.


GEE Model Information

Correlation Structure Unstructured


Subject Effect case (537 levels)
Number of Clusters 537
Correlation Matrix Dimension 4
Maximum Cluster Size 4
Minimum Cluster Size 4

Algorithm converged.

Working Correlation Matrix

Col1 Col2 Col3 Col4

Row1 1.0000 0.3528 0.3110 0.3267


Row2 0.3528 1.0000 0.4404 0.3241
Row3 0.3110 0.4404 1.0000 0.3833
Row4 0.3267 0.3241 0.3833 1.0000

GEE Fit Criteria

QIC 1828.2487
QICu 1825.1870

Analysis Of GEE Parameter Estimates


Empirical Standard Error Estimates

Standard 95% Confidence


Parameter Estimate Error Limits Z Pr > |Z|

Intercept -7.6279 3.5958 -14.6756 -0.5802 -2.12 0.0339


smoker No -0.2621 0.1780 -0.6109 0.0867 -1.47 0.1408
smoker Yes 0.0000 0.0000 0.0000 0.0000 . .
age 1.5762 0.8593 -0.1081 3.2605 1.83 0.0666
age*age -0.0998 0.0506 -0.1990 -0.0006 -1.97 0.0487

Analysis Of GEE Parameter Estimates


Model-Based Standard Error Estimates

Standard 95% Confidence


Parameter Estimate Error Limits Z Pr > |Z|

Intercept -7.6279 3.5903 -14.6648 -0.5910 -2.12 0.0336


smoker No -0.2621 0.1773 -0.6097 0.0855 -1.48 0.1394
smoker Yes 0.0000 0.0000 0.0000 0.0000 . .
age 1.5762 0.8576 -0.1047 3.2571 1.84 0.0661
age*age -0.0998 0.0505 -0.1989 -0.0007 -1.97 0.0483
Scale 1.0000 . . . . .
NOTE: The scale parameter was held fixed.

Score Statistics For Type 3 GEE Analysis

Chi-
Source DF Square Pr > ChiSq

smoker 1 2.04 0.1529


age 1 3.32 0.0685


age*age 1 3.83 0.0504

1) The parameter age*age is significant at the 0.05 level for the empirical standard errors and
the model-based standard errors, but it is not significant for the score statistics. The reason for
this discrepancy is that the score statistics are more conservative than the Z statistics.
2) The p-value and standard error for the GEE parameter estimate for smoker went up compared
to the initial parameter estimates because smoker is a time-independent variable, while the
p-value and standard error for age went down because age is a time-dependent variable.


Solutions to Student Activities (Polls/Quizzes)

3.01 Multiple Choice Poll – Correct Answer

Which of the following statements is true regarding generalized linear mixed
models?
a. The variance of the response variable is assumed to be constant.
b. The conditional distribution of the data, given the random effects,
belongs to the exponential family of distributions.
c. The distribution of the random effects can belong to any of the
exponential family of distributions.
d. The response values are assumed to be normally distributed.


3.02 Multiple Choice Poll – Correct Answer

Which of the following statements is true regarding G-side and R-side
random effects?
a. R-side random effects provide subject-specific interpretations of the
model if no G-side random effects are present.
b. All random effects are specified through the RANDOM statement.
c. Continuous effects are allowed in R-side random effects.
d. R-side random effects are modeled by the REPEATED statement in PROC
GLIMMIX.



3.03 Multiple Choice Poll – Correct Answer

Which of the following statements is true regarding the pseudo-likelihood
linearization method?
a. The method cannot be used with the Kenward-Roger degrees of
freedom adjustment.
b. The method might produce biased variance estimates for the random
effects.
c. Model comparisons are possible based on information criteria such as
the AIC and BIC.
d. There is no distributional assumption for the linearized and transformed
residuals.


3.04 Multiple Choice Poll – Correct Answer

The odds ratio for age in the exercise was 1.634. How can this be
interpreted?
a. A one-year decrease from age 10 results in a 63% increase in the odds of
wheezing.
b. A one-year decrease from any age results in a 63% increase in the odds
of wheezing.
c. A one-year increase from any age results in a 63% increase in the odds of
wheezing.
d. A one-year increase from age 10 results in a 63% increase in the odds of
wheezing.



3.05 Multiple Choice Poll – Correct Answer

Which one of the following statements is true for proportional odds models?
a. The model fits separate intercepts.
b. The model fits separate slopes.
c. The cumulative logits compare each category to the last category.
d. The coding of the ordinal outcome affects the odds ratios.


3.06 Multiple Choice Poll – Correct Answer

Why are AIC and BIC model fit statistics not produced in the exercise
problem?
a. The use of the RANDOM statement always suppresses the AIC and BIC
statistics.
b. Because the response variable has a binomial distribution, the AIC and
BIC statistics are always suppressed.
c. Because the linearization method was used, the AIC and BIC statistics are
not produced because there is no true likelihood.
d. PROC GLIMMIX does not support AIC and BIC model fit statistics.



3.07 Multiple Choice Poll – Correct Answer

Which one of the following statements is true regarding GEE regression
models?
a. GEE models can estimate subject-specific regression coefficients.
b. The robust standard errors are useful when the number of subjects is
small.
c. The quasi-likelihood estimation method does not require the
specification of the distribution of the response variable.
d. The likelihood-ratio test can be used to test the significance of predictor
variables.


3.08 Multiple Choice Poll – Correct Answer

The model has time and the quadratic effect of time as predictor variables. To
estimate the odds ratio for a 3-unit increase in time (time 0 to 3), the
coefficients for the ESTIMATE statement would be which of the following?
a. time 3
b. time 3 time*time 9
c. time 3 time*time 3
d. time -3 time*time -9


0  1 *3  2 *32
Oddstime3 e
 0  1*0 2 *0  e 1*3 2 *9
Oddstime 0 e

3.09 Multiple Choice Poll – Correct Answer

When you have a large sample size, what condition is not necessary for the
GEE model to have consistent parameter estimates and standard errors?
a. Missing values are MCAR
b. Correct specification of the model for the mean
c. Correct specification of the variance function
d. Correct specification of the correlation structure


3.10 Multiple Choice Poll – Correct Answer

Compared to the results of the initial parameter estimates, the p-value and
standard error of the GEE parameter estimate for smoker did which of the
following?
a. Went up because it is a time-dependent variable
b. Went up because it is a time-independent variable
c. Went down because it is a time-dependent variable
d. Went down because it is a time-independent variable



Appendix A References

A.1 References ................................................................................................................... A-3



A.1 References
1. Agresti, A. (1996), An Introduction to Categorical Data Analysis, New York: John Wiley & Sons.
2. Akaike, H. (1974), “A New Look at the Statistical Model Identification,” IEEE Transaction
on Automatic Control, 19:716–723.
3. Albert, P.S. and McShane, L.M. (1995), “A generalized estimating equations approach for spatially
correlated binary data: Applications to the analysis of neuroimaging data,” Biometrics, 51: 627–638.
4. Allison, P. (1999), Logistic Regression Using the SAS System: Theory and Application, Cary, NC:
SAS Institute Inc.
5. Anderson, J.A. and Philips, P.R. (1981), “Regression, discrimination, and measurement models for
ordered categorical variables,” Applied Statistician, 30:22–31.
6. Barnhart, H.X. and Williamson, J.M. (1998), “Goodness-of-fit tests for GEE modeling with binary
data,” Biometrics, 54:720–729.
7. Breslow, N.E. and Clayton, D.G. (1993), “Approximate Inference in Generalized Linear Mixed
Models,” Journal of the American Statistical Association, 88:9–25.
8. Brown, H. and Prescott, R. (2001), Applied Mixed Models in Medicine, New York: John Wiley
& Sons Ltd.
9. Chi, E.M. and Reinsel, G.C. (1989), “Models for longitudinal data with random effects and AR(1)
errors,” Journal of the American Statistical Association, 84:452–459.
10. Cook, R.D. (1977), “Detection of influential observations in linear regression,” Technometrics,
19:15–28.
11. Cook, R.D. (1979), “Influential observations in linear regression,” Journal of American Statistical
Association, 74: 169–174.
12. Cook, R.D. (1986), Discussion of paper by S. Chatterjee and A.S. Hadi, Statistical Science,
1:393–397.
13. Davis, C.S. (2002), Statistical Methods for the Analysis of Repeated Measurements, New York:
Springer-Verlag.
14. Diggle, P.J. (1988), “An Approach to the Analysis of Repeated Measurements,” Biometrics, 44: 959–
971.
15. Diggle, P.J. (1990), Time Series: a Biostatistical Introduction, Oxford: Oxford University Press.
16. Diggle, P.J., Heagerty, P., Liang, K., and Zeger, S.L. (2002), Analysis of Longitudinal Data, 2nd
Edition, New York: Oxford University Press.
17. Diggle, P.J. and Kenward, M.G. (1994), “Informative dropout in longitudinal data analysis
(with discussion),” Applied Statistics, 43:49–73.
18. Duffy, T.J. and Santner, D.E. (1989), The Statistical Analysis of Discrete Data, New York: Springer-
Verlag.


19. Dunlop, D.D. (1994), “Regression for Longitudinal Data: A Bridge from Least Squares Regression,”
The American Statistician, 48:299–303.
20. Evans, S.R. (1998), Goodness of Fit in Two Models for Clustered Binary Data. Ph.D. dissertation,
University of Massachusetts.
21. Evans, S.R. and Hosmer, D.W. (2004), “Goodness of Fit Tests for Logistic GEE Models: Simulation
Results,” Communication in Statistics, 33:247–258.
22. Guerin, L. and Stroup, W.W. (2000), “A Simulation Study to Evaluate PROC MIXED Analysis
of Repeated Measures Data,” Proceedings of the Twelfth Annual Kansas State University
Conference on Applied Statistics in Agriculture, April 30 – May 2, 2000.
23. Hastie, T.J., Botha, J.L., and Schnitzler, C.M. (1989), “Regression with an ordered categorical
response,” Statistics in Medicine, 8: 785–794.
24. Heagerty, P.J. and Zeger, S.L. (1998), “Lorelogram: a regression approach to exploring dependence in
longitudinal categorical responses,” Journal of the American Statistical Association, 93:150–162.
25. Horton, N.J., Bebchuk, J.D., Jones, C.L., et al. (1999), “Goodness-of-fit for GEE: an example with
mental health service utilization,” Statistics in Medicine, 18:213–222.
26. Hosmer, D.W. and Lemeshow, S. (2000), Applied Logistic Regression, 2nd edition, New York: John
Wiley & Sons.
27. Hughes, J.P. (1999), “Mixed effects models with censored data with application to HIV RNA levels,”
Biometrics, 55:625–629.
28. Kaslow, R.A., et al. (1987), “The Multicenter AIDS Cohort Study: Rationale, Organization, and
Selected Characteristics of the Participants,” American Journal of Epidemiology, 126:310–318.
29. Kenward, M.G. and Roger, J.H. (1997), “Small Sample Inference for Fixed Effects from Restricted
Maximum Likelihood,” Biometrics, 53:983–997.
30. Kenward, M.G. and Roger, J.H. (2009), “An Improved Approximation to the Precision of Fixed
Effects from Restricted Maximum Likelihood,” Computational Statistics and Data Analysis,
53:2583–2595.
31. Laird, N.M. (1988), “Missing data in longitudinal studies,” Statistics in Medicine, 7:305–315.
32. Liang, K.Y. and Zeger, S.L. (1986), “Longitudinal Data Analysis using Generalized Linear Models,”
Biometrika, 73:13–22.
33. Lin, D.Y., Wei, L.J., and Ying, Z. (2002), “Model-Checking Techniques Based on Cumulative
Residuals,” Biometrics, 58:1–12.
34. Lipsitz, S., Laird, N., and Harrington, D. (1991), “Generalized estimating equations for correlated
binary data: using odds ratios as a measure of association,” Biometrika, 78:153–160.
35. Littell, R.C., Henry, P.R., and Ammerman, C.B. (1998), “Statistical Analysis of Repeated Measures
Data Using SAS Procedures,” Animal Journal of Animal Science, 76:1216–1231.
36. Littell, R.C., Milliken, G.A., Stroup, W.W., and Wolfinger, R.D. (1996), SAS® System for Mixed
Models, Cary, NC: SAS Institute Inc.


37. Littell, R.C., Stroup, W.W., and Freund, R.J. (2002), SAS® for Linear Models, Fourth Edition, Cary,
NC: SAS Institute Inc.
38. Little, R.J.A. and Rubin, D.B. (1987), Statistical Analysis with Missing Data, New York:
John Wiley & Sons.
39. Little, R.J.A. (1995), “Modeling the Drop-Out Mechanism in Repeated Measures Studies,” Journal of
the American Statistical Association, 90:1112–1121.
40. McCullagh, P. and Nelder, J.A. (1989), Generalized Linear Models, 2nd Edition, London: Chapman
and Hall.
41. Molenberghs, G., and Verbeke, G. (2005), Models for Discrete Longitudinal Data, New York:
Springer Science+Business Media, Inc.
42. Paik, M.C. (1988), “Repeated measurement analysis for nonnormal data in small samples,”
Communications in Statistics – Simulation and Computation, 17: 1155–1171.
43. Pan, W. (2001), “Akaike’s information criterion in generalized estimating equations,” Biometrics,
57:120–125.
44. Park, T. (1993), “A comparison of the generalized estimating equation approach with
the maximum likelihood approach for repeated measurements,” Statistics in Medicine, 12: 1723–
1732.
45. Park, T. and Davis, C.S. (1993), “A Test of the Missing Data Mechanism for Repeated Categorical
Data,” Biometrics, 49: 631–638.
46. Pepe, M.S. and Anderson, G.A. (1994), “A cautionary note on inference for marginal regression
models with longitudinal data and general correlated response data,” Communication in Statistics –
Simulation, 23:939–951.
47. Pregibon, D. (1981), “Logistic Regression Diagnostics,” The Annals of Statistics, 9: 705–724.
48. Preisser, J.S. and Qaqish, B.F. (1996), “Deletion diagnostics for generalized estimating equations,”
Biometrika, 83: 551–562.
49. Prentice, R. L. (1988), “Correlated binary regression with covariates specific to each binary
observation,” Biometrics, 44: 1033–1048.
50. Ruppert, D., Wand, M.P., and Carroll, R.J., (2003), Semiparametric Regression, Cambridge:
Cambridge University Press.
51. SAS Institute Inc. (1995), Logistic Regression Examples Using the SAS® System, Version 6, First
Edition, Cary, NC: SAS Institute Inc.
52. SAS Institute Inc. (2005), The GLIMMIX Procedure, Cary, NC: SAS Institute Inc.
53. Schabenberger, O. (2004), “Mixed Model Influence Diagnostics,” SUGI 29 Proceedings.
54. Scharfstein, D.O., Rotnitzky, A. and Robins, J.M. (1999), “Adjusting for nonignorable dropout using
semiparametric non-response models (with Discussion),” Journal of the American Statistical
Association, 94:1096–1120.
55. Schwarz, G. (1978), “Estimating the Dimension of a Model,” Annals of Statistics, 6:461–464.


56. Shock, N.W., et al. (1984), Normal Human Aging: The Baltimore Longitudinal Study of Aging,
National Institutes of Health Publication 84–2450. Washington, DC: National Institutes of Health.
57. Sommer, A., Katz, J., and Tarwotjo, I. (1984), “Increased risk of respiratory infection
and diarrhea in children with pre-existing mild vitamin A deficiency,” American Journal
of Clinical Nutrition, 40:1090–1095.
58. Stokes, M.E., Davis, C.S., and Koch, G.G. (2000), Categorical Data Analysis using
the SAS System, Second Edition, Cary, NC: SAS Institute Inc.
59. Swallow, W.H. and Monahan, J.F. (1984), “Monte Carlo Comparison of ANOVA, MIVQUE, REML,
and ML Estimators of Variance Components,” Technometrics, 28:47–57.
60. Tufte, E.R. (1990), Envisioning Information, Cheshire, CT: Graphics Press.
61. Verbeke, G. and Lesaffre, E. (1997), “The effect of misspecifying the random effects distribution in
linear mixed models for longitudinal data,” Computational Statistics and Data Analysis, 23:541–556.
62. Verbeke, G. and Molenberghs, G. (1997), Linear Mixed Models in Practice: A SAS-Oriented
Approach, New York: Springer-Verlag, Inc.
63. Verbeke, G. and Molenberghs, G. (2000), Linear Mixed Models for Longitudinal Data,
New York: Springer-Verlag, Inc.
64. Waring, G.O., Lynn, M.J., and McDonnell, P.J. (1994), “Results of the Prospective Evaluation of
Radial Keratotomy (PERK) Study 10 years after Surgery,” Arch Ophthalmol, 112: 1298–1308.
65. Zeger, S.L. and Liang, K. (1986), “Longitudinal Data Analysis for Discrete and Continuous
Outcomes,” Biometrics, 42:121–130.
66. Zeger, S.L. and Liang, K. (1992), “An Overview of Methods for the Analysis of Longitudinal Data,”
Statistics In Medicine, 11:1825–1839.

Appendix B Additional Resources

B.1 Programs ...................................................................................................................... B-3

B.2 Model Diagnostics for GEE Regression Models ...................................................... B-10


Demonstration: GEE Diagnostic Plots .................................................................................B-14

B.1 Programs
1. The VARIOGRAM macro creates the data set varioplot that contains the data values to construct
a sample variogram. The data must be sorted by the subject’s identification number.
%macro variogram(data=, resvar=, clsvar=, expvars=, id=, time=, maxtime=);

ods select none;

proc mixed data=&data;
   class &clsvar;
   model &resvar=&expvars / outpm=residuals;
run;

ods select all;

An output data set is created with the residuals from the mean model.
data residuals1;
set residuals;
by &id;
if first.&id then timegrp=1;
else timegrp+1;
run;

proc transpose data=residuals1 out=subject prefix=time;
   var resid &time;
   by &id;
   id timegrp;
run;
BY-group processing is used to create timegrp, which indicates which time point (first, second, third,
and so on) each observation represents. The TRANSPOSE procedure creates two observations per subject
with the variables time1, time2, and so on, which contain the residual values and the time values.
data variogram_table(keep=variogram)
time_interval_table(keep=time_interval);
set subject;
array time(*) time1-time&maxtime;
array diff(%eval(&maxtime-1),%eval(&maxtime-1));
array timei(%eval(&maxtime-1),%eval(&maxtime-1));
if _name_='Resid' then
do i=1 to %eval(&maxtime-1);
do k=i+1 to &maxtime;
if time(i) ne . and time(k) ne . then
do;
diff(i,k-1)=((time(i)-time(k))**2)/2;
end;
end;
end;
else
do i=1 to %eval(&maxtime-1);
do k=i+1 to &maxtime;


if time(i) ne . and time(k) ne . then


do;
timei(i,k-1)=abs(time(i)-time(k));
end;
end;
end;
do i=1 to %eval(&maxtime-1);
do k=i to %eval(&maxtime-1);
if diff(i,k) ne . then
do;
variogram=diff(i,k);
output variogram_table;
end;
else
if timei(i,k) ne . then
do;
time_interval=timei(i,k);
output time_interval_table;
end;
end;
end;
run;
The first set of DO loops calculates the observed half-squared differences between pairs of residuals
within each subject.
The second set of DO loops calculates the corresponding time differences within each subject.
The third set of DO loops outputs two data sets. The data set variogram_table has the variogram values
(observed half-squared differences between pairs of residuals). The data set time_interval has the time
interval values (corresponding time differences).
data varioplot;
merge variogram_table time_interval_table;
run;

%mend variogram;
The DATA step does a one-to-one merge of the two data sets. This is appropriate because the data sets are
in identical order.
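The pairwise computation that the macro's first two sets of DO loops perform can be sketched compactly outside SAS. The Python function below is an illustrative sketch only (it is not part of the course code, and the residual and time values are made up): for one subject, it pairs each half-squared difference between two residuals with the corresponding time lag, the two quantities that end up in varioplot.

```python
# Illustrative sketch (not SAS): for one subject, pair each half-squared
# residual difference with its time lag, as the macro's DO loops do.
def variogram_pairs(resid, times):
    """Return (time_interval, half_squared_difference) for every
    within-subject pair of residuals."""
    pairs = []
    for i in range(len(resid) - 1):
        for k in range(i + 1, len(resid)):
            lag = abs(times[i] - times[k])        # time interval
            v = (resid[i] - resid[k]) ** 2 / 2.0  # variogram value
            pairs.append((lag, v))
    return pairs

# Made-up residuals at times 0, 1, and 3 for a single subject:
print(variogram_pairs([0.5, -0.2, 0.1], [0, 1, 3]))
```

Plotting the second element against the first, pooled over all subjects, gives the sample variogram.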
2. The VARIANCE macro computes the variogram-based estimate of the process variance. The data
must be sorted by the subject’s identification number.
%macro variance(data=, id=, resvar=, clsvar=, expvars=, subjects=, maxtime=);

ods select none;

proc mixed data=&data;
   class &clsvar;
   model &resvar=&expvars / outpm=residuals;
run;


ods select all;


An output data set is created with the residuals from the mean model.
data residvar;
set residuals;
by &id;
if first.&id then timegrp=1;
else timegrp+1;
run;

proc transpose data=residvar out=varsubject prefix=time;
   var resid;
   by &id;
   id timegrp;
run;
BY-group processing is used to create timegrp, which indicates which time point (first, second, third, and
so on) each observation represents. The TRANSPOSE procedure creates one observation per subject with
the variables time1, time2, and so on, which contain the residual values.
data variance1(keep=diff1-diff%eval(&maxtime*&subjects));
retain timepts1-timepts%eval(&maxtime*(&subjects+1))
diff1-diff%eval(&maxtime*&subjects);
set varsubject end=lastone;
array time{&maxtime};
array timepts{%eval(&subjects+1),&maxtime};
array diff(&subjects,&maxtime);
do i=1 to &maxtime;
timepts(_n_,i)=time(i);
end;
if lastone=1 then
do;
do i=1 to &subjects;
do j=1 to &maxtime;
do k=1 to %eval(&subjects+1)-i;
do l=1 to &maxtime;
diff(k,l)=((timepts(i,j) -
timepts(k+i,l))**2)*1/2;
end;
end;
output;
do k=1 to &subjects;
do l=1 to &maxtime;
diff(k,l)=.;
end;
end;
end;
end;
end;
run;


The DATA step calculates the variance by comparing each subject’s time points to all other time points
for all other subjects. Three arrays are created. The time array contains the time points from the
varsubject data set. The timepts array contains the time points for all the subjects from the varsubject
data set. The diff array contains differences between one time point for one subject with all time points
for all the other subjects.
The first DO loop reads the time points into the timepts array.
After the entire varsubject data set is read, the diff array is populated with the differences between each
time point for one subject and all the time points for all the other subjects. In the DO group, the first
DO loop cycles over one subject, the second DO loop cycles over that subject's time points, the third
DO loop cycles through the other subjects, and the fourth DO loop cycles through all the time points for
each of those subjects.
The last two DO loops clear out the difference values to avoid carrying over any residual values.
data average_variance(keep=average total nonmissing);
array diff{%eval(&maxtime*&subjects)};
set variance1 end=lastone;
nonmissing+n(of diff1-diff%eval(&maxtime*&subjects));
total+sum(of diff1-diff%eval(&maxtime*&subjects));
if lastone=1 then
do;
average=total/nonmissing;
output;
end;
run;
The DATA step calculates the average of all differences. It accumulates the number of nonmissing
differences and the total of nonmissing differences. After reading all the difference values in the data set,
it calculates the average difference and writes the average to a data set.
proc print data=average_variance;
run;

%mend variance;
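The quantity the macro prints can be stated in a few lines outside SAS. The Python function below is an illustrative sketch only (not part of the course code; the residuals are made up): it averages the half-squared differences between residuals taken from different subjects, which is the variogram-based estimate of the process variance.

```python
# Illustrative sketch (not SAS): average the half-squared differences
# between residuals from *different* subjects.
def process_variance(residuals_by_subject):
    total, count = 0.0, 0
    for i in range(len(residuals_by_subject) - 1):
        for j in range(i + 1, len(residuals_by_subject)):
            for ri in residuals_by_subject[i]:
                for rj in residuals_by_subject[j]:
                    total += (ri - rj) ** 2 / 2.0
                    count += 1
    return total / count

# Two made-up subjects with two residuals each:
print(process_variance([[0.5, -0.2], [0.1, 0.3]]))
```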
3. This program computes a likelihood ratio test that compares a random coefficients model with
the cubic effect of time with a random coefficients model with the quadratic effect of time for
the aids data set.
ods select none;

proc mixed data=sasuser.aids;
   model cd4_scale=time age cigarettes drug partners
         depression time*age time*depression
         time*drug time*partners time*cigarettes
         time*time time*time*time / ddfm=kr;
   random intercept time time*time time*time*time /
          type=un subject=id;
   ods output lrt=cubicmodel;
run;

proc mixed data=sasuser.aids;
   model cd4_scale=time age cigarettes drug partners
         depression time*age time*depression
         time*drug time*partners time*cigarettes
         time*time time*time*time / ddfm=kr;
   random intercept time time*time / type=un subject=id;
   ods output lrt=quadmodel;
run;
The output object of interest is called LRT, which has the results of the null model likelihood ratio test.
The SAS data set cubicmodel has the results of the cubic model while quadmodel has the results of the
quadratic model.
data likelihood;
merge cubicmodel (rename=(df=df3 chisq=chisq3))
quadmodel (rename=(df=df2 chisq=chisq2));
testcubic=chisq3-chisq2;
dfcubic=df3-df2;
pvaluecubic=1-probchi(testcubic,dfcubic);
run;
The variable CHISQ contains the likelihood ratio test statistic, and DF contains the degrees of freedom.
Subtracting the two test statistics and the two degrees of freedom gives the information needed to
compute the likelihood ratio test comparing the two models. The PROBCHI function computes
the p-value for the test.

 If one of the variance components is 0, then the NOBOUND option should be used when
computing the likelihood ratio test.
ods select all;

proc print data=likelihood split='*' noobs;
   var testcubic dfcubic pvaluecubic;
   label testcubic='likelihood ratio*test statistic*comparing*cubic model to quadratic model'
         dfcubic='degrees of*freedom'
         pvaluecubic='p-value';
title 'Likelihood Ratio Test for the Cubic Model';
run;

Likelihood Ratio Test for the Cubic Model

likelihood ratio test statistic
comparing cubic model             degrees of
to quadratic model                   freedom       p-value

              23.9263                      5    .000224306

The test statistic is clearly significant. Therefore, the cubic effect for time should remain in the
RANDOM statement.
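The arithmetic behind the test is easy to reproduce. The sketch below is in Python rather than SAS, with hypothetical -2 log likelihood values and parameter counts (not the course results): it forms the likelihood ratio statistic as a difference of -2 log likelihoods and evaluates the chi-square tail area, the quantity SAS returns as 1 - PROBCHI(x, df). For brevity the helper uses the closed-form survival function that exists for even degrees of freedom; for odd df a library routine such as scipy.stats.chi2.sf would be used instead.

```python
import math

def chi2_sf_even(x, df):
    """P(chi-square_df > x), valid for even df only:
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!"""
    k = df // 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= (x / 2.0) / i
        total += term
    return math.exp(-x / 2.0) * total

# Hypothetical -2 log likelihoods and parameter counts for nested models:
lrt = 110.3 - 102.1   # difference in -2 log likelihoods
df = 10 - 6           # difference in number of parameters
print(round(chi2_sf_even(lrt, df), 4))
```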


4. This program computes a likelihood ratio test for the fixed effects, comparing the full model with a
model without the three nonsignificant interactions for the aids data set. The program computes
the degrees of freedom correctly only when there are no classification variables in the MODEL statement.
ods select none;

proc mixed data=sasuser.aids method=ml;
   model cd4_scale=time age cigarettes drug partners
         depression time*age time*depression
         time*drug time*partners time*cigarettes
         time*time time*time*time / ddfm=kr;
   random intercept time time*time time*time*time /
          type=un subject=id;
   ods output dimensions=fulldim
              fitstatistics=fullfit;
run;

proc mixed data=sasuser.aids method=ml;
   model cd4_scale=time age cigarettes drug partners
         depression time*age
         time*cigarettes time*time time*time*time / ddfm=kr;
   random intercept time time*time time*time*time /
          type=un subject=id;
   ods output dimensions=subsetdim
              fitstatistics=subsetfit;
run;
The output objects of interest are DIMENSIONS, which has the number of fixed effect parameters
in the model, and FITSTATISTICS, which has the –2 log likelihood value for the model.
The SAS data set
 fulldim has the number of fixed effect parameters in the full model
 fullfit has the –2 log likelihood value for the full model
 subsetdim has the number of fixed effect parameters in the subset model
 subsetfit has the –2 log likelihood value for the subset model

 If a fixed effect is in the CLASS statement, then the degrees of freedom must be calculated
differently.
data _null_;
set fulldim;
if descr='Columns in X';
call symput('dffull',value);
run;

data _null_;
set subsetdim;
if descr='Columns in X';
call symput('dfsubset',value);
run;
data _null_;
set fullfit;

if descr='-2 Log Likelihood';
call symput('fulllr',value);
run;

data _null_;
set subsetfit;
if descr='-2 Log Likelihood';
call symput('subsetlr',value);
run;
The four DATA _null_ steps create four macro variables with the two values of fixed-effect parameters
and the two values of –2 log likelihoods.
data results;
testfull=&subsetlr-&fulllr;
dffull=&dffull-&dfsubset;
pvaluefull=1-probchi(testfull,dffull);
run;
The testfull variable has the likelihood ratio test statistic value and the dffull variable has the test statistic
degrees of freedom.
ods select all;

proc print data=results split='*';
   var testfull dffull pvaluefull;
   label testfull='likelihood ratio*test statistic*comparing*full model to subset model'
         dffull='degrees of*freedom'
         pvaluefull='p-value';
title 'Likelihood Ratio Test for the 3 Fixed Effect '
'Interactions';
run;

Likelihood Ratio Test for the 3 Fixed Effect Interactions

       likelihood ratio test statistic
       comparing full model              degrees of
Obs    to subset model                      freedom     p-value

  1                  2.37334                      3     0.49862

The likelihood ratio test is clearly not significant. Therefore, the three interaction terms should be
eliminated from the model.


B.2 Model Diagnostics for GEE Regression Models

Objectives

• Define the GEE case deletion diagnostic statistics created in PROC GENMOD.
• Plot the GEE case deletion diagnostic statistics.


Case Deletion Diagnostic Statistics

• PROC GENMOD now computes GEE diagnostics, which account for the
leverage and residuals in a set of observations to determine their influence
on regression parameter estimates and fitted values.
• PROC GENMOD also computes observation-deletion diagnostics and
cluster-deletion diagnostics.
• The diagnostics are generalizations of Cook’s D, dfbeta, and leverage for
general linear models and generalized linear models.
• There are no published cutoffs for the GEE diagnostic statistics.



Preisser and Qaqish (1996) proposed case-deletion regression diagnostics for the GEE methodology. The
diagnostics are an approximation to the difference in the estimated regression coefficients that one would
obtain upon deleting either one observation or one cluster. They proposed the computationally feasible
one-step approximation diagnostics, which are similar to the ones proposed by Pregibon (1981) for
generalized linear models. The authors also showed that the one-step diagnostic statistics were very good
approximations to their exact diagnostic counterparts and their computations were very fast. They
recommend that these diagnostic statistics be routinely used in data analysis for GEE models. However,
there are no published cutoffs for these diagnostic statistics.

Leverage

• Values range from 0 to 1.
• Large leverage values might indicate that an observation or cluster is influential.
• The sum of all leverage values is equal to p, the number of parameters in the model.
• Values are produced for both observations and clusters.


The leverage is simply the diagonal elements in the hat matrix, which corresponds to the amount
of influence the observation has on the fitted values. PROC GENMOD produces leverage values for both
observations and clusters.
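For an ordinary linear model the leverage values can be computed directly from the hat matrix. The Python sketch below (made-up data; not part of the course code) uses the closed form for a model with an intercept and one covariate, h_i = 1/n + (x_i - xbar)^2 / sum_j (x_j - xbar)^2, and illustrates two of the properties above: the leverages sum to p = 2, and an isolated covariate value receives leverage near 1.

```python
# Illustrative sketch (not SAS): hat-matrix diagonal for an
# intercept-plus-slope linear model, via the simple-regression closed form.
def leverages(x):
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]

h = leverages([1.0, 2.0, 3.0, 10.0])   # 10.0 is far from the others
print([round(v, 2) for v in h], round(sum(h), 2))
```

The fourth observation's leverage (0.97) is close to 1, flagging it as potentially influential, while the leverages still sum to the number of parameters.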


Cook’s D Statistic

• The statistic is a measure of the simultaneous change in the parameter estimates when an observation or cluster is deleted from the analysis.
• The measure is scaled by the variance estimate of the parameter estimate based on all the observations.
• Statistics are produced for both observations and clusters.


Cook’s D statistic (Cook 1977, 1979) measures the influence of observations on the estimated values
of the linear predictor. These diagnostic statistics measure the influence of a deleted observation or cluster
on the overall fit of the model. PROC GENMOD also produces these statistics for both observations and
clusters.
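In the ordinary linear-model case that these GEE diagnostics generalize, Cook's D has a closed form in terms of the deleted observation's residual and leverage. The Python sketch below (made-up data; not part of the course code) computes it for a simple regression with p = 2 parameters; the same values are obtained by actually refitting the model with each observation deleted.

```python
# Illustrative sketch (not SAS): Cook's D for simple linear regression,
# D_i = e_i^2 * h_i / (p * s^2 * (1 - h_i)^2), with p = 2 parameters.
def ols(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b * xbar, b          # intercept, slope

def cooks_d(x, y):
    n, p = len(x), 2
    a, b = ols(x, y)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in resid) / (n - p)   # mean squared error
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    h = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]
    return [r * r * hi / (p * s2 * (1 - hi) ** 2)
            for r, hi in zip(resid, h)]

print([round(d, 3) for d in cooks_d([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.9])])
```

The exact cluster-deletion diagnostics in PROC GENMOD follow the same idea, with the one-step approximations of Preisser and Qaqish standing in for the full refits.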

Cluster Fit Statistic

The cluster lack-of-fit statistic does the following:
• examines the lack of fit of a cluster
• differs from Cook’s D in that the measure is scaled by the variance of the parameter estimate based on all but the subset of observations that defines the cluster.



Another diagnostic statistic proposed by Preisser and Qaqish (1996) examines the lack of fit of a cluster.
The difference between this statistic and Cook’s D is that the cluster is deleted from both the numerator
and denominator. Cook’s D is useful in that the comparison of distances between clusters is meaningful
because they refer to the same metric. However, because the deleted cluster influences the estimate
of the variance, its inclusion might decrease the magnitude of the diagnostic and might hide influence.
Cook (1986) points out that the studentized diagnostic (Preisser lack-of-fit statistic) has a different
interpretation than the standardized version (Cook’s D). The studentized diagnostic has the interpretation
of the influence of the cluster on the parameter estimates and the variance estimate of the parameter
estimates simultaneously. Therefore, Preisser and Qaqish (1996) recommend that the question “influence
on what?” should be the determining factor in your choice of statistics.

Dfbetas

• Dfbetas measure the change in parameter estimates when an observation or cluster is deleted from the analysis.
• They are produced for both observations and clusters.
• Both the standardized and unstandardized versions can be specified.


PROC GENMOD computes four dfbeta statistics. The statistic dfbetac is the effect of deleting a cluster,
dfbetacs is the standardized version of the cluster deletion statistic, dfbetao is the effect of deleting an
observation, and dfbetaos is the standardized version of the observation deletion statistic.


GEE Diagnostic Plots

Example: Fit a GEE model on the keratotomy data, specifying the unstructured correlation structure
and reference cell coding for gender with female as the reference cell. Produce ODS
Statistical Graphics for the GEE diagnostics by requesting the cluster leverage plot, the
cluster Cook’s D plot, the cluster lack-of-fit plot, and the standardized cluster dfbeta plot,
and label the observations by the case number. Also create a SAS data set with the
diagnostic statistics for both the observations and the clusters.
/* long0bd01.sas */
proc genmod data=sasuser.keratotomy desc plots(clusterlabel)=
(cleverage dcls mcls dfbetacs);
class patientid gender (param=ref ref='Female');
model unstable = age diameter gender visit / dist=bin;
repeated subject = patientid / type=unstr;
output out=diagnostics CLEVERAGE=cleverage CLUSTER=cluster
       CLUSTERCOOKSD=clustercooksd DFBETAC=_all_ DFBETACS=_all_
       CLUSTERDFIT=clusterdfit COOKSD=cooksd LEVERAGE=leverage;
title 'GEE Model of Radial Keratotomy Surgery';
run;
Selected PROC GENMOD statement options:
PLOTS= specifies plots to be created using ODS Graphics. Many of the observational
statistics in the output data set can be plotted using this option. You are not
required to create an output data set in order to produce a plot.
Selected global plot option:
CLUSTERLABEL displays formatted levels of the SUBJECT= effect instead of plot symbols. This
option applies only to diagnostic statistics for models fit by GEEs that are plotted
against cluster number, and provides a way to identify cluster level names with
corresponding ordered cluster numbers.
Selected plot options:
CLEVERAGE plots the cluster leverage as a function of ordered cluster.
DCLS plots the cluster Cook’s distance statistic as a function of ordered cluster.

MCLS plots the studentized cluster Cook’s distance statistic as a function of ordered
cluster.
DFBETACS plots the standardized cluster deletion statistic as a function of ordered cluster for
each regression parameter in the model.
Selected OUTPUT statement keywords:
CLEVERAGE represents the leverage of a cluster.
CLUSTER represents the numerical cluster index, in order of sorted clusters.


CLUSTERCOOKSD represents the Cook distance-type statistic to measure the influence of deleting an
entire cluster on the overall model fit.
DFBETAC= represents the effect of deleting an entire cluster on parameter estimates. If you
specify the keyword _all_ after the equal sign, variables named
DFBETAC_ParameterName will be included in the output data set to contain the
values of the diagnostic statistic to measure the influence of deleting the cluster
on the individual parameter estimates. ParameterName is the name of the
regression model parameter formed from the input variable names concatenated
with the appropriate levels, if classification variables are involved.
DFBETACS= represents the effect of deleting an entire cluster on normalized parameter
estimates. If you specify the keyword _all_ after the equal sign, variables named
DFBETACS_ParameterName will be included in the output data set to contain the
values of the diagnostic statistic to measure the influence of deleting the cluster
on the individual parameter estimates, normalized by their standard errors.
CLUSTERDFIT represents the studentized Cook distance-type statistic to measure the influence
of deleting an entire cluster on the overall model fit.
COOKSD represents the Cook distance type statistic to measure the influence of deleting a
single observation on the overall model fit.
LEVERAGE represents the leverage of a single observation.


Several clusters appear to have high GEE diagnostic statistics. You should examine influential clusters
and determine whether they are erroneous. If these clusters are legitimate, then they might represent
important new findings. They also might indicate that your current model is inadequate.

The observation labels are difficult to see because so many observations fall in the same space in the
diagnostic plot.
Example: Print the clusters that exceed the 99th percentile for the distribution of Cook’s D for the cluster
and the cluster lack of fit statistic.
proc rank data=diagnostics groups=100 out=percentiles;
var clustercooksd clusterdfit;
ranks percentilecooksd percentiledfit;
run;

proc print data=percentiles noobs;
   where percentilecooksd = 99 or percentiledfit = 99;
   var patientid unstable gender age diameter visit clustercooksd
       cooksd cleverage leverage clusterdfit;
   title 'Clusters with Possible Outlying Covariate Patterns';
run;


Clusters with Possible Outlying Covariate Patterns

patientid  unstable  gender   age      diameter  visit  clustercooksd      cooksd  cleverage    leverage  clusterdfit

       99         1  Female   23.2252       3.0      1       0.017523  .003971849   0.021370  .006157116     0.017214
       99         1  Female   23.2252       3.0      4       0.017523  .000278872   0.021370  .006504543     0.017214
       99         1  Female   23.2252       3.0     10       0.017523  .000366719   0.021370  .008708255     0.017214
      131         0  Male     51.9042       3.5      1       0.016831  .000559362   0.026457  .008853698     0.016457
      131         0  Male     51.9042       3.5      4       0.016831  .000002554   0.026457  .008896500     0.016457
      131         0  Male     51.9042       3.5     10       0.016831  .006604576   0.026457  .008707161     0.016457
      186         1  Female   23.1841       3.0      1       0.017579  .003982858   0.021422  .006170329     0.017269
      186         1  Female   23.1841       3.0      4       0.017579  .000279700   0.021422  .006521120     0.017269
      186         1  Female   23.1841       3.0     10       0.017579  .000367831   0.021422  .008730165     0.017269

The output shows which clusters had a high cluster Cook’s D or a high cluster lack of fit statistic.

