Multivariate Generalized Linear Mixed Models Using R
Multivariate Generalized Linear Mixed Models Using R presents
robust and methodologically sound models for analyzing large and
complex data sets, enabling readers to answer increasingly complex
research questions. The book applies the principles of modeling
to longitudinal data from panel and related studies via the Sabre
software package in R.
The authors first discuss members of the family of generalized linear
models, gradually adding complexity to the modeling framework by
incorporating random effects. After reviewing the generalized linear
model notation, they illustrate a range of random effects models,
including three-level, multivariate, endpoint, event history, and state
dependence models. They estimate the multivariate generalized
linear mixed models (MGLMMs) using either standard or adaptive
Gaussian quadrature. The authors also compare two-level fixed and
random effects linear models. The appendices contain additional
information on quadrature, model estimation, and endogenous
variables, along with SabreR commands and examples.
In medical and social science research, MGLMMs help disentangle
state dependence from incidental parameters. Focusing on these
sophisticated data analysis techniques, this book explains the
statistical theory and modeling involved in longitudinal studies. Many
examples throughout the text illustrate the analysis of real-world
data sets. Exercises, solutions, and other material are available on a
supporting website.
Multivariate Generalized
Linear Mixed Models
Using R
Damon M. Berridge
Robert Crouchley
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2011 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Version Date: 20111012
International Standard Book Number-13: 978-1-4398-1327-0 (eBook - PDF)
This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
Contents
List of Figures xi
List of Tables xiii
List of Applications xv
List of Datasets xvii
Preface xix
Acknowledgments xxiii
1 Introduction 1
2 Generalized linear models for continuous/interval scale
data 9
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Continuous/interval scale data . . . . . . . . . . . . . 10
2.3 Simple and multiple linear regression models . . . . . 11
2.4 Checking assumptions in linear regression models . . . 12
2.5 Likelihood: multiple linear regression . . . . . . . . . . 13
2.6 Comparing model likelihoods . . . . . . . . . . . . . . 14
2.7 Application of a multiple linear regression model . . . 15
2.8 Exercises on linear models . . . . . . . . . . . . . . . . 17
3 Generalized linear models for other types of data 21
3.1 Binary data . . . . . . . . . . . . . . . . . . . . . . . . 21
3.1.1 Introduction . . . . . . . . . . . . . . . . . . . . 21
3.1.2 Logistic regression . . . . . . . . . . . . . . . . . 22
3.1.3 Logit and probit transformations . . . . . . . . 23
3.1.4 General logistic regression . . . . . . . . . . . . 24
3.1.5 Likelihood . . . . . . . . . . . . . . . . . . . . . 24
3.1.6 Example with binary data . . . . . . . . . . . . 24
3.2 Ordinal data . . . . . . . . . . . . . . . . . . . . . . . 26
3.2.1 Introduction . . . . . . . . . . . . . . . . . . . . 26
3.2.2 The ordered logit model . . . . . . . . . . . . . 27
3.2.3 Dichotomization of ordered categories . . . . . . 29
3.2.4 Likelihood . . . . . . . . . . . . . . . . . . . . . 29
3.2.5 Example with ordered data . . . . . . . . . . . 30
3.3 Count data . . . . . . . . . . . . . . . . . . . . . . . . 32
3.3.1 Introduction . . . . . . . . . . . . . . . . . . . . 32
3.3.2 Poisson regression models . . . . . . . . . . . . 33
3.3.3 Likelihood . . . . . . . . . . . . . . . . . . . . . 34
3.3.4 Example with count data . . . . . . . . . . . . . 34
3.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . 37
4 Family of generalized linear models 43
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 43
4.2 The linear model . . . . . . . . . . . . . . . . . . . . . 44
4.3 The binary response model . . . . . . . . . . . . . . . 44
4.4 The Poisson model . . . . . . . . . . . . . . . . . . . . 46
4.5 Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . 46
5 Mixed models for continuous/interval scale data 49
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 49
5.2 Linear mixed model . . . . . . . . . . . . . . . . . . . 49
5.3 The intraclass correlation coefficient . . . . . . . . . . 51
5.4 Parameter estimation by maximum likelihood . . . . . 53
5.5 Regression with level-two effects . . . . . . . . . . . . 54
5.6 Two-level random intercept models . . . . . . . . . . . 55
5.7 General two-level models including random intercepts 56
5.8 Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.9 Residuals . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.10 Checking assumptions in mixed models . . . . . . . . 59
5.11 Comparing model likelihoods . . . . . . . . . . . . . . 60
5.12 Application of a two-level linear model . . . . . . . . . 61
5.13 Two-level growth models . . . . . . . . . . . . . . . . 66
5.13.1 A two-level repeated measures model . . . . . . 66
5.13.2 A linear growth model . . . . . . . . . . . . . . 66
5.13.3 A quadratic growth model . . . . . . . . . . . . 67
5.14 Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.15 Example using linear growth models . . . . . . . . . . 68
5.16 Exercises using mixed models for continuous/interval
scale data . . . . . . . . . . . . . . . . . . . . . . . . . 69
6 Mixed models for binary data 75
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 75
6.2 The two-level logistic model . . . . . . . . . . . . . . . 75
6.3 General two-level logistic models . . . . . . . . . . . . 77
6.4 Intraclass correlation coefficient . . . . . . . . . . . . . 77
6.5 Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.6 Example using binary data . . . . . . . . . . . . . . . 78
6.7 Exercises using mixed models for binary data . . . . . 81
7 Mixed models for ordinal data 85
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 85
7.2 The two-level ordered logit model . . . . . . . . . . . . 85
7.3 Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . 86
7.4 Example using mixed models for ordered data . . . . . 87
7.5 Exercises using mixed models for ordinal data . . . . . 90
8 Mixed models for count data 93
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 93
8.2 The two-level Poisson model . . . . . . . . . . . . . . . 93
8.3 Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . 94
8.4 Example using mixed models for count data . . . . . . 95
8.5 Exercises using mixed models for count data . . . . . 97
9 Family of two-level generalized linear models 99
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 99
9.2 The mixed linear model . . . . . . . . . . . . . . . . . 100
9.3 The mixed binary response model . . . . . . . . . . . 100
9.4 The mixed Poisson model . . . . . . . . . . . . . . . . 102
9.5 Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . 102
10 Three-level generalized linear models 105
10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 105
10.2 Three-level random intercept models . . . . . . . . . . 105
10.3 Three-level generalized linear models . . . . . . . . . . 106
10.4 Linear models . . . . . . . . . . . . . . . . . . . . . . . 107
10.5 Binary response models . . . . . . . . . . . . . . . . . 108
10.6 Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . 108
10.7 Example using three-level generalized linear models . 109
10.8 Exercises using three-level generalized linear mixed
models . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
11 Models for multivariate data 115
11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 115
11.2 Multivariate two-level generalized linear model . . . . 116
11.3 Bivariate Poisson model: example . . . . . . . . . . . . 117
11.4 Bivariate ordered response model: example . . . . . . 121
11.5 Bivariate linear-probit model: example . . . . . . . . . 126
11.6 Multivariate two-level generalized linear model
likelihood . . . . . . . . . . . . . . . . . . . . . . . . . 131
11.7 Exercises using multivariate generalized linear mixed
models . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
12 Models for duration and event history data 135
12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 135
12.1.1 Left censoring . . . . . . . . . . . . . . . . . . . 135
12.1.2 Right censoring . . . . . . . . . . . . . . . . . . 135
12.1.3 Time-varying explanatory variables . . . . . . . 136
12.1.4 Competing risks . . . . . . . . . . . . . . . . . . 136
12.2 Duration data in discrete time . . . . . . . . . . . . . 137
12.2.1 Single-level models for duration data . . . . . . 137
12.2.2 Two-level models for duration data . . . . . . . 139
12.2.3 Three-level models for duration data . . . . . . 140
12.3 Renewal data . . . . . . . . . . . . . . . . . . . . . . . 143
12.3.1 Introduction . . . . . . . . . . . . . . . . . . . . 143
12.3.2 Example: renewal models . . . . . . . . . . . . . 145
12.4 Competing risk data . . . . . . . . . . . . . . . . . . . 147
12.4.1 Introduction . . . . . . . . . . . . . . . . . . . . 147
12.4.2 Likelihood . . . . . . . . . . . . . . . . . . . . . 148
12.4.3 Example: competing risk data . . . . . . . . . . 150
12.5 Exercises using renewal and competing risks models . 153
13 Stayers, non-susceptibles and endpoints 157
13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 157
13.2 Mover-stayer model . . . . . . . . . . . . . . . . . . . 157
13.3 Likelihood incorporating the mover-stayer model . . . 160
13.4 Example 1: stayers within count data . . . . . . . . . 161
13.5 Example 2: stayers within binary data . . . . . . . . . 164
13.6 Exercises: stayers . . . . . . . . . . . . . . . . . . . . . 166
14 Handling initial conditions/state dependence in binary
data 169
14.1 Introduction to key issues: heterogeneity, state
dependence and non-stationarity . . . . . . . . . . . . 169
14.2 Example . . . . . . . . . . . . . . . . . . . . . . . . . . 170
14.3 Random effects models . . . . . . . . . . . . . . . . . . 171
14.4 Initial conditions problem . . . . . . . . . . . . . . . . 172
14.5 Initial treatment . . . . . . . . . . . . . . . . . . . . . 173
14.6 Example: depression data . . . . . . . . . . . . . . . . 174
14.7 Classical conditional analysis . . . . . . . . . . . . . . 174
14.8 Classical conditional model: example . . . . . . . . . . 175
14.9 Conditioning on initial response but allowing random
effect u0j to be dependent on zj . . . . . . . . . . . . 176
14.10 Wooldridge conditional model: example . . . . . . . . 177
14.11 Modelling the initial conditions . . . . . . . . . . . . . 178
14.12 Same random effect in the initial response and
subsequent response models with a common
scale parameter . . . . . . . . . . . . . . . . . . . . . . 179
14.13 Joint analysis with a common random effect: example 180
14.14 Same random effect in models of the initial response
and subsequent responses but with different
scale parameters . . . . . . . . . . . . . . . . . . . . . 181
14.15 Joint analysis with a common random effect (different
scale parameters): example . . . . . . . . . . . . . . . 182
14.16 Different random effects in models of the initial response
and subsequent responses . . . . . . . . . . . . . . . . 183
14.17 Different random effects: example . . . . . . . . . . . . 184
14.18 Embedding the Wooldridge approach in joint models for
the initial response and subsequent responses . . . . . 185
14.19 Joint model incorporating the Wooldridge approach: ex-
ample . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
14.20 Other link functions . . . . . . . . . . . . . . . . . . . 187
14.21 Exercises using models incorporating initial conditions/
state dependence in binary data . . . . . . . . . . . . 188
15 Incidental parameters: an empirical comparison of fixed
effects and random effects models 195
15.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 195
15.2 Fixed effects treatment of the two-level linear model . 197
15.3 Dummy variable specification of the fixed effects
model . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
15.4 Empirical comparison of two-level fixed effects and
random effects estimators . . . . . . . . . . . . . . . . 200
15.5 Implicit fixed effects estimator . . . . . . . . . . . . . 204
15.6 Random effects models . . . . . . . . . . . . . . . . . . 204
15.7 Comparing two-level fixed effects and random effects
models . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
15.8 Fixed effects treatment of the three-level linear
model . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
15.9 Exercises comparing fixed effects and random effects . 209
A SabreR installation, SabreR commands, quadrature,
estimation, endogenous effects 215
A.1 SabreR installation . . . . . . . . . . . . . . . . . . . . 215
A.2 SabreR commands . . . . . . . . . . . . . . . . . . . . 215
A.2.1 The arguments of the SabreR object . . . . . . 215
A.2.2 The anatomy of a SabreR command file . . . . 216
A.3 Quadrature . . . . . . . . . . . . . . . . . . . . . . . . 218
A.3.1 Standard Gaussian quadrature . . . . . . . . . . 218
A.3.2 Performance of Gaussian quadrature . . . . . . 219
A.3.3 Adaptive quadrature . . . . . . . . . . . . . . . 221
A.4 Estimation . . . . . . . . . . . . . . . . . . . . . . . . 223
A.4.1 Maximizing the log likelihood of random effects
models . . . . . . . . . . . . . . . . . . . . . . . 223
A.5 Fixed effects linear models . . . . . . . . . . . . . . . . 225
A.6 Endogenous and exogenous variables . . . . . . . . . . 226
B Introduction to R for Sabre 229
B.1 Getting started with R . . . . . . . . . . . . . . . . . . 229
B.1.1 Preliminaries . . . . . . . . . . . . . . . . . . . 229
B.1.1.1 Working with R in interactive mode . 229
B.1.1.2 Basic functions . . . . . . . . . . . . . 231
B.1.1.3 Getting help . . . . . . . . . . . . . . . 232
B.1.1.4 Stopping R . . . . . . . . . . . . . . . 232
B.1.2 Creating and manipulating data . . . . . . . . . 232
B.1.2.1 Vectors and lists . . . . . . . . . . . . 232
B.1.2.2 Vectors . . . . . . . . . . . . . . . . . 233
B.1.2.3 Vector operations . . . . . . . . . . . . 234
B.1.2.4 Lists . . . . . . . . . . . . . . . . . . . 235
B.1.2.5 Data frames . . . . . . . . . . . . . . . 236
B.1.3 Session management . . . . . . . . . . . . . . . 237
B.1.3.1 Managing objects . . . . . . . . . . . . 237
B.1.3.2 Attaching and detaching objects . . . 237
B.1.3.3 Serialization . . . . . . . . . . . . . . . 238
B.1.3.4 R scripts . . . . . . . . . . . . . . . . . 238
B.1.3.5 Batch processing . . . . . . . . . . . . 239
B.1.4 R packages . . . . . . . . . . . . . . . . . . . . . 239
B.1.4.1 Loading a package into R . . . . . . . 239
B.1.4.2 Installing a package for use in R . . . 239
B.1.4.3 R and Statistics . . . . . . . . . . . . . 240
B.2 Data preparation for SabreR . . . . . . . . . . . . . . 240
B.2.1 Creation of dummy variables . . . . . . . . . . . 240
B.2.2 Missing values . . . . . . . . . . . . . . . . . . . 243
B.2.3 Creating lagged response covariate data . . . . 245
References 249
Author Index 259
Subject Index 263
List of Figures
11.1 The relationship between wages and trade union
membership: I . . . . . . . . . . . . . . . . . . . . . . . . 127
11.2 The relationship between wages and trade union
membership: II . . . . . . . . . . . . . . . . . . . . . . . 127
11.3 The relationship between wages and trade union
membership: III . . . . . . . . . . . . . . . . . . . . . . . 128
12.1 Duration data . . . . . . . . . . . . . . . . . . . . . . . . 136
12.2 Diagrammatic representation of renewal data . . . . . . 143
12.3 Example of competing risk data: failure due to two failure
mechanisms . . . . . . . . . . . . . . . . . . . . . . . . . 148
12.4 Data required to model failure due to mechanism A . . 148
12.5 Data required to model failure due to mechanism B . . 149
13.1 The normal distribution . . . . . . . . . . . . . . . . . . 158
13.2 Quadrature points approximating the normal
distribution . . . . . . . . . . . . . . . . . . . . . . . . . 158
13.3 Quadrature with left and right endpoints . . . . . . . . 159
13.4 Quadrature with left endpoint only . . . . . . . . . . . . 159
B.1 First few lines of essays.tab . . . . . . . . . . . . . . . 241
B.2 First few lines of new dataset essays2.tab . . . . . . . 242
B.3 First few lines of thaieduc.tab . . . . . . . . . . . . . . 243
B.4 Ungrouped depression data (depression0.tab) . . . . . 246
B.5 First few lines of depression.tab . . . . . . . . . . . . 247
B.6 First few lines of new dataset depression2.tab . . . . 248
List of Tables
11.1 Crosstabulation of dvisits by prescrib . . . . . . . . 119
12.1 Sample of duration data in continuous time . . . . . . . 138
12.2 Sample of duration data, reconfigured in discrete time . 138
12.3 Sample of renewal data in continuous time . . . . . . . . 144
12.4 Sample of renewal data, reconfigured in discrete time . . 144
12.5 Sample of competing risk data in continuous time . . . . 149
12.6 Sample of competing risk data, reconfigured in discrete
time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
13.1 Observed migration frequencies . . . . . . . . . . . . . . 162
14.1 Depression data (1 = depressed, 0 = not depressed) . . 171
List of Applications
Angina pectoris (renewal data), 153
Attitudes to abortion, 6, 39, 91
Attitudes to gender roles (bivariate ordered data), 115, 121
Choosing teaching as a profession, 30, 87
Demand for health care, 34, 95, 216
Demand for health care (bivariate count data), 115–117
Depression, 170, 172, 174, 175, 177, 180, 182, 184, 187, 245
Educational attainment, 3, 18, 70
Effect of education on log wages, 210
Effect of job training on firm scrap rates, 209
Epileptic seizures, 7, 40, 97
Essay grading, 240
Essay grading (binary response), 4, 37, 81
Essay grading (continuous response), 2, 17, 70
Essay grading (ordered response), 5, 38, 90
Expiratory flow rates (bivariate data), 131
Female employment participation (stayers in binary data), 167
Female UK labour force participation, 191
Filled and lapsed vacancies (competing risk data), 150
Filling job vacancies (competing risk data), 116
Filling vacancies (three-level data), 140
Fish caught by US National Park visitors (stayers in count data), 168
German unemployment (competing risk data), 155
Headaches, 7
Headaches (count data), 40, 97
Immunization of Guatemalan children, 5, 38, 83
Immunization of Guatemalan children (binary response), 113
Log wages (three-level data), 109
Mathematics achievement, 15, 61, 105
Migration moves (binary data), 164
Migration moves (count data), 161
Patents and R&D expenditure, 192
Psychological distress, 2, 10, 11, 14, 17, 49, 52, 55, 56, 60, 69
Pupil rating of school managers, 4, 19, 72
Pupil rating of school managers (three-level data), 208
Repeating a grade, 24, 78, 243
Residential mobility, 145
Respiratory status, 6, 39, 92
Skin cancer deaths, 7, 41, 98, 113
Student evaluation of teachers, 68
Tower of London, 5, 37, 82
Tower of London (binary response), 112
Trade union membership, 4, 37, 81
Trade union membership (stayers in binary data), 166
Trade union membership of females, 189
Trade union membership of young males, 188
Unemployment claims, 3, 18, 71
Wage determinants, 3, 18, 72
Wages and trade union membership, 116, 121
Wages and trade union membership (bivariate data), 126, 132
Wages of young women, 200
List of Datasets
abortion2.tab, 6, 39, 91
angina.tab, 154
deaths.tab, 7, 41, 98, 114
depression.tab, 174, 179, 180, 182, 184, 187, 247
depression0.tab, 245, 246
depression1.tab, 248
depression2.tab, 174, 175, 177, 247, 248
epilep.tab, 7, 40, 97
essays.tab, 240, 242
essays2.tab, 4, 37, 81, 242
essays ordered.tab, 6, 38, 90
ezunem2.tab, 3, 18, 71
fish.tab, 168
ghq2.tab, 2, 10, 70
grader1.tab, 2, 70
grader2.tab, 2, 17
guatemala immun.tab, 5, 38, 83, 113
headache2.tab, 7, 40, 97
hsb.tab, 15, 61
jtrain.tab, 209
labour.tab, 167
manager.tab, 4, 19, 72, 110
neighbourhood.tab, 3, 18, 70
nls.tab, 126, 128, 166, 200
nlswage-union.tab, 126
opfama.tab, 121, 122
opfamaf.tab, 124
opfamf.tab, 122, 123
patents.tab, 192
pefr.tab, 132
racd.tab, 35, 95, 216, 217
respiratory2.tab, 7, 39, 92
roch.tab, 145
rochmig.tab, 164
rochmigx.tab, 161
teacher1.tab, 30
teacher2.tab, 30, 87
thaieduc.tab, 243, 244
thaieduc1.tab, 24, 78, 244, 245
thaieduc2.tab, 78, 245
tower1.tab, 5, 37, 82, 112
unemployedR.tab, 155
unionjmw1.tab, 188, 189
unionjmw2.tab, 188, 189
unionred1.tab, 189, 190
unionred2.tab, 189, 190
vacancies.tab, 151
visit-prescribe.tab, 118
vwks4 30k.tab, 140
wagepan.tab, 3, 5, 19, 37, 72, 81, 132
wagepan2.tab, 210
wemp-base1.tab, 191
wemp-base2.tab, 191
Preface
The main aims of this book are to provide an introduction to the princi-
ples of modelling as applied to longitudinal data from panel and related
studies with the necessary statistical theory, and to describe the applica-
tion of these principles to the analysis of a wide range of examples using
the Sabre software (http://sabre.lancs.ac.uk/) from within R.
This material on multivariate generalized linear mixed models arises
from the activities at the Economic and Social Research Council
(ESRC)-funded Colaboratory for Quantitative e-Social Science (CQeSS)
at Lancaster University from 2003 to 2008. Sabre is a program for the statistical analysis of multi-process event/response sequences. These responses can take the form of binary, ordinal, count and linear recurrent events. The response sequences can also be of different types, for example, a linear response (wages) and a binary one (trade union membership). Such multi-process data are common in many research areas, for example, in the analysis of work and life histories from the British Household Panel Survey or the German Socio-Economic Panel Study, where researchers often want to disentangle state dependence (the effect of previous responses or related outcomes) from any omitted effects that might be present in recurrent behaviour (for example, unemployment).
Understanding the need to disentangle these generic substantive issues
dates back to the study of accident proneness in the 1950s and has since
been discussed in many applied areas, including consumer behaviour
and voting behaviour. These issues, and others relating to the analysis
of longitudinal or event history data, are discussed in more detail in the
following text:
• Shahtahmasebi, S. and Berridge, D. (2010) Conceptualizing Human
Behaviour in Health and Social Research: A Practical Guide to
Data Analysis, New York: Nova
Some key contributions in the References, including a number of
Heckman’s seminal works, have been reprinted in the following series:
1. Penn, R. and Berridge, D. (2010) Social Statistics Volume 1: The
Fundamentals of Descriptive Social Statistics, London: Sage
2. Penn, R. and Berridge, D. (2010) Social Statistics Volume 2: The
Development of Statistical Modelling, London: Sage
3. Penn, R. and Berridge, D. (2010) Social Statistics Volume 3: Sta-
tistical Modelling of Longitudinal Data, London: Sage
4. Penn, R. and Berridge, D. (2010) Social Statistics Volume 4: Sta-
tistical Modelling of Ordinal Categorical Data, London: Sage
Those contributions appearing in this series are indicated by asterisks
in the References. One asterisk indicates Volume 1, two asterisks indicate
Volume 2, and so on.
Sabre can also be used to model collections of single sequences such as
may occur in medical trials on the number of headaches experienced over
a sequence of weeks, or in single-equation descriptions of cross-sectional
clustered data such as the educational attainment of children in schools.
Sabre is available in three forms: (1) stand-alone (as discussed in
Shahtahmasebi and Berridge, 2010), (2) the R plugin (as discussed in
the current text), and (3) the Stata plugin (as discussed on the Sabre
web page — see above).
The class of models that can be estimated by Sabre may be termed Multivariate Generalized Linear Mixed Models (MGLMMs). These models have special features to help them disentangle state dependence from the incidental parameters (omitted or unobserved effects). The incidental parameters can be treated as random or fixed. The random effects models can be estimated with standard Gaussian quadrature or adaptive Gaussian quadrature. Quadrature methods (and particularly adaptive Gaussian quadrature) are the most reliable way of handling random effects in MGLMMs, as the adequacy of the numerical integration can be improved by adding more quadrature points. The number of quadrature points required will depend on the model being estimated. If additional quadrature points fail to improve the log likelihood, then we have found an accurate evaluation of the integral. Even though the linear model integral has a closed-form solution, we do not use it, as it cannot easily be used in multivariate models when some of the joint sequences do not have interval-level responses. Also, current computational facilities on many desktop computers often make the delay involved in using numerical integration for the linear model negligible for many small to medium-sized data sets. 'End effects' can also be added to the models to accommodate 'stayers' or 'non-susceptibles'. The fixed effects algorithm we have developed uses code for large sparse matrices from the Harwell Subroutine Library; see http://www.cse.scitech.ac.uk/nag/hsl/.
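The stopping rule described above (adding quadrature points until the log likelihood stops changing) can be sketched numerically. The following is a minimal illustration of standard Gauss-Hermite quadrature integrating a N(0, sigma^2) random intercept out of a two-level logistic likelihood for a single cluster. It is written in Python rather than SabreR, with made-up responses and parameter values, purely to show the idea; it is not Sabre's implementation.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def cluster_loglik(y, x, beta, sigma, n_points):
    """Marginal log likelihood of one cluster in a two-level logistic
    model, integrating out the N(0, sigma^2) random intercept by
    standard Gauss-Hermite quadrature with n_points points."""
    nodes, weights = hermgauss(n_points)
    u = np.sqrt(2.0) * sigma * nodes          # change of variable to N(0, sigma^2)
    eta = beta * x[:, None] + u[None, :]      # linear predictor, shape (T, K)
    p = 1.0 / (1.0 + np.exp(-eta))            # P(y_t = 1 | u) at each node
    # conditional likelihood of the whole response sequence at each node
    cond = np.prod(np.where(y[:, None] == 1, p, 1.0 - p), axis=0)
    # weighted average over nodes (hermgauss weights sum to sqrt(pi))
    return float(np.log(np.sum(weights * cond) / np.sqrt(np.pi)))

# Hypothetical cluster: four binary responses and one covariate.
y = np.array([1, 0, 1, 1])
x = np.array([0.5, -1.0, 0.0, 1.5])
for k in (2, 4, 8, 16, 32):
    print(k, cluster_loglik(y, x, beta=0.8, sigma=1.2, n_points=k))
```

Running the loop shows the log likelihood settling to a fixed value as the number of points grows, which is the diagnostic described above; adaptive quadrature additionally centres and scales the points for each cluster so that fewer points are needed.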
Also included in Sabre is the option to undertake all the calculations
using increased accuracy. Numerical underflow and overflow often occur
in the estimation process for models with incidental parameters. We suspect that many of the alternative software systems truncate their calculations without informing the user when this happens, as there is little discussion of this in their respective user manuals.
This book is written in a way that we have found appropriate for some of our short courses. The book starts by discussing members of the family of generalized linear models and gradually adds complexity to the modelling framework by incorporating random effects. We then review the generalized linear model notation before illustrating a range of more substantively appropriate random effects models, for example, the three-level model, multivariate (in particular, bivariate and trivariate) models, endpoint, event history and state dependence models. The MGLMMs are estimated using either standard Gaussian quadrature or adaptive Gaussian quadrature. The book compares two-level fixed and random effects linear models. Additional information on quadrature, model estimation and endogenous variables is included in Appendix A. Appendix B contains an introduction to R and some examples of using R to pre-process the data for Sabre.
There are two other related SabreR booklets available from the Sabre
web page:
• Exercises for SabreR
• Solutions Manual for SabreR Exercises
These booklets contain the exercises and solutions on small data sets
that have been written to accompany this book. These exercises will run
quickly on a desktop PC.
Drafts of the chapters of this book were developed and revised in the process of preparing and delivering short courses in 'Statistical Modelling using Sabre', 'Multilevel Modelling' and 'Event History Analysis' given at CQeSS and the Department of Mathematics and Statistics at Lancaster University and elsewhere. We are grateful to the many students of these courses, who come from a range of backgrounds (for example, computational science and the social sciences) and whose comments and criticisms improved these early drafts. We think that the book should serve as a training manual for postgraduate Masters and research students, and as a self-teaching manual for data analysts.
If you have any suggestions as to how this book could be improved—
for instance by the addition of other material—please let us know via
the Sabre mailing list,
[email protected].
We accept no liability for anything that might happen as a conse-
quence of your use of Sabre, though we are happy to accept recognition
of its successful use.
Dr. Damon M. Berridge and Professor Robert Crouchley
Lancaster University
February 2011
Acknowledgments
Many thanks to Dr. Iraj Kazemi for helping to draft the material in the
first ten chapters of this book. Thanks to Professor Richard B. Davies
for inspiring the early development of Sabre (Poisson and logit models
with endpoints).
Many thanks to Daniel Grose for writing the R side of the SabreR library. Dan also wrote much of the introductory material on R in Appendix B. David Stott and John Pritchard undertook all the recent development work on Sabre. Dave wrote the standard Gaussian and adaptive Gaussian quadrature algorithms. John wrote the algorithm for manipulating the large sparse matrices used by the fixed effects estimator.
This work was supported by the following ESRC research grants:
• RES-149-28-1003: The Colaboratory for Quantitative e-Social Science (E-Social Science Centre Lancaster Node), principal investigator: Professor Robert Crouchley
• RES-149-25-0010: An OGSA Component-Based Approach to Middleware for Statistical Modelling, principal investigator: Professor Robert Crouchley
• RES-576-25-0019: The Lancaster-Warwick-Stirling Node: Developing Statistical Modelling in the Social Sciences (National Centre for Research Methods (NCRM) Phase 2), principal investigator: Professor Brian Francis
The NCRM Phase 2 grant was particularly important for the development of the bivariate ordered response model reported in Chapter 11.
Finally, we wish to express our gratitude to Professor Roger Penn for
his assistance in proofreading the final draft of this book.
1
Introduction
A major objective of this book is to provide data analysts with the tools
to analyze large and complex datasets using methodologically sound
models, thereby enabling them to answer increasingly complex research
questions. The statistical software used in this book is SabreR. This is a
version of the package Sabre, for the statistical analysis of multi-process
event/response sequences, which has been implemented within the R
environment.
These responses can take the form of binary, ordinal, count and linear
recurrent events. The response sequences can also be of different types,
for example, a linear response (wages) and a binary response (trade union
membership). Such multi-process data are common in many research
areas, for example, in the analysis of work and life histories from the
British Household Panel Survey or the German Socio-Economic Panel
Study where researchers often want to disentangle state dependence (the
effect of previous responses or related outcomes) from any omitted effects
that might be present in recurrent behaviour (unemployment).
Understanding of the need to disentangle these generic substantive
issues dates back to the study of accident proneness [14] and has been
discussed in many applied areas, including consumer behaviour [75] and
voting behaviour [34].
SabreR can also be used to model collections of single sequences
such as those that may occur in medical trials, for example, headaches
and epileptic seizures [29,30], or in single-equation descriptions of cross-
sectional clustered data such as the educational attainment of children
in schools.
The class of models that can be estimated by SabreR may be called
multivariate generalized linear mixed models. These models have special
features added to standard models to help us disentangle state depen-
dence from the incidental parameters (omitted or unobserved effects).
The incidental parameters can be treated as random or fixed, the ran-
dom effects models being estimated using standard Gaussian quadrature
or adaptive Gaussian quadrature. ‘End effects’ can also be added to the
models to accommodate ‘stayers’ or ‘non-susceptibles’, resulting in a
more parsimonious model which provides a better fit to the data with
fewer parameters than a non-parametric specification of the random ef-
fects. The fixed effects algorithm we have developed uses code for large
sparse matrices from the Harwell Subroutine Library [49].
SabreR also includes the option to undertake all of the calculations
using increased accuracy. This is important because numerical under-
flow and overflow often occur in the estimation process for models with
incidental parameters.
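To see why extra numerical precision matters here, note that a likelihood is a product of many per-observation contributions, each typically much smaller than one, so multiplying them directly underflows double precision long before the data set becomes large. The standard remedy, sketched below in Python (an illustration of the underflow problem only, not Sabre's actual implementation), is to work with the sum of logarithms instead:

```python
import math

# Products of many small likelihood contributions underflow in double
# precision; summing logarithms keeps the computation finite.
contribs = [1e-30] * 20   # hypothetical per-observation likelihoods

naive = 1.0
for c in contribs:
    naive *= c            # 1e-600 underflows to exactly 0.0

log_lik = sum(math.log(c) for c in contribs)  # stays finite

print(naive, log_lik)
```

The naive product is exactly zero, while the log likelihood remains a perfectly usable finite number.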
Chapters 2 and 3 cover the analysis of single-level data of various
types: continuous, binary, ordinal and count data using univariate gen-
eralized linear models. The material covered in these chapters is sum-
marized in Chapter 4. Chapters 5 to 8 extend these models to handle
multi-level, specifically two-level, data of various types: continuous, bi-
nary, ordinal and count data, using univariate generalized linear mixed
models. The models considered in Chapters 5 to 8 are summarized in
Chapter 9, and are generalized to handle three-level data in Chapter 10.
A key feature of this book is the emphasis on the application of sta-
tistical models to real-life examples. At the heart of each chapter will be
a fully worked example. In addition, readers will have the opportunity
to apply these statistical models and to interpret the resulting output
through a large number of exercises spanning a wide variety of areas
of application. The exercises illustrating the use of models for continu-
ous/interval scale data in Chapters 2 and 5 are based on the following
examples:
Example 1.1. Psychological distress
Twelve students completed the twelve-item version of Goldberg’s
General Health Questionnaire (GHQ) [42]. The questionnaire was com-
pleted by each student on two different occasions, separated by three
days. A psychological distress score was computed, on the basis of the
twelve GHQ items, for each student on each of the two occasions [39].
These student-occasion-specific scores are saved in the file ghq2.tab.
Example 1.2. Essay grading (continuous response)
Johnson and Albert [66] analyzed data on the grading of essays by
several experts. Essays were graded on a scale between 1 and 10, with a
score of 10 corresponding to ‘excellent’. In this example, we consider a
subset of the data limited to the grades given to 198 essays by markers 1
and 4. This subset of data is stored in the data file grader1.tab which
may be found on the Sabre web page. The grades given by markers 1 and
4 are stacked in a single column grade in the file grader2.tab. This file
also includes an identifier which distinguishes between the two graders,
in other words, the variable dg4 which takes value 1 if the grader is
number 4, and value 0 otherwise. Alternative treatments of the response
are considered in Examples 1.7 and 1.11.
Example 1.3. Educational attainment
Garner and Raudenbush [41] and Raudenbush and Bryk [93] studied
the role of school and/or neighbourhood effects on the educational at-
tainment of young people, from one Scottish Local Education Authority,
who left school between 1984 and 1986. The primary outcome of inter-
est is a young person’s combined end-of-school educational attainment
as measured by his/her grades.
Explanatory variables are available at two levels: (i) the individ-
ual young person level and (ii) the school and/or neighbourhood level.
Most explanatory variables present in the dataset are specific to each
young person. These variables include: young person’s gender; verbal
reasoning quotient and reading ability as measured by tests in pri-
mary school at age 11–12; father’s occupation and education. The one
school/neighbourhood-specific explanatory variable is an index of social
deprivation for the local community within which the young person lived.
The data are stored in the file neighbourhood.tab on the Sabre web
page.
Example 1.4. Unemployment claims
Indiana’s enterprise zone programme provided tax credits for cities
with high poverty and unemployment levels. In a bid to establish whether
those cities targeted by the programme had significantly lower unemployment claims than those cities lying outside enterprise zones, Papke
[85] analyzed annual data from 1980 to 1988. The dataset (ezunem2.tab)
comprises the number of unemployment claims in 22 cities, and whether
each city was located within an enterprise zone, in each of the nine years
1980 to 1988.
Example 1.5. Wage determinants
Vella and Verbeek [103] analyzed annual data on 545 males from the
Youth Sample of the US National Longitudinal Survey for the period
1980 to 1987. The version of the data used in this book (wagepan.tab)
was obtained from Wooldridge [106]. We wish to relate the outcome of
primary interest, log hourly wage (in US dollars), to a time-invariant fac-
tor (ethnicity) and a variety of time-dependent explanatory variables.
Those variables allowed to vary over time include respondent demo-
graphics (marital status, region of US lived in, rural/urban area lived
in), education (years of schooling), labour market experience and trade
union membership. These data are re-considered in Example 1.8, where
trade union membership is regarded as the binary response of interest.
Having analyzed these data in Chapters 2 and 5, we will return to
this dataset on further occasions in this book. In Chapter 11, in the
context of bivariate models, we will estimate a joint model for wages and
trade union membership. We will allow trade union membership to be
endogenous in the wage equation. In Chapter 14, we will use the data on
trade union membership to illustrate Wooldridge’s [107] treatment of the
initial conditions problem in first-order Markov models. In Chapter 15,
we compare and contrast the inferences made when we first assume fixed
effects and then proceed under the assumption of random effects. We will
use these data to relate log wages to time-varying explanatory variables
such as number of years of labour market experience, marital status and
trade union membership, and to time-invariant factors including race
and education.
Example 1.6. Pupil rating of school managers
A total of 856 pupils in 94 schools were asked to rate the performance of their
school managers/directors on the basis of six questions, each response
recorded on a four-point scale [64]. The response to each item given by
each pupil is presented in the dataset manager.tab. Pupil-specific ex-
planatory variables are gender and school year. School-specific factors
are gender of the school manager/director and type of school which is
classified into the following three categories: ‘general (AVO)’, ‘profes-
sional (MBO&T)’ and ‘day/evening’.
The exercises illustrating the use of models for binary data in Chap-
ters 3 and 6 are based on the following examples:
Example 1.7. Essay grading (binary response)
In an extension to Example 1.2, we use data on the grades given
to 198 essays by markers 1 to 5. Essays were graded on a scale from 1
to 10, with 10 classified as ‘excellent’. For the purposes of the current
example, the original essay grading variable is converted into a binary
response variable, labelled as pass in the dataset essays2.tab. The
variable pass takes the value 1 for grades 5 to 10, and value 0 for grades
1 to 4. The primary objective in this example is to test for significant
differences in this binary response between markers, whilst adjusting for
six explanatory variables which characterize the 198 essays.
Four of these factors are lexical in nature: average word length
(wordlength), square root of the number of words (sqrtwords), aver-
age sentence length (sentlength) and proportion of words in the essay
which are prepositions (prepos). A fifth explanatory variable is related
to punctuation: number of commas, multiplied by 100 and divided by
the total number of words in the essay (commas). The sixth factor is the
percentage of words in the essay which are spelt incorrectly (errors).
Example 1.8. Trade union membership
In Example 1.5, we related data from the Youth Sample of the US
National Longitudinal Survey on log hourly wage to a time-invariant
factor (ethnicity) and a variety of time-dependent explanatory variables.
In the current example, we use the same dataset (wagepan.tab) and
treat trade union membership as the binary response of interest. We wish
to examine the effects of ethnicity, respondent demographics, education
and labour market experience on this binary response.
Example 1.9. Tower of London
The Tower of London test was used to assess the cognitive perfor-
mance of three groups of participants: (i) subjects with schizophrenia;
(ii) the subjects’ relatives; (iii) control participants. The test was re-
peated at three different levels of difficulty. The dataset tower1.tab
[90] has a three-level structure. The binary response dtlm takes the
value 1 if the test taken by participant j from family k was completed
in the minimum number of moves on occasion i, and takes the value 0
otherwise.
Example 1.10. Immunization of Guatemalan children
The Guatemalan government wished to establish the effectiveness
of its 1986 campaign to immunize children against major childhood dis-
eases. In 1987, the government conducted a questionnaire survey of 1595
mothers across 161 communities. The questionnaire contained informa-
tion on the immunization status of children who were alive in 1987 and
who were born between 1982 and 1987. If a child was more than two years old at the time of the interview, then he or she was old enough to be immunized during the campaign.
The dataset (guatemala immun.tab) contains the binary response
immun which represents whether child i in family j within community
k was immunized (coded 1) or not (0) [94]. Information was collected
on two child-specific explanatory variables: age and birth order. Family-
specific factors included age, education and working status of the mother,
as well as education of the father. Each community was classified as ei-
ther rural (coded 1) or urban (0). The other community-specific explana-
tory variable was proportion of the population which was indigenous in
1981.
The exercises illustrating the use of models for ordinal categorical
data in Chapters 3 and 7 are based on the following examples:
Example 1.11. Essay grading (ordered response)
In Example 1.2, the original gradings of 198 essays by five experts
were recorded on a 10-point scale and were treated as continuous/interval
scale data. In Example 1.7, the original grades were converted into a
binary response. In the current example, the original grades are recoded
into an ordered response ngrade comprising four categories. The variable
ngrade takes the value 1 if the original grade was either 1 or 2; value
2 if the original grade was either 3 or 4; value 3 if the original grade
was either 5 or 6; value 4 otherwise. The ordinal response ngrade and
the explanatory variables, including the six essay characteristic variables
outlined in Example 1.7, are stored in the file essays ordered.tab.
Example 1.12. Attitudes to abortion
The British Social Attitudes Survey (BSAS) is a multi-stage clustered
random sample of adults who are aged 18 and over, and who are living
in private households in Britain. Wiggins et al. [105] studied attitudes to
abortion by following respondents from the 1983 wave of BSAS for four
years. Each year, respondents were presented with seven situations such
as ‘The woman became pregnant as a result of rape’ and ‘The woman
decides on her own that she does not wish to have a child’. Respondents
were asked to say whether abortion should be legal (coded 1) or not
(0) under each of these situations. The strength of support for legalizing
abortion was judged by combining the responses. The respondent’s total
score (score) was obtained by adding up the responses across all seven
circumstances. This total score was converted into an ordered response
(nscore) which took value 1 if score equalled 0, 1 or 2 (as these values
were rare), value 2 if score equalled 3, value 3 if score equalled 4, value
4 if score equalled 5, and value 5 if score equalled 6 (as value 7 never
occurred).
Information on the respondents’ age, gender, religion, political affili-
ation and self-assessed social class was extracted from BSAS data from
1983 to 1986. We wish to test whether any of these characteristics play
a significant role in determining attitudes to abortion, whilst adjusting
for the fact that the data are clustered by district. We limit the data
(abortion2.tab) to the 246 respondents who provided valid responses
to all seven questions across all four years 1983 to 1986.
Example 1.13. Respiratory status
A two-centre clinical trial was conducted to compare two groups of
patients being treated for respiratory illness [70]. There were 110 eligi-
ble patients who were randomized either to the experimental treatment
or to the placebo. The respiratory status of each patient was recorded
prior to randomisation and at four subsequent follow-up visits to the
clinic. In this book, respiratory status is regarded as an ordered response
comprising the following five categories: ‘terrible’ (coded 0), ‘poor’ (1),
‘fair’ (2), ‘good’ (3) and ‘excellent’ (4). The primary objective of the
study was to test whether the impact of the experimental treatment
(drug) on respiratory status varied significantly over time (trend), hav-
ing controlled for patient’s age (age), gender (male) and respiratory
response at baseline (base). The version of the dataset used in this book
is respiratory2.tab.
The exercises illustrating the use of models for count data in Chapters
3 and 8 are based on the following examples:
Example 1.14. Headaches
A multi-period, two-treatment crossover trial was conducted to es-
tablish whether an artificial sweetener (aspartame) caused headaches.
The trial involved the random assignment of 27 patients to different se-
quences of placebo and aspartame. For the purposes of this example, we
ignore the crossover nature of the trial. The response of interest (y) was
the number of headaches counted up over several days (days). These
data (headache2.tab) have been analyzed previously by McKnight and
van den Eeden [79] and Hedeker [60].
Example 1.15. Epileptic seizures
A randomized controlled clinical trial for the treatment of epilepsy
was conducted to compare the drug Progabide with a placebo. A total of 59 patients
were randomized either to the drug or to the placebo. The response of
interest (y) was the number of epileptic seizures counted up over a two-
week period. Each patient made up to four visits to the clinic. Visit time
(visit) may be regarded as a continuous covariate taking possible values
−0.3 (visit 1), −0.1 (visit 2), 0.1 (visit 3) and 0.3 (visit 4). An indicator
variable, v4, takes the value 1 for visit 4, and value 0 otherwise.
The primary objective in this study was to test whether the exper-
imental drug (treat: 1 if Progabide, 0 if placebo) reduced significantly
the number of epileptic seizures. The treatment effect was examined,
having adjusted for two secondary explanatory variables. The first was
the logarithm of the patient’s age (lage); the second was the logarithm
of a quarter of the number of seizures in the eight weeks preceding
the trial, centred about its mean (lbas). The interaction between lbas
and treat, lbas.trt, was also of interest. These data (epilep.tab)
have been analyzed previously by Thall and Vail [102] and Breslow and
Clayton [21].
Example 1.16. Skin cancer deaths
This example uses the Langford et al. [72] data from the Atlas of
Cancer Mortality in the European Economic Community (EEC) [97].
Data were collected on male malignant melanoma deaths over the pe-
riod 1975 to 1981 for Germany, Ireland, Italy, the Netherlands and the
UK, and over the period 1971 to 1980 for four other EEC countries. In-
terest focusses on establishing the role of ultraviolet-B (UVB) light exposure in malignant melanoma deaths. The dataset (deaths.tab) contains
the number of male deaths due to malignant melanoma (deaths) and
a mean-centred measure of the UVB dose reaching the earth’s surface
(uvb) by year in county i within region j of nation k.
The univariate mixed models discussed in Chapters 2 to 10 are generalized to multivariate, in particular bivariate, models in Chapter 11.
Chapter 12 describes models required to analyze event history data, with
specific reference to duration models, renewal models and competing risk
models. How to handle stayers and non-susceptibles through the use of
endpoints is considered in Chapter 13. Chapter 14 addresses the issue of
state dependence, outlines the initial conditions problem and considers
possible solutions. A discussion of fixed effects versus random effects in
the linear model is presented in Chapter 15.
The book concludes with two appendices. Appendix A provides in-
structions on how to install SabreR and discusses the structure of SabreR
commands. Appendix A also reviews topics of general interest, including
quadrature techniques, estimation methods, and presents a discussion of
endogenous versus exogenous variables. Appendix B provides an intro-
duction to those features of R that are salient to the use of SabreR.
2
Generalized linear models for
continuous/interval scale data
2.1 Introduction
The generalized linear model [77] has become widely recognized as one of
the major methodological developments of the second half of the twentieth century. The main contributory factor behind its wide applicability over the last thirty years or so has been its flexibility.
The model, or more accurately, the family of models, may be applied
to a wide range of different types of data. These types include continu-
ous/interval scale, categorical (including binary and ordinal) and count
data, examples of which were introduced in Chapter 1. Each member
of the family of models is appropriate for a specific type of data. One
member of the family is outlined in this chapter, while other members
are introduced in the next chapter.
In this chapter, we start by presenting the null linear model, that is, the linear model without any explanatory variables,
for continuous/interval scale data. In Section 2.2, we illustrate how a
null model may be fitted in SabreR by way of an example introduced
in Chapter 1. In Section 2.3, we progress to the simple linear regression
model, which permits us to examine the relationship between a contin-
uous/interval scale response variable and a single explanatory variable.
The simple linear regression is extended to multiple linear regression,
which allows us to relate a continuous/interval scale response variable
to a set of explanatory variables. We examine the assumptions underly-
ing these models, and explain how these assumptions may be checked,
in Section 2.4. The likelihood theory associated with the multiple linear
regression is presented in Section 2.5. We must define the likelihood in
Section 2.5 before explaining how to compare nested models using the
deviance (or −2 log likelihood) in Section 2.6. This chapter concludes
with a full worked example illustrating the application of a multiple
linear regression in SabreR.
2.2 Continuous/interval scale data
We will use Example 1.1 (Psychological distress) to illustrate the appli-
cation of a generalized linear model to continuous/interval scale data. A
psychological distress score was calculated for each of twelve students on
each of the two occasions. The student-occasion-specific scores are saved
in the file ghq2.tab.
Let $y_{ij}$ denote the psychological distress score of student $j$ on occasion $i$, $i = 1, 2$, $j = 1, 2, \ldots, 12$. The simplest model is equivalent to a one-way
analysis of variance in which there are no explanatory variables. This
model contains only random variation between observations, and may
be written in the form
$$
y_{ij} = \gamma_{00} + \varepsilon_{ij}
$$
where $\gamma_{00}$ is the population grand mean and $\varepsilon_{ij}$ is the error term for student $j$ on occasion $i$, $i = 1, 2$, $j = 1, 2, \ldots, 12$. In other words, each observation has the 'true mean' $\gamma_{00}$, and the observation for student $j$ on occasion $i$ deviates from this true mean by some value, called $\varepsilon_{ij}$. It is assumed that the error terms $\varepsilon_{ij}$ have mean 0 and variance $\sigma^2_\varepsilon$ (the between-observation variance). In other words, the between-observation variance is the variance between observations about the true mean.
Throughout this book, we will use the software SabreR to perform
analyses. SabreR is a version of the software Sabre which runs within R.
Details on how to get started in R are given in Appendix B, Section B.1.
The dataset ghq2.tab may be read into SabreR. Information on how to
read data into SabreR is presented in Appendix B, Section B.2. The null
model may be fitted in SabreR using the following command:
sabre.model.1 <- sabre(ghq~1,case=student,
adaptive.quad=TRUE,first.family="gaussian")
More details on the structure of SabreR commands are provided in Appendix A, Section A.2. The above command produces the following output:
output:
Parameter Estimate Std. Err. Z-score
______________________________________________________
cons 10.167 1.2448 8.1673
sigma 6.0982
The estimate of the overall mean score ($\gamma_{00}$) is 10.167. The variance between observations ($\sigma^2_\varepsilon$) is estimated to be $6.0982^2 = 37.188$.
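The null model estimates can be reproduced by hand: the maximum likelihood estimate of the grand mean $\gamma_{00}$ is simply the sample mean of all the scores, and the estimate of $\sigma_\varepsilon$ is the standard deviation of the deviations about that mean. A short sketch in Python, using hypothetical scores rather than the actual ghq2.tab data:

```python
import math

# Hypothetical distress scores (NOT the actual ghq2.tab data)
scores = [12.0, 8.0, 15.0, 10.0, 7.0, 11.0]

# The ML estimate of gamma_00 in the null model is the grand mean
gamma_00 = sum(scores) / len(scores)

# The ML estimate of the between-observation variance sigma_eps^2
# uses divisor n (maximum likelihood), not n - 1
n = len(scores)
sigma2 = sum((y - gamma_00) ** 2 for y in scores) / n
sigma = math.sqrt(sigma2)

print(gamma_00, sigma)
```

The same logic underlies the SabreR output above: cons estimates the grand mean, and sigma estimates the residual standard deviation.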
2.3 Simple and multiple linear regression models
We now wish to relate the response $y_{ij}$ to a single explanatory variable $x_{ij}$. In the psychological distress example, we take $x_{ij}$ to be a binary explanatory variable which takes value 1 for occasion 2 ($i = 2$), and takes value 0 for occasion 1 ($i = 1$). The simple linear regression model with this single explanatory variable $x_{ij}$ is given by:
$$
y_{ij} = \gamma_{00} + \gamma_{10} x_{ij} + \varepsilon_{ij},
$$
where $\gamma_{10}$ is the regression coefficient for $x_{ij}$. The population mean and variance of the residuals $\varepsilon_{ij}$ are assumed to be zero and $\sigma^2_\varepsilon$ respectively.
The SabreR command required to fit the model which includes the
occasion effect is:
sabre.model.2 <- sabre(ghq~1+dg2,case=student,
adaptive.quad=TRUE,first.family="gaussian")
This command leads to the output:
Parameter Estimate Std. Err. Z-score
_______________________________________________________
cons 10.333 1.7993 5.7431
dg2 -0.33333 2.5446 -0.13100
sigma 6.2329
The interpretation of the parameter $\gamma_{00}$ has changed from its interpretation in the null model. The parameter $\gamma_{00}$ now corresponds to the mean score on occasion 1, which is estimated to be 10.333. The regression coefficient $\gamma_{10}$ may be interpreted as the difference in mean scores for occasion 2 relative to occasion 1. The estimate of $\gamma_{10}$ ($-0.333$) indicates that the mean score on occasion 2 is lower than the mean score on occasion 1. The Z-score of $-0.131$ shows that this difference in mean scores between occasions is not significantly different from zero.
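With a single binary explanatory variable, the least squares estimates have a simple interpretation that can be verified directly: the intercept is the mean of the group coded 0, and the slope is the difference between the two group means. A Python sketch with hypothetical scores (not the real GHQ data):

```python
# Hypothetical scores on two occasions (not the real GHQ data)
occ1 = [11.0, 9.0, 14.0, 8.0]
occ2 = [10.0, 9.5, 12.5, 8.0]

y = occ1 + occ2
x = [0.0] * len(occ1) + [1.0] * len(occ2)  # dummy: 1 for occasion 2

# OLS for y = g00 + g10 * x via the usual closed-form solution
n = len(y)
xbar = sum(x) / n
ybar = sum(y) / n
g10 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
      sum((xi - xbar) ** 2 for xi in x)
g00 = ybar - g10 * xbar

mean1 = sum(occ1) / len(occ1)
mean2 = sum(occ2) / len(occ2)

# With a binary regressor, g00 equals the occasion-1 mean and
# g10 equals the occasion-2 minus occasion-1 difference in means
print(g00, g10, mean1, mean2 - mean1)
```

This is exactly why the estimate of $\gamma_{10}$ in the SabreR output can be read as a between-occasion difference in mean scores.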
More than one explanatory variable can be included in the model
to transform a simple linear regression into a multiple linear regression.
When the explanatory variables are denoted by $x_1, \cdots, x_P$, adding their effects to the model leads to the following formula:
$$
y_{ij} = \gamma_{00} + \sum_{p=1}^{P} \gamma_{p0} x_{pij} + \varepsilon_{ij}.
$$
Assuming all explanatory variables are continuous covariates, the regression parameters $\gamma_{p0}$ ($p = 1, \cdots, P$) may be interpreted as follows: an increase of one unit in the value of $x_p$ is associated with an average increase of $\gamma_{p0}$ units in $y$. In practice, some of the variables $x_p$ may be factors (categorical variables), interaction variables or non-linear (for example, quadratic) transformations of the original continuous covariates.
The first part on the right-hand side of the above equation incorporates the regression coefficients:
$$
\gamma_{00} + \sum_{p=1}^{P} \gamma_{p0} x_{pij}.
$$
This is called the fixed part of the model, because the coefficients are fixed, in other words, not stochastic. The remaining term, $\varepsilon_{ij}$, is called the random part of the model. It is again assumed that the residuals $\varepsilon_{ij}$ are mutually independent and have zero means conditional on the explanatory variables. The population variance of the residuals $\varepsilon_{ij}$ is denoted by $\sigma^2_\varepsilon$. An assumption made when fitting a multiple linear
regression is that these residuals are drawn from a Gaussian or normally
distributed population. We discuss how this assumption of normality
may be checked in the following section.
2.4 Checking assumptions in linear regression
models
Most regression assumptions are concerned with residuals. A residual is
defined to be the difference between the observed y and the y predicted
by the regression line. In a simple linear regression, the usual estimate
of the residual is:
$$
e_{ij} = y_{ij} - \hat{\gamma}_{00} - \hat{\gamma}_{10} x_{ij},
$$
where $\hat{\gamma}_{00}$ and $\hat{\gamma}_{10}$ denote the estimates of the regression coefficients $\gamma_{00}$ and $\gamma_{10}$ respectively. This residual estimate generalizes readily from simple linear regression to multiple linear regression. The variances of the
residual estimates depend in general on the values of the regression co-
efficients of the fixed effects so it is common to standardize the residuals
by dividing by the appropriate standard errors.
Estimated residuals are generally not of interest in their own right
but can be used to help check model assumptions. To examine the as-
sumption of linearity, for example, we can produce a residual plot against
predicted values of the dependent variable using the fixed part of the lin-
ear regression model for the prediction. The two particular assumptions
that can be studied readily are the assumption of normality and the as-
sumption of homoscedasticity, in other words, that the variances in the
model are constant.
A residual plot should show a random scatter of residuals around
the zero line. Even if the residuals are evenly distributed around zero,
the regression model is still questionable when there is a pattern in
the residuals. Ideally, you should not be able to detect any patterns,
including a change in variation.
To check the normality assumption, we can use a normal probability
plot. The standardized residuals are plotted against a theoretical normal
distribution in such a way that the points should form an approximate
straight line. Departures from this straight line indicate departures from
normality. We will return to residuals in Chapter 5. SabreR does not
currently make residuals available to R.
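Although SabreR does not return residuals, the construction of a normal probability plot is straightforward in principle: sort the standardized residuals and pair them with theoretical normal quantiles. A sketch in Python using hypothetical residuals and the common $(i - 0.5)/n$ plotting-position rule (other plotting-position rules exist):

```python
from statistics import NormalDist, pstdev

# Hypothetical estimated residuals from a fitted regression
residuals = [-1.2, 0.4, 2.1, -0.3, 0.9, -1.8, 0.1, -0.2]

# Standardize; here simply by the residual standard deviation
# (full standardization would use leverage-adjusted standard errors)
s = pstdev(residuals)
std_res = sorted(r / s for r in residuals)

# Theoretical normal quantiles using the (i - 0.5)/n plotting positions
n = len(std_res)
theo = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Pair (theoretical quantile, ordered standardized residual);
# the points should lie close to a straight line under normality
pairs = list(zip(theo, std_res))
```

Plotting these pairs reproduces the normal probability plot described above; marked curvature or outlying points signal departures from normality.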
2.5 Likelihood: multiple linear regression
The likelihood associated with the multiple linear regression model is:
$$
L\left(\gamma, \sigma^2_\varepsilon \mid y, x\right) = \prod_{j} \prod_{i} g\left(y_{ij} \mid x_{ij}\right),
$$
where the Gaussian or normal probability density function of the response $y_{ij}$, conditional on the set of explanatory variables $x_{ij}$, is defined as:
$$
g\left(y_{ij} \mid x_{ij}\right) = \frac{1}{\sqrt{2\pi}\,\sigma_\varepsilon} \exp\left(-\frac{\left(y_{ij} - \mu_{ij}\right)^2}{2\sigma^2_\varepsilon}\right),
$$
$$
\mu_{ij} = \gamma_{00} + \sum_{p=1}^{P} \gamma_{p0} x_{pij}.
$$
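Computationally, the log likelihood is just the sum of the logarithms of the normal densities $g(y_{ij} \mid x_{ij})$. A minimal Python sketch, with hypothetical responses and fitted linear predictors:

```python
import math

def normal_loglik(y, mu, sigma2):
    """Gaussian log likelihood: the sum of log g(y_ij | x_ij), where
    mu holds the fixed-part linear predictors and sigma2 = sigma_eps^2."""
    n = len(y)
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - sum((yi - mi) ** 2 for yi, mi in zip(y, mu)) / (2 * sigma2))

# Hypothetical responses and fitted linear predictors mu_ij
y = [10.0, 9.0, 12.0, 11.0]
mu = [10.5, 9.5, 11.5, 11.0]
print(normal_loglik(y, mu, sigma2=1.0))
```

Values of this kind, evaluated at the maximum likelihood estimates, are the log likelihoods that SabreR reports and that the next section uses for model comparison.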
2.6 Comparing model likelihoods
Each model that is fitted to the same set of data has a corresponding log
likelihood value that is calculated at the maximum likelihood estimates
for that model. These values are used to compare and test statistically
the significance of terms in the model.
The deviance test, or likelihood ratio test, is a quite general principle
for statistical testing. In applications involving linear models, this test is
used mainly for multi-parameter tests (that is, tests for the joint effects
of a set of explanatory variables) and for tests about the fixed part as
well as the random part of the model. The general principle is as follows.
When parameters of a statistical model are estimated by the method
of maximum likelihood, the estimation also provides the likelihood,
which can be transformed into the deviance, which is defined as mi-
nus twice the natural logarithm of the likelihood. This deviance can be
regarded as a measure of the lack of fit between the model and the data.
In most statistical models, one cannot interpret the deviance directly,
but one can interpret changes in deviance between nested models fitted
to the same data set.
In general, suppose that model one has t parameters, while model two
is a subset of model one with only r of the t parameters so that r < t.
Model one will have a higher log likelihood than model two. For large
sample sizes, the difference between these two log likelihoods, when mul-
tiplied by minus two, will behave like the chi-squared distribution with
(t − r) degrees of freedom. This can be used to test the null hypothesis
that the (t − r) parameters that are not in both models are not signif-
icantly different from zero. SabreR computes the log likelihoods log(L)
which are negative values. The difference between nested log likelihoods
is then multiplied by minus two. This is called the deviance, which is
denoted by D, where:
$$
D = -2\left[\log(L_r) - \log(L_t)\right],
$$
$\log(L_t)$ is the log likelihood for the extended model, and $\log(L_r)$ is the log likelihood for the simpler model. With large sample sizes, $D$ approximately follows a chi-squared distribution with $(t - r)$ degrees of freedom.
In the psychological distress example (see Section 2.3), model one has two parameters ($\gamma_{00}$ and $\gamma_{10}$), while model two is the null model with one parameter ($\gamma_{00}$). The log likelihoods for models one and two are $-76.926$ and $-76.936$ respectively. To test the null hypothesis that $\gamma_{10}$ is not significantly different from zero, we can compute the change in deviance (or minus twice the log likelihood) from model one to model
two. This change in deviance is 0.020 which, when compared to the chi-
squared distribution on one degree of freedom, indicates that γ 10 is not
significantly different from zero. In other words, there is no significant
difference in mean score between occasions 1 and 2.
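The arithmetic of this deviance comparison can be reproduced in a few lines (a Python sketch, using the log likelihoods quoted above; for one degree of freedom the chi-squared tail probability has the closed form erfc(√(D/2)), so only the standard library is needed):

```python
import math

# Log likelihoods quoted above for the psychological distress example
loglik_model1 = -76.926  # two parameters (gamma_00, gamma_10)
loglik_model2 = -76.936  # null model, one parameter (gamma_00)

# Deviance: minus twice the difference between nested log likelihoods
D = -2 * (loglik_model2 - loglik_model1)

# Chi-squared survival function on 1 degree of freedom: erfc(sqrt(x/2))
p_value = math.erfc(math.sqrt(D / 2))

print(round(D, 3))        # 0.02
print(round(p_value, 3))  # ~0.888, far above 0.05: gamma_10 not significant
```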
2.7 Application of a multiple linear regression model
Example 2.7.1. Mathematics achievement
The data used in this example are a sub-sample taken from the 1982
High School and Beyond (HSB) Survey [93]. The dataset (hsb.tab) in-
cludes information on 7,185 students nested within 160 schools, 90 of
which are public and the rest of which are Catholic. The numbers of
students per school vary between 14 and 67. A standardized measure of
mathematics achievement (mathach) is taken to be the outcome, yij , of
student i in school j. The student-specific explanatory variables of in-
terest in this example are gender (1: female; 0: male), minority (1: yes;
0: no) and socio-economic status (ses) which is a composite measure of
parental education, occupation and income.
To obtain some preliminary information about how much variation
in the outcome lies between observations, we may fit the simplest model,
the null model, to the data. This model is denoted by:
yij = γ 00 + εij ,
where yij is mathach, for i = 1, · · · , nj students in school j, and j =
1, · · · , 160. The mathematics achievement of student i in school j is
represented by a linear combination of the grand mean, γ 00 , and an
error term, εij . We refer to the variance of εij as the between-observation
variance.
The data can be read into SabreR (see Appendix B, Section B.2) and
this model can be estimated (see Appendix A, Sub-section A.2.2). The
SabreR command required to fit this model is:
sabre.model.1 <- sabre(mathach~1,case=school,
first.mass=64,first.family="gaussian")
This command results in the following output:
Log likelihood = -24049.866
on 7183 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
_______________________________________________________
cons 12.748 0.81145E-01 157.10
sigma 6.8782
The estimate of the grand mean, γ 00 , is 12.748. This mean can be
interpreted as the expected value of the mathematics achievement score.
The sigma parameter is estimated to be 6.8782. The estimate of the
between-observation variance is (6.8782)2 = 47.310.
The null model yij = γ 00 + εij provides a baseline against which
we can compare more complex models. We can add the explanatory
variables ses, minority and gender to the model. The model becomes:
yij = γ00 + γ10 sesij + γ20 minorityij + γ30 genderij + εij .
We can proceed to fit this model in SabreR:
sabre.model.2 <- sabre(mathach~1+ses+minority+gender,
case=school,first.mass=64,first.family="gaussian")
The following output is produced:
Log likelihood = -23374.789
on 7180 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
_______________________________________________________
cons 14.254 0.11775 121.05
ses 2.6830 0.98726E-01 27.176
minority -2.8365 0.17198 -16.494
gender -1.3766 0.14835 -9.2799
sigma 6.2627
We can compare the goodness of fit of this model with that of the
previous model using the analysis of deviance outlined in the previ-
ous section. The log likelihoods for the two models are −23374.789 and
−24049.866 on 7180 and 7183 residual degrees of freedom respectively.
The corresponding χ2 improvement is −2(−24049.866 + 23374.789) =
1350.153. When referred to the χ2 distribution on three degrees of free-
dom, this change in deviance is highly significant, indicating that the
joint effect of the explanatory variables is significantly different from
zero.
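A quick numeric check of this comparison (a Python sketch; 7.815 is the standard 5% critical value of the χ² distribution on three degrees of freedom, and the rounded log likelihoods reproduce the quoted change in deviance to within rounding):

```python
# Log likelihoods quoted above for the null and three-covariate models
loglik_null = -24049.866  # 7183 residual degrees of freedom
loglik_full = -23374.789  # 7180 residual degrees of freedom

D = -2 * (loglik_null - loglik_full)  # change in deviance
df = 7183 - 7180                      # 3 degrees of freedom

# 7.815 is the standard 5% critical value of chi-squared on 3 df
print(round(D, 3), D > 7.815)  # 1350.154 True
```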
The Z-scores (parameter estimates divided by their respective stan-
dard errors) indicate that all three explanatory variables are highly sig-
nificant determinants of mathematics achievement. The corresponding
parameter estimates may be interpreted as follows. Students that are
from an ethnic minority do worse than those who are not. Female stu-
dents seem to perform more poorly than males. The higher the socio-
economic status score, the better a student fares.
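These Z-scores can be reproduced directly from the table above (a minimal Python sketch; small discrepancies in the last digit come from the rounded standard errors):

```python
# Parameter estimates and standard errors from the output above
estimates = {"ses": (2.6830, 0.098726),
             "minority": (-2.8365, 0.17198),
             "gender": (-1.3766, 0.14835)}

# Z-score: estimate divided by its standard error
z_scores = {name: est / se for name, (est, se) in estimates.items()}
print({k: round(v, 3) for k, v in z_scores.items()})
# close to the printed Z-scores; all far beyond the 5% bounds of +/-1.96
```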
2.8 Exercises on linear models
Exercise 2.8.1. Psychological distress
In this exercise, you are asked to reproduce the results for the psy-
chological distress example presented in this chapter:
1. Use SabreR to fit the null linear model on the psychological distress
score (ghq), that is, the model which includes a constant term only.
Obtain the log likelihood, parameter estimate and standard error.
Interpret the parameter estimate.
2. Add the occasion 2 identifier (dg2) to this model and re-fit the
model. Obtain the log likelihood, parameter estimates and stan-
dard errors. Compare the deviances (−2 times the change in log
likelihoods) and use the Z-scores to test whether the effect of dg2
is significantly different from zero. Interpret the results.
Exercise 2.8.2. Essay grading (continuous response)
In this exercise, we return to Example 1.2, in which 198 essays are
graded by two markers. The dataset grader2.tab contains the response
variable grade and the explanatory variable dg4 which takes value 1 for
marker 4, and takes value 0 for marker 1:
1. Use SabreR to fit the null linear model on grade, that is, the model
which includes a constant term only. Obtain the log likelihood,
parameter estimate and standard error. Interpret the parameter
estimate.
2. Add dg4 to this model and re-fit the model. Obtain the log likeli-
hood, parameter estimates and standard errors. Compare the de-
viances (−2 times the change in log likelihoods) and use the Z-
scores to test whether the effect of dg4 is significantly different
from zero. Interpret the results.
Exercise 2.8.3. Educational attainment
In this exercise, we return to Example 1.3, involving the educational
attainment of young people. The dataset neighbourhood.tab contains
the response variable attain and a set of explanatory variables (see
below):
1. Use SabreR to fit the null linear model on attainment (attain),
that is, the model without explanatory variables. Obtain the log
likelihood, parameter estimate and standard error. Interpret the
parameter estimate.
2. Add the student-specific explanatory variables: young person’s
gender (male); verbal reasoning quotient (p7vrq) and reading
ability (p7read) as measured by tests in primary school at age
11–12; father’s occupation (dadocc) and education (daded), to
this model and re-fit the model. Obtain the log likelihood, pa-
rameter estimates and standard errors. Compare the deviance
(−2 times the change in log likelihoods) and use the Z-scores to
test whether the effects of the explanatory variables are signifi-
cantly different from zero. Interpret the results.
Exercise 2.8.4. Unemployment claims
In this exercise, we return to Example 1.4. The primary objective
in this study was to establish whether those cities in an enterprise zone
had lower numbers of unemployment claims. The dataset ezunem2.tab
contains the response variable, the log number of unemployment claims
(luclms) and the explanatory variable ez which takes value 1 if a city
is in an enterprise zone, and takes value 0 otherwise:
1. Use SabreR to fit the null linear model on the log number of unem-
ployment claims (luclms), that is, the model without explanatory
variables. Obtain the log likelihood, parameter estimate and stan-
dard error. Interpret the parameter estimate.
2. Add the binary ez effect to this model and re-fit the model. Ob-
tain the log likelihood, parameter estimates and standard errors.
Compare the deviances (−2 times the change in log likelihoods)
and use the Z-scores to test whether the ez effect is significantly
different from zero. Interpret the results.
Exercise 2.8.5. Wage determinants
In this exercise, we return to Example 1.5, involving the wages of
young males. The dataset wagepan.tab contains the response variable,
log hourly wage (lwage) and a set of explanatory variables (see below).
1. Use SabreR to fit the null linear model on log hourly wage (lwage),
that is, the model without explanatory variables. Obtain the log
likelihood, parameter estimate and standard error. Interpret the
parameter estimate.
2. The time-invariant explanatory variable is ethnicity, which is ex-
pressed in terms of two indicator variables: black which takes value
1 if the respondent is black, and takes value 0 otherwise; hisp
which takes value 1 if the respondent is Hispanic, and takes value
0 otherwise.
3. The time-dependent explanatory variables include respondent de-
mographics: marital status (married), region of US lived in
(nrthcen, nrtheast, south), rural/urban area lived in (rur), ed-
ucation (educ), labour market experience (exper), trade union
membership (union) and year.
4. Add these explanatory variables to the model and re-fit the model.
Obtain the log likelihood, parameter estimates and standard er-
rors. Compare the deviances (−2 times the change in log likeli-
hoods) to test whether the joint effect of all explanatory variables
is significantly different from zero. Use the Z-scores to test whether
the effect of each explanatory variable is significantly different from
zero. Interpret the results.
5. Create interaction effects between year and education. Add these
effects to the previous model. Do the effects of education vary with
year?
Exercise 2.8.6. Pupil rating of school managers
In this exercise, we return to Example 1.6, in which 856 students
were asked to rate the performance of their school managers/directors.
The dataset manager.tab contains the item response variable score and
the explanatory variable, pupil gender (pupsex) which takes value 1 for
females, and takes value 2 for males:
1. Use SabreR to fit the null model on item responses (scores), that
is, the model without explanatory variables. Obtain the log like-
lihood, parameter estimate and standard error. Interpret the pa-
rameter estimate.
2. Add the explanatory variable, pupil gender (pupsex), to this model
and re-fit the model. Obtain the log likelihood, parameter es-
timates and standard errors. Compare the deviances (−2 times
log likelihoods) and use the Z-scores to test whether the effect of
pupsex is significantly different from zero. Interpret the results.
3
Generalized linear models for other types of data
3.1 Binary data
3.1.1 Introduction
In the previous chapter, it was assumed that the response variable fol-
lowed a continuous distribution and that the random coefficients and
residuals were normally distributed. These models are appropriate where
the expected value of the response variable may be represented as a lin-
ear function of the explanatory variables. The linearity and normality
assumptions can be checked using standard graphical procedures. There
are other kinds of outcomes, however, for which these assumptions are
clearly not realistic, for example, discrete response variables. Important
instances of discrete response variables are binary variables (for exam-
ple, success versus failure of whatever kind) and counts (for example, in
the study of some kind of event, the number of events happening in a
predetermined time period).
For a binary variable yij that has probability µij for outcome 1 and
probability 1 − µij for outcome 0, the mean is:
E(yij ) = µij ,
and the variance is:
var(yij ) = µij (1 − µij ).
The variance is not a free parameter but is determined by the mean. This
has led to the development of models that differ from the usual multiple
linear regression models and that take account of the non-normal dis-
tribution of the response variable, its restricted range and the relation
between mean and variance. The best-known method for binary data is
the logistic regression model.
3.1.2 Logistic regression
We start by introducing a simple logistic regression model.
The data are structured as follows: observation i is grouped within
cluster j, i = 1, · · · , nj , j = 1, · · · , m. With only one explanatory
variable xij , the simple binary logistic regression model is written in
terms of the latent response variable y∗ij as:
y∗ij = γ00 + γ10 xij + εij .
In practice, y∗ij is unobservable, but it can be measured indirectly
by an observable binary variable yij defined by:
yij = 1 if y∗ij > 0, and yij = 0 otherwise,
such that:
Pr(yij = 1 | xij ) = Pr(y∗ij > 0)
= Pr(γ00 + γ10 xij + εij > 0)
= Pr(εij > −{γ00 + γ10 xij })
= ∫_{−{γ00 + γ10 xij }}^{∞} f (εij ) dεij
= 1 − F (−{γ00 + γ10 xij })
= µij .
For symmetric distributions of εij like the normal or logistic, we have:
1 − F (− {γ 00 + γ 10 xij }) = F (γ 00 + γ 10 xij ) ,
where F (·) is the cumulative distribution function of εij .
We view the observed value yij as a realisation of a random variable
Yij that can take the values one and zero with probabilities µij and 1−µij
respectively. The distribution of yij is called a Bernoulli distribution with
parameter µij , and can be written as:
g(yij |xij ) = µij^yij (1 − µij )^(1−yij ) , yij = 0, 1.
To proceed, we need to impose an assumption about the distribution
of εij . If the cumulative distribution of εij is assumed to be logistic,
we have logistic regression or the logit model, and if we assume that
εij ∼ N (0, 1), we have the probit model.
We complete the specification of the logit model by expressing the
functional form for µij in the following manner:
µij = exp(γ00 + γ10 xij ) / [1 + exp(γ00 + γ10 xij )].
The probit model is based upon the assumption that the disturbances
εij are independent standard normal variates, such that:
µij = Φ(γ 00 + γ 10 xij ),
where Φ (·) denotes the cumulative distribution function for a standard
normal variable.
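The closeness of the two models can be checked numerically. The sketch below (plain Python, standard library only) compares the standard normal cdf with a logistic cdf whose argument is rescaled by the conventional factor of about 1.702; with that rescaling the two curves agree to within about 0.01 everywhere, which is why fitted logit and probit probabilities are so similar away from the extreme tails:

```python
import math
from statistics import NormalDist

def logistic_cdf(z, scale=1.702):
    # logistic cdf with its argument rescaled to match the probit curve;
    # 1.702 is the standard scaling constant for this approximation
    return 1.0 / (1.0 + math.exp(-scale * z))

# Largest gap between the two cdfs over a grid from -4 to 4
max_gap = max(abs(NormalDist().cdf(z) - logistic_cdf(z))
              for z in [x / 10.0 for x in range(-40, 41)])
print(round(max_gap, 4))  # below 0.01: the curves differ only slightly
```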
3.1.3 Logit and probit transformations
Interpretation of the parameter estimates obtained from either the logit
model or the probit model is best achieved on a linear scale such that,
for the logit model, we can re-express µij as:
logit(µij ) = log[µij /(1 − µij )] = γ00 + γ10 xij .
This equation represents the log odds of observing the response yij = 1.
This is linear in x, and so the effect of a unit change in xij is to increase
the log odds by γ 10 . The logit link function is non-linear, so the effect
of a unit increase in xij is harder to comprehend if measured on the
probability scale µij . The probit model may be rewritten as:
probit(µij ) = Φ−1 (µij ) = γ00 + γ10 xij .
The logistic and normal distributions are both symmetrical around
zero and have very similar shapes, except that the logistic distribution
has fatter tails. As a result, the conditional probability functions are
very similar for both models, except in the extreme tails. For both the
logit and probit link functions, any probability value in the range [0, 1]
is transformed so that the resulting values of logit(µij ) and probit(µij )
will lie between −∞ and +∞.
A further transformation of the probability scale that is sometimes
useful in modelling binomial data is the complementary log-log trans-
formation. This function again transforms a probability µij in the range
[0, 1] to a value in (−∞, +∞), using the relationship log[− log(1 − µij )].
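The three transformations can be compared side by side (a Python sketch using only the standard library; `NormalDist().inv_cdf` supplies Φ⁻¹):

```python
import math
from statistics import NormalDist

def logit(p):
    # log odds: log(p / (1 - p))
    return math.log(p / (1.0 - p))

def probit(p):
    # inverse standard normal cdf
    return NormalDist().inv_cdf(p)

def cloglog(p):
    # complementary log-log: log(-log(1 - p))
    return math.log(-math.log(1.0 - p))

for p in (0.1, 0.5, 0.9):
    print(p, round(logit(p), 3), round(probit(p), 3), round(cloglog(p), 3))
```

Note that logit and probit are symmetric about p = 0.5 (both are zero there), whereas the complementary log-log transformation is asymmetric.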
3.1.4 General logistic regression
Suppose the observed binary responses are binomially distributed,
such that yij ∼ bin(1, µij ), with conditional variance var(yij |µij ) =
µij (1 − µij ). The general logistic regression model, with P explanatory
variables x1 , · · · , xP on the observations, has the following form:
logit(µij ) = γ00 + ∑_{p=1}^{P} γp0 xpij .
3.1.5 Likelihood
The likelihood associated with the general logistic regression model is:
L(γ|y, x) = ∏j ∏i g(yij |xij ),
where:
g(yij |xij ) = µij^yij (1 − µij )^(1−yij ) ,
µij = 1 − F (−{γ00 + ∑_{p=1}^{P} γp0 xpij }).
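A minimal sketch of this likelihood calculation in Python (the data and coefficients below are made up purely for illustration; the function evaluates the log of the product above under the logit link):

```python
import math

def log_likelihood(y, X, gamma0, gammas):
    # Bernoulli log likelihood under the logit link:
    # sum over observations of y*log(mu) + (1 - y)*log(1 - mu)
    total = 0.0
    for yi, xi in zip(y, X):
        eta = gamma0 + sum(g * x for g, x in zip(gammas, xi))
        mu = 1.0 / (1.0 + math.exp(-eta))
        total += yi * math.log(mu) + (1 - yi) * math.log(1 - mu)
    return total

# Tiny hypothetical data set, one explanatory variable
y = [1, 0, 1, 1, 0]
X = [[0.5], [-1.0], [1.5], [0.2], [-0.7]]
ll = log_likelihood(y, X, 0.0, [1.0])
print(round(ll, 4))
```

Maximum likelihood estimation searches for the values of γ that make this quantity as large as possible.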
3.1.6 Example with binary data
Example 3.1.6. Repeating a grade
Raudenbush and Bhumirat [92] analyzed data on whether or not
children had to repeat a grade during their time at primary school. The
data were from a national survey of primary education in Thailand in
1988. We use a subset of the Raudenbush and Bhumirat [92] data from
411 schools.
The dataset thaieduc1.tab comprises 8582 observations (rows). We
take the variable repeat to be the binary response, the indicator of
whether a child has ever repeated a grade (0 = no, 1 = yes). The child-
specific explanatory variables are sex (0 = girl, 1 = boy) and whether
a child has had any pre-primary education pped (0 = no, 1 = yes). The
probability that a child will repeat a grade during the primary years,
µij , is of interest.
First, we use SabreR to estimate a logistic regression model with a
constant term only:
logit(µij ) = γ00 .
The SabreR command required to fit this null model is:
sabre.model.1 <- sabre(repeat~1,case=schoolid)
This results in the following output:
Log likelihood = -3553.4906
on 8581 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
________________________________________________________
cons -1.7738 0.30651E-01 -57.870
Then we estimate a logistic regression model which includes the child-
specific explanatory variables sex and pped:
logit(µij ) = γ00 + γ10 sexij + γ20 ppedij .
The SabreR command needed to fit this model is:
sabre.model.2 <- sabre(repeat~1+sex+pped,case=schoolid)
This produces the output below:
Log likelihood = -3481.7830
on 8579 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
________________________________________________________
cons -1.7596 0.54203E-01 -32.464
sex 0.45958 0.62755E-01 7.3235
pped -0.58442 0.63026E-01 -9.2728
For the constant-only model, the estimated average log odds of repe-
tition across primary schools, γ 00 , is −1.774. As sex is a dummy variable
indicating whether the pupil is a girl or a boy, it can be helpful to write
down a pair of fitted models, one for each gender. By substituting the
values 1 for a boy and 0 for a girl in sex, we get the boy’s constant
−1.760 + 0.460 = −1.300, and we can write:
logit(µij ; girl) = −1.760 − 0.584 ppedij ,
logit(µij ; boy) = −1.300 − 0.584 ppedij .
The intercepts in these two models, in other words, the log odds of
repetition for girls and boys without pre-primary education, are quite
different. The boys are more likely than the girls to have repeated a
grade. A child with pre-primary education is less likely to have repeated
a grade than a child who has not had any pre-primary education.
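Because log odds are hard to read directly, it can help to convert the four fitted combinations back to probabilities (a Python sketch using the estimates quoted above):

```python
import math

def prob_from_logit(log_odds):
    # invert the logit: mu = exp(eta) / (1 + exp(eta))
    return math.exp(log_odds) / (1.0 + math.exp(log_odds))

# Intercepts and pped coefficient from the pair of fitted models above
probs = {(sex, pped): prob_from_logit(intercept - 0.584 * pped)
         for sex, intercept in (("girl", -1.760), ("boy", -1.300))
         for pped in (0, 1)}

for key in sorted(probs):
    print(key, round(probs[key], 3))
# e.g. a girl without pre-primary education: ~0.147; a boy: ~0.214
```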
3.2 Ordinal data
3.2.1 Introduction
Variables that have as outcomes a small number of ordered categories
are quite common in the social and biomedical sciences. Examples of
such variables are responses to questionnaire items (with outcomes, for
example, ‘strongly agree’, ‘agree’, ‘disagree’, ‘strongly disagree’), and a
test graded by a teacher as ‘fail’, ‘pass’, or ‘distinction’. Very useful
models for this type of data are the ordered logistic regression model,
also called the ordered logit model or the cumulative logit model or the
proportional odds model, and the closely related ordered probit model.
This section considers models where the response variable is such an
ordinal categorical variable.
When the number of categories is two, the dependent variable is
binary and the logit and probit models considered in the previous section
may be applied. When the number of categories is rather large (10 or
more), it may be possible to approximate the distribution by a normal
distribution and apply the linear model for continuous outcomes. The
main issue in such a case is the homoscedasticity assumption (see Section
2.4). If the number of categories is small, say between 3 and 7, and the
distribution cannot be well approximated by a normal distribution, then
statistical methods for ordered categorical outcomes can be useful.
It is usual to assign numerical values to the ordered categories, re-
membering that the values are arbitrary. The ordered categories are
assigned the values 1, 2, · · · , C, where C is the number of categories.
Thus, on the four-point scale mentioned above, the category ‘strongly
agree’ would be assigned the value 1, ‘agree’ would be represented by 2,
‘disagree’ by 3, and ‘strongly disagree’ by the value 4.
The ordered models themselves are not members of the family of
generalized linear models. They can be regarded as generalisations of the
models for binary data, such as the logit and probit models that were
introduced in Section 3.1. The ordered models can also be formulated as
threshold models.
The real line is divided by thresholds into C intervals, correspond-
ing to the C ordered categories. The first threshold is γ1 . Threshold γ1
defines the upper bound of the interval corresponding to observed out-
come 1. Similarly, threshold γC−1 defines the lower bound of the interval
corresponding to observed outcome C. More generally, threshold γc de-
fines the boundary between the intervals corresponding to observed
outcomes c and c + 1 (for c = 1, · · · , C − 1). The latent response variable
is denoted by y∗ij and the observed categorical variable yij is related to
y∗ij by the ‘threshold model’ defined as:
yij = 1 if −∞ < y∗ij ≤ γ1 ,
yij = 2 if γ1 < y∗ij ≤ γ2 ,
···
yij = C if γC−1 < y∗ij < +∞.
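The threshold rule can be sketched as a small function; the cut points below are hypothetical, and the boundary convention (category c when γc−1 < y∗ ≤ γc) matches the display above:

```python
import bisect

def categorize(y_star, thresholds):
    # thresholds holds the C - 1 cut points gamma_1 < ... < gamma_{C-1};
    # bisect_left counts the cut points strictly below y_star, so a value
    # exactly equal to a cut point still falls in the lower category
    return bisect.bisect_left(thresholds, y_star) + 1

cuts = [0.0, 1.5]  # hypothetical gamma_1, gamma_2 for C = 3 categories
print([categorize(v, cuts) for v in (-2.0, 0.0, 0.7, 1.5, 3.0)])
# [1, 1, 2, 2, 3]
```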
3.2.2 The ordered logit model
Consider the latent response variable y∗ij for observation i in cluster j
and the observed ordinal categorical variable yij related to y∗ij . The
ordinal models can be written in terms of y∗ij :
y∗ij = θij + εij ,
where:
θij = β0 + ∑_{p=1}^{P} βp xpij .
In the absence of explanatory variables, the response variable yij
takes on the value c with probability:
pij(c) = Pr(yij = c),
for c = 1, · · · , C. Define the cumulative response probabilities for the C
categories of the ordinal outcome yij as:
Pij(c) = Pr(yij ≤ c) = ∑_{k=1}^{c} pij(k) , c = 1, · · · , C.
Note that this cumulative probability for the last category is 1; in
other words, Pij(C) = 1. Therefore, there are only (C − 1) cumulative
probabilities Pij(c) to estimate. If the cumulative density function of εij
is F , these cumulative probabilities are denoted by:
Pij(c) = F (γ c − θij ), c = 1, · · · , C − 1,
where γ 0 = −∞ and γ C = +∞. Equivalently, we can write the model
as a cumulative model:
G(Pij(c) ) = γc − θij ,
where G = F −1 is the link function.
If εij follows the logistic distribution, this results in the ordered lo-
gistic regression model, also called the ordered logit model or cumulative
logit model or proportional odds model. If εij follows the standard nor-
mal distribution, this leads to the ordered probit model. The differences
between these two models are minor and the choice between them is a
matter of fit and convenience.
Assuming the distribution of the error term εij of the latent response
y∗ij to be logistic, the cumulative probability function of yij can be
written as:
Pij(c) = Pr(εij ≤ γc − θij )
= exp(γc − θij ) / [1 + exp(γc − θij )].
The idea of cumulative probabilities leads naturally to the cumulative
logit model:
log[Pij(c) /(1 − Pij(c) )] = log[Pr(yij ≤ c)/Pr(yij > c)]
= γc − θij , c = 1, · · · , C − 1,
with (C − 1) strictly increasing model thresholds γc (in other words,
γ1 < γ2 < · · · < γC−1 ). In this case, the observed ordinal outcome yij = c
if γc−1 < y∗ij ≤ γc for the latent variable (with γ0 = −∞ and γC = +∞).
With explanatory variables, the model becomes:
log[Pr(yij ≤ c | xij )/(1 − Pr(yij ≤ c | xij ))] = γc − ∑_{p=1}^{P} βp xpij ,
where γ c is the threshold parameter for category c = 1, · · · , C − 1.
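The cumulative logit formulation can be sketched numerically; the thresholds and linear predictor below are hypothetical, chosen only to show how the cumulative probabilities translate into category probabilities:

```python
import math

def cumulative_probs(thresholds, theta):
    # P(y <= c) = exp(gamma_c - theta) / (1 + exp(gamma_c - theta)),
    # with the final cumulative probability fixed at 1
    return [math.exp(g - theta) / (1.0 + math.exp(g - theta))
            for g in thresholds] + [1.0]

def category_probs(thresholds, theta):
    # differences of adjacent cumulative probabilities give p(c)
    P = cumulative_probs(thresholds, theta)
    return [P[0]] + [P[c] - P[c - 1] for c in range(1, len(P))]

# Hypothetical thresholds for C = 3 categories and linear predictor theta
p = category_probs([-0.5, 1.0], theta=0.0)
print([round(x, 3) for x in p])  # three probabilities summing to 1
```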
Since the regression coefficients β do not carry the c subscript, they
do not vary across categories. Thus, the relationship between the ex-
planatory variables and the cumulative logits does not depend on c. This
assumption of identical odds ratios across the (C − 1) partitions of the
original ordinal outcome is called the proportional odds assumption [76].
As written above, a positive coefficient for a regressor indicates that, as
the value of the regressor increases, so do the odds that the response is
greater than category c, for any c = 1, 2, · · · , C − 1.
This is a natural way of writing the model because it means that, for
a positive β, as x increases so does the value of y∗ , but it is not the only
way of writing the model. In particular, the model is sometimes written
as:
log[Pr(yij ≤ c | xij , β0j )/(1 − Pr(yij ≤ c | xij , β0j ))] = γc + ∑_{p=1}^{P} βp xpij ,
in which case the regression parameters β are identical in magnitude but
are of opposite sign [93].
3.2.3 Dichotomization of ordered categories
Models for ordered categorical outcomes are more complicated to fit
and to interpret than models for dichotomous outcomes. Therefore, it
can make sense also to analyze the data after dichotomizing the ordered
outcome variable whilst retaining the ordinality of the response cate-
gories. For example, if there are three outcomes, one could analyze the
dichotomization {1} versus {2, 3} and also {1, 2} versus {3}. Each of
these analyses, separately, is based, of course, on less information, but
may be easier to carry out and to interpret than an analysis of the orig-
inal ordinal outcome.
3.2.4 Likelihood
The likelihood associated with the ordered response model is:
L(γ, σ²ε |y, x) = ∏j ∏i g(yij |xij ),
where:
g(yij |xij ) = ∏c [Pr(yij = c)]^yijc
= ∏c [Pij(c) − Pij(c−1) ]^yijc ,
and yijc = 1, if yij = c, and yijc = 0 otherwise,
Pij(c) = Pr(εij ≤ γc − ∑_{p=1}^{P} γp0 xpij )
= F (γc − ∑_{p=1}^{P} γp0 xpij ),
where F (·) is the cumulative distribution function of εij .
3.2.5 Example with ordered data
Example 3.2.5. Choosing teaching as a profession
Rowan, Raudenbush and Cheong [91] analyzed data from a 1990 sur-
vey of teachers working in 16 public schools in California and Michigan.
The schools were specifically selected to vary in terms of size, organisa-
tional structure, and urban versus suburban location.
The survey asked the following question: ‘If you could go back to
college and start all over again, would you again choose teaching as a
profession?’ Possible responses were: 1 = yes; 2 = not sure; 3 = no.
We take the teachers’ answers to this question as the response variable
(tcommit) and try to establish if characteristics of the teachers help to
predict their response to this question.
We have two different versions of the data. The first version
(teacher1.tab) comprises 661 observations on 16 cases but has miss-
ing values in the covariates, so we have constructed a second version
(teacher2.tab) which has 650 complete observations on the 16 cases.
We fit two models to the data teacher2.tab: the first model without
covariates, the second with a single explanatory variable.
The response variable tcommit is the three-category measure of
teacher commitment. The single teacher-specific explanatory variable is
taskvar, the teachers’ perception of task variety. This variable assesses
the extent to which teachers followed the same teaching routines each
day, performed the same tasks each day, had something new happening
in their job each day, and liked the variety present in their work.
The response variable tcommit takes the values c = 1, 2, 3. In the
absence of the explanatory variable taskvar, these values occur with
probabilities:
pij(1) = Pr(yij = 1) = Pr(‘Yes’),
pij(2) = Pr(yij = 2) = Pr(‘Not sure’),
pij(3) = Pr(yij = 3) = Pr(‘No’).
To assess the significance of the explanatory variable taskvar, we
initially specify the null ordered response model. This model includes
only the thresholds:
log[Pr(yij ≤ c)/Pr(yij > c)] = γc , c = 1, 2.
The SabreR command to fit this null model is:
sabre.model.1 <- sabre(tcommit~1-1,case=schlid,
first.family="ordered")
This results in the following output:
Log likelihood = -653.01797
on 648 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
_______________________________________________________
cut1 0.17894 0.78761E-01 2.2719
cut2 1.1867 0.92666E-01 12.806
Next, we consider the introduction of the explanatory variable
taskvar into this model. Rowan, Raudenbush and Cheong [91] hypoth-
esized that teachers would express high levels of commitment if they
had a job with a high degree of task variety. We add taskvar to the null
ordered response model:
log[Pr(yij ≤ c | xij )/Pr(yij > c | xij )] = γc − β1 taskvarij .
The SabreR command to fit this model is:
sabre.model.2 <- sabre(tcommit~taskvar-1,case=schlid,
first.family="ordered")
This produces the following output:
Log likelihood = -643.82094
on 647 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
_______________________________________________________
taskvar -0.36885 0.86745E-01 -4.2522
cut1 0.18762 0.79754E-01 2.3525
cut2 1.2195 0.94122E-01 12.957
For the null model, the results indicate that the estimated values
of the threshold parameters are 0.179 (γ 1 ) and 1.187 (γ 2 ). The model
formulation summarizes the two equations as:
log[Pr(yij ≤ 1)/Pr(yij > 1)] = 0.179,
log[Pr(yij ≤ 2)/Pr(yij > 2)] = 1.187.
For the model with the explanatory variable taskvar included, the
two equations summarizing these results are:
log[Pr(yij ≤ 1 | xij )/Pr(yij > 1 | xij )] = 0.188 − (−0.369 taskvarij )
= 0.188 + 0.369 taskvarij ,
log[Pr(yij ≤ 2 | xij )/Pr(yij > 2 | xij )] = 1.220 + 0.369 taskvarij .
The results indicate that taskvar is significantly related to commit-
ment (β1 = −0.369, z = −4.252). The greater a teacher’s perception
of task variety, the more favourably disposed that teacher is towards
again choosing teaching as a profession; in other words, such a teacher
is more likely to reply ‘Yes’ (threshold 1) and more likely to say either
‘Yes’ or ‘Not sure’ (threshold 2).
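The fitted thresholds of the null model can be converted back to response probabilities (a Python sketch using the estimates quoted above):

```python
import math

def expit(eta):
    # inverse logit: exp(eta) / (1 + exp(eta))
    return math.exp(eta) / (1.0 + math.exp(eta))

# Null-model thresholds quoted above
P_le_1 = expit(0.179)  # Pr('Yes')
P_le_2 = expit(1.187)  # Pr('Yes' or 'Not sure')

p_yes = P_le_1
p_not_sure = P_le_2 - P_le_1
p_no = 1.0 - P_le_2
print(round(p_yes, 3), round(p_not_sure, 3), round(p_no, 3))
# roughly 0.545, 0.222, 0.234: just over half would choose teaching again
```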
3.3 Count data
3.3.1 Introduction
Another important type of discrete data is count data. For example, for
a population of road crossings, one might count the number of accidents
in one year; or for a population of doctors, one could count how often
in one year they are confronted with a certain medical problem. The
set of possible outcomes of count data is the set of natural numbers:
0, 1, 2, · · · . The natural starting point in the analysis of counts is the
Poisson distribution. Suppose yij is a variable distributed randomly as
Poisson(µij ). Then, we write:
Pr(yij ) = exp(−µij ) µij^yij / yij ! , yij = 0, 1, · · · .
The Poisson distribution has some neat mathematical properties that
we can utilise when modelling count data. For example, the expected or
mean value of y is equal to the variance of y, so that:
E(yij ) = var(yij ) = µij .
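This identity can be verified numerically: the sketch below sums the Poisson probabilities for µ = 3 over a truncated range (the neglected tail mass beyond the truncation point is negligible for this mean):

```python
import math

def poisson_pmf(y, mu):
    # Pr(y) = exp(-mu) * mu**y / y!
    return math.exp(-mu) * mu ** y / math.factorial(y)

mu = 3.0
support = range(60)  # truncation point; the omitted tail is negligible
mean = sum(y * poisson_pmf(y, mu) for y in support)
var = sum((y - mean) ** 2 * poisson_pmf(y, mu) for y in support)
print(round(mean, 6), round(var, 6))  # both equal mu = 3
```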
When we have Poisson distributed data, it is usual to use a logarith-
mic transformation to model the mean, in other words, log(µij ). This
is the natural parameter for modelling the Poisson distribution. There
is no theoretical restriction, however, on using other transformations of
µij , so long as the mean is positive [38].
Furthermore, if the counts tend to be large, their distribution can
be approximated by a continuous distribution. If all counts are large
enough, then it is advisable to use the square root of the counts as the
response variable and then to fit the model. The reason why this is a
good approach resides in the fact that the square root transformation
succeeds very well in transforming the Poisson distribution to an ap-
proximately homoscedastic normal distribution. The square root is the
so-called variance-stabilizing transformation for the Poisson distribution.
If all or some of the counts are small, an approximation using the normal
distribution will not be satisfactory.
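The variance-stabilizing property can be checked by simulation. The sketch below is illustrative only (it uses Knuth's multiplication method to draw Poisson variates, since Python's standard library has no Poisson sampler); for a large mean, the variance of the square-rooted counts comes out close to 1/4:

```python
import math
import random

def rpois(lam, rng):
    """Draw one Poisson(lam) variate by Knuth's multiplication method."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

rng = random.Random(42)
lam = 100.0  # a "large" Poisson mean
roots = [math.sqrt(rpois(lam, rng)) for _ in range(20000)]
mean_rt = sum(roots) / len(roots)
var_rt = sum((r - mean_rt) ** 2 for r in roots) / len(roots)
# var_rt should be close to 0.25 whatever the (large) value of lam
```

The same experiment with a different large value of lam gives essentially the same variance, which is what makes the square root the variance-stabilizing transformation.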
3.3.2 Poisson regression models
In Poisson regression, it is assumed that the response variable yij has a
Poisson distribution given the explanatory variables x1ij, x2ij, · · · , xpij:
\[
y_{ij} \mid x_{1ij}, x_{2ij}, \cdots, x_{pij} \sim Poisson(\mu_{ij}),
\]
where the log of the mean µij is assumed to be a linear function of the
explanatory variables. That is,
\[
\log(\mu_{ij}) = \beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2ij} + \cdots + \beta_p x_{pij},
\]
which implies that µij is the exponential function of the explanatory
variables:
\[
\mu_{ij} = \exp\left(\beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2ij} + \cdots + \beta_p x_{pij}\right).
\]
In models for counts, it is quite usual that there is a variable mij
that is known to be proportional to the expected counts. For example,
if the count yij is the number of events in some time interval of non-constant
length mij, it is often natural to assume that the expected
count is proportional to this length of the time period. In order to let the
expected count be proportional to mij, there should be a term log(mij)
in the linear model for log(µij), with a regression coefficient fixed to 1.
Such a term is called an offset in the linear model [44, 77]. Therefore,
the Poisson regression model can be written in the following form:
\[
\log(\mu_{ij}) = \log(m_{ij}) + \beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2ij} + \cdots + \beta_p x_{pij}.
\]
The quantity log(µij/mij) is now modelled as a linear function of the
explanatory variables.
Assume the observations i are nested within clusters j. Using the
logarithmic transformation, the model with P explanatory variables
x1 , · · · , xP may be written as:
\[
y_{ij} \sim Poisson(\mu_{ij}),
\]
\[
\log(\mu_{ij}) = \log(m_{ij}) + \beta_0 + \sum_{p=1}^{P} \beta_p x_{pij},
\]
where β 0 is an intercept parameter, and β p , p = 1, · · · , P , are slope pa-
rameters associated with explanatory variables xpij . The term log(mij )
is included in the model as an offset.
3.3.3 Likelihood
The likelihood associated with the Poisson model is:
\[
L(\gamma \mid y, x) = \prod_j \prod_i g(y_{ij} \mid x_{ij}),
\]
where:
\[
g(y_{ij} \mid x_{ij}) = \frac{\exp(-\mu_{ij})\,\mu_{ij}^{y_{ij}}}{y_{ij}!}.
\]
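The log of this likelihood is straightforward to evaluate directly. The sketch below is a minimal illustration, not Sabre's estimation routine, and the data at the end are made up; it computes the Poisson log likelihood with an offset log(mij):

```python
import math

def poisson_loglik(beta0, betas, y, x, m):
    """Poisson log likelihood with offset log(m_ij).

    y: counts; x: covariate vectors; m: exposures (e.g. interval lengths)."""
    ll = 0.0
    for yi, xi, mi in zip(y, x, m):
        theta = math.log(mi) + beta0 + sum(b * v for b, v in zip(betas, xi))
        mu = math.exp(theta)
        ll += yi * theta - mu - math.lgamma(yi + 1)  # log g(y_ij | x_ij)
    return ll

# Hypothetical data: three counts with unequal exposures and one covariate
ll = poisson_loglik(0.1, [0.5],
                    y=[2, 0, 5], x=[[1.0], [0.0], [2.0]], m=[1.0, 2.0, 1.5])
```

Maximising this function over (β0, β1, …) would give the maximum likelihood estimates; in practice the maximisation is done numerically.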
3.3.4 Example with count data
Example 3.3.4. Demand for health care
Cameron and Trivedi [22] used various forms of overdispersed Pois-
son model to study the relationship between type of health insurance
and various responses which measured the demand for health care, such
as the total number of prescribed medications used in the past two days
(prescrib). The data set they used in this analysis (racd.tab) was
from the Australian Health Survey for 1977–1978 and comprised 5,190
respondents. Type of health insurance is defined as a categorical variable
represented by the following dummy variables: levyplus: 1 if respondent
is covered by private health insurance fund for private patients in public
hospital (with doctor of choice), 0 otherwise; freepoor: 1 if respondent
is covered free by government because of low income, recent immigration
or unemployment, 0 otherwise; freerepa: 1 if respondent is covered free by
government because of old-age or disability pension, or because invalid
veteran or family of deceased veteran, 0 otherwise.
Demographic explanatory variables are sex: 1 if respondent is fe-
male, 0 if male; age: respondent’s age in years divided by 100; agesq:
age squared. Income is defined as respondent’s annual income in Aus-
tralian dollars divided by 1000 (income). Explanatory variables used
to describe the respondent's recent state of health are illness: number
of illnesses in past two weeks, with five or more coded as 5;
actdays: number of days of reduced activity in past two weeks due to
illness or injury; hscore: respondent’s general health questionnaire score
using Goldberg’s method, high score indicates poor health; chcond1: 1
if respondent has chronic condition(s) but is not limited in activity, 0
otherwise; chcond2: 1 if respondent has chronic condition(s) and is lim-
ited in activity, 0 otherwise. A copy of the original dataset and further
details about the variables in racd.tab can be obtained from the web
[23].
Like Cameron and Trivedi, we take prescrib to be the count re-
sponse variable and apply a Poisson model with the range of explana-
tory variables outlined above. The SabreR command required to fit this
model is:
sabre.model <- sabre(prescrib~sex+age+agesq+
income+levyplus+freepoor+freerepa+
illness+actdays+hscore+chcond1+chcond2+
1,case=id,first.family="poisson")
This produces the following output:
Log likelihood = -5443.3311
on 5176 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
____________________________________________________
cons -2.8668 0.14908 -19.230
sex 0.56080 0.43164E-01 12.992
age 2.0861 0.73513 2.8377
agesq -0.26325 0.78264 -0.33636
income 0.30450E-01 0.65221E-01 0.46688
levyplus 0.27060 0.58009E-01 4.6649
freepoor -0.61759E-01 0.13676 -0.45159
freerepa 0.29172 0.69172E-01 4.2174
illness 0.20914 0.13260E-01 15.772
actdays 0.34688E-01 0.49475E-02 7.0112
hscore 0.21604E-01 0.81424E-02 2.6533
chcond1 0.77394 0.50771E-01 15.244
chcond2 1.0245 0.62314E-01 16.440
scale 0.52753 0.27207E-01 19.389
The results indicate that the following explanatory variables: sex,
illness, chcond1 and chcond2, have a highly significant effect on the
total number of prescribed medications used in the past two days. The
association between response and these explanatory variables is positive.
A female is likely to have significantly more prescribed medications than
a male. Those respondents with more illnesses in the past two weeks,
and those with a chronic condition, regardless of whether or not that
condition limits activity, are likely to have more prescribed medications.
Other significant factors with a positive effect on the response are
age, levyplus, freerepa, actdays and hscore. The older the respondent,
the worse his or her general state of health, and the more days of
reduced activity in the past two weeks, the more prescribed medications
the respondent is likely to have.
Having adjusted for all of the above factors, only two of the three
types of health insurance examined have a significant and positive effect
on number of prescribed medications: private health insurance and free
cover for the elderly or disabled. Free cover for those on low incomes,
who were recent immigrants and who were unemployed does not have a
significant impact on number of prescribed medications.
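Because of the log link, a coefficient β translates into a multiplicative effect exp(β) on the expected count. A quick check on some of the estimates above (illustrative arithmetic only):

```python
import math

# Selected estimates from the SabreR output above
estimates = {"sex": 0.56080, "illness": 0.20914, "chcond2": 1.0245}
rate_ratios = {name: math.exp(b) for name, b in estimates.items()}
# e.g. being female multiplies the expected number of prescribed
# medications by about exp(0.5608) ~ 1.75, other things being equal
```

Reading coefficients as rate ratios in this way is often easier to interpret than the raw log-scale estimates.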
3.4 Exercises
Exercise 3.4.1. Essay grading (binary response)
Recall that, in Exercise 2.8.1, we treated the variable grade as a
continuous/interval scale response and applied a linear model. In the
current exercise, we regard the variable pass in the dataset essays2.tab
as a binary response which takes the value 1 for grades 5 to 10, and value
0 for grades 1 to 4:
1. Use SabreR to fit a null binary logistic regression model. Obtain the
log likelihood, parameter estimate and standard error. Interpret
the parameter estimate.
2. Add the four grader dummy variables to the model. Obtain the log
likelihood, parameter estimates and standard errors. Compare the
deviances (−2 times log likelihoods) and use the Z-scores to test
whether there are significant differences between graders. Interpret
the results.
Exercise 3.4.2. Trade union membership
In Exercise 2.8.5, we related data from the Youth Sample of the
US National Longitudinal Survey on log hourly wage to a time-invariant
factor (ethnicity) and a variety of time-dependent explanatory variables.
In the current exercise, we use the same dataset (wagepan.tab) and treat
trade union membership (union) as the binary response of interest:
1. Use SabreR to fit a null binary logistic regression model of union.
Obtain the log likelihood, parameter estimate and standard error.
Interpret the parameter estimate.
2. Add the indicator variables for year (d81 to d87) to the model.
Obtain the log likelihood, parameter estimates and standard er-
rors. Compare the deviances (−2 times log likelihoods) and use
the Z-scores to test whether there are significant differences be-
tween years. Interpret the results.
Exercise 3.4.3. Tower of London
Recall from Example 1.9 that the binary response dtlm in the dataset
tower1.tab takes the value 1 if the test taken by participant j from
family k was completed in the minimum number of moves on occasion
i, and takes the value 0 otherwise:
1. Use SabreR to fit a null binary logistic regression model of dtlm.
Obtain the log likelihood, parameter estimate and standard error.
Interpret the parameter estimate.
2. Add the covariate level to the model. Obtain the log likelihood,
parameter estimates and standard errors. Compare the deviances
(−2 times log likelihoods) and use the Z-scores to test whether
there is a significant effect of level of difficulty. Interpret the results.
Exercise 3.4.4. Immunization of Guatemalan children
Recall from Example 1.10 that the dataset (guatemala immun.tab)
contains the binary response immun which represents whether child i in
family j within community k was immunized (coded 1) or not (0):
1. Use SabreR to fit a null binary logistic regression model of immun.
Obtain the log likelihood, parameter estimate and standard error.
Interpret the parameter estimate.
2. Add the child-specific explanatory variables: age (kid2p) and birth
order (order23, order46 and order7p) to the model. Obtain the
log likelihood, parameter estimates and standard errors. Compare
the deviances (−2 times log likelihoods) and use the Z-scores to
test whether there is a significant effect of age and birth order.
Interpret the results.
Exercise 3.4.5. Essay grading (ordered response)
In Exercise 2.8.1, the original gradings of 198 essays by five experts
were recorded on a 10-point scale and were treated as continuous/interval
scale data. In Exercise 3.4.1, the original grades were converted into a
binary response. In the current exercise, the original grades are recoded
into an ordered response ngrade comprising four categories. The variable
ngrade takes the value 1 if the original grade was either 1 or 2; value 2
if the original grade was either 3 or 4; value 3 if the original grade was
either 5 or 6; value 4 otherwise. The ordinal response ngrade and the
explanatory variables are stored in the file essays ordered.tab:
1. Use SabreR to fit a null ordered probit model of ngrade. Obtain
the log likelihood, parameter estimate and standard error. Inter-
pret the parameter estimate.
2. Add the four dummy variables for graders (grader2 to grader5)
to the model. Obtain the log likelihood, parameter estimates and
standard errors. Compare the deviances (−2 times log likelihoods)
and use the Z-scores to test whether there are significant differences
between graders. Interpret the results.
Exercise 3.4.6. Attitudes to abortion
Recall from Example 1.12 that respondents in the British Social At-
titudes Survey (BSAS) were presented with seven circumstances such
as ‘The woman became pregnant as a result of rape’ and were asked to
say whether abortion should be legal (coded 1) or not (0) under each
of those circumstances. Individuals were asked to respond on an an-
nual basis between 1983 and 1986. The strength of support for legalising
abortion was judged by combining the responses. The respondent’s total
score (score) was obtained by adding up the responses across all seven
circumstances. This total score was converted into an ordered response
(nscore) which took value 1 if score equalled 0,1 or 2 (as these values
were rare), value 2 if score equalled 3, value 3 if score equalled 4, value
4 if score equalled 5, and value 5 if score equalled 6 (as value 7 never
occurred). The data are stored in the file abortion2.tab:
1. Use SabreR to fit a null ordered logit model of nscore. Obtain the
log likelihood, parameter estimate and standard error. Interpret
the parameter estimate.
2. Create three dummy variables for year (year2 to year4). Add
these variables to the model. Obtain the log likelihood, parameter
estimates and standard errors. Compare the deviances (−2 times
log likelihoods) and use the Z-scores to test whether there are
significant differences between the years. Interpret the results.
Exercise 3.4.7. Respiratory status
Recall from Example 1.13 that the respiratory status score (status)
of patients in a clinical trial was regarded as an ordered response com-
prising the following five categories: ‘terrible’ (coded 0), ‘poor’ (1), ‘fair’
(2), ‘good’ (3) and ‘excellent’ (4). Respiratory status was determined
prior to randomisation (trend = 0) and at four later visits to the clinic
(trend = 1,2,3,4). The data are saved in the file respiratory2.tab:
1. Use SabreR to fit a null ordered logit model of status. Obtain the
log likelihood, parameter estimate and standard error. Interpret
the parameter estimate.
2. Add the explanatory variables drug, male, age and base to the
model. Obtain the log likelihood, parameter estimates and stan-
dard errors. Compare the deviances (−2 times log likelihoods) and
use the Z-scores to test whether any of the explanatory variables
are significant.
3. Add the linear trend variable to the model. Obtain the log
likelihood, parameter estimates and standard errors. Compare
the deviances (−2 times log likelihoods) and use the Z-score to
test whether there is a significant trend over time. Interpret the
results.
Exercise 3.4.8. Headaches (count data)
Recall from Example 1.14 that a trial was conducted to establish
whether an artificial sweetener (aspartame) caused headaches. The trial
involved the random assignment of 27 patients to different sequences of
placebo and aspartame. The response of interest (y) was the number of
headaches counted up over several days (days). The responses are in tem-
poral order, but we do not use that feature of the data (headache2.tab)
in this exercise. Hedeker [60] found no evidence of a sequence effect:
1. Use SabreR to fit a null Poisson model to the number of headaches
(y), with lt=log(days) as the offset and a log link. Obtain the log
likelihood, parameter estimate and standard error. Interpret the
parameter estimate.
2. Add the treatment indicator aspartame to the model. Obtain the
log likelihood, parameter estimates and standard errors. Compare
the deviances (−2 times log likelihoods) and use the Z-score to test
whether there is a significant treatment effect.
Exercise 3.4.9. Epileptic seizures
Recall from Example 1.15 that a trial for the treatment of epilepsy
was conducted to compare the drug Progabide with a placebo. 59 patients
were randomized to either the drug or the placebo. The response of
interest (y) was the number of epileptic seizures counted up over a two-
week period (visit). The primary objective was to test whether the
experimental drug (treat) reduced significantly the number of epileptic
seizures, having adjusted for patient age (lage) and number of seizures
in the eight weeks preceding the trial (lbas). The data are saved in the
file epilep.tab:
1. Use SabreR to fit a null Poisson model to the number of epileptic
seizures (y). Obtain the log likelihood, parameter estimate and
standard error. Interpret the parameter estimate.
2. Add the terms lbas, treat, lbas.trt, lage and visit to the
model. Obtain the log likelihood, parameter estimates and stan-
dard errors. Compare the deviances (−2 times log likelihoods) and
use the Z-scores to test whether any of the explanatory variables
are significant. Do the results make sense intuitively?
3. Replace the variable visit with v4 (an indicator variable for the
fourth visit) and re-fit the model. Which model would you prefer?
Exercise 3.4.10. Skin cancer deaths
Recall from Example 1.16 that data were collected on male malignant
melanoma deaths over the period 1975 to 1981 for Germany, Ireland,
Italy, the Netherlands and the UK, and over the period 1971 to 1980
for four other EEC countries. Interest focussed on establishing the role
of ultraviolet-B (UVB) light exposure to malignant melanoma deaths.
The dataset (deaths.tab) contains the number of male deaths due to
malignant melanoma (deaths) and a mean-centred measure of the UVB
dose reaching the earth’s surface (uvb) by year in county i within region
j of nation k :
1. Use SabreR to fit a null Poisson model to the number of male
deaths (deaths). Use log expected deaths as an offset. Obtain the
log likelihood, parameter estimate and standard error. Interpret
the parameter estimate.
2. Add the continuous covariate uvb to the model. Obtain the log
likelihood, parameter estimates and standard errors. Compare the
deviances (−2 times log likelihoods) and use the Z-score to test
whether there is a significant uvb effect.
4
Family of generalized linear models
4.1 Introduction
The main models we have considered so far, namely linear models, bi-
nary response models (generalizable as ordered response models) and
Poisson models, are special cases of the generalized linear model (GLM)
or exponential family. It will help us in considering extensions of these
models to three levels, and to multivariate responses, if we can start to
write each of the models using GLM notation. In GLMs, the explana-
tory variables xij affect the response (yij ) via the linear predictor (θij ),
where:
\[
\theta_{ij} = \gamma_{00} + \sum_{p=1}^{P} \gamma_{p0}\, x_{pij}.
\]
The GLM is obtained by specifying some function of the response
(yij ) conditional on the linear predictor and on other parameters, i.e.,
\[
g(y_{ij} \mid \theta_{ij}, \phi) = \exp\left\{\left[y_{ij}\theta_{ij} - b(\theta_{ij})\right]/\phi + c(y_{ij}, \phi)\right\},
\]
where φ is the scale parameter, and b(θij) is a function that gives the
conditional mean µij and variance of yij, namely:
\[
E[y_{ij} \mid \theta_{ij}, \phi] = \mu_{ij} = b'(\theta_{ij}),
\]
\[
Var[y_{ij} \mid \theta_{ij}, \phi] = \phi\, b''(\theta_{ij}).
\]
In GLMs, the mean and variance are related so that:
\[
Var[y_{ij} \mid \theta_{ij}, \phi] = \phi\, b''\!\left(b'^{-1}(\mu_{ij})\right) = \phi V(\mu_{ij}).
\]
V(µij) is called the variance function. The function b′⁻¹(µij), which
expresses θij as a function of µij, is called the link function, and b′(θij)
is the inverse link function. Both b(θij) and c(yij, φ) differ for different
members of the exponential family.
4.2 The linear model
If we rewrite the linear model from Chapter 2 as:
\[
g(y_{ij}|x_{ij}) = g(y_{ij} \mid \theta_{ij}, \phi) = \frac{1}{\sqrt{2\pi}\,\sigma_\varepsilon} \exp\left\{-\frac{(y_{ij}-\mu_{ij})^2}{2\sigma_\varepsilon^2}\right\},
\]
then we can write:
\[
g(y_{ij} \mid \theta_{ij}, \phi) = \exp\left\{\frac{1}{\sigma_\varepsilon^2}\left(y_{ij}\mu_{ij} - \frac{\mu_{ij}^2}{2}\right) - \frac{\ln(2\pi\sigma_\varepsilon^2)}{2} - \frac{y_{ij}^2}{2\sigma_\varepsilon^2}\right\},
\]
so that:
\[
\theta_{ij} = \mu_{ij}, \qquad \phi = \sigma_\varepsilon^2, \qquad b(\theta_{ij}) = \frac{\theta_{ij}^2}{2}, \qquad c(y_{ij}, \phi) = -\frac{\ln(2\pi\sigma_\varepsilon^2)}{2} - \frac{y_{ij}^2}{2\sigma_\varepsilon^2}.
\]
The mean µij and variance functions are:
\[
\mu_{ij} = \theta_{ij}, \qquad V(\mu_{ij}) = 1.
\]
Note that, in the linear model, the mean and variance are not related,
since φV(µij) = σ²ε does not depend on µij. Also, the link function is
the identity, as θij = µij. We define this model by Gaussian error
distribution g, identity link function i.
4.3 The binary response model
If we rewrite the binary response model from Section 3.1 as:
\[
g(y_{ij}|x_{ij}) = g(y_{ij} \mid \theta_{ij}, \phi) = \mu_{ij}^{y_{ij}} \left(1-\mu_{ij}\right)^{1-y_{ij}},
\]
then we can write:
\[
g(y_{ij} \mid \theta_{ij}, \phi) = \exp\left\{y_{ij}\ln\mu_{ij} + (1-y_{ij})\ln(1-\mu_{ij})\right\}
= \exp\left\{y_{ij}\ln\frac{\mu_{ij}}{1-\mu_{ij}} + \ln(1-\mu_{ij})\right\},
\]
so that:
\[
\theta_{ij} = \ln\frac{\mu_{ij}}{1-\mu_{ij}}, \qquad \phi = 1, \qquad
b(\theta_{ij}) = -\ln(1-\mu_{ij}) = \ln\left(1+\exp\theta_{ij}\right), \qquad c(y_{ij}, \phi) = 0.
\]
The mean µij and variance functions are:
\[
\mu_{ij} = \frac{\exp(\theta_{ij})}{1+\exp(\theta_{ij})}, \qquad
V(\mu_{ij}) = \frac{\exp(\theta_{ij})}{\left\{1+\exp(\theta_{ij})\right\}^2}.
\]
Note that, in the binary response model, the mean and variance are
related as:
\[
\phi V(\mu_{ij}) = \mu_{ij}\left(1-\mu_{ij}\right).
\]
Also θij = ln[µij/(1 − µij)], and the logit model (logit link) has:
\[
\mu_{ij} = \frac{\exp(\theta_{ij})}{1+\exp(\theta_{ij})}.
\]
The probit model (probit link) has µij = Φ(θij), or Φ⁻¹(µij) = θij,
where Φ(·) is the standard normal cumulative distribution function. The
complementary log-log model (cloglog link) has θij = log{− log(1 − µij)},
or µij = 1 − exp(− exp θij). We define the binary response model with
binomial error distribution b, and logit, probit or cloglog link.
4.4 The Poisson model
If we rewrite the Poisson model from Section 3.3 as:
\[
g(y_{ij}|x_{ij}) = g(y_{ij} \mid \theta_{ij}, \phi) = \frac{\exp(-\mu_{ij})\,\mu_{ij}^{y_{ij}}}{y_{ij}!},
\]
then we can write:
\[
g(y_{ij} \mid \theta_{ij}, \phi) = \exp\left\{y_{ij}\ln\mu_{ij} - \mu_{ij} - \log y_{ij}!\right\},
\]
so that:
\[
\theta_{ij} = \ln\mu_{ij}, \qquad \phi = 1, \qquad b(\theta_{ij}) = \mu_{ij} = \exp\theta_{ij}, \qquad c(y_{ij}, \phi) = -\log y_{ij}!.
\]
The mean µij and variance functions are:
\[
\mu_{ij} = \exp(\theta_{ij}), \qquad V(\mu_{ij}) = \mu_{ij}.
\]
Note that, in the Poisson model, the mean and variance are related as:
\[
\phi V(\mu_{ij}) = \mu_{ij}.
\]
The link function is the log link, as θij = ln µij. We define the Poisson
model with Poisson error distribution p and log link.
4.5 Likelihood
We can now write the likelihood for the linear, binary response and
Poisson models in a general form:
\[
L(\gamma, \phi \mid y, x) = \prod_j \prod_i g(y_{ij} \mid \theta_{ij}, \phi),
\]
where:
\[
g(y_{ij} \mid \theta_{ij}, \phi) = \exp\left\{\left[y_{ij}\theta_{ij} - b(\theta_{ij})\right]/\phi + c(y_{ij}, \phi)\right\},
\]
\[
\theta_{ij} = \gamma_{00} + \sum_{p=1}^{P} \gamma_{p0}\, x_{pij}.
\]
In summary, for the linear model, we have identity link function
and Gaussian (normal) error distribution; for the binary model, we have
logit, probit or cloglog link function and binomial error distribution;
and for the Poisson model, we have log link function and Poisson error
distribution.
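The connection between b(θij) and the conditional moments can be verified numerically. In this illustrative sketch (not from the book), b′(θ) and b″(θ) are approximated by finite differences for the binomial member of the family, recovering the logistic mean and the variance function µ(1 − µ):

```python
import math

def d1(f, x, h=1e-5):
    """Central finite-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def d2(f, x, h=1e-4):
    """Central finite-difference approximation to f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

b_binomial = lambda t: math.log(1 + math.exp(t))  # b(theta) for the logit model

theta = 0.3
mu = d1(b_binomial, theta)   # E[y | theta] = b'(theta)
v = d2(b_binomial, theta)    # Var[y | theta] = phi * b''(theta), with phi = 1
# v should match the binomial variance function mu * (1 - mu)
```

The same check works for the other members: with b(θ) = θ²/2 the second derivative is constant (Gaussian), and with b(θ) = exp(θ) both derivatives equal µ (Poisson).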
For further discussion on generalized linear models, see [77].
5
Mixed models for continuous/interval scale data
5.1 Introduction
In the analysis of Example 1.2.3 (Psychological distress) in Chapter 2
(see Sections 2.2, 2.3 and 2.6), we assumed that each psychological dis-
tress score was independent of all other scores in the same dataset. More
specifically, we assumed that the scores computed for each student on
two different occasions were independent of each other. A more realistic
assumption would be that the two psychological distress scores observed
on each student are, in some way, dependent upon each other. In this
chapter, we develop linear models which will allow us to take into ac-
count this dependence between observations on the same individual.
5.2 Linear mixed model
When modelling these data, we should acknowledge the fact that each
student is contributing two scores. We do this by introducing the concept
of hierarchies or levels within the data; in other words, by regarding the
data as two-level, a specific case of multi-level data [43]. We maintain
the notation which is common to much literature on multi-level data.
The 12 students are regarded as the level-two units j within which the
two occasions are considered to be the level-one units i. Let yij denote
the psychological distress score of student j on occasion i, i = 1, 2, j =
1, 2, ..., 12.
The simplest two-level (or mixed) model is equivalent to a one-way
analysis of variance with an individual-specific random effect in which
there are no explanatory variables. In this model, the response variable
(psychological distress score) is expressed as the sum of a random inter-
cept for the level-two units (students) j, β 0j , and the residual effect for
the level-one units (occasions) i within these level-two units, εij :
yij = β 0j + εij .
Assuming the εij have zero means, the intercept β 0j can be thought of
as the mean score of student j. Students with a high value of β 0j tend to
have, on average, high scores whereas students with a low value of β 0j
tend to have, on average, low scores. The level-two equation also has no
predictors in its simplest form:
β 0j = γ 00 + u0j ,
where β 0j is now the dependent variable, γ 00 is the level-two intercept
and u0j is the level-two error with mean 0. In this equation, γ 00 repre-
sents the grand mean or the mean score of the student-specific intercepts.
The SabreR command required to fit this null model is:
sabre.model.1 <- sabre(ghq~1,case=student,
adaptive.quad=TRUE,first.family="gaussian")
This command results in the following output:
Log likelihood = -67.132857
on 21 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
____________________________________________________
cons 10.167 1.6784 6.0573
sigma 1.9149 0.39087 4.8990
scale 5.6544 1.2222 4.6265
This output indicates that the estimate of the overall mean score, γ 00 ,
is 10.167.
The term u0j in the above expression represents the deviation of
each student-specific mean score from the grand mean. When the aver-
age deviation is large, there are large differences between the students.
Rewriting the two equations above as a single equation, we have:
yij = γ 00 + u0j + εij
where γ 00 is the population grand mean, u0j is the specific effect of
level-two unit (student) j, and εij is the residual effect for level-one unit
(occasion) i within this level-two unit. In other words, student j has
the ‘true mean’ γ 00 + u0j , and each measurement on an occasion for this
student deviates from this true mean by some value, called εij . Level-two
units (students) differ randomly from one another, which is reflected by
the fact that u0j is a random variable and that this type of model is called
a ‘random effects model’. Some students have a high (low) true mean
score, corresponding to a high (low) value of u0j while other students
have a true mean score close to the average, corresponding to a value of
u0j close to zero.
It is assumed that the random variables u0j and εij are mutually in-
dependent, that the student-specific random effects u0j have population
mean 0 and variance σ 2u0 (the population between-student variance),
and that the residuals εij have mean 0 and variance σ 2ε (the population
within-student variance). In other words, the within-student variance
is the variance between occasions about the true student mean, while
the between-student variance is the variance between the students’ true
means.
From the above output, we can see that the estimate of the within-student
variance, σ²ε, is 1.9149² = 3.6668, while the estimate of the
between-student variance, σ²u0, is 5.6544² = 31.9722.
The one-way analysis of variance examines the deviations of student-
specific mean scores from the grand mean. Here, it is assumed that the
students’ mean scores, represented by µij = γ 00 + u0j and, thus, their
deviations are varying randomly. Therefore, this model is equivalent to
the random effects ANOVA model. For further details of this model, see,
for example, [65, 90, 108].
5.3 The intraclass correlation coefficient
A basic measure for the degree of dependency in grouped observations
is the intraclass correlation coefficient. The term ‘class’ refers to the
level-two units in the classification system under consideration. There
are, however, several definitions of this coefficient, depending on the
assumptions about the sampling design.
Consider the model yij = γ 00 + u0j + εij . The total variance of yij
can be decomposed as the sum of the level-two (student) and level-one
(occasion) variances:
var(yij ) = var(u0j ) + var(εij ) = σ 2u0 + σ 2ε .
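This decomposition can be illustrated with a small simulation (the variance components below are hypothetical values, not the psychological distress estimates): drawing u0j and εij independently, the total variance of the simulated yij recovers the sum of the two components:

```python
import random

rng = random.Random(1)
sigma_u0, sigma_e = 2.0, 1.0   # assumed level-two and level-one s.d.s
gamma00 = 10.0

ys = []
for j in range(4000):          # level-two units (e.g. students)
    u0j = rng.gauss(0, sigma_u0)
    for i in range(2):         # level-one units (e.g. occasions)
        ys.append(gamma00 + u0j + rng.gauss(0, sigma_e))

m = sum(ys) / len(ys)
total_var = sum((y - m) ** 2 for y in ys) / len(ys)
# total_var should be close to sigma_u0**2 + sigma_e**2 = 5.0
```

Observations sharing the same u0j are correlated, which is exactly what the intraclass correlation coefficient below measures.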
The covariance between responses of two level-one units (i and i′, with
i ≠ i′) in the same level-two unit j is equal to the variance of the
contribution u0j that is shared by these level-one units:
\[
cov(y_{ij}, y_{i'j}) = var(u_{0j}) = \sigma_{u0}^2.
\]
The correlation between values of two randomly drawn level-one units
in the same, randomly drawn, level-two unit is given by
\[
\rho(y_{ij}, y_{i'j}) = \frac{\sigma_{u0}^2}{\sigma_{u0}^2 + \sigma_\varepsilon^2}.
\]
This parameter is called the intraclass correlation coefficient or the
intra-level-two-unit correlation coefficient. The coefficient ρ is defined as:
\[
\rho = \frac{\text{population variance between level-two units}}{\text{total variance}}.
\]
The intraclass correlation coefficient ρ measures the proportion of the
variance in the outcome that is between the level-two units. In the psy-
chological distress example, ρ is estimated to be 0.8971; in other words,
almost 90% of the overall variance in the psychological distress scores is
due to the variation between students.
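The quoted estimate of 0.8971 follows directly from the variance estimates reported in Section 5.2 (the sigma and scale lines of the SabreR output):

```python
sigma_e = 1.9149             # level-one (within-student) s.d., "sigma"
sigma_u0 = 5.6544            # level-two (between-student) s.d., "scale"
var_e, var_u0 = sigma_e ** 2, sigma_u0 ** 2

rho = var_u0 / (var_u0 + var_e)  # intraclass correlation coefficient
# rho is about 0.897: most of the variance lies between students
```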
We note that the true correlation coefficient ρ is restricted to take
non-negative values, i.e. ρ ≥ 0. Traditional estimation procedures such as
ordinary least squares (OLS) are used in multiple regression with fixed
effects, that is, when it can be assumed that σ 2u0 = 0. The existence
of a positive intraclass correlation coefficient ρ > 0, resulting from the
presence of more than one residual term in the model, means that such
traditional approaches are inapplicable.
Note that, conditional on being in group j:
\[
E(\bar y_{.j} \mid \beta_{0j}) = \beta_{0j}, \qquad
Var(\bar y_{.j} \mid \beta_{0j}) = \frac{\sigma_\varepsilon^2}{n_j}.
\]
Across the population, the unconditional mean and variance are:
\[
E(\bar y_{.j}) = \gamma_{00}, \qquad
Var(\bar y_{.j}) = \sigma_{u0}^2 + \frac{\sigma_\varepsilon^2}{n_j}.
\]
Note that the unconditional mean is equal to the expectation of the
conditional mean, and that the unconditional variance is equal to the
mean of the conditional variance plus the variance of the conditional
mean. We will use these relations in the next section.
5.4 Parameter estimation by maximum likelihood
There are three kinds of parameters that can be estimated: regression
parameters, variance components and random effects. In the current
context, there is only one regression parameter, the constant or grand
mean: γ00. The variance components are σ²u0 and σ²ε. The random effects
are β 0j or, equivalently, when combined with γ 00 : u0j . The resulting
model is
yij = µij + εij ,
µij = γ 00 + u0j .
The likelihood function is given by:
\[
L\left(\gamma_{00}, \sigma_\varepsilon^2, \sigma_{u0}^2 \mid y\right) = \prod_j \int_{-\infty}^{+\infty} \prod_i g(y_{ij} \mid u_{0j})\, f(u_{0j})\, du_{0j},
\]
where:
\[
g(y_{ij} \mid u_{0j}) = \frac{1}{\sqrt{2\pi}\,\sigma_\varepsilon} \exp\left\{-\frac{(y_{ij}-\mu_{ij})^2}{2\sigma_\varepsilon^2}\right\},
\]
and:
\[
f(u_{0j}) = \frac{1}{\sqrt{2\pi}\,\sigma_{u0}} \exp\left\{-\frac{u_{0j}^2}{2\sigma_{u0}^2}\right\}.
\]
Maximisation of the likelihood function over the parameter space
gives maximum likelihood estimates (MLEs) for θ = (γ00, σ²ε, σ²u0).
SabreR evaluates the integral L(γ00, σ²ε, σ²u0 | y) for the linear model
using standard Gaussian quadrature or adaptive Gaussian quadrature
(numerical integration). Note that the random effects u0j are latent variables
rather than statistical parameters or regression coefficients, and there-
fore are not estimated as part of the estimation procedure. They may
be predicted by a method known as empirical Bayes estimation which
produces so-called posterior means. The basic idea of this method is that
u0j can be predicted (or estimated) by combining two kinds of informa-
tion: (i) the data from group j, (ii) the fact that the unobserved u0j is
a random variable with mean 0 and variance σ 2u0 . In other words, data
information is combined with population information.
The posterior means for the level-two residuals u0j are given by:
\[
\hat u_{0j} = E(u_{0j} \mid y, \theta) = \frac{\sigma_{u0}^2}{\sigma_{u0}^2 + \sigma_\varepsilon^2 / n_j}\left(\bar y_{.j} - \bar y\right),
\]
where θ are the model parameters [43].
The estimate for the intercept β0j will be the same as the estimate for
u0j plus γ00. Note that, if we used only group j, β0j would be estimated
by the group mean:
\[
\hat\beta_{0j} = \bar y_{.j}.
\]
If we looked only at the population, we would estimate β0j by its pop-
ulation mean, γ00. This parameter is estimated by the overall mean:
\[
\hat\gamma_{00} = \bar y.
\]
If we combine the information from group j with the population
information, the combined estimate for β0j is a weighted average of the
two previous estimates:
\[
\hat\beta_{0j}^{EB} = w_j \hat\beta_{0j} + (1 - w_j)\,\hat\gamma_{00}, \qquad
\text{where } w_j = \frac{\sigma_{u0}^2}{\sigma_{u0}^2 + \sigma_\varepsilon^2 / n_j}.
\]
The factor $w_j$ is often referred to as a 'shrinkage factor' since it is always less than or equal to one. As $n_j$ increases, this factor tends to one, and as the number of level-one units in a level-two unit decreases, the factor becomes closer to zero. In practice, we do not know the true values of the variances $\sigma^2_{u0}$ and $\sigma^2_\varepsilon$, and we substitute estimated values to obtain $\hat{\beta}_{0j}^{EB}$.
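The shrinkage calculation described above is straightforward to verify numerically. A minimal base R sketch, using made-up values for the variance components, group sizes and means (none of these numbers come from the book's examples):

```r
# Empirical Bayes shrinkage of a group mean towards the grand mean.
# All numbers below are illustrative, not taken from the book's examples.
sigma2_u0 <- 4                  # assumed level-two variance
sigma2_e  <- 16                 # assumed level-one variance
n_j <- c(2, 10, 100)            # three group sizes
w_j <- sigma2_u0 / (sigma2_u0 + sigma2_e / n_j)   # shrinkage factors

ybar_j   <- 7                   # assumed group mean
gamma_00 <- 5                   # assumed grand mean
beta_EB  <- w_j * ybar_j + (1 - w_j) * gamma_00   # combined (posterior mean) estimates

round(w_j, 3)                   # 0.333 0.714 0.962: less shrinkage as n_j grows
round(beta_EB, 3)               # 5.667 6.429 6.923: pulled towards gamma_00 = 5
```

As the comments indicate, small groups borrow heavily from the population mean, while large groups are left almost at their own group mean.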
5.5 Regression with level-two effects
In multilevel analysis, the level-two unit means (group means of explanatory variables) can be considered as explanatory variables. The level-two unit (student) mean of a given level-one explanatory variable is defined as the mean of that variable over all level-one units (occasions) within the given level-two unit. The level-two unit mean of a level-one explanatory
variable allows us to express the difference between within-group and
between-group regressions. The within-group regression coefficient ex-
presses the effect of the explanatory variable within a given group; the
between-group regression coefficient expresses the effect of the group
mean of the explanatory variable on the group mean of the response
variable. In other words, the between-group regression coefficient is just
the coefficient in a regression analysis for data that are aggregated (by
averaging) to the group level. In the next section, we use a two-level
linear model to relate a continuous/interval scale response variable to a
set of level-one and level-two explanatory variables.
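The within-group and between-group coefficients described in this section can be made concrete with a short simulation. This base R sketch is illustrative only: the data are simulated with a true within-group slope of 2 and a true between-group slope of 5.

```r
# Within-group vs between-group regression coefficients on simulated
# two-level data. All numbers are illustrative.
set.seed(7)
g <- rep(1:50, each = 10)            # 50 level-two units, 10 level-one units each
z <- rnorm(50)                       # group-level component of x
x <- z[g] + rnorm(500)               # x varies both between and within groups
y <- 2 * x + 3 * z[g] + rnorm(500)   # within slope 2; between slope 2 + 3 = 5

x_gm  <- ave(x, g)                   # group means of x (the level-two unit means)
x_dev <- x - x_gm                    # within-group deviations
fit   <- lm(y ~ x_dev + x_gm)        # separates the two regressions
round(coef(fit)[c("x_dev", "x_gm")], 2)   # close to 2 and 5 respectively
```

Because the sample group means of x are noisy versions of the group-level component, the estimated between-group slope is slightly attenuated towards the within-group slope; with balanced data the coefficient on the group mean equals the coefficient from a regression on the aggregated (group-level) data.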
5.6 Two-level random intercept models
We start by relating the response to a single explanatory variable $x_{ij}$ which is allowed to vary between level-one units (occasions) of each level-two unit (student). In the psychological distress example, $x_{ij}$ could be a binary indicator variable which takes the value 1 for occasion 2, and value 0 otherwise. We assume that the intercept $\beta_{0j}$ depends on the level-two units but that the regression coefficient of $x_{ij}$ is constant. The resulting model with one explanatory variable $x_{ij}$ is given by:
$$y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + \varepsilon_{ij}.$$
For the level-two model, the group-dependent intercept can be split into the grand mean and the group-dependent deviation:
$$\beta_{0j} = \gamma_{00} + u_{0j},$$
and the same fixed effect of $x_{ij}$ for each level-two unit is assumed:
$$\beta_{1j} = \gamma_{10}.$$
The grand mean is $\gamma_{00}$ and the regression coefficient for $x_{ij}$ is $\gamma_{10}$. Substitution now leads to the model:
$$y_{ij} = \gamma_{00} + \gamma_{10} x_{ij} + u_{0j} + \varepsilon_{ij}.$$
The random effects $u_{0j}$ are the level-two unit residuals, controlling for the effects of variable $x_{ij}$. It is assumed that these residuals are drawn from a normally distributed population having zero mean and a constant variance $\sigma^2_{u0}$, given the values $x_{ij}$ of the explanatory variable. The population mean and variance of the level-one unit residuals $\varepsilon_{ij}$ are assumed to be zero and $\sigma^2_\varepsilon$ respectively across the level-two units.
The variance of $y_{ij}$ conditional on the value of $x_{ij}$ is given by:
$$\operatorname{var}\left(y_{ij} \mid x_{ij}\right) = \operatorname{var}\left(u_{0j}\right) + \operatorname{var}\left(\varepsilon_{ij}\right) = \sigma^2_{u0} + \sigma^2_\varepsilon,$$
while the covariance between two different level-one units ($i$ and $i'$, with $i \neq i'$) in the same level-two unit is:
$$\operatorname{cov}\left(y_{ij}, y_{i'j} \mid x_{ij}, x_{i'j}\right) = \operatorname{var}\left(u_{0j}\right) = \sigma^2_{u0}.$$
The fraction of residual variability that can be attributed to level one is given by:
$$\frac{\sigma^2_\varepsilon}{\sigma^2_{u0} + \sigma^2_\varepsilon},$$
and, for level two, this fraction is:
$$\frac{\sigma^2_{u0}}{\sigma^2_{u0} + \sigma^2_\varepsilon}.$$
The residual intraclass correlation coefficient:
$$\rho\left(y_{ij} \mid x_{ij}\right) = \frac{\sigma^2_{u0}}{\sigma^2_{u0} + \sigma^2_\varepsilon},$$
is the correlation between the $y$-values of any two different level-one units in the same level-two unit, controlling for variable $x$. It is analogous to the usual intraclass correlation coefficient, but now controls for $x$. If the residual intraclass correlation coefficient or, equivalently, $\sigma^2_{u0}$, is positive, then the two-level (or mixed) model is a better analysis than ordinary least squares regression.
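The decomposition of $\operatorname{var}(y_{ij})$ into $\sigma^2_{u0} + \sigma^2_\varepsilon$, and the resulting intraclass correlation, can be checked by simulation. A base R sketch with illustrative parameter values (not estimates from any of the book's datasets):

```r
# Simulating the random intercept model to check the variance decomposition.
# Parameter values are illustrative only.
set.seed(1)
J <- 2000; n <- 5                    # J level-two units, n level-one units each
sigma_u0 <- 2; sigma_e <- 3          # true standard deviations
u0 <- rep(rnorm(J, 0, sigma_u0), each = n)  # u_0j, constant within each group
y  <- 10 + u0 + rnorm(J * n, 0, sigma_e)    # y_ij = gamma_00 + u_0j + eps_ij

var(y)                                      # close to sigma_u0^2 + sigma_e^2 = 13
sigma_u0^2 / (sigma_u0^2 + sigma_e^2)       # residual intraclass correlation, 4/13
```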
An extension of this model allows for the introduction of level-two predictors $z_j$. In the psychological distress example, these could be student characteristics such as age and gender. Using the level-two model:
$$\beta_{0j} = \gamma_{00} + \gamma_{01} z_j + u_{0j}, \qquad \beta_{1j} = \gamma_{10},$$
the model becomes:
$$y_{ij} = \gamma_{00} + \gamma_{10} x_{ij} + \gamma_{01} z_j + u_{0j} + \varepsilon_{ij},$$
so that:
$$\mu_{ij} = \gamma_{00} + \gamma_{10} x_{ij} + \gamma_{01} z_j + u_{0j}.$$
This model provides for a level-two predictor, $z_j$, whilst also controlling for the effect of a level-one predictor, $x_{ij}$, and the random effects of the level-two units, $u_{0j}$.
5.7 General two-level models including random
intercepts
Just as in multiple regression, more than one explanatory variable can be included in the random intercept model. When the explanatory variables at level one are denoted by $x_1, \cdots, x_P$, and those at level two by $z_1, \cdots, z_Q$, adding their effects to the random intercept model leads to the following formula:
$$y_{ij} = \gamma_{00} + \sum_{p=1}^{P} \gamma_{p0} x_{pij} + \sum_{q=1}^{Q} \gamma_{0q} z_{qj} + u_{0j} + \varepsilon_{ij},$$
so that:
$$\mu_{ij} = \gamma_{00} + \sum_{p=1}^{P} \gamma_{p0} x_{pij} + \sum_{q=1}^{Q} \gamma_{0q} z_{qj} + u_{0j}.$$
The regression parameters $\gamma_{p0}$ ($p = 1, \cdots, P$) and $\gamma_{0q}$ ($q = 1, \cdots, Q$) for level-one and level-two explanatory variables respectively again have the same interpretation as regression coefficients in multiple regression models: a one-unit increase in the value of $x_p$ (or $z_q$) is associated with an average increase in $y$ of $\gamma_{p0}$ (or $\gamma_{0q}$) units. Just as in multiple regression, some of the explanatory variables $x_p$ and $z_q$ may be interaction variables, or non-linear (for example, quadratic) transformations of the original variables.
The first part on the right-hand side of the above equation incorporates the regression coefficients:
$$\gamma_{00} + \sum_{p=1}^{P} \gamma_{p0} x_{pij} + \sum_{q=1}^{Q} \gamma_{0q} z_{qj}.$$
This is called the fixed part of the model, because the coefficients are fixed (i.e. not stochastic). The remaining part:
$$u_{0j} + \varepsilon_{ij},$$
is called the random part of the model. It is again assumed that all residuals, $u_{0j}$ and $\varepsilon_{ij}$, are mutually independent and have zero means conditional on the explanatory variables. A somewhat less crucial assumption is that these residuals are drawn from normally distributed populations. The population variance of the level-one residuals $\varepsilon_{ij}$ is denoted by $\sigma^2_\varepsilon$ while the population variance of the level-two residuals $u_{0j}$ is denoted by $\sigma^2_{u0}$.
5.8 Likelihood
The likelihood associated with the general two-level model considered in the previous section is:
$$L\left(\gamma, \sigma^2_\varepsilon, \sigma^2_{u0} \mid y, x, z\right) = \prod_j \int_{-\infty}^{+\infty} \prod_i g\left(y_{ij} \mid x_{ij}, z_j, u_{0j}\right) f\left(u_{0j}\right)\, du_{0j},$$
where:
$$g\left(y_{ij} \mid x_{ij}, z_j, u_{0j}\right) = \frac{1}{\sqrt{2\pi}\,\sigma_\varepsilon} \exp\left(-\frac{\left(y_{ij} - \mu_{ij}\right)^2}{2\sigma^2_\varepsilon}\right),$$
$$\mu_{ij} = \gamma_{00} + \sum_{p=1}^{P} \gamma_{p0} x_{pij} + \sum_{q=1}^{Q} \gamma_{0q} z_{qj} + u_{0j},$$
and:
$$f\left(u_{0j}\right) = \frac{1}{\sqrt{2\pi}\,\sigma_{u0}} \exp\left(-\frac{u_{0j}^2}{2\sigma^2_{u0}}\right).$$
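This integral is the quantity that SabreR approximates numerically with its 'mass points'. As a sketch of the idea behind standard Gaussian quadrature, the following base R code evaluates the likelihood contribution of a single level-two unit by Gauss-Hermite quadrature. This is an illustration of the method, not SabreR's actual implementation; the node/weight construction uses the standard Golub-Welsch eigenvalue method, and all data and parameter values are made up.

```r
# Sketch: one level-two unit's likelihood contribution via Gauss-Hermite
# quadrature. Illustrative only; not SabreR's implementation.
gauss_hermite <- function(k) {
  # Nodes and weights for the weight function exp(-t^2), via Golub-Welsch
  i <- seq_len(k - 1)
  J <- matrix(0, k, k)
  J[cbind(i, i + 1)] <- J[cbind(i + 1, i)] <- sqrt(i / 2)
  e <- eigen(J, symmetric = TRUE)
  list(nodes = e$values, weights = sqrt(pi) * e$vectors[1, ]^2)
}

loglik_group <- function(y, mu_fixed, sigma_e, sigma_u0, k = 16) {
  gh <- gauss_hermite(k)
  u  <- sqrt(2) * sigma_u0 * gh$nodes   # change of variable u_0j = sqrt(2)*sigma_u0*t
  contrib <- vapply(u, function(uj)
    prod(dnorm(y, mean = mu_fixed + uj, sd = sigma_e)), numeric(1))
  log(sum(gh$weights * contrib) / sqrt(pi))  # approximates the integral over f(u_0j)
}

y <- c(9.8, 11.2, 10.5)                 # made-up responses for one level-two unit
loglik_group(y, mu_fixed = 10, sigma_e = 1, sigma_u0 = 0.5)
```

For the linear model the integral is also available in closed form (a multivariate normal density), which provides a convenient accuracy check on the quadrature approximation.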
5.9 Residuals
In a single-level model, the usual estimate of the single residual term is just the residual:
$$e_{ij} = y_{ij} - \hat{\gamma}_{00} - \hat{\gamma}_{10} x_{ij}.$$
In a multilevel (or mixed) model, however, there are several residuals defined at different levels. In a random intercept model, the level-two residual $u_{0j}$ can be predicted by the posterior means:
$$\hat{u}_{0j} = E\left(u_{0j} \mid y_j, x_j, \theta\right),$$
where $\theta$ are the model parameters. We can show that:
$$\hat{u}_{0j} = \frac{\sigma^2_{u0}}{\sigma^2_{u0} + \sigma^2_\varepsilon / n_j}\,\bar{e}_j,$$
where the $\bar{e}_j$ are averages of the $e_{ij}$ for level-two units $j = 1, \cdots, N$. These residuals have two interpretations. Their basic interpretation is as random variables with a distribution whose parameter values tell us about the variation among the level-two units, and which provide efficient estimates for the fixed coefficients. A second interpretation is as individual estimates for each level-two unit where we use the assumption that they belong to a population of units to predict their values.
When the residuals at higher levels are of interest in their own right,
we need to be able to provide interval estimates and point estimates for
them. For these purposes, we require estimates of the standard errors of
the estimated residuals, where the sample estimate is viewed as a random
realisation from repeated sampling of the same higher-level units whose
unknown true values are of interest.
Note that we can now estimate the level-one residuals simply by using the formula:
$$\hat{\varepsilon}_{ij} = e_{ij} - \hat{u}_{0j}.$$
The level-one residuals are generally not of interest in their own right
but are used rather for model checking, having first been standardized
using the diagnostic standard errors.
5.10 Checking assumptions in mixed models
Residual plots can be used to check model assumptions. There is one
important difference between ordinary regression analysis and multilevel
(or mixed) modelling: in the latter, there is more than one residual.
In fact, we have residuals for each random effect in the mixed model.
Consequently, many different residual plots can be constructed.
Most regression assumptions are concerned with residuals. Recall
from Chapter 2, Section 2.4, that a residual is defined to be the dif-
ference between the observed y and the y predicted by the regression
line. These residuals will be very useful to test whether or not the mixed
model assumptions hold.
As in single-level models, we can use the estimated residuals to help
check the model assumptions. The two particular assumptions that can
be studied readily are the assumption of normality and the assumption
that the variances in the model are constant. Because the variances of the residual estimates depend, in general, on the values of the fixed coefficients, it is common to standardize the residuals by dividing by the appropriate standard errors.
A residual plot against predicted values of the dependent variable,
using the fixed part of the mixed model for the prediction, may be used to
check the assumption of linearity (see Section 2.4). A normal probability
plot may be used to check the normality assumption (see Section 2.4).
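The two checks just described amount to a handful of lines of base R. In this sketch the standardized residuals and fixed-part predictions are simply simulated draws, standing in as placeholders for quantities extracted from a fitted mixed model, so both plots should look well behaved:

```r
# Residual diagnostics sketch. The inputs are simulated placeholders for
# residuals and predictions extracted from a fitted mixed model.
set.seed(42)
fitted_fixed <- rnorm(200, mean = 12, sd = 2)  # placeholder fixed-part predictions
resid_std    <- rnorm(200)                     # placeholder standardized residuals

# Residuals vs fixed-part predictions: check linearity / constant variance
plot(fitted_fixed, resid_std,
     xlab = "Fixed-part prediction", ylab = "Standardized residual")
abline(h = 0, lty = 2)

# Normal probability plot: points near the line support normality
qqnorm(resid_std)
qqline(resid_std)
```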
5.11 Comparing model likelihoods
The principles underlying the formal comparison of nested models were
outlined in Section 2.6. The analysis of deviance, used in the context
of standard linear models in Chapter 2, may be applied in the current
context. We return to the psychological distress example. The null mixed
model was fitted in Section 5.2. We wish to add the occasion effect,
represented by the indicator variable dg2 which takes the value 1 for
occasion 2, and takes value 0 otherwise. The SabreR command needed
to fit this model is:
sabre.model.2 <- sabre(ghq~1+dg2,case=student,
adaptive.quad=TRUE,first.family="gaussian")
This produces the following output:
Log likelihood = -67.041252
on 20 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
___________________________________________________
cons 10.333 1.7227 5.9985
dg2 -0.33333 0.77579 -0.42967
sigma 1.9003 0.38789 4.8990
scale 5.6568 1.2216 4.6307
The change in deviance upon adding dg2 to the null model is
−2*(−67.132857 + 67.041252) = 0.18321. This change in deviance, when
compared to the chi-squared distribution on one degree of freedom, indi-
cates that there is no evidence of a significant difference in mean scores
between the two occasions. The difference in mean scores, for occasion
2 relative to occasion 1, is estimated to be −0.33333, which is not sig-
nificantly different from zero, as shown by the Z-score of −0.42967.
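The deviance comparison reported above can be reproduced directly from the two log likelihoods:

```r
# Likelihood ratio test for adding dg2 to the null mixed model (Section 5.2)
ll_null <- -67.132857      # log likelihood of the null model
ll_dg2  <- -67.041252      # log likelihood after adding dg2
change  <- -2 * (ll_null - ll_dg2)
change                                        # 0.18321
pchisq(change, df = 1, lower.tail = FALSE)    # about 0.67: no occasion effect
```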
5.12 Application of a two-level linear model
Example 5.12.1 Mathematics achievement
We revisit Example 2.7.1, in which we used a standard linear re-
gression to model the outcome yij (mathach), a standardized measure
of mathematics achievement in the dataset hsb.tab. The student-level
(level-one) factors were gender (1: female; 0: male), minority (1: yes;
0: no) and socio-economic status (ses). In the current example, we wish
to add two school-level (level-two) explanatory variables: sector (1:
Catholic; 0: public) and meanses, which is an average of the student
ses values within each school. The two variables ses and meanses have
been centred at the grand mean.
Possible research questions arising from these data might include the
following:
• How much do the high schools vary in their mean mathematics
achievement?
• Do schools with high meanses also have high mathematics achieve-
ment?
• Is the strength of association between student ses and mathach
similar across schools?
• Is ses a more important predictor of mathematics achievement in
some schools than in others?
• How do public and Catholic schools compare in terms of mean
mathach and in terms of the strength of its relationship with ses,
after controlling for meanses?
To obtain some preliminary information about how much variation
in the outcome lies within and between schools, we may fit the simplest
model, the one-way ANOVA model, to the data. The student-level model is:
$$y_{ij} = \beta_{0j} + \varepsilon_{ij},$$
where $y_{ij}$ is mathach, for $i = 1, \cdots, n_j$ students in school $j$, and $j = 1, \cdots, 160$ schools. At the school level (level two), each school's mean mathematics achievement, $\beta_{0j}$, is represented as a function of the grand mean, $\gamma_{00}$, plus a random error, $u_{0j}$. We refer to the variance of $u_{0j}$ as the school-level variance and to the variance of $\varepsilon_{ij}$ as the student-level variance.
The combined model is given by:
$$y_{ij} = \gamma_{00} + u_{0j} + \varepsilon_{ij}.$$
The data can be read into SabreR and this null model can be esti-
mated. The SabreR command required to fit the null model is:
sabre.model.1 <- sabre(mathach~1,case=school,
first.mass=64,first.family="gaussian")
This command results in the output below:
Log likelihood = -23557.905
on 7182 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
__________________________________________________
cons 12.637 0.24359 51.876
sigma 6.2569 0.52794E-01 118.51
scale 2.9246 0.18257 16.019
The estimate of the grand mean, $\gamma_{00}$, is 12.637. This mean can be interpreted as the expected value of the mathematics achievement score for a student drawn at random from a school selected at random. The sigma and scale parameters are estimated to be 6.2569 and 2.9246 respectively. The estimate of the within-school variance component, $(6.2569)^2 = 39.149$, is nearly five times the size of the between-school variance component, $(2.9246)^2 = 8.5533$. These variance component estimates give an intraclass correlation coefficient estimate of $\hat{\rho} = 8.5533/(8.5533 + 39.149) = 0.179$, indicating that about 18% of the variance in mathematics achievement is between schools.
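The intraclass correlation quoted above follows directly from the reported sigma and scale estimates:

```r
# Intraclass correlation from the null model output above
sigma <- 6.2569    # within-school (level-one) standard deviation estimate
scale <- 2.9246    # between-school (level-two) standard deviation estimate
icc   <- scale^2 / (scale^2 + sigma^2)
round(icc, 3)      # 0.179: about 18% of the variance lies between schools
```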
The simplest model, $y_{ij} = \beta_{0j} + \varepsilon_{ij}$, provides a baseline against which we can compare more complex models. We begin with the inclusion of one level-two variable, meanses, which indicates the average ses of children within each school. Each school's mean is now predicted by the meanses of that school:
$$\beta_{0j} = \gamma_{00} + \gamma_{01}\, \text{meanses}_j + u_{0j},$$
where $\gamma_{00}$ is the intercept, $\gamma_{01}$ is the effect of meanses on $\beta_{0j}$ and we assume $u_{0j} \sim N\left(0, \sigma^2_{u0}\right)$. Substituting the level-two equation into the level-one model yields:
$$\text{mathach}_{ij} = \left[\gamma_{00} + \gamma_{01}\, \text{meanses}_j\right] + \left[u_{0j} + \varepsilon_{ij}\right].$$
This model is the sum of two parts: a fixed part and a random part.
The two terms in the first pair of square brackets represent the fixed
part, which is a function of the two regression coefficients: the intercept
and the effect of meanses. The two terms in the second pair of square
brackets represent the random part, consisting of the u0j (which repre-
sents variation between schools) and the εij (which represents variation
within schools). The variance components $\sigma^2_{u0}$ and $\sigma^2_\varepsilon$ now have different meanings. In the model $y_{ij} = \beta_{0j} + \varepsilon_{ij}$, there were no explanatory variables, so $\sigma^2_{u0}$ and $\sigma^2_\varepsilon$ were unconditional components. Once meanses has been added to the model, $\sigma^2_{u0}$ and $\sigma^2_\varepsilon$ become conditional components. The variance $\sigma^2_{u0}$ is a residual or conditional variance, that is, $\operatorname{var}\left(\beta_{0j} \mid \text{meanses}\right)$, the school-level variance in $\beta_{0j}$ after controlling for school meanses.
The SabreR command needed to fit this model is:
sabre.model.2 <- sabre(mathach~1+meanses,case=
school,first.mass=64,first.family="gaussian")
This command results in the following output:
Log likelihood = -23479.554
on 7181 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
_____________________________________________________
cons 12.650 0.14834 85.274
meanses 5.8629 0.35917 16.324
sigma 6.2576 0.52800E-01 118.51
scale 1.6103 0.12314 13.078
The estimated regression equation is given by:
$$\text{mathach}_{ij} = \left[12.650 + 5.8629\, \text{meanses}_j\right] + \left[u_{0j} + \varepsilon_{ij}\right].$$
The coefficient of cons, 12.65, estimates $\gamma_{00}$, the mean mathematics achievement when the remaining predictors (here, just meanses) are 0. The explanatory variable meanses is centred at the grand mean, so $\gamma_{00}$ is the estimated mathach in a school of 'average meanses'. The regression coefficient of meanses, 5.8629, provides our estimate of the other fixed effect, $\gamma_{01}$, and tells us about the relationship between mathematics achievement and meanses. Every unit increase in meanses results in an average increase in mathach of 5.8629.
The conditional component for the within-school variance (the residual component representing $\sigma^2_\varepsilon$) has remained virtually unchanged (going from $(6.2569)^2$ to $(6.2576)^2$). The variance component representing variation between schools, however, has diminished markedly (going from $(2.9246)^2$ to $(1.6103)^2$). This indicates that the predictor meanses explains a large proportion of the between-school variation in mean mathematics achievement.
The estimated $\rho$ is now a conditional intraclass correlation coefficient and measures the degree of dependence among observations within schools after controlling for the effect of meanses. The conditional estimate:
$$\hat{\rho} = (1.6103)^2 / \left((1.6103)^2 + (6.2576)^2\right) = 0.062,$$
is smaller than the unconditional one (0.179).
We can then add the level-one (student-level) explanatory variables ses, minority and gender to the model. At the student level:
$$y_{ij} = \beta_{0j} + \beta_{1j}\, \text{ses}_{ij} + \beta_{2j}\, \text{minority}_{ij} + \beta_{3j}\, \text{gender}_{ij} + \varepsilon_{ij}.$$
At the school level:
$$\beta_{0j} = \gamma_{00} + \gamma_{01}\, \text{meanses}_j + u_{0j},$$
where $u_{0j} \sim N\left(0, \sigma^2_{u0}\right)$, and:
$$\beta_{pj} = \gamma_{p0}, \quad \text{for } p = 1, 2, 3.$$
In the combined form, the model is:
$$y_{ij} = \gamma_{00} + \gamma_{01}\, \text{meanses}_j + \gamma_{10}\, \text{ses}_{ij} + \gamma_{20}\, \text{minority}_{ij} + \gamma_{30}\, \text{gender}_{ij} + u_{0j} + \varepsilon_{ij}.$$
The SabreR command required to fit this model is:
sabre.model.3 <- sabre(mathach~1+meanses+
ses+minority+gender,case=school,
first.mass=64,first.family="gaussian")
This produces the following output:
Log likelihood = -23166.634
on 7178 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
________________________________________________
cons 14.048 0.17491 80.316
meanses 2.8820 0.36521 7.8915
ses 1.9265 0.10844 17.765
minority -2.7282 0.20412 -13.365
gender -1.2185 0.16082 -7.5769
sigma 5.9905 0.50554E-01 118.50
scale 1.5480 0.11885 13.024
We can compare the goodness of fit of this model with that of
the homogeneous model, that is, the same model but without the ran-
dom effect, using the analysis of deviance outlined in Section 2.6. The
log likelihoods for the homogeneous and random effects models are
−23285.328 and −23166.634 on 7179 and 7178 residual degrees
of freedom respectively. The corresponding χ2 improvement is
−2(−23285.328 + 23166.634) = 237.39. When referred to the χ2 distri-
bution on one degree of freedom, this change in deviance is highly signif-
icant, thereby justifying the additional scale parameter in the random
effects model. The estimate of this scale parameter is highly significant
with a value of 1.5480 (standard error 0.11885). This suggests that stu-
dents in the same school have correlated responses.
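As before, the improvement in fit can be checked directly from the two log likelihoods (the p-value from the $\chi^2_1$ reference distribution is effectively zero):

```r
# Deviance comparison: homogeneous model vs random effects model
ll_homog  <- -23285.328
ll_random <- -23166.634
change    <- -2 * (ll_homog - ll_random)
change                                        # 237.388
pchisq(change, df = 1, lower.tail = FALSE)    # effectively zero: highly significant
```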
The estimates of the residual variance $\sigma^2_\varepsilon$ and the random intercept variance $\sigma^2_{u0}$ are lower in the random effects model than those in the null model. This shows that a part of the variability in mathach is explained by including the explanatory variables at both levels. The residual intraclass correlation coefficient is estimated by:
$$\hat{\rho} = \frac{(1.5480)^2}{(1.5480)^2 + (5.9905)^2} = 0.062595.$$
The intraclass correlation coefficient for the null model was 0.179. The residual (or between-student) variation clearly dominates this model. The explanatory variables have accounted for a good deal of the level-two variance. Their parameter estimates generally have larger standard errors in the random effects model than they do in the homogeneous model. Students who are from an ethnic minority do worse than those who are not. Female students seem to perform more poorly than males. The higher the socio-economic status score, the better a student fares.
5.13 Two-level growth models
The multilevel model is very useful for analysing repeated measures, or longitudinal, data. For example, we can have a two-level model, with the measurement occasions as the level-one units and the individuals as the level-two units.
5.13.1 A two-level repeated measures model
In the simplest repeated measures model, there are no explanatory variables except for the measurement occasions, i.e.,
$$y_{ij} = \gamma_{00} + \alpha_i + u_{0j} + \varepsilon_{ij},$$
where $y_{ij}$ denotes the measurement for individual $j$ at time $i$, $u_{0j}$ is a random effect for individual $j$, $\alpha_i$ is the fixed effect of time $i$, and $\varepsilon_{ij}$ is a random error component which is specific to individual $j$ at time $i$. The usual assumptions are that the random effects $u_{0j}$ are independent $N\left(0, \sigma^2_{u0}\right)$, the random errors $\varepsilon_{ij}$ are independent $N\left(0, \sigma^2_\varepsilon\right)$, and the random effects $u_{0j}$ and the random error terms $\varepsilon_{ij}$ are independent. With a constant in the model, the fixed effects $\alpha_i$ are assumed to satisfy the sum-to-zero constraint $\sum_i \alpha_i = 0$. Various structures have been proposed for the relationship between the $\alpha_i$. Since repeated measurements obtained over time are naturally ordered, it may be of interest to characterize trends over time using low-order polynomials. This approach to the analysis of repeated measurements is called growth curve analysis.
5.13.2 A linear growth model
A simple growth model is defined as:
$$y_{ij} = \beta_{0j} + \beta_1 t_{ij} + \varepsilon_{ij},$$
where $t_{ij}$ is the age at time $i$ for individual $j$. Here $\beta_1$ is the growth rate for all individuals $j$ over the data collection period and represents the expected change during a fixed unit of time. The intercept parameter ($\beta_{0j}$) is the expected ability of individual $j$ at $t_{ij} = 0$. The specific meaning of $\beta_{0j}$ depends on the scaling of the age measure. An important feature of this model is the assumption that the intercept parameters vary across individuals. We use a two-level model to represent this variation. Specifically, these parameters are allowed to vary at level two:
$$\beta_{0j} = u_{0j},$$
where the variables $u_{0j}$ are assumed to have a normal distribution with expectation 0 and variance $\sigma^2_{u0}$.
5.13.3 A quadratic growth model
In a quadratic growth model, we include the squared value of time, $t_{ij}^2$. The model at level one is now of the form:
$$y_{ij} = \beta_{0j} + \beta_1 t_{ij} + \beta_2 t_{ij}^2 + \varepsilon_{ij}.$$
We may assume further that there is some meaningful reference value for $t_{ij}$, such as $t_0$. This could refer, for example, to one of the time points, such as the first. The choice of $t_0$ affects only the parameter interpretation, not the fit of the model. At level two, we have:
$$\beta_{0j} = u_{0j}, \qquad \gamma_{p0} = \beta_p.$$
A modification of the above equation is then given by:
$$y_{ij} = \gamma_{10}\left(t_{ij} - t_0\right) + \gamma_{20}\left(t_{ij} - t_0\right)^2 + u_{0j} + \varepsilon_{ij}$$
or, equivalently,
$$y_{ij} = \sum_p \gamma_{p0} x_{pij} + u_{0j} + \varepsilon_{ij}.$$
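The reparameterization above amounts to constructing two derived explanatory variables, $x_{1ij} = t_{ij} - t_0$ and $x_{2ij} = (t_{ij} - t_0)^2$. A trivial base R sketch with made-up measurement times:

```r
# Constructing the quadratic growth design variables (made-up times)
tij <- c(0, 1, 2, 3)      # measurement times for one individual
t0  <- tij[1]             # reference value: the first time point
x1  <- tij - t0           # linear term, t_ij - t0
x2  <- (tij - t0)^2       # quadratic term, (t_ij - t0)^2
cbind(x1, x2)
```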
5.14 Likelihood
The likelihood associated with the two-level growth models discussed in the previous section is:
$$L\left(\gamma, \sigma^2_\varepsilon, \sigma^2_{u0} \mid y, x\right) = \prod_j \int_{-\infty}^{+\infty} \prod_i g\left(y_{ij} \mid x_{ij}, u_{0j}\right) f\left(u_{0j}\right)\, du_{0j},$$
where:
$$g\left(y_{ij} \mid x_{ij}, u_{0j}\right) = \frac{1}{\sqrt{2\pi}\,\sigma_\varepsilon} \exp\left(-\frac{\left(y_{ij} - \mu_{ij}\right)^2}{2\sigma^2_\varepsilon}\right),$$
$$\mu_{ij} = \sum_p \gamma_{p0} x_{pij} + u_{0j},$$
and:
$$f\left(u_{0j}\right) = \frac{1}{\sqrt{2\pi}\,\sigma_{u0}} \exp\left(-\frac{u_{0j}^2}{2\sigma^2_{u0}}\right).$$
5.15 Example using linear growth models
Example 5.15 Student evaluation of teachers
Snijders and Bosker [98] analyzed the development over time of
teacher evaluations by classes of students. Starting from the first year of
their career, 51 teachers were evaluated on their interpersonal behaviour
in the classroom. This happened repeatedly, at intervals of about one
year. In this example, results are presented about the ‘proximity’ di-
mension, representing the degree of cooperation or closeness between a
teacher and his or her students. The higher the proximity score of a
teacher, the more cooperation is perceived by his or her students. There
are four measurement occasions: after 0, 1, 2 and 3 years of experience.
Non-response at various times is treated as ignorable.
The first model to be estimated has the same population mean for the four measurement occasions:
$$y_{ij} = \gamma_{00} + u_{0j} + \varepsilon_{ij}.$$
The second model we estimate allows the means to vary freely over time:
$$y_{ij} = \alpha_1 d_{1i} + \alpha_2 d_{2i} + \alpha_3 d_{3i} + \alpha_4 d_{4i} + u_{0j} + \varepsilon_{ij}.$$
This model does not have the constant term $\gamma_{00}$. For both models, we use standard Gaussian quadrature with 64 mass points and supply starting values for the random intercept variances. The SabreR command required to
fit the first model is:
sabre.model.1 <- sabre(proximity~1,case=teacher,
first.mass=64,first.family="gaussian")
This leads to the following output:
Log likelihood = -61.688017
on 150 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
___________________________________________________
cons 0.64795 0.53346E-01 12.146
sigma 0.27155 0.19025E-01 14.273
scale 0.34388 0.42213E-01 8.1461
The SabreR command needed to fit the second model is:
sabre.model.2 <- sabre(proximity~d1+d2+d3+d4-1,
case=teacher,first.mass=64,
first.family="gaussian")
This produces the following output:
Log likelihood = -59.146451
on 147 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
___________________________________________________
d1 0.58508 0.62624E-01 9.3427
d2 0.71760 0.66132E-01 10.851
d3 0.67158 0.66631E-01 10.079
d4 0.63893 0.69505E-01 9.1926
sigma 0.26500 0.18573E-01 14.268
scale 0.34532 0.41967E-01 8.2284
The four occasion-specific means estimated in the second model suggest an increase in proximity score from time 0 to time 1 and then a decrease. However, the likelihood ratio test for the difference between the two random effects models is not significant: the change in deviance is $-2(-61.68802 + 59.14645) = 5.0831$ which, on three degrees of freedom, results in a p-value of 0.1658. Perhaps a model which uses a quadratic in time would be more parsimonious. The results also suggest that individual (level-two) variation is more important than differences between occasions (level-one variation). The estimates $\hat{\sigma}^2_\varepsilon = (0.26500)^2 = 0.070$ and $\hat{\sigma}^2_{u0} = (0.34532)^2 = 0.119$ indicate that the overall variation in proximity scores is $0.119 + 0.070 = 0.189$ and the intraclass correlation coefficient is $\hat{\rho} = 0.119/0.189 = 0.629$.
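The likelihood ratio test reported above compares four free occasion means against a single common mean (three degrees of freedom), and is easily reproduced:

```r
# Likelihood ratio test: four free occasion means vs a single common mean
ll_common <- -61.688017
ll_free   <- -59.146451
change    <- -2 * (ll_common - ll_free)
change                                        # 5.0831
pchisq(change, df = 3, lower.tail = FALSE)    # 0.1658: not significant
```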
5.16 Exercises using mixed models for continu-
ous/interval scale data
Exercise 5.16.1. Psychological distress
We revisit Exercise 2.8.1, in which standard linear models were used
to analyze data (ghq2.tab) on the psychological distress of students. In
the current exercise, you are asked to replicate the results presented in
Sections 5.2 and 5.11:
1. Use SabreR to fit the null linear mixed model on the psychological
distress score (ghq), that is, the model which includes a constant
term and the student-specific random effect. Use adaptive quadra-
ture with 12 mass points. Obtain the log likelihood, parameter esti-
mates and standard errors. Interpret the parameter estimates. Are
the student-specific random effects significant? What does this sig-
nificance mean? What impact do the student random effects have
on the model?
2. Add the occasion 2 identifier (dg2) to this model and re-fit the
model. Obtain the log likelihood, parameter estimates and stan-
dard errors. Compare the deviances (−2 times log likelihoods) and
use the Z-score to test whether the effect of dg2 is significantly
different from zero. Interpret the results.
Exercise 5.16.2. Essay grading (continuous response)
We revisit Exercise 2.8.2, in which standard linear models were ap-
plied to data (grader1.tab) on the grading of essays by two markers.
1. Use SabreR to fit the null mixed model on grade, that is, the
model which includes a constant term and essay-specific random
effect, using 20 mass points: mass 20. Obtain the log likelihood,
parameter estimate and standard error. Are the essay effects signif-
icant? What impact do they have on the model? Try using adaptive
quadrature to see if fewer mass points are sufficient.
2. Add dg4 to this model and re-fit the model. Use adaptive quadra-
ture with an increasing number of mass points until likelihood
convergence occurs. Obtain the log likelihood, parameter estimates
and standard errors. Compare the deviances (−2 times log likeli-
hoods) and use the Z-scores to test whether the effect of dg4 is
significantly different from zero. Interpret the results.
Exercise 5.16.3. Educational attainment
We return to Exercise 2.8.3, in which standard linear models were
applied to data (neighbourhood.tab) on the educational attainment of
young people:
1. Use SabreR to fit the null linear mixed model on attainment
(attain), that is, the model which includes the constant term and
the school-specific random effect (schid). Obtain the log likeli-
hood, parameter estimate and standard error. Interpret the pa-
rameter estimate.
2. Add the student-specific explanatory variables. Increase the num-
ber of mass points until the likelihood converges. How does the
magnitude of the school random effect change?
3. Add the neighbourhood effect (deprive). Check the number of
mass points required. Obtain the log likelihood, parameter esti-
mates and standard errors. How does the magnitude of the school-
specific random effect change?
4. A dataset sorted by the neighbourhood identifier (neighid)
is available on the Sabre web page. This dataset is called
neighbourhood2.tab. Re-fit the constant-only model, allowing for
the neighbourhood-specific random effect (neighid). Use adaptive
quadrature with 12 mass points. Is there a significant neighid
random effect?
5. Add the student-specific effects. How does the magnitude of the
neighid random effect change?
6. Add the observed neighbourhood effect deprive to the model. How
does the magnitude of the neighid random effect change?
7. What do the results of using either the schid or the neighid ran-
dom effects tell you about what effects are needed in the modelling
of attainment with this dataset? What do the two sets of results
suggest?
Exercise 5.16.4. Unemployment claims
We return to Exercise 2.8.4, in which standard linear models were
applied to the dataset ezunem2.tab in order to test whether the log
number of unemployment claims varied significantly between districts in
an enterprise zone and those not in an enterprise zone:
1. Use SabreR to fit the null linear mixed model on the log number
of unemployment claims (luclms), that is, the model which in-
cludes the constant term and the city-specific random effect. Use
adaptive quadrature with 12 mass points. Obtain the log likeli-
hood, parameter estimates and standard errors. Is this random
effect significant?
2. Add the binary ez effect to this model and re-fit the model. Ob-
tain the log likelihood, parameter estimates and standard errors.
How does the magnitude of the scale parameter estimate for the
city-specific random effect change? Is the enterprise zone effect sig-
nificant in this model?
3. Add the linear time effect (t) to this model and re-fit the model.
How does the magnitude of the city-specific random effect change?
Interpret your preferred model. Does ez have a significant effect
on the response, log number of unemployment claims?
Exercise 5.16.5. Wage determinants
We revisit Exercise 2.8.5, in which standard linear models were used
to analyze data (wagepan.tab) on the wages of young males:
1. Use SabreR to fit the null linear mixed model on log hourly wage
(lwage), that is, the model which includes the respondent-specific
random effect (nr). Use adaptive quadrature with 12 mass points.
Obtain the log likelihood, parameter estimates and standard er-
rors. Interpret the parameter estimates. Is this random effect sig-
nificant?
2. Add the respondent-specific explanatory variables and year to this
model and re-fit the model. How does the magnitude of the scale
parameter for the random effects change?
3. Add the interaction effects between year and education to the pre-
vious model. Do the effects of education vary with year? What do
the results show?
Exercise 5.16.6. Pupil rating of school managers
We return to Exercise 2.8.6, in which standard linear models were
applied to data (manager.tab) on how pupils rated their school man-
agers/directors:
1. Use SabreR to fit the null linear mixed model on item responses
(scores), that is, the model which includes the pupil-specific ran-
dom effect (id). Use adaptive quadrature with 12 mass points. Ob-
tain the log likelihood, parameter estimates and standard errors.
Interpret the parameter estimates. Is this random effect signifi-
cant?
2. Add the pupil-specific explanatory variable, pupil gender (pupsex),
to this model and re-fit the model. Obtain the log likelihood, pa-
rameter estimates and standard errors. Compare the deviances (−2
times log likelihoods) and use the Z-score to test whether the effect
of pupsex is significantly different from zero. Interpret the results.
Is the random effect significant?
6
Mixed models for binary data
6.1 Introduction
In this chapter, we return to the analysis of binary data. The standard
models for binary data, such as the logit and probit models, were in-
troduced in Chapter 3, Section 3.1. In the current chapter, we discuss
how these models may be extended to handle hierarchical or multi-level
binary data.
6.2 The two-level logistic model
We start by introducing a simple two-level model to illustrate the anal-
ysis of two-level binary data. Let j denote the level-two units (clusters)
and i denote the level-one units (nested observations). Assume that there
are j = 1, 2, · · · , m level-two units and i = 1, 2, · · · , nj level-one units
nested within each level-two unit j. The total number of level-one
observations across level-two units is given by n = Σ_{j=1}^{m} n_j.
For a multi-level representation of a simple model with only one
explanatory variable x_ij, the level-one model is written in terms of the
latent response variable y*_ij as:

y*_ij = β_0j + β_1j x_ij + ε_ij,

and the level-two model becomes:

β_0j = γ_00 + u_0j,
β_1j = γ_10.
In practice, y*_ij is unobservable; it is measured indirectly by an
observable binary variable y_ij defined by:

y_ij = 1 if y*_ij > 0,
y_ij = 0 otherwise,
such that:

Pr(y_ij = 1 | x_ij, u_0j) = Pr(y*_ij > 0 | u_0j)
  = Pr(γ_00 + γ_10 x_ij + u_0j + ε_ij > 0 | u_0j)
  = Pr(ε_ij > −{γ_00 + γ_10 x_ij + u_0j} | u_0j)
  = ∫_{−{γ_00 + γ_10 x_ij + u_0j}}^{∞} f(ε_ij | u_0j) dε_ij
  = 1 − F(−{γ_00 + γ_10 x_ij + u_0j})
  = µ_ij.
For distributions f(ε_ij | u_0j) that are symmetric about zero, such as
the normal or logistic, we have:

1 − F(−{γ_00 + γ_10 x_ij + u_0j}) = F(γ_00 + γ_10 x_ij + u_0j),

where F(·) is the cumulative distribution function of ε_ij.
As in Chapter 3, Sub-section 3.1.2, we view the observed values yij
as a realisation of a random variable Yij that can take the values one
and zero with probabilities µij and 1 − µij respectively. The distribution
of yij is Bernoulli with parameter µij :
g(y_ij | x_ij, u_0j) = µ_ij^{y_ij} (1 − µ_ij)^{1−y_ij}, y_ij = 0, 1.
To proceed, we need to impose an assumption about the distributions
of u_0j and ε_ij. As in the case of the mixed model for continuous/interval
scale data (see Chapter 5, Section 5.6), we assume that u_0j is distributed
as N(0, σ²_u0). Then, if the cumulative distribution of ε_ij is assumed to
be logistic, we have the multi-level (or mixed) logit model. If we assume
that ε_ij ∼ N(0, 1), we have the mixed probit model.
We complete the specification of the mixed logit model by expressing
the functional form for µ_ij in the following manner:

µ_ij = exp(γ_00 + γ_10 x_ij + u_0j) / (1 + exp(γ_00 + γ_10 x_ij + u_0j)).
The mixed probit model is based upon the assumption that the
disturbances ε_ij are independent standard normal variates, such that:

µ_ij = Φ(γ_00 + γ_10 x_ij + u_0j),

where Φ(·) denotes the cumulative distribution function for a standard
normal variable.
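The two response functions can be checked numerically. A minimal sketch (shown in Python for self-containedness; the parameter values γ_00 = −1.0, γ_10 = 0.5 and the values of u_0j are invented for illustration, not estimates from any model in this book):

```python
from math import exp, erf, sqrt

def mu_logit(x, u0j, g00=-1.0, g10=0.5):
    """Mixed logit response probability: mu_ij = exp(eta)/(1 + exp(eta))."""
    eta = g00 + g10 * x + u0j
    return exp(eta) / (1.0 + exp(eta))

def mu_probit(x, u0j, g00=-1.0, g10=0.5):
    """Mixed probit response probability: mu_ij = Phi(eta),
    with the standard normal CDF Phi written via erf."""
    eta = g00 + g10 * x + u0j
    return 0.5 * (1.0 + erf(eta / sqrt(2.0)))

# A cluster with a positive random intercept has a higher response
# probability than one with a negative random intercept at the same x.
print(mu_logit(1.0, 0.8), mu_logit(1.0, -0.8))
print(mu_probit(1.0, 0.8), mu_probit(1.0, -0.8))
```

Both functions pass the same linear predictor through different CDFs; the resulting probabilities differ slightly because the logistic distribution has heavier tails than the normal.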
6.3 General two-level logistic models
Suppose the observed binary responses are binomially distributed,
such that y_ij ∼ bin(1, µ_ij), with conditional variance var(y_ij | µ_ij) =
µ_ij(1 − µ_ij). The multilevel logistic regression model, or mixed logit
model, with P level-one explanatory variables x_1, ..., x_P and Q level-two
explanatory variables z_1, ..., z_Q has the following form:

logit(µ_ij) = γ_00 + Σ_{p=1}^{P} γ_p0 x_pij + Σ_{q=1}^{Q} γ_0q z_qj + u_0j,

where it is assumed that u_0j has a normal distribution with zero mean
and variance σ²_u0.
6.4 Intraclass correlation coefficient
For binary data, the intraclass correlation coefficient is often expressed
in terms of the correlation between the latent responses y*. The logistic
distribution for the level-one residual, ε_ij, implies a variance of π²/3 =
3.29. This means that, for a two-level random intercept logit model with
an intercept variance of σ²_u0, the intraclass correlation coefficient is:

ρ = σ²_u0 / (σ²_u0 + π²/3).
For the probit model, we assume that ε_ij ∼ N(0, 1), and the level-one
residual variance of the unobservable variable y* is fixed at 1 [96]. Thus,
for a two-level random intercept probit model, this type of intraclass
correlation coefficient becomes:

ρ = σ²_u0 / (σ²_u0 + 1).
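Both expressions are simple ratios and are easily computed. A small sketch (in Python; the intercept variance 1.7 is an invented example value, not an estimate from the text):

```python
from math import pi

def icc_logit(var_u0):
    """Latent-scale intraclass correlation under the logit link:
    the level-one residual variance is fixed at pi^2/3."""
    return var_u0 / (var_u0 + pi ** 2 / 3)

def icc_probit(var_u0):
    """Latent-scale intraclass correlation under the probit link:
    the level-one residual variance is fixed at 1."""
    return var_u0 / (var_u0 + 1.0)

# The same intercept variance implies different ICCs under the two
# links, because the fixed residual variances differ (3.29 versus 1).
print(icc_logit(1.7), icc_probit(1.7))
```

This also illustrates why σ²_u0 cannot be compared directly between logit and probit fits: the latent residual variances to which it is relative are different.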
6.5 Likelihood
The likelihood associated with the mixed models for binary data
considered in this chapter is:

L(γ, σ²_u0 | y, x, z) = Π_j ∫_{−∞}^{+∞} [ Π_i g(y_ij | x_ij, z_j, u_0j) ] f(u_0j) du_0j,

where:

g(y_ij | x_ij, z_j, u_0j) = µ_ij^{y_ij} (1 − µ_ij)^{1−y_ij},

µ_ij = 1 − F(−{γ_00 + Σ_{p=1}^{P} γ_p0 x_pij + Σ_{q=1}^{Q} γ_0q z_qj + u_0j}),

and:

f(u_0j) = (1 / (√(2π) σ_u0)) exp(−u²_0j / (2σ²_u0)).
SabreR evaluates the integral L(γ, σ²_u0 | y, x, z) for the binary response
model using standard Gaussian quadrature or adaptive Gaussian
quadrature (numerical integration). There is no analytic solution for
this integral with normally distributed u_0j. For further discussion on
binary response models with random intercepts, see [65, 90, 106].
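The quadrature idea can be sketched for a single cluster: the integral over u_0j is replaced by a weighted sum over mass points. A minimal illustration (in Python, using numpy's Gauss–Hermite rule; the responses and linear predictors below are invented, and SabreR's actual implementation differs, e.g. in its adaptive rescaling of the mass points):

```python
import numpy as np

def cluster_likelihood(y, eta_fixed, sigma_u, n_points=12):
    """Approximate the integral over u0j ~ N(0, sigma_u^2) of the
    product of Bernoulli(logit) terms for one cluster.
    Substituting u = sqrt(2)*sigma_u*t turns the integral into
    sum_k (w_k / sqrt(pi)) * prod_i g(y_i | u_k)."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)
    u = np.sqrt(2.0) * sigma_u * nodes        # mass points on the u scale
    total = 0.0
    for u_k, w_k in zip(u, weights):
        mu = 1.0 / (1.0 + np.exp(-(eta_fixed + u_k)))  # logit response probs
        total += (w_k / np.sqrt(np.pi)) * np.prod(mu ** y * (1 - mu) ** (1 - y))
    return total

y = np.array([1, 0, 1, 1])              # binary responses in one cluster
eta = np.array([0.2, -0.4, 0.1, 0.5])   # fixed part of the linear predictor
print(cluster_likelihood(y, eta, sigma_u=1.0))
```

The full likelihood is the product of such cluster terms, and the log likelihood is then maximised over the fixed effects and σ_u0.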
6.6 Example using binary data
Example 6.6.1 Repeating a grade
In Example 3.1.6, we used a standard logistic regression model to an-
alyze data (thaieduc1.tab) on whether or not children had to repeat a
grade during their time at primary school. A second version of these data
(thaieduc2.tab) contains the same set of variables as thaieduc1.tab
with the addition of one further variable, a school-level variable msesc:
mean socio-economic status score. The second dataset (thaieduc2.tab)
has fewer cases (7,516) than the first (thaieduc1.tab) because of miss-
ing values on the additional school-level covariate, msesc. We model this
second version of the data.
As in Example 3.1.6, we take repeat to be the binary response vari-
able, the indicator of whether a child has ever repeated a grade (0 = no,
1 = yes). The level-one explanatory variables are sex (0 = girl, 1 = boy)
and child pre-primary education pped (0 = no, 1 = yes). The probabil-
ity that a child will repeat a grade during the primary years, µij , is of
interest.
We extend the analysis of Example 3.1.6 by incorporating a school-
specific random effect into the logit modelling framework. First, we es-
timate a two-level model which includes only a constant term and the
school-specific random effect:
logit(µ_ij) = γ_00 + u_0j,

where u_0j ∼ N(0, σ²_u0). This will allow us to determine the magnitude
of variation between schools in grade repetition.
The SabreR command required to fit this model is:
sabre.model.1 <- sabre(repeat~1,case=schoolid)
This results in the following output:
Log likelihood = -2770.8326
on 7514 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
______________________________________________________
cons -2.1578 0.98557E-01 -21.894
scale 1.3151 0.11272 11.667
For this model, the estimated average log odds of repetition across
primary schools, γ_00, is −2.1578, and the variance between schools in
terms of their log odds of repetition, σ²_u0, is (1.3151)² = 1.7295.
The estimate of the intraclass correlation coefficient is given by:

ρ̂ = 1.7295 / (1.7295 + π²/3) = 0.3446.
Then we estimate a two-level model which includes the school-level vari-
able msesc and the child-level variables sex and pped:
logit µij = γ 00 + γ 10 msescj + γ 20 sexij + γ 30 ppedij + u0j .
The SabreR command needed to fit this model is:
sabre.model.2 <- sabre(repeat~msesc+sex+pped+1,
case=schoolid)
This produces the following output:
Log likelihood = -2720.7581
on 7511 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
_____________________________________________________
cons -2.2280 0.10461 -21.298
msesc -0.41369 0.22463 -1.8417
sex 0.53177 0.75805E-01 7.0150
pped -0.64022 0.98885E-01 -6.4744
scale 1.3026 0.72601E-01 17.941
The variance between schools in thaieduc2.tab for the logit model
with msesc, sex and pped, σ²_u0, is (1.3026)² = 1.6968, which is highly
significant. The estimate of the residual intraclass correlation coefficient
is:

ρ̂ = 1.6968 / (1.6968 + π²/3) = 0.34027.
The addition of msesc to the model has contributed towards a modest
reduction in the amount of between-school variation left unexplained, as
a proportion of the overall variation in the response.
As sex is a dummy variable indicating whether the pupil is a girl or
a boy, it can be helpful to write down a pair of fitted models, one for
each gender. By substituting the values 1 for a boy and 0 for a girl in
sex, we get the boys’ constant term −2.2280 + 0.53177 = −1.6962, and
we can write:
logit(µ_ij; girl) = −2.2280 − 0.41369 msesc_j − 0.64022 pped_ij + u_0j,
logit(µ_ij; boy) = −1.6962 − 0.41369 msesc_j − 0.64022 pped_ij + u_0j.
The intercepts in these two models are quite different: the girls’ inter-
cept is lower than that of the boys. The regression coefficient for sex is
positive and significantly different from zero. These results indicate that,
on average, the boys are significantly more likely to repeat a grade than
the girls.
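These comparisons can also be put on the probability scale by inverting the logit. A small sketch (in Python) using the fitted constants above, evaluated at msesc_j = 0, pped_ij = 0 and u_0j = 0, i.e. a child with no pre-primary education in an average school:

```python
from math import exp

def inv_logit(eta):
    """Convert a log odds value to a probability."""
    return exp(eta) / (1.0 + exp(eta))

girl_logit = -2.2280             # fitted constant term for girls
boy_logit = -2.2280 + 0.53177    # girls' constant plus the sex effect

print(inv_logit(girl_logit))     # girls' probability of repeating a grade
print(inv_logit(boy_logit))      # boys' probability of repeating a grade
print(exp(0.53177))              # odds ratio, boys versus girls
```

Holding the other covariates and the random effect fixed, boys' odds of repeating a grade are about 1.7 times the girls' odds.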
6.7 Exercises using mixed models for binary data
Exercise 6.7.1. Essay grading (binary response)
In Exercise 3.4.1, standard logistic regression models were applied
to data (essays2.tab) on the grading of essays by four markers. In
the current exercise, we extend that analysis by incorporating an essay-
specific random effect:
1. Use SabreR to fit a null mixed logit model of pass, with essay
as the random effect. Obtain the log likelihood, parameter esti-
mates and standard errors. Interpret the parameter estimates. Is
the essay random effect significant? How many adaptive quadra-
ture points should we use to estimate this model?
2. Add the four grader dummy variables to the model. Obtain the
log likelihood, parameter estimates and standard errors. Compare
the deviances (−2 times log likelihoods) and inspect the Z-scores.
What are the differences between the graders?
3. Add the six essay characteristics (wordlength, sqrtwords,
sentlength, prepos, commas, errors) to the model. Obtain the
log likelihood, parameter estimates and standard errors. Compare
the deviances (−2 times log likelihoods) and inspect the Z-scores.
Which of the essay characteristics are significant? How has includ-
ing the essay characteristics improved the model?
4. Create interaction effects between the grader-specific dummy vari-
ables and the explanatory variable sqrtwords. Add these effects
to the model. What do these results tell you?
Exercise 6.7.2. Trade union membership
In Exercise 3.4.2, standard logistic regression models were applied
to data (wagepan.tab) to examine the relationship between trade union
membership and a range of explanatory variables. In the current exercise,
we extend that analysis by incorporating an individual-specific random
effect:
1. Use SabreR to fit a null mixed logit model of union, with adaptive
quadrature. Treat respondent identifier (nr) as the random effect.
Obtain the log likelihood, parameter estimates and standard er-
rors. Interpret the parameter estimates. Is the random effect sig-
nificant? How many quadrature points should we use to estimate
this model?
2. Add the explanatory variables black, hisp, exper, educ,
poorhlth and married to the model. How does the magnitude of
the nr random effect change? Are any of these individual-specific
characteristics significant in this model? Do the results make sense
intuitively?
3. Add the contextual explanatory variables rur, nrthcen, nrtheast
and south to the model. How does the magnitude of the scale
parameter change? Are any of the contextual explanatory vari-
ables significant in this model? Do these new results make sense
intuitively?
4. Add the dummy variables for year (d81 to d87) to the model. Are
any of the year indicator variables significant in this model? Do
the new results make sense intuitively?
5. Create interaction variables between rur and nrthcen, nrtheast
and south. Add these interaction effects to the model. Are any of
these new effects significant?
6. How can the final model be simplified? Interpret your preferred
model.
Exercise 6.7.3. Tower of London
In Exercise 3.4.3, standard logistic regression models were applied
to data (tower1.tab) to test whether the odds of completing the Tower
of London task in the minimum number of moves varied significantly
between different levels of task difficulty. In the current exercise, we ex-
tend that analysis by incorporating an individual-specific random effect
and by comparing the three groups of participants: schizophrenics, their
relatives and the controls:
1. Use SabreR to fit a null mixed logit model of dtlm, with participant
identifier (id) as the random effect. Use adaptive quadrature with
12 mass points. Obtain the log likelihood, parameter estimates and
standard errors. Interpret the parameter estimates. Is the random
effect significant?
2. Add the covariate level to the model. Obtain the log likelihood,
parameter estimates and standard errors. Compare the deviances
(−2 times log likelihoods) and use the Z-score to test whether there
is a significant effect of level of difficulty. Interpret the results.
3. Create indicator variables for group=2 (relatives) and for group=3
(schizophrenics). Add these variables to the model. Compare the
deviances (−2 times log-likelihoods) and use the Z-scores to test
whether there is a significant difference in performance between the
schizophrenics and their relatives when compared to the control
group of participants.
Exercise 6.7.4. Immunization of Guatemalan children
In Exercise 3.4.4, standard logistic regression models were applied to
data (guatemala immun.tab) to examine the relationship between the
odds of a child being immunized and a set of child-specific explanatory
variables. In the current exercise, we extend that analysis by incorpo-
rating a family-specific random effect and by adding a range of family-
specific factors:
1. Use SabreR to fit a null binary logistic mixed model of immun.
Allow for the family random effect (mom) and use adaptive quadra-
ture with 24 mass points. Obtain the log likelihood, parameter
estimates and standard errors. Interpret the parameter estimates.
Is this random effect significant?
2. Add the child-specific explanatory variables: age (kid2p) and birth
order (order23, order46 and order7p) to the model. Obtain the
log likelihood, parameter estimates and standard errors. Compare
the deviances (−2 times log likelihoods) and use the Z-scores to
test whether there is a significant effect of age and birth order.
Interpret the results.
3. Add the family-specific factors: mom25p, indnospa, indspa,
momedpri, momedsec, husedpri, husedsec, huseddk, momwork,
rural and pcind81 to the model. Obtain the log likelihood, pa-
rameter estimates and standard errors. Compare the deviances (−2
times log likelihoods) and inspect the Z-scores. Interpret the re-
sults. Which of the family-specific factors are significant? Is the
family-specific random effect significant?
7
Mixed models for ordinal data
7.1 Introduction
In this chapter, we revisit the analysis of ordinal categorical data. The
standard models for ordinal data, such as the ordered logit and probit
models, were introduced in Chapter 3, Section 3.2. In the current chap-
ter, we discuss how these models may be extended to handle hierarchical
or multi-level data.
7.2 The two-level ordered logit model
Recall from Sub-section 3.2.2 that, in the absence of explanatory vari-
ables and random intercepts, the response variable yij takes value c with
probability:
pij(c) = P r(yij = c),
for c = 1, · · · , C. The corresponding cumulative response probabilities
for the C categories of the ordinal outcome yij are defined as:
P_ij(c) = Pr(y_ij ≤ c) = Σ_{k=1}^{c} p_ij(k), c = 1, ..., C.
The cumulative probability for the last category C is 1; in other
words, Pij(C) = 1. Only the first (C − 1) cumulative probabilities Pij(c)
need to be estimated. In the presence of explanatory variables and
random intercepts, the level-one model becomes:

log[ Pr(y_ij ≤ c | x_ij, β_0j) / (1 − Pr(y_ij ≤ c | x_ij, β_0j)) ] = γ_c − (β_0j + Σ_{p=1}^{P} β_p x_pij),

where γ_c is the threshold parameter for category c = 1, ..., C − 1.
The level-two model takes the usual form:
β_0j = γ_00 + Σ_{q=1}^{Q} γ_0q z_qj + u_0j,
where the random effects u0j are normally distributed.
Note that the model which includes the intercept parameter γ 00 and
the threshold γ 1 is not identifiable. Let us consider a simple intercept
model with no explanatory variables. For the first category, we have:
log[ Pr(y_ij ≤ 1 | u_0j) / (1 − Pr(y_ij ≤ 1 | u_0j)) ] = γ_1 − (γ_00 + u_0j).
From this equation, it is apparent that parameters γ 1 and γ 00 cannot
be estimated separately and therefore these parameters are not identi-
fiable. In order to achieve identifiability, either the first threshold γ 1 or
the intercept γ 00 may be fixed at zero. The SabreR specification sets
γ 00 = 0.
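The identification problem can be verified numerically: adding the same constant to a threshold and to the intercept leaves every cumulative probability unchanged. A minimal sketch (in Python; the parameter values are invented):

```python
from math import exp

def cum_prob(gamma_c, beta_0j):
    """Pr(y_ij <= c) in the ordered logit: inverse logit of gamma_c - beta_0j."""
    eta = gamma_c - beta_0j
    return 1.0 / (1.0 + exp(-eta))

# Shifting the threshold and the intercept by the same constant d
# changes nothing observable, so gamma_1 and gamma_00 cannot be
# estimated separately; SabreR resolves this by setting gamma_00 = 0.
d = 0.7
p_original = cum_prob(0.3, 0.1)
p_shifted = cum_prob(0.3 + d, 0.1 + d)
print(p_original, p_shifted)
```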
7.3 Likelihood
The likelihood associated with the two-level ordered logit model is:

L(γ, σ²_u0 | y, x, z) = Π_j ∫_{−∞}^{+∞} [ Π_i g(y_ij | x_ij, z_j, u_0j) ] f(u_0j) du_0j,

where:

g(y_ij | x_ij, z_j, u_0j) = Π_c Pr(y_ij = c)^{y_ijc} = Π_c (P_ij(c) − P_ij(c−1))^{y_ijc},

and y_ijc = 1 if y_ij = c, and y_ijc = 0 otherwise,

P_ij(c) = Pr(ε_ij ≤ γ_c − {γ_00 + Σ_{p=1}^{P} γ_p0 x_pij + Σ_{q=1}^{Q} γ_0q z_qj + u_0j})
       = F(γ_c − {γ_00 + Σ_{p=1}^{P} γ_p0 x_pij + Σ_{q=1}^{Q} γ_0q z_qj + u_0j}),

where F(·) is the cumulative distribution function of ε_ij, and:

f(u_0j) = (1 / (√(2π) σ_u0)) exp(−u²_0j / (2σ²_u0)).
SabreR evaluates the integral L(γ, σ²_u0 | y, x, z) for the ordered response
model using standard Gaussian quadrature or adaptive Gaussian
quadrature (numerical integration). There is no analytical solution for
this integral with normally distributed u_0j.
For further discussion on ordered response models with random in-
tercepts, see [90, 106].
7.4 Example using mixed models for ordered data
Example 7.4.1 Choosing teaching as a profession
We revisit Example 3.2.5, in which we analyzed data (teacher2.tab)
on teachers’ responses to the following question: ‘If you could go back
to college and start all over again, would you again choose teaching as
a profession?’ The response variable, teacher commitment (tcommit),
comprises the three ordered categories: 1 = yes; 2 = notsure; 3 = no.
In Example 3.2.5, we used a standard ordered response model to re-
late the response tcommit to a single teacher-level explanatory variable,
the teachers’ perception of task variety (taskvar). This variable assesses
the extent to which teachers followed the same teaching routines each
day, performed the same tasks each day, had something new happening
in their job each day and liked the variety present in their work.
In the current example, we wish to add a second explanatory variable
to the model. This is a school-level factor which is a measure of teacher
control (tcontrol). This variable was constructed by aggregating nine
item scale scores of teachers within a school. It indicates teacher control
over school policy issues such as student behaviour codes, content of
in-service programmes, student grouping, school curriculum and text
selection; and control over classroom issues such as teaching content and
techniques, and amount of homework assigned.
We also wish to incorporate a school-specific random effect into the
modelling framework by using the school identifier variable, schlid.
The response variable tcommit takes values k = 1, 2, 3. In the absence
of explanatory variables and random effects, these values occur with
probabilities:
pij(1) = P r(yij = 1) = P r(‘Yes’),
pij(2) = P r(yij = 2) = P r(‘Not sure’),
pij(3) = P r(yij = 3) = P r(‘No’).
To assess the magnitude of variation among schools in the absence of
explanatory variables, we specify a simple two-level model. This model
has only the threshold parameters as fixed effects, together with the
school-specific intercepts:

log[ Pr(y_ij ≤ c | β_0j) / Pr(y_ij > c | β_0j) ] = γ_c − β_0j, c = 1, 2.
The two-level model is:
β 0j = γ 00 + u0j ,
which is identifiable when the parameter γ 00 is set to zero. This reduces
the two-level model to β 0j = u0j . We regard the school-specific intercepts
u_0j as random effects with variance σ²_u0.
The SabreR command we need to fit the null model is:
sabre.model.1 <- sabre(tcommit~1-1,case=schlid,
first.family="ordered")
This produces the following output:
Log likelihood = -650.84447
on 647 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
____________________________________________________
cut1 0.22145 0.12173 1.8192
cut2 1.2495 0.13381 9.3382
scale 0.33271 0.14005 2.3756
The results indicate that the estimated values of the threshold
parameters are 0.221 (γ_1) and 1.249 (γ_2). The estimate of the variance
of the school-specific random effects, σ²_u0, is (0.33271)² = 0.11070.
The model formulation summarizes the two equations as:

log[ Pr(y_ij ≤ 1 | u_0j) / Pr(y_ij > 1 | u_0j) ] = 0.221 − u_0j,

log[ Pr(y_ij ≤ 2 | u_0j) / Pr(y_ij > 2 | u_0j) ] = 1.249 − u_0j.
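Differencing the two fitted cumulative equations recovers the implied category probabilities. A sketch (in Python) for a school with u_0j = 0, using the threshold estimates above:

```python
from math import exp

def inv_logit(eta):
    return 1.0 / (1.0 + exp(-eta))

u0j = 0.0                    # a school at the centre of the
                             # random-effects distribution
P1 = inv_logit(0.221 - u0j)  # Pr(tcommit <= 1), i.e. Pr('yes')
P2 = inv_logit(1.249 - u0j)  # Pr(tcommit <= 2)

p_yes = P1
p_notsure = P2 - P1
p_no = 1.0 - P2
print(p_yes, p_notsure, p_no)
```

For such a school, just over half of the teachers answer 'yes', with the remainder split almost evenly between 'not sure' and 'no'.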
Next, we consider adding the two explanatory variables to this model.
Rowan, Raudenbush and Cheong [91] hypothesized that teachers would
express high levels of commitment if they had a job with a high degree
of task variety and also experienced a high degree of control over school
policies and teaching conditions. Conceptually, task variety varies at the
teacher level, while teacher control varies at the school level.
The level-one model is:

log[ Pr(y_ij ≤ c | x_ij, β_0j) / Pr(y_ij > c | x_ij, β_0j) ] = γ_c − (β_0j + β_1j taskvar_ij),
while the level-two model is:
β 0j = γ 01 tcontrolj + u0j ,
β 1j = γ 10 .
The combined model is:

log[ Pr(y_ij ≤ c | x_ij, z_j, u_0j) / Pr(y_ij > c | x_ij, z_j, u_0j) ]
  = γ_c − (γ_01 tcontrol_j + γ_10 taskvar_ij + u_0j).
The SabreR command used to fit this model is:
sabre.model.2 <- sabre(tcommit~taskvar+tcontrol-1,
case=schlid,first.family="ordered")
This leads to the following output:
Log likelihood = -634.05978
on 645 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
________________________________________________________
tcontrol -1.5410 0.36060 -4.2735
taskvar -0.34881 0.87745E-01 -3.9753
cut1 0.19283 0.80942E-01 2.3823
cut2 1.2477 0.95459E-01 13.071
scale 0.18773E-06 0.17659 0.10631E-05
The two equations summarizing these results are:

log[ Pr(y_ij ≤ 1 | x_ij, z_j, u_0j) / Pr(y_ij > 1 | x_ij, z_j, u_0j) ]
  = 0.193 − (−0.349 taskvar_ij − 1.541 tcontrol_j + u_0j)
  = 0.193 + 0.349 taskvar_ij + 1.541 tcontrol_j − u_0j,

log[ Pr(y_ij ≤ 2 | x_ij, z_j, u_0j) / Pr(y_ij > 2 | x_ij, z_j, u_0j) ]
  = 1.248 + 0.349 taskvar_ij + 1.541 tcontrol_j − u_0j.
The results indicate that, within schools, taskvar is significantly
related to commitment (γ̂_10 = 0.349, Z-score = 3.98); between schools,
tcontrol is also strongly related to commitment (γ̂_01 = 1.541,
Z-score = 4.27). Inclusion of tcontrol reduced the point estimate of the
between-school variance to 0.000. This suggests that we do not need
random effects in the model with explanatory variables.
7.5 Exercises using mixed models for ordinal data
Exercise 7.5.1. Essay grading (ordered response)
In Exercises 2.8.2 and 5.16.2, the original gradings of 198 essays
by five experts were recorded on a 10-point scale and were treated as
continuous/interval scale data. In Exercises 3.4.1 and 6.7.1, the original
grades were converted into a binary response. In Exercise 3.4.5, the
original grades were recoded into an ordinal response ngrade comprising
four categories which are defined in Exercise 3.4.5. The explanatory
variables include the six essay characteristic variables which are listed in
Example 1.7. The data are stored in the file essays ordered.tab.
In Exercise 3.4.5, standard probit models were applied to the data.
In the current exercise, you are asked to fit a series of mixed probit
models:
1. Use SabreR to fit a null mixed ordered probit model of ngrade,
with essay as the random effect. Obtain the log likelihood, param-
eter estimates and standard errors. Interpret the parameter esti-
mates. Is the essay random effect significant? How many adaptive
quadrature points should we use to estimate this model?
2. Add the four dummy variables for graders (grader2 to grader5)
to the model. Obtain the log likelihood, parameter estimates and
standard errors. Compare the deviances (−2 times log likelihoods)
and use the Z-scores to test whether there are significant differences
between the graders. Interpret the results.
3. Add the six essay characteristics (wordlength, sqrtwords,
sentlength, prepos, commas, errors) to the previous model. Ob-
tain the log likelihood, parameter estimates and standard errors.
Compare the deviances (−2 times log likelihoods) and inspect the
Z-scores. Which of the essay characteristics are significant? Has
including the essay characteristics improved the model?
4. Create interaction effects between the grader-specific dummy vari-
ables and the explanatory variable sqrtwords. Add these effects
to the model. What do these results tell you?
5. Repeat this exercise using a mixed ordered probit model on the
original response grade. Are there any differences between the two
sets of results obtained for the responses ngrade and grade? What
does this tell you?
Exercise 7.5.2. Attitudes to abortion
Recall from Example 1.12 that BSAS respondents’ strength of sup-
port for legalising abortion was summarized using a response (nscore)
comprising five ordered categories. In Exercise 3.4.6, standard ordered
logit models were applied to the data, which are saved in the file
abortion2.tab. In the current exercise, you are asked to fit a series
of mixed ordered logit models:
1. Use SabreR to fit a null mixed ordered logit model of nscore, with
person identifier (person) as the random effect. Obtain the log
likelihood, parameter estimates and standard errors. Is the random
effect significant? How many adaptive quadrature points should we
use to estimate this model?
2. Add the explanatory variables male, age and the three sets of
dummy variables: religion, political affiliation and self-assessed so-
cial class (dr, dp and dc respectively) to the model. Obtain the log
likelihood, parameter estimates and standard errors. Compare the
deviances (−2 times log likelihoods) and inspect the Z-scores. How
does the magnitude of the person-specific random effect change?
Are any of the individual characteristics significant? Do the results
make sense intuitively?
3. Repeat this exercise using district as the random effect. In or-
der to do this, you will need to use a version of the dataset
(abortion3.tab) which has been sorted by district. Does the
significance of the explanatory variables change? Do the results
make sense intuitively?
4. Interpret the preferred model. Can you simplify this model?
5. Are there any interaction terms you would consider adding to the
model? If so, why?
Exercise 7.5.3. Respiratory status
Recall from Example 1.13 that the respiratory status of patients
in a clinical trial was regarded as an ordered response comprising five
categories: ‘terrible’ (coded 0), ‘poor’ (1), ‘fair’ (2), ‘good’ (3) and ‘ex-
cellent’ (4). Respiratory status was determined prior to randomisation
(trend = 0) and at four later visits to the clinic (trend = 1,2,3,4). In
Exercise 3.4.7, standard ordered logit models were applied to the data,
which are stored in the file respiratory2.tab. In the current exercise,
you are asked to fit a series of mixed ordered logit models:
1. Use SabreR to fit a null mixed ordered logit model of status,
allowing for the patient random effect. Obtain the log likelihood,
parameter estimates and standard errors. Are the patient-specific
random effects significant? How many adaptive quadrature points
should we use to estimate this model?
2. Add drug, male, age and base to the model. How does the mag-
nitude of the patient-specific random effect change? Are any of
these explanatory variables significant? Do the results make sense
intuitively?
3. Add the linear trend variable to the model. Create an interac-
tion variable between trend and drug. Add this interaction to the
model. Does the impact of treatment vary significantly with visit?
8
Mixed models for count data
8.1 Introduction
In this chapter, we return to analyzing count data. Models for count
data, such as the Poisson model, were introduced in Chapter 3, Section
3.3. In the current chapter, we demonstrate how these models may be
extended to handle hierarchical or multi-level count data.
8.2 The two-level Poisson model
Let yij be the count for level-one unit i in level-two unit j, and µij be
the expected count, given that level-one unit i is in level-two unit j and
given the values of the explanatory variables. Then µij is necessarily a
non-negative number, which could lead to difficulties if we considered
using the identity link function in this context. The natural logarithm
is mostly used as the link function for expected counts. For single-level
data, this leads to the Poisson regression model for the natural logarithm
of the counts, log(µij ). For multi-level data, mixed Poisson models are
considered for the logarithm of µij .
Consider a two-level Poisson model by assuming the level-one units i
are nested within level-two units j. Using the logarithmic transformation,
the level-one model with P explanatory variables x1 , · · · , xP may be
written as:
y_ij ∼ Poisson(µ_ij),

log(µ_ij) = log(m_ij) + β_0j + Σ_{p=1}^{P} β_pj x_pij,
where $\beta_{0j}$ is an intercept parameter, and $\beta_{pj}$, $p = 1, \dots, P$, are the regression coefficients associated with the explanatory variables $x_{pij}$. The optional term $\log(m_{ij})$ is included in the model as an offset.
94 Multivariate Generalized Linear Mixed Models Using R
The level-two Poisson model has the same form as the level-two model
in the linear, binary and ordinal response models. Consider, for example,
the random intercept model which is formulated as a standard Poisson
model plus a random intercept for the logarithm of the expected count.
As we are limiting ourselves to random intercepts, we have:
$$ \beta_{pj} = \gamma_{p0}, \qquad \beta_{0j} = \gamma_{00} + \sum_{q=1}^{Q} \gamma_{0q} z_{qj} + u_{0j}, $$

so that:

$$ \log(\mu_{ij}) = \log(m_{ij}) + \gamma_{00} + \sum_{p=1}^{P} \gamma_{p0} x_{pij} + \sum_{q=1}^{Q} \gamma_{0q} z_{qj} + u_{0j}. $$

The variance of the random intercepts is denoted by $\sigma^2_{u_0}$.
To transform the linear model back to the expected counts, the inverse of the natural logarithm (the exponential function) must be applied. Consequently, the explanatory variables and the level-two random effects, which enter the (additive) mixed Poisson regression model on the log scale, have multiplicative effects on the expected counts.
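For instance, a one-unit increase in a covariate multiplies the expected count by the exponential of its coefficient, and the offset enters as a multiplicative exposure. A minimal sketch with hypothetical coefficient values (not estimates from the book):

```python
import math

# Hypothetical two-level Poisson model on the log scale (values for illustration).
gamma_00 = 0.5            # intercept
gamma_10 = 0.3            # coefficient of a level-one covariate x
u_0j = 0.2                # random intercept for level-two unit j
log_m = math.log(10.0)    # offset log(m_ij), exposure m_ij = 10

mu_at_1 = math.exp(log_m + gamma_00 + gamma_10 * 1 + u_0j)  # expected count at x = 1
mu_at_2 = math.exp(log_m + gamma_00 + gamma_10 * 2 + u_0j)  # expected count at x = 2

# Additive on the log scale means multiplicative on the count scale:
rate_ratio = mu_at_2 / mu_at_1   # equals exp(gamma_10)
print(round(rate_ratio, 6), round(math.exp(gamma_10), 6))
```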
8.3 Likelihood
The likelihood associated with the two-level Poisson model is:
$$ L\left(\gamma, \sigma^2_{u_0} \mid y, x, z\right) = \prod_j \int_{-\infty}^{+\infty} \prod_i g(y_{ij} \mid x_{ij}, z_j, u_{0j})\, f(u_{0j})\, du_{0j}, $$

where:

$$ g(y_{ij} \mid x_{ij}, z_j, u_{0j}) = \frac{\exp(-\mu_{ij})\, \mu_{ij}^{y_{ij}}}{y_{ij}!}, $$

and:

$$ f(u_{0j}) = \frac{1}{\sqrt{2\pi}\,\sigma_{u_0}} \exp\left(-\frac{u_{0j}^2}{2\sigma^2_{u_0}}\right). $$
SabreR evaluates the integral $L\left(\gamma, \sigma^2_{u_0} \mid y, x, z\right)$ for the mixed Poisson model using standard Gaussian quadrature or adaptive Gaussian quadrature (numerical integration). There is no analytic solution for this integral with normally distributed $u_{0j}$. For further discussion of Poisson models with random intercepts, see [22, 90, 106].
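The integral can be approximated by any numerical integration scheme. The pure-Python sketch below evaluates one level-two unit's likelihood contribution with a crude midpoint rule, purely to make the computation concrete (Sabre itself uses standard or adaptive Gaussian quadrature, which is far more efficient); all numerical values are hypothetical:

```python
import math

def poisson_pmf(y, mu):
    return math.exp(-mu) * mu ** y / math.factorial(y)

def unit_likelihood(ys, eta, sigma_u, grid=2000, span=8.0):
    """Likelihood contribution of one level-two unit: the integral over u of
    prod_i Poisson(y_i | exp(eta_i + u)) times the N(0, sigma_u^2) density,
    approximated here by a midpoint rule purely for illustration."""
    if sigma_u == 0.0:
        prod = 1.0
        for y, e in zip(ys, eta):
            prod *= poisson_pmf(y, math.exp(e))
        return prod
    h = 2.0 * span * sigma_u / grid
    total = 0.0
    for k in range(grid):
        u = -span * sigma_u + (k + 0.5) * h
        term = math.exp(-u * u / (2.0 * sigma_u ** 2)) / (math.sqrt(2.0 * math.pi) * sigma_u)
        for y, e in zip(ys, eta):
            term *= poisson_pmf(y, math.exp(e + u))
        total += term * h
    return total

ys = [2, 1, 3]           # hypothetical counts for one level-two unit
eta = [0.4, 0.4, 0.4]    # fixed part of the linear predictor for each count
print(unit_likelihood(ys, eta, sigma_u=0.5))
```

As the random-effect variance shrinks to zero, the contribution collapses to the ordinary product of Poisson probabilities.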
8.4 Example using mixed models for count data
Example 8.4.1. Demand for health care
In Example 3.3.4, we used a standard Poisson model to relate the
total number of prescribed medications used in the past two days
(prescrib) to a set of explanatory variables. These included type of
health insurance (levyplus, freepoor and freerepa), gender (sex),
age, income and a number of factors describing the respondent’s state
of health (illness, actdays, hscore, chcond1 and chcond2). The data
are stored in the file racd.tab.
In the current example, like Cameron and Trivedi, we take prescrib
to be the count response variable and apply a mixed Poisson model
with a random intercept and the range of explanatory variables outlined
above. The SabreR command required to fit this model is:
sabre.model <- sabre(prescrib~sex+age+agesq+
income+levyplus+freepoor+freerepa+illness+
actdays+hscore+chcond1+chcond2+1,case=id,
first.family="poisson")
This results in the following output:
Log likelihood = -5443.3311
on 5176 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
______________________________________________________
cons -2.8668 0.14908 -19.230
sex 0.56080 0.43164E-01 12.992
age 2.0861 0.73513 2.8377
agesq -0.26325 0.78264 -0.33636
income 0.30450E-01 0.65221E-01 0.46688
levyplus 0.27060 0.58009E-01 4.6649
freepoor -0.61759E-01 0.13676 -0.45159
freerepa 0.29172 0.69172E-01 4.2174
illness 0.20914 0.13260E-01 15.772
actdays 0.34688E-01 0.49475E-02 7.0112
hscore 0.21604E-01 0.81424E-02 2.6533
chcond1 0.77394 0.50771E-01 15.244
chcond2 1.0245 0.62314E-01 16.440
scale 0.52753 0.27207E-01 19.389
Even with all these explanatory variables included in the model, there
is still a highly significant amount of unexplained between-respondent
variation in the total number of prescribed medications used in the past
two days, as indicated by the scale parameter estimate of 0.52753 (s.e.
0.027207).
The parameter estimates of the explanatory variables in the mixed
model differ slightly from those of the standard model, which are pre-
sented below:
Log likelihood = -5530.7669
on 5177 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
__________________________________________________________
cons -2.7412 0.12921 -21.215
sex 0.48377 0.36639E-01 13.204
age 2.6497 0.61491 4.3091
agesq -0.88778 0.64292 -1.3808
income -0.44661E-02 0.55766E-01 -0.80086E-01
levyplus 0.28274 0.52278E-01 5.4083
freepoor -0.45680E-01 0.12414 -0.36798
freerepa 0.29584 0.59667E-01 4.9583
illness 0.20112 0.10530E-01 19.100
actdays 0.29261E-01 0.36746E-02 7.9629
hscore 0.20103E-01 0.63664E-02 3.1577
chcond1 0.77565 0.46130E-01 16.814
chcond2 1.0107 0.53895E-01 18.754
If the mixed model were the true model, then asymptotically both the
standard model and the mixed model estimates would tend to the same
limit. As expected, the standard errors for the mixed model estimates
are larger than those of the standard model.
In this analysis, we only have one response for each respondent. We
do not need multiple responses per individual in order to identify the
extra variation in Poisson counts. However, having multiple responses
for each subject would allow two ways of identifying the extra variation:
(i) from the extra variation in each one of a respondent’s responses, and
(ii) from the correlation between the different responses of each subject.
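The first route is why a single count per subject suffices: mixing the Poisson rate over a normal random intercept inflates the marginal variance above the marginal mean. A simulation sketch (hypothetical parameter values, not the book's data):

```python
import math
import random

# One count per subject from a mixed Poisson model:
#   y_j | u_j ~ Poisson(exp(beta0 + u_j)),  u_j ~ N(0, sigma_u^2).
beta0, sigma_u, n = 1.0, 0.7, 20000

def poisson_draw(mu, rng):
    # Knuth's multiplication algorithm; fine for moderate mu.
    L, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

rng = random.Random(42)
ys = [poisson_draw(math.exp(beta0 + rng.gauss(0.0, sigma_u)), rng) for _ in range(n)]

mean_y = sum(ys) / n
var_y = sum((y - mean_y) ** 2 for y in ys) / n
# Marginally the counts are overdispersed: the variance exceeds the mean.
print(round(mean_y, 2), round(var_y, 2))
```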
8.5 Exercises using mixed models for count data
Exercise 8.5.1. Headaches
In Exercise 3.4.8, standard Poisson models were fitted to data
(headache2.tab) on the number of headaches. In the current exercise,
you are asked to extend this analysis by fitting mixed Poisson models:
1. Use SabreR to fit a null mixed Poisson model to the number of
headaches (y), with lt=log(days) as the offset, subject-specific
(id) random effect and a log link. Obtain the log likelihood, pa-
rameter estimates and standard errors. Interpret the parameter
estimates. Is the id random effect significant? How many adaptive
quadrature points should we use to estimate this model?
2. Add the treatment indicator aspartame to the model. Obtain the
log likelihood, parameter estimates and standard errors. Compare
the deviances (−2 times log likelihoods) and inspect the Z-scores.
Is the random effect significant? Is there a significant treatment
effect?
Exercise 8.5.2. Epileptic seizures
In Exercise 3.4.9, data (epilep.tab) on the number of epileptic
seizures were analyzed using standard Poisson models. In the current
exercise, you are asked to fit a series of mixed Poisson models to these
data:
1. Use SabreR to fit a null mixed Poisson model to the number of
epileptic seizures (y), with patient identifier (subj) as the random
effect. Use adaptive quadrature with 12 mass points. Obtain the
log likelihood, parameter estimates and standard errors. Interpret
the parameter estimates. Are the patient-specific random effects
significant?
2. Add the terms lbas, treat, lbas.trt, lage and visit to the
model. Obtain the log likelihood, parameter estimates and stan-
dard errors. Compare the deviances (−2 times log likelihoods)
and inspect the Z-scores. How does the magnitude of the patient-
specific random effect change? Are any of the explanatory variables
significant? Do the results make sense intuitively?
3. Replace the variable visit with v4 (an indicator variable for the
fourth visit) and re-fit the model. Which model would you prefer?
Interpret your results.
4. Can your preferred model be simplified?
5. Are there any other interaction effects that you would consider
adding to this model? If so, why?
Exercise 8.5.3. Skin cancer deaths
In Exercise 3.4.10, standard Poisson models were fitted to data
(deaths.tab) on the number of male deaths due to malignant melanoma.
In the current exercise, you are asked to extend this analysis by fitting
mixed Poisson models.
1. Use SabreR to fit a null mixed Poisson model to the number of
male deaths (deaths), with region as the random effect. Use log
expected deaths as an offset. Use accurate arithmetic and adap-
tive quadrature with 12 mass points. Obtain the log likelihood,
parameter estimates and standard errors. Interpret the parameter
estimates. Is the region-specific random effect significant?
2. Add the covariate uvb to the model. Obtain the log likelihood,
parameter estimates and standard errors. Compare the deviances
(−2 times log likelihoods) and inspect the Z-scores. Is the region-
specific random effect significant? Is there a significant uvb effect?
9
Family of two-level generalized linear models
9.1 Introduction
The main mixed models we have discussed so far, namely linear, binary
response (generalizable to ordered response models) and Poisson models,
are special cases of the generalized linear mixed model (GLMM). It will
help us when considering extensions of these models to three levels, and
to multivariate responses, if we can start to write each of the models
using GLMM notation. In GLMMs, the explanatory variables and the random effects (for a two-level model, these are $x_{ij}$, $z_j$ and $u_{0j}$) affect the response (for a two-level model, this is $y_{ij}$) via the linear predictor $\theta_{ij}$, where:

$$ \theta_{ij} = \gamma_{00} + \sum_{p=1}^{P} \gamma_{p0} x_{pij} + \sum_{q=1}^{Q} \gamma_{0q} z_{qj} + u_{0j}. $$
The GLMM is obtained by specifying some function of the response ($y_{ij}$) conditional on the linear predictor and other parameters, i.e.

$$ g(y_{ij} \mid \theta_{ij}, \phi) = \exp\left\{[y_{ij}\theta_{ij} - b(\theta_{ij})]/\phi + c(y_{ij}, \phi)\right\}, $$

where $\phi$ is the scale parameter and $b(\theta_{ij})$ is a function that gives the conditional mean $\mu_{ij}$ and variance of $y_{ij}$, namely:

$$ E[y_{ij} \mid \theta_{ij}, \phi] = \mu_{ij} = b'(\theta_{ij}), $$

$$ \mathrm{Var}[y_{ij} \mid \theta_{ij}, \phi] = \phi\, b''(\theta_{ij}). $$

In GLMMs, the mean and variance are related so that:

$$ \mathrm{Var}[y_{ij} \mid \theta_{ij}, \phi] = \phi\, b''\!\left(b'^{-1}(\mu_{ij})\right) = \phi V(\mu_{ij}). $$

$V(\mu_{ij})$ is called the variance function. The function $b'^{-1}$, which expresses $\theta_{ij}$ as a function of $\mu_{ij}$, is called the link function, and $b'(\theta_{ij})$ is the inverse link function. Both $b(\theta_{ij})$ and $c(y_{ij}, \phi)$ differ for different GLMMs.
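The relations $\mu_{ij} = b'(\theta_{ij})$ and $V(\mu_{ij}) = b''(\theta_{ij})$ (when $\phi = 1$) can be checked numerically for any candidate cumulant function $b$. The sketch below (Python used purely for illustration; the helper name and values are hypothetical) recovers the mean and variance function of the Poisson and Bernoulli families by central finite differences:

```python
import math

def mean_and_varfun(b, theta, h=1e-4):
    """Recover b'(theta) and b''(theta) from a cumulant function b by
    central finite differences (a numerical check, nothing more)."""
    b1 = (b(theta + h) - b(theta - h)) / (2 * h)
    b2 = (b(theta + h) - 2 * b(theta) + b(theta - h)) / h ** 2
    return b1, b2

theta = 0.3

# Poisson family: b(theta) = exp(theta), so mu = exp(theta) and b'' = mu.
mu_p, v_p = mean_and_varfun(math.exp, theta)

# Bernoulli family: b(theta) = ln(1 + exp(theta)),
# so mu = logistic(theta) and b'' = mu (1 - mu).
mu_b, v_b = mean_and_varfun(lambda t: math.log1p(math.exp(t)), theta)

print(round(mu_p, 4), round(v_p, 4))
print(round(mu_b, 4), round(v_b, 4))
```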
9.2 The mixed linear model
If we rewrite the mixed linear model from Chapter 5 as:
$$ g(y_{ij} \mid x_{ij}, z_j, u_{0j}) = g(y_{ij} \mid \theta_{ij}, \phi) = \frac{1}{\sqrt{2\pi}\,\sigma_{\varepsilon}} \exp\left\{-\frac{(y_{ij} - \mu_{ij})^2}{2\sigma_{\varepsilon}^2}\right\}, $$

then we can write:

$$ g(y_{ij} \mid \theta_{ij}, \phi) = \exp\left\{\frac{1}{\sigma_{\varepsilon}^2}\left(y_{ij}\mu_{ij} - \frac{\mu_{ij}^2}{2}\right) - \frac{\ln(2\pi\sigma_{\varepsilon}^2)}{2} - \frac{y_{ij}^2}{2\sigma_{\varepsilon}^2}\right\}, $$

so that:

$$ \theta_{ij} = \mu_{ij}, \qquad \phi = \sigma_{\varepsilon}^2, \qquad b(\theta_{ij}) = \frac{\theta_{ij}^2}{2}, \qquad c(y_{ij}, \phi) = -\frac{\ln(2\pi\sigma_{\varepsilon}^2)}{2} - \frac{y_{ij}^2}{2\sigma_{\varepsilon}^2}. $$

The mean $\mu_{ij}$ and variance functions are:

$$ \mu_{ij} = \theta_{ij}, \qquad V(\mu_{ij}) = 1. $$

Note that, in the mixed linear model, the mean and variance are not related: the variance

$$ \phi V(\mu_{ij}) = \sigma_{\varepsilon}^2 $$

does not depend on the mean $\mu_{ij}$. Also, the link function is the identity, as $\theta_{ij} = \mu_{ij}$. We define this model by Gaussian error distribution g and identity link function i.
9.3 The mixed binary response model
If we rewrite the mixed binary response model from Chapter 6 as:

$$ g(y_{ij} \mid x_{ij}, z_j, u_{0j}) = g(y_{ij} \mid \theta_{ij}, \phi) = \mu_{ij}^{y_{ij}} (1 - \mu_{ij})^{1 - y_{ij}}, $$

then we can write:

$$ g(y_{ij} \mid \theta_{ij}, \phi) = \exp\left\{y_{ij} \ln \mu_{ij} + (1 - y_{ij})\ln(1 - \mu_{ij})\right\} = \exp\left\{y_{ij} \ln\frac{\mu_{ij}}{1 - \mu_{ij}} + \ln(1 - \mu_{ij})\right\}, $$

so that:

$$ \theta_{ij} = \ln\frac{\mu_{ij}}{1 - \mu_{ij}}, \qquad \phi = 1, \qquad b(\theta_{ij}) = -\ln(1 - \mu_{ij}), \qquad c(y_{ij}, \phi) = 0. $$
The mean $\mu_{ij}$ and variance functions are:

$$ \mu_{ij} = \frac{\exp(\theta_{ij})}{1 + \exp(\theta_{ij})}, \qquad V(\mu_{ij}) = \frac{\exp(\theta_{ij})}{\{1 + \exp(\theta_{ij})\}^2}. $$

Note that, in the mixed binary response model, the mean and variance are related as:

$$ \phi V(\mu_{ij}) = \mu_{ij}(1 - \mu_{ij}). $$
Also $\theta_{ij} = \ln\frac{\mu_{ij}}{1-\mu_{ij}}$, and the logit model (logit link) has:

$$ \mu_{ij} = \frac{\exp(\theta_{ij})}{1 + \exp(\theta_{ij})}. $$

The probit model (probit link) has $\mu_{ij} = \Phi(\theta_{ij})$, or $\Phi^{-1}(\mu_{ij}) = \theta_{ij}$, where $\Phi(\cdot)$ is the standard normal cumulative distribution function. The complementary log-log model (cloglog link) has $\theta_{ij} = \log\{-\log(1 - \mu_{ij})\}$, or $\mu_{ij} = 1 - \exp(-\exp\theta_{ij})$. We define the mixed binary response model with binomial error distribution b and either logit, probit or cloglog link.
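The three links and their inverses can be written down directly. A small sketch (Python standard library only; the point is that each inverse link undoes its link):

```python
import math
from statistics import NormalDist

# The three links for the mixed binary response model and their inverses.
def logit(mu):      return math.log(mu / (1.0 - mu))
def inv_logit(t):   return math.exp(t) / (1.0 + math.exp(t))

def probit(mu):     return NormalDist().inv_cdf(mu)   # theta = Phi^{-1}(mu)
def inv_probit(t):  return NormalDist().cdf(t)        # mu = Phi(theta)

def cloglog(mu):    return math.log(-math.log(1.0 - mu))
def inv_cloglog(t): return 1.0 - math.exp(-math.exp(t))

mu = 0.3
for link, inv in [(logit, inv_logit), (probit, inv_probit), (cloglog, inv_cloglog)]:
    theta = link(mu)
    print(link.__name__, round(theta, 4), round(inv(theta), 4))  # round trip recovers mu
```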
9.4 The mixed Poisson model
If we rewrite the mixed Poisson model from Chapter 8 as:

$$ g(y_{ij} \mid x_{ij}, z_j, u_{0j}) = g(y_{ij} \mid \theta_{ij}, \phi) = \frac{\exp(-\mu_{ij})\,\mu_{ij}^{y_{ij}}}{y_{ij}!}, $$

then we can write:

$$ g(y_{ij} \mid \theta_{ij}, \phi) = \exp\left\{y_{ij} \ln \mu_{ij} - \mu_{ij} - \ln y_{ij}!\right\}, $$

so that:

$$ \theta_{ij} = \ln \mu_{ij}, \qquad \phi = 1, \qquad b(\theta_{ij}) = \mu_{ij} = \exp\theta_{ij}, \qquad c(y_{ij}, \phi) = -\ln y_{ij}!. $$

The mean $\mu_{ij}$ and variance functions are:

$$ \mu_{ij} = \exp(\theta_{ij}), \qquad V(\mu_{ij}) = \mu_{ij}. $$

Note that, in the mixed Poisson model, the mean and variance are related as:

$$ \phi V(\mu_{ij}) = \mu_{ij}. $$
The link function is the log link, as $\theta_{ij} = \ln \mu_{ij}$. We define the mixed Poisson model with Poisson error distribution p and log link.
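As a quick consistency check, plugging $\theta_{ij} = \ln\mu_{ij}$, $b(\theta_{ij}) = \exp\theta_{ij}$, $\phi = 1$ and $c(y_{ij}, \phi) = -\ln y_{ij}!$ into the exponential-family form reproduces the ordinary Poisson probabilities. A minimal sketch (the value of $\mu$ is hypothetical):

```python
import math

def poisson_pmf(y, mu):
    return math.exp(-mu) * mu ** y / math.factorial(y)

def glmm_form(y, mu):
    # Exponential-family form with theta = ln(mu), phi = 1,
    # b(theta) = exp(theta) and c(y, phi) = -ln(y!).
    theta = math.log(mu)
    return math.exp(y * theta - math.exp(theta) - math.log(math.factorial(y)))

mu = 2.5   # hypothetical expected count
for y in range(6):
    print(y, round(poisson_pmf(y, mu), 6), round(glmm_form(y, mu), 6))
```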
9.5 Likelihood
We can now write the two-level GLMM likelihood for the mixed linear,
mixed binary response and mixed Poisson models in a general form, i.e.
$$ L\left(\gamma, \phi, \sigma^2_{u_0} \mid y, x, z\right) = \prod_j \int \prod_i g(y_{ij} \mid \theta_{ij}, \phi)\, f(u_{0j})\, du_{0j}, $$

where:

$$ g(y_{ij} \mid \theta_{ij}, \phi) = \exp\left\{[y_{ij}\theta_{ij} - b(\theta_{ij})]/\phi + c(y_{ij}, \phi)\right\}, $$

$$ \theta_{ij} = \gamma_{00} + \sum_{p=1}^{P} \gamma_{p0} x_{pij} + \sum_{q=1}^{Q} \gamma_{0q} z_{qj} + u_{0j}, $$

and:

$$ f(u_{0j}) = \frac{1}{\sqrt{2\pi}\,\sigma_{u_0}} \exp\left(-\frac{u_{0j}^2}{2\sigma^2_{u_0}}\right). $$
In summary, for the mixed linear model, we have identity link func-
tion and Gaussian (normal) error distribution; for the mixed binary re-
sponse model, we have logit, probit and cloglog link functions, and bi-
nomial error distribution; and for the mixed Poisson model, we have log
link function and Poisson error distribution. For further discussion on
GLMMs, see [3, 4].
SabreR evaluates the integral $L\left(\gamma, \phi, \sigma^2_{u_0} \mid y, x, z\right)$ for GLMMs using
standard Gaussian quadrature or adaptive Gaussian quadrature (numer-
ical integration). For more details on quadrature methods, see Appendix
A, Section A.3.
10
Three-level generalized linear models
10.1 Introduction
The extension of the two-level generalized linear model to three and
more levels is reasonably straightforward. In this chapter, we consider
the three-level generalized linear model.
10.2 Three-level random intercept models
In the mathematics achievement example considered previously in Sec-
tions 2.7 and 5.12, students were nested within schools. A more usual
hierarchical structure of educational data tends to be students nested
within classes nested within schools. In this book, we concentrate on
‘simple’ three-level hierarchical data structures. For information on more
complex hierarchical data structures, see [44].
The response variable now needs to acknowledge the additional third
level and is denoted by yijk , referring to, for example, the response of
student i in class j in school k. More generally, one can talk about level-
one unit i in level-two unit j in level-three unit k. The three-level model
for such data with one level-one explanatory variable may be formulated
through the linear predictor. In this simple example, we only use one
level-one covariate xijk , so that:
$$ \theta_{ijk} = \beta_{0jk} + \beta_{1jk} x_{ijk}, $$

where $\beta_{0jk}$ is the intercept in level-two unit j within level-three unit k. For the intercept, we have the level-two model:

$$ \beta_{0jk} = \delta_{00k} + u_{0jk}, \qquad \beta_{1jk} = \gamma_{100}, $$
where $\delta_{00k}$ is the average intercept in level-three unit k. For this average intercept, we have the level-three model:

$$ \delta_{00k} = \gamma_{000} + v_{00k}, $$

and hence, by substitution, the linear predictor takes the form:

$$ \theta_{ijk} = \gamma_{000} + \gamma_{100} x_{ijk} + v_{00k} + u_{0jk}. $$
10.3 Three-level generalized linear models
By using ijk subscripts for various terms of a generalized linear mixed
model (GLMM), and by adding the level-three explanatory variables wk
and the level-two explanatory variables zjk , we obtain the three-level
generalized linear model, where:
$$ g(y_{ijk} \mid \theta_{ijk}, \phi) = \exp\left\{[y_{ijk}\theta_{ijk} - b(\theta_{ijk})]/\phi + c(y_{ijk}, \phi)\right\}, $$

$$ \theta_{ijk} = \gamma_{000} + \sum_{p=1}^{P} \gamma_{p00} x_{pijk} + \sum_{q=1}^{Q} \gamma_{0q0} z_{qjk} + \sum_{r=1}^{R} \gamma_{00r} w_{rk} + v_{00k} + u_{0jk}. $$

The conditional mean $\mu_{ijk}$ and variance of $y_{ijk}$ become:

$$ E[y_{ijk} \mid \theta_{ijk}, \phi] = \mu_{ijk} = b'(\theta_{ijk}), $$

$$ \mathrm{Var}[y_{ijk} \mid \theta_{ijk}, \phi] = \phi\, b''(\theta_{ijk}), $$

and:

$$ \mathrm{Var}[y_{ijk} \mid \theta_{ijk}, \phi] = \phi\, b''\!\left(b'^{-1}(\mu_{ijk})\right) = \phi V(\mu_{ijk}), $$

where $b(\theta_{ijk})$ and $c(y_{ijk}, \phi)$ differ for different GLMMs.
For GLMMs, we can consider two covariances. The first covariance is between the linear predictors $\theta_{ijk}$ and $\theta_{i'jk}$ of different pupils i and i′ in the same class of a given school. This covariance is:

$$ \mathrm{covar}(\theta_{ijk}, \theta_{i'jk} \mid x_{ijk}, z_{jk}, w_k) = \sigma^2_{u_0} + \sigma^2_{v_{00}}. $$

The second covariance is between the linear predictors $\theta_{ijk}$ and $\theta_{i'j'k}$ of different pupils i and i′ in different classes j and j′ in the same school. This covariance is:

$$ \mathrm{covar}(\theta_{ijk}, \theta_{i'j'k} \mid x_{ijk}, z_{jk}, w_k) = \sigma^2_{v_{00}}. $$

It follows from this result that the covariance between linear predictors of different pupils in the same class in a given school is higher than that of different pupils in different classes of a given school.
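These two covariances can be verified by simulation: with the fixed part of the linear predictor held constant, the empirical covariance of predictors in the same class estimates $\sigma^2_{u_0} + \sigma^2_{v_{00}}$, and across classes it estimates $\sigma^2_{v_{00}}$. A sketch with illustrative variance components (not values from the book):

```python
import random

# Empirical check of the two covariances between linear predictors.
sigma2_u0, sigma2_v00 = 0.6, 0.4   # class-level and school-level variances
K = 4000                           # number of simulated schools

rng = random.Random(7)
same_class, diff_class = [], []
for _ in range(K):
    v = rng.gauss(0.0, sigma2_v00 ** 0.5)    # school random intercept v_00k
    u1 = rng.gauss(0.0, sigma2_u0 ** 0.5)    # class 1 random intercept u_0jk
    u2 = rng.gauss(0.0, sigma2_u0 ** 0.5)    # class 2 random intercept
    # With the fixed part set to zero, theta_ijk = v + u_jk; two pupils in the
    # same class share the same u, so their linear predictors coincide.
    same_class.append((v + u1) * (v + u1))   # same school, same class
    diff_class.append((v + u1) * (v + u2))   # same school, different classes

cov_same = sum(same_class) / K   # estimates sigma2_u0 + sigma2_v00
cov_diff = sum(diff_class) / K   # estimates sigma2_v00
print(round(cov_same, 3), round(cov_diff, 3))
```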
10.4 Linear models
For the linear regression model:

$$ y_{ijk} = \theta_{ijk} + \varepsilon_{ijk}, $$

there are three residuals, as there is variability on three levels. Their variances are denoted by:

$$ \mathrm{var}(\varepsilon_{ijk}) = \sigma^2_{\varepsilon}, \qquad \mathrm{var}(u_{0jk}) = \sigma^2_{u_0}, \qquad \mathrm{var}(v_{00k}) = \sigma^2_{v_{00}}. $$

The total variance between all level-one units now equals $\sigma^2_{\varepsilon} + \sigma^2_{u_0} + \sigma^2_{v_{00}}$, and the total variance between level-two units is $\sigma^2_{u_0} + \sigma^2_{v_{00}}$.
There are several kinds of intraclass correlation coefficient in a three-level model:

• Proportion of the total variance from level one:

$$ \frac{\sigma^2_{\varepsilon}}{\sigma^2_{\varepsilon} + \sigma^2_{u_0} + \sigma^2_{v_{00}}}; $$

• Proportion of the total variance from level two:

$$ \frac{\sigma^2_{u_0}}{\sigma^2_{\varepsilon} + \sigma^2_{u_0} + \sigma^2_{v_{00}}}; $$

• Proportion of the total variance from level three:

$$ \frac{\sigma^2_{v_{00}}}{\sigma^2_{\varepsilon} + \sigma^2_{u_0} + \sigma^2_{v_{00}}}; $$

• Proportion of the total variance from levels one and two:

$$ \frac{\sigma^2_{\varepsilon} + \sigma^2_{u_0}}{\sigma^2_{\varepsilon} + \sigma^2_{u_0} + \sigma^2_{v_{00}}}. $$
The correlation between different level-one units (pupils) of a given level-two unit (class) and level-three unit (school) is:

$$ \mathrm{cor}(y_{ijk}, y_{i'jk} \mid x, z, w) = \frac{\sigma^2_{u_0} + \sigma^2_{v_{00}}}{\sigma^2_{\varepsilon} + \sigma^2_{u_0} + \sigma^2_{v_{00}}}, $$

and the correlation between different level-one units of different level-two units for a given level-three unit is:

$$ \mathrm{cor}(y_{ijk}, y_{i'j'k} \mid x, z, w) = \frac{\sigma^2_{v_{00}}}{\sigma^2_{\varepsilon} + \sigma^2_{u_0} + \sigma^2_{v_{00}}}, $$

such that $\mathrm{cor}(y_{ijk}, y_{i'jk} \mid x, z, w) > \mathrm{cor}(y_{ijk}, y_{i'j'k} \mid x, z, w)$ for $i \neq i'$, $j \neq j'$.
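Given numerical values for the three variance components, the variance proportions and the two correlations above follow immediately. A sketch with illustrative values (not estimates from the book):

```python
# Variance proportions and intraclass correlations for a three-level linear
# model, with illustrative variance components.
s2_e, s2_u0, s2_v00 = 0.9, 0.6, 0.4
total = s2_e + s2_u0 + s2_v00

prop_level1 = s2_e / total              # share of total variance from level one
prop_level2 = s2_u0 / total             # share from level two
prop_level3 = s2_v00 / total            # share from level three
prop_levels12 = (s2_e + s2_u0) / total  # share from levels one and two

# Same class (and school) vs. same school but different classes:
cor_same_class = (s2_u0 + s2_v00) / total
cor_diff_class = s2_v00 / total

print(round(prop_level1, 3), round(prop_level2, 3), round(prop_level3, 3))
print(round(cor_same_class, 3), round(cor_diff_class, 3))
```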
10.5 Binary response models
Discussion of the binary response model focuses on correlations between the different latent responses, for example, $(y^*_{ijk}, y^*_{i'jk})$ and $(y^*_{ijk}, y^*_{i'j'k})$, $i \neq i'$, $j \neq j'$, where:

$$ y^*_{ijk} = \theta_{ijk} + \varepsilon_{ijk}. $$

For the probit model, these correlations are:

$$ \mathrm{cor}(y^*_{ijk}, y^*_{i'jk} \mid x, z, w) = \frac{\sigma^2_{u_0} + \sigma^2_{v_{00}}}{\sigma^2_{u_0} + \sigma^2_{v_{00}} + 1}, $$

$$ \mathrm{cor}(y^*_{ijk}, y^*_{i'j'k} \mid x, z, w) = \frac{\sigma^2_{v_{00}}}{\sigma^2_{u_0} + \sigma^2_{v_{00}} + 1}, $$

as $\mathrm{var}(\varepsilon_{ijk}) = 1$. For the logit model, $\mathrm{var}(\varepsilon_{ijk}) = \pi^2/3$, and we replace the 1 in the denominators by $\pi^2/3$.
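The probit and logit latent correlations differ only through the assumed residual variance in the denominator. A sketch with illustrative variance components (not from the book):

```python
import math

# Latent-response correlations for the three-level binary model; the residual
# variance is 1 under the probit link and pi^2/3 under the logit link.
s2_u0, s2_v00 = 0.6, 0.4

def latent_correlations(resid_var):
    denom = s2_u0 + s2_v00 + resid_var
    return (s2_u0 + s2_v00) / denom, s2_v00 / denom

probit_same, probit_diff = latent_correlations(1.0)
logit_same, logit_diff = latent_correlations(math.pi ** 2 / 3.0)

print(round(probit_same, 3), round(probit_diff, 3))
print(round(logit_same, 3), round(logit_diff, 3))
```

Because $\pi^2/3 > 1$, the logit latent correlations are always smaller than the probit ones for the same variance components.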
10.6 Likelihood
The three-level GLMM likelihood takes the form:

$$ L\left(\gamma, \phi, \sigma^2_{u_0}, \sigma^2_{v_{00}} \mid y, x, z, w\right) = \prod_k \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} \prod_j \prod_i g(y_{ijk} \mid \theta_{ijk}, \phi)\, f(u_{0jk})\, f(v_{00k})\, du_{0jk}\, dv_{00k}, $$

where:

$$ g(y_{ijk} \mid \theta_{ijk}, \phi) = \exp\left\{[y_{ijk}\theta_{ijk} - b(\theta_{ijk})]/\phi + c(y_{ijk}, \phi)\right\}, $$

$$ \theta_{ijk} = \gamma_{000} + \sum_{p=1}^{P} \gamma_{p00} x_{pijk} + \sum_{q=1}^{Q} \gamma_{0q0} z_{qjk} + \sum_{r=1}^{R} \gamma_{00r} w_{rk} + v_{00k} + u_{0jk}, $$

and:

$$ f(u_{0jk}) = \frac{1}{\sqrt{2\pi}\,\sigma_{u_0}} \exp\left(-\frac{u_{0jk}^2}{2\sigma^2_{u_0}}\right), \qquad f(v_{00k}) = \frac{1}{\sqrt{2\pi}\,\sigma_{v_{00}}} \exp\left(-\frac{v_{00k}^2}{2\sigma^2_{v_{00}}}\right). $$
For the linear model, we have identity link function and Gaussian
(normal) error distribution; for the binary model, we have logit, probit
and cloglog link functions, and binomial error distribution; for the Pois-
son model, we have log link function and Poisson error distribution. For
further discussion on three-level models, see [43, 90, 93].
SabreR evaluates the integral $L\left(\gamma, \phi, \sigma^2_{u_0}, \sigma^2_{v_{00}} \mid y, x, z, w\right)$ for the
three-level GLMM using standard Gaussian quadrature or adaptive
Gaussian quadrature (numerical integration). For more details on these
quadrature procedures, see Appendix A, Section A.3.
10.7 Example using three-level generalized linear
models
Example 10.7.1. Linear model of pupil scores
In Exercises 2.11.6 and 5.16.6 respectively, you were asked to use
SabreR to fit one-level and two-level linear models of responses (scores)
given by 856 pupils on six questions relating to their school man-
agers/directors. Pupil-specific explanatory variables included pupil gen-
der (pupsex), which was coded 1 for females and 2 for males. A level-two
random effect (id) was used to take into account residual heterogeneity
between pupils. The SabreR output relating to the one-level and two-
level models is presented below:
Log likelihood = -7758.0889
on 4975 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
_________________________________________________________
cons 2.1708 0.70508E-01 30.788
dirsex 0.91255E-01 0.32600E-01 2.7992
fschtype ( 1) 0.0000 ALIASED [I]
fschtype ( 2) 0.37444 0.38193E-01 9.8038
fschtype ( 3) 0.15259 0.43772E-01 3.4861
pupsex -0.21601E-01 0.33829E-01 -0.63852
sigma 1.1492
Log likelihood = -7272.8266
on 4974 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
_________________________________________________________
cons 2.1638 0.11778 18.371
dirsex 0.10048 0.54458E-01 1.8452
fschtype ( 1) 0.0000 ALIASED [I]
fschtype ( 2) 0.39401 0.63790E-01 6.1766
fschtype ( 3) 0.19282 0.72611E-01 2.6555
pupsex -0.21618E-01 0.56559E-01 -0.38222
sigma 0.91863 0.10132E-01 90.665
scale 0.69752 0.22281E-01 31.306
We also have a school identifier (school) available in the dataset
(manager.tab). In the current example, we extend the previous anal-
yses by treating school as the third level of variation. We wish to ex-
plain the variation in item response between the 94 schools by us-
ing two school-specific explanatory variables: gender of school man-
ager/director (dirsex), coded 1 for females and 2 for males, and school
type (schtype). School type is coded as follows: 1: general (AVO), 2:
professional (MBO&T), 3: day/evening.
We handle residual heterogeneity between schools by incorporating
a level-three random effect (school) into the modelling framework. We
use SabreR to fit this three-level model. Adaptive quadrature with 24
mass points is used for both levels two and three. The SabreR command
used to fit this model is:
sabre.model.1 <- sabre(scores~dirsex+factor(schtype)+
pupsex+1,case=list(id,school),first.family=
"gaussian",first.mass=24,second.mass=24,
adaptive.quad=TRUE)
This command results in the following output:
Log likelihood = -7223.1596
on 4973 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
_________________________________________________________
cons 2.2429 0.16818 13.336
dirsex 0.10251 0.92085E-01 1.1132
fschtype ( 1) 0.0000 ALIASED [I]
fschtype ( 2) 0.39067 0.10834 3.6060
fschtype ( 3) 0.19933 0.12026 1.6576
pupsex -0.77852E-01 0.53255E-01 -1.4619
sigma 0.91881 0.10137E-01 90.641
scale2 0.58396 0.21798E-01 26.789
scale3 0.38029 0.38309E-01 9.9270
Are both these random effects significant? Is the three-level model a
significant improvement over the one-level and the two-level models?
The log likelihood of the homogeneous (one-level) model is −7758.0889, and the log likelihood of the three-level random effects model is −7223.1596. The change in deviance is −2(−7758.0889 + 7223.1596) = 1069.9. The sampling distribution of this test statistic is not chi-squared with two degrees of freedom, because the null hypothesis (that scale2 and scale3 both equal 0) places these parameters on the boundary of the parameter space: they can only take values greater than 0 under the alternative hypothesis. An approximately correct p value is obtained by halving the naive p value of 1069.9 on two degrees of freedom. The test is clearly significant, suggesting that the scores given by pupils to the six questions within the same school are highly correlated. The correlation is higher between scores of the same pupil than between scores of different pupils in the same school, as scale2 is greater than scale3.
The log likelihood of the two-level model is −7272.8266, and the log likelihood of the three-level model is −7223.1596. The change in deviance is −2(−7272.8266 + 7223.1596) = 99.334. Again, the sampling distribution of this test statistic is not chi-squared with one degree of freedom: the null hypothesis is that scale3 has the value 0, and it can only take values greater than 0 under the alternative hypothesis. The correct p value is obtained by halving the naive p value of 99.334 on one degree of freedom, and so the test is clearly significant.
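The two boundary tests above can be reproduced directly; for one and two degrees of freedom the chi-squared survival function has a closed form. A Python sketch following the text's halving rule (log likelihoods copied from the output above):

```python
import math

# Closed-form chi-squared survival functions for 1 and 2 degrees of freedom.
def chi2_sf_df1(x):
    return math.erfc(math.sqrt(x / 2.0))   # P(chi2_1 > x)

def chi2_sf_df2(x):
    return math.exp(-x / 2.0)              # P(chi2_2 > x)

# One-level vs. three-level model: two variance parameters on the boundary.
dev_change_2df = -2 * (-7758.0889 + 7223.1596)
p_2df = chi2_sf_df2(dev_change_2df) / 2    # naive p value halved

# Two-level vs. three-level model: one variance parameter on the boundary.
dev_change_1df = -2 * (-7272.8266 + 7223.1596)
p_1df = chi2_sf_df1(dev_change_1df) / 2    # naive p value halved

print(round(dev_change_2df, 4), p_2df)
print(round(dev_change_1df, 4), p_1df)
```

Both halved p values are vanishingly small, confirming the conclusions drawn in the text.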
Which explanatory variables have a significant effect on the scores?
How do the results change when allowing for pupil-level (level-two) ran-
dom effects and then school-level (level-three) random effects?
The significant terms in the one-level and two-level models are
fschtype(2) and fschtype(3), but only fschtype(2) remains signifi-
cant in the three-level model. The main change that occurs when mov-
ing from the one-level model to the two-level model is that the standard
errors of the covariates become noticeably larger. The standard errors
tend to become larger again as we move from the two-level model to the
three-level model.
10.8 Exercises using three-level generalized linear
mixed models
Exercise 10.8.1. Binary response model of Tower of London test
performance
In Exercises 3.4.3 and 6.7.3 respectively, you were asked to use
SabreR to fit one-level and two-level logit models of the binary response
dtlm, which takes the value 1 if each Tower of London test was completed
in the minimum number of moves, and takes the value 0 otherwise. There
are three groups of participants: (i) subjects with schizophrenia (coded
3); (ii) subjects’ relatives (coded 2); (iii) control participants (coded
1). Create indicator variables for group=2 (relatives) and for group=3
(schizophrenics). The test was repeated at three different levels of diffi-
culty. Treat level as a continuous covariate.
A level-two random effect (id) was used in Exercise 6.7.3 to take
into account residual heterogeneity between participants. The data have
a three-level structure: occasion i for participant j in family k. We have
a family identifier (famnum) available in the dataset (tower1.tab). In
the current exercise, we extend the previous analyses by treating family
as the third level of variation. We handle residual heterogeneity between
families by incorporating a level-three random effect (famnum) into the
modelling framework:
1. Use SabreR to fit this three-level model. Allow for both the level-
two participant-specific random effect (id) and the level-three
family-specific random effect (famnum). Use adaptive quadrature
with 12 mass points for both levels two and three. Are both these
random effects significant? Is this model a significant improvement
over the models estimated in Exercises 3.4.3 and 6.7.3?
2. How do the effects for group=2 (relatives) and for group=3
(schizophrenics) change when you allow for the participant-specific
and the family-specific random effects?
Exercise 10.8.2. Binary response model of immunization of Guatemalan
children
In Exercises 3.4.4 and 6.7.4 respectively, you were asked to use
SabreR to fit one-level and two-level logit models of the binary re-
sponse immun, which takes the value 1 if a child was immunized, and
takes the value 0 otherwise for child i in family j within community k.
The child-specific explanatory variables are age (kid2p) and birth or-
der (order23, order46 and order7p). The family-specific factors are
mom25p, indnospa, indspa, momedpri, momedsec, husedpri, husedsec,
huseddk, momwork, rural and pcind81.
A level-two random effect (mom) was used in Exercise 6.7.4
to take into account residual heterogeneity between families. We
have a community identifier (cluster) available in the dataset
(guatemala immun.tab). In the current exercise, we extend the previ-
ous analyses by treating community as the third level of variation. We
handle residual heterogeneity between communities by incorporating a
level-three random effect (cluster) into the modelling framework:
1. Use SabreR to fit this three-level model. Allow for both the
level-two family-specific random effect (mom) and the level-three
community-specific random effect (cluster). Use adaptive quadra-
ture with 32 mass points for both levels two and three. Are both
these random effects significant? Is this model a significant improvement over the models estimated in Exercises 3.4.4 and 6.7.4?
2. How do the effects of the explanatory variables change when
you allow for the family-specific and community-specific random
effects?
Exercise 10.8.3. Poisson model of skin cancer deaths
In Exercises 3.4.10 and 8.5.3 respectively, you used SabreR to fit
one-level and two-level Poisson models to the number of male malignant
melanoma deaths (deaths). The explanatory variable of primary interest
is uvb, a measure of the UVB dose reaching the earth’s surface.
A level-two random effect (region) was used in Exercise 8.5.3 to
take into account residual heterogeneity between regions. The data have
a three-level structure: county i within region j in nation k. We have a
nation identifier (nation) available in the dataset (deaths.tab). In the
current exercise, we extend the previous analyses by treating nation as
the third level of variation. We handle residual heterogeneity between
nations by incorporating a level-three random effect (nation) into the
modelling framework:
1. Use SabreR to fit this three-level model. Allow for both the level-
two random effect (region) and the level-three random effect
(nation). Use accurate arithmetic, and adaptive quadrature with
96 mass points for both levels. Are both these random effects sig-
nificant? Is this model a significant improvement over the models
estimated in Exercises 3.4.10 and 8.5.3?
2. How does the uvb effect change when you allow for the region-
specific and nation-specific random effects?
11
Models for multivariate data
11.1 Introduction
Thus far in this book, we have considered models for univariate data;
that is, models which have allowed us to relate a single response to a
set of one or more explanatory variables. In the rest of this book, we
will assume that the response is multivariate in nature; in other words,
there is more than one response process being observed simultaneously.
In this book, attention will be concentrated on the modelling of data
comprising two and three response processes, known as bivariate and
trivariate data respectively.
The multiple responses may be of the same type. In this chapter,
we highlight two examples of such bivariate data. In the first example,
Cameron and Trivedi [22] examined various measures of demand for
health care. These measures included two count variables: the number
of consultations with a doctor or specialist in the past two weeks, and
the number of prescribed medications used in the past two days. These
responses can be regarded as bivariate count data to be related to a
variety of explanatory variables, including type of health insurance. We
will return to this example in Section 11.3.
The second example of bivariate data, which comprises two responses
of the same type, involves attitudes towards gender roles. The data are
taken from the British Household Panel Survey (BHPS) [101]. Respon-
dents were asked to rate a number of Likert items relating to gender
roles. Berridge, Penn and Ganjali [87] selected one of these items: ‘the
husband should earn, the wife should stay at home’ and examined re-
sponses to this item from BHPS waves in 1991 and 2003. They treated
the responses as ordinal categorical data, and used marginal and condi-
tional ordered logit models [76] to relate the ordinal response to a set of
personal and socio-economic characteristics of the respondents.
In this book, we extend those analyses. We select two different items
and four different waves of the BHPS. We treat the responses to those
two items across those four waves as bivariate repeated ordered data. In
Section 11.4, we will fit a series of increasingly complex models, culmi-
nating in a bivariate mixed ordered logit model with correlated random
effects.
A further example of bivariate data could be the wages and trade
union membership of an individual over successive years. In this case,
the two responses are of different types: wages is a continuous/interval
scale variable, while trade union membership is binary in nature. We will
return to this example in Section 11.5. This example can be extended to
provide an illustration of trivariate data. We may wish to explore how
the relationship between wages, training and promotion varies over time.
A more complex example of bivariate data arises in the modelling of the duration, in months, of job vacancies, which last until they are either filled successfully or withdrawn from the market. The two binary responses, a job is filled and a job is withdrawn, can be considered as
competing risks: for any given vacancy, either of the two responses is
possible, but both outcomes cannot occur simultaneously. We will return
to this example in the context of modelling event history data in the next
chapter.
Any bivariate and trivariate models we develop should permit us to
assess the extent to which the multiple responses are associated with
each other, as well as allowing us to examine the degree to which each
response is correlated with a set of explanatory variables. Furthermore,
the association between responses should be separated into two compo-
nents: first, the direct effects of the responses on each other, and second,
the indirect effects of the correlation between random effects.
In the current chapter, we propose a joint model of simultaneous
responses which will allow us to disentangle the direct effects of the
different responses on each other from any correlation that may occur
in the random effects. Without a multivariate generalized linear mixed
model, for complex social processes such as those outlined above, we
risk making inferential errors. We illustrate the application of this model
through the health care, gender roles and wage/trade union membership
examples mentioned previously.
11.2 Multivariate two-level generalized linear model
We introduce the superscript r to enable us to distinguish the differ-
ent models, explanatory variables and random effects of a multivariate
response, in particular a bivariate response. Model characteristics associ-
ated with the first and second responses will be indexed by r=1 and r=2
respectively. In the health care example, the first response ($r = 1$) is assumed to be the number of consultations with a doctor or specialist, $y_{ij}^1$, made by individual $j$ in the past two weeks. The second response ($r = 2$) is taken to be the number of prescribed medications, $y_{ij}^2$, used by individual $j$ in the past two days. There may be repeated observations on each individual; in other words, $i > 1$, as in the gender roles and the wages/trade union membership examples. In the case of the health care data, we only have a single pair of responses on each individual, so $i = 1$.
In general terms, the multivariate two-level generalized linear model is obtained from the univariate two-level generalized linear model (see Chapters 5 to 9) by specifying the probability of the response $y_{ij}^r$ conditional on the linear predictor and other parameters for each of the $R$ responses ($r = 1, 2, \cdots, R$):
\[
g^r\left(y_{ij}^r \mid \theta_{ij}^r, \phi^r\right) = \exp\left\{\left[y_{ij}^r \theta_{ij}^r - b^r\left(\theta_{ij}^r\right)\right]/\phi^r + c^r\left(y_{ij}^r, \phi^r\right)\right\},
\]
where $\phi^r$ is the scale parameter and $b^r\left(\theta_{ij}^r\right)$ is a function that gives the conditional mean $\mu_{ij}^r$ and variance of $y_{ij}^r$, namely:
\[
E\left(y_{ij}^r \mid \theta_{ij}^r, \phi^r\right) = \mu_{ij}^r = b^{r\prime}\left(\theta_{ij}^r\right),
\]
\[
Var\left(y_{ij}^r \mid \theta_{ij}^r, \phi^r\right) = \phi^r b^{r\prime\prime}\left(\theta_{ij}^r\right),
\]
and the linear predictor $\theta_{ij}^r$ is given by:
\[
\theta_{ij}^r = \gamma_{00}^r + \sum_{p=1}^{P} \gamma_{p0}^r x_{pij} + \sum_{q=1}^{Q} \gamma_{0q}^r z_{qj} + u_{0j}^r, \qquad r = 1, 2, \cdots, R.
\]
Both $b^r\left(\theta_{ij}^r\right)$ and $c^r\left(y_{ij}^r, \phi^r\right)$ differ for different members of the exponential family and can be different for different $r$, $r = 1, 2, \cdots, R$. We apply these models to the health care, gender roles and wage/trade union membership examples in the following three sections.
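As a concrete instance of these components, consider the Poisson model used in Section 11.3, with its canonical log link (a standard specialization of the exponential family, stated here for reference):
\[
b^r\left(\theta_{ij}^r\right) = \exp\left(\theta_{ij}^r\right), \qquad \phi^r = 1, \qquad c^r\left(y_{ij}^r, \phi^r\right) = -\log\left(y_{ij}^r!\right),
\]
so that
\[
E\left(y_{ij}^r \mid \theta_{ij}^r\right) = Var\left(y_{ij}^r \mid \theta_{ij}^r\right) = \exp\left(\theta_{ij}^r\right) = \mu_{ij}^r .
\]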
11.3 Bivariate Poisson model: example
Example 11.3.1. Demand for health care
Cameron and Trivedi [22] used various forms of the overdispersed
Poisson model to study the relationship between type of health insurance
and various measures of demand for health care such as the number of consultations with a doctor or specialist and the number of prescriptions.
The dataset they used in this analysis is from the Australian Health
survey for 1977–1978. In later work, Cameron and Trivedi [22] estimated
a bivariate Poisson model for the two measures of demand for health care
mentioned previously.
We use a version of the Cameron and Trivedi [22] dataset (visit-
prescribe.tab) for the bivariate model. In this example, we only have
one pair of responses (dvisits, prescrib) for each sampled individual:
the number of consultations with a doctor or specialist in the last two
weeks (dvisits) and the number of prescribed medications taken in the
last two days (prescrib). A copy of the original dataset can be obtained
from the web [23].
The primary explanatory variable of interest is type of health in-
surance which is classified into four categories and is represented by
three indicator variables: levyplus, freepoor and freerepa. Respon-
dents may be covered by a private health insurance fund for private
patients in a public hospital (with their doctor of choice) (levyplus).
Respondents may be covered by the government because they are on
a low income, are a recent immigrant or are unemployed (freepoor).
Respondents may be covered free of charge by the government because
they have an old-age or disability pension, or because they are an invalid
veteran or a member of a family of a deceased veteran (freerepa).
Secondary explanatory variables include annual income and demo-
graphics such as gender and age. Explanatory variables used as indica-
tors of a respondent’s recent state of health are the number of illnesses
in the past two weeks (illness), the number of days of reduced activity
in the past two weeks due to illness or injury (actdays), the respon-
dent’s general health questionnaire score using Goldberg’s method in
which a high score indicates poor health (hscore), whether the respon-
dent has chronic condition(s) but is not limited in activity (chcond1),
and whether the respondent has chronic condition(s) and is limited in
activity (chcond2). Further details about all the variables included in
visit-prescribe.tab are available on the web [23].
Like Cameron and Trivedi, we take dvisits and prescrib to be
count variables and model them using a bivariate Poisson model with a
random intercept and the set of explanatory variables outlined above.
We crosstabulate dvisits by prescrib in Table 11.1.
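In R, such a crosstabulation can be produced with the formula interface of xtabs. The sketch below uses a few illustrative rows in place of the real file; with the actual data one would instead read visit-prescribe.tab with read.table:

```r
# Sketch of the crosstabulation step, using illustrative rows in place of
# the real file (with the real data: hc <- read.table("visit-prescribe.tab",
# header = TRUE)).
hc <- data.frame(dvisits  = c(0, 0, 1, 2, 0, 1),
                 prescrib = c(0, 1, 0, 2, 0, 1))
xtabs(~ dvisits + prescrib, data = hc)  # counts of (dvisits, prescrib) pairs
```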
Does Table 11.1 provide evidence of an association between dvisits and prescrib? To answer this question formally, we propose a bivariate Poisson model to test whether there is a significant association between the two responses dvisits and prescrib, having controlled for the set of explanatory variables listed previously.

TABLE 11.1
Crosstabulation of dvisits by prescrib

                        prescrib
dvisits     0    1    2    3    4    5    6    7    8
      0  2789  726  307  171   76   32   16   15    9
      1   224  212  148   85   50   35   13    5    9
      2    49   34   38   11   23    7    5    3    4
      3     8   10    6    2    1    1    2    0    0
      4     8    8    2    2    3    1    0    0    0
      5     3    3    2    0    1    0    0    0    0
      6     2    0    1    3    1    2    1    0    2
      7     1    0    3    2    1    2    1    0    2
      8     1    1    1    0    1    0    1    0    0
      9     0    0    0    0    0    0    0    0    1

The linear predictor of the bivariate Poisson model takes the form:
\[
\theta_{ij}^r = \gamma_{00}^r + \sum_{p=1}^{P^r} \gamma_{p0}^r x_{pij}^r + \sum_{q=1}^{Q^r} \gamma_{0q}^r z_{qj}^r + u_{0j}^r .
\]
The parameters of this model are $\gamma = \left(\gamma^1, \gamma^2\right)$, where $\gamma^r$ represents the parameters of the $r$th linear predictor, plus the two variances $\sigma_{u0}^1$ and $\sigma_{u0}^2$ of the random intercepts $u_{0j}^1$, $u_{0j}^2$ and their correlation, which is denoted by $\rho_{12}$. In the health care example, $i = 1$ for both responses as we only observe one dvisits response and one prescrib response for each individual, in which case $\sigma_{u0}^1$, $\sigma_{u0}^2$ and $\rho_{12}$ can be identified. These three parameters are not always identifiable, as we shall see in Section 11.5.
The SabreR command required to fit the bivariate Poisson model is:
sabre.model <- sabre(dvisits~sex+age+agesq+
income+levyplus+freepoor+freerepa+illness+
actdays+hscore+chcond1+chcond2+1,
prescrib~sex+age+agesq+income+levyplus+
freepoor+freerepa+illness+actdays+hscore+
chcond1+chcond2+1,case=id,
first.family="poisson",
second.family="poisson")
This command produces the following output:
Log likelihood = -8551.2209
on 10351 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
________________________________________________________
r1 -2.6694 0.24673 -10.819
r1_sex 0.27506 0.73571E-01 3.7387
r1_age -0.96132 1.3337 -0.72079
r1_agesq 1.4568 1.4522 1.0032
r1_income -0.11897 0.11257 -1.0568
r1_levyplus 0.15202 0.89966E-01 1.6898
r1_freepoor -0.62151 0.23768 -2.6149
r1_freerepa 0.17419 0.12109 1.4385
r1_illness 0.22347 0.25097E-01 8.9045
r1_actdays 0.13872 0.81816E-02 16.955
r1_hscore 0.39132E-01 0.14129E-01 2.7697
r1_chcond1 0.15663 0.83179E-01 1.8830
r1_chcond2 0.26404 0.10820 2.4402
r2 -2.9069 0.15064 -19.297
r2_sex 0.57019 0.43558E-01 13.090
r2_age 2.0381 0.74431 2.7382
r2_agesq -0.19637 0.79300 -0.24764
r2_income 0.32556E-01 0.65766E-01 0.49502
r2_levyplus 0.27330 0.58470E-01 4.6742
r2_freepoor -0.91061E-01 0.13849 -0.65754
r2_freerepa 0.29736 0.69972E-01 4.2497
r2_illness 0.21674 0.13479E-01 16.080
r2_actdays 0.40222E-01 0.50644E-02 7.9421
r2_hscore 0.21171E-01 0.81907E-02 2.5848
r2_chcond1 0.77259 0.51285E-01 15.065
r2_chcond2 1.0204 0.63007E-01 16.195
scale1 0.99674 0.43107E-01 23.123
scale2 0.56067 0.26892E-01 20.849
corr 0.83217 0.52118E-01 15.967
These results show that there is significant overdispersion in both
the responses. The first response, dvisits, has scale1 parameter equal
to 0.99674, with corresponding standard error (s.e.) of 0.043107. The
second response, prescrib, has scale2 parameter equal to 0.56067 (s.e.
0.026892). The two responses are highly positively correlated, with the
corr parameter equal to 0.83217 (s.e. 0.052118). Had we not been in-
terested in estimating the correlation between responses, we could have
performed separate analyses of each response. This would be a legitimate
strategy to adopt in this case, as there are no simultaneous direct effects,
for example, dvisits on prescrib, to incorporate into this model. The
assumption of no simultaneous direct effects also applies to the gender
roles example to be discussed in the next section.
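As a quick check on how the output is read, the Z-scores reported by SabreR are simply the estimates divided by their standard errors; for the overdispersion (scale) and correlation parameters quoted above:

```r
# Z-score = estimate / standard error, using values from the output above
0.99674 / 0.043107   # scale1: about 23.12
0.56067 / 0.026892   # scale2: about 20.85
0.83217 / 0.052118   # corr:   about 15.97
```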
In Section 11.5, we return to the wages/trade union membership ex-
ample and extend the modelling framework in two distinct ways. First,
we will model two responses of two different types: wages will be treated
as continuous/interval scale, while trade union membership will be re-
garded as a binary variable. Second, we will assume a simultaneous direct
effect between responses: we will assume that trade union membership
has a direct effect on wages.
11.4 Bivariate ordered response model: example
Example 11.4.1. Attitudes to gender roles
The analysis in this section is based on four waves of the biennial 'Living in Britain' supplementary survey to the British Household Panel Study (BHPS). The data relate to 1991 (wave 1), 1993 (wave 3), 1995 (wave 5) and 1997 (wave 7), and consist of attitudinal responses to two statements concerning gender roles. The outcomes are ordinal categorical and thus lead to
a series of ordered response models. We estimate separate univariate
models for the two survey questions and then a bivariate random effects
model to take account of the correlation between the pairs of responses.
The first statement (opfama) is ‘pre-school child suffers if mother
works’ and the second (opfamf) is ‘husband should earn, wife stay at
home’. Both attitudinal responses are given on the scale: 1 = strongly
agree, 2 = agree, 3 = neither agree nor disagree, 4 = disagree, 5 =
strongly disagree. In both cases, agreement with the statement is taken
to indicate that the respondent holds a traditionalistic rather than an
egalitarian viewpoint in regard to gender roles.
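For reference, the univariate two-level ordered response models fitted below take the cumulative (ordered logit) form, a sketch consistent with the cut1–cut4 and scale parameters in the output, where $F$ denotes the logistic distribution function:
\[
\Pr\left(y_{ij} \le k \mid u_{0j}\right) = F\left(\mathrm{cut}_k - \theta_{ij}\right), \qquad k = 1, \ldots, 4,
\]
where $\theta_{ij}$ contains the explanatory variables and the random intercept $u_{0j}$, whose standard deviation corresponds to the scale parameter.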
The data are in the form of longitudinal panels, with each individual
observed at wave 1 and then again in subsequent waves up to a maximum
of four repeated observations. The panels are subject to dropout, but
there are no new entrants to the survey after the first wave, although
dropouts themselves may return to the survey at later waves. For the
dataset opfama.tab, there are 9,220 individuals at wave 1 and this figure
reduces due to dropout by 2,118 at wave 3, another 511 at wave 5 and
a further 282 at wave 7, giving 6,309 individuals in wave 7, 5,631 of whom are observed at all four waves. This results in a total of 29,222
observations (rows) in opfama.tab. There are 29,238 observations on
9,245 individuals in opfamf.tab.
We use a subset of the data with the following variables: pid: cross-
wave person identifier; opfam: response variable (opfama in opfama.tab,
opfamf in opfamf.tab); cbornnuk: country of birth: 1 = non-UK, 0 =
UK; csex: 1 = female, 0 = male; hhch12: whether there are children
under 12 in the household (1) or not (0); wave: BHPS wave; age: age on
1st December; mastat: marital status; qfedhi: highest educational qual-
ification; vote: political party supported; region: region/metropolitan
area; jbft: employment status. Note that not all variables are used in
the model specifications below.
The SabreR command required to fit a univariate two-level ordered
response model to the data opfama.tab is:
sabre.model.opfama<-sabre(A$opfam~factor(A$wave)+
factor(A$age)+factor(A$csex)+factor(A$jbft),
case=list(A$pid),first.mass=16,
first.family="ordered",adaptive.quad=TRUE)
This results in the following output:
Log likelihood = -36851.516
on 29207 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
___________________________________________________
factor(awave)3 0.37563 0.32377E-01 11.602
factor(awave)5 0.50493 0.33475E-01 15.084
factor(awave)7 0.55861 0.34529E-01 16.178
factor(aage)2 -0.45519 0.54786E-01 -8.3086
factor(aage)3 -1.1163 0.63887E-01 -17.473
factor(aage)4 -1.5433 0.71076E-01 -21.713
factor(aage)5 -1.8928 0.73412E-01 -25.784
acsex 1.0302 0.52870E-01 19.486
factor(ajbft)2 -0.22437 0.53907E-01 -4.1621
factor(ajbft)3 -0.48431 0.46795E-01 -10.350
cut1 -3.6507 0.62241E-01 -58.655
cut2 -0.63412 0.56767E-01 -11.171
cut3 1.4665 0.57780E-01 25.381
cut4 4.6017 0.69264E-01 66.437
scale 2.1037 0.27489E-01 76.531
The respective set of results for opfamf.tab is:
Log likelihood = -35086.832
on 29220 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
___________________________________________________
factor(fwave)3 0.20734 0.33054E-01 6.2726
factor(fwave)5 0.23744 0.34312E-01 6.9199
factor(fwave)7 0.28587 0.35380E-01 8.0800
factor(fage)2 -0.59426 0.57257E-01 -10.379
factor(fage)3 -1.2173 0.66690E-01 -18.253
factor(fage)4 -1.9452 0.74872E-01 -25.980
factor(fage)5 -2.9841 0.79333E-01 -37.614
acsex 1.0190 0.53950E-01 18.888
factor(fqfedhi)2 -0.35973 0.74736E-01 -4.8133
factor(fqfedhi)3 -0.62491 0.62477E-01 -10.002
factor(fqfedhi)4 -1.4499 0.66998E-01 -21.641
factor(fjbft)2 -0.23605 0.55734E-01 -4.2352
factor(fjbft)3 -0.64694 0.48270E-01 -13.402
cut1 -6.4992 0.82266E-01 -79.001
cut2 -3.9107 0.72972E-01 -53.592
cut3 -1.6567 0.69284E-01 -23.912
cut4 1.6796 0.70050E-01 23.977
scale 2.1187 0.28561E-01 74.180
For both responses, all of the age group regression coefficients are
increasingly negative and highly significant, showing that individuals be-
come more traditionalistic as they get older. Females are more egalitarian
than males. Unemployed people (jbft = 3) are seen to be more tradi-
tionalistic than those who work full time (jbft = 1). For the opfamf
response only, decreasing levels of educational qualifications lead to
strengthening traditionalistic views compared to the baseline group con-
sisting of those respondents with a degree or equivalent.
The random effects models for both responses give significantly bet-
ter model fits than their homogeneous (or standard) counterparts, the
regression coefficients for which are not presented here. There are large
increases in the log likelihoods: from −41018.398 in the homogeneous
model to −36851.516 in the random effects model of opfama; from
−39038.412 to −35086.832 for opfamf. High levels of unobserved het-
erogeneity are accounted for by scale parameters with values of above
2 in each model.
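These comparisons can be expressed as likelihood-ratio statistics from the log likelihoods quoted above (strictly, the null value of the random effect variance lies on the boundary of the parameter space, so the nominal chi-squared reference is conservative, but the conclusion is unaffected here):

```r
# LR statistics: random effects model vs. homogeneous model,
# using the log likelihoods quoted in the text
lr_opfama <- 2 * ((-36851.516) - (-41018.398))
lr_opfamf <- 2 * ((-35086.832) - (-39038.412))
c(lr_opfama, lr_opfamf)  # 8333.764 and 7903.160
```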
In order to fit the bivariate ordered response model, we construct
a dataset opfamaf.tab which comprises 58,460 observations on 9,301
cases. In opfamaf.tab, the variables from the univariate datasets are
augmented by the following: r: a response indicator which takes the
value 1 for opfama and takes value 2 for opfamf; y: the response variable
containing the values of opfama and opfamf; r1: a dummy variable for
r = 1; r2: a dummy variable for r = 2.
The SabreR command needed to fit the bivariate mixed ordered re-
sponse model is:
sabre.model.opfamaf<-sabre(A$opfam~factor(A$wave)+
factor(A$age)+factor(A$csex)+factor(A$jbft),
F$opfam~factor(F$wave)+factor(F$age)+
factor(F$csex)+factor(F$qfedhi)+factor(F$jbft),
case=list(A$pid,F$pid),first.mass=6,
second.mass=6,first.family="ordered",
second.family="ordered",adaptive.quad=TRUE)
This results in the following output:
Log likelihood = -70608.220
on 58426 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
_____________________________________________________
factor(awave)3.1 0.35648 0.33515E-01 10.637
factor(awave)5.1 0.48041 0.34283E-01 14.013
factor(awave)7.1 0.53046 0.33930E-01 15.634
factor(aage)2.1 -0.38393 0.50949E-01 -7.5356
factor(aage)3.1 -1.0592 0.59363E-01 -17.843
factor(aage)4.1 -1.5009 0.67753E-01 -22.152
factor(aage)5.1 -1.7976 0.73179E-01 -24.565
acsex.1 1.0058 0.52864E-01 19.026
factor(ajbft)2.1 -0.17462 0.49885E-01 -3.5005
factor(ajbft)3.1 -0.38681 0.42581E-01 -9.0841
factor(fwave)3.2 0.20874 0.34141E-01 6.1142
factor(fwave)5.2 0.24250 0.34633E-01 7.0020
factor(fwave)7.2 0.29099 0.35626E-01 8.1679
factor(fage)2.2 -0.62801 0.52526E-01 -11.956
factor(fage)3.2 -1.1596 0.60483E-01 -19.172
factor(fage)4.2 -1.8826 0.69342E-01 -27.149
factor(fage)5.2 -2.9566 0.74006E-01 -39.951
fcsex.2 0.99741 0.53426E-01 18.669
factor(fqfedhi)2.2 -0.32024 0.66065E-01 -4.8474
factor(fqfedhi)3.2 -0.56950 0.54667E-01 -10.418
factor(fqfedhi)4.2 -1.3623 0.58145E-01 -23.429
factor(fjbft)2.2 -0.22288 0.52202E-01 -4.2695
factor(fjbft)3.2 -0.60084 0.44188E-01 -13.597
cut1_1 -3.5610 0.60733E-01 -58.633
cut1_2 -0.55163 0.56872E-01 -9.6995
cut1_3 1.5464 0.57702E-01 26.800
cut1_4 4.6976 0.65424E-01 71.803
cut2_1 -6.4174 0.72106E-01 -89.001
cut2_2 -3.8298 0.66726E-01 -57.397
cut2_3 -1.5766 0.64420E-01 -24.473
cut2_4 1.7582 0.64680E-01 27.183
scale1 2.1143 0.24988E-01 84.611
scale2 2.1249 0.25088E-01 84.699
corr 0.67196 0.84997E-02 79.057
This bivariate random effects ordered response model is an amalga-
mation of the two separate univariate models for the opfama and opfamf
responses. This model also handles any potential association that exists
between the pair of outcomes by allowing the individual random effects
to be correlated. The log likelihood for this correlated joint model is sig-
nificantly larger than the combined log likelihood for the two univariate
models: −70608.3 compared to (−36851.5) + (−35086.8) = −71938.3.
The parameter estimates on the explanatory variables are consistent
with those from the univariate models, with the same set of individ-
ual characteristics having the strongest effects on a person’s attitude to
gender roles, namely age, gender, employment status and educational
qualifications.
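The comparison of log likelihoods corresponds to a likelihood-ratio test of zero correlation between the random effects ($\rho_{12} = 0$), a single additional parameter:

```r
# LR test: bivariate model vs. the two univariate models combined
ll_joint    <- -70608.220                   # bivariate model
ll_separate <- (-36851.516) + (-35086.832)  # sum of univariate models
2 * (ll_joint - ll_separate)  # about 2660.3 on 1 degree of freedom
```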
The levels of unobserved heterogeneity in the two response panels are
virtually unchanged, with values of about 2.1 for both scale parameters.
The correlation of 0.67 between the two random effects is positive, large
and highly significant, suggesting that individual responses to the two
statements concerning gender roles are strongly positively correlated.
This is as expected, since someone in agreement/disagreement with the
first statement would more than likely agree/disagree with the second.
A traditionalistic attitude would tend to make a person agree with both
statements, while someone with a more egalitarian viewpoint would be
more inclined to disagree with each of the statements. It is only by fitting
such correlated bivariate models that we are able to quantify the levels
of association between pairs of ordered responses, whilst controlling for
the effects of explanatory variables.
11.5 Bivariate linear-probit model: example
Example 11.5.1. Wages and trade union membership
We now illustrate the application of a bivariate two-level general-
ized linear model with different link functions. The datasets we use here
(nls.tab and nlswage-union.tab) are versions of the National Lon-
gitudinal Study of Youth (NLSY) data. The data are for young women
who were aged between 14 and 26 in 1968. The women were surveyed
each year from 1970 to 1988, except for the years 1974, 1976, 1979, 1981,
1984 and 1986. We have removed records with missing values on one or
more of the response variables and the explanatory variables we want to
use in our analysis of the joint determinants of wages and trade union
membership.
The response variables are defined as follows: ln_wage is ln(wage/GNP deflator) in a particular year, and union takes the value 1 if a woman is a
member of a trade union, and equals 0 otherwise. The explanatory vari-
ables include demographics such as ethnicity (more specifically, whether
a woman is black or not, black), and linear (age) and quadratic (age2)
terms in respondent’s age. Respondent’s marital status is represented by
the variable msp, which takes the value 1 if a woman is married and the
spouse is present, and which equals 0 otherwise. Respondent’s education
is represented by the number of years of schooling completed (grade),
which takes values between 0 and 18. Region in which a respondent lives
is classified into three categories represented by two dummy variables:
whether a woman is living outside a standard metropolitan statistical
area (smsa) (not smsa) or is living in the South (south). Respondent’s
work history is summarized in terms of the number of years of job tenure,
which takes values between 0 and 26.
There are 4,132 women (idcode) with between one year and twelve
years of observations being in waged employment (that is, not in full-time education) and earning more than $1 per hour but less than $700 per hour.
FIGURE 11.1
The relationship between wages and trade union membership: I

Figure 11.1 shows the dependence between trade union membership ($y_{ij}^{u}$) and wages ($y_{ij}^{w}$). There are no random effects affecting either wages or trade union membership. The binary response variable, trade union membership, $y_{ij}^{u} = 1, 0$, is based on the latent variable $y_{ij}^{u*}$. This model can be estimated using any software that estimates standard generalized linear models.
FIGURE 11.2
The relationship between wages and trade union membership: II

Figure 11.2 also shows the dependence between trade union membership and wages. This time, there are random effects affecting both wages and trade union membership. However, the random effects $u_{0j}^{u}$ and $u_{0j}^{w}$ are independent, with variances $\sigma_u^2$ and $\sigma_w^2$ respectively. This model can be estimated using any software that estimates multi-level generalized linear (mixed) models by treating the wage and trade union models as independent.
FIGURE 11.3
The relationship between wages and trade union membership: III

Figure 11.3 once more shows the dependence between trade union membership and wages. This time, there is a correlation $\rho_{uw}$ between the random effects affecting trade union membership and wages, $u_{0j}^{u}$ and $u_{0j}^{w}$ respectively. This model can be estimated in SabreR as a bivariate generalized linear mixed model by allowing for a correlation between
trade union membership and wage response variables at each wave i
of the panel. How do the results change as the model becomes more
comprehensive, especially with regard to the direct effect of trade union
membership on wages?
The character string union clashes with the name of an existing R function, so it is best avoided as a variable label. To circumvent this problem, the variable label union is changed to tunion. We then take the variables ln_wage
(with identity link and Gaussian distribution) and tunion (with probit
link and binomial distribution) to be the responses and model them with
a random intercept and the set of explanatory variables outlined above.
Besides allowing for the overdispersion in ln_wage and tunion, and the correlation between them, the ln_wage equation contains tunion as an explanatory variable. We start by estimating separate two-level models on the sequences of ln_wage and tunion from the dataset nls.tab, then we estimate the bivariate model.
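Schematically, in the chapter's notation (a sketch only; $\delta$ is our label for the coefficient on tunion in the wage equation), the bivariate model combines a linear wage equation and a probit latent equation for membership:
\[
y_{ij}^{w} = \theta_{ij}^{w} + \delta\, y_{ij}^{u} + \varepsilon_{ij}^{w}, \qquad
y_{ij}^{u*} = \theta_{ij}^{u} + \varepsilon_{ij}^{u}, \qquad
y_{ij}^{u} = 1 \text{ if } y_{ij}^{u*} > 0 \text{ and } 0 \text{ otherwise},
\]
where each linear predictor $\theta_{ij}^{r}$ contains its random intercept $u_{0j}^{r}$ as in Section 11.2, with $corr\left(u_{0j}^{w}, u_{0j}^{u}\right) = \rho_{uw}$.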
The SabreR commands required to fit the two separate two-level
models are:
sabre.model.1 <- sabre(ln_wage~black+msp+
grade+not_smsa+south+tunion+tenure+1,
case=idcode,first.family="gaussian")
sabre.model.2 <- sabre(tunion~age+age2+black+
msp+grade+not_smsa+south+1,case=idcode,
first.link="probit")
These commands result in the following two sets of output:
Log likelihood = -4892.5205
on 18985 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
_____________________________________________________
cons 0.75217 0.26994E-01 27.865
black -0.70564E-01 0.12656E-01 -5.5756
msp -0.12989E-02 0.59885E-02 -0.21690
grade 0.72967E-01 0.19959E-02 36.558
not_smsa -0.14528 0.88414E-02 -16.432
south -0.73888E-01 0.89322E-02 -8.2721
union 0.11024 0.65211E-02 16.905
tenure 0.28481E-01 0.64979E-03 43.831
sigma 0.26176 0.15024E-02 174.23
scale 0.27339 0.35702E-02 76.575
Log likelihood = -7647.0998
on 18986 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
_____________________________________________________
cons -2.5916 0.38587 -6.7163
age 0.22417E-01 0.23566E-01 0.95122
age2 -0.22314E-03 0.37641E-03 -0.59280
black 0.82324 0.68871E-01 11.953
msp -0.71011E-01 0.40905E-01 -1.7360
grade 0.69085E-01 0.12453E-01 5.5479
not_smsa -0.13402 0.59397E-01 -2.2563
south -0.75488 0.58043E-01 -13.005
scale 1.4571 0.35516E-01 41.026
The SabreR command needed to fit the bivariate model is
sabre.model.3 <- sabre(ln_wage~black+msp+
grade+not_smsa+south+tunion+tenure+1,
tunion~age+age2+black+msp+grade+not_smsa+
south+1,case=idcode,first.family="gaussian",
second.link="probit")
This command produces the output:
Log likelihood = -12529.120
on 37970 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
______________________________________________________
r1 0.75162 0.26753E-01 28.095
r1_black -0.69805E-01 0.12511E-01 -5.5795
r1_msp -0.14237E-02 0.59871E-02 -0.23780
r1_grade 0.73275E-01 0.19736E-02 37.127
r1_not_smsa -0.14524 0.88679E-02 -16.378
r1_south -0.74533E-01 0.89063E-02 -8.3685
r1_union 0.96328E-01 0.70837E-02 13.599
r1_tenure 0.28328E-01 0.65261E-03 43.407
r2 -2.5481 0.38382 -6.6387
r2_age 0.20406E-01 0.23558E-01 0.86618
r2_age2 -0.18467E-03 0.37617E-03 -0.49092
r2_black 0.84621 0.69172E-01 12.233
r2_msp -0.64955E-01 0.41090E-01 -1.5808
r2_grade 0.64562E-01 0.12164E-01 5.3076
r2_not_smsa -0.10254 0.58471E-01 -1.7537
r2_south -0.73260 0.56972E-01 -12.859
sigma1 0.26170 0.15009E-02 174.36
scale1 0.27466 0.36213E-02 75.845
scale2 1.4765 0.37284E-01 39.601
corr 0.11927 0.24144E-01 4.9399
The results of fitting these models demonstrate the different levels of overdispersion in the different responses and a significant positive correlation between the random intercepts. The effect of trade union membership in the wage equation changes from 0.11024 (in the model which allows for the overdispersion of the different responses but not the correlation between them) to 0.09633 (in the model which also accounts for the correlation between responses). This suggests that trade union membership is mildly endogenous in the wage equation.
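As in Section 11.4, the gain from the joint model can be summarized by a likelihood-ratio test of zero correlation between the random intercepts, using the log likelihoods from the three outputs above:

```r
# LR test: bivariate linear-probit model vs. the two separate models
ll_joint    <- -12529.120                    # bivariate model
ll_separate <- (-4892.5205) + (-7647.0998)   # separate two-level models
2 * (ll_joint - ll_separate)  # about 21.0 on 1 degree of freedom
```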
11.6 Multivariate two-level generalized linear model
likelihood
We can write the likelihood associated with the multivariate two-level generalized linear model in a general form:
\[
L\left(\gamma, \phi, \Sigma_{u0} \mid y, x, z\right) = \prod_j \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \prod_i \prod_r g^r\left(y_{ij}^r \mid \theta_{ij}^r, \phi^r\right) f\left(u_{0j}\right) du_{0j},
\]
where $\gamma = \left(\gamma^1, \gamma^2, ..., \gamma^R\right)$, $\gamma^r$ contains the regression coefficients of the linear predictor $\theta_{ij}^r$, the scale parameters are $\phi = \left[\phi^1, \phi^2, ..., \phi^R\right]$, and $f\left(u_{0j}\right)$ is a multivariate normal density of dimension $R$ with mean zero and variance-covariance structure $\Sigma_{u0}$.

SabreR evaluates the integral in $L\left(\gamma, \phi, \Sigma_{u0} \mid y, x, z\right)$ in up to three dimensions using standard Gaussian quadrature or adaptive Gaussian quadrature (numerical integration).
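Sabre's quadrature routines are internal, but the idea behind (non-adaptive) Gaussian quadrature in one random effect dimension can be sketched in base R. The helper names gauss_hermite and quad_mean below are illustrative, not part of SabreR; the nodes and weights come from the standard Golub–Welsch eigenvalue construction:

```r
# Illustration only (not Sabre's implementation): standard Gauss-Hermite
# nodes and weights via the Golub-Welsch eigenvalue method.
gauss_hermite <- function(n) {
  i <- seq_len(n - 1)
  J <- matrix(0, n, n)            # symmetric tridiagonal Jacobi matrix
  J[cbind(i, i + 1)] <- sqrt(i / 2)
  J[cbind(i + 1, i)] <- sqrt(i / 2)
  e <- eigen(J, symmetric = TRUE)
  list(nodes   = e$values,                    # quadrature abscissae
       weights = sqrt(pi) * e$vectors[1, ]^2) # weights sum to sqrt(pi)
}

# Approximate E[h(u)] for u ~ N(0, sigma^2):
#   E[h(u)] ~ sum_k (w_k / sqrt(pi)) h(sqrt(2) * sigma * x_k)
quad_mean <- function(h, sigma, n = 16) {
  q <- gauss_hermite(n)
  sum(q$weights / sqrt(pi) * h(sqrt(2) * sigma * q$nodes))
}

quad_mean(exp, sigma = 1)  # close to the exact E[exp(u)] = exp(0.5)
```

Adaptive quadrature refines this by centring and scaling the nodes around the mode of each case's integrand, which typically achieves the same accuracy with far fewer mass points.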
11.7 Exercises using multivariate generalized linear
mixed models
Exercise 11.7.1. Bivariate data of the same type: two continu-
ous/interval scale responses: expiratory flow rates
Bland and Altman [18] reported on a study to compare the Stan-
dard Wright peak flow meter with the (then) new Mini Wright peak
flow meter. In this study, 17 volunteers had their peak expiratory flow
rate (PEFR) measured on a total of four occasions: twice using the Stan-
dard Wright peak flow meter and twice using the new Mini Wright peak
flow meter. To avoid instrument effects being confounded with prior ex-
perience effects, the instruments were used in a random order.
The dataset to be analyzed in this exercise comprises four variables:
(i) the person identifier (id); (ii) the occasion number (occasion) which
takes values 1 and 2; (iii) the PEFR recorded using the Standard Wright
meter (wp); (iv) the PEFR recorded using the Mini Wright meter (wm).
The data are stored in the data file pefr.tab:
1. Use SabreR to estimate a linear model for the response wp with
occasion 2 (occ2) as a binary indicator variable and an id random
effect. Is occ2 significant? Are the person-specific random effects
(id) significant? Use adaptive quadrature with mass 12 and set
the starting value for scale to 110.
2. Estimate a linear model for the response wm with occasion 2 (occ2)
as a binary indicator variable and an id random effect. Is occ2
significant? Are the person-specific random effects (id) significant?
Use adaptive quadrature with mass 12 and set the starting value
for scale to 100.
3. Estimate a joint model for wp and wm with occ2 as a binary indica-
tor variable in both linear predictors, and use adaptive quadrature
with 12 mass points for both dimensions. As this is a very small
dataset, the likelihood is not well defined. Use the following start-
ing values: 0.9 for rho, 20 for both values of sigma, 110 for the
first scale parameter and 110 for the second. How significant is
the correlation between the random effects of each type of meter?
How does the significance of the occ2 effect change, relative to that
obtained in questions 1 and 2?
4. On the basis of these results, would you be prepared to replace the
Standard Wright flow meter with the new Mini Wright meter?
Exercise 11.7.2. Bivariate data of different types: one continu-
ous/interval scale response (wages) and one binary response (trade union
membership)
We have previously used data on 545 males from the Youth Sample
of the US National Longitudinal Survey. In Exercises 2.8.5 and 5.16.5,
we related log hourly wage to a time-invariant factor (ethnicity) and a
variety of time-dependent explanatory variables. In Exercises 3.4.2 and
6.7.2, we modelled trade union membership.
In the current exercise, we use the same dataset (wagepan.tab) and
start by re-estimating separate models for log hourly wage and for trade
union membership. We then estimate a joint model which allows trade
union membership to be endogenous in the wage equation:
1. Use SabreR to estimate a linear model which relates lwage (log
of hourly wage) to the explanatory variables (educ, black, hisp,
exper, expersq, married and tunion), with the respondent iden-
tifier (nr) as the random effect. Use adaptive quadrature with 12
mass points. Is this random effect significant?
2. Estimate a logit model for trade union membership (tunion), with
the explanatory variables (black, hisp, exper, educ, poorhlth,
married, rur, nrthcen, nrtheast and south). Use adaptive
quadrature with 64 mass points. Use case nr (respondent iden-
tifier). Is this random effect significant?
3. Using the model specifications for log wages and trade union mem-
bership you have just used, estimate a joint model of the deter-
minants of log wages and trade union membership. Use adaptive
quadrature, with mass 4 for the linear model and mass 64 for the
binary response.
4. What is the magnitude and significance of the correlation between
the random effects for log wages and union membership? How do
the magnitude and significance of the direct effect of tunion in the
wage equation change? What are the reasons for this? Have any
other features of the models changed? What does this imply?
12
Models for duration and event history data
12.1 Introduction
An important type of discrete data occurs when we model the duration to
some pre-specified event. Examples of such data include: the duration in
unemployment from the start of a spell of unemployment until the start
of work, the time between shopping trips or the time to first marriage.
This type of discrete data has several important features such as left and
right censoring, time-varying explanatory variables and competing risks.
We will examine each of these features in the following sub-sections.
12.1.1 Left censoring
The start of the observation period may cut into an ongoing spell, for
example, a period of unemployment. When this happens, we say that
the duration is left censored. We will assume throughout this chapter
that left censoring is non-informative for event history models; in other
words, that the likelihood of an event occurring in a given interval is the
same whether or not the spell was already in progress when observation began.
12.1.2 Right censoring
Complete durations or times to the event of interest are often not ob-
served for all the sampled subjects or individuals. This often happens
because the event of interest had not happened by the end of the ob-
servation window. When this happens, we say that the spell is right
censored. This feature is represented in Figure 12.1. In Figure 12.1, the
single pre-specified event has occurred for cases 1, 2 and 3. However, the
event has not happened to case 4 during the period of observation.
Case 1: |-------------------------------------------x (event)
Case 2: |--------------------------------x (event)
Case 3: |----------------------------------------x (event)
Case 4: |---------------------------------------------------|- - -?
time
FIGURE 12.1
Duration data
12.1.3 Time-varying explanatory variables
The temporal scale of most social processes is so large (months or years)
that it is inappropriate to assume that all explanatory variables remain
constant. For example, in a spell of unemployment, the local labour mar-
ket conditions, including the unemployment rate, will vary on a monthly
basis as the national economic climate changes. Other explanatory vari-
ables, like the subject’s age, change automatically with time.
12.1.4 Competing risks
The durations or spells can be of different types. For example, the dura-
tion that a household spends in rented accommodation until they move
to another rented property could have different characteristics to the du-
ration of a household living in rented accommodation until they become
owner occupiers. This type of data can be modelled using competing risk
models. The theory of competing risks provides a structure for inference
in problems where subjects are exposed to several types of failure. Com-
peting risk models are used in many fields, for example, in the prepa-
ration of life tables for biological populations and in the reliability and
safety of engineering systems. We will examine competing risk models
in more detail in Section 12.4.
12.2 Duration data in discrete time
There is a large body of literature on the modelling of duration data,
as they are termed in a social science context, otherwise known as survival
data in medicine and as failure time data in reliability. In the social
sciences, a duration is typically observed over a sequence of intervals, for
example, over a number of weeks or months, so we are going to focus
on discrete-time models in this chapter. We do not reduce our modelling
options by considering our data in discrete time, as durations measured
at finer intervals of time such as days, hours, or even seconds can also
be written down as a sequence of intervals. We can also group the data
by using larger intervals (such as weeks or months) than those at which
the durations are measured.
12.2.1 Single-level models for duration data
Suppose we have a binary indicator yij for individual j, which takes
the value 1 if the spell ends in a particular interval i and 0 otherwise.
Individual j’s duration can be viewed as a series of events over consecu-
tive time periods (i = 1, 2, ..., Tj ) which can be represented by a binary
sequence:
yj = y1j , y2j , ..., yTj j .
If we only observe a single spell for each subject, this would be a
sequence of 0s, which would end with a 1 if the spell is complete and
with a 0 if it is right censored. We can apply the two-level binary response
model (see Chapter 6): the probability that yij = 1 for individual j at
interval i, given that yi′ j = 0, ∀i′ < i, is given by:
Pr(yij = 1 | θij) = 1 − F(θij) = µij.
But, instead of using the logit link or the probit link, as we did in Chapter
6, we use the complementary log log link, which gives:
µij = 1 − exp[−exp(θij)].
This model was derived by Prentice and Gloeckler [89]. The linear pre-
dictor takes the form:
θij = β0j + Σp βpj xpij + ki,
where the ki are interval-specific constants, and the xpij are explanatory
Subject identifier Duration Censored
j Tj (1=No, 0=Yes)
1 4 1
2 3 0
3 1 1
TABLE 12.1
Sample of duration data in continuous time
Subject Interval Response Interval-specific constants
identifier j i yij k1 k2 k3 k4
1 1 0 1 0 0 0
1 2 0 0 1 0 0
1 3 0 0 0 1 0
1 4 1 0 0 0 1
2 1 0 1 0 0 0
2 2 0 0 1 0 0
2 3 0 0 0 1 0
3 1 1 1 0 0 0
TABLE 12.2
Sample of duration data, reconfigured in discrete time
variables describing individual and contextual characteristics as before.
In the current context, the ki may be interpreted in terms of the inte-
grated baseline hazard. The ki are given by:
ki = log{Λ0(ti) − Λ0(ti−1)},
where the Λ0 (ti−1 ) and Λ0 (ti ) are the values of the integrated baseline
hazard at the start and at the end of the i-th interval.
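This expression for ki arises from grouping a continuous-time proportional hazards model into intervals. A sketch of the standard argument (following Prentice and Gloeckler, with S the survivor function and x′β the regression part of the linear predictor):

```latex
% Continuous-time proportional hazards survivor function:
S(t \mid x) = \exp\{-\Lambda_0(t)\, e^{x'\beta}\}
% Probability the spell ends in interval i = (t_{i-1}, t_i],
% given survival to t_{i-1}:
\mu_i = 1 - \frac{S(t_i \mid x)}{S(t_{i-1} \mid x)}
      = 1 - \exp\!\left[-\exp\!\left(x'\beta + k_i\right)\right],
\qquad k_i = \log\{\Lambda_0(t_i) - \Lambda_0(t_{i-1})\}.
```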
To help clarify the notation, we give an example of what the data
structure would look like for three spells (without explanatory variables).
Suppose we had the data as shown in Table 12.1. In Table 12.1, subjects
1 and 3 have their event occurring in intervals 4 and 1 respectively,
while subject 2 has a spell of length 3, which is right censored. The data
structure we need to model these duration data in discrete time is given
in Table 12.2.
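The reconfiguration of Table 12.1 into Table 12.2 is purely mechanical. The following sketch (in Python rather than R, with a hypothetical helper name; not part of SabreR) shows one way it could be automated:

```python
def expand_duration(spells):
    """Expand (subject, duration, event) records into discrete-time rows.

    Each spell of length T_j becomes T_j binary rows: all 0s, with a final 1
    only if the spell ended in an event (event == 0 means right censored,
    matching the 1=No / 0=Yes censoring code of Table 12.1).
    """
    rows = []
    for subject, duration, event in spells:
        for i in range(1, duration + 1):
            y = 1 if (i == duration and event == 1) else 0
            # interval-specific dummies k1..k4, as in Table 12.2
            k = [1 if i == d else 0 for d in range(1, 5)]
            rows.append((subject, i, y, *k))
    return rows

# Table 12.1: subject, T_j, censoring indicator (1 = event observed)
table_12_1 = [(1, 4, 1), (2, 3, 0), (3, 1, 1)]
table_12_2 = expand_duration(table_12_1)
```

Running this on the three spells of Table 12.1 reproduces the eight rows of Table 12.2.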
To identify the model, we need to fix the constant at zero or remove
one of the interval-specific constants ki . We often fix the constant at
zero. The likelihood of a subject that is right censored at the end of the
Tj -th interval is:
Π(i=1 to Tj) (1 − µij) = Π(i=1 to Tj) µij^yij (1 − µij)^(1 − yij),
where yTj j = 0, while that of a subject whose spell ends without a
censoring in the Tj -th interval is:
µTj,j Π(i=1 to Tj − 1) (1 − µij) = Π(i=1 to Tj) µij^yij (1 − µij)^(1 − yij),
as yTj j = 1.
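To make these two spell contributions concrete, the sketch below (Python, with purely illustrative linear predictor values) evaluates both under the complementary log log link:

```python
import math

def cloglog_mu(theta):
    # inverse complementary log log link: Pr(spell ends in the interval)
    return 1.0 - math.exp(-math.exp(theta))

def spell_likelihood(y_seq, theta_seq):
    # Bernoulli product  prod_i mu_ij^y_ij (1 - mu_ij)^(1 - y_ij)  for one spell
    L = 1.0
    for y, theta in zip(y_seq, theta_seq):
        mu = cloglog_mu(theta)
        L *= mu ** y * (1.0 - mu) ** (1 - y)
    return L

theta = [-2.0, -2.0, -2.0]                     # illustrative linear predictors
censored = spell_likelihood([0, 0, 0], theta)  # right censored at T_j = 3
complete = spell_likelihood([0, 0, 1], theta)  # event occurs in interval 3
```

The censored spell contributes the survivor product alone, while the complete spell swaps the final survivor factor for µ in the last interval.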
12.2.2 Two-level models for duration data
Recall that the binary response yij takes the value 1 if the spell for
individual j ends in interval i and takes the value 0 otherwise such that
individual j’s duration can be represented by the sequence of binary
responses:
yj = y1j , y2j , ..., yTj j .
We might expect that any two observations on the same individual
are likely to be more highly correlated with each other than two responses
observed on two different individuals. In other words, we would expect
the binary responses yij and yi′j, i ≠ i′, to be more highly dependent
upon each other than the responses yij and yij′, j ≠ j′. We can account
for this dependence by incorporating random effects into the modelling
framework. To allow for the random intercept in the linear predictor:
θij = β0j + Σp βpj xpij + ki,
we can use multi-level substitutions, with the constraint γ00 = 0, so that:

β0j = Σ(q=1 to Q) γ0q zqj + u0j,    βpj = γp0.
The general model then becomes:
θij = Σ(p=1 to P) γp0 xpij + Σ(q=1 to Q) γ0q zqj + ki + u0j,
and the likelihood becomes:
L(γ, k, φ, σ²u0 | y, x, z) = Πj ∫(−∞ to +∞) Πi g(yij | θij, φ) f(u0j) du0j,
with complementary log log link c and binomial error b so that φ = 1,
µij = 1 − exp(−exp θij),

g(yij | xij, zj, u0j) = µij^yij (1 − µij)^(1 − yij),

f(u0j) = (1/(√(2π) σu0)) exp(−u0j² / (2σ²u0)).
SabreR evaluates the integral L(γ, k, φ, σ²u0 | y, x, z) for this binary
response model using numerical quadrature (integration). Various iden-
tifiability issues arise when fitting multi-level duration models because
of the internal nature of the duration effects on the linear predictor.
Identifiability was first discussed for two-level continuous time models
by Elbers and Ridder [40], and later by Heckman and Singer [56, 57].
These authors show that explanatory variables are needed to identify
most two-level duration models, when the random effect (mixing) dis-
tribution has a finite mean (like the Gaussian distribution). The main
exception is the Weibull model, which is identified without explanatory
variables. These results follow through into discrete time models. The
identifiability of competing risk models is similar [55]. Random effect
distributions with an infinite mean are beyond the scope of this book.
For a discussion on these distributions, see [62, 63].
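The quadrature itself can be illustrated in miniature. The following sketch (pure Python, a 3-point Gauss–Hermite rule with standard hard-coded nodes and weights; SabreR uses far more mass points) approximates one subject's marginal likelihood via the change of variable u0j = √2 σu0 v:

```python
import math

# 3-point Gauss-Hermite rule for the weight exp(-v^2): (node, weight) pairs
GH3 = [(-math.sqrt(1.5), math.sqrt(math.pi) / 6),
       (0.0,             2 * math.sqrt(math.pi) / 3),
       (math.sqrt(1.5),  math.sqrt(math.pi) / 6)]

def marginal_likelihood(y_seq, eta_seq, sigma):
    """Approximate  int prod_i mu^y (1 - mu)^(1 - y) N(u; 0, sigma^2) du
    with mu_ij = 1 - exp(-exp(eta_ij + u))  (complementary log log link)."""
    total = 0.0
    for v, w in GH3:
        u = math.sqrt(2.0) * sigma * v      # change of variable u = sqrt(2)*sigma*v
        L = 1.0
        for y, eta in zip(y_seq, eta_seq):
            mu = 1.0 - math.exp(-math.exp(eta + u))
            L *= mu ** y * (1.0 - mu) ** (1 - y)
        total += w * L
    return total / math.sqrt(math.pi)       # normalising the Gaussian density

# illustrative values: one spell ending in interval 3, constant linear predictor
lik = marginal_likelihood([0, 0, 1], [-2.0, -2.0, -2.0], sigma=1.0)
```

With sigma = 0 the integral collapses to the fixed-effects likelihood, which gives a simple sanity check on the rule.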
12.2.3 Three-level models for duration data
Thus far in this chapter, we have considered one-level and two-level
models for duration data. We can also apply three-level models. We
acknowledge the extra level k by denoting the binary response variable
by yijk . We illustrate the application of a three-level duration model by
way of an example.
Example 12.2.1. Time to fill vacancies
This example involves three-level event history data: the length of
time (level one, observed at the weekly level) needed to fill vacancies
(level two) by employers (level three) [20]. Thus the binary response
yijk = 1 if vacancy j of firm k is filled in interval i, and yijk = 0 otherwise.
The dataset (vwks4 30k.tab) comprises a total of 28,791 weeks required
by 515 firms to fill 1,736 vacancies. We estimate a stock model relating
the duration of the vacancy to the firm’s characteristics and to those of
the vacancy. We control for explanatory variables which represent the
stock of the labour market at the current duration; in other words, the
(log) total number of job-seekers and the (log) total number of vacancies
in the local labour market. We would expect the binary responses yijk
and yi′jk on the same vacancy to be more similar than responses on different vacancies. We include
a vacancy-specific random effect in the model in order to take these
similarities into account. The SabreR command required to fit this two-
level model is:
sabre.model.1 <- sabre(match~factor(t)+loguu+
logvv+nonman+written+size+wage+grade+
dayrel-1,case=vacref,first.link="cloglog",
adaptive.quad=TRUE,first.mass=48)
This results in the following output:
Log likelihood = -2268.2074
on 28772 residual degrees of freedom
Parameter Estimate Std. Err.
______________________________________________
ft( 1) -10.660 1.3780
ft( 2) -10.458 1.3499
ft( 3) -10.728 1.3365
ft( 4) -10.715 1.3324
ft( 5) -11.294 1.3435
ft( 6) -11.318 1.3329
ft( 7) -10.756 1.3412
ft( 8) -10.643 1.3635
ft( 9) -10.883 1.3841
ft( 10) -11.280 1.4424
loguu 1.0886 0.15437
logvv -0.26518 0.13096
nonman -0.44384 0.19154
written -0.94262 0.21713
size 0.87120E-01 0.63396E-01
wage 0.60059E-01 0.91802E-01
grade 0.56564E-01 0.10113
dayrel -0.66028 0.22303
scale 1.9924 0.20134
The scale parameter in this model indicates that there is significant
variation in duration between vacancies left unexplained by the explana-
tory variables. We would also expect that the duration of vacancies of a
particular firm to be more similar than the duration of vacancies of dif-
ferent firms. We add a firm-specific random effect to the model in order
to handle these dependencies. The SabreR command needed to fit this
three-level model is:
sabre.model.2 <- sabre(match~factor(t)+loguu+
logvv+nonman+written+size+wage+grade+
dayrel-1,case=list(vacref,empref),
first.link="cloglog",adaptive.quad=TRUE,
first.mass=64,second.mass=64)
This command produces the output:
Log likelihood = -2247.6656
on 28771 residual degrees of freedom
Parameter Estimate Std. Err.
_________________________________________________
ft( 1) -9.7980 1.4117
ft( 2) -9.6039 1.3854
ft( 3) -9.8799 1.3725
ft( 4) -9.8826 1.3689
ft( 5) -10.452 1.3803
ft( 6) -10.451 1.3703
ft( 7) -9.8342 1.3806
ft( 8) -9.6961 1.4088
ft( 9) -9.8826 1.4293
ft( 10) -10.246 1.4852
loguu 1.1429 0.16637
logvv -0.48556 0.14794
nonman -0.44829 0.20378
written -0.79079 0.22718
size 0.72855E-01 0.78514E-01
wage 0.11520E-01 0.95085E-01
grade 0.15733E-01 0.10515
dayrel -0.66339 0.23044
scale2 1.5626 0.19974
scale3 1.2265 0.15780
The change in log likelihood between the two models, and the esti-
mate of the parameter scale3, indicate that there is significant variation
in vacancy duration between the firms left unexplained by the model.
12.3 Renewal data
12.3.1 Introduction
When a subject experiences repeated events of the same type during a
period of observation, we can apply a renewal model. A diagrammatic
representation of such data is presented in Figure 12.2.
|------------x-------------------x------------------------| censored
|----------------------------------x----------| censored
|---------x-----------------------------------------------| censored
|------------------------------x--------------------------| censored
|--------------------------------------------------x------| censored
|-------------------------------------x-----x-------------| censored
|----------------------------------------------------------| censored
|---------------------------------------| censored
FIGURE 12.2
Diagrammatic representation of renewal data
In Figure 12.2, the subjects that are still present at the end of the
observation window have their last event right censored. Two subjects
leave the survey before the end of the observation window. Two subjects
experience two events each before censoring. Four subjects have one
event occurring before they are censored. Two subjects do not experience
any events before censoring.
To help clarify the notation, we give an example of what the data
structure would look like for three subjects observed over four intervals
(without explanatory variables). Suppose we had a set of renewal data,
a sample of which is shown in Table 12.3. In Table 12.3, subject 1 expe-
riences an event after two intervals, followed by two intervals without an
event. Subject 2 has an event occurring at the end of interval 1, and is
then right censored by the end of interval 4. Subject 3 progresses through
all four intervals without experiencing any events.
We now use duration constants (instead of interval constants) to de-
fine the duration that occurs in the i-th interval. The renewal data given
in Table 12.3 can be reconfigured in discrete time form, as presented in
Table 12.4.
Subject identifier Duration Censored
j Tj (1=No, 0=Yes)
1 2 1
1 2 0
2 1 1
2 3 0
3 4 0
TABLE 12.3
Sample of renewal data in continuous time
Subject Interval Duration Response Duration-specific constants
identifier j i d yij k1 k2 k3 k4
1 1 1 0 1 0 0 0
1 2 2 1 0 1 0 0
1 3 1 0 1 0 0 0
1 4 2 0 0 1 0 0
2 1 1 1 1 0 0 0
2 2 1 0 1 0 0 0
2 3 2 0 0 1 0 0
2 4 3 0 0 0 1 0
3 1 1 0 1 0 0 0
3 2 2 0 0 1 0 0
3 3 3 0 0 0 1 0
3 4 4 0 0 0 0 1
TABLE 12.4
Sample of renewal data, reconfigured in discrete time
We form the likelihood for the renewal model as the product of
µij^yij (1 − µij)^(1 − yij) over the complete sequence. The yij deal with the
occurrence/non-occurrence of the event and the kd deal with the duration
of the spell ongoing in the i-th interval. These duration data may then be
analyzed using a binary response model with a subject-specific random
effect, as illustrated in the following sub-section.
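The reconfiguration of Table 12.3 into Table 12.4 can likewise be sketched in code (Python, hypothetical helper; the duration counter restarts after each event):

```python
def expand_renewal(event_flags):
    """Turn one subject's per-interval event indicators into renewal rows.

    The duration counter d restarts at 1 after every event, and the
    duration-specific dummies k1..k4 pick out the current value of d,
    as in Table 12.4. Row layout: (interval i, duration d, y, k1..k4).
    """
    rows = []
    d = 0
    for i, y in enumerate(event_flags, start=1):
        d += 1
        k = [1 if d == c else 0 for c in range(1, 5)]
        rows.append((i, d, y, *k))
        if y == 1:          # event observed: the clock restarts next interval
            d = 0
    return rows

# Subject 1 of Table 12.3: event at the end of interval 2, then censored
subject1 = expand_renewal([0, 1, 0, 0])
```

Applying the helper to the event sequences of the three subjects reproduces the twelve rows of Table 12.4.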
12.3.2 Example: renewal models
Example 12.3.1 Residential mobility
In 1986, the ESRC funded the Social Change and Economic Life Ini-
tiative, SCELI [86]. Under this initiative, work and life histories were
collected for a sample of individuals from six different geographical ar-
eas in the UK. One of these locations was Rochdale. The Rochdale
dataset (roch.tab) contains annual data on male respondents’ residen-
tial behaviour since entering the labour market. These are residence
histories comprising a total of 6,349 observations on 348 Rochdale men
aged between 20 and 60 at the time of the survey. We are going to use
these data to study the determinants of residential mobility.
The binary response is the variable move which takes value 1 if a
residential move occurs during the current year, and takes value 0 oth-
erwise. The explanatory variable dur, which measures the number of
years since the last move, is endogenous; in other words, it is internally
related to the process of interest. Employment status at the beginning of
the year, emp, is coded as: 1: self employed; 2: employee; 3: not working.
This categorical variable is represented by two dummy variables: emp2:
1 if employment status at the beginning of the year is ‘employee’, 0 oth-
erwise; emp3: 1 if employment status at the beginning of the year is ‘not
working’, 0 otherwise.
Three marriage-related variables are examined: fm: 1 if the respon-
dent is married for the first time during the year, 0 otherwise; mar: 1 if
the respondent is already married at the beginning of the year, 0 oth-
erwise; mbu: 1 if the respondent’s marriage breaks up during the year,
0 otherwise. The age variable (in years) is centred on 30. We create
quadratic (age2) and cubic (age3) terms in age to allow more flexibility
when modelling this variable; in other words, to allow for a non-linear
relationship between age and the log odds of moving. We fit a comple-
mentary log log model with a respondent-specific random effect in order
to relate the binary response variable (move) to the explanatory variables
age, dur, fm, mbu, mar, emp2 and emp3. The SabreR command needed
to fit this model is:
sabre.model.1 <- sabre(move~age+dur+fm+mbu+mar+
emp2+emp3+1,case=case,first.link="cloglog")
This command leads to the output:
Log likelihood = -1092.8370
on 6340 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
_____________________________________________________
cons -2.4485 0.38744 -6.3197
age 0.20791E-02 0.13319E-01 0.15610
dur -0.11510 0.20926E-01 -5.5004
fm 0.59640 0.21071 2.8305
mbu 1.2865 0.60746 2.1178
mar -0.52053 0.17935 -2.9024
emp2 -0.15696 0.32218 -0.48717
emp3 -0.22194E-01 0.37914 -0.58537E-01
scale 0.95701 0.12322 7.7669
Then we add the quadratic and cubic age effects, age2 and age3, to this
model. The SabreR command required to fit this model is:
sabre.model.2 <- sabre(move~age+dur+fm+mbu+mar+
emp2+emp3+age2+age3+1,case=case,
first.link="cloglog")
This command produces the following output:
Log likelihood = -1085.6462
on 6338 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
__________________________________________________
cons -2.2152 0.40755 -5.4354
age -0.41466E-01 0.20697E-01 -2.0035
dur -0.11896 0.22185E-01 -5.3624
fm 0.37503 0.21795 1.7207
mbu 1.2371 0.60712 2.0377
mar -0.65709 0.18325 -3.5857
emp2 -0.17667 0.32416 -0.54502
emp3 -0.64809E-01 0.38327 -0.16909
age2 -0.27919E-02 0.97393E-03 -2.8666
age3 0.25579E-03 0.88150E-04 2.9018
scale 0.95151 0.12350 7.7043
The addition of the variables age2 (coefficient -0.0027919, s.e.
0.00097393) and age3 (coefficient 0.00025579, s.e. 0.000088150) to the
model significantly reduces the log likelihood. Age clearly has a compli-
cated relationship with the probability of moving. The duration effect
dur has coefficient −0.11896 (s.e. 0.022185), which suggests that the re-
spondent is less likely to move the longer he stays in his current home.
The respondent-specific random effect is highly significant: the scale
parameter takes the value 0.95151 (s.e. 0.12350).
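The claim of a significant reduction in the log likelihood can be checked with a likelihood ratio test; a quick sketch (Python, using the standard 5% chi-squared critical value for the 2 added parameters):

```python
ll_base = -1092.8370    # model without age2, age3 (reported above)
ll_cubic = -1085.6462   # model with age2, age3

# likelihood ratio statistic on 2 degrees of freedom (age2 and age3)
lr = -2.0 * (ll_base - ll_cubic)
chi2_crit_2df_5pct = 5.991          # standard chi-squared table value
significant = lr > chi2_crit_2df_5pct
```

The statistic of about 14.38 comfortably exceeds 5.991, confirming the joint significance of the quadratic and cubic age terms.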
12.4 Competing risk data
12.4.1 Introduction
The theory of competing risks provides a structure for inference in prob-
lems where subjects are exposed to several types of event. In Sub-section
12.1.4, we presented the example of a household in rented accommo-
dation. The members of this household could either move to different
rented accommodation or become owner occupiers (two possible des-
tination types of housing tenure). In the context of labour markets, a
spell of unemployment could end in employment in a skilled occupation,
a semi-skilled occupation or an unskilled occupation (three possible des-
tination states).
The same subjects are exposed to the possibility of different types of
event occurring. We would expect the probability of a particular event
occurring at a given interval to be correlated with the probability of that
event occurring at another interval. We would also expect the probabil-
ities of the different types of event occurring to be correlated.
Figure 12.3 shows failure due to two mechanisms A and B. Three
observations are terminated by events of type A. Events of type B occur
for three further subjects. Two observations are censored.
Suppose we wish to model the failures of type A. Define an event as
a time when a failure of type A occurs, and treat all other observations
as censored. In other words, if a failure of type B occurs at time t1 , this
is regarded as a censoring at time t1 as far as process A is concerned
because a failure of type A has not yet occurred by time t1 .
Thus the original failure data given in Figure 12.3 are reconfigured
as the data presented in Figure 12.4. This reconfiguration of the origi-
nal data is replicated for each failure type. The data required to model
failures of type B are shown in Figure 12.5.
|-----------------------------------------------------x A
|--------------------------------------x B
|----------------------------------------------------------| censored
|--------------------------------x A
|---------------------------------------------------x B
|---------------------------------------x B
|-------------------------------------------------------x A
|-----------------------------------| censored
FIGURE 12.3
Example of competing risk data: failure due to two failure mechanisms
|-----------------------------------------------------x A
|--------------------------------------| censored
|----------------------------------------------------------| censored
|--------------------------------x A
|---------------------------------------------------| censored
|---------------------------------------| censored
|-------------------------------------------------------x A
|-----------------------------------| censored
FIGURE 12.4
Data required to model failure due to mechanism A
In Table 12.5, we present a sample of competing risk data concerning
the times to events of two types (A and B) occurring on three subjects.
Subject 1 has an event of type A occurring by the end of interval 2.
Subject 2 is censored at the end of interval 1 without an event occurring.
Subject 3 experiences an event of type B by the end of interval 4.
These data may be reconfigured in discrete time: a binary response
(an event of a specific type occurring) to be related to duration-specific
constants. The original competing risk data presented in Table 12.5 are
reconfigured as the data shown in Table 12.6.
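This replication of each spell across failure types can be sketched as follows (Python, hypothetical helper; row layout subject, interval, response, event type, k1..k4):

```python
def expand_competing(spells, n_risks=2):
    """Reconfigure continuous-time competing risk spells into discrete time.

    Each spell (subject, T_j, event_type) is replicated once per risk r;
    the response is 1 only in the final interval of the replicate whose
    risk matches the observed event type (event_type=None means censored),
    so events of the other type are treated as censorings.
    """
    rows = []
    for subject, duration, event_type in spells:
        for r in range(1, n_risks + 1):
            for i in range(1, duration + 1):
                y = 1 if (i == duration and event_type == r) else 0
                k = [1 if i == c else 0 for c in range(1, 5)]
                rows.append((subject, i, y, r, *k))
    return rows

# Table 12.5: subject 1 fails from risk A (=1) in interval 2, subject 2 is
# censored in interval 1, subject 3 fails from risk B (=2) in interval 4
table_12_6 = expand_competing([(1, 2, 1), (2, 1, None), (3, 4, 2)])
```

Running the helper on the three spells reproduces the fourteen rows of Table 12.6.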
12.4.2 Likelihood
The likelihood associated with the competing risk model presented in
this chapter is:
L(γ, k, φ, Σu0 | y, x, z) = Πj ∫(−∞ to +∞) ··· ∫(−∞ to +∞) Πi Πr g^r(yij^r | θij^r, φ^r) f(u0j) du0j,
|-----------------------------------------------------| censored
|--------------------------------------x B
|----------------------------------------------------------| censored
|--------------------------------| censored
|---------------------------------------------------x B
|---------------------------------------x B
|-------------------------------------------------------| censored
|-----------------------------------| censored
FIGURE 12.5
Data required to model failure due to mechanism B
Subject identifier Duration Event Censored
j Tj (1=A,2=B) (1=No, 0=Yes)
1 2 1 1
1 2 2 0
2 1 1 0
2 1 2 0
3 4 1 0
3 4 2 1
TABLE 12.5
Sample of competing risk data in continuous time
Duration-specific
Subject Interval Duration Response Event constants
identifier j i d yij 1=A,2=B k1 k2 k3 k4
1 1 1 0 1 1 0 0 0
1 2 2 1 1 0 1 0 0
1 1 1 0 2 1 0 0 0
1 2 2 0 2 0 1 0 0
2 1 1 0 1 1 0 0 0
2 1 1 0 2 1 0 0 0
3 1 1 0 1 1 0 0 0
3 2 2 0 1 0 1 0 0
3 3 3 0 1 0 0 1 0
3 4 4 0 1 0 0 0 1
3 1 1 0 2 1 0 0 0
3 2 2 0 2 0 1 0 0
3 3 3 0 2 0 0 1 0
3 4 4 1 2 0 0 0 1
TABLE 12.6
Sample of competing risk data, reconfigured in discrete time
with complementary log log link c and binomial error b so that φ^r = 1 and
µij^r = 1 − exp(−exp θij^r):

g^r(yij^r | θij^r, φ^r) = (µij^r)^(yij^r) (1 − µij^r)^(1 − yij^r),

θij^r = Σ(p=1 to P) γp0^r xpij + Σ(q=1 to Q) γ0q^r zqj + ki^r + u0j^r,
where R is the number of different event types (competing risks),
r = 1, 2, ..., R; γ = (γ^1, γ^2, ..., γ^R), where γ^r contains the regression
coefficients of the linear predictor θij^r; k = (k^1, k^2, ..., k^R); and
f(u0j) is a multivariate normal distribution of dimension R with mean zero
and variance-covariance structure Σu0. SabreR evaluates the integral
L(γ, k, φ, Σu0 | y, x, z) using standard Gaussian quadrature or adaptive
Gaussian quadrature (numerical integration).
12.4.3 Example: competing risk data
Example 12.4.1 Filled and lapsed vacancies
This example is from a study of the determinants of employer search
in the UK. The analysis involves modelling a job vacancy duration until
it is either filled successfully or withdrawn from the market (lapsed).
The respective binary response variables are filled: 1 if the vacancy
is filled, 0 otherwise; lapsed: 1 if the vacancy lapses, 0 otherwise. The
model has a ‘filled’ random effect for the ‘filled’ sequence and a ‘lapsed’
random effect for the ‘lapsed’ sequence. Rather than treat the ‘filled’
and ‘lapsed’ response sequences as if they were independent from each
other, we allow for a correlation between the two random effects. There
are 7,234 filled vacancies and 5,606 lapsed vacancies. For further details,
see [10].
The variable logt records log vacancy duration in weeks. Vacancy
characteristics are hwage: hourly wage; nonman: 1 if the vacancy is for a
non-manual job, 0 otherwise; skilled: 1 if the vacancy is for a skilled
occupation, 0 otherwise. Size of firm is categorized as follows: noemps1: 1
if the firm has less than or equal to 10 employees, 0 otherwise; noemps2:
1 if the firm has between 11 and 30 employees, 0 otherwise; noemps3:
1 if the firm has between 31 and 100 employees, 0 otherwise; noemps4:
1 if the firm has more than 100 employees, 0 otherwise. Only three of
these indicator variables need to be included in the model to represent
firm size fully. In this case, we use the last three variables, noemps2 to
noemps4.
For each type of risk, we use a Weibull baseline hazard; in other
words, with log t in the linear predictor of the complementary log log
link and, for simplicity, the same six explanatory variables. The com-
bined dataset (vacancies.tab) has 22,682 observations, with the 2,374
vacancies being represented by ‘filled’ and ‘lapsed’ indicators. In order
to model the durations taken to fill vacancies, each ‘filled’ sequence com-
prises a series of 0s ending in a 1 at the point where the vacancy is filled,
and the ‘lapsed’ sequence comprises a complete series of 0s. In order to
model the points in time when vacancies become lapsed, each ‘lapsed’
sequence consists of a series of 0s terminating in a 1 at the point when
the vacancy lapses, and the ‘filled’ sequence consists of a complete series
of 0s.
First, we fit the bivariate model with uncorrelated random effects;
in other words, we add vacancy-specific random effects (corr=0). The
SabreR command required to fit this model is:
sabre.model.1 <- sabre(filled~logt+noemps2+
noemps3+noemps4+hwage+nonman+skilled+1,
lapsed~logt+noemps2+noemps3+noemps4+hwage+
nonman+skilled+1,case=vacnum,
first.link="cloglog",second.link="cloglog",
first.mass=32,second.mass=32,correlated="no")
This command leads to the following output:
Log likelihood = -7287.7119
on 22664 residual degrees of freedom
Parameter Estimate Std. Err.
___________________________________________________
r1 -0.73882 0.12176
r1_logt -0.33126 0.11457
r1_noemps2 -0.62864E-02 0.84908E-01
r1_noemps3 0.95274E-01 0.92602E-01
r1_noemps4 -0.24901 0.10348
r1_hwage -0.50311 0.10228
r1_nonman -0.91256E-01 0.78260E-01
r1_skilled -0.24494 0.77469E-01
r2 -2.3474 0.19325
r2_logt 0.39002 0.14721
r2_noemps2 -0.21549 0.10585
r2_noemps3 -0.49738 0.13083
r2_noemps4 -0.33570 0.11693
r2_hwage -0.21624 0.10120
r2_nonman 0.88611E-01 0.89750E-01
r2_skilled -0.18809 0.91930E-01
scale1 0.71227 0.20805
scale2 0.76498 0.23191
The results show that the scale1 (filled) parameter is estimated
to be 0.71227 (s.e. 0.20805), the scale2 (lapsed) parameter is estimated
to be 0.76498 (s.e. 0.23191), and the parameters on logt are
−0.33126 and 0.39002 respectively.
Next, we allow for a correlation between the random effects of the
filled and lapsed durations. The SabreR command used to fit this corre-
lated random effects model is:
sabre.model.2 <- sabre(filled~logt+noemps2+
noemps3+noemps4+hwage+nonman+skilled+1,
lapsed~logt+noemps2+noemps3+noemps4+hwage+
nonman+skilled+1,case=vacnum,
first.link="cloglog",second.link="cloglog",
first.mass=32,second.mass=32,correlated="yes")
This results in the following output:
Log likelihood = -7217.7072
on 22663 residual degrees of freedom
Parameter Estimate Std. Err.
___________________________________________________
r1 -0.96329 0.15666
r1_logt -0.35523 0.87898E-01
r1_noemps2 0.37481E-01 0.10638
r1_noemps3 0.18021 0.11994
r1_noemps4 -0.23653 0.13044
r1_hwage -0.53793 0.12288
r1_nonman -0.10936 0.95910E-01
r1_skilled -0.26235 0.96725E-01
r2 -7.6478 1.5206
r2_logt 2.7385 0.72471
r2_noemps2 -0.75970 0.38966
r2_noemps3 -1.6889 0.51056
r2_noemps4 -1.0762 0.44983
r2_hwage -0.26480 0.39838
r2_nonman 0.47016 0.32326
r2_skilled -0.36773 0.33192
scale1 1.2887 0.16222
scale2 5.2516 1.1412
corr -0.89264 0.35399E-01
There is a change in −2 times log likelihood of:
−2(−7287.7119 − (−7217.7072)) = 140.01,
over the model that assumes independence between the filled and
lapsed vacancy destination states. These last results demonstrate the
different levels of overdispersion in the two responses, and a negative
correlation between the random effects of the two risks. This may be ex-
pected, as a filled vacancy cannot lapse and a lapsed vacancy cannot be
filled. The random effect of the filled vacancies has a standard deviation
of 1.2887 (s.e. 0.16222), while that of the lapsed vacancies is consider-
ably larger at 5.2516 (s.e. 1.1412), and their correlation is −0.89264 (s.e.
0.035399).
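As a check, the change in −2 times log likelihood quoted above can be reproduced directly from the two fitted log likelihoods (Python used here purely for the arithmetic):

```python
# Log likelihoods of the two bivariate models (from the SabreR output)
loglik_independent = -7287.7119
loglik_correlated = -7217.7072

# Change in -2 log likelihood (likelihood ratio statistic)
lr_stat = -2.0 * (loglik_independent - loglik_correlated)
print(round(lr_stat, 2))  # 140.01
```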
The inferences about duration effects differ between models. For
instance, the coefficient on r2 logt is estimated to be 0.39002 (s.e.
0.14721) in the bivariate model with uncorrelated random effects, but
becomes 2.7385 (s.e. 0.72471) in the bivariate model with correlated
random effects. This result suggests that the longer a vacancy remains
unfilled, the more likely it is to lapse. Differences also occur for the
firm size effects (r2 noemps2,...,r2 noemps4), which are more than
twice as large in the bivariate model with correlated random effects.
12.5 Exercises using renewal and competing risks
models
Exercise 12.5.1. Angina pectoris (renewal models)
Danahy et al. [32] recorded the length of exercise time (in seconds)
required to induce angina pectoris (chest pain) in 21 heart patients.
Pickles and Crouchley [88] analyzed a subset of the data on each pa-
tient: the time to onset of angina on three occasions: (i) just before oral
administration of a dose of isosorbide dinitrate; (ii) one hour after ad-
ministration; (iii) three hours after administration. On each occasion,
the duration of a period of exercise was measured from the start of that
exercise session.
The dataset to be analyzed in this book (angina.tab) includes the
variable time which takes values 1 (pre-dose), 2 (one hour post-dose) and
3 (three hours post-dose), with corresponding indicator variables t1, t2
and t3. Both post-dose occasions are combined in the variable d, which
takes values 1 (pre-dose) and 2 (post-dose), with respective indicator
variables d1 and d2. Dosage (dose) will be regarded as a continuous
covariate.
Pickles and Crouchley [88] used a positive stable law (PSL) distri-
bution for the frailty. In this exercise, we repeat their analysis using a
lognormal frailty distribution which is equivalent to assuming a normal
distribution for the random effects. Pickles and Crouchley [88] treated
the times to onset of angina as continuous responses. Here, we have
converted the durations from continuous time into discrete time.
The dataset has been expanded in such a way that each second of
exercise taken is regarded as a discrete interval of time. Exercise time (in
seconds) is represented by the variable t. A total of 11 of the 63 exercise
times were censored due to patient fatigue. This censoring mechanism
was assumed to be independent of frailty (residual heterogeneity) as
represented by the random effects.
The variable y takes value 1 if a duration is censored by fatigue,
and takes value 0 otherwise. In this exercise, we use renewal models to
explore whether the impact of dose declines with time since treatment,
and whether the duration effects also change with time since treatment.
We are going to estimate various Weibull survival models on the renewal
data by using (logt) as a covariate with the complementary log log link:
1. Use SabreR to fit the homogeneous common baseline hazard
model; in other words, a model with the same constant for each
exercise time, the same parameter for logt, but with different coef-
ficients on dose for the two treatment times. Use interactions with
the t2 and t3 dummy variables to set this model up. Obtain the log
likelihood, parameter estimates and standard errors. These results
are given as the first part (homogeneous model) of the output that
is obtained by estimating the random effects model. These model
results can also be obtained from SabreR by setting mass=1.
There is no need to include dose in the linear predictor for the
model of pre-treatment data.
2. The second model allows for a different baseline hazard for each ex-
ercise session. Interact the t2 and t3 dummy variables with logt,
add both the interaction effects and the t2 and t3 indicator vari-
ables to the model. Obtain the log likelihood, parameter estimates
and standard errors. These results are given as the first part (ho-
mogeneous model) of the output that is obtained by estimating
the random effects model. These model results can also be
obtained from SabreR by setting mass=1. Can the model be sim-
plified? What does this result tell you?
3. Add a subject-specific random effect (id) to the renewal model.
Use adaptive quadrature with mass 24. How do the effects of logt
and dose change, relative to the models estimated previously?
4. What is your preferred model and why?
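The models in this exercise use the complementary log-log link with logt as a covariate, which gives a discrete-time analogue of a Weibull hazard. The following Python sketch of that hazard uses purely illustrative parameter values (not estimates from the angina data):

```python
import math

# Discrete-time Weibull-type hazard via the complementary log-log link:
# h(t) = 1 - exp(-exp(const + alpha * log(t))) for second t = 1, 2, ...
def cloglog_hazard(t, const, alpha):
    eta = const + alpha * math.log(t)
    return 1.0 - math.exp(-math.exp(eta))

# Probability of exercising beyond t_max seconds without onset of angina
def survivor(t_max, const, alpha):
    s = 1.0
    for t in range(1, t_max + 1):
        s *= 1.0 - cloglog_hazard(t, const, alpha)
    return s

# alpha > 0 gives a hazard that rises with exercise time
print(cloglog_hazard(10, -6.0, 1.2))
print(survivor(120, -6.0, 1.2))
```

Interacting logt and the dose variable with the occasion dummies, as the exercise asks, simply lets const and alpha differ across exercise sessions.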
Exercise 12.5.2. German unemployment (competing risk data)
In this exercise, we examine spells of unemployment (measured in
months) for 500 individuals taken from the German Socio Economic
Panel, SOEP. The data were used by Kauermann and Khomski [67, 68]
to illustrate the application of a competing risk model. An individual
may leave a period of unemployment for one of two destinations: full-
time employment and part-time employment. These two possible exit
strategies were regarded as the competing risks.
Kauermann and Khomski [67, 68] treated the spells as continuous
responses. Here, we convert spells into discrete time, using month as
the interval of measurement. Spells lasting more than 36 months have
been censored at 36 months. Each individual will contribute up to 36
observations in the dataset (unemployedR.tab), clustered according to
the person identifier (id):
1. Use SabreR to estimate a Weibull (logt) model, without random
effects, for the r1=1 (full-time job) and r2=1 (part-time job) exits
from unemployment. Use the explanatory variables: nationality
(1: German, 2: foreign), gender (1: male, 2: female), age (1: 25 or
younger, 2: aged 26 to 50, 3: older than 50), training (1: profes-
sional; 2: otherwise), university (1: no degree, 2: degree). Obtain
the log likelihood, parameter estimates and standard errors. These
results are given as the first part (homogeneous model) of the out-
put that is obtained by estimating the random effects model, which
is the next task to perform.
2. Re-estimate this model, but allow each exit type to have an inde-
pendent random effect for each failure type. Use 32 point adaptive
quadrature. Hint: use a bivariate model, but set rho=0. What do
the results tell you?
3. Re-estimate this model, but allow for the correlation between the
random effects of each failure type. How do the results change?
4. What is your preferred model and why?
13
Stayers, non-susceptibles and endpoints
13.1 Introduction
There are several empirical contexts in which a subset of the population
might behave differently to those individuals whose behaviour is char-
acterized by the proposed mixed generalized linear model. For instance,
in a migration study, we could observe a group of respondents who do
not move outside the study region over the study period. These observed
non-migrators could comprise two distinct groups. The first group could
consist of those individuals who considered migrating, but who were not
observed to do so during the study period. The second group could com-
prise those who would never consider migrating. These non-migrators
are known as stayers. In the competing risk context, some individuals
may be classified as stayers in the sense that they are not vulnerable to
one particular exit condition. For example, few unemployed males are
likely to seek part-time work. In biometric research, these ‘stayers’ are
often referred to as non-susceptibles.
13.2 Mover-stayer model
The phenomenon of stayers may be handled through the mover-stayer
model [45], to which we return later in the section. Distributions of
count data with an abnormally large number of cases at zero may be
analyzed using a zero-inflated Poisson Model [46, 71] which comprises
a standard Poisson model supplemented by a spike of probability at
zero. In a similar manner, the goodness-of-fit of mixed generalized linear
models can be improved by adding a spike to the parametric distribution
for the random effects in order to represent the stayers. This explicitly
results in a ‘spiked distribution’ [95].
Non-parametric representations of the random effects (or mixing)
distribution, for example, those specified by Heckman and Singer [58]
and Davies and Crouchley [35], can have the flexibility to accommodate
stayers. However, non-parametric mixing distributions can require many
parameters (mass point locations and probabilities), while spiked distri-
butions are generally more parsimonious. SabreR assumes a Gaussian or
normal probability distribution for the random effects, with mean zero
and standard deviation to be estimated from the data; see Figure 13.1.
FIGURE 13.1
The normal distribution
This mixing distribution may be approximated by a number of mass
(or quadrature) points with specified probabilities at given locations.
This approximation is illustrated by the solid vertical lines in Figure
13.2. Increasing the number of quadrature points increases the accuracy
of the computation at the expense of computer time.
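The approximation can be sketched numerically: Gauss-Hermite quadrature supplies the mass-point locations and probabilities. The Python sketch below (using NumPy) illustrates the idea; it is not SabreR's implementation:

```python
import numpy as np

# Gauss-Hermite quadrature approximates a normal mixing distribution:
# rescaling the nodes by sqrt(2)*sigma gives the mass point locations,
# and dividing the weights by sqrt(pi) gives their probabilities.
def normal_mass_points(n_points, sigma=1.0):
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)
    locations = np.sqrt(2.0) * sigma * nodes
    probabilities = weights / np.sqrt(np.pi)
    return locations, probabilities

locations, probabilities = normal_mass_points(12, sigma=1.0)
print(probabilities.sum())                   # sums to 1
print((probabilities * locations**2).sum())  # approximates sigma^2 = 1
```

Increasing n_points adds mass points in the tails and improves the approximation, at the cost of more evaluations of the integrand.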
FIGURE 13.2
Quadrature points approximating the normal distribution
The Gaussian mixing distribution may tend to zero too quickly at
the extremes. In order to compensate for this limitation, SabreR has
the flexibility to supplement the quadrature points with endpoints, that
is, delta functions at plus and/or minus infinity, as indicated in Figure
13.3. The quadrature probabilities can be estimated from the data. This
flexibility may be needed when modelling binary data.
FIGURE 13.3
Quadrature with left and right endpoints
With the Poisson model, a single left endpoint may be added at minus
infinity, as shown in Figure 13.4, in order to allow for extra zeros.
FIGURE 13.4
Quadrature with left endpoint only
13.3 Likelihood incorporating the mover-stayer
model
To allow for stayers in two-level generalized linear models, we need to ex-
tend the notation established in Chapter 9. Let the two types of ‘stayer’
be denoted by Sr and Sl for the right spike (at plus infinity) and the left
spike (at minus infinity), and let the probabilities of these events be
Pr[Sr] and Pr[Sl] respectively. In a two-level binary response model, let
Tj be the length of the observed sequence, and Σj = Σi yij, where yij is
the binary response of individual j at occasion i. Let Sl = [0, 0, ..., 0] rep-
resent a sequence without any moves from state 0, and let Sr = [1, 1, ..., 1]
represent a sequence without any moves from state 1. The likelihood of
the mixed binary response model with endpoints takes the form

$$L\left(\gamma, \phi, \sigma^2_{u_0} \mid y, x, z\right) = \prod_j \left\{ \Pr[S_l] \prod_{i=1}^{T_j} (1 - y_{ij}) + \Pr[S_r] \prod_{i=1}^{T_j} y_{ij} + \left(1 - \Pr[S_l] - \Pr[S_r]\right) \int \prod_i g\left(y_{ij} \mid \theta_{ij}, \phi\right) f(u_{0j})\, du_{0j} \right\},$$
where:

$$g\left(y_{ij} \mid \theta_{ij}, \phi\right) = \exp\left\{\left[y_{ij}\theta_{ij} - b\left(\theta_{ij}\right)\right]/\phi + c\left(y_{ij}, \phi\right)\right\},$$

$$\theta_{ij} = \gamma_{00} + \sum_{p=1}^{P} \gamma_{p0} x_{pij} + \sum_{q=1}^{Q} \gamma_{0q} z_{qj} + u_{0j},$$

and:

$$f(u_{0j}) = \frac{1}{\sqrt{2\pi}\,\sigma_{u_0}} \exp\left(-\frac{u_{0j}^2}{2\sigma_{u_0}^2}\right),$$
as specified in Chapter 9. We parameterize Pr[Sl] and Pr[Sr] as:

$$\Pr[S_l] = \frac{l}{1+l+r}, \qquad \Pr[S_r] = \frac{r}{1+l+r},$$

where l, r > 0.
In a zero-inflated Poisson model, Sl = [0, 0, ..., 0] represents a se-
quence with zero counts at every point. There is no Sr, so that r =
Pr[Sr] = 0. The above likelihood simplifies to:

$$L\left(\gamma, \phi, \sigma^2_{u_0} \mid y, x, z\right) = \prod_j \left\{ \Pr[S_l] \prod_{i=1}^{T_j} (1 - y_{ij}) + \left(1 - \Pr[S_l]\right) \int \prod_i g\left(y_{ij} \mid \theta_{ij}, \phi\right) f(u_{0j})\, du_{0j} \right\}.$$
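The structure of the likelihood above can be sketched numerically: the integral is replaced by Gauss-Hermite quadrature, and the spikes contribute only for all-zero or all-one sequences. The Python sketch below assumes a logit link and illustrative parameter values; it illustrates the form of the likelihood, not SabreR's implementation:

```python
import numpy as np

def spike_probs(l, r):
    """Mover-stayer parameterization: Pr[S_l] = l/(1+l+r), Pr[S_r] = r/(1+l+r)."""
    denom = 1.0 + l + r
    return l / denom, r / denom

def sequence_likelihood(y, eta, sigma, l, r, n_quad=32):
    """Likelihood contribution of one binary sequence y (0/1 array) with
    fixed-effect linear predictor eta, random effect scale sigma, and
    endpoint parameters l, r; logit link, Gauss-Hermite quadrature."""
    p_left, p_right = spike_probs(l, r)
    nodes, weights = np.polynomial.hermite.hermgauss(n_quad)
    u = np.sqrt(2.0) * sigma * nodes   # quadrature mass point locations
    w = weights / np.sqrt(np.pi)       # quadrature probabilities
    # Integral term: probability of the sequence averaged over mass points
    p = 1.0 / (1.0 + np.exp(-(eta[:, None] + u[None, :])))
    bernoulli = np.prod(np.where(y[:, None] == 1, p, 1.0 - p), axis=0)
    integral = np.sum(w * bernoulli)
    # The spikes contribute only for all-zero / all-one sequences
    left = p_left * float(np.all(y == 0))
    right = p_right * float(np.all(y == 1))
    return left + right + (1.0 - p_left - p_right) * integral

# An all-zero sequence picks up the left spike as well as the integral term
y = np.array([0, 0, 0, 0])
print(sequence_likelihood(y, np.zeros(4), sigma=1.0, l=0.9, r=0.01))
```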
The binary and Poisson models can be extended in two ways. First,
in order to allow for between-individual (j) variation in the probability
of being a stayer, we can make Pr [Sl ] (and Pr [Sr ]) a function of time-
constant explanatory variables and write Pr [Slj ] (and Pr [Srj ]). Second,
in order to allow Pr [Sl ] to vary over response occasions (i), as well as
between individuals (j), we can write Pr [Slij ]. However, these extensions
have not yet been implemented in SabreR.
13.4 Example 1: stayers within count data
Example 13.4.1 Migration moves
The examples in this section and the following section are concerned
with individuals’ migration histories within Great Britain, where migra-
tion is defined as a residential move between two counties. The data we
use are derived from a large retrospective survey of life and work histories
carried out in 1986 under the ESRC-funded Social Change and Economic
Life Initiative, SCELI [86]. The data were not collected specifically for
the study of migration, but were drawn from an existing dataset which
includes information on where individuals had lived all their working
lives. Temporary moves of a few months’ duration do not imply commit-
ment to a new area and are not regarded as migration. Migration data
are therefore recorded on an annual basis.
The respondents were aged between 20 and 60 years of age, and lived
in the travel-to-work area of Rochdale, just to the north of Manchester,
UK. Rochdale was one of six localities chosen for their contrasting expe-
rience of recent economic change. As the analysis is concerned with in-
ternal migration within Great Britain, individuals who had lived abroad
during their working lives are excluded from the dataset. For simplic-
ity, we ignore the complications due to differential pushes and pulls of
different regions in the following models of migration behaviour.
For each individual, we have summed the number of annual migra-
tions recorded in the survey, to produce one line of information compris-
ing the following variables: n: number of annual migrations since leaving
school; t: number of years since leaving school; ed: educational qualifi-
cation which is a factor comprising 5 categories: 1: Degree or equivalent,
professional qualifications with a degree; 2: Education above A-level but
below degree level, which includes professional qualifications without
a degree; 3: A-level or equivalent; 4: Other educational qualification; 5:
None. This information is contained in the file rochmigx.tab. Table 13.1
summarizes the observed migration frequencies for the 348 respondents
in the sample. As the individuals ranged in age from 20 to 60 years, they
have varying lengths of migration history.

TABLE 13.1
Observed migration frequencies

Number of moves      0    1    2    3    4    5  >=6
Observed frequency  228   34   42   17    9    8   10
To model heterogeneity in migration propensity due to unmeasured
and unmeasurable factors, we use a mixed Poisson model. To see if there
is an inflated number of zeros in the count data, we allow for the left
endpoint (Sl = [0]). We create the logt variable (the offset) and reverse
the coding of education, then estimate random effects models with and
without the left endpoint (both with adaptive quadrature). The SabreR
command required to fit the model without the left endpoint is
sabre.model.1 <- sabre(n~offset(log(t))+
factor(reved)+1,case=case,
first.family="poisson",adaptive.quad=TRUE)
This produces the following output:
Log likelihood = -418.32707
on 342 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
_________________________________________________
cons -3.8812 0.20656 -18.790
fed( 1) 0.0000 ALIASED [I]
fed( 2) 0.19636 0.24059 0.81616
fed( 3) -0.64142E-01 0.41699 -0.15382
fed( 4) 0.56425 0.40588 1.3902
fed( 5) 0.54582 0.34587 1.5781
scale 1.2126 0.12432 9.7538
The SabreR command required to fit the model with the left
endpoint is:
sabre.model.2 <- sabre(n~offset(log(t))+
factor(reved)+1,case=case,
first.family="poisson",left.end.point=0,
adaptive.quad=TRUE)
The argument left.end.point=0 tells SabreR that the starting value
for the estimator of the left endpoint is zero. This command results in
the output:
Log likelihood = -404.75236
on 341 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
______________________________________________________
cons -2.7888 0.16431 -16.973
fed( 1) 0.0000 ALIASED [I]
fed( 2) 0.44623 0.18477 2.4150
fed( 3) -0.28819E-01 0.32214 -0.89459E-01
fed( 4) 0.68724 0.31777 2.1627
fed( 5) 0.34412 0.26081 1.3194
scale 0.45755 0.12292 3.7225
PROBABILITY
___________
endpoint 0 0.91270 0.16824 0.47718
The random effects model with an endpoint has an improved log
likelihood (−404.75236), when compared to that of the random effects
model without an endpoint (−418.32707). In this case, the difference in
log likelihoods is not chi-square distributed, as under the null hypothesis,
the Pr(Sl = [0]) is on the edge of the parameter space. However, we can
say that the probability that a randomly sampled individual is a stayer
is estimated to be 0.47718.
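The reported PROBABILITY column follows from the parameterization of Section 13.3 with r = 0, and the improvement in log likelihood can be checked directly from the two fits:

```python
# Left endpoint estimate and log likelihoods from the SabreR output
endpoint_estimate = 0.91270  # the parameter l (r = 0 for the Poisson model)
pr_stayer = endpoint_estimate / (1.0 + endpoint_estimate)
print(round(pr_stayer, 5))  # 0.47718

loglik_no_endpoint = -418.32707
loglik_endpoint = -404.75236
lr_stat = 2.0 * (loglik_endpoint - loglik_no_endpoint)
print(round(lr_stat, 3))  # 27.149
```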
The mixed Poisson model with an endpoint suggests that: (i) edu-
cational qualifications do significantly affect the likelihood of migration;
(ii) there is evidence that the probability of migration varies markedly
between individuals; (iii) the sample contains a highly significant propor-
tion of ‘stayers’. With a single count of the number of annual migrations
over an individual’s working life, we cannot distinguish between two sce-
narios. The first is a heterogeneous population, with some individuals
having a consistently high propensity to migrate and other individuals
having a consistently low propensity to migrate. The second scenario is
a truly contagious process, in other words, one in which an individual’s
experience of migration increases the probability of subsequent migra-
tion. The Poisson model assumes that the intervals between events are
exponentially distributed, that is, they do not depend on duration of
stay at a location. To examine this issue, we incorporate duration into
the analysis of the next example.
13.5 Example 2: stayers within binary data
Example 13.5.1 Migration moves
In this section, we use the dataset rochmig.tab which comprises
6,349 observations on 348 individuals. We model the individual binary
response (move) of whether (1) or not (0) there was a migration move in
each calendar year (year). We wish to relate the binary response to the
explanatory variables: age: age in years; dur: duration of stay at each
address. We start by transforming age and by producing up to the sixth
power of this transformed age effect (stage, stage2,..., stage6). We
use the transformation stage=(age-30)/10 in order to avoid overflow
in the calculations. We estimate a binary response model (with probit
link) using adaptive quadrature with 12 mass points and then add lower
and upper endpoints to the model.
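The rescaling and its powers can be sketched in a few lines; the ages below are hypothetical, since in practice the dataset's own age variable would be used:

```python
import numpy as np

# A few hypothetical ages to illustrate the transformation
age = np.array([20.0, 30.0, 45.0, 60.0])
stage = (age - 30.0) / 10.0

# Powers stage2, ..., stage6 stay within a modest numeric range,
# whereas age**6 would reach 60**6 = 4.6656e10
powers = {f"stage{k}": stage ** k for k in range(2, 7)}
print(stage.tolist())             # [-1.0, 0.0, 1.5, 3.0]
print(powers["stage6"].tolist())  # [1.0, 0.0, 11.390625, 729.0]
```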
The SabreR command required to fit the binary response model with-
out endpoints is:
sabre.model.1 <- sabre(move~log(dur)+year+
stage+stage2+stage3+stage4+stage5+
stage6+1,case=case,first.link="probit",
adaptive.quad=TRUE)
This leads to the following output:
Log likelihood = -1071.2854
on 6339 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
________________________________________________
cons 0.42571 0.38433 1.1076
logdur -0.34513 0.54732E-01 -6.3058
year -0.24286E-01 0.50955E-02 -4.7661
stage 0.25612E-01 0.16432 0.15587
stage2 0.31476E-01 0.27323 0.11520
stage3 -0.37013 0.25653 -1.4428
stage4 0.14482 0.24645 0.58765
stage5 0.26736 0.10005 2.6723
stage6 -0.12638 0.67015E-01 -1.8858
scale 0.46939 0.80817E-01 5.8081
The SabreR command required to fit the binary response model with
lower and upper endpoints is:
sabre.model.2 <- sabre(move~log(dur)+year+
stage+stage2+stage3+stage4+stage5+stage6+1,
case=case,first.link="probit",
adaptive.quad=TRUE,left.end.point=0,
right.end.point=0)
This produces the output:
Log likelihood = -1067.3881
on 6337 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
________________________________________________
cons 0.38177 0.39562 0.96498
logdur -0.34093 0.54231E-01 -6.2867
year -0.19718E-01 0.56105E-02 -3.5146
stage -0.23967E-01 0.16595 -0.14442
stage2 0.56781E-01 0.27440 0.20692
stage3 -0.37510 0.26055 -1.4397
stage4 0.13499 0.24929 0.54151
stage5 0.26857 0.10162 2.6430
stage6 -0.12634 0.68092E-01 -1.8554
scale 0.24113 0.97331E-01 2.4774
PROBABILITY
___________
endpoint 0 0.53247 0.20929 0.34681
endpoint 1 0.28838E-02 0.45007E-02 0.18783E-02
By adding both endpoints to the binary response model, the log like-
lihood increases from −1071.2854 to −1067.3881. The chi-square test is
not strictly valid, as under the null hypothesis of no endpoints, the end-
point parameters lie on the edge of the parameter space. However, this
change suggests that endpoints are needed. The probability of 0.34681
associated with the left endpoint gives a measure of the proportion of
‘stayers’ in the population, in other words, those individuals who are
never likely to migrate. Examination of the parameter estimate and
standard error of the right endpoint (and corresponding probability of
0.0018783) suggests that this parameter is not significantly different from
zero, indicating that the proportion of the population migrating every
year could be set to zero.
The coefficient estimate of logdur (log duration) is negative. It
measures cumulative inertia effects, and its value confirms that there
is an increasing disinclination to move with increasing
length of residence. Inference about duration effects can be misleading
unless there is control for omitted variables [56].
The random effects are significant in the binary response model with
endpoints: the scale parameter equals 0.24113 (s.e. 0.097331). We could
improve our model of migration by adding explanatory variables which
measure life cycle factors, such as marriage, occupation and employment
status and the presence of children in the family.
13.6 Exercises: stayers
Exercise 13.6.1. Trade union membership (stayers within binary
data)
As part of the National Longitudinal Survey of Youth, 4,132 young
women aged 14–26 in 1968 were followed for 21 years, excluding the
following years: 1974, 1976, 1979, 1981, 1984 and 1986. The set of event
history data used in this book (nls.tab) contains 18,995 observations
on 20 variables.
We estimate a mixed binary probit model to relate trade union mem-
bership (union) to a set of explanatory variables including age, race (1:
white, 2: black, 3: other), marital status: msp (1: respondent married
and spouse present), nev mar (1: not yet married, 0: otherwise), edu-
cation: current grade completed and collgrad (1: college graduate, 0:
otherwise) and area lived: not smsa (1: not SMSA, standard metropolitan
statistical area, 0: otherwise), c city (1: central city, 0: otherwise) and
south (1: South, 0: otherwise).
We also wish to test whether there are significant proportions of the
young women who were always, or almost always, trade union members
(stayers in state 1) and who were never, or almost never, trade union
members (stayers in state 0). In order to do this, incorporate the mover-
stayer model into the modelling framework by supplementing the normal
mixing distribution with endpoints at plus and minus infinity:
1. Use SabreR to fit a binary response model for the response variable
union, with the explanatory variables: age, age2, black, msp,
grade, not smsa, south, cons. Use a probit link with adaptive
quadrature and 36 mass points.
2. Re-estimate this model, but allow for both lower and upper end-
points. How much of an improvement in log likelihood do you ob-
tain by incorporating endpoints into the model? Can the model
be simplified? How do you interpret the results of your preferred
model?
Exercise 13.6.2. Female employment participation (stayers within bi-
nary data)
Ben-Porath [15] observed that cross-sectional studies were ambigu-
ous with regard to some important dynamic characteristics of labour
force participation. This work led Heckman and Willis [59] to use the
University of Michigan Panel Study of Income Dynamics (PSID) 1968–
1972 [80] to investigate the variation in labour force participation rates
among married women.
PSID provided Heckman and Willis [59] with employment participa-
tion data on white women who were continuously married to the same
husband during the five-year period 1967–1971. A woman was defined
as having participated in the labour force in the appropriate year if her
husband answered yes to the question: ‘Did your wife do any work for
money last year?’
The dataset (labour.tab) comprises three variables: a female iden-
tifier (case), year (t) and the binary response variable (y) which takes
value 1 if a female has participated in the labour market in year t, and
takes value 0 otherwise [36]:
1. Use SabreR to fit a heterogeneous logit model to the response
y, and allow for non-stationarity by treating t as a factor. Use
adaptive quadrature with first.mass=64.
2. Test whether there are significant proportions of women who al-
ways, or almost always, participate in the labour market (stayers
in state 1) and of women who never, or almost never, participate in
the labour market (stayers in state 0) by adding upper and lower
endpoints to the logit model. How much of an improvement in
log likelihood do you obtain by incorporating endpoints into the
model? How do you interpret your results?
Exercise 13.6.3. Number of fish caught by visitors to a US National
Park (stayers in count data)
The dataset used in this exercise (fish.tab) contains information
on the number of fish caught by parties of visitors to a US National Park.
However, we are unable to distinguish between those groups of visitors
that fished and those parties that did not fish. So, we might expect the
dataset to include a statistically significant proportion of zero counts
comprising those parties that did not fish and those groups that did fish
but were unsuccessful. We will investigate whether a lower endpoint is
present in a Poisson random effects model for the number of fish caught:
1. Use SabreR to estimate a Poisson model for the response variable
count, with the explanatory variables: persons (number of people
in the party), livebait (1: livebait was used, 0: otherwise) and
cons. Use adaptive quadrature with mass 36.
2. Re-estimate this model, but allow for a lower endpoint. How much
of an improvement in log likelihood do you obtain with the model
which incorporates an endpoint? What happens to your inference
on the explanatory variables?
3. How would you interpret the results of your preferred model?
14
Handling initial conditions/state
dependence in binary data
14.1 Introduction to key issues: heterogeneity, state
dependence and non-stationarity
Longitudinal and panel data on recurrent events are substantively im-
portant in social science research for two reasons. First, they provide
some scope for extending control for variables that have been omitted
from the analysis. For example, differencing provides a simple way of
removing time-constant effects (both omitted and observed) from the
analysis. Second, social science theory postulates that behaviour and
outcomes are typically influenced by previous behaviour and outcomes;
in other words, there is positive ‘feedback’, as in, for example, the ‘axiom
of cumulative inertia’ [78]. A frequently noted empirical regularity in the
analysis of unemployment data is that those who were unemployed in
the past (or have worked in the past) are more likely to be unemployed
(or working) in the future [54].
Heckman asks whether this is due to a causal effect of being unem-
ployed (or working) or whether it is a manifestation of a stable trait.
These two issues are related because inference about feedback effects
are particularly prone to bias if the additional variation due to omitted
variables (stable trait) is ignored. With dependence upon previous out-
come, the explanatory variables representing the previous outcome will,
for structural reasons, tend to be correlated with omitted explanatory
variables and therefore will always be subject to bias using conventional
modelling methods. Understanding of this generic substantive issue dates
back to the study of accident proneness by Bates and Neyman [14] and
has been discussed in many applied areas, including consumer behaviour
[75] and voting behaviour [34].
An important attraction of longitudinal data is that, in principle,
they make it possible to distinguish between a number of different types
of effect: state dependence, heterogeneity and non-stationarity. A key
type of causality, state dependence, is the dependence of current be-
haviour on earlier or related outcomes. The confounding effects are un-
observed heterogeneity (or omitted variables) and non-stationarity, that
is, changes in the scale and relative importance of the systematic rela-
tionships over time. Large sample sizes reduce the problems created by
local maxima in disentangling the heterogeneity, state dependence and
non-stationarity effects.
Most observational schemes for collecting panel and other longitu-
dinal data commence with the process already under way. They will
therefore tend to have an informative start; the initial observed response
is typically dependent upon pre-sample outcomes and unobserved vari-
ables. In contrast to time series analysis, as explained by Anderson
and Hsiao [8], Heckman [52, 53], Bhargava and Sargan [17] and others,
failure to allow for this informative start when state dependence and
heterogeneity are present will prejudice consistent parameter estimation.
Various treatments of the initial conditions problem for recurrent events
subject to state dependence using random effects to handle heterogeneity
have been proposed; see, for example, [6, 30, 69, 99, 107]. We will concen-
trate on first-order models for state dependence in linear, binary and
count response sequences.
14.2 Example
Example 14.2.1 Depression data
Data were collected in a one-year panel study of depression and help-
seeking behaviour in Los Angeles [81]. Adults were interviewed during
the spring and summer of 1979 and re-interviewed at three monthly
intervals. A respondent was classified as being depressed if they scored
more than 16 on a 20-item list of symptoms. The primary response of
interest is whether a respondent was classified as being depressed (1)
or not (0). This response is recorded on four different occasions. The
profile of each respondent is represented by a sequence of 0s and 1s.
Sixteen different profiles are possible. These profiles are listed in Table
14.1, along with their respective numbers of respondents.
Morgan et al. [81] concluded that there is strong temporal depen-
dence in this binary depression measure and that the dependence is
consistent with a mover-stayer process in which depression is a station-
ary, Bernoulli process for an ‘at risk’ subset of the population. Davies
and Crouchley [35] showed that a more general mixed Bernoulli model
provides a significantly better fit to the data.

Season (i)
y1j  y2j  y3j  y4j   Frequency
 0    0    0    0        487
 0    0    0    1         35
 0    0    1    0         27
 0    0    1    1          6
 0    1    0    0         39
 0    1    0    1         11
 0    1    1    0          9
 0    1    1    1          7
 1    0    0    0         50
 1    0    0    1         11
 1    0    1    0          9
 1    0    1    1          9
 1    1    0    0         16
 1    1    0    1          9
 1    1    1    0          8
 1    1    1    1         19

TABLE 14.1
Depression data (1 = depressed, 0 = not depressed)

However, by its very nature, depression is difficult to overcome, suggesting that state dependence
might explain at least some of the observed temporal dependence, al-
though it remains an empirical issue whether true contagion extends
over three months. We might also expect seasonal effects due to the
weather. In other words, what is the relative importance of (first-order
Markov) state dependence, non-stationarity (seasonal effects) and unob-
served heterogeneity (differences between the subjects) in the depression
data? In the rest of this chapter, we present a range of models designed
to address this question.
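The difficulty is that unobserved heterogeneity alone can mimic state dependence. The following short simulation (an illustrative Python sketch of our own, not part of the SabreR analysis; all names are ours) generates binary sequences from subjects with fixed, subject-specific propensities but no true state dependence, and shows that the pooled transition rates nevertheless look strongly "contagious":

```python
import numpy as np

rng = np.random.default_rng(42)
J, T = 100_000, 4

# Pure heterogeneity: each subject has a fixed propensity p_j to be
# depressed, and occasions are conditionally independent given p_j,
# so there is no true (first-order Markov) state dependence.
p = rng.beta(0.5, 2.0, size=J)            # between-subject variation
y = rng.random((J, T)) < p[:, None]       # J binary sequences of length T

# Pooled over subjects, the previous state still "predicts" the current one:
prev, curr = y[:, :-1].ravel(), y[:, 1:].ravel()
rate_after_dep = curr[prev].mean()        # P(depressed | depressed before)
rate_after_not = curr[~prev].mean()       # P(depressed | not depressed before)
```

With these settings the two pooled rates differ by roughly a factor of three, even though each subject's draws are independent over time: this is exactly the confounding of heterogeneity and state dependence that the models in this chapter are designed to separate.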
14.3 Random effects models
In two-level generalized linear models, the subject-specific unobserved
random effects u0j are integrated out of the joint distribution for the
responses to obtain the likelihood function. Thus:
L(γ, φ, σ²_u0 | y, x, z) = ∏_j ∫_{−∞}^{+∞} ∏_{i=1}^T g(y_ij | θ_ij, φ) f(u_0j | x, z) du_0j,
172 Multivariate Generalized Linear Mixed Models Using R
where we have extended the notation of f (u0j | x, z) to acknowledge the
possibility that the random effects (u0j ) depend on the regressors (x, z).
For notational simplicity, we have assumed that all the sequences are of
the same length (T ), which is indeed the case in the depression example,
though this can be relaxed easily by replacing T with Tj in the likelihood
function.
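Sabre evaluates this integral by standard or adaptive Gaussian quadrature. As a language-neutral sketch of the standard version — illustrative Python, with every function and variable name ours, not Sabre's implementation — the marginal log likelihood of a two-level random-intercept probit model can be approximated as follows; a lagged response is accommodated simply by including it as a column of x:

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def marginal_loglik(y, X, beta, sigma_u, n_mass=24):
    """Gauss-Hermite approximation to the two-level random-intercept
    probit log likelihood: the subject effect u_0j ~ N(0, sigma_u^2)
    is integrated out of the joint distribution of each sequence.

    y: (J, T) binary responses; X: (J, T, P) regressors (incl. constant).
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_mass)
    ll = 0.0
    for j in range(y.shape[0]):
        contrib = 0.0
        for v, w in zip(nodes, weights):
            u = sqrt(2.0) * sigma_u * v          # change of variable
            p = np.array([norm_cdf(t) for t in X[j] @ beta + u])
            contrib += w * np.prod(np.where(y[j] == 1, p, 1.0 - p))
        ll += np.log(contrib / sqrt(np.pi))      # N(0,1) normalisation
    return ll
```

When sigma_u is zero, the quadrature collapses to the ordinary pooled probit log likelihood, which provides a convenient check on the implementation.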
To account for state dependence (specifically, first-order Markov ef-
fects), we need to augment our standard notation further. We do this by
adding the previous response (yi−1j ) to the linear predictor of the model
for yij so that:
θ_ij = γ_00 + ∑_{p=1}^P γ_p0 x_pij + ∑_{q=1}^Q γ_0q z_qj + δ y_{i−1,j} + u_0j,   i = 2, ..., T,
where δ is the new parameter associated with first-order Markov state
dependence. In the depression example, the probability of an adult being
depressed in the current month is allowed to depend on whether they
were depressed in the previous month. We acknowledge explicitly this
change to the generalized linear model by writing the response model as
g(y_ij | y_{i−1,j}, θ_ij, φ).
14.4 Initial conditions problem
This treatment of state dependence can be appropriate when modelling
an ongoing response. However, this treatment begs the question: what
do we do about the first observation? In panel data, the period of study
usually samples an ongoing process and the information collected on the
initial observation rarely contains all of the pre-sample response sequence
and its determinants back to inception. In the depression example, how
do we handle the history of depression (or otherwise) of the adults prior
to the start of the study? This is known as the initial conditions problem.
The implications of this problem will be explored in detail in the rest of
this chapter.
Handling initial conditions/state dependence in binary data 173
14.5 Initial treatment
For the moment, we will write the response model for the initial observed
response y_1j as g(y_1j | θ_1j, φ¹) to allow the parameters and the random
effects for the initial response to be different to those of subsequent
responses, so that:
L(γ¹, γ, δ, φ¹, φ, σ²_u0 | y, x, z) = ∏_j ∫_{−∞}^{+∞} g(y_1j | θ_1j, φ¹) ∏_{i=2}^T g(y_ij | y_{i−1,j}, θ_ij, φ) f(u_0j | x, z) du_0j.
To all the responses, that is, both the initial response and subsequent
responses:
y_j = [y_1j, y_2j, ..., y_Tj],
we can relate time-varying explanatory variables:
x_j = [x_1j, x_2j, ..., x_Tj],
and time-constant explanatory variables:
z_j = [z_j].
In particular, for the initial response:
θ_1j = γ¹_00 + ∑_{p=1}^P γ¹_p0 x_p1j + ∑_{q=1}^Q γ¹_0q z_qj + u_0j.
In the above likelihood, we have the same random effect (u0j ) for
both the initial response and subsequent responses. This assumption will
be relaxed later in the chapter. If we omit the first term on the right-
hand side of the above likelihood function, then we are conditioning on
the initial response. The data window interrupts an ongoing process,
whereby the initial observation y1j will, in part, be determined by u0j ,
and this simplification may induce inferential error.
This problem was examined by Anderson and Hsiao [8] for the linear
model. They compared ordinary least squares, generalized least squares
and maximum likelihood estimation for a number of different scenar-
ios. They concluded that maximum likelihood estimation has desirable
asymptotic properties when time T or sample size N (or both) → ∞. In
conventional panel studies, T is often fixed and small. For random (that
is, endogenous) y_1j, only maximum likelihood provides consistent parameter
estimation, but this requires the inclusion of the term g(y_1j | θ_1j, φ¹)
in the likelihood.
Specification of this density is itself problematic for non-linear mod-
els, as emphasized by Diggle [37]. Heckman [53] suggested using an
approximate formulation including whatever explanatory variables are
available. Various treatments of the initial conditions problem for re-
current events with state dependence using random effects to adjust for
heterogeneity have been proposed; see, for example, [6, 30, 69, 99, 107].
We will review the alternative treatments of the initial conditions
problem, and illustrate them on the binary depression data, in the fol-
lowing sections.
14.6 Example: depression data
We will estimate a range of first-order models on the ungrouped de-
pression data of Morgan et al. [81]. We have two forms of the data.
First, there is the full dataset (depression.tab) with four responses
(rows) for each individual, lagged response variables and season indi-
cators. This dataset comprises 3,008 observations on 752 individuals.
Second, we have the conditional dataset (depression2.tab) which is
the same as the full dataset, but without the row of data corresponding
to the first response of each individual. This dataset consists of 2,256
observations on the same number of individuals. An example of how
to create the two datasets for these first-order models is presented in
Appendix B, Sub-section B.2.3.
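Appendix B shows the SabreR data preparation. Purely as an illustration of the layout of the two files — a Python/pandas sketch of our own, with all names ours — both datasets can be reconstructed from the profile frequencies of Table 14.1:

```python
import pandas as pd

# Profile frequencies from Table 14.1: (y1, y2, y3, y4) -> respondents
profiles = {
    (0,0,0,0): 487, (0,0,0,1): 35, (0,0,1,0): 27, (0,0,1,1): 6,
    (0,1,0,0): 39,  (0,1,0,1): 11, (0,1,1,0): 9,  (0,1,1,1): 7,
    (1,0,0,0): 50,  (1,0,0,1): 11, (1,0,1,0): 9,  (1,0,1,1): 9,
    (1,1,0,0): 16,  (1,1,0,1): 9,  (1,1,1,0): 8,  (1,1,1,1): 19,
}

rows, ind = [], 0
for profile, freq in profiles.items():
    for _ in range(freq):
        ind += 1
        for t, s in enumerate(profile, start=1):   # t doubles as the season
            rows.append({"ind": ind, "t": t, "s": s,
                         # lag of the response; some of the book's files
                         # flag the missing baseline lag as -9, we use None
                         "s_lag1": profile[t - 2] if t > 1 else None})

full = pd.DataFrame(rows)             # analogue of depression.tab
conditional = full[full["t"] > 1]     # analogue of depression2.tab
```

The full frame has 3,008 rows on 752 individuals; dropping each individual's first row leaves the 2,256-row conditional frame, matching the counts quoted above.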
14.7 Classical conditional analysis
If we omit the model for the initial response from the likelihood, we get:
L^c(γ, δ, φ, σ²_u0 | y, x, z) = ∏_j ∫_{−∞}^{+∞} ∏_{i=2}^T g(y_ij | y_{i−1,j}, θ_ij, φ) f(u_0j | x, z) du_0j.
For the subsequent responses:
y_j = [y_2j, y_3j, ..., y_Tj],
we have included the lagged response in the time-varying explanatory
variables:
x_ij = [x_ij, y_{i−1,j}],
z_j = [z_j].
We ignore any dependence of the random effects on the explanatory
variables:
f (u0j | x, z) = f (u0j ) .
The above likelihood simplifies to:
L^c(γ, δ, φ, σ²_u0 | y, x, z) = ∏_j ∫_{−∞}^{+∞} ∏_{i=2}^T g(y_ij | y_{i−1,j}, θ_ij, φ) f(u_0j) du_0j,
with:
θ_ij = γ_00 + ∑_{p=1}^P γ_p0 x_pij + ∑_{q=1}^Q γ_0q z_qj + δ y_{i−1,j} + u_0j
for i = 2, 3, ..., T .
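To make the construction concrete — and emphatically as our own illustrative Python sketch, not Sabre's implementation — this conditional likelihood can be evaluated directly from the Table 14.1 profile frequencies for the probit model with season dummies (t3, t4) fitted in the next section:

```python
import numpy as np
from math import erf, sqrt, log

# Profile frequencies from Table 14.1: (y1, y2, y3, y4) -> respondents
PROFILES = {
    (0,0,0,0): 487, (0,0,0,1): 35, (0,0,1,0): 27, (0,0,1,1): 6,
    (0,1,0,0): 39,  (0,1,0,1): 11, (0,1,1,0): 9,  (0,1,1,1): 7,
    (1,0,0,0): 50,  (1,0,0,1): 11, (1,0,1,0): 9,  (1,0,1,1): 9,
    (1,1,0,0): 16,  (1,1,0,1): 9,  (1,1,1,0): 8,  (1,1,1,1): 19,
}

def Phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def conditional_loglik(cons, t3, t4, delta, sigma_u, n_mass=24):
    """Classical conditional random-intercept probit log likelihood:
    condition on y_1j, model y_2j..y_4j with season dummies and the
    lagged response, and integrate u_0j out by Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_mass)
    season = {2: 0.0, 3: t3, 4: t4}        # season 2 is the reference
    ll = 0.0
    for y, freq in PROFILES.items():
        contrib = 0.0
        for v, w in zip(nodes, weights):
            u = sqrt(2.0) * sigma_u * v
            lik = 1.0
            for t in (2, 3, 4):
                p = Phi(cons + season[t] + delta * y[t - 2] + u)
                lik *= p if y[t - 1] == 1 else 1.0 - p
            contrib += w * lik
        ll += freq * log(contrib / sqrt(np.pi))
    return ll
```

Evaluating this function at the estimates reported in Section 14.8 provides a useful cross-check against the log likelihood quoted there, up to differences between standard and adaptive quadrature.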
14.8 Classical conditional model: example
We can use SabreR to estimate a classical conditional model on the
binary depression data depression2.tab. The SabreR command required
to fit this model is:
sabre.model.1 <- sabre(s~factor(t)+s_lag1+1,
case=ind,first.link="probit",first.mass=24)
This produces the following output:
Log likelihood = -831.56731
on 2251 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
_____________________________________________________
cons -1.2942 0.72379E-01 -17.881
t3 -0.15466 0.88638E-01 -1.7448
t4 -0.21480E-01 0.87270E-01 -0.24613
s_lag1 0.94558 0.13563 6.9718
scale 0.32841 0.18226 1.8019
Having fitted this model, we find that the coefficient on y_{i−1,j}
(s_lag1) is 0.94558 (s.e. 0.13563), which is highly significant, but the
scale parameter (σ) is of marginal significance, suggesting a nearly
homogeneous first-order model. Can we trust this inference? In subsequent
sections, we consider alternative treatments of the initial conditions
problem.
14.9 Conditioning on initial response but allowing
random effect u0j to be dependent on zj
Wooldridge [107] proposed that we drop the term g(y_1j | θ_1j, φ¹) and
use the conditional likelihood:
L^c(γ, δ, φ, σ²_u0 | y, x, z) = ∏_j ∫_{−∞}^{+∞} ∏_{i=2}^T g(y_ij | y_{i−1,j}, θ_ij, φ) f(u_0j | x, z) du_0j,
where:
y_j = [y_2j, y_3j, ..., y_Tj],
z_j = [z_j, y_1j],
x_ij = [x_ij, y_{i−1,j}],
but rather than assume u0j is independent and identically distributed,
that is, f (u0j | x, z) = f (u0j ) as in Section 14.7, we use:
f (u0j | x, z) = f (u0j | z) .
By allowing the omitted (random) effects to depend on the initial re-
sponse:
u_0j = κ_00 + κ_1 y_1j + ∑_{q=1}^Q κ_0q z_qj + u^w_0j,
where u^w_0j is independent and identically distributed, we get:
θ_ij = γ_00 + ∑_{p=1}^P γ_p0 x_pij + ∑_{q=1}^Q γ_0q z_qj + δ y_{i−1,j} + u_0j
     = γ_00 + ∑_{p=1}^P γ_p0 x_pij + ∑_{q=1}^Q γ_0q z_qj + δ y_{i−1,j} + κ_00 + κ_1 y_1j + ∑_{q=1}^Q κ_0q z_qj + u^w_0j
     = (γ_00 + κ_00) + ∑_{p=1}^P γ_p0 x_pij + ∑_{q=1}^Q (γ_0q + κ_0q) z_qj + δ y_{i−1,j} + κ_1 y_1j + u^w_0j
     = γ^w_00 + ∑_{p=1}^P γ_p0 x_pij + ∑_{q=1}^Q γ^w_0q z_qj + δ y_{i−1,j} + κ_1 y_1j + u^w_0j.
This implies that the coefficient on the constant, (γ_00 + κ_00) = γ^w_00,
and the coefficients on the time-constant explanatory variables,
(γ_0q + κ_0q) = γ^w_0q, will be confounded. The ability of the auxiliary
model:
u_0j = κ_00 + κ_1 y_1j + ∑_{q=1}^Q κ_0q z_qj + u^w_0j
to account for the dependence in f (u0j | x, z) will depend to some ex-
tent on the nature of the response (yij ). For initial binary responses
(y1j ), only one parameter κ1 is needed, but for the Gaussian and Poisson
models, polynomials in y1j may be needed to account more fully for the
dependence. In addition, as Wooldridge [107] suggested, we can include
interaction effects between the y1j and zqj . Crouchley and Davies [29]
raise inferential issues about the inclusion of baseline responses (initial
conditions) in models without state dependence.
14.10 Wooldridge conditional model:
example
The Wooldridge [107] conditional model for the binary depression data
uses depression2.tab. The SabreR command required to fit this model
is:
sabre.model.2 <- sabre(s~factor(t)+s_lag1+
s1+1,case=ind,first.link="probit",
first.mass=24)
This produces the following output:
Log likelihood = -794.75310
on 2250 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
___________________________________________________
cons -1.6646 0.11654 -14.283
t3 -0.20988 0.99663E-01 -2.1059
t4 -0.88079E-01 0.97569E-01 -0.90274
s_lag1 0.43759E-01 0.15898 0.27525
s1 1.2873 0.19087 6.7445
scale 0.88018 0.12553 7.0117
This model has the lagged response (s_lag1) estimate at 0.043759
(s.e. 0.15898), which is not significant, while the initial response (s1)
estimate 1.2873 (s.e. 0.19087) and the scale parameter estimate 0.88018
(s.e. 0.12553) are highly significant. There is also a highly significant
improvement in the log likelihood, over the model without s1, of:
−2(−831.56731 − (−794.75310)) = 73.628
for one degree of freedom. This model has no time-constant explanatory
variables to be confounded by the auxiliary model and suggests that
depression is a zero-order process; in other words, the probability of
an individual being depressed in the current month does not depend
to a significant extent on whether that individual was depressed in the
previous month.
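The comparison above is a likelihood-ratio test. As a small, self-contained illustration of how the statistic and its one-degree-of-freedom χ² p-value are computed (Python, all names ours):

```python
from math import erf, sqrt

def lr_statistic(ll_restricted, ll_full):
    """-2 * (log L_restricted - log L_full) for nested models."""
    return -2.0 * (ll_restricted - ll_full)

def chi2_1df_pvalue(x):
    """P(X > x) for X ~ chi-squared(1); since X = Z^2 for standard
    normal Z, P(X > x) = 2 * (1 - Phi(sqrt(x)))."""
    return 2.0 * (1.0 - 0.5 * (1.0 + erf(sqrt(x / 2.0))))

# Log likelihoods of the models of Sections 14.8 (without s1)
# and 14.10 (with s1):
lr = lr_statistic(-831.56731, -794.75310)   # 73.628 on 1 df
```

The resulting p-value is vanishingly small, consistent with the highly significant improvement reported in the text.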
14.11 Modelling the initial conditions
There are several approximations that can be adopted: (1) use the same
random effect in the initial and subsequent responses, for example, [30];
(2) use a one-factor decomposition for the initial response and subse-
quent responses, for example, [52, 99]; (3) use different (but correlated)
random effects for the initial response and subsequent responses; (4) em-
bed the Wooldridge [107] approach in joint models for the initial response
and subsequent responses.
All the joint models for the binary depression responses use the data
depression.tab. This dataset has a constant for the initial response,
a constant for the subsequent responses, dummy variables for seasons 3
and 4 and the lagged response variable. In the rest of this chapter, we
discuss each of the above approximations in turn. For each approxima-
tion, we define the model before applying that model to the depression
data.
14.12 Same random effect in the initial response and
subsequent response models with a common
scale parameter
The likelihood for this model is:
L(γ¹, γ, δ, φ¹, φ, σ²_u0 | y, x, z) = ∏_j ∫_{−∞}^{+∞} g(y_1j | θ_1j, φ¹) ∏_{i=2}^T g(y_ij | y_{i−1,j}, θ_ij, φ) f(u_0j | x, z) du_0j,
with the responses:
y_j = [y_1j, y_2j, y_3j, ..., y_Tj],
time-varying explanatory variables:
x_j = [x_1j, x_2j, ..., x_Tj]
and time-constant explanatory variables:
z_j = [z_j].
For the initial response:
θ_1j = γ¹_00 + ∑_{p=1}^P γ¹_p0 x_p1j + ∑_{q=1}^Q γ¹_0q z_qj + u_0j,
and for subsequent responses, we have:
θ_ij = γ_00 + ∑_{p=1}^P γ_p0 x_pij + ∑_{q=1}^Q γ_0q z_qj + δ y_{i−1,j} + u_0j,   i = 2, ..., T.
To set up this model in SabreR, we combine the linear predictors by
using dummy variables so that, for all i:
θ_ij = r_1 θ_1j + r_2 θ_ij,
r_1 = 1 if i = 1, 0 otherwise,
r_2 = 1 if i > 1, 0 otherwise,
where, for all i:
var(u_0j) = σ²_u0.
For the binary and Poisson models, we have φ = 1 in g(y_ij | y_{i−1,j}, θ_ij, φ);
for the linear model, we have φ = σ²_ε1 for the initial response and φ = σ²_ε
for subsequent responses.
14.13 Joint analysis with a common random effect:
example
The joint model with a common random effect has two components: (i)
the model for the initial response and (ii) the model for subsequent re-
sponses. The model component for the initial response (indicator r1=1)
only includes a constant term. The model component for the three sub-
sequent responses (indicator r2=1) has a constant, dummy variables
for seasons 3 (r2:t3) and 4 (r2:t4), and the lagged response variable
(r2:s_lag1). The SabreR command required to fit this model to the
data depression.tab is:
sabre.model.3 <- sabre(s~r1+r2+r2:(t3+t4+s_lag1)-1,
case=ind,first.link="probit",first.mass=24)
This produces the following output:
Log likelihood = -1142.9749
on 3002 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
____________________________________________________
r1 -1.3476 0.10026 -13.441
r2 -1.4708 0.92548E-01 -15.893
r2_t3 -0.20740 0.99001E-01 -2.0949
r2_t4 -0.85438E-01 0.97129E-01 -0.87964
r2_lag1 0.70228E-01 0.14048 0.49990
scale 1.0372 0.10552 9.8293
The non-significant coefficient of r2:s_lag1 is 0.070228 (s.e.
0.14048), which suggests that there is no state dependence in these data,
while the highly significant scale coefficient of 1.0372 (s.e. 0.10552) sug-
gests heterogeneity.
14.14 Same random effect in models of the initial
response and subsequent responses but with
different scale parameters
This model can be derived from a one-factor decomposition of the ran-
dom effects for the initial observation and subsequent observations; for
its use in this context, see [52, 99]. The likelihood for this model:
L(γ¹, γ, δ, φ¹, φ, σ²_1u0, σ²_u0 | y, x, z),
is just like that for the common scale parameter model of Section 14.12,
with the same random effect for the initial response and subsequent
responses, except that, for i = 1, we have:
var(u_0j) = σ²_1u0
and, for i > 1,
var(u_0j) = σ²_u0.
In binary or linear models, the scale parameter for the initial response is
identified from the covariance between the initial response y1j and the
subsequent responses y_ij, i > 1. Stewart [99] uses a different
parameterisation: for i = 1:
var(u_0j) = λσ²_u0
and, for i > 1:
var(u_0j) = σ²_u0.
As in the common scale parameter model, we combine the linear
predictors by using dummy variables so that, for all i:
θ_ij = r_1 θ_1j + r_2 θ_ij,
r_1 = 1 if i = 1, 0 otherwise,
r_2 = 1 if i > 1, 0 otherwise.
14.15 Joint analysis with a common random effect
(different scale parameters): example
As in the common scale parameter model of Section 14.13, this joint
model for the data depression.tab has a constant for the initial re-
sponse, a constant for the subsequent responses, dummy variables for
seasons 3 and 4 and the lagged response variable. The SabreR command
required to fit this model is:
sabre.model.4 <- sabre(s[t==1]~1,
s[t>1]~factor(t[t>1])+s_lag1[t>1],
case=list(ind[t==1],ind[t>1]),
first.link="probit",first.mass=24,
depend=TRUE)
This produces the following output:
Log likelihood = -1142.9355
on 3001 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
___________________________________________________
r1 -1.3248 0.12492 -10.605
r2 -1.4846 0.10639 -13.954
r2_t3 -0.21020 0.10004 -2.1011
r2_t4 -0.87882E-01 0.98018E-01 -0.89658
r2_lag1 0.50254E-01 0.15792 0.31822
scale1 1.0021 0.15927 6.2917
scale2 1.0652 0.14587 7.3021
These results show that the state dependence regressor s_lag1[t>1].2
has an estimate of 0.050254 (s.e. 0.15792), which is not significant. They
also indicate that the scale parameters σ²_1u0 and σ²_u0 are nearly the same.
The log likelihood improvement in the model with two scale parameters,
over that of the previous model with one scale parameter, is:
−2(−1142.9749 − (−1142.9355)) = 0.0788,
for one degree of freedom. Thus, the model with one scale parameter is
to be preferred.
14.16 Different random effects in models of the ini-
tial response and subsequent responses
Now we allow the models for the initial response and the subsequent
responses to have different random effects. The likelihood for this model
is:
L(γ¹, γ, δ, φ¹, φ, σ²_u0, ρ | y, x, z) = ∏_j ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} g(y_1j | θ_1j, φ¹) ∏_{i=2}^T g(y_ij | y_{i−1,j}, θ_ij, φ) f(u¹_0j, u²_0j | x, z) du¹_0j du²_0j,
with the responses:
y_j = [y_1j, y_2j, ..., y_Tj],
time-varying explanatory variables:
x_j = [x_1j, x_2j, ..., x_Tj]
and time-constant explanatory variables:
z_j = [z_j].
The main difference between this joint model and the previous model,
that is, the random effect model of Section 14.14, is the use of different
random effects for the initial response and subsequent responses. This
implies that we need a bivariate integral to form the marginal likelihood.
For the initial response:
θ_1j = γ¹_00 + ∑_{p=1}^P γ¹_p0 x_p1j + ∑_{q=1}^Q γ¹_0q z_qj + u¹_0j,
and, for subsequent responses, we have:
θ_ij = γ_00 + ∑_{p=1}^P γ_p0 x_pij + ∑_{q=1}^Q γ_0q z_qj + δ y_{i−1,j} + u²_0j,   i = 2, ..., T.
The correlation between the random effects u¹_0j and u²_0j is ρ, which
is identified from the covariance between y_1j and the y_ij, i > 1. The
scale parameter for the initial response is not identified in the
presence of ρ in the binary or linear models, so in these models we hold
it at the same value as that of the subsequent responses. As in all joint
models, to set up this model in SabreR, we combine the linear predictors
by using dummy variables so that, for all i:
θ_ij = r_1 θ_1j + r_2 θ_ij,
r_1 = 1 if i = 1, 0 otherwise,
r_2 = 1 if i > 1, 0 otherwise.
For the binary and Poisson models, we have φ = 1 in g(y_ij | y_{i−1,j}, θ_ij, φ).
For the linear model, we still have φ = σ²_ε1 for the initial response and
φ = σ²_ε for subsequent responses.
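The bivariate integral can be approximated on a product grid of Gauss-Hermite points, with the correlation ρ introduced through a Cholesky factor of the 2 × 2 covariance matrix of (u¹_0j, u²_0j). A minimal sketch of this construction (Python, all names ours; not Sabre's algorithm):

```python
import numpy as np

def correlated_nodes(sigma1, sigma2, rho, n_mass=24):
    """Product Gauss-Hermite grid for (u1, u2) ~ N(0, Sigma), where
    Sigma = [[s1^2, rho*s1*s2], [rho*s1*s2, s2^2]].

    Returns flattened node pairs (U1, U2) and normalised weights W, so a
    bivariate expectation E[f(u1, u2)] ~ sum(W * f(U1, U2))."""
    v, w = np.polynomial.hermite.hermgauss(n_mass)
    V1, V2 = np.meshgrid(v, v, indexing="ij")
    W = np.outer(w, w) / np.pi              # normalised product weights
    # Cholesky map of independent standard nodes to correlated effects:
    # u1 = sqrt(2)*s1*v1, u2 = sqrt(2)*s2*(rho*v1 + sqrt(1-rho^2)*v2)
    U1 = np.sqrt(2.0) * sigma1 * V1
    U2 = np.sqrt(2.0) * sigma2 * (rho * V1 + np.sqrt(1.0 - rho**2) * V2)
    return U1.ravel(), U2.ravel(), W.ravel()
```

A quick sanity check is that the weighted node products reproduce the moments of the target distribution, e.g. sum(W * U1 * U2) = rho * s1 * s2, which Gauss-Hermite quadrature evaluates exactly for polynomials.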
14.17 Different random effects: example
As in the single random effect models, this joint model for the binary
depression data depression.tab has a constant for the initial response,
a constant for the subsequent responses, dummy variables for seasons 3
and 4 and the lagged response variable. The SabreR command required
to fit this model is:
sabre.model.5 <- sabre(s[t==1]~1,
s[t>1]~factor(t[t>1])+s_lag1[t>1],
case=list(ind[t==1],ind[t>1]),
first.link="probit",second.link="probit",
first.mass=24,second.mass=24,equal.scale=TRUE,
only.first.derivatives=TRUE)
This produces the following output:
Log likelihood = -1142.9355
on 3001 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
______________________________________________________
r1 -1.3672 0.12386 -11.038
r2 -1.4846 0.10591 -14.018
r2_t3 -0.21020 0.10033 -2.0951
r2_t4 -0.87880E-01 0.97890E-01 -0.89774
r2_lag1 0.50260E-01 0.15945 0.31520
scale 1.0652 0.14363 7.4158
corr 0.97090 0.10088 9.6245
The log likelihood is exactly the same as that for the previous model.
The scale2 parameter from the previous model has the same value
as the scale parameter of the current model. The lagged response
s_lag1[t>1].2 has an estimate of 0.050260 (s.e. 0.15945), which is
not significant. The correlation between the random effects (corr) is
estimated to be 0.97090 (s.e. 0.10088), which is very close to 1, suggest-
ing that the common random effects, zero-order, single scale parameter
model is to be preferred.
14.18 Embedding the Wooldridge approach in joint
models for the initial response and
subsequent responses
This extended model will help us to assess the value of the Wooldridge
[107] approach in an empirical context. We can include the initial re-
sponse in the linear predictors of the subsequent responses of any of
the joint models, but for simplicity we will use the single random effect,
single scale parameter model. The likelihood for this model is:
L(γ¹, γ, δ, φ¹, φ, σ²_u0 | y, x, z^p) = ∏_j ∫_{−∞}^{+∞} g(y_1j | θ_1j, φ¹) ∏_{i=2}^T g(y_ij | y_{i−1,j}, y_1j, θ_ij, φ) f(u_0j | x, z^p) du_0j,
with the responses:
y_j = [y_1j, y_2j, y_3j, ..., y_Tj],
time-varying explanatory variables:
x_j = [x_1j, x_2j, ..., x_Tj]
and time-constant explanatory variables:
z^p_j = [z_j, y_1j].
For the initial response:
θ_1j = γ¹_00 + ∑_{p=1}^P γ¹_p0 x_p1j + ∑_{q=1}^Q γ¹_0q z_qj + u_0j,
and, for subsequent responses, we have:
θ_ij = γ_00 + ∑_{p=1}^P γ_p0 x_pij + ∑_{q=1}^Q γ_0q z_qj + δ y_{i−1,j} + κ_1 y_1j + u_0j,   i = 2, ..., T,
as we have added κ1 y1j to the linear predictor.
As with joint models, we combine the linear predictors by using
dummy variables so that, for all i:
θ_ij = r_1 θ_1j + r_2 θ_ij,
r_1 = 1 if i = 1, 0 otherwise,
r_2 = 1 if i > 1, 0 otherwise,
where, for all i:
var(u_0j) = σ²_u0.
For the binary and Poisson models, we have φ = 1 in g(y_ij | y_{i−1,j}, θ_ij, φ);
for the linear model, we have φ = σ²_ε1 for the initial response and φ = σ²_ε
for subsequent responses.
14.19 Joint model incorporating the Wooldridge ap-
proach: example
As in the single random effect models, this joint model for the binary
depression data depression.tab has a constant for the initial response,
a constant for the subsequent responses, dummy variables for seasons
3 and 4, the lagged response variable and the initial response variable.
The SabreR command required to fit this model is:
sabre.model.6 <- sabre(s~r1+r2+
r2:(t3+t4+s_lag1+s1)-1,case=ind,
first.link="probit",first.mass=24)
This produces the following output:
Log likelihood = -1142.9670
on 3001 residual degrees of freedom
Parameter Estimate Std. Err. Z-score
____________________________________________________
r1 -1.3632 0.16189 -8.4207
r2 -1.4741 0.97129E-01 -15.176
r2_t3 -0.20869 0.99797E-01 -2.0911
r2_t4 -0.86541E-01 0.97774E-01 -0.88511
r2_lag1 0.61490E-01 0.15683 0.39209
r2_base -0.33544E-01 0.26899 -0.12470
scale 1.0602 0.21274 4.9835
This joint model has the lagged response r2:s_lag1 estimate of
0.061490 (s.e. 0.15683) and the baseline/initial response effect r2:s1
estimate of −0.033544 (s.e. 0.26899), both of which are non-significant.
The previous model, that is, the model with different random effects for
the initial response and the subsequent responses (see Sections 14.16 and
14.17), is an adequate descriptor of the data.
14.20 Other link functions
State dependence can also occur in Poisson and linear models. For lin-
ear model examples, see [12, 13]. These data concern the demand for
cigarettes per capita by state for 46 American States. We have found
first-order state dependence in the Poisson data of Hausman et al. [47]
and Hall et al. [48]. The data refer to the number of patents awarded to
346 firms each year from 1975 to 1979.
14.21 Exercises using models incorporating initial
conditions/state dependence in binary data
Exercise 14.21.1. Trade union membership of young males
The Vella and Verbeek [103] data were taken from the Youth Sample
of the US National Longitudinal Survey of Youth (NLSY) and consisted
of a sample of 545 full-time working males who had completed their
schooling by 1980 and who were then followed annually from 1980 to
1987. The years 1981 to 1987 are represented by dummy variables d81
to d87.
The binary response of primary interest, trade union membership,
was determined by whether or not the sampled individual had his wage
set in a collective bargaining agreement. We have
used these data previously in this book, to illustrate the application of
the standard binary logistic model in Chapter 3 (see Exercise 3.4.2),
and of its random effects version in Chapter 6 (see Exercise 6.7.2).
In the current exercise, we employ these data to demonstrate how
the initial conditions problem may be addressed. Wooldridge [107] used
the data to illustrate his treatment of the problem. He used the time-
constant factors of educ (years of schooling) and race, black (whether
Black or not), and the time-varying explanatory variable of marital sta-
tus, married. Whether a respondent was married or not in the years
1981 to 1987 is represented by a series of indicator variables marr81 to
marr87.
In this exercise, you will apply a range of models to the datasets
unionjmw1.tab and unionjmw2.tab. The variables in the dataset
unionjmw2.tab are the same as those in unionjmw1.tab, with the ad-
dition of the variables: d: 1 for the initial response, 2 if a subsequent
response, d1: 1 if d=1, 0 otherwise, d2: 1 if d=2, 0 otherwise. In addi-
tion, the lagged response variable union 1 in the dataset unionjmw2.tab
takes the value −9 for a baseline observation:
1. Use unionjmw1.tab to estimate a mixed probit model (adaptive
quadrature, mass 24) of trade union membership (union), with a
constant term, the lagged union membership variable (union 1),
educ, black and the marital status dummy variable (married),
the marr81-marr87 and the d82-d87 sets of dummy variables.
2. Add the initial condition of trade union membership in 1980
(union80) to the previous model. How does the inference on the
lagged responses (union 1) and the scale parameters differ between
the two models?
3. Use unionjmw2.tab to estimate a common random effect, com-
mon scale parameter, joint probit model (adaptive quadrature,
mass 24) of trade union membership (union 1). Use the d1 and d2
dummy variables to set up the linear predictors. Use constants in
both linear predictors. For the initial response, use the explanatory
variables: married, educ and black. For the subsequent response,
use the following explanatory variables: lagged union membership
variable (union 1), educ, black and the marital status dummy
variable (married), the marr81-marr87 and the year dummy vari-
ables. What does this model suggest about state dependence and
unobserved heterogeneity?
4. Re-estimate the model, allowing the scale parameters for the ini-
tial and subsequent responses to be different. Is this a significant
improvement over the common scale parameter model?
5. To the different scale parameters model, add the baseline response
(union80). Does this improve the model significantly?
Exercise 14.21.2. Trade union membership of females
Stewart [99] used a different subset of NLSY data, on young women
from 1978 onwards. The data for 1983 were dropped and only those
individuals surveyed in each of the remaining six waves were kept. The
observations for 1985 and 1987 were treated as if they were for 1984
and 1986 respectively, thereby providing six waves of data at regular
two-year intervals. This gave a balanced panel with 799 young women
observed in each of six waves.
As in the previous exercise, the binary response of primary interest,
trade union membership (union), was determined by whether or not the
sampled individual had her wage set in a collective bargaining agreement.
We use the datasets unionred1.tab
and unionred2.tab to illustrate how to address the initial conditions
problem.
The variables in the dataset unionred2.tab are the same as those
in unionred1.tab, with the exception of the following variables: d, d1
and d2. In unionred1.tab, all the responses are post-baseline, so these
variables are constants: d = 2, d1 = 0, d2 = 1. In unionred2.tab,
these variables take the following values: d: 1 for the initial response,
2 for a subsequent response, d1: 1 if d=1, 0 otherwise, d2: 1 if d=2,
0 otherwise. In addition, the lagged response variable lagunion in the
dataset unionred2.tab takes the value −9 for a baseline observation:
1. Use unionred1.tab to estimate a mixed probit model (idcode
at level two, adaptive quadrature, mass 16) of trade union mem-
bership (union), with a constant term and the following explana-
tory variables: lagged union membership variable (lagunion), age,
grade (number of years of schooling completed), and southxt (1
if resident in South, 0 otherwise).
2. Add the initial condition of trade union membership in 1978
(baseunion: 1 if union=1 in 1978, 0 otherwise) to the previous
model. How does the inference on the lagged responses (lagunion)
and the scale effects differ between the two models?
3. Use unionred2.tab to estimate a common random effect, com-
mon scale, joint probit model (adaptive quadrature, mass 24) of
trade union membership (union). Use constant terms in both lin-
ear predictors. Use the d1 and d2 dummy variables to set up
the linear predictors. For the initial response, use the explana-
tory variables: age, grade, southxt and not smsa (1 if living out-
side a standard metropolitan statistical area, 0 otherwise). For the
subsequent response, use the explanatory variables: lagged union
membership variable (lagunion), age, grade and southxt. What
does this model suggest about state dependence and unobserved
heterogeneity?
4. Re-estimate this model allowing the scale parameters for the initial
and subsequent responses to be different (use adaptive quadrature
with mass 32). Is this a significant improvement over the common
scale parameter model?
5. Re-estimate this model using a bivariate specification for the ran-
dom effects (common scale). Are these results different to those of
the previous model?
6. To this bivariate model, add the initial or baseline response
(baseunion). Are these results different to those of the previous
model?
Exercise 14.21.3. Female UK labour force participation
As part of the ESRC-funded Social Change and Economic Life Ini-
tiative, SCELI [86], Davies, Elias and Penn [36] and Davies [33] analyzed
the annual employment behaviour of 144 wives from Rochdale, from the
date of their marriage to the end of the survey in 1987. The data are
stored in the files wemp-base1.tab and wemp-base2.tab.
The binary response femp takes the value 1 if a wife was employed in
the current year, and takes value 0 otherwise. The explanatory variables
include: age (wife’s age - 1975), her husband’s employment status, mune
(1 if the husband is in employment in the current year, 0 otherwise) and
whether or not the wife has children under the age of 5 (und5). In this
exercise, we will investigate whether we can distinguish between
(first-order) state dependence in the employment behaviour of the wives
and unobserved heterogeneity.
The variables in the dataset wemp-base2.tab are the same as those
in wemp-base1.tab, with the exception of the following variables: r, r1
and r2. In wemp-base1.tab, all the responses are post-first year obser-
vations, so these variables are constants: r = 2, r1 = 0 and r2 = 1.
In wemp-base2.tab, these variables take the following values: r: 1 for
the initial response, 2 for a subsequent response, r1: 1 if r=1, 0 other-
wise, r2: 1 if r=2, 0 otherwise. In addition, the lagged response variable
ylag in the dataset wemp-base2.tab takes the value −9 for a first-year
observation:
1. Use wemp-base1.tab to estimate a mixed logit model (case at
level two, adaptive quadrature, mass 12) of female employment
participation (femp), with a constant term and the following ex-
planatory variables: lagged female employment participation vari-
able (ylag), mune, und5 and age.
2. Add the initial condition of being employed in the first year (ybase)
to the previous model. How does the inference on the lagged re-
sponses (ylag) and the scale effects differ between the two models?
3. Use wemp-base2.tab to estimate a common random effect, com-
mon scale, joint logit model (adaptive quadrature, mass 12) of
female employment participation (femp). Use constant terms in
both linear predictors. Use the r1 and r2 dummy variables to
set up the linear predictors. For the initial response, use the ex-
planatory variables: mune, und5 and age. For the subsequent re-
sponses, use the explanatory variables: the lagged female employ-
ment participation variable (ylag), mune, und5 and age. What
does this model suggest about state dependence and unobserved
heterogeneity?
192 Multivariate Generalized Linear Mixed Models Using R
4. Re-estimate this model, allowing the scale parameters for the initial
and subsequent responses to be different.
5. In this model, replace the lagged female employment participation
variable (ylag) with the initial or baseline response (ybase). Are
these results different to those of the previous model?
6. In this model, include both the lagged response (ylag) and the
baseline response (ybase). Are these results different to those of
the previous model?
7. Re-estimate this model with the baseline response (ybase) and
the lagged response (ylag), using a bivariate specification for the
random effects (common scale).
8. Compare the results obtained for the various models on the ex-
planatory variables and role of employment status in the previ-
ous year. Are both state dependence and unobserved heterogeneity
present in these data?
Exercise 14.21.4. Patents and R&D expenditure
The data used in this exercise refer to the number of patents awarded
to a sample of firms each year from 1975 to 1979. Hall et al. [48] were
particularly interested in the effects of current and lagged research and
development (R&D) expenditures on the number of awarded patents.
The data on 336 firms that we use here (patents.tab) were made avail-
able by Cameron and Trivedi [22]. All spending figures are in 1972 US
dollars.
In patents.tab, the response pat is the number of patents applied
for, during the current year, that were eventually granted. The first-order
to fourth-order lagged responses are stored in the variables pat1 to pat4
respectively. The explanatory variables include: logr (the logarithm of
R&D spending), logk (the logarithm of the book value of the firm’s
capital value in 1972), scisect (1 for firms in the scientific sector, 0
otherwise) and indicator variables year3 to year5 representing the years
1977 to 1979 respectively. The dataset patents.tab also includes the
following variables: r: 1 if the current year is the baseline year (1975), 2
otherwise, r1: 1 if r=1, 0 otherwise, r2: 1 if r=2, 0 otherwise.
We will estimate several versions of the joint model of the initial
and subsequent responses. In order to do this, we need the explanatory
variables to have different parameter estimates in the model for the
initial conditions from those in the model for the subsequent responses.
This means that we will need to create interaction effects with the r1
and r2 indicators:
1. The first model to be estimated has a common random
effect for the baseline and subsequent responses but ex-
cludes the lagged response. Use the explanatory variables: r1,
r1 logr, r1 logk, r1 scisect for the baseline, and the explana-
tory variables: r2, r2 logr, r2 logk, r2 scisect, r2 year3,
r2 year4, r2 year5 for the subsequent responses. Use adap-
tive quadrature and first.mass=36. Add the previous outcome,
r2 pat1, to establish whether or not we have a first-order model.
If the previous outcome is significant, then we can add r2 base to
establish whether the Wooldridge [107] control makes a significant
contribution towards the model. Interpret your results.
2. Repeat the above model fitting procedure, with a one-factor model
for the baseline and subsequent responses with adaptive quadra-
ture (first.mass=24) and accurate arithmetic.
3. Repeat this task, using a bivariate model for the baseline and sub-
sequent responses with adaptive quadrature (first.mass=36) in
both dimensions and with accurate arithmetic.
4. Compare the results. Which is your preferred model and why?
15
Incidental parameters: an empirical comparison of fixed effects and random effects models
15.1 Introduction
The main objective of the mixed/multi-level/random effects modelling
approach is the estimation of the γ parameters, that is, the regression
coefficients associated with the explanatory variables, in the presence
of the random effects or incidental parameters. In a two-level model,
these incidental parameters are the individual-specific random effects
u0j . This has been achieved by assuming that the incidental parameters
are Gaussian distributed, and by computing the expected behaviour of
individuals randomly sampled from this distribution; in other words,
by integrating the random effects out of the model. For the two-level
generalized linear model defined in Chapter 9, we had the likelihood:
\[
L\left(\gamma, \phi, \sigma^2_{u_0} \mid y, x, z\right)
  = \prod_j \int \prod_i g\left(y_{ij} \mid \theta_{ij}, \phi\right) f\left(u_{0j}\right)\, du_{0j},
\]
where:
\[
g\left(y_{ij} \mid \theta_{ij}, \phi\right)
  = \exp\left\{ \left[ y_{ij}\theta_{ij} - b\left(\theta_{ij}\right) \right] / \phi + c\left(y_{ij}, \phi\right) \right\},
\]
\[
\theta_{ij} = \gamma_{00} + \sum_{p=1}^{P} \gamma_{p0} x_{pij} + \sum_{q=1}^{Q} \gamma_{0q} z_{qj} + u_{0j},
\]
and:
\[
f\left(u_{0j}\right) = \frac{1}{\sqrt{2\pi}\,\sigma_{u_0}} \exp\left( -\frac{u_{0j}^2}{2\sigma_{u_0}^2} \right).
\]
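The integral in this likelihood has no closed form for non-Gaussian responses, which is why it is evaluated numerically by (adaptive) Gaussian quadrature, with the number of mass points set by the `mass` arguments used throughout this book. The sketch below is illustrative Python/numpy, not SabreR code; the function name and example values are made up. It evaluates one cluster's marginal likelihood for a random-intercept logit model by plain Gauss-Hermite quadrature and checks the result against a brute-force grid integral:

```python
import numpy as np

def cluster_marginal_lik(y, eta, sigma_u, mass=12):
    """Marginal likelihood of one cluster's binary responses in a
    random-intercept logit, integrating the random effect u0j out by
    (non-adaptive) Gauss-Hermite quadrature with `mass` points."""
    nodes, weights = np.polynomial.hermite.hermgauss(mass)
    total = 0.0
    for x_k, w_k in zip(nodes, weights):
        u = np.sqrt(2.0) * sigma_u * x_k            # change of variable
        p = 1.0 / (1.0 + np.exp(-(eta + u)))        # Bernoulli probabilities
        total += w_k * np.prod(p ** y * (1.0 - p) ** (1 - y))
    return total / np.sqrt(np.pi)                   # Gaussian normalisation

y = np.array([1, 0, 1, 1])                          # one cluster's responses
eta = np.array([0.2, -0.5, 0.9, 0.1])               # hypothetical x'gamma values
approx = cluster_marginal_lik(y, eta, sigma_u=1.0, mass=12)

# brute-force check of the same integral on a fine grid
grid = np.linspace(-8.0, 8.0, 20001)
p = 1.0 / (1.0 + np.exp(-(eta[:, None] + grid[None, :])))
integrand = np.prod(p ** y[:, None] * (1.0 - p) ** (1 - y[:, None]), axis=0)
dens = np.exp(-grid ** 2 / 2.0) / np.sqrt(2.0 * np.pi)   # N(0, 1) density
vals = integrand * dens
brute = (grid[1] - grid[0]) * (vals.sum() - 0.5 * (vals[0] + vals[-1]))
```

Even a modest number of mass points reproduces the integral closely here; adaptive quadrature, as described in Appendix A, relocates and rescales the nodes around each cluster's posterior mode so that far fewer points are needed.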
This approach will provide consistent estimates of γ = (γ00, γp0, γ0q)
as long as, in the true model, the u0j are independent of the covariates
[x, z].
A second approach, due to Andersen [7], is to find a sufficient statis-
tic for the u0j and to estimate the γ from a likelihood conditional upon
this sufficient statistic. For the binary response model with a logit link,
the formulation uses the probability of the grouped responses conditional
upon Sj = Σi yij. For panel data, this statistic is the total number or
count of events observed for that individual; divided by the length of
the observation period, it gives the proportion of periods in which an
event occurred. The distribution of the data y1j, ..., yTj conditional
on Sj is free of u0j . The product of these conditional distributions pro-
vides a likelihood whose maximum will provide a consistent estimator
of γp0. The γ00 and γ0q are conditioned out of the likelihood. The same
approach can be used with the Poisson model. When there is some form
of state dependence or endogeneity in the binary response model, the
conditional likelihood approach gives inconsistent estimates [28].
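The cancellation of u0j from the conditional likelihood can be seen directly in the two-period case. The following sketch (illustrative Python with hypothetical numbers) evaluates P(y1 = 1, y2 = 0 | Sj = 1) for two very different values of the individual effect and confirms that it depends only on the covariate contrast:

```python
import math

def cond_prob_first(x1, x2, beta, u):
    """P(y1 = 1, y2 = 0 | y1 + y2 = 1) in a two-period random-intercept
    logit with individual effect u: the u cancels from this ratio."""
    p1 = 1.0 / (1.0 + math.exp(-(beta * x1 + u)))
    p2 = 1.0 / (1.0 + math.exp(-(beta * x2 + u)))
    num = p1 * (1.0 - p2)
    return num / (num + (1.0 - p1) * p2)

# the conditional probability is the same whatever the value of u ...
p_a = cond_prob_first(1.0, 0.3, beta=0.7, u=-2.0)
p_b = cond_prob_first(1.0, 0.3, beta=0.7, u=3.0)

# ... and equals the closed form exp(b*x1) / (exp(b*x1) + exp(b*x2))
closed = math.exp(0.7 * 1.0) / (math.exp(0.7 * 1.0) + math.exp(0.7 * 0.3))
```

Because the incidental parameter drops out of every such conditional term, the conditional likelihood estimates γp0 without distributional assumptions on the u0j; the price is that γ00 and the γ0q are conditioned away.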
There are several other related approaches. One involves factoring the
likelihood into two orthogonal parts, one for the structural parameters
and another for the incidental parameters [26]. Another related approach
is to estimate the u0j and some of the elements of γ by the usual maxi-
mum likelihood procedures. For instance, with panel/clustered data, only
the parameters γp0 on the time-varying/within-cluster explanatory
variables in the linear model are identified.
In a panel, the number of panel members may be large and the period
of observation may be short. As the number of panel members increases,
so too does the number of incidental parameters (u0j ). This feature was
called the ‘incidental parameters problem’ by Neyman and Scott [84].
For the linear model, with only time-varying/within cluster explanatory
variables, maximum likelihood gives consistent γp0 but biased u0j.
Abowd et al. [2] developed an algorithm for the direct least squares
estimation of (γp0, u0j) in linear models on very large datasets. SabreR
uses this algorithm when the argument fixed.effects=TRUE is specified.
The estimates of u0j produced by direct least squares are consistent
as the cluster size Tj → ∞ ([65], Section 3.2; [106], Chapter 10). The
number of dummy variables (u0j ) that can be estimated directly using
conventional matrix manipulation in least squares is limited by storage
requirements, so SabreR with the argument fixed.effects=TRUE uses
sparse matrix procedures which have been tested on some very large data
sets, for example, with over one million fixed effects. These procedures
have been written in such a way as to enable SabreR to use multiple
processors in parallel.
We start by reviewing the fixed effects treatment of the two-level
linear model and estimate this model in SabreR. Then we compare the
fixed effects model with the random effects model. This chapter con-
cludes with a discussion about the three-level fixed effects model.
15.2 Fixed effects treatment of the two-level linear model
Using the notation of Chapter 5, the explanatory variables at the indi-
vidual level (level one) are denoted by x1 , · · · , xP , and those at the group
level (level two) by z1 , · · · , zQ . This leads to the following formula:
\[
y_{ij} = \gamma_{00} + \sum_{p=1}^{P} \gamma_{p0} x_{pij} + \sum_{q=1}^{Q} \gamma_{0q} z_{qj} + u_{0j} + \varepsilon_{ij},
\]
where the regression coefficients γp0 (p = 1, · · · , P) and γ0q (q =
1, · · · , Q) are for level-one and level-two explanatory variables respec-
tively. If we treat the incidental parameters u0j as fixed effects or con-
stants, then, without additional restrictions, the γ0q, γ00 and u0j are not
separately identifiable or estimable.
We absorb the zqj into the fixed effects, so that:
\[
y_{ij} = \gamma_{00} + \sum_{p=1}^{P} \gamma_{p0} x_{pij} + u^{+}_{0j} + \varepsilon_{ij},
\]
where:
\[
u^{+}_{0j} = \sum_{q=1}^{Q} \gamma_{0q} z_{qj} + u_{0j}.
\]
Then we can identify the fixed effects u+0j by introducing the
restriction Σj u+0j = 0. The individual incidental parameter u+0j
represents the deviation of the jth individual from the common mean
γ00. Another way to identify the u+0j is to treat them as dummy
variables and to set one of them to zero, in other words, to place one
of them in the reference group (alternatively, to drop the constant from
the model). The fixed effect u+0j may be correlated with the included
explanatory variables xpij (unlike the u0j in the random effects
version). We still assume that the residuals εij are mutually independent
and have zero means conditional on the explanatory variables. The
population variance of the level-one residuals εij is denoted by σ²ε.
We can form a version of the model in which the yij are averaged
over i, so that:
\[
\bar{y}_{j} = \gamma_{00} + \sum_{p=1}^{P} \gamma_{p0} \bar{x}_{pj} + \sum_{q=1}^{Q} \gamma_{0q} z_{qj} + u_{0j} + \bar{\varepsilon}_{j},
\]
where x̄pj = Σi xpij/Tj, ȳj = Σi yij/Tj and ε̄j = Σi εij/Tj. This model
still contains the original constant, cluster or time-constant explanatory
variables and the incidental parameter u0j . This model requires one ob-
servation for each individual or cluster. The u0j are not identified in this
model as they occur only once in each cluster and are absorbed into the
residual.
If we take the averaged model from the basic model, we get what is
called the demeaned model with clustered data, or the time-demeaned
model with longitudinal data:
\[
y_{ij} - \bar{y}_{j} = \sum_{p=1}^{P} \gamma_{p0} \left( x_{pij} - \bar{x}_{pj} \right) + \left( \varepsilon_{ij} - \bar{\varepsilon}_{j} \right).
\]
This differenced model does not have a constant, any incidental pa-
rameters or any group-specific (time-constant) explanatory variables in
its specification. This differenced or demeaned form of the model is often
estimated using ordinary least squares.
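A small simulation makes the contrast between the pooled and demeaned regressions concrete. The sketch below (illustrative Python/numpy; the simulated design is hypothetical) generates a panel whose incidental parameters are correlated with the covariate, so pooled homogeneous OLS is biased while OLS on the time-demeaned data recovers the true slope:

```python
import numpy as np

rng = np.random.default_rng(42)
J, T, beta = 400, 5, 2.0                       # clusters, periods, true slope

a = rng.normal(size=J)                         # cluster-level component of x
x = a[:, None] + rng.normal(size=(J, T))       # time-varying covariate
u = a                                          # incidental parameters, correlated with x
y = beta * x + u[:, None] + rng.normal(size=(J, T))

# pooled homogeneous OLS (ignores u0j): biased by cov(x, u) / var(x)
xf, yf = x.ravel(), y.ravel()
b_pooled = np.cov(xf, yf)[0, 1] / np.var(xf, ddof=1)

# within estimator: OLS on the time-demeaned data, consistent for beta
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
b_within = float((xd * yd).sum() / (xd ** 2).sum())
```

In this design cov(x, u)/var(x) = 0.5, so the pooled slope drifts towards 2.5 while the within slope stays near the true value of 2.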
The random effects and fixed effects models can lead to different
inferences about γ p0 . For example, Hausman [50] found that a fixed
effects estimator produced significantly different results from a random
effects specification of a wage equation. Mundlak [82] suggested that, in
the random effects formulation, we approximate E(u0j | x̄pj) by a linear
function:
\[
u_{0j} = \sum_{p=1}^{P} \gamma^{*}_{p0} \bar{x}_{pj} + \omega_{0j},
\]
where ω0j ∼ N(0, σ²ω) [25], so that:
\[
y_{ij} = \gamma_{00} + \sum_{p=1}^{P} \gamma^{*}_{p0} \bar{x}_{pj} + \sum_{p=1}^{P} \gamma_{p0} x_{pij} + \sum_{q=1}^{Q} \gamma_{0q} z_{qj} + \omega_{0j} + \varepsilon_{ij}.
\]
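Mundlak's device can be illustrated numerically: in a balanced panel, ordinary least squares of y on x together with the cluster means x̄j reproduces the within (fixed effects) slope exactly, by the Frisch-Waugh argument. A sketch (illustrative Python/numpy with hypothetical data):

```python
import numpy as np

rng = np.random.default_rng(7)
J, T, beta = 200, 4, 1.5
a = rng.normal(size=J)
x = a[:, None] + rng.normal(size=(J, T))
y = beta * x + 0.8 * a[:, None] + rng.normal(size=(J, T))

xf, yf = x.ravel(), y.ravel()
xbar = np.repeat(x.mean(axis=1), T)            # cluster mean of x, per observation

# Mundlak-style augmented regression: y on [1, x, xbar]
X = np.column_stack([np.ones_like(xf), xf, xbar])
b_mundlak = float(np.linalg.lstsq(X, yf, rcond=None)[0][1])

# within (fixed effects) estimator for comparison
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
b_within = float((xd * yd).sum() / (xd ** 2).sum())
```

The two slopes agree to machine precision here: once x̄j is in the regression, only the within-cluster variation in x is left to identify the coefficient on x.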
Mundlak [82] suggested that, if this augmented generalized linear
mixed model were to be used, then the difference between the random
effects and fixed effects specifications would disappear. However, there
is another explanation as to how there could be differences between the
two formulations. Suppose we had the alternative augmented generalized
linear mixed model:
P
X P
X Q
X
yij = γ 00 + γ ∗∗
p0 xpj + γ+
p0 (xpij − xpj ) + γ 0q zqj + u0j + εij ,
p=1 p=1 q=1
+
which reduces to the original form if γ ∗∗
p0 = γ p0 . In this model, a change
in the average value of x̄pj has a different impact to differences from the
average. The averaged form of the alternative augmented model gives:
\[
\bar{y}_{j} = \gamma_{00} + \sum_{p=1}^{P} \gamma^{**}_{p0} \bar{x}_{pj} + \sum_{q=1}^{Q} \gamma_{0q} z_{qj} + u_{0j} + \bar{\varepsilon}_{j}.
\]
If we take this averaged model away from the alternative augmented
model, then we get the time-demeaned form:
\[
y_{ij} - \bar{y}_{j} = \sum_{p=1}^{P} \gamma^{+}_{p0} \left( x_{pij} - \bar{x}_{pj} \right) + \left( \varepsilon_{ij} - \bar{\varepsilon}_{j} \right).
\]
In this case, the averaged and time-demeaned models will not be
estimating the same parameters unless γ**p0 = γ+p0.
Hausman and Taylor [51] showed how to identify time-varying effects
using a fixed effects estimator and how to identify the time-constant
effects using a random effects estimator in the same regression. This
specification is currently beyond the scope of SabreR.
15.3 Dummy variable specification of the fixed effects model
Hsiao [65], Section 3.2, showed that, by using dummy variables for the
incidental parameters in a linear model with time-varying covariates:
\[
y_{ij} = \sum_{p=1}^{P} \gamma_{p0} x_{pij} + u^{*}_{0j} + \varepsilon_{ij},
\]
we can obtain the same estimates as those of the differenced model:
\[
y_{ij} - \bar{y}_{j} = \sum_{p=1}^{P} \gamma_{p0} \left( x_{pij} - \bar{x}_{pj} \right) + \left( \varepsilon_{ij} - \bar{\varepsilon}_{j} \right).
\]
However, the differenced model parameter estimates will have smaller
standard errors, unless the calculation of the means ȳj, x̄pj is taken
into account. The ordinary least squares estimates of the fixed effects
are given by:
\[
\hat{u}^{*}_{0j} = \bar{y}_{j} - \sum_{p=1}^{P} \hat{\gamma}_{p0} \bar{x}_{pj}.
\]
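Hsiao's equivalence is easy to verify numerically: the explicit dummy variable (least squares dummy variable) regression and the demeaned regression give the same slope, and the recovered dummies match the closed-form fixed effects. A sketch (illustrative Python/numpy with hypothetical data):

```python
import numpy as np

rng = np.random.default_rng(1)
J, T, beta = 30, 4, 0.9
x = rng.normal(size=(J, T))
u = rng.normal(size=J)
y = beta * x + u[:, None] + rng.normal(scale=0.5, size=(J, T))

# explicit dummy variable (LSDV) regression: [x | one dummy per cluster]
D = np.kron(np.eye(J), np.ones((T, 1)))        # JT x J matrix of cluster dummies
X = np.column_stack([x.ravel(), D])
sol = np.linalg.lstsq(X, y.ravel(), rcond=None)[0]
b_lsdv, u_hat = float(sol[0]), sol[1:]

# within estimator and the closed-form fixed effects
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
b_within = float((xd * yd).sum() / (xd ** 2).sum())
u_closed = y.mean(axis=1) - b_within * x.mean(axis=1)
```

With only 30 clusters the dense dummy matrix is trivial to handle; the sparse matrix procedures described in the previous section exist precisely because D becomes enormous when there are hundreds of thousands of clusters.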
SabreR with the argument fixed.effects=TRUE uses least squares
to estimate directly the dummy variable version of the incidental param-
eter model. One advantage of the dummy variable form of the model is
that it can be applied to the non-demeaned data when the level-two nest-
ing is broken, for example, when pupils (level one) change class (level
two).
15.4 Empirical comparison of two-level fixed effects and random effects estimators
Example 15.4.1 Wages of young women
We compare empirically the various ways of estimating a linear model
with incidental parameters. The data we use (nls.tab ) are a version
of the National Longitudinal Study of Youth (NLSY) data. The data
are for young women who were aged between 14 and 26 in 1968. The
women were surveyed each year from 1970 to 1988, except for the years
1974, 1976, 1979, 1981, 1984 and 1986. We have removed records with
missing values on the response (log wages) and on the explanatory vari-
ables, leaving 28,091 observations in the dataset. There are 4,697 women
(idcode), with between 1 and 12 years of data, who were in waged em-
ployment (in other words, not in full-time education) and who earned
over $1/hour and less than $700/hour.
The explanatory variables include: age, ethnicity (whether black
or not), marital status (whether married and spouse is present, or
not), years of schooling completed, whether living outside a standard
metropolitan statistical area (smsa), whether living in the South, trade
union membership and length of job tenure. The version of the dataset
that we use here includes the time-demeaned explanatory variables (de-
noted var tilde, for example, agetilde). We will explore how the re-
sults change when we use different estimators of the incidental parame-
ters.
We can use SabreR to estimate a range of models: (i) homogeneous
linear model with time-varying explanatory variables; (ii) homogeneous
linear model with time-demeaned explanatory variables; (iii) homoge-
neous linear model with explicit dummy variables for the incidental
individual-specific parameters (idcode).
The SabreR commands required to fit models (i), (ii) and (iii) are:
sabre.model.1 <- sabre(ln_wage~age+age2+
ttl_exp+ttl_exp2+tenure+tenure2+
not_smsa+south+grade+black+1,
case=idcode,first.family="gaussian",
first.mass=1)
sabre.model.2 <- sabre(ln_wagetilde~agetilde+
age2tilde+ttl_exptilde+ttl_exp2tilde+
tenuretilde+tenure2tilde+not_smsatilde+
southtilde+gradetilde+blacktilde-1,
case=idcode,first.family="gaussian",
first.mass=1)
sabre.model.3 <- sabre(ln_wage~age+age2+ttl_exp+
ttl_exp2+tenure+tenure2+not_smsa+south+
factor(idcode)-1,case=idcode,
first.family="gaussian",first.mass=1)
These commands result in the following three sets of output:
Log likelihood = -12523.347
on 28079 residual degrees of freedom
Parameter Estimate Std. Err.
___________________________________________
cons 0.24728 0.49332E-01
age 0.38598E-01 0.34670E-02
age2 -0.70818E-03 0.56322E-04
ttl_exp 0.21128E-01 0.23350E-02
ttl_exp2 0.44733E-03 0.12461E-03
tenure 0.47369E-01 0.19626E-02
tenure2 -0.20270E-02 0.13380E-03
not_smsa -0.17205 0.51675E-02
south -0.10034 0.48938E-02
grade 0.62924E-01 0.10313E-02
black -0.69939E-01 0.53207E-02
sigma 0.37797
Log likelihood = -2578.2531
on 28082 residual degrees of freedom
Parameter Estimate Std. Err.
___________________________________________
agetilde 0.35999E-01 0.30903E-02
age2tilde -0.72299E-03 0.48601E-04
ttl_exptilde 0.33467E-01 0.27060E-02
ttl_exp2tilde 0.21627E-03 0.11657E-03
tenuretilde 0.35754E-01 0.16870E-02
tenure2tilde -0.19701E-02 0.11406E-03
not_smsatilde -0.89011E-01 0.86980E-02
southtilde -0.60631E-01 0.99759E-02
gradetilde 0.0000 ALIASED [E]
blacktilde 0.0000 ALIASED [E]
sigma 0.26527
Log likelihood = -2578.2532
on 23385 residual degrees of freedom
Parameter Estimate Std. Err.
___________________________________________
age 0.35999E-01 0.33864E-02
age2 -0.72299E-03 0.53258E-04
ttl_exp 0.33467E-01 0.29653E-02
ttl_exp2 0.21627E-03 0.12774E-03
tenure 0.35754E-01 0.18487E-02
tenure2 -0.19701E-02 0.12499E-03
not_smsa -0.89011E-01 0.95316E-02
south -0.60631E-01 0.10932E-01
fidcode( 1) 1.4233 0.96326E-01
fidcode( 2) 0.97264 0.96648E-01
fidcode( 3) 0.82992 0.89323E-01
fidcode( 4) 1.3009 0.10013
fidcode( 5) 1.1761 0.10011
. . .
. . .
. . .
fidcode(4693) 0.26070 0.21192
fidcode(4694) 0.73985 0.91011E-01
fidcode(4695) 0.47566 0.10950
fidcode(4696) 0.51585 0.21239
fidcode(4697) 1.1980 0.12140
sigma 0.29069
Note that we have edited out most of the output from model (iii)
involving the explicit dummy variables; in other words, the parameter
estimates and standard errors relating to fidcode.
The model for the time-demeaned data and the explicit dummy vari-
able model with the non-time-demeaned data produce identical esti-
mates. These results are slightly different to those for the homogeneous
model. If the incidental parameters are independent of the explanatory
variables, then both sets of estimates will tend to the same limit as the
number of clusters increases.
The explanatory variables gradetilde and blacktilde are dropped
from the time-demeaned model as these are time-constant factors, which
when demeaned have the value zero throughout. The SabreR command
needed to fit this model is:
sabre.model.4 <- sabre(ln_wage~age+age2+ttl_exp+
ttl_exp2+tenure+tenure2+not_smsa+south-1,
case=idcode,first.family="gaussian",
fixed.effects=TRUE)
This command produces the output:
Parameter Estimate Std. Err.
________________________________________________
age 0.35999E-01 0.33865E-02
age2 -0.72299E-03 0.53259E-04
ttl_exp 0.33467E-01 0.29654E-02
ttl_exp2 0.21627E-03 0.12774E-03
tenure 0.35754E-01 0.18487E-02
tenure2 -0.19701E-02 0.12499E-03
not_smsa -0.89011E-01 0.95318E-02
south -0.60631E-01 0.10932E-01
sigma 0.29070
The smaller standard errors of the demeaned model parameter es-
timates occur because the model-fitting procedure has not taken into
account the separate estimation of the means that were used to obtain
the time-demeaned values.
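Using the residual degrees of freedom reported in the two outputs above, the size of this effect can be checked directly: inflating the demeaned-model figures by the square root of the ratio of residual degrees of freedom reproduces the dummy variable model's values. A quick check (illustrative Python; the figures are taken from the outputs above):

```python
import math

# residual degrees of freedom reported above: the time-demeaned fit treats
# the 4,697 cluster means as known; the dummy variable fit estimates them
df_demeaned = 28082
df_dummy = 23385                     # 28082 - 4697 = 23385

correction = math.sqrt(df_demeaned / df_dummy)

# inflating the demeaned-model figures reproduces the dummy variable ones
se_age = 0.0030903 * correction      # agetilde SE -> age SE in model (iii)
sigma = 0.26527 * correction         # demeaned sigma -> dummy variable sigma
```

The corrected standard error for age comes out at 0.0033865 and the corrected sigma at 0.29069, matching the explicit dummy variable output to the precision printed.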
15.5 Implicit fixed effects estimator
SabreR with the argument fixed.effects=TRUE uses dummy variables
for each individual, and solves the least squares normal equations using
sparse matrix procedures. For this reason, we do not present a log likeli-
hood for this model. We call this the implicit fixed effects estimator, as
the dummy variables are not written out as part of the display.
The implicit fixed effects model does not have a constant term. The
estimates and standard errors match those of the explicit dummy vari-
ables model. Clearly, with small datasets like nls.tab, both the
implicit and explicit dummy variable models can be used. However, the
implicit model estimator fixed.effects=TRUE was 3,000 times faster
on this dataset than the standard homogeneous model fit and required
much less memory.
15.6 Random effects models
We now use SabreR to obtain estimates for various specifications of the
random effects model. The classical random effects model is:
\[
y_{ij} = \gamma_{00} + \sum_{p=1}^{P} \gamma_{p0} x_{pij} + \sum_{q=1}^{Q} \gamma_{0q} z_{qj} + u_{0j} + \varepsilon_{ij}.
\]
We estimate the model with xpij and zqj , using six-point adaptive
quadrature. The SabreR command required to fit this model is:
sabre.model.5 <- sabre(ln_wage~age+age2+ttl_exp+
ttl_exp2+tenure+tenure2+not_smsa+south+
grade+black+1,case=idcode,
first.family="gaussian",first.mass=6,
adaptive.quad=TRUE)
This command leads to the following output:
Log likelihood = -8853.4259
on 28078 residual degrees of freedom
Parameter Estimate Std. Err.
______________________________________________
cons 0.23908 0.49190E-01
age 0.36853E-01 0.31226E-02
age2 -0.71316E-03 0.50070E-04
ttl_exp 0.28820E-01 0.24143E-02
ttl_exp2 0.30899E-03 0.11630E-03
tenure 0.39437E-01 0.17604E-02
tenure2 -0.20052E-02 0.11955E-03
not_smsa -0.13234 0.71322E-02
south -0.87560E-01 0.72143E-02
grade 0.64609E-01 0.17372E-02
black -0.53339E-01 0.97338E-02
sigma 0.29185 0.13520E-02
scale 0.24856 0.35017E-02
In the first extension, both the explanatory variables averaged over
time (x̄pj) and the time-varying covariates (xpij) have their own param-
eters in the linear predictor:
\[
y_{ij} = \gamma_{00} + \sum_{p=1}^{P} \gamma^{*}_{p0} \bar{x}_{pj} + \sum_{p=1}^{P} \gamma_{p0} x_{pij} + \sum_{q=1}^{Q} \gamma_{0q} z_{qj} + \omega_{0j} + \varepsilon_{ij}.
\]
We estimate this model with x̄pj, xpij and zqj. The SabreR command
needed to fit this model is:
sabre.model.6 <- sabre(ln_wage~agebar+age2bar+
ttl_expbar+ttl_exp2bar+tenurebar+tenure2bar+
not_smsabar+southbar+age+age2+ttl_exp+
ttl_exp2+tenure+tenure2+not_smsa+south+grade+
black+1,case=idcode,first.family="gaussian",
first.mass=6,adaptive.quad=TRUE)
This command produces the following output:
Log likelihood = -8774.6178
on 28070 residual degrees of freedom
Parameter Estimate Std. Err.
________________________________________________
cons 0.31033 0.12438
agebar -0.20870E-02 0.95809E-02
age2bar 0.10329E-03 0.15613E-03
ttl_expbar -0.19474E-01 0.63847E-02
ttl_exp2bar 0.49153E-03 0.34887E-03
tenurebar 0.31656E-01 0.62217E-02
tenure2bar -0.79062E-03 0.42178E-03
not_smsabar -0.98306E-01 0.14231E-01
southbar -0.40645E-01 0.14537E-01
age 0.35999E-01 0.33967E-02
age2 -0.72299E-03 0.53421E-04
ttl_exp 0.33467E-01 0.29744E-02
ttl_exp2 0.21627E-03 0.12813E-03
tenure 0.35754E-01 0.18543E-02
tenure2 -0.19701E-02 0.12537E-03
not_smsa -0.89011E-01 0.95607E-02
south -0.60631E-01 0.10965E-01
grade 0.61112E-01 0.19098E-02
black -0.60684E-01 0.98738E-02
sigma 0.29158 0.13489E-02
scale 0.24458 0.34461E-02
In the second extension, both the explanatory variables averaged over
time (x̄pj) and the time-demeaned covariates (xpij − x̄pj) have their own
parameters in the linear predictor:
\[
y_{ij} = \gamma_{00} + \sum_{p=1}^{P} \gamma^{**}_{p0} \bar{x}_{pj} + \sum_{p=1}^{P} \gamma^{+}_{p0} \left( x_{pij} - \bar{x}_{pj} \right) + \sum_{q=1}^{Q} \gamma_{0q} z_{qj} + u_{0j} + \varepsilon_{ij}.
\]
We estimate this model with x̄pj, (xpij − x̄pj) and zqj. The SabreR
command needed to fit this model is:
sabre.model.7 <- sabre(ln_wage~agebar+age2bar+
ttl_expbar+ttl_exp2bar+tenurebar+tenure2bar+
not_smsabar+southbar+agetilde+age2tilde+
ttl_exptilde+ttl_exp2tilde+tenuretilde+
tenure2tilde+not_smsatilde+southtilde+grade+
black+1,case=idcode,first.family="gaussian",
first.mass=6,adaptive.quad=TRUE)
This command results in the output:
Log likelihood = -8774.6178
on 28070 residual degrees of freedom
Parameter Estimate Std. Err.
________________________________________________
cons 0.31033 0.12438
agebar 0.33912E-01 0.89586E-02
age2bar -0.61971E-03 0.14671E-03
ttl_expbar 0.13992E-01 0.56496E-02
ttl_exp2bar 0.70780E-03 0.32449E-03
tenurebar 0.67410E-01 0.59389E-02
tenure2bar -0.27607E-02 0.40271E-03
not_smsabar -0.18732 0.10542E-01
southbar -0.10128 0.95431E-02
agetilde 0.35999E-01 0.33967E-02
age2tilde -0.72299E-03 0.53421E-04
ttl_exptilde 0.33467E-01 0.29744E-02
ttl_exp2tilde 0.21627E-03 0.12813E-03
tenuretilde 0.35754E-01 0.18543E-02
tenure2tilde -0.19701E-02 0.12537E-03
not_smsatilde -0.89011E-01 0.95607E-02
southtilde -0.60631E-01 0.10965E-01
grade 0.61112E-01 0.19098E-02
black -0.60684E-01 0.98738E-02
sigma 0.29158 0.13489E-02
scale 0.24458 0.34461E-02
The inference from the classical random effects model differs from
that of the two extended random effects models. The inference from the
two extended random effects models is the same. There is a significant
difference between the log likelihoods of the classical and extended ran-
dom effects models. The change in deviance is:
−2(−8853.4259 − (−8774.6178)) = 157.62,
on 28078 − 28070 = 8 degrees of freedom. In addition, several of the
coefficients on the x̄pj explanatory variables are significant. This signif-
icance could be interpreted in two alternative ways. First, the omitted
effects could be significantly correlated with the included time-varying
explanatory variables. Second, the explanatory variables averaged over
time could have different impacts to their time-demeaned values.
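The deviance comparison above is a likelihood ratio test. A quick check of the arithmetic and the implied p-value (illustrative Python; the closed-form tail formula applies because the degrees of freedom are even):

```python
import math

ll_classical = -8853.4259
ll_extended = -8774.6178
deviance = -2.0 * (ll_classical - ll_extended)      # about 157.62
df = 28078 - 28070                                  # 8 extra parameters

def chi2_sf_even(x, k):
    """Upper-tail probability of a chi-square with even df k (closed form)."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i)
                                 for i in range(k // 2))

p_value = chi2_sf_even(deviance, df)                # vanishingly small
```

The p-value is far below any conventional threshold, confirming that the extended random effects specifications fit significantly better than the classical one.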
15.7 Comparing two-level fixed effects and random effects models
As the fixed effects models and the extended random effects models
make similar inferences about the effect of the time-varying explanatory
variables, it might seem that we can use either of them to make such
inferences. However, in this empirical comparison, there were no internal
explanatory variables or state dependence effects, such as durations or
lagged responses. When these sorts of endogenous variable are present,
the correlation between the included factors and omitted effects will
vary with time. This variation will depend on the current duration in a
survival model (or on the previous response in a first-order model) and
thus will be difficult to capture in a fixed effects model.
In the absence of endogenous explanatory variables, we can establish
if there is some systematic non-stationarity in the correlation between
the included factors and omitted effects. We divide the observation
window into blocks of responses and then produce averaged-over-time
and time-demeaned variable effects for each block.
To explore whether the coefficients for the time-constant explanatory
variables are really time constant, we can use dummy variables for dif-
ferent time intervals and can include the interactions of these dummy
variables with the time-constant explanatory variables. However, it may
not always be possible to account for the correlation between included
explanatory variables and the incidental parameters using simple linear
functions of the means of the time-varying explanatory variables, or by
using different parameters for different time intervals.
15.8 Fixed effects treatment of the three-level linear model
As we saw in Chapter 10, it is not unusual to have three-level data, for
instance, six questions (level one) answered by a number of pupils (level
two) across different schools (level three). Another example of three-level
data would be workers (level two) in firms (level three) employed over
time (level one). In the models discussed in Chapter 10, the lower level
data were nested in their higher level units, and this nesting simplified
the analysis. However, with longitudinal data, this three-level nesting
is often broken, for example, when workers change job and go to work
for a different firm. When this happens, there is no such transformation
as time demeaning that will ‘sweep out’ both the worker and firm fixed
effects [1].
By focusing on different re-arrangements of the data (worker, firm
and spell), different aspects of the model can be identified. For example,
the time-demeaned worker data identifies the differences in the firm ef-
fects for the workers who move [2]. These different aspects of the model
can then be recombined using minimum distance estimators; see An-
drews et al. [9, 11]. The estimation of fixed effects in the three-level lin-
ear model is particularly important for researchers who are interested in
assessing their correlation with other effects in the model. For example,
Abowd et al. [1, 2] wanted to establish the relationship between ‘high
wage workers and high wage firms’.
15.9 Exercises comparing fixed effects and random effects
Exercise 15.9.1 Effect of job training on firm scrap rates
Holzer et al. [61] studied the impact of job training grants on worker
productivity by collecting information on ‘scrap rates’ for a sample of
Michigan manufacturing firms. The ‘scrap rate’ is defined to be the rate
(per 100 items) at which items are scrapped.
In a related study, Wooldridge [108], Example 14.1, used the data
(jtrain.tab) on 54 firms that reported scrap rates for the years 1987,
1988 and 1989. No firms obtained job training grants before 1988. A
total of 19 firms obtained grants in 1989.
Wooldridge [108] allowed for the possibility that the additional job
training in 1988 made workers more productive in 1989 by using the
lagged value of the grant indicator variable. He also included dummy
variables for 1988 and for 1989. In this exercise, we replicate Wooldridge’s
analysis:
1. Estimate a homogeneous linear model for the response lscrap (log
scrap rate), with explanatory variables grant (1 if firm received
grant, 0 otherwise), year indicator variables d89 and d88, and
lagged grant variable grant 1, which is assumed to be 0 in 1987.
These results are given as the first part (homogeneous model) of
the output that is obtained by estimating the fixed effects model.
2. Re-estimate this model, using the fixed firm effects (fcode). What
is the main difference between the results from the alternative
estimators?
3. Re-estimate the previous two models, without the lagged grant
indicator (grant 1). Is the model a poorer fit to the data?
4. What does the coefficient for d89 suggest in your preferred model?
5. Re-estimate the linear models of lscrap, with and without the
lagged grant indicator (grant 1), this time treating fcode as a ran-
dom effect. Use adaptive quadrature and first.mass= 64. Com-
pare the fixed effect and random effect model inferences. What do
you find?
Exercise 15.9.2. Effect of education on log wages
In this exercise, we return to the data analyzed previously in Exercise
11.7.2. Vella and Verbeek [103] examined data on 545 young males from
the Youth Sample of the US National Longitudinal Survey for the period
1980–1987. Following Wooldridge [108], we use a version of their data
(wagepan2.tab) in order to relate log wages (lwage) to a set of explana-
tory variables. Some factors, such as race, remain constant over time.
Other variables, including number of years of labour market experience,
marital status and trade union membership, are time variant.
To establish whether the effect of education on wages has changed over
time, we need to start by creating interaction effects for educ (number
of years of schooling) with the year-specific dummy variables
(d81, d82, ..., d87). Call these interaction effects edd81-edd87
respectively:
1. Estimate a homogeneous linear model for the response lwage with
the covariates expersq (number of years of labour market experience
squared), union (1 if respondent is in a union, 0 otherwise),
married (1 if respondent is married, 0 otherwise), d81-d87 and
edd81-edd87. Note that, in the presence of all interaction terms
edd81-edd87, the educ main effect is not required in the model.
2. These results are given as the first part (homogeneous model) of
the output that is obtained by estimating the fixed effects model.
Estimate the model using the respondent fixed effects (nr). What
is the main difference between the results from the alternative
estimators?
3. Re-estimate this model, without the time-varying effects of education
(edd81-edd87). Is the model a poorer fit to the data?
4. Re-estimate the model which includes edd81-edd87, this time
treating respondent identifier (nr) as a random effect. Use adap-
tive quadrature with first.mass=12. Compare the fixed effect and
random effect model inferences. What do you find?
Appendices
Appendix A
SabreR installation, SabreR commands,
quadrature, estimation, endogenous
effects
A.1 SabreR installation
In order to install SabreR on a Windows system, you will
need to download the file, sabreR.zip, from the Sabre site,
http://sabre.lancs.ac.uk/. Put the file, sabreR.zip, in a convenient
place, for example, on the desktop. Start R and use the menu system in
R to install the SabreR package. The installation process will set up two
demonstration examples of SabreR. These examples can be used to check
that the SabreR package has been installed correctly. A similar process
may be used to install SabreR on a Unix system.
A.2 SabreR commands
A.2.1 The arguments of the SabreR object
There are many arguments in the SabreR object. Type:
> library(sabreR)
Then type:
> args(sabreR)
You will obtain the following output:
function (model.formula.uni, model.formula.bi = NULL,
    model.formula.tri = NULL, case, alpha = 0.01, approximate = 5,
    max.its = 100, arithmetic.type = "fast", offset = "",
    convergence = 1e-04, correlated = "yes", left.end.point = NULL,
    right.end.point = NULL, first.family = "binomial",
    second.family = "binomial", third.family = "binomial",
    first.link.function = "logit", second.link.function = "logit",
    third.link.function = "logit", first.mass = 12, second.mass = 12,
    third.mass = 12, ordered = FALSE, first.scale = -10000,
    second.scale = -10000, third.scale = -10000, first.rho = 0,
    second.rho = 0, third.rho = 0, first.sigma = 1, second.sigma = 1,
    third.sigma = 1, tolerance = 1e-06, equal.scale = FALSE,
    depend = FALSE, only.first.derivatives = FALSE,
    adaptive.quad = FALSE, Fixed.effects = FALSE)
on your screen.
This facility is particularly useful if you have forgotten the syntax
of the argument you want to use. The arguments can be entered in any
order but they must be labelled correctly. This output also shows the
default values; for instance, the default quadrature method is standard
Gaussian, as the default form is adaptive.quad = FALSE.
A.2.2 The anatomy of a SabreR command file
There are several key elements to a SabreR command file. We examine
the first few lines of the command file (example c5 sabre.R) required
to fit the mixed Poisson model to data (racd.tab) on the demand for
health care (see Example 8.4.1). The file example c5 sabre.R contains
the following:
#save the log file
sink(file="/Rlib/SabreRCourse/examples/c5/c5.log")
# load the sabreR library
library(sabreR)
# read the data
racd<-read.table(file="/Rlib/SabreRCourse/data/racd.tab")
attach(racd)
# look at 10 lines 10 columns of the data
racd[1:10,1:10]
# estimate model
sabre.model.51<-sabre(prescrib~sex+age+agesq+income+
levyplus+freepoor+freerepa+illness+actdays+hscore+chcond1+chcond2+1,
case=id,first.family="poisson")
# look at the results
sabre.model.51
# show just the estimates
#print(sabre.model.51,settings=FALSE)
#remove the created objects
detach(racd)
rm(racd,sabre.model.51)
#close the log file
sink()
The lines that start with a # are comments that have been added to
help you understand the contents of this file. We will now examine the
commands in more detail.
The following command is an R command which opens a log file called
c5.log:
sink(file="/Rlib/SabreRCourse/examples/c5/c5.log")
The following command is an R command which makes the sabreR
library available to the current R session:
library(sabreR)
The first line of the R text below creates the R object, racd, which
links the current session to the R data set racd.tab. The second com-
mand, attach(racd), loads the racd.tab dataset into memory:
racd<-read.table(file="/Rlib/SabreRCourse/data/racd.tab")
attach(racd)
The following command displays the first ten lines and first ten
columns of the dataset racd.tab:
racd[1:10,1:10]
The next command creates an R object sabre.model.51:
sabre.model.51<-sabre(prescrib~sex+age+agesq+income+
levyplus+freepoor+freerepa+illness+actdays+hscore+chcond1+chcond2+1,
case=id,first.family="poisson")
This object instructs SabreR to fit a Poisson model of the form:
prescrib~sex+age+agesq+income+levyplus+freepoor+freerepa+
illness+actdays+hscore+chcond1+chcond2+1. This model will con-
tain a constant term (1). This object also informs SabreR that the
level-two (grouping or case) variable is called id. We will use the default
quadrature method (standard) with the default number of mass points
(12). SabreR can fit univariate, bivariate and trivariate models. The use
of the instruction, first.family, and the absence of the instructions,
second.family and third.family, instructs SabreR that we wish to
estimate only a univariate model.
The following command instructs R to display the results:
sabre.model.51
The next commands are used to tell R to remove the R objects racd
and sabre.model.51 from memory:
detach(racd)
rm(racd,sabre.model.51)
The final R command closes the log file:
sink()
There is a full online help system within R, which provides more
details on the R commands.
A.3 Quadrature
We illustrate standard Gaussian quadrature and adaptive Gaussian
quadrature for the univariate two-level generalized linear model. The
underlying principles can be extended to three and higher levels, and to
multivariate responses. The likelihood for the univariate two-level
generalized linear model takes the form:

L(\gamma, \phi, \sigma^2_{u0} \mid y, x, z) = \prod_j \int_{-\infty}^{+\infty} \prod_i g(y_{ij} \mid \theta_{ij}, \phi) \, f(u_{0j}) \, du_{0j},

where:

g(y_{ij} \mid \theta_{ij}, \phi) = \exp\{[y_{ij}\theta_{ij} - b(\theta_{ij})]/\phi + c(y_{ij}, \phi)\},

\theta_{ij} = \gamma_{00} + \sum_{p=1}^{P} \gamma_{p0} x_{pij} + \sum_{q=1}^{Q} \gamma_{0q} z_{qj} + u_{0j},

and:

f(u_{0j}) = \frac{1}{\sqrt{2\pi}\,\sigma_{u0}} \exp\!\left(-\frac{u_{0j}^2}{2\sigma_{u0}^2}\right).
SabreR can evaluate the integrals in L(\gamma, \phi, \sigma^2_{u0} \mid y, x, z) using either
standard Gaussian or adaptive Gaussian quadrature. We discuss each of
these quadrature methods in turn.
A.3.1 Standard Gaussian quadrature
Standard Gaussian quadrature, or Gaussian quadrature, uses a finite
number (C) of quadrature points consisting of weights (probabilities p_c)
and locations u^c_0. The values of p_c and u^c_0 are available from standard
normal tables [100]. The approximation takes the form:

L(\gamma, \phi, \sigma^2_{u0} \mid y, x, z) \simeq \prod_j \sum_{c=1}^{C} p_c \prod_i g(y_{ij} \mid \theta^c_{ij}, \phi),

where:

\theta^c_{ij} = \gamma_{00} + \sum_{p=1}^{P} \gamma_{p0} x_{pij} + \sum_{q=1}^{Q} \gamma_{0q} z_{qj} + \sigma_{u0} u^c_0,

\sum_{c=1}^{C} p_c = 1.
c=1
The approximation works as long as \prod_i g(y_{ij} \mid \theta_{ij}, \phi) can be represented
by a polynomial in u_{0j} which is of degree less than or equal to
2C - 1. However, it is not a priori clear what value of C is required.
Consequently, it is important to check whether a sufficient number of
quadrature points has been used by comparing solutions. Typically, we
start with a small C and increase it until convergence in the likelihood
occurs. C is large enough when the addition of more quadrature points
will not improve the approximation. SabreR can use a wide range of
quadrature points to represent the mixing distribution of each random
effect: 2 to 16 points in steps of length 2; 16 to 48 in steps of length 4;
48 to 112 in steps of length 8; 112 to 256 in steps of length 16.
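The mechanics of the approximation can be sketched numerically. The following is a minimal illustration (written in Python for portability; SabreR performs the equivalent computation internally and this is not its code) of how the weights p_c and locations u^c_0 are obtained from the Gauss-Hermite rule, and of checking that a sufficient number of quadrature points has been used by increasing C until the approximation stabilises:

```python
import numpy as np

def normal_quad(f, n_points):
    """Approximate E[f(u)] for u ~ N(0, 1) by Gaussian quadrature."""
    # nodes/weights for the rule: integral exp(-x^2) h(x) dx ~ sum w_c h(x_c)
    x, w = np.polynomial.hermite.hermgauss(n_points)
    u = np.sqrt(2.0) * x          # change of variable to the N(0, 1) scale
    p = w / np.sqrt(np.pi)        # probabilities p_c; these sum to 1
    return float(np.sum(p * f(u)))

# E[u^2] = 1 exactly: u^2 is a polynomial of degree 2 <= 2C - 1 once C >= 2
print(normal_quad(lambda u: u**2, 2))

# E[exp(u)] = exp(0.5) is not a polynomial expectation, so we increase C
# until convergence in the approximation occurs
for C in (2, 4, 8, 16):
    print(C, normal_quad(np.exp, C))
```

Running this shows the polynomial case reproduced exactly with two points, while the non-polynomial case converges rapidly as C grows, mirroring the advice above to compare solutions for increasing numbers of mass points.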
A.3.2 Performance of Gaussian quadrature
When running SabreR serially rather than in parallel, the larger the
number of quadrature points used, the longer it takes to compute the
likelihood. The time taken is approximately proportional to the product
of the numbers of quadrature points across all the random effects in the
multivariate generalized linear mixed model. For a bivariate two-level
random intercept model, there are two random effects at level two, one
for each response. If we use C = 16 quadrature points for each random
effect, then the total time will be approximately 16² = 256 times longer
than that for a model without any random effects (C = 1).
Rabe-Hesketh et al. [90] noted that Gaussian quadrature (or nor-
mal quadrature (NQ)) tends to work well with moderate cluster sizes
such as those often found in panel studies. However, with large cluster
sizes which are common in grouped cross-sectional data, the estimates
from some algorithms can become biased. This problem was articulated
by Borjas and Sueyoshi [19] and Lee [73] for probit models, by Albert
and Follmann [5] for Poisson models and by Lesaffre and Spiessens [74]
for logit models. Lee [73] attributed the poor performance of Gaussian
quadrature to numerical underflow and developed an algorithm to over-
come this problem.
Rabe-Hesketh et al. [90] noted that, for probit models, the Lee [73]
algorithm worked well in simulations with clusters as large as 100 when
the intraclass correlation was 0.3, but produced biased estimates when
the correlation was increased to 0.6. Rabe-Hesketh et al. [90] also noted
that a likely reason for this bias was that, for large clusters and high
intraclass correlations, the integrands of the cluster contributions to the
likelihood had very sharp peaks that may have been located between
adjacent quadrature points.
There can be problems with underflow and overflow in SabreR when
estimating models. If these problems occur, SabreR will give you a warn-
ing message and suggest you use a more accurate form of arithmetic. In
some contexts, the underflow can be benign. For instance, when we
calculate

p_c \prod_i g(y_{ij} \mid \theta^c_{ij}, \phi)

for the tails of the distribution, its contribution to the overall approximation
to the likelihood may be so close to zero that it will make no
real difference to the overall approximation (summed over c) and it can
be ignored.
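The benign underflow just described, tail terms p_c \prod_i g(\cdot) that vanish in double precision, is distinct from the harmful case in which every term of the sum over c underflows at once. A standard remedy for the latter is to work with the log of each term and factor out the largest before summing; the sketch below (Python, with illustrative numbers; SabreR's own remedy is its accurate arithmetic, described next) shows why the naive computation fails while the rescaled one does not:

```python
import numpy as np

def log_weighted_sum(log_terms, p):
    """Stable log( sum_c p_c * exp(log_terms[c]) ) via the log-sum-exp trick."""
    m = np.max(log_terms)                        # factor out the largest term
    return m + np.log(np.sum(p * np.exp(log_terms - m)))

# per-point log-likelihood contributions so small that exp() underflows
log_terms = np.array([-800.0, -805.0, -820.0])
p = np.array([0.2, 0.5, 0.3])

with np.errstate(divide="ignore"):
    naive = np.log(np.sum(p * np.exp(log_terms)))  # exp -> 0.0, so log(0) = -inf
stable = log_weighted_sum(log_terms, p)            # finite, about -801.59
print(naive, stable)
```

The stable version returns the correct log of the weighted sum even though none of the individual terms is representable in double precision.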
By default, SabreR uses standard double precision (FORTRAN 95,
real*8) variables and arithmetic (arithmetic.type = "fast"). This is
adequate for most applications but occasionally, some of the intermediate
calculations of the log likelihood \log L(\gamma, \phi, \sigma^2_{u0} \mid y, x, z), and its
first- and second-order derivatives, can require the calculation of values
which are beyond the range of double precision numbers. This range is
approximately 10^{-308} to 10^{+308}.
This range in arithmetic precision can be extended by using the ar-
gument arithmetic.type = "accurate". In this case, all calculations
are performed using specially written arithmetic code in which the ex-
ponent of the variable is stored separately in a four-byte integer. This
facility extends the range of intermediate calculations to approximately
10^{-2,000,000,000} to 10^{+2,000,000,000}. The precision with which numbers
are stored is the same for both 'f(ast)' and 'a(ccurate)' arithmetic,
that is, about 15 decimal digits.
This greater range in arithmetic precision comes at the cost of an
increased run time, which is typically 15 times as long as the time it
takes to run a job using fast arithmetic. However, particularly when
using parallel SabreR on a large number of processors, this may be a
cost worth paying as otherwise the problem may not be soluble.
By default, SabreR uses standard Gaussian quadrature (adaptive.
quad = FALSE). Rabe-Hesketh et al. [90] proposed the use of adaptive
quadrature as an alternative to standard quadrature, partly to avoid
the problem of underflow/overflow that may be encountered when using
standard Gaussian quadrature. Adaptive quadrature will be performed
by SabreR if you use the argument adaptive.quad = TRUE. We discuss
adaptive quadrature in more detail in the following sub-section.
A.3.3 Adaptive quadrature
Adaptive quadrature works by adapting the quadrature locations of each
integral in order to place them where they are of most benefit to the
quadrature approximation, in other words, under the peaks. The adap-
tive quadrature weights and locations depend on the parameters of the
model. Between each step of the maximisation algorithm, the weights
and locations are shifted and rescaled. We follow Skrondal and
Rabe-Hesketh [96] and Rabe-Hesketh et al. [90] in illustrating adaptive
quadrature. If we adopt a Bayesian perspective, that is, we assume that
we know the model parameters (\gamma, \phi, \sigma^2_{u0}), then the two-level generalized
linear model likelihood

L(\gamma, \phi, \sigma^2_{u0} \mid y, x, z) = \prod_j \int_{-\infty}^{+\infty} \prod_i g(y_{ij} \mid \theta_{ij}, \phi) \, f(u_{0j}) \, du_{0j}

has an integrand that comprises the product of the joint probability of
the data given u_{0j} and the prior density of u_{0j}; in other words:

\prod_i g(y_{ij} \mid \theta_{ij}, \phi) \, f(u_{0j}).
Under the Bayesian central limit theorem [24], posterior densities are
approximately normal. If \mu_j and \varphi^2_j are the respective mean and variance
of this posterior density f(u_{0j}; \mu_j, \varphi^2_j), then the ratio

\frac{\prod_i g(y_{ij} \mid \theta_{ij}, \phi) \, f(u_{0j})}{f(u_{0j}; \mu_j, \varphi^2_j)}

should be approximated by a polynomial of a lower degree than that of
the original Gaussian quadrature function. If this is the case, then we
will require fewer quadrature points than standard Gaussian quadrature.
We can rewrite the original Gaussian quadrature integral as:

f_j(\gamma, \phi, \sigma^2_{u0}) = \int_{-\infty}^{+\infty} f(u_{0j}; \mu_j, \varphi^2_j) \, \frac{\prod_i g(y_{ij} \mid \theta_{ij}, \phi) \, f(u_{0j})}{f(u_{0j}; \mu_j, \varphi^2_j)} \, du_{0j},
so that the posterior density f(u_{0j}; \mu_j, \varphi^2_j) becomes the weight function.
Let f(\nu_j) denote a standard normal density. Then, by applying the
change of variable:

\nu_j = \frac{u_{0j} - \mu_j}{\varphi_j}

to the elements of f_j(\gamma, \phi, \sigma^2_{u0}), and by applying the standard quadrature
rule (with weights p_c and locations \nu^c_0), \theta^c_{ij} becomes:

\theta^{AQc}_{ij} = \gamma_{00} + \sum_{p=1}^{P} \gamma_{p0} x_{pij} + \sum_{q=1}^{Q} \gamma_{0q} z_{qj} + \sigma_{u0}(\varphi_j \nu^c_0 + \mu_j),

and:

f_j(\gamma, \phi, \sigma^2_{u0}) \simeq \sum_c p_c \, \frac{\prod_i g(y_{ij} \mid \theta^{AQc}_{ij}, \phi) \, f(\varphi_j \nu^c_0 + \mu_j)}{\frac{1}{\varphi_j \sqrt{2\pi}} \exp\!\left[-\frac{1}{2}(\nu^c_0)^2\right]} = \sum_c \pi_{jc} \prod_i g(y_{ij} \mid \theta^{AQc}_{ij}, \phi),

where:

\pi_{jc} = p_c \, \frac{f(\varphi_j \nu^c_0 + \mu_j)}{\frac{1}{\varphi_j \sqrt{2\pi}} \exp\!\left[-\frac{1}{2}(\nu^c_0)^2\right]}.
Unfortunately, at each iteration of the optimisation procedure, the
posterior mean and variance (\mu_j, \varphi^2_j) of each group are not known a
priori. However, they can be obtained from an iterative procedure [83].
We use the superscript k to denote values at the k-th iteration. At the
start, we have k = 1, and set \mu^0_j = 0 and \varphi^0_j = 1 to give \varphi^0_j \nu^c_0 + \mu^0_j
and \pi^0_{jc}. The posterior means and variances are then updated at each
subsequent iteration as follows:

\mu^k_j = \frac{\sum_c (\varphi^{k-1}_j \nu^c_0 + \mu^{k-1}_j) \, \pi^{k-1}_{jc} \prod_i g(y_{ij} \mid \theta^{AQc,k-1}_{ij}, \phi^{k-1})}{f^k_j(\gamma^k, \phi^k, \sigma^{2,k}_{u0})},

(\varphi^k_j)^2 = \frac{\sum_c (\varphi^{k-1}_j \nu^c_0 + \mu^{k-1}_j)^2 \, \pi^{k-1}_{jc} \prod_i g(y_{ij} \mid \theta^{AQc,k-1}_{ij}, \phi^{k-1})}{f^k_j(\gamma^k, \phi^k, \sigma^{2,k}_{u0})} - (\mu^k_j)^2,

where:

f^k_j(\gamma^k, \phi^k, \sigma^{2,k}_{u0}) \simeq \sum_c \pi^{k-1}_{jc} \prod_i g(y_{ij} \mid \theta^{AQc,k-1}_{ij}, \phi^{k-1}).
At each k, we use \mu^{k-1}_j and \varphi^{k-1}_j in \varphi^{k-1}_j \nu^c_0 + \mu^{k-1}_j, and \pi^{k-1}_{jc}, in
the convergence process that will give us \mu^k_j and \varphi^k_j. As the optimisation
procedure approaches the solution, there is a smaller change in \gamma^k, \phi^k
and \sigma^{2,k}_{u0}, and consequently a smaller change in \mu^k_j and \varphi^k_j, so
that convergence in this local adaptation tends to occur within two to
three cycles.
It is our experience that underflow can still occur in SabreR with
adaptive quadrature (adaptive.quad = TRUE) but this can be resolved
by using the argument arithmetic.type = "accurate". Algorithms for
adaptive quadrature in the context of multi-level and multivariate ran-
dom effects can also be developed along similar lines [90, 96]. Adaptive
quadrature has been deployed in SabreR for two-level and three-level
models, and for univariate, bivariate and trivariate models.
A.4 Estimation
Two forms of estimation are detailed in this section. First, we examine
how random effects models are estimated, then we proceed to consider
the estimation of fixed effects models.
A.4.1 Maximizing the log likelihood of random effects
models
SabreR uses the Newton-Raphson algorithm, an iterative procedure, to
maximize the log likelihood. If we denote the parameters which maximize
\log L(\pi \mid y, x, z) by \pi = (\gamma, \phi, \sigma^2_{u0}), then a necessary condition for
this log likelihood to be maximized is:

\frac{\partial \log L(\pi \mid y, x, z)}{\partial \pi} = 0.

The values of the parameters at the n-th iteration are denoted by \pi_n.
Then a first-order Taylor expansion about \pi_n gives:

\frac{\partial \log L(\pi \mid y, x, z)}{\partial \pi} \simeq \left.\frac{\partial \log L(\pi \mid y, x, z)}{\partial \pi}\right|_{\pi = \pi_n} + \left.\frac{\partial^2 \log L(\pi \mid y, x, z)}{\partial \pi \, \partial \pi'}\right|_{\pi = \pi_n} (\pi - \pi_n) = g(\pi_n) + H(\pi_n)(\pi - \pi_n),
where g(\pi_n) is the gradient vector at \pi_n and H(\pi_n) is the Hessian
matrix. The process is made iterative by writing:

g(\pi_n) + H(\pi_n)(\pi_{n+1} - \pi_n) = 0,

so that:

\pi_{n+1} = \pi_n - [H(\pi_n)]^{-1} g(\pi_n).

When \pi has, say, k elements (k > 1), the computational effort required
to calculate \log L(\pi_n \mid y, x, z) once is less than that required to
calculate g(\pi_n) for the k elements of \pi, and similarly for the k(k+1)/2
distinct elements of H(\pi_n). So, we actually use:

\pi_{n+1} = \pi_n + s[-H(\pi_n)]^{-1} g(\pi_n) = \pi_n + s d,

where s is a scalar (often called the step length). At each step n we
first try s = 1. If

\log L(\pi_{n+1} \mid y, x, z) > \log L(\pi_n \mid y, x, z),

we continue; if instead

\log L(\pi_{n+1} \mid y, x, z) \le \log L(\pi_n \mid y, x, z),

we try s = 0.5, then s = 0.25, and so on, until

\log L(\pi_{n+1} \mid y, x, z) > \log L(\pi_n \mid y, x, z),

and then continue.
SabreR has an option that allows you to use minus the outer product
of the gradient vectors, which we write as:

H(\pi_n) = -\sum_j g_j(\pi_n) \, g_j(\pi_n)'.

In the two-level generalized linear model, g_j(\pi_n) takes the form:

g_j(\pi_n) = \left.\frac{\partial \log\left[\sum_{c=1}^{C} p_c \prod_i g(y_{ij} \mid \theta^c_{ij}, \phi)\right]}{\partial \pi}\right|_{\pi_n}.

The outer product of the gradient vectors ensures that H(\pi_n) is negative
definite. This form of H(\pi_n) can be useful when there are many
local maxima and minima of \log L(\pi \mid y, x, z). This version of H(\pi_n)
gives the Fisher scoring algorithm [16]. However, it can be very slow
to converge when compared to the Newton-Raphson algorithm for es-
timating multivariate generalized linear mixed models evaluated using
Gaussian quadrature.
It is important to acknowledge that many Gaussian quadrature log
likelihoods have multiple local maxima. This makes it necessary to use
different starting values, to compare the solutions and to establish the
best solution. It is only the global maximum in log L (π|y, x, z) that
provides the maximum likelihood estimates.
SabreR uses analytic rather than numerical approximations to
H (π n ) and g (π n ), and is therefore much faster than gllamm (Stata)
which uses ml (Newton-Raphson) with method d0 (no analytic deriva-
tives required).
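The algorithm above, including the step-length halving, can be sketched compactly. The toy log likelihood below (written in Python; a one-parameter Poisson mean on the log scale, chosen because its maximiser is known in closed form) is an illustrative stand-in for SabreR's quadrature log likelihood, not SabreR's actual optimiser:

```python
import numpy as np

def newton_raphson(loglik, grad, hess, pi0, tol=1e-8, max_its=100):
    """Maximize loglik by Newton-Raphson with step halving."""
    pi = np.asarray(pi0, dtype=float)
    for _ in range(max_its):
        g, H = grad(pi), hess(pi)
        if np.max(np.abs(g)) < tol:
            break                            # gradient ~ 0: maximum found
        d = np.linalg.solve(-H, g)           # direction [-H(pi_n)]^{-1} g(pi_n)
        s, ll = 1.0, loglik(pi)
        # try s = 1; halve the step until the log likelihood improves
        while loglik(pi + s * d) <= ll and s > 1e-12:
            s *= 0.5
        pi = pi + s * d
    return pi

# toy example: y_i ~ Poisson(exp(b)); log L(b) = sum(y) * b - n * exp(b),
# maximised at b = log(mean(y))
y = np.array([2.0, 3.0, 4.0, 5.0])
loglik = lambda b: np.sum(y) * b[0] - len(y) * np.exp(b[0])
grad = lambda b: np.array([np.sum(y) - len(y) * np.exp(b[0])])
hess = lambda b: np.array([[-len(y) * np.exp(b[0])]])

b_hat = newton_raphson(loglik, grad, hess, [0.0])
print(b_hat, np.log(np.mean(y)))
```

Starting from b = 0, the full Newton step s = 1 actually lowers the log likelihood, so the halving rule engages before the iteration settles onto the closed-form solution, illustrating why the step length is needed.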
A.5 Fixed effects linear models
Using the notation of Chapter 5 on the two-level linear regression model,
the explanatory variables at the individual level (level one) are denoted
by x1 , · · · , xP , and those at the group level (level two) by z1 , · · · , zQ , so
that:
y_{ij} = \gamma_{00} + \sum_{p=1}^{P} \gamma_{p0} x_{pij} + \sum_{q=1}^{Q} \gamma_{0q} z_{qj} + u_{0j} + \varepsilon_{ij}.
The parameters γ p0 (p = 1, · · · , P ) and γ 0q (q = 1, · · · , Q) are the re-
gression coefficients associated with the level-one and level-two explana-
tory variables respectively. Groups comprising only one individual have
to be removed from the dataset, as dummy variables for groups of size 1
are not identified. This model is estimated without a constant term and
without time-constant covariates; in other words, we set \gamma_{00} = \gamma_{0q} = 0
and treat all of the incidental parameters u_{0j} as dummy variables. This
is the least squares dummy variable (LSDV) estimator; we use the term
LSDV for the explicit use of dummy variables. The estimates of u_{0j} are
unbiased but, with a fixed number of observations per group, inconsistent.
A number of fixed effects estimators have been
proposed. Papers on the estimation of this model include Abowd et al.
[2] and Andrews et al. [11].
There may be too many groups in a dataset to perform the con-
ventional matrix manipulations required to estimate this model, given
the limited memory of most desktop PCs. SabreR does not use any ap-
proximations or differencing (demeaning), as it solves directly the least
squares normal equations for the model. Furthermore, the group sizes
do not need to be balanced. The algorithm still works if the model
includes level-three dummy variables, as long as these variables change
for some level-two units. To solve the normal equations, SabreR uses
some of the large sparse matrix algorithms of the Harwell Subroutine Li-
brary (HSL) [49]. The SabreR estimator (Fixed.effects = TRUE) may
be parallelized on multiprocessor systems.
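A small simulation illustrates the equivalence that makes such estimates easy to check: the LSDV slope from explicit group dummies matches the "within" (demeaned) slope exactly, although SabreR itself solves the sparse least squares normal equations directly rather than demeaning. The data below are simulated purely for illustration (Python, not SabreR code):

```python
import numpy as np

rng = np.random.default_rng(1)
J, T = 30, 5                              # groups (level two), obs per group
group = np.repeat(np.arange(J), T)
x = rng.normal(size=J * T)                # level-one explanatory variable
u = rng.normal(size=J)[group]             # incidental parameters u_0j
y = 2.0 * x + u + 0.1 * rng.normal(size=J * T)

# LSDV: one dummy variable per group, no constant term
D = (group[:, None] == np.arange(J)[None, :]).astype(float)
X = np.column_stack([x, D])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
beta_lsdv, u_hat = coef[0], coef[1:]

# within estimator: demean y and x inside each group
def demean(v):
    return v - (np.bincount(group, weights=v) / T)[group]

xd, yd = demean(x), demean(y)
beta_within = (xd @ yd) / (xd @ xd)

print(beta_lsdv, beta_within)             # the two slopes coincide
```

The two slope estimates agree to machine precision (a consequence of the Frisch-Waugh-Lovell theorem), while the dummy-variable route additionally returns the u_{0j} estimates themselves, which is what the sparse normal-equation approach preserves.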
A.6 Endogenous and exogenous variables
In the social sciences, interest often focuses on the dynamics of social
or economic processes. Social science theory suggests that individual
behaviour, choices or outcomes of a process are directly influenced by (or
are a function of) previous behaviour, choices or outcomes. For instance,
someone employed this week is more likely to be in employment next
week than someone who is currently unemployed; someone who voted
for a certain political party in the last election is more likely to vote for
that party in the next election than someone who did not.
When analyzing observational data in the social sciences, it is neces-
sary to distinguish between two different types of explanatory variable:
those which are exogenous, and those which are endogenous. Exogenous
variables are external to the process under study, for example, age, sex,
social class and education in studies of voting behaviour. Endogenous
variables have characteristics which in some way relate to previous de-
cisions, choices or outcomes of a process. For example, in a study of
voting behaviour, previous vote, being a previous decision, is an en-
dogenous variable; in the study of migration, duration of stay since the
last residential move is endogenous as it relates to previous migration
behaviour.
Endogenous variables may be seen as proxy variables for the many
unmeasured and unmeasurable factors which affect individual choice or
behaviour and which are therefore necessarily omitted from analyses.
Thus, voting choice may be seen as a proxy for individual social, eco-
nomic and psychological characteristics, while duration of stay in a lo-
cality is a proxy for all the unknown social and economic factors which
affect an individual’s propensity to move.
Endogenous variables create problems in statistical analyses because,
being related to the outcomes of the process of interest, they will, by
definition, be a function of the unobserved variables which govern the
process. They will therefore be correlated with the random variation
(or error structure) of the outcome. This leads to an infringement of
the basic regression model assumption that the explanatory variables
included in the model are independent of the error term. A consequence
of this violation is the risk of substantial and systematic bias.
In the presence of endogenous variables, the basic statistical mod-
els are not robust against the infringement of assumptions. Expressed
technically, parameter estimation is not consistent; in other words, there
is no guarantee that the parameter estimates will approach their true
values as the sample size increases. Consistency is usually regarded as
the minimum requirement of an acceptable estimation procedure.
To avoid spurious relationships and misleading results, with endoge-
nous variables it is essential to use longitudinal data and models in which
there is control for omitted variables. Longitudinal data and, in particu-
lar, repeated measures on individuals are important because they provide
scope for controlling for individual-specific explanatory variables omit-
ted from the analysis. The conventional approach to representing the
effect of omitted variables is to add an individual-specific random term
to the linear predictor, and to include an explicit distribution for this
random term in the model.
There is no single agreed terminology for models which include this
random term. In econometrics, the models are called random effects
models. In epidemiology, health research and medicine, the models are
termed frailty models. Statisticians refer to them as multi-level mod-
els, mixed models, mixture models and heterogeneous models. Models
without random effects are sometimes called standard models and ho-
mogeneous models. An alternative terminology describes models without
random effects as marginal models, and models with random effects as
conditional models. Marginal models correspond closely to the ‘popula-
tion averaged’ formulations used in the generalized estimating equation
(GEE) literature.
It is important to note that, when interest focuses on the causal
relationship in social processes, inference can only be drawn by using
longitudinal data and models in which there is control for unobserved
(or residual) heterogeneity. Although this approach does not overcome
all the problems of cross-sectional analysis with endogenous variables,
there is ample evidence that it greatly improves inference.
Appendix B
Introduction to R for Sabre
B.1 Getting started with R
These notes are intended to be used in conjunction with the data and
material provided as part of the SabreR workshop and available on the
Sabre web page http://sabre.lancs.ac.uk. The data associated with
the examples and exercises are printed in truetype, and all paths
are relative to the top-level directory Rlib.
Typically, anything you are asked to type into R will be shown in:
this font
and reference to any R output will be shown in:
this font
There are a number of online presentations which accompany this
book. Although they are not self-contained, they may be useful to have
to hand when using this material.
The manner in which R is installed, started and used may vary ac-
cording to the operating system, the user interface employed and, in
some cases, other localised system settings. The online resources for R
are excellent and the reader is encouraged to consult these if necessary.
B.1.1 Preliminaries
B.1.1.1 Working with R in interactive mode
When working with R interactively, commands are typed at the com-
mand prompt (the ‘>’ symbol). Pressing return at the end of the com-
mand causes it to be executed, R to print its response to the command
(if there is one), and a new prompt to be produced in readiness for the
next command. If the command is not complete, R prompts the user for
further input. Below is a sample of some basic interactive input:
> 2+2
[1] 4
> 7/3
[1] 2.333333
> 4*5
[1] 20
> 4**5
[1] 1024
> 4*/5
Error: syntax error, unexpected ’/’ in "4*/"
>
Note that each of the results is prefixed by [1]. The reason for this
will become clear later. Also, note that the expression 4*/5 is illegal, so
R prints some diagnostic information. It is possible to split a command
across more than one command line by pressing return before the com-
mand is complete. When this is done, the R prompt changes to a + to
indicate that further input is required from the user before execution
can take place. The example below demonstrates a calculation which
straddles two command lines:
> 2+6/
+ 4
[1] 3.5
>
This feature can be useful when constructing long commands. Note that
there are places where a command cannot be broken, for example, be-
tween consecutive digits of a number.
The results of commands can be stored for later use by assigning
them to variables. Assignment to variables in R is made using the <-
operator which is often referred to as the ‘gets’ operator. When a result
is assigned to a variable, the name of the variable can be used to access
the result. The following examples show how to store results using the
<- operator. They demonstrate assignment using <- and how the stored
results can be used in subsequent commands:
> x<-4+5
> y<-3
> z<-(x+y)/3
> z
[1] 4
> z<-x/y
> z
[1] 3
>
Note that, in these examples, when the result of a command is as-
signed to a variable, the result is not echoed to the console. This is
generally but not always the case. Typically, the value associated with
a variable can be displayed by issuing the variable name as a command,
as shown in the previous example. A variable can be reassigned using
the ‘gets’ operator (<-).
B.1.1.2 Basic functions
The default installation of R provides an extensive and diverse range
of functions for mathematics, statistics, plotting, graphics and string
manipulation, amongst many other areas. Functions are identified by a
name and the function arguments are in the form of a comma separated
list enclosed in ( ). A sample of function calls is shown below:
> 4.0*atan(1)
[1] 3.141593
> max(1,3,7,2,5)
[1] 7
> dnorm(x=0.0,mean=0.0,sd=0.1)
[1] 3.989423
> dnorm(0.0,sd=0.1)
[1] 3.989423
> rnorm(3,2.0,0.1)
[1] 1.940109 1.974019 2.094738
> rnorm(16,sd=0.1,mean=2)
[1] 2.005127 2.054042 1.994713 1.899572 1.924619 1.876174
[7] 1.956686 1.778670 1.921541 1.947481 2.216113 1.997847
[13] 2.012951 1.817877 2.044500 2.171258
The number, type and position of the arguments that a function can
take is known as its argument signature and this defines how the func-
tion is used. The argument signature for the function dnorm used in
the following examples is (x, mean=0, sd = 1, log = FALSE). This
argument signature dictates that, when dnorm is invoked, at least one
argument, x, is required (x is called a mandatory argument). Addition-
ally, it states that three optional arguments can be provided and that
they take the values shown (the default values) when not specified. Typ-
ically, an argument is provided explicitly by typing its name followed by
an = sign and its value. When this is done, the argument can appear any-
where in the argument list. Arguments can also be provided implicitly by
just typing their values. The argument to which the value corresponds is
determined by its position relative to the arguments (provided explicitly
or implicitly) to its left in the argument list. The rule used to match a
value to an argument is such that all the invocations of dnorm shown
below are equivalent:
> dnorm(0.9,1.0,0.1)
[1] 2.419707
> dnorm(x=0.9,mean=1.0,sd=0.1)
[1] 2.419707
> dnorm(sd=0.1,0.9,mean=1.0)
[1] 2.419707
> dnorm(mean=1.0,0.9,0.1)
[1] 2.419707
>
When a mixture of implicit and explicit values is used, matching a
value to its intended argument can become complicated. For this reason,
it is recommended that implicit values be used only for mandatory
arguments.
B.1.1.3 Getting help
R has a standard help facility which allows information about an R
command or setting to be obtained. The help system uses a common
format for all commands. Help information is obtained with help(topic),
where topic is the name of a function or other search object. To find out
more information regarding the use of help, type help(help).
It is recommended that, each time a new function is introduced in
this book, the reader uses the help function to get further information
regarding its use.
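For example, any of the following standard calls will display documentation for the dnorm function used earlier:

```r
> help(dnorm)                          # full help page for dnorm
> ?dnorm                               # shorthand for help(dnorm)
> help.search("normal distribution")   # search help pages by keyword
```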
B.1.1.4 Stopping R
To terminate an R session, use the function quit(). When quit is used
with no arguments, R asks if it is required to save the workspace image.
Answering in the positive will allow the current session to be recovered
for future use.
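The prompt can be avoided by supplying the save argument directly, for example:

```r
> quit(save = "yes")   # save the workspace image and exit
> quit(save = "no")    # exit without saving
```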
B.1.2 Creating and manipulating data
B.1.2.1 Vectors and lists
R supports object orientation, which allows variables (and the values
to which they refer) to be manipulated in a manner appropriate to their
type. In fact, a variable in R is more properly referred to as an
object. Two of the most important objects provided by R are the vector
and the list.
Introduction to R for Sabre 233
B.1.2.2 Vectors
An R vector can be considered as an ordered set of values of the same
type. It is possible to have vectors of integers, floating point values,
logical values (TRUE or FALSE) and other basic types, but each
vector must contain only one type. The example below demonstrates
how to create a vector and shows how the elements of a vector can be
accessed and assigned:
> x<-c(1,5,3,7)
> x[3]
[1] 3
> x[1]
[1] 1
> x
[1] 1 5 3 7
> x[2]<-8
> x
[1] 1 8 3 7
>
Access and assignment is by use of the indexing operator [ ]. The
argument to [ ] is an integer value indicating the position of the desired
element in the vector (the first element of a vector corresponding to the
value 1). Providing an index for a non-existent entry results in the value
‘NA’ being returned. Assigning a value to an element that is not yet in
the vector creates the element and assigns any unassigned elements with
index lower than the provided index to ‘NA’. There are many functions
which take vectors for arguments and/or return vectors as results. For
example, the functions rep and seq are useful for creating vectors with
structure, as shown in the examples below. Also shown in the last of
these examples is the : operator which provides a simple shorthand for
the creation of a vector:
> x<-rep(1,10)
> x
[1] 1 1 1 1 1 1 1 1 1 1
> y<-seq(0,20,5)
> y
[1] 0 5 10 15 20
> z<-4:7
> z
[1] 4 5 6 7
>
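The behaviour described earlier for out-of-range indices and for assignment beyond the end of a vector can be seen directly:

```r
> x <- c(1, 5, 3, 7)
> x[6]                 # index beyond the end returns NA
[1] NA
> x[6] <- 9            # assignment extends the vector,
> x                    # filling the gap with NA
[1]  1  5  3  7 NA  9
```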
B.1.2.3 Vector operations
Many of the functions that are supplied with R operate on vectors even
though they are defined nominally for a single scalar argument. For
example, the trigonometric functions such as sin and cos can act on
vectors. The action of the function in this case is to return a vector of
values containing the result of the function acting on each of the elements
of its vector argument. The standard arithmetic operators also support
vectors. For example, given two vectors x and y, then x + y is defined
and the result is to add the elements of x and y pairwise. If the vectors
are of different lengths, then the values of the shorter vector are recycled
along the longer vector; R issues a warning when the length of the longer
vector is not a multiple of the length of the shorter vector. The examples
below illustrate the use of vectors in
functions and with operators:
> x
[1] 1 2 3
> sin(x)
[1] 0.8414710 0.9092974 0.1411200
> y
[1] 4 5 6 7 8 9
> x+y
[1] 5 7 9 8 10 12
> y+x
[1] 5 7 9 8 10 12
> 3*x>y
[1] FALSE TRUE TRUE FALSE FALSE FALSE
>
The [ ] operator can also take a vector as its argument. When the
argument vector is of type ‘integer’, the elements at each of the given
positions are returned. Alternatively, the argument vector can be of type
‘logical’, in other words, its elements are either TRUE or FALSE. When
this is the case, an element of the underlying vector is returned only if
the corresponding element of the argument vector is TRUE; the argument
vector is recycled over the elements of the vector being indexed. This
latter mechanism is extremely powerful when used in conjunction with
relational operators, as demonstrated in the examples below:
> x<-c(1,4,3,6,5,4,9)
> x[c(1,3,5)]
[1] 1 3 5
> x[c(TRUE,FALSE)]
[1] 1 3 5 9
> x[c(TRUE,FALSE,FALSE)]
[1] 1 6 9
> x>5
[1] FALSE FALSE FALSE TRUE FALSE FALSE TRUE
> x[x>5]
[1] 6 9
B.1.2.4 Lists
Lists are similar to vectors but have two major differences. First, a list
can be inhomogeneous, that is, its elements can be of differing types.
Second, a list, if required, can associate names with its elements. A
simple way of creating a list is to use the list function. Elements of a list
can be accessed by an index using the [[ ]] operator. Some examples
of list creation and indexing are shown below:
> x<-list(1,2,3)
> x<-list(x,"hello world")
> x
[[1]]
[[1]][[1]]
[1] 1
[[1]][[2]]
[1] 2
[[1]][[3]]
[1] 3
[[2]]
[1] "hello world"
> x[[1]]
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
> x[[1]][[2]]
[1] 2
> x[[2]]
[1] "hello world"
>
A further example of list creation and indexing follows:
> x<-list(height=1.68,Montague=TRUE)
> y<-list(height=1.51,Montague=FALSE)
> characters<-list(Romeo=x,Juliet=y)
> characters[[1]]
$height
[1] 1.68
$Montague
[1] TRUE
> characters$Juliet$Montague
[1] FALSE
Notice that, in this example, the list characters is a list of lists, and
that named elements can also be accessed with the $ operator, as in
characters$Juliet$Montague.
B.1.2.5 Data frames
One of the most commonly employed data structures in R is the data
frame. A data frame can be used to represent tabular data in which each
named column of the table may have a different type.1 Essentially, a
data frame is a list of equal-length vectors (its columns), but it has some
additional attributes, such as row names. Perhaps the easiest way to
create a data frame is by employing
the data.frame function. Elements of a data frame can be accessed as
if the data frame were a list, but also by using the [ ] operator
with two integers which indicate the row and column numbers. Below
are some examples of data frame creation and indexing:
> sample.points<-seq(0,1,0.1)
> simulated.data<-data.frame(x=rep(sample.points,10),
+ y=rnorm(110,2*sample.points,0.1))
> simulated.data[1:3,]
x y
1 0.0 -0.02896063
2 0.1 0.22258147
3 0.2 0.43555200
> simulated.data[,1][5:7]
[1] 0.4 0.5 0.6
> simulated.data$x[1:3]
[1] 0.0 0.1 0.2
Here, the data frame is constructed by specifying the names of the
columns (x and y) and by setting the contents of these columns using
two vectors generated from the vector sample.points. The example also
shows how a subset of the rows of the table can be accessed by using the
1 It is assumed here that a table is such that each column has an equal number of
rows.
[ ] operator. In this example, the first argument to [ ], 1:3, selects
the first three rows, while omitting the second argument selects all the
columns. Similarly,
simulated.data[,1] accesses the first column of the data frame. A col-
umn can also be selected by using its name as shown. Note that the
type of a column is a vector (all the entries in a column are of the same
type) and thus can be manipulated using the [ ] operator, as detailed
in Sub-section B.1.2.2.
A useful function for simultaneously subsetting a data frame by rows
and columns is the subset function. This function takes a data frame,
a logical condition which the rows must satisfy, and a vector of column
names to select. The example below demonstrates how the subset func-
tion might be used:
> some.data
x y
1 1 17
2 2 16
3 3 15
4 4 14
5 5 13
> subset(some.data,some.data$x != 3,select="y")
y
1 17
2 16
4 14
5 13
>
B.1.3 Session management
B.1.3.1 Managing objects
During an R session, many different objects may be created and modi-
fied. Consequently, it is necessary to have a way of keeping track of what
objects exist and to know how to delete them when they are no longer
required. To determine what objects are available, the ls function can
be used. To remove unwanted objects, use rm.
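For example, in an otherwise empty workspace:

```r
> a <- 1
> b <- "text"
> ls()       # list the objects in the workspace
[1] "a" "b"
> rm(a)      # delete the object a
> ls()
[1] "b"
```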
B.1.3.2 Attaching and detaching objects
It is often the case that most of the data that are referred to in an R
session are associated with columns of a data frame. When this is the
case, it can become somewhat tedious to keep prefixing the names of the
columns with the name of the data frame when accessing the data. The
process can be much simplified by attaching the data frame. Once this
has been done, the names of the columns of the data frame can be used
to access the data directly. The example below demonstrates how a data
frame can be accessed when it has been attached. Once attached, a data
frame can be detached, as shown in the example:
> some.data<-data.frame(x=seq(1,5,1),y=seq(17,13,-1))
> some.data
x y
1 1 17
2 2 16
3 3 15
4 4 14
5 5 13
> x
Error: object "x" not found
> some.data$x
[1] 1 2 3 4 5
> attach(some.data)
> x
[1] 1 2 3 4 5
> detach(some.data)
>
B.1.3.3 Serialization
There are a number of different ways of loading and saving data in R.
One way is to save your R session when stopping R (see Sub-section
B.1.1.4). However, this is not a very selective way of serializing data.
Using the save function allows only selected objects to be saved to a
named file which can then be reloaded using the load function. For
more complicated structures, such as data frames, it is often important
to have greater control over the format of the serialization. For example,
it may not be necessary to save the column names, or it might be required
to import data into a data frame from comma or tab separated ASCII
data. When working with data frames, this fine control can be obtained
by employing the read.table and write.table functions.
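As a brief sketch of these functions (the object and file names here are illustrative):

```r
> x <- 1:10
> save(x, file = "session.RData")     # serialize selected objects
> load("session.RData")               # restore them in a later session
> some.data <- data.frame(x = 1:3, y = c(7, 8, 9))
> write.table(some.data, "some.txt", row.names = FALSE)
> recovered <- read.table("some.txt", header = TRUE)
```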
B.1.3.4 R scripts
It is often the case that large numbers of R commands are used together
to perform a single task, and that this task is often repeated. When this
is the case, it is possible to create a text file containing the R commands
which can be loaded into and executed by R. Such a file is often referred
to as an R script. An R script can be loaded and executed in R by using
the source function.
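For example, if the file analysis.R (an illustrative name) contains a sequence of R commands, it can be executed with:

```r
> source("analysis.R")               # run the script silently
> source("analysis.R", echo = TRUE)  # run it, echoing each command
```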
B.1.3.5 Batch processing
An alternative to using the source function is to load and to execute R in
batch mode. In this mode, R starts then automatically sources a specified
file containing an R script. The output produced whilst executing the R
script is serialized to an output file, which is also specified when starting
R. To run R in batch mode, use R CMD BATCH infile [outfile].
When running R in batch mode, it is possible to pass additional
options that define the environment used in R. To obtain further infor-
mation regarding these, use R CMD BATCH --help and R --help.
B.1.4 R packages
The basic functionality provided by an installation of R depends on how
it has been installed. In general, all of the functions introduced in this
book are available by default. Much of the utility of R stems from the
ease with which additional functions and methods can be added to R
by third parties. When this is done, it is common to collect these meth-
ods together into an R package. Packages are constructed in a standard
manner so that they are easy to distribute and to share across different
installations of R and different operating systems.
B.1.4.1 Loading a package into R
Individual packages can be installed on a system for use by R, but it
is not usual to have R load these packages when R is started. There
are simply too many different packages to make this a sensible option.
Consequently, it is necessary for the user to select individual packages
to load after R has been started. Package management is achieved by
using the library function. To see which packages are available to R,
use library().
This will print a list of all the available packages. To load a particular
package, use library(package.name), where package.name is the name
of the required package.
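For example, to load the MASS package, which is supplied with the standard R distribution:

```r
> library()       # list the packages available for loading
> library(MASS)   # load the MASS package
> search()        # "package:MASS" now appears on the search path
```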
B.1.4.2 Installing a package for use in R
Sometimes, a package is required which is not currently installed for
loading into R. A package can be installed directly from an R package
repository such as CRAN, or from an archive file which is typically
compressed. A package can be installed from within an R session by using
the install.packages function. This function can be used in a number
of different ways depending on where the package is being obtained from
and where it is to be installed.
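As a sketch, a package can be installed from a repository by name, or from a previously downloaded archive; the package and file names below are illustrative:

```r
> install.packages("lme4")    # fetch and install from a CRAN mirror
> # install from a local source archive instead of a repository:
> install.packages("SabreR_1.0.tar.gz", repos = NULL, type = "source")
```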
B.1.4.3 R and Statistics
R is primarily a data manipulation program, which is why some of the
basic data structures offered by R have been examined in Section B.1.2.
However, R is most commonly used for, and is typically associated with,
statistical modelling. A basic installation of R comes complete with a
vast range of statistical functions and many means of summarizing, dis-
playing and exploring data. This includes lm for the linear regression
model and the function for generalized linear modelling, glm, which al-
lows the user to specify, amongst other things, the distribution and the
link function to be used in the regression model. For a more extensive
overview of lm and glm, see [27, 31, 104].
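As a minimal sketch of these two functions, using simulated data:

```r
> x <- seq(0, 1, 0.1)
> y <- 2*x + rnorm(length(x), sd = 0.1)
> fit <- lm(y ~ x)        # linear regression of y on x
> coef(fit)               # intercept and slope estimates
> bin <- as.integer(y > 1)
> # logistic regression of a binary response on x
> gfit <- glm(bin ~ x, family = binomial(link = "logit"))
```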
In addition, there is a large and active community of people
who contribute to developing additional methods and libraries for
R, referred to as R packages, which can very easily be obtained
and installed from a number of online repositories and their mir-
rors. The SabreR package can be obtained from the Sabre web page
http://www.sabre.lancs.ac.uk/.
B.2 Data preparation for SabreR
SabreR specializes in the estimation of random effects and fixed effects
models, and only has a few commands for performing simple transfor-
mations. For instance, SabreR does not have the facilities for handling
data with missing values, or for reshaping data, so these activities are
best performed within R, as we shall explain in the rest of this appendix.
In the R code which follows, a line beginning with a # is a comment line.
B.2.1 Creation of dummy variables
Example B.2.1.1 Essay grading
Johnson and Albert [66] analyzed data on the grading of the same es-
say by five experts. Essays were graded on a scale of 1 to 10, with 10 being
excellent. We use the subset of data limited to the grades from graders 1
to 5 on 198 essays, making a total of 990 observations (essays.tab). The
dataset comprises 11 variables, including essay identifier (essay) which
takes the values 1,2,. . . ,198, and grader identifier (grader) with values
1,2,3,4,5. The response of interest, essay grade (grade), is measured on a
scale from 1 to 10, though the highest grade is actually 8 in this dataset.
We wish to relate essay grade to a range of essay characteristics: av-
erage word length (wordlength); square root of the number of words
(sqrtwords); number of commas times 100 and divided by the number
of words (commas); percentage of spelling errors (errors); percentage of
prepositions (prepos); average length of sentences (sentlength). The
first 20 lines of essays.tab are presented in Figure B.1.
essay grader grade rating cons wordlength sqrtwords commas errors prepos sentlength
1 1 8 8 1 4.76 15.46 5.60 5.55 8.00 19.53
2 1 7 7 1 4.24 9.06 3.60 1.27 9.50 16.38
3 1 2 2 1 4.09 16.19 1.10 2.61 14.00 18.43
4 1 5 5 1 4.36 7.55 1.80 1.81 0.00 14.65
5 1 7 7 1 4.31 9.64 2.30 0.00 10.00 18.72
6 1 8 10 1 4.51 11.92 1.30 0.00 11.10 20.00
7 1 5 5 1 3.94 8.54 2.80 0.00 13.80 23.75
8 1 2 2 1 4.04 7.21 0.00 0.00 5.90 25.43
9 1 5 5 1 4.24 7.68 5.30 1.72 14.00 28.25
10 1 7 7 1 4.31 8.83 1.30 1.27 14.70 19.28
11 1 5 5 1 4.31 8.77 0.00 1.30 8.00 10.72
12 1 7 7 1 4.69 8.89 3.80 1.31 8.00 13.38
13 1 5 5 1 4.10 8.66 0.00 1.40 5.50 23.71
14 1 6 6 1 4.80 9.69 3.20 7.44 10.90 15.19
15 1 3 3 1 4.06 10.10 1.00 4.08 13.00 24.72
16 1 6 6 1 4.33 13.82 2.10 1.61 11.60 24.05
17 1 5 5 1 4.13 7.55 3.60 0.00 9.00 28.74
18 1 4 4 1 4.07 6.93 2.10 0.00 4.30 15.38
19 1 2 2 1 4.98 6.40 5.20 7.74 12.70 12.74
FIGURE B.1
First few lines of essays.tab
To load this dataset into R, type:
essays<-read.table(file="/Rlib/SabreRCourse/data/essays.tab")
attach(essays)
where “/Rlib/SabreRCourse/data/essays.tab” is the source of the
data. If we want to create a binary indicator (dummy or grouping) vari-
able for those essays that obtained a grade of 5 or over, compared to
those essays that were awarded a grade of less than 5, we use the com-
mand:
pass<-1*(grade>=5)
The variable grader identifies different examiners and takes the val-
ues 1,2,3,4,5. To create dummy variables for examiners 2 to 5, type:
grader2<-1*(grader==2)
grader3<-1*(grader==3)
grader4<-1*(grader==4)
grader5<-1*(grader==5)
To add these new indicator variables to essays.tab, use the com-
mand:
essays2<-cbind(essays,pass,grader2,grader3,grader4,grader5)
To save the data as the essays2 object, type:
write.table(essays2,
"/Rlib/SabreRCourse/examples/appendixB/essays2.tab")
The first few lines of the new dataset essays2.tab are shown in
Figure B.2.
essay grader grade rating constant wordlength sqrtwords commas errors prepos sentlength pass grader2 grader3 grader4 grader5
1 3 8 8 1 4.76 15.46 5.60 5.55 8 19.53 1 0 1 0 0
1 1 8 8 1 4.76 15.46 5.60 5.55 8 19.53 1 0 0 0 0
1 4 8 8 1 4.76 15.46 5.60 5.55 8 19.53 1 0 0 1 0
1 2 6 8 1 4.76 15.46 5.60 5.55 8 19.53 1 1 0 0 0
1 5 5 8 1 4.76 15.46 5.60 5.55 8 19.53 1 0 0 0 1
2 2 5 7 1 4.24 9.06 3.60 1.27 9.5 16.38 1 1 0 0 0
2 4 5 7 1 4.24 9.06 3.60 1.27 9.5 16.38 1 0 0 1 0
2 3 3 7 1 4.24 9.06 3.60 1.27 9.5 16.38 0 0 1 0 0
2 1 7 7 1 4.24 9.06 3.60 1.27 9.5 16.38 1 0 0 0 0
2 5 3 7 1 4.24 9.06 3.60 1.27 9.5 16.38 0 0 0 0 1
3 5 1 2 1 4.09 16.19 1.10 2.61 14 18.43 0 0 0 0 1
3 1 2 2 1 4.09 16.19 1.10 2.61 14 18.43 0 0 0 0 0
3 4 1 2 1 4.09 16.19 1.10 2.61 14 18.43 0 0 0 1 0
3 2 1 2 1 4.09 16.19 1.10 2.61 14 18.43 0 1 0 0 0
3 3 1 2 1 4.09 16.19 1.10 2.61 14 18.43 0 0 1 0 0
4 4 5 5 1 4.36 7.55 1.80 1.81 0 14.65 1 0 0 1 0
FIGURE B.2
First few lines of new dataset essays2.tab
The dataset can now be read directly into SabreR. You are asked
to use SabreR to analyze the data stored in essays2.tab in Exercises
3.4.1 and 6.7.1.
B.2.2 Missing values
Example B.2.2.1 Repeating a grade
Raudenbush and Bhumirat [92] analyzed data on children repeating
a grade during their time at primary school. The data were from a na-
tional survey of primary education in Thailand in 1988. We use a subset
of the data (thaieduc.tab) which comprises 8,582 observations (rows)
and 5 variables (columns): schoolid: school identifier; sex: 1 if child
is male, 0 otherwise; pped: 1 if the child had pre-primary experience, 0
otherwise; repeat. (the name carries a trailing dot because repeat is a
reserved word in R): 1 if the child repeated a grade during primary school,
0 otherwise; msesc: mean pupil socio-economic status at the school level.
A sample of the data is given in Figure B.3.
schoolid sex pped repeat. msesc
10101 0 1 0 NA
10101 0 1 0 NA
10101 0 1 0 NA
10101 0 1 0 NA
10101 0 1 0 NA
10101 0 1 0 NA
10101 0 1 0 NA
10101 0 1 0 NA
10101 0 1 0 NA
10101 0 1 0 NA
10101 0 1 0 NA
10101 0 1 0 NA
10101 0 1 0 NA
10101 0 1 0 NA
10101 0 1 0 NA
10101 1 1 0 NA
10101 1 1 0 NA
10101 1 1 0 NA
10101 1 1 0 NA
10102 0 0 0 NA
10102 0 1 0 NA
10102 0 1 0 NA
10102 0 1 0 NA
10102 0 1 0 NA
10102 0 1 0 NA
10102 0 1 0 NA
10102 0 1 0 NA
10102 0 1 0 NA
10102 0 1 0 NA
10102 0 1 0 NA
10102 1 1 0 NA
10102 1 1 0 NA
10102 1 1 0 NA
10102 1 1 0 NA
10102 1 1 0 NA
10102 1 1 0 NA
10102 1 1 0 NA
10102 1 1 0 NA
10102 1 1 0 NA
10102 1 1 0 NA
10102 1 1 0 NA
10102 1 1 0 NA
10102 1 1 0 NA
10103 0 0 0 0.88
10103 0 0 0 0.88
10103 0 1 0 0.88
10103 0 1 0 0.88
10103 0 1 0 0.88
FIGURE B.3
First few lines of thaieduc.tab
Figure B.3 shows that the dataset thaieduc.tab contains a school-
level variable msesc which has missing values denoted by NA. To read
the data into R, type:
thaieduc<-read.table(file="/Rlib/SabreRCourse/data/thaieduc.tab")
attach(thaieduc)
Type:
summary(thaieduc)
in order to produce the following summary statistics for each variable in
the data frame:
schoolid sex pped
Min. : 10101 Min. :0.0000 Min. :0.0000
1st Qu.: 70211 1st Qu.:0.0000 1st Qu.:0.0000
Median :120103 Median :1.0000 Median :1.0000
Mean :112184 Mean :0.5054 Mean :0.5054
3rd Qu.:150543 3rd Qu.:1.0000 3rd Qu.:1.0000
Max. :180665 Max. :1.0000 Max. :1.0000
repeat msesc
Min. :0.0000 Min. :-7.700e-01
1st Qu.:0.0000 1st Qu.:-2.800e-01
Median :0.0000 Median :-4.000e-02
Mean :0.1451 Mean : 9.674e-03
3rd Qu.:0.0000 3rd Qu.: 2.625e-01
Max. :1.0000 Max. : 1.490e+00
NA’s : 1.066e+03
To count the number of missing values for each variable, type:
apply(apply(thaieduc,2,is.na),2,sum)
which gives the following output:
schoolid sex pped repeat. msesc
0 0 0 0 1066
This output indicates that the dataset thaieduc.tab contains 1,066
missing values, all in the variable msesc. For analyses which do not
involve msesc, we can drop
this variable from the dataset by typing:
thaieduc1<-subset(thaieduc,
select=c("schoolid","sex","pped","repeat."))
The new object thaieduc1 can be saved for later use with the command:
write.table(thaieduc1,
"/Rlib/SabreRCourse/examples/appendixB/thaieduc1.tab")
The dataset thaieduc1.tab contained within this new object has 8,582
observations on four variables. For analyses which do involve msesc, we need
to drop, from the original dataset thaieduc.tab, all those observations
for which the value of the variable msesc is missing. In order to achieve
this, type:
thaieduc2<-na.omit(thaieduc)
The new object thaieduc2 can be saved for later use with the command:
write.table(thaieduc2,
"/Rlib/SabreRCourse/examples/appendixB/thaieduc2.tab")
The dataset thaieduc2.tab contained within this new object has 7,516
observations on five variables. These datasets can now be read directly
into SabreR. The application of SabreR to the datasets thaieduc1.tab
and thaieduc2.tab is illustrated in Examples 3.1.6 and 6.6.1 respec-
tively.
B.2.3 Creating lagged response covariate data
Example B.2.3.1 Depression
In this example, we illustrate how to create the first-order lagged
response and baseline covariate data for a dataset with zero-order re-
sponses. We do this for a set of seasonal data on the incidence of depres-
sion. The grouped data shown in Table 14.1 were collected in a one-year
panel study of depression and help-seeking behaviour in Los Angeles
[81].
Adults were interviewed during the spring and summer of 1979 and
re-interviewed at three-month intervals. A respondent was classified as
being depressed if they scored more than 16 on a 20-item list of symp-
toms. By its very nature, depression is difficult to overcome, suggesting
that state dependence (see Chapter 14) might explain at least some of
the observed temporal dependence. We start with the ungrouped data
(depression0.tab) in which the binary response (s) takes value 1 if the
respondent with individual identifier ind is depressed in season t, and
the response takes value 0 otherwise. Thus, each of the 752 respondents
contributes up to 4 observations (rows) in the dataset, resulting in a to-
tal of 2,256 observations. The first 20 rows of data in depression0.tab
are presented in Figure B.4.
We have no information about the levels of depression in the respon-
dents prior to the study. So, when taking first-order state dependence
(see Chapter 14) into account, the lagged response is undefined for the
first observation of each respondent. To overcome this difficulty, we con-
struct two first-order versions of the data. One version will include the
initial response (t=1) and the subsequent responses (t=2,3,4). The sub-
sequent responses will have a lagged response and the initial response
(baseline) included as covariates (explanatory variables). This dataset
can be used in the joint modelling of the initial and subsequent re-
sponses (see Chapter 14, Sections 14.11 to 14.19). The other version of
ind t s
1 1 0
1 2 0
1 3 0
1 4 0
2 1 0
2 2 0
2 3 0
2 4 0
3 1 0
3 2 0
3 3 0
3 4 0
4 1 0
4 2 0
4 3 0
4 4 0
5 1 0
5 2 0
5 3 0
FIGURE B.4
Ungrouped depression data (depression0.tab)
the data will be similar to the first, with the exception that we drop the
initial response (t=1). This second version of the data can be used in a
conditional analysis (see Chapter 14, Sections 14.7 to 14.10).
To read the data into R, type:
depression0<-read.table(file=
"/Rlib/SabreRCourse/data/depression0.tab")
attach(depression0)
where “/Rlib/SabreRCourse/data/depression0.tab” is the source of
the data.
The dataset depression0.tab contains the variable s which indi-
cates whether or not the respondent is depressed. To create the baseline
covariate s1 (the initial response) for t=1,2,3,4, and to set the value of
this covariate to 0 at the initial observation (t=1), use the commands:
s1<-rep(depression0$s[depression0$t==1],each=4)
s1[t==1]<-0
To create the first-order lagged response covariate for t=2,3,4, type:
s.lag1<-rep(0,nrow(depression0))
This command puts a 0 in each row of a column called s.lag1, which
has the same length as s and s1. Then, use:
s.lag1[t>1]<-s[t<4]
to cycle through s.lag1 for t>1 while cycling through s for t<4, so
that when t=2, s.lag1 will take the value of s for t=1, and so on for
t=3 and 4. To create a new object (depression) with the original and
new variables, use:
depression<-cbind(depression0,s1,s.lag1)
To save this depression object as depression.tab, use:
write.table(depression,
"/Rlib/SabreRCourse/examples/appendixB/depression.tab")
The resulting file takes the form displayed in Figure B.5.
ind t s s1 s.lag1
690 1 1 0 0
690 2 0 1 1
690 3 1 1 0
690 4 0 1 1
691 1 1 0 0
691 2 0 1 1
691 3 1 1 0
691 4 0 1 1
692 1 1 0 0
692 2 0 1 1
692 3 1 1 0
692 4 1 1 1
693 1 1 0 0
693 2 0 1 1
693 3 1 1 0
693 4 1 1 1
694 1 1 0 0
694 2 0 1 1
694 3 1 1 0
694 4 1 1 1
FIGURE B.5
First few lines of depression.tab
For the conditional analysis, discard the initial response; in other
words, use:
depression2<-subset(depression,t>1)
To save the depression2 object as depression2.tab, use:
write.table(depression2,
"/Rlib/SabreRCourse/examples/appendixB/depression2.tab")
ind t s s1 s.lag1
690 2 0 1 1
690 3 1 1 0
690 4 0 1 1
691 2 0 1 1
691 3 1 1 0
691 4 0 1 1
692 2 0 1 1
692 3 1 1 0
692 4 1 1 1
693 2 0 1 1
693 3 1 1 0
693 4 1 1 1
694 2 0 1 1
694 3 1 1 0
694 4 1 1 1
695 2 0 1 1
695 3 1 1 0
695 4 1 1 1
696 2 0 1 1
696 3 1 1 0
FIGURE B.6
First few lines of new dataset depression2.tab
The resulting file takes the form shown in Figure B.6.
These datasets can now be read directly into SabreR. The use of
SabreR to analyze these datasets is illustrated in Chapter 14. Condi-
tional analyses are performed on the data depression2.tab in Sections
14.8 and 14.10. Joint models are fitted to the data depression.tab in
Sections 14.13, 14.15, 14.17 and 14.19.
References
[1] J. Abowd, F. Kramarz and D. Margolis. High wage workers and
high wage firms. Econometrica, 67:251–333, 1999.
[2] J. Abowd, R. Creecy and F. Kramarz. Computing person and firm
effects using linked longitudinal employer-employee data. Techni-
cal Paper 2002-06, U.S. Census Bureau, April 2002.
[3] M. Aitkin. A general maximum likelihood analysis of overdis-
persion in generalized linear models. Statistics and Computing,
6:251–262, 1996.
[4] M. Aitkin. A general maximum likelihood analysis of variance
components in generalized linear models. Biometrics, 55:218–234,
1999.
[5] P.S. Albert and D.A. Follmann. Modeling repeated count data
subject to informative dropout. Biometrics, 56:667–677, 2000.
[6] M. Alfò and M. Aitkin. Variance component models for longitudi-
nal count data with baseline information: Epilepsy data revisited.
Statistics and Computing, 16:231–238, 2006.
[7] E.B. Andersen. Conditional Inference and Models for Measuring.
Mentallhygiejnisk Forlag, Copenhagen, 1973.
[8] T.W. Anderson and C. Hsiao. Estimation of dynamic models with
error components. Journal of the American Statistical Association,
76:598–606, 1981.
[9] M.J. Andrews, L. Gill, T. Schank and R. Upward. High wage work-
ers and low wage firms: Negative assortative matching or limited
mobility bias? Journal of the Royal Statistical Society Series A,
171:673–697, 2008.
[10] M.J. Andrews, S. Bradley, D. Stott and R. Upward. Successful
employer search? An empirical analysis of vacancy duration using
micro data. Economica, 75: 455–480, 2007.
[11] M. Andrews, T. Schank and R. Upward. Practical fixed effects
estimation methods for the three-way error components model.
Stata Journal, 6:461–481, 2006.
[12] B.H. Baltagi. Econometric Analysis of Panel Data. John Wiley &
Sons, Chichester, UK, 2005.
[13] B.H. Baltagi and D. Levin. Cigarette taxation: raising revenues
and reducing consumption. Structural Change and Economic Dy-
namics, 3:321–335, 1992.
[14] G.E. Bates and J. Neyman. Contributions to the theory of accident
proneness: I. An optimistic model of the correlation between light
and severe accidents; II. True or false contagion. University of
California Publications in Statistics, 1952.
[15] Y. Ben-Porath. Labour force participation rates and the supply of
labour. Journal of Political Economy, 81:697–704, 1973.
[16] E.R. Berndt, B. Hall, R. Hall and J.A. Hausman. Estimation and
inference in nonlinear structural models. Annals of Economic and
Social Measurement, 3:653–666, 1974.
[17] A. Bhargava and J.D. Sargan. Estimating dynamic random effects
models from panel data covering short time periods. Econometrica,
51:1635–1657, 1983.
[18] J.M. Bland and D.G. Altman. Statistical methods for assessing
agreement between two methods of clinical measurement. The
Lancet, 1:307–310, 1986.
[19] G.J. Borjas and G.T. Sueyoshi. A two-stage estimator for probit
models with structural group effects. Journal of Econometrics,
64:165–182, 1994.
[20] S. Bradley, M.J. Andrews, D. Stott and R. Upward. Testing theo-
ries of labour market matching. Working Papers 005434, Lancaster
University Management School, Department of Economics, 2007.
[21] N.E. Breslow and D. Clayton. Approximate inference in gener-
alized linear mixed models. Journal of the American Statistical
Association, 88:9–25, 1993.
[22] A. Cameron and P.K. Trivedi. Regression Analysis of Count
Data. Cambridge University Press, Cambridge, Econometric So-
ciety Monograph, 30, 1998.
[23] A.C. Cameron, P.K. Trivedi, F. Milne and J. Piggott. A micro-
econometric model of the demand for health care and health insur-
ance in Australia. Review of Economic Studies, 55:85–106, 1988.
[24] J.B. Carlin and T.A. Louis. Bayes and Empirical Bayes Methods
for Data Analysis. Chapman & Hall, New York, second edition,
2000.
[25] G. Chamberlain. Analysis of covariance with qualitative data. Re-
view of Economic Studies, 47:225–238, 1980.
[26] D.R. Cox and N. Reid. Parameter orthogonality and approximate
conditional inference (with discussion). Journal of the Royal Sta-
tistical Society B, 49:1–39, 1987.
[27] M.J. Crawley. Statistical Computing. An Introduction to Data
Analysis Using S-Plus. Wiley, Chichester, 2002.
[28] R. Crouchley and A. Pickles. An empirical comparison of condi-
tional and marginal likelihood methods in a longitudinal study.
Sociological Methodology, 19:161–183, 1989.
[29] R. Crouchley and R.B. Davies. A comparison of population average
and random effects models for the analysis of longitudinal count
data with baseline information. Journal of the Royal Statistical
Society A, 162:331–347, 1999.
[30] R. Crouchley and R.B. Davies. A comparison of gee and random
effects models for distinguishing heterogeneity, nonstationarity and
state dependence in a collection of short binary event series. Sta-
tistical Modelling, 1:271–285, 2001.
[31] P. Dalgaard. Introductory Statistics with R. Springer-Verlag, New
York, 2002.
[32] D.T. Danahy, D.T. Burwell, W.S. Aronow and R. Prakash. Sustained hemodynamic and antianginal effect of high dose oral
isosorbide dinitrate. Circulation, 55:381–387, 1977.
[33] R.B. Davies. Statistical modelling for survey analysis. Journal of
the Market Research Society, 35:235–247, 1993.
[34] R.B. Davies and R. Crouchley. The determinants of party loyalty: a
disaggregate analysis of panel data from the 1974 and 1979 general
elections in England. Political Geography Quarterly, 4:307–320,
1985.
[35] R.B. Davies and R. Crouchley. The mover-stayer model: Requi-
escat in pace. Sociological Methods and Research, 14:356–380, 1986.
[36] R.B. Davies, P. Elias and R. Penn. The relationship between a
husband’s unemployment and his wife’s participation in the labour
force. Oxford Bulletin of Economics and Statistics, 54:145–171,
1992.
[37] P.J. Diggle, K.Y. Liang and S.L. Zeger. Analysis of Longitudinal
Data. Clarendon Press, Oxford, 1994.
[38] A.J. Dobson. An Introduction to Generalized Linear Models. Wi-
ley, New York, 1991.
[39] G. Dunn. Design and analysis of reliability studies. Statistical
Methods in Medical Research, 1:123–157, 1992.
[40] C. Elbers and G. Ridder. True and spurious duration dependence:
The identifiability of the proportional hazards model. Review of
Economic Studies, 49:402–410, 1982.
[41] C.L. Garner and S.W. Raudenbush. Neighbourhood effects on edu-
cational attainment: A multilevel analysis of the influence of pupil
ability, family, school and neighbourhood. Sociology of Education,
64:252–262, 1991.
[42] D.P. Goldberg. The Detection of Psychiatric Illness by Question-
naire. Oxford University Press, Oxford, 1972.
[43] H. Goldstein. Multilevel Models in Educational and Social Re-
search. Griffin, London, 1987.
[44] H. Goldstein. Multilevel Statistical Models. Arnold, London, third
edition, 2003.
[45] L.A. Goodman. Statistical methods for the mover stayer model.
Journal of the American Statistical Association, 56:841–868,
1961.
[46] W. Greene. Accounting for excess zeros and sample selection in
Poisson and negative binomial regression models. Working Paper
EC-94-10, 1994.
[47] B. Hall, J. Hausman and Z. Griliches. Econometric models for
count data with an application to the patents – R&D relationship.
Econometrica, 52:909–938, 1984.
[48] B. Hall, Z. Griliches and J. Hausman. Patents and R&D: Is there
a lag? International Economic Review, 27:265–283, 1986.
[49] Harwell Subroutine Library (HSL). A collection of Fortran codes
for large-scale scientific computation, 2007.
[50] J.A. Hausman. Specification tests in econometrics. Econometrica,
46:1251–1271, 1978.
[51] J. Hausman and W. Taylor. Panel data and unobservable individ-
ual effects. Econometrica, 49:1377–1398, 1981.
[52] J.J. Heckman. The Incidental Parameters Problem and the Prob-
lem of Initial Conditions in Estimating a Discrete Time-Discrete
Data Stochastic Process. MIT Press, Cambridge, MA, 1981.
[53] J.J. Heckman. Statistical Models for Discrete Panel Data. MIT
Press, Cambridge, MA, 1981.
[54] J.J. Heckman. Micro data, heterogeneity and the evaluation of
public policy: Nobel lecture. Journal of Political Economy, 109:
673–748, 2001.
[55] J.J. Heckman and B.E. Honore. The identifiability of the compet-
ing risks model. Biometrika, 76:325–330, 1988.
[56] J.J. Heckman and B. Singer. Econometric duration analysis. Jour-
nal of Econometrics, 24:63–132, 1984.
[57] J.J. Heckman and B. Singer. The identifiability of the proportional
hazards model. Review of Economic Studies, 51:231–241, 1984.
[58] J.J. Heckman and B. Singer. A method for minimizing the impact
of distributional assumptions in econometric models of duration
data. Econometrica, 52:271–320, 1984.
[59] J.J. Heckman and R.J. Willis. A beta logistic model for the analy-
sis of sequential labor force participation by married women. Jour-
nal of Political Economy, 85:27–58, 1977.
[60] D. Hedeker. Mixno: A computer program for mixed effects logistic
regression. Journal of Statistical Software, 4:1–92, 1999.
[61] H. Holzer, R. Block, M. Cheatham and J. Knott. Are training
subsidies effective? The Michigan experience. Industrial and Labor
Relations Review, 46:625–636, 1993.
[62] P. Hougaard. A class of multivariate failure time distributions.
Biometrika, 73:671–678, 1986.
[63] P. Hougaard. Survival models for heterogeneous populations
derived from stable distributions. Biometrika, 73:387–396, 1986.
[64] J. Hox. Multilevel Analysis Techniques and Applications. Lawrence
Erlbaum Associates, London, 2002.
[65] C. Hsiao. Analysis of Panel Data. Cambridge University Press,
Cambridge, 1986.
[66] V.E. Johnson and J.H. Albert. Ordinal Data Modelling. Springer,
New York, 1999.
[67] G. Kauermann and P. Khomski. Additive two way hazards model
with varying coefficients. Computational Statistics & Data Analysis,
51(3): 1944–1956, 2006.
[68] G. Kauermann and P. Khomski. Full time or part time reemploy-
ment: A competing risk model with frailties and smooth effects
using a penalty based approach. Journal of Computational and
Graphical Statistics, 18:106–125, 2009.
[69] I. Kazemi and R. Crouchley. Modelling the initial conditions in dy-
namic regression models of panel data with random effects, Chap-
ter 4, in B.H. Baltagi, ed., Panel Data Econometrics, Theoretical
Contributions and Empirical Applications. Elsevier, Amsterdam,
Netherlands, 2006.
[70] G.G. Koch, G.J. Carr, I.A. Amara, M.E. Stokes and T.J. Uryniak.
Categorical data analysis. In Statistical Methodology in the Pharmaceutical Sciences, pp. 391–475, ed. D.A. Berry. Marcel Dekker,
New York, 1990.
[71] D. Lambert. Zero-inflated Poisson regression, with an application
to defects in manufacturing. Technometrics, 34:1–14, 1992.
[72] I.H. Langford, G. Bentham and A. McDonald. Multilevel mod-
elling of geographically aggregated health data: A case study on
malignant melanoma mortality and UV exposure in the European
Community. Statistics in Medicine, 17:41–58, 1998.
[73] L.-F. Lee. A numerically stable quadrature procedure for the
one-factor random-component discrete choice model. Journal of
Econometrics, 95:117–129, 2000.
[74] E. Lesaffre and B. Spiessens. On the effect of the number of quadra-
ture points in a logistic random effects model: An example. Applied
Statistics, 50:325–335, 2001.
[75] W.F. Massy, D.B. Montgomery and D.G. Morrison. Stochastic
Models of Buying Behaviour. MIT Press, Cambridge, MA, 1970.
[76] P. McCullagh. Regression models for ordinal data (with discus-
sion). Journal of the Royal Statistical Society B, 42:109–142,
1980.
[77] P. McCullagh and J.A. Nelder. Generalized Linear Models. Chap-
man & Hall, London, second edition, 1989.
[78] R. McGinnis. A stochastic model of social mobility. American
Sociological Review, 33:712–722, 1968.
[79] B. McKnight and S.K. van den Eeden. A conditional analysis
for two treatment multiple-period crossover design with binomial
or Poisson outcomes and subjects who drop out. Statistics in
Medicine, 12:825–834, 1993.
[80] J. Morgan, K. Dickinson, J. Dickinson, J. Benus and G. Duncan.
Five Thousand American Families, Patterns of Economic Progress,
Volume 1 and 2, Institute of Social Research, University of Michi-
gan, Ann Arbor, MI, 1974.
[81] T.M. Morgan, C.S. Aneshensel and V.A. Clark. Parameter esti-
mation for mover stayer models: Analysis of depression over time.
Sociological Methods and Research, 11:345–366, 1983.
[82] Y. Mundlak. On the pooling of time series and cross sectional
data. Econometrica, 46:69–85, 1978.
[83] J.C. Naylor and A.F.M. Smith. Econometric illustrations of novel
numerical integration strategies for Bayesian inference. Journal of
Econometrics, 38:103–125, 1988.
[84] J. Neyman and E. Scott. Consistent estimates based on partially
consistent observations. Econometrica, 16:1–32, 1948.
[85] L.E. Papke. Tax policy and urban development: Evidence from the
Indiana enterprise zone program. Journal of Public Economics,
54:37–49, 1994.
[86] R. Penn. Social Change and Economic Life in Britain. Homeless
Book, Bologna, 2006.
[87] R. Penn, D. Berridge and M. Ganjali. Changing attitudes to gender
roles: A longitudinal analysis of ordinal response data from the
British Household Panel Study. International Sociology, 24(3):346–
367, May 2009.
[88] A.R. Pickles and R. Crouchley. Generalisations and applications
of frailty models for survival and event data. Statistical Methods
in Medical Research, 3:263–278, 1994.
[89] R. Prentice and L. Gloeckler. Regression analysis of grouped sur-
vival data with applications to breast cancer data. Biometrics,
34:57–67, 1978.
[90] S. Rabe-Hesketh and A. Skrondal. Multilevel and Longitudinal
Modelling using Stata. Stata Press, College Station, TX, 2005.
[91] S. Raudenbush, B. Rowan and Y. Cheong. Teaching as a non-
routine task: implications for the organisational design of schools.
Educational Administration Quarterly, 29(4):479–500, 1993.
[92] S.W. Raudenbush and C. Bhumirat. The distribution of re-
sources for primary education and its consequences for educational
achievement in Thailand. International Journal of Educational Re-
search, 17:143–164, 1992.
[93] S.W. Raudenbush and A.S. Bryk. Hierarchical Linear Models.
Sage, Thousand Oaks, CA, 2002.
[94] G. Rodríguez and N. Goldman. Improved estimation procedures
for multilevel models with binary response. Journal of the Royal
Statistical Society A, Statistics in Society, 164(2):339–355, 2001.
[95] B. Singer and S. Spilerman. Some methodological issues in the
analysis of longitudinal surveys. Annals of Economic and Social
Measurement, 5:92–119, 1976.
[96] A. Skrondal and S. Rabe-Hesketh. Generalized Latent Variable
Modeling: Multilevel, Longitudinal and Structural Equation Mod-
els. Chapman & Hall/CRC, Boca Raton, FL, 2004.
[97] M. Smans, C.S. Muir and P. Boyle. Atlas of Cancer Mortality in
the European Economic Community. IARC Scientific Publications,
Lyon, France, 1992.
[98] T.A.B. Snijders and R.J. Bosker. Multilevel Analysis: An Introduc-
tion to Basic and Advanced Multilevel Modelling. Sage, London,
1999.
[99] M.B. Stewart. The interrelated dynamics of unemployment and
low-wage employment. Journal of Applied Econometrics, 22:511–
531, 2007.
[100] A.H. Stroud and D. Secrest. Gaussian Quadrature Formulas.
Prentice-Hall, Englewood Cliffs, NJ, 1966.
[101] M.F. Taylor, J. Brice, N. Buck and E. Prentice-Lane. British
Household Panel Survey User Manual Volume A: Introduction,
Technical Report and Appendices. University of Essex, Colchester,
2005.
[102] P.F. Thall and S.C. Vail. Some covariance models for longitudinal
count data with overdispersion. Biometrics, 46:657–671, 1990.
[103] F. Vella and M. Verbeek. Whose wages do unions raise? A dynamic
model of unionism and wage rate determination for young men.
Journal of Applied Econometrics, 13:163–183, 1998.
[104] J. Verzani. Using R for Introductory Statistics. Chapman &
Hall/CRC, Boca Raton, FL, 2004.
[105] R.D. Wiggins, K. Ashworth, C.A. O’Muircheartaigh and J.J. Gal-
braith. Multilevel analysis of attitudes to abortion. Journal of the
Royal Statistical Society Series D, 40:225–234, 1991.
[106] J.M. Wooldridge. Econometric Analysis of Cross Section and
Panel Data. MIT Press, Cambridge, MA, 2002.
[107] J.M. Wooldridge. Simple solutions to the initial conditions problem
in dynamic, nonlinear panel data models with unobserved
heterogeneity. Journal of Applied Econometrics, 20:39–54, 2005.
[108] J.M. Wooldridge. Introductory Econometrics: A Modern Ap-
proach. Cengage, Mason, OH, fourth edition, 2009.