Data Analysis Using Hierarchical Generalized Linear Models With R 1st Edition Lee PDF Download
DATA ANALYSIS USING
HIERARCHICAL GENERALIZED
LINEAR MODELS WITH R
Youngjo Lee
Lars Rönnegård
Maengseok Noh
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access
www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc.
(CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization
that provides licenses and registration for a variety of users. For organizations that have been granted
a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and
are used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
Contents
List of notations ix
Preface xi
1 Introduction 1
1.1 Motivating examples 5
1.2 Regarding criticisms of the h-likelihood 14
1.3 R code 16
1.4 Exercises 17
3.4 Extended likelihood principle 50
3.5 Laplace approximations for the integrals 53
3.6 Street magician 56
3.7 H-likelihood and empirical Bayes 58
3.8 Exercises 62
5 HGLMs modeling in R 95
5.1 Examples 95
5.2 R code 124
5.3 Exercises 132
References 299
Symbol Description
y Response vector
X Model matrix for fixed effects
Z Model matrix for random effects
β Fixed effects
v Random effect on canonical scale
u Random effect on the original scale
n Number of observations
m Number of levels in the random effect
h(·) Hierarchical log-likelihood
L(·; ·) Likelihood, written L(parameters; data)
fθ (y) Density function for y having parameters θ
θ A generic parameter denoting any fixed effect to be estimated
φ Dispersion component for the mean model
λ Dispersion component for the random effects
g(·) Link function for the linear predictor
r(·) Link function for random effects
η Linear predictor in a GLM
µ Expectation of y
s Linearized working response in IWLS
V Marginal variance matrix used in linear mixed models
V (·) GLM variance function
I(·) Information matrix
pd Estimated number of parameters
T Transpose
δ Augmented effect vector
γ Regression coefficient for dispersion
Preface
follow different distributions, which can also fit factor and structural
equation models. The frailtyHL package is used for survival analysis us-
ing frailty models, which is an extension of Cox’s proportional hazards
model to allow random effects. The jointdhglm package allows joint mod-
els for HGLMs and survival time and competing risk models. In Chapter
10, we introduce variable selection methods via random-effect models.
Furthermore, in Chapter 10 we study the random-effect models with dis-
crete random effects and show that hypothesis testing can be expressed
in terms of prediction of discrete random effects (e.g., null or alternative)
and show how h-likelihood gives a general extension of likelihood-ratio
test to multiple testing. Model-checking techniques and model-selection
tools by using the h-likelihood modeling approach add further insight to
the data analysis.
It is an advantage to have studied linear mixed models and GLMs before
reading this book. Nevertheless, GLMs are briefly introduced in Chapter
2 together with a short review of GLM theory, for a reader who wishes
to freshen up on the topic. The majority of data sets used in the book
are available at URL
http://cran.r-project.org/package=mdhglm
for the R package mdhglm (Lee et al., 2016b). Several different examples
are presented throughout the book, while the longitudinal epilepsy data
presented by Thall and Vail (1990) is an example dataset used recur-
rently throughout the book from Chapter 1 to Chapter 6 allowing the
reader to follow the model development from a basic GLM to a more
advanced DHGLM.
We are grateful to Prof. Emmanuel Lesaffre, Dr. Smart Sarpong, Dr.
Ildo Ha, Dr. Moudud Alam, Dr. Xia Shen, Mr. Jengseop Han, Mr. Dae-
han Kim, Mr. Hyunseong Park and the late Dr. Marek Molas for their
numerous useful comments and suggestions.
CHAPTER 1
Introduction
iii) we can use model-checking tools for linear regression and generalized
linear models (GLMs), making assumptions in all parts of an HGLM
checkable.
The marginal likelihood is used for inference on the fixed effects both
in classical frequentist and h-likelihood approaches, but the marginal
likelihood involves multiple integration over the random effects, which is
most often not feasible to compute. For such cases the adjusted profile
h-likelihood, a Laplace approximation of the marginal likelihood, is used
in the h-likelihood approach. Because the random effects are integrated
out in the marginal likelihood, classical frequentist methods do not allow
any direct inference on the random effects.
Bayesians assume priors for the parameters, and for inference they often rely
on Markov Chain Monte Carlo (MCMC) computations (Lesaffre and
Lawson, 2012). The h-likelihood allows complex models to be fitted by
maximizing likelihoods for fixed unknown parameters. So for a person
who does not wish to express prior beliefs, there are both philosophical
and computational advantages of using HGLMs. However, this book does
not focus on the philosophical advantages of using the h-likelihood but
rather on the practical advantages for applied users, having a reasonable
statistical background in linear models and GLMs, to enhance their data
analysis skills for more general types of data.
In this chapter we introduce a few examples to show the strength of
the h-likelihood approach. Table 1.1 lists the model classes, the chapters
where they are first introduced, and the available R packages. Various
packages have been developed to cover these model classes. From
Table 1.1, we see that the dhglm package covers the widest class of
models, from GLMs to DHGLMs. Detailed descriptions of the dhglm
package are presented in Chapter 7, where the full structure of the model
classes is described.
Figure 1.1 shows the evolution of the model classes presented in this
book together with their acronyms. Figure 1.1 also shows the building-
block structure of the h-likelihood; once you have captured the ideas at
one level you can go to a deeper level of modeling. This book aims to
show how complicated statistical model classes can be built by combin-
ing interconnected GLMs and augmented GLMs, and inference can be
made in a single framework of the h-likelihood. For readers who want
a more detailed description of theories and algorithms on HGLMs and
on survival analysis, we suggest the monographs by Lee, Nelder, and
Pawitan (2017) and Ha, Jeong, and Lee (2017). This book shows how to
analyze examples from these two books using available R packages, and
also presents new examples.
[Figure 1.1 (diagram of model classes): Generalized linear model; generalized linear mixed model (GLMM, Ch 4-5), a GLM including Gaussian random effects; joint GLM (Ch 4), including a dispersion model with fixed effects; factor analysis (Ch 7); multiple testing (Ch 10); multivariate DHGLM (MDHGLM, Ch 7), a DHGLM including outcomes from several distributions.]
Table 1.1 Model classes presented in the book including chapter number and available R packages

Model Class                                 R package       Developer                     Chapter
GLM (Nelder and Wedderburn, 1972)           glm() function                                2
                                            dhglm           Lee and Noh (2016)
Joint GLM (Nelder and Lee, 1991)            dhglm           Lee and Noh (2016)            4
GLMM (Breslow and Clayton, 1993)            dhglm           Lee and Noh (2016)            4, 5
                                            lme4            Bates and Maechler (2009)
                                            hglm            Alam et al. (2015)
HGLM (Lee and Nelder, 1996)                 dhglm           Lee and Noh (2016)            2, 3, 4, 5
                                            hglm            Alam et al. (2015)
Spatial HGLM (Lee and Nelder, 2001b)        dhglm           Lee and Noh (2016)            5
                                            spaMM           Rousset et al. (2016)
DHGLM (Lee and Nelder, 2006;                dhglm           Lee and Noh (2016)            6
  Noh and Lee, 2017)
Multivariate DHGLM (Lee, Molas, and         mdhglm          Lee, Molas, and Noh (2016b)   7
  Noh, 2016a; Lee, Nelder, and              mixAK           Komarek (2015)
  Pawitan, 2017)                            mmm             Asar and Ilk (2014)
Frailty HGLM (Ha, Lee, and Song, 2001)      frailtyHL       Ha et al. (2012)              8
                                            coxme           Therneau (2015)
                                            survival        Therneau and Lumley (2015)
Joint DHGLM (Henderson et al., 2000)        jointdhglm      Ha, Lee, and Noh (2015)       8
                                            JM              Rizopoulos (2015)
1.1 Motivating examples
Thall and Vail (1990) presented longitudinal data from a clinical trial of
59 epileptics, who were randomized to a new drug or a placebo (T=1 or
T=0). Baseline data were available at the start of the trial; these included
the logarithm of the average number of epileptic seizures recorded
in the 8-week period preceding the trial (B), the logarithm of age (A),
and the number of the clinic visit (V: a linear trend, coded (-3,-1,1,3)). A
multivariate response variable (y) consists of the seizure counts during
the 2-week periods before each of the four visits to the clinic.
The data can be retrieved from the R package dhglm (see R code at the
end of the chapter). It is a good idea at this stage to have a look at
and get acquainted with the data. From the boxplot of the number of
seizures (Figure 1.2) there is no clear difference between the two treat-
ment groups. In Figure 1.3 the number of seizures per visit are plotted
for each patient, where the lines in this spaghetti plot indicate longi-
tudinal patient effects. Investigate the data further before running the
models below.
A simple first preliminary analysis could be to analyze the data (ignoring
that there are repeated measurements on each patient) with a GLM
having a Poisson distributed response using the R function glm. Let yij
be the corresponding response variable for patient i (= 1, · · · , 59) and
visit j (= 1, · · · , 4). We consider a Poisson GLM with log-link function
modeled as

log(µ_ij) = β_0 + x_Bi β_B + x_Ti β_T + x_Ai β_A + x_Vj β_V + x_Bi x_Ti β_BT ,   (1.1)

where β_0, β_B, β_T, β_A, β_V, and β_BT are fixed effects for the intercept,
Figure 1.2 Boxplot of the logarithm of seizure counts (new drug = 1, placebo
= 0).
Call:
glm(formula=y~B+T+A+B:T+V,family=poisson(link=log),
data=epilepsy)
Figure 1.3 Number of seizures per visit for each patient. There are 59 patients
and each line shows the logarithm of seizure counts + 1 for each patient.
Deviance Residuals:
Min 1Q Median 3Q Max
-5.0677 -1.4468 -0.2655 0.8164 11.1387
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.79763 0.40729 -6.869 6.47e-12 ***
B 0.94952 0.04356 21.797 < 2e-16 ***
T -1.34112 0.15674 -8.556 < 2e-16 ***
A 0.89705 0.11644 7.704 1.32e-14 ***
V -0.02936 0.01014 -2.895 0.00379 **
B:T 0.56223 0.06350 8.855 < 2e-16 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Random effects:
Groups Name Variance Std.Dev.
patient (Intercept) 0.2515 0.5015
Number of obs: 236, groups: patient, 59
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.37817 1.17746 -1.170 0.24181
B 0.88442 0.13075 6.764 1.34e-11 ***
T -0.93291 0.39883 -2.339 0.01933 *
A 0.48450 0.34584 1.401 0.16123
V -0.02936 0.01009 -2.910 0.00362 **
B:T 0.33827 0.20247 1.671 0.09477 .
The output gives a variance for the random patient effect equal to 0.25.
In GLMMs the random effects are always assumed normally distributed.
The same model can be fitted within the h-likelihood framework using
the R package hglm, which gives the output (essential parts of the output
shown):
Call:
hglm2.formula(meanmodel=y~B+T+A+B:T+V+(1|patient),
data=epilepsy,family=poisson(link = log), fix.disp = 1)
----------
MEAN MODEL
----------
----------------
DISPERSION MODEL
----------------
The output gives a variance for the random patient effect equal to 0.27,
similar to the lme4 package. In Chapter 4, we compare the similarities
and differences between HGLM and lme4 estimates. Unlike the lme4 package,
however, it is also possible to add non-Gaussian random effects to further
model the over-dispersion. However, with marginal-likelihood inference,
subject-specific inferences cannot be made. For subject-specific
inferences, we need to estimate the random patient effects vi via the
h-likelihood.
With the dhglm package it is possible to allow different distributions for
different random effects. For example, in the previous GLMM, a gamma
distributed random effect can be included for each observation, which
gives a conditional distribution of the response that can be shown to
be negative binomial, while patient effects are modeled as normally dis-
tributed:
E(y_ij | v_i, v_ij) = µ_ij and var(y_ij | v_i, v_ij) = µ_ij ,
with log-link function modeled as
log(µ_ij) = β_0 + x_Bi β_B + x_Ti β_T + x_Ai β_A + x_Vj β_V + x_Bi x_Ti β_BT + v_i + v_ij ,   (1.3)
where vi ∼ N(0, λ1 ), uij = exp(vij ) ∼ G(λ2 ) and G(λ2 ) is a gamma
distribution with E(uij ) = 1 and var(uij ) = λ2 . Then, this model is
equivalent to the negative binomial HGLM such that the conditional
distribution of yij |vi is the negative binomial distribution with the prob-
ability function
ability function

C(y_ij + 1/λ2 − 1, 1/λ2 − 1) (λ2/(1 + µ*_ij λ2))^{y_ij} (1/(1 + µ*_ij λ2))^{1/λ2} (µ*_ij)^{y_ij} ,

where C(·, ·) denotes the binomial coefficient.
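As a numerical sanity check on this probability function, the following Python sketch (the values of µ*_ij and λ2 are illustrative, not from the book, and the book itself works in R) verifies that the negative binomial pmf in this parameterization sums to one and has mean µ*_ij:

```python
import math

def nb_pmf(y, mu, lam2):
    """Negative binomial pmf with mean mu and dispersion lam2,
    matching the gamma-mixed Poisson with E(u) = 1, var(u) = lam2.
    lgamma handles the binomial coefficient with non-integer 1/lam2."""
    r = 1.0 / lam2  # gamma shape parameter
    log_coef = math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
    log_p = (log_coef
             + y * math.log(lam2 * mu / (1.0 + lam2 * mu))
             + r * math.log(1.0 / (1.0 + lam2 * mu)))
    return math.exp(log_p)

mu, lam2 = 3.0, 0.5  # illustrative values, not from the book
probs = [nb_pmf(y, mu, lam2) for y in range(200)]
total = sum(probs)
mean = sum(y * p for y, p in zip(range(200), probs))
```

The mean equals µ*_ij because the mixing gamma distribution has E(u_ij) = 1.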
Call:
hglm2.formula(meanmodel=y~B+T+A+B:T+V+(1|id)+
(1|patient),data=epilepsy,family=poisson(link=log),
rand.family=list(Gamma(link=log),gaussian()),
fix.disp = 1)
----------
MEAN MODEL
----------
----------------
DISPERSION MODEL
----------------
The variance component for the random patient effect is now 0.24, with
additional saturated random effects picking up some of the over-dispersion.
By adding gamma saturated random effects we can model
over-dispersion (extra Poisson variation). The example shows how modeling
with GLMMs can be further extended using HGLMs. This can be viewed
as an extension of Poisson GLMMs to negative-binomial GLMMs, where
repeated observations on each patient follow the negative-binomial dis-
tribution rather than the Poisson distribution.
Dispersion modeling is an important but often challenging task in
statistics. Even for simple models without random effects, iterative algorithms
are needed to compute maximum likelihood (ML) estimates. An example
is the heteroscedastic linear model
y ∼ N(Xβ, exp(X_d β_d))
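To make the need for iteration concrete, here is a minimal Python sketch (an assumed illustration, not the book's algorithm or data) of alternating ML estimation for a special case of this heteroscedastic model, in which the dispersion model exp(X_d β_d) reduces to a separate variance per group: weighted least squares for the mean given the variances, then an ML update of each variance given the mean.

```python
def ml_heteroscedastic(y, x, group, n_iter=50):
    """Alternating ML for y_i ~ N(b0 + b1*x_i, phi_g), with one
    variance phi_g per group g. All names and data are illustrative."""
    phi = {g: 1.0 for g in set(group)}
    b0 = b1 = 0.0
    for _ in range(n_iter):
        # Step 1: weighted least squares for the mean, weights 1/phi_g
        w = [1.0 / phi[g] for g in group]
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swy = sum(wi * yi for wi, yi in zip(w, y))
        swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
        det = sw * swxx - swx * swx
        b0 = (swxx * swy - swx * swxy) / det
        b1 = (sw * swxy - swx * swy) / det
        # Step 2: ML update of each group variance (divisor n_g, not n_g - p)
        for g in phi:
            res2 = [(yi - b0 - b1 * xi) ** 2
                    for yi, xi, gi in zip(y, x, group) if gi == g]
            phi[g] = sum(res2) / len(res2)
    return b0, b1, phi

# illustrative data: group "b" is far more variable than group "a"
b0, b1, phi = ml_heteroscedastic(
    y=[1.0, 2.5, 2.0, 9.0, 1.0, 4.0],
    x=[0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    group=["a", "a", "a", "b", "b", "b"],
    n_iter=200,
)
```

Each step solves a simple problem, but neither has a closed form jointly, which is why iteration is needed.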
Figure 1.4 Distributions of observed litter sizes and the number of observations
per sow.
Likelihood is used in both the frequentist and Bayesian worlds, and it has
been the central concept in statistical modeling and inference for almost
a century. However, frequentist likelihood inference cannot handle
random unknowns, whereas Bayesian inference does not treat fixed
unknowns. The h-likelihood aims to allow inferences for both fixed and
random unknowns and could cover both worlds (Figure 1.5).
The concept of the h-likelihood has received criticism since the first pub-
lication of Lee and Nelder (1996). The criticism was partly motivated
by the fact that the theory in Lee and Nelder (1996) was not fully developed.
However, these questions have been clarified in later papers by Lee,
Nelder, and co-workers. One of the main concerns in the 1990s was the
similarity with penalized quasi-likelihood (PQL) for GLMMs of Breslow
and Clayton (1993), which has large biases for binary data. PQL estima-
tion for GLMMs is implemented in, e.g., the R function glmmPQL. Early
h-likelihood methodology was criticized because of non-ignorable biases
in binary data. These biases in binary data can be eliminated by im-
proved approximations of the marginal likelihood through higher-order
Laplace approximations (Lee, Nelder, and Pawitan, 2017).
Meng (2009, 2010) established Bartlett-like identities for h-likelihood.
That is, the score for parameters and unobservables has zero expecta-
tion, and the variance of the score is the expected negative Hessian under
easily verifiable conditions. However, Meng noted difficulties in infer-
ences about unobservables: neither the consistency nor the asymptotic
normality for parameter estimation generally holds for unobservables.
Thus, Meng (2009, 2010) conjectured that an attempt to make proba-
bility statements about unobservables without using a prior would be
in vain. Paik et al. (2015) studied the summarizability of h-likelihood
estimators and Lee and Kim (2016) showed how to make probability
statements about unobservables in general without assuming a prior as
we shall see.
The h-likelihood approach is a genuine approach based on the extended
likelihood principle. This likelihood principle is a mathematical theorem, so
there should be no controversy about its validity. However, it does not tell
us how to use the extended likelihood for statistical inference. Another im-
portant question, that has been asked and answered, is: For which family
of models can joint maximization of the h-likelihood be applied? This
is a well-motivated question, since there are numerous examples where
joint maximization of an extended likelihood containing both fixed ef-
fects β and random effects v gives nonsense estimates (see Lee and Nelder,
2009). Such examples use the extended likelihood for joint maximiza-
tion of both β and v. If the h-likelihood h, defined in Chapter 2, is
jointly maximized for estimating both β and v, such nonsense estimates
disappear. However, consistent estimates for the fixed effect can only be
guaranteed for a rather limited class of models, including linear mixed
models and Poisson HGLMs having gamma random effects on a log scale.
Thus, as long as the marginal likelihood is used to estimate β and the h-
likelihood h is maximized to estimate the random effects these examples
do not give contradictory results.
In a series of papers, Lee and Nelder have shown that the iterative
weighted least squares (IWLS) algorithm for GLMs can be extended to a
general class of models including HGLMs. It is computationally efficient,
and therefore potentially very useful in statistical applications, allowing
analysis of increasingly complex models (Figure 1.1). For linear mixed
models, there is nothing controversial about this algorithm because it can
be shown to give BLUP. Neither is it controversial for GLMs with ran-
dom effects in general, because the adjusted profile h-likelihoods (defined
in Chapter 2) are simply approximations of marginal and restricted like-
lihoods for estimating fixed effects and variance components, and as such
[Figure 1.5: the h-likelihood world, covering both the frequentist and Bayesian worlds]
1.3 R code
library(dhglm)
data(epilepsy)
model1 <- glm(y~B+T+A+B:T+V,family=poisson(link=log),
data=epilepsy)
summary(model1)
library(lme4)
model2 <- glmer(y~B+T+A+B:T+V+(1|patient),
family=poisson(link=log),data=epilepsy)
summary(model2)
library(hglm)
model3 <- hglm2(y~B+T+A+B:T+V+(1|patient),
family=poisson(link=log),
fix.disp=1, data=epilepsy)
1.4 Exercises
1. Get acquainted with the model checking plots available in the hglm
package by fitting a Gaussian distribution to the epileptic seizure count
data. (This is highly inappropriate but we do it for the sake of the
exercise.)
The model checking plots are produced with the command
plot(hglm.object) where hglm.object is the fitted model using hglm.
The QQ-plot for the residuals is found in the figure with the heading Mean
Model Deviances, and the QQ-plot for the random effects is found in
the figure with the heading Random1 Deviances.
Compare these two QQ-plots to the ones from the Poisson GLMM (i.e.,
model3 in the text). Which model seems to be the most suitable one?
CHAPTER 2
GLMs via Iterative Weighted Least Squares
2.1 Examples
level x, as reported by their spouses. We use scores (0, 2, 4, 5) for x.
The response variable “yes” is the number of heart disease cases and the
variable “no” is the number of non-cases.
g(µ) = Xβ.
The variance of y is a function of µ. For a Poisson distribution,
for instance, the variance of y is equal to the mean µ. This rela-
tionship between the mean and the variance depends directly on
the assumed distribution of y. Thus, for all GLMs, the variance
of y is the product of a variance function V (µ) and a dispersion
parameter φ. With m being the binomial denominator, we have
the following variances and variance functions, V (µ).
            Variance of y    V(µ)
Normal      φ                1
Poisson     µ                µ
Gamma       φµ²              µ²
Binomial    µ(m − µ)/m       µ(m − µ)/m
For modeling the proportion of heart disease cases (p), we fit a logistic
regression model as below,

log(p/(1 − p)) = α + βx.
We run the GLM by using the R function glm and from the output
we see that snoring level is a significant risk factor for heart disease. In
glm, the null deviance is the scaled deviance for the intercept-only model,
and the scaled deviance for the proposed model is called the residual deviance.
The difference between the two scaled deviances, 63.10, is highly
significant on 1 degree of freedom, so the snoring level x is significant.
Since the scaled deviance of the current model is 2.8 on 2 degrees of
freedom (p-value = 0.25), we conclude that there is no lack of fit in using
the logistic regression model.
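These significance statements can be checked with the closed-form chi-squared tail probabilities for 1 and 2 degrees of freedom; the following Python lines are an illustrative check, not part of the book's R code:

```python
import math

def chi2_sf(x, df):
    """Upper tail P(X > x) of a chi-squared variable, using the
    closed forms available for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or 2 in this sketch")

# difference of the two scaled deviances: 63.10 on 1 df
p_trend = chi2_sf(63.10, 1)
# residual (scaled) deviance of the fitted model: 2.8 on 2 df
p_fit = chi2_sf(2.8, 2)  # about 0.25, matching the text
```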
Call:
glm(formula = cbind(yes, no) ~ x, family = binomial)
Deviance Residuals:
1 2 3 4
-0.8346 1.2521 0.2758 -0.6845
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.86625 0.16621 -23.261 < 2e-16 ***
x 0.39734 0.05001 7.945 1.94e-15 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
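Using the fitted coefficients in the output above, the fitted probabilities at the four snoring scores follow from the inverse logit; a short illustrative Python sketch (the book itself works in R):

```python
import math

def inv_logit(eta):
    """Inverse of the logit link: p = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

alpha, beta = -3.86625, 0.39734  # estimates from the glm output above
scores = [0, 2, 4, 5]            # snoring-level scores used for x
p_hat = [inv_logit(alpha + beta * s) for s in scores]
```

The fitted probability of heart disease rises from about 0.02 at x = 0 to about 0.13 at x = 5.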
Consider the train accident data in the UK between 1975 and 2003
(Agresti, 2007). Let µ denote the expected value of the number of train-
related accidents y, the annual number of collisions between trains
and road vehicles, for t million kilometers of train travel. The data are
plotted in Figure 2.1.
To allow for a linear trend over time, we consider the following Poisson
GLM for the response y, with covariate x (= number of years since 1975)
and offset log(t):

log(µ) = β0 + β1 x + log(t).
Call:
glm(formula = y ~ x, family = poisson, offset = log(t))
Deviance Residuals:
Min 1Q Median 3Q Max
-2.0580 -0.7825 -0.0826 0.3775 3.3873
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.21142 0.15892 -26.50 < 2e-16 ***
x -0.03292 0.01076 -3.06 0.00222 **
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
We apply the model-checking plots of Lee and Nelder (1998) for GLMs as
shown in Figure 2.2. Two plots are used: the plot of studentized residuals
against fitted values on the constant information scale (Nelder, 1990),
and a similar plot of the absolute residuals. For a satisfactory model these
two plots should show running means that are approximately straight
Figure 2.1 Annual number of collisions between trains and road vehicles per
million km for the years 1975–2003.
and flat. If there is marked curvature in the first plot, this indicates either
an unsatisfactory link function or missing terms in the linear predictor,
or both. If the first plot is satisfactory, the second plot may be used to
check the choice of variance function for the distributional assumption.
If, for example, the second plot shows a marked downward trend, this
implies that the residuals are falling in absolute value as the mean in-
creases, i.e., that the assumed variance function is increasing too rapidly
with the mean. In a normal probability plot, ordered values of residuals
are plotted against the expected order statistics of the standard normal
sample. In the absence of outliers this plot is approximately linear. We
also use the histogram of residuals. If the distributional assumption is
correct, it shows symmetry provided the deviance residual is the best
normalizing transformation.
[Figure 2.2 comprises four model-checking panels: studentized residuals against fitted values, absolute studentized residuals against fitted values, a normal QQ-plot, and a histogram of the residuals.]
Figure 2.2 Normal probability plot for the train accident data under a Poisson
GLM.
The normal probability plot in Figure 2.2 shows that there are two
outliers (y = 12 and 13, for the years 1976 and 1986). The average accident
count ȳ is 4.2, so 1976 and 1986 have large numbers of train accidents, which
cannot be explained as Poisson variation. In Chapter 5, we will show how these
data could be more appropriately modeled using an HGLM.
[Figure: normal Q–Q plot of the standardized deviance residuals for glm(y ~ B + T + A + T*B + V), with observations 99, 193, and 195 marked as extreme.]
2.2 R code
[Figure: a second normal Q–Q plot of standardized deviance residuals for glm(y ~ B + T + A + T*B + V), with observations 99, 222, and 223 marked as extreme.]
Figure 2.5 Hat values under a Poisson GLM for the epileptic seizure count
data.
## Train accident data
x <- rep(1975:2003)-1975
y<-c(2,12,8,4,3,2,2,3,7,3,5,13,
6,4,4,2,6,4,4,4,2,2,1,4,2,3,4,3,3)
t<-c(436,426,425,430,426,430,
417,372,401,389,418,414,397,443,436,
431,439,430,425,415,423,437,463,487,
505,503,508,516,518)
trainres1 <- glm( y~ x, family=poisson, offset=log(t))
## These data are also available in the mdhglm package:
## data(train,package="mdhglm")
model_mu<-DHGLMMODELING(Model="mean",Link="log",
LinPred=y~x, Offset=log(t))
model_phi<-DHGLMMODELING(Model="dispersion")
fit<-dhglmfit(RespDist="poisson",DataMain=train,
MeanModel=model_mu,DispersionModel=model_phi)
plotdhglm(fit)
train2<-data.frame(cbind(x1,y1,t1))
model_mu<-DHGLMMODELING(Model="mean",Link="log",
LinPred=y1~x1, Offset=log(t1))
model_phi<-DHGLMMODELING(Model="dispersion")
fit2<-dhglmfit(RespDist="poisson",DataMain=train2,
MeanModel=model_mu,DispersionModel=model_phi)
plotdhglm(fit2)
data(epilepsy, package="dhglm")
model0 <- glm(y ~ B + T + A + B:T + V, family=gaussian,
data=epilepsy)
plot(model0, which=2)
The notation used for the likelihood here separates the parameters and
data by a semicolon, i.e., L(parameter; data), where the data are an
observed response y. In the probability function fθ(data), θ denotes the
fixed unknown parameters. Here θ is used generically to denote any kind
of unknown fixed parameter.
For the regression model (2.1), θ = (β, φ) are the fixed unknown parameters.
A full Bayesian approach requires specification of a prior distribution
π(θ) pretending all parameters are random. The advantage is that a full
probabilistic framework can be used and inference can be directly made
from the posterior distribution
p(θ|y) ∝ p(y|θ)π(θ), (2.2)
where p(y|θ) ≡ fθ (y). Here p(y|θ) is the model specification, while π(θ)
is not part of the model specification but is necessary to obtain the
posterior. Bayesians often want to estimate the entire posterior distribution
p(θ|y) and use MCMC techniques. In machine learning, the penalty is a function of
π(θ) and the resulting objective function is called a penalized likelihood.
The mode estimator for (2.2) is called a penalized least squares estimate
(more generally, penalized maximum likelihood estimate).
An informative prior will shrink the estimates β̂ toward the center of
the prior distribution in a similar fashion as random effects in mixed
linear models where the shrinkage depends on the specified distribution
of the random effects. Note however that the prior distribution in the
30 GLMS VIA ITERATIVE WEIGHTED LEAST SQUARES
Bayesian approach is not a part of the model while the distribution of
random effects in linear mixed models is an essential part of the model
specification to describe correlations among responses. Thus, we may
view penalized ML estimators as the use of h-likelihood estimators but
where the true model has fixed unknowns. Shrinkage of the h-likelihood
estimation can be viewed as regularization of fixed parameter estimates
by the penalty.
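As a tiny illustration of this shrinkage (an assumed example, not from the book): for a single covariate with no intercept, the penalized least squares estimate under the penalty λβ² has a closed form that shrinks toward zero as λ grows, just as a random-effect distribution shrinks estimates toward its center.

```python
def ridge_slope(x, y, lam):
    """Penalized least squares slope for y ~ x (no intercept):
    minimizes sum((y_i - b*x_i)^2) + lam * b^2, which gives
    b = sum(x_i*y_i) / (sum(x_i^2) + lam)."""
    return (sum(xi * yi for xi, yi in zip(x, y))
            / (sum(xi * xi for xi in x) + lam))

# illustrative data, not from the book
x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.3, 2.8, 4.2]
b_ols = ridge_slope(x, y, 0.0)   # ordinary least squares
b_pen = ridge_slope(x, y, 10.0)  # shrunk toward zero
```

Setting λ = 0 recovers ordinary least squares; larger λ pulls the estimate toward zero.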
In the remainder of this chapter, we study ML procedures for GLMs.
We study penalized maximum likelihood estimation in Chapter 10.
e = y − Xβ ∼ N(0, φI)
The linear model has an explicit solution for β̂ that does not require
an iterative procedure, whereas GLMs in general require iteration. Take
Poisson regression with a log link function for instance,
log(µ) = Xβ.
The IWLS works as follows
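A minimal self-contained sketch of these IWLS steps for the Poisson model, written in Python for illustration (the book's own code is in R; solve() here is a naive Gaussian elimination for the small weighted least squares system), using the small data set from Exercise 1 at the end of the chapter:

```python
import math

def solve(A, b):
    """Gaussian elimination for the small p x p system A beta = b."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        piv = M[i][i]
        M[i] = [v / piv for v in M[i]]
        for j in range(n):
            if j != i:
                f = M[j][i]
                M[j] = [vj - f * vi for vj, vi in zip(M[j], M[i])]
    return [M[i][n] for i in range(n)]

def iwls_poisson(y, X, n_iter=25, tol=1e-10):
    """IWLS for a Poisson GLM with log link: each iteration does a
    weighted least squares fit of the working response
    z_i = eta_i + (y_i - mu_i)/mu_i with weights w_i = mu_i."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        eta = [sum(b * xij for b, xij in zip(beta, row)) for row in X]
        mu = [math.exp(e) for e in eta]
        z = [e + (yi - mi) / mi for e, yi, mi in zip(eta, y, mu)]
        XtWX = [[sum(mi * r[a] * r[b] for mi, r in zip(mu, X))
                 for b in range(p)] for a in range(p)]
        XtWz = [sum(mi * r[a] * zi for mi, r, zi in zip(mu, X, z))
                for a in range(p)]
        new_beta = solve(XtWX, XtWz)
        done = max(abs(nb - b) for nb, b in zip(new_beta, beta)) < tol
        beta = new_beta
        if done:
            break
    return beta

y = [18, 17, 15, 20, 10, 20, 25, 13, 12]
x = [0.5, 0.5, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0]
X = [[1.0, xi] for xi in x]
beta = iwls_poisson(y, X)
```

At convergence the score equations Σ(y_i − µ_i)x_i = 0 are satisfied, which is a convenient correctness check.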
The diagonal elements of H are the hat values here denoted by qi . They
take values from 0 to 1 and indicate how much information there is in
the model for each observation, where low values show high information.
In general, GLMs have the hat matrix
H = X(X T W X)−1 X T W.
Studentized residuals adjust for the hat values and are obtained as
r_D,i / √(1 − q_i)   or   r_P,i / √(1 − q_i).
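The hat values q_i and the studentization above can be sketched as follows (a Python illustration with assumed data; inv() inverts the small p × p matrix by Gauss–Jordan elimination):

```python
import math

def inv(A):
    """Gauss-Jordan inverse of a small square matrix."""
    n = len(A)
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for i in range(n):
        piv = M[i][i]
        M[i] = [v / piv for v in M[i]]
        for j in range(n):
            if j != i:
                f = M[j][i]
                M[j] = [vj - f * vi for vj, vi in zip(M[j], M[i])]
    return [row[n:] for row in M]

def hat_values(X, w):
    """Diagonal q_i of H = X (X^T W X)^{-1} X^T W with W = diag(w):
    q_i = w_i * x_i^T (X^T W X)^{-1} x_i."""
    p = len(X[0])
    XtWX = [[sum(wi * r[a] * r[b] for wi, r in zip(w, X))
             for b in range(p)] for a in range(p)]
    A = inv(XtWX)
    return [wi * sum(r[a] * A[a][b] * r[b]
                     for a in range(p) for b in range(p))
            for wi, r in zip(w, X)]

def studentize(res, q):
    """Studentized residuals r_i / sqrt(1 - q_i)."""
    return [r / math.sqrt(1.0 - qi) for r, qi in zip(res, q)]

# assumed design matrix and GLM weights, purely for illustration
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
w = [1.0, 2.0, 1.0, 2.0]
q = hat_values(X, w)
```

A useful check: the hat values sum to the number of estimated parameters p, since H is idempotent with trace p.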
2.7 Exercises
1. Implement the IWLS algorithm given for the Poisson regression model
in the beginning of the chapter (Section 2.4.2) using the following re-
sponse, y, and explanatory variable, x.
y <- c(18,17,15,20,10,20,25,13,12)
x <- c(0.5, 0.5, 1, 0, 1, 0, 0, 1, 1)
2. a) Compute the deviance components for the fitted model in Exercise
1 (using the formula for the Poisson distribution in Table 2.2).
b) Compute the standardized deviance residuals.
c) Check the distributional assumption of the model by plotting the
standardized deviance residuals in a QQ-plot.
3. Implement an IWLS algorithm for a gamma regression model with a
log link function and φ = 1. (You might find the table in Box 1 useful.)
Fit the same data as above.
4. Using the general health questionnaire score data available in the SMIR
package, fit a logistic GLM with the glm function in R (see Section 2.4.3
of Lee, Nelder, and Pawitan (2017) to check your results). The response
variable is case vs. non-case and the linear predictor is Constant + sex
+ ghq. Check whether it is reasonable to assume that ghq has a linear
effect or whether ghq should be fitted as a factor.
data(ghq, package="SMIR")
# sex : men, women
# c : case
# nc: non-case
# ghq : general health questionnaire score
5. The ozone data (analyzed in Section 2.4.4 of Lee, Nelder, and Pawitan
(2017)) contains nine meteorological variables. Find a suitable distribu-
tion for ozone concentration as response and a suitable linear predictor
by comparing model checking plots and AIC for different GLMs.
data(ozone, package="mdhglm")
# y : ozone concentration
# x1 - x9 : nine meteorological variables
CHAPTER 3
ii) h-likelihood inference of random effects takes into account the un-
certainty in estimating the fixed effects, whereas empirical Bayes (EB)
estimation of random effects assumes known values of the fixed effects,
iv) all necessary inferential tools can be derived from the h-likelihood,
and
The use of model-checking plots and model selection for HGLMs is
introduced using the epilepsy seizure data, and thereafter the theoretical
details are presented in the following sections.