AN INTRODUCTION TO GENERALIZED LINEAR MODELS
SECOND EDITION
Annette J. Dobson
This book contains information obtained from authentic and highly regarded sources. Reprinted material
is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable
efforts have been made to publish reliable data and information, but the author and the publisher cannot
assume responsibility for the validity of all materials or for the consequences of their use.
Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic
or mechanical, including photocopying, microfilming, and recording, or by any information storage or
retrieval system, without prior permission in writing from the publisher.
The consent of CRC Press LLC does not extend to copying for general distribution, for promotion, for
creating new works, or for resale. Specific permission must be obtained in writing from CRC Press LLC
for such copying.
Direct all inquiries to CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation, without intent to infringe.
Preface
1 Introduction
1.1 Background
1.2 Scope
1.3 Notation
1.4 Distributions related to the Normal distribution
1.5 Quadratic forms
1.6 Estimation
1.7 Exercises
2 Model Fitting
2.1 Introduction
2.2 Examples
2.3 Some principles of statistical modelling
2.4 Notation and coding for explanatory variables
2.5 Exercises
4 Estimation
4.1 Introduction
4.2 Example: Failure times for pressure vessels
4.3 Maximum likelihood estimation
4.4 Poisson regression example
4.5 Exercises
5 Inference
5.1 Introduction
5.2 Sampling distribution for score statistics
Software
References
Statistical tools for analyzing data are developing rapidly so that the 1990
edition of this book is now out of date.
The original purpose of the book was to present a unified theoretical and
conceptual framework for statistical modelling in a way that was accessible
to undergraduate students and researchers in other fields. This new edition
has been expanded to include nominal (or multinomial) and ordinal logistic
regression, survival analysis and analysis of longitudinal and clustered data.
Although these topics do not fall strictly within the definition of generalized
linear models, the underlying principles and methods are very similar and
their inclusion is consistent with the original purpose of the book.
The new edition relies on numerical methods more than the previous edition
did. Some of the calculations can be performed with a spreadsheet while others
require statistical software. There is an emphasis on graphical methods for
exploratory data analysis, visualizing numerical optimization (for example,
of the likelihood function) and plotting residuals to check the adequacy of
models.
The data sets and outline solutions of the exercises are available on the
publisher’s website:
www.crcpress.com/us/ElectronicProducts/downandup.asp?mscssid=
I am grateful to colleagues and students at the Universities of Queensland
and Newcastle, Australia, for their helpful suggestions and comments about
the material.
Annette Dobson
1.2 Scope
The statistical methods considered in this book all involve the analysis of
relationships between measurements made on groups of subjects or objects.
For example, the measurements might be the heights or weights and the ages
of boys and girls, or the yield of plants under various growing conditions.
We use the terms response, outcome or dependent variable for measurements that are free to vary in response to other variables called explanatory variables, predictor variables or independent variables - although this last term can sometimes be misleading. Responses are regarded as random variables. Explanatory variables are usually treated as though they are non-random measurements or observations; for example, they may be fixed by the experimental design.
Responses and explanatory variables are measured on one of the following
scales.
1. Nominal classifications: e.g., red, green, blue; yes, no, do not know, not applicable. In particular, for binary, dichotomous or binomial variables there are only two categories (e.g., yes, no).
1.3 Notation
Generally we follow the convention of denoting random variables by upper
case italic letters and observed values by the corresponding lower case letters.
For example, the observations $y_1, y_2, \ldots, y_n$ are regarded as realizations of the random variables $Y_1, Y_2, \ldots, Y_n$. Greek letters are used to denote parameters and the corresponding lower case roman letters are used to denote estimators and estimates; occasionally a circumflex or 'hat' is used for estimators or estimates. For example, the parameter $\beta$ is estimated by $\hat{\beta}$ or $b$. Sometimes these conventions are not strictly adhered to, either to avoid excessive notation in cases where the meaning should be apparent from the context, or when there is a strong tradition of alternative notation (e.g., $e$ or $\varepsilon$ for random error terms).
Vectors and matrices, whether random or not, are denoted by bold face lower and upper case letters, respectively. Thus, $\mathbf{y}$ represents a vector of observations
$$\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}$$
or a vector of random variables
$$\begin{bmatrix} Y_1 \\ \vdots \\ Y_n \end{bmatrix},$$
$\boldsymbol{\beta}$ denotes a vector of parameters and $\mathbf{X}$ is a matrix. The superscript $T$ is used for a matrix transpose or when a column vector is written as a row, e.g., $\mathbf{y} = [Y_1, \ldots, Y_n]^T$.
$$\bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i = \frac{y_{\cdot}}{N}.$$
The expected value and variance of a random variable $Y$ are denoted by $E(Y)$ and $\operatorname{var}(Y)$ respectively. Suppose the random variables $Y_1, \ldots, Y_n$ are independent with $E(Y_i) = \mu_i$ and $\operatorname{var}(Y_i) = \sigma_i^2$ for $i = 1, \ldots, n$. Let the random variable $W$ be a linear combination of the $Y_i$'s,
$$W = a_1 Y_1 + a_2 Y_2 + \cdots + a_n Y_n, \qquad (1.1)$$
where the $a_i$'s are constants. Then the expected value of $W$ is
$$E(W) = a_1 \mu_1 + a_2 \mu_2 + \cdots + a_n \mu_n \qquad (1.2)$$
and its variance is
$$\operatorname{var}(W) = a_1^2 \sigma_1^2 + a_2^2 \sigma_2^2 + \cdots + a_n^2 \sigma_n^2. \qquad (1.3)$$
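As a quick check on (1.2) and (1.3), here is a minimal simulation sketch (my own illustration, not from the book; it assumes NumPy is available and uses made-up values for the $a_i$, $\mu_i$ and $\sigma_i^2$).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical constants for the linear combination W = a1*Y1 + ... + an*Yn
a = np.array([1.0, -2.0, 0.5])
mu = np.array([0.0, 1.0, 3.0])        # E(Y_i)
sigma2 = np.array([1.0, 4.0, 2.0])    # var(Y_i)

# Theoretical moments from equations (1.2) and (1.3)
E_W = np.sum(a * mu)
var_W = np.sum(a**2 * sigma2)

# Monte Carlo check: draw independent Y_i (Normal here, though (1.2) and (1.3)
# hold for any distributions with these means and variances) and form W
Y = rng.normal(mu, np.sqrt(sigma2), size=(100_000, 3))
W = Y @ a

print(E_W, W.mean())      # should agree closely
print(var_W, W.var())     # should agree closely
```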
$$f(y; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{1}{2}\frac{(y-\mu)^2}{\sigma^2}\right].$$
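The density can be checked numerically; the sketch below (an illustration of mine, assuming SciPy is installed) evaluates the formula directly and compares it with scipy.stats.norm, which is parameterized by the standard deviation rather than the variance.

```python
import numpy as np
from scipy import stats

def normal_pdf(y, mu, sigma2):
    """Density of N(mu, sigma2) evaluated from the formula above."""
    return np.exp(-0.5 * (y - mu)**2 / sigma2) / np.sqrt(2 * np.pi * sigma2)

y = np.linspace(-3.0, 3.0, 7)
mu, sigma2 = 1.0, 2.0

# The two rows of output should be identical (up to floating-point error)
print(normal_pdf(y, mu, sigma2))
print(stats.norm.pdf(y, loc=mu, scale=np.sqrt(sigma2)))
```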
where $\rho_{ij}$ is the correlation coefficient for $Y_i$ and $Y_j$. Then the joint distribution of the $Y_i$'s is the multivariate Normal distribution with mean vector $\boldsymbol{\mu} = [\mu_1, \ldots, \mu_n]^T$ and variance-covariance matrix $\mathbf{V}$ with diagonal elements $\sigma_i^2$ and non-diagonal elements $\rho_{ij}\sigma_i\sigma_j$ for $i \neq j$. We write this as $\mathbf{y} \sim \mathrm{N}(\boldsymbol{\mu}, \mathbf{V})$, where $\mathbf{y} = [Y_1, \ldots, Y_n]^T$.
4. Suppose the random variables $Y_1, \ldots, Y_n$ are independent and Normally distributed with the distributions $Y_i \sim \mathrm{N}(\mu_i, \sigma_i^2)$ for $i = 1, \ldots, n$. If
$$W = a_1 Y_1 + a_2 Y_2 + \cdots + a_n Y_n,$$
then, by (1.2) and (1.3), $W$ is also Normally distributed, with $W \sim \mathrm{N}\left(\sum_{i=1}^{n} a_i \mu_i, \sum_{i=1}^{n} a_i^2 \sigma_i^2\right)$.
In matrix notation, if $\mathbf{z} = [Z_1, \ldots, Z_n]^T$ then $\mathbf{z}^T\mathbf{z} = \sum_{i=1}^{n} Z_i^2$, so that $X^2 = \mathbf{z}^T\mathbf{z} \sim \chi^2(n)$.
2. If $X^2$ has the distribution $\chi^2(n)$, then its expected value is $E(X^2) = n$ and its variance is $\operatorname{var}(X^2) = 2n$.
3. If $Y_1, \ldots, Y_n$ are independent Normally distributed random variables, each with the distribution $Y_i \sim \mathrm{N}(\mu_i, \sigma_i^2)$, then
$$X^2 = \sum_{i=1}^{n}\left(\frac{Y_i - \mu_i}{\sigma_i}\right)^2 \sim \chi^2(n) \qquad (1.4)$$
because each of the variables $Z_i = (Y_i - \mu_i)/\sigma_i$ has the standard Normal distribution $\mathrm{N}(0, 1)$.
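Results 2 and 3 are easy to verify by simulation. The sketch below (my own, with arbitrary made-up $\mu_i$ and $\sigma_i$; it assumes NumPy) forms $X^2$ as in (1.4) and checks that its sample mean and variance are close to $n$ and $2n$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical means and standard deviations for independent Y_i ~ N(mu_i, sigma_i^2)
mu = np.array([0.0, 2.0, -1.0, 5.0])
sigma = np.array([1.0, 0.5, 2.0, 3.0])
n = len(mu)

# X^2 = sum of squared standardized variables, as in (1.4)
Y = rng.normal(mu, sigma, size=(200_000, n))
X2 = (((Y - mu) / sigma) ** 2).sum(axis=1)

# Sample moments should be close to the chi-squared values n and 2n
print(X2.mean(), n)       # ~ n
print(X2.var(), 2 * n)    # ~ 2n
```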
4. Let $Z_1, \ldots, Z_n$ be independent random variables each with the distribution $\mathrm{N}(0, 1)$ and let $Y_i = Z_i + \mu_i$, where at least one of the $\mu_i$'s is non-zero. Then the distribution of
$$\sum Y_i^2 = \sum (Z_i + \mu_i)^2 = \sum Z_i^2 + 2\sum Z_i\mu_i + \sum \mu_i^2$$
is called the non-central chi-squared distribution $\chi^2(n, \lambda)$, with $n$ degrees of freedom and non-centrality parameter $\lambda = \sum \mu_i^2$.
6. More generally, if $\mathbf{y} \sim \mathrm{N}(\boldsymbol{\mu}, \mathbf{V})$ then the random variable $\mathbf{y}^T\mathbf{V}^{-1}\mathbf{y}$ has the non-central chi-squared distribution $\chi^2(n, \lambda)$ where $\lambda = \boldsymbol{\mu}^T\mathbf{V}^{-1}\boldsymbol{\mu}$.
7. If $X_1^2, \ldots, X_m^2$ are $m$ independent random variables with the chi-squared distributions $X_i^2 \sim \chi^2(n_i, \lambda_i)$, which may or may not be central, then their sum also has a chi-squared distribution with $\sum n_i$ degrees of freedom and non-centrality parameter $\sum \lambda_i$, i.e.,
$$\sum_{i=1}^{m} X_i^2 \sim \chi^2\left(\sum_{i=1}^{m} n_i, \sum_{i=1}^{m} \lambda_i\right).$$
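Result 7 can be illustrated with SciPy's non-central chi-squared distribution; the sketch below (an illustration of mine, with arbitrary degrees of freedom and non-centrality parameters) compares the moments of a sum of independent non-central chi-squared variables with those of $\chi^2(\sum n_i, \sum \lambda_i)$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical degrees of freedom and non-centrality parameters
df = [2, 3, 4]
nc = [1.0, 0.5, 2.0]

# Sum of independent non-central chi-squared random variables
samples = sum(stats.ncx2.rvs(d, lam, size=100_000, random_state=rng)
              for d, lam in zip(df, nc))

# Compare with the claimed distribution chi^2(sum df, sum nc)
target = stats.ncx2(sum(df), sum(nc))
print(samples.mean(), target.mean())   # means should agree
print(samples.var(), target.var())     # variances should agree
```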
1.4.3 t-distribution
The t-distribution with n degrees of freedom is defined as the ratio of two
independent random variables. The numerator has the standard Normal dis-
tribution and the denominator is the square root of a central chi-squared
random variable divided by its degrees of freedom; that is,
$$T = \frac{Z}{(X^2/n)^{1/2}}, \qquad (1.6)$$
where $Z \sim \mathrm{N}(0, 1)$, $X^2 \sim \chi^2(n)$ and $Z$ and $X^2$ are independent. This is denoted by $T \sim t(n)$.
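The construction in (1.6) is easy to reproduce by simulation; the following sketch (mine, assuming NumPy and SciPy) builds $T$ from independent $Z$ and $X^2$ and compares its quantiles with those of $t(n)$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 5                                   # degrees of freedom
m = 200_000                             # number of simulated values

# T = Z / sqrt(X^2 / n) with Z and X^2 independent, as in (1.6)
Z = rng.standard_normal(m)
X2 = rng.chisquare(n, size=m)
T = Z / np.sqrt(X2 / n)

# Compare a few sample quantiles with the t(n) distribution
probs = [0.05, 0.25, 0.5, 0.75, 0.95]
print(np.quantile(T, probs))
print(stats.t.ppf(probs, df=n))
```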
1.4.4 F-distribution
1. The central F-distribution with $n$ and $m$ degrees of freedom is defined as the ratio of two independent central chi-squared random variables, each divided by its degrees of freedom,
$$F = \frac{X_1^2/n}{X_2^2/m},$$
where $X_1^2 \sim \chi^2(n)$ and $X_2^2 \sim \chi^2(m)$.
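The same kind of simulation check works here; the sketch below (an illustration of mine, with arbitrary degrees of freedom) forms the ratio of two independent central chi-squared variables, each divided by its degrees of freedom, and compares quantiles with scipy.stats.f.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, m = 4, 10                            # numerator and denominator degrees of freedom
size = 200_000

# F = (X1^2 / n) / (X2^2 / m) with independent central chi-squared variables
X1 = rng.chisquare(n, size=size)
X2 = rng.chisquare(m, size=size)
F = (X1 / n) / (X2 / m)

# Sample quantiles should be close to those of the F(n, m) distribution
probs = [0.5, 0.9, 0.95, 0.99]
print(np.quantile(F, probs))
print(stats.f.ppf(probs, dfn=n, dfd=m))
```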
This is Cochran’s theorem; for a proof see, for example, Hogg and Craig
(1995). A similar result holds for non-central distributions; see Chapter 3
of Rao (1973).
6. A consequence of Cochran's theorem is that the difference of two independent random variables, $X_1^2 \sim \chi^2(m)$ and $X_2^2 \sim \chi^2(k)$, also has a chi-squared distribution,
$$X^2 = X_1^2 - X_2^2 \sim \chi^2(m - k),$$
provided that $X^2 \geq 0$ and $m > k$.
1.6 Estimation
1.6.1 Maximum likelihood estimation
Let $\mathbf{y} = [Y_1, \ldots, Y_n]^T$ denote a random vector and let the joint probability density function of the $Y_i$'s be
$$f(\mathbf{y}; \boldsymbol{\theta}),$$
which depends on the vector of parameters $\boldsymbol{\theta} = [\theta_1, \ldots, \theta_p]^T$.
The likelihood function L(θ; y) is algebraically the same as the joint
probability density function f (y; θ) but the change in notation reflects a shift
of emphasis from the random variables y, with θ fixed, to the parameters θ
with y fixed. Since L is defined in terms of the random vector y, it is itself a
random variable. Let Ω denote the set of all possible values of the parameter
vector $\boldsymbol{\theta}$; $\Omega$ is called the parameter space. The maximum likelihood estimator of $\boldsymbol{\theta}$ is the value $\hat{\boldsymbol{\theta}}$ which maximizes the likelihood function, that is,
$$L(\hat{\boldsymbol{\theta}}; \mathbf{y}) \geq L(\boldsymbol{\theta}; \mathbf{y}) \quad \text{for all } \boldsymbol{\theta} \text{ in } \Omega.$$
Equivalently, $\hat{\boldsymbol{\theta}}$ is the value which maximizes the log-likelihood function $l(\boldsymbol{\theta}; \mathbf{y}) = \log L(\boldsymbol{\theta}; \mathbf{y})$, that is,
$$l(\hat{\boldsymbol{\theta}}; \mathbf{y}) \geq l(\boldsymbol{\theta}; \mathbf{y}) \quad \text{for all } \boldsymbol{\theta} \text{ in } \Omega.$$
Often it is easier to work with the log-likelihood function than with the like-
lihood function itself.
Usually the estimator $\hat{\boldsymbol{\theta}}$ is obtained by differentiating the log-likelihood function with respect to each element $\theta_j$ of $\boldsymbol{\theta}$ and solving the simultaneous equations
$$\frac{\partial l(\boldsymbol{\theta}; \mathbf{y})}{\partial \theta_j} = 0 \quad \text{for } j = 1, \ldots, p. \qquad (1.9)$$
It is necessary to check that the solutions correspond to maxima of $l(\boldsymbol{\theta}; \mathbf{y})$, for example by verifying that the matrix of second derivatives
$$\frac{\partial^2 l(\boldsymbol{\theta}; \mathbf{y})}{\partial \theta_j \partial \theta_k},$$
evaluated at $\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}$, is negative definite.
It is also necessary to check if there are any values of θ at the edges of the
parameter space Ω that give local maxima of l(θ; y). When all local maxima
have been identified, the value of θ corresponding to the largest one is the
maximum likelihood estimator. (For most of the models considered in this
book there is only one maximum and it corresponds to the solution of the
equations ∂l/∂θj = 0, j = 1, ..., p.)
An important property of maximum likelihood estimators is that if $g(\boldsymbol{\theta})$ is any function of the parameters $\boldsymbol{\theta}$, then the maximum likelihood estimator of $g(\boldsymbol{\theta})$ is $g(\hat{\boldsymbol{\theta}})$. This follows from the definition of $\hat{\boldsymbol{\theta}}$. It is sometimes called the invariance property of maximum likelihood estimators. A consequence
is that we can work with a function of the parameters that is convenient
for maximum likelihood estimation and then use the invariance property to
obtain maximum likelihood estimates for the required parameters.
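As an illustration of the invariance property (this example is mine, not the book's: a Poisson sample with made-up counts, for which the maximum likelihood estimate of the mean $\theta$ is $\bar{y}$), maximizing the likelihood directly over the reparameterization $\phi = g(\theta) = e^{-\theta} = P(Y = 0)$ gives the same answer as $g(\hat{\theta}) = e^{-\bar{y}}$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical Poisson counts (illustration only, not data from the book)
y = np.array([3, 5, 7, 4, 6, 5, 8, 2])

# Maximum likelihood estimate of the Poisson mean theta is the sample mean
theta_hat = y.mean()

# Reparameterize as phi = exp(-theta) = P(Y = 0) and maximize the
# log-likelihood (up to a constant) directly over phi
def neg_loglik_phi(phi):
    theta = -np.log(phi)
    return -(np.sum(y) * np.log(theta) - len(y) * theta)

res = minimize_scalar(neg_loglik_phi, bounds=(1e-9, 1 - 1e-9),
                      method="bounded", options={"xatol": 1e-10})

# By the invariance property the two answers coincide (up to numerical error)
print(res.x, np.exp(-theta_hat))
```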
In principle, it is not necessary to be able to find the derivatives of the
likelihood or log-likelihood functions, or to solve equation (1.9), if $\hat{\boldsymbol{\theta}}$ can be
found numerically. In practice, numerical approximations are very important
for generalized linear models.
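A minimal sketch of such a numerical approach (my own example, assuming SciPy; the Normal sample and starting values are arbitrary) is to minimize the negative log-likelihood with a general-purpose optimizer and compare the result with the analytic solution of the score equations (1.9).

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

rng = np.random.default_rng(5)
y = rng.normal(loc=10.0, scale=2.0, size=50)   # arbitrary illustrative sample

# Negative log-likelihood of a Normal(mu, sigma^2) sample; optimizing over
# (mu, log sigma) keeps the standard deviation positive
def neg_loglik(params):
    mu, log_sigma = params
    return -np.sum(stats.norm.logpdf(y, loc=mu, scale=np.exp(log_sigma)))

res = minimize(neg_loglik, x0=[0.0, 0.0], method="Nelder-Mead",
               options={"xatol": 1e-8, "fatol": 1e-8, "maxiter": 2000})
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])

# Analytic solutions of the score equations (1.9) for this model
print(mu_hat, y.mean())
print(sigma_hat, y.std())   # maximum likelihood estimate uses divisor n
```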
Other properties of maximum likelihood estimators include consistency, suf-
ficiency, asymptotic efficiency and asymptotic normality. These are discussed
in books such as Cox and Hinkley (1974) or Kalbfleisch (1985, Chapters 1 and
2).
where the weights are $w_i = (\sigma_i^2)^{-1}$. In this way, the observations which are
less reliable (that is, the Yi ’s with the larger variances) will have less influence
on the estimates.
More generally, let $\mathbf{y} = [Y_1, \ldots, Y_n]^T$ denote a random vector with mean vector $\boldsymbol{\mu} = [\mu_1, \ldots, \mu_n]^T$ and variance-covariance matrix $\mathbf{V}$. Then the weighted least squares estimator is obtained by minimizing
$$S = (\mathbf{y} - \boldsymbol{\mu})^T \mathbf{V}^{-1} (\mathbf{y} - \boldsymbol{\mu}).$$
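For the simplest case described above, estimating a common mean from observations with unequal known variances, the minimizer of $S$ reduces to the weighted mean with weights $w_i = 1/\sigma_i^2$. The sketch below (my own, with made-up numbers, assuming NumPy) computes it both ways.

```python
import numpy as np

# Illustrative observations with known, unequal variances (hypothetical numbers)
y = np.array([10.2, 9.6, 11.1, 10.4])
sigma2 = np.array([0.5, 2.0, 4.0, 1.0])

# Diagonal variance-covariance matrix V and its inverse
V_inv = np.diag(1.0 / sigma2)
ones = np.ones_like(y)

# Weighted least squares estimate of a common mean mu, i.e. the value
# minimizing S = (y - mu*1)^T V^{-1} (y - mu*1)
mu_hat = (ones @ V_inv @ y) / (ones @ V_inv @ ones)

# Equivalent form using the weights w_i = 1/sigma_i^2
w = 1.0 / sigma2
print(mu_hat, np.sum(w * y) / np.sum(w))
```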
Figure 1.1 Graph showing the location of the maximum likelihood estimate for the data in Table 1.2 on tropical cyclones.
1.7 Exercises
1.1 Let Y1 and Y2 be independent random variables with
Y1 ∼ N (1, 3) and Y2 ∼ N (2, 5). If W1 = Y1 + 2Y2 and W2 = 4Y1 − Y2 what
is the joint distribution of W1 and W2 ?
1.2 Let Y1 and Y2 be independent random variables with
Y1 ∼ N (0, 1) and Y2 ∼ N (3, 4).
k    θ(k)      l∗
1    5         50.878
2    6         51.007
3    5.5       51.242
4    5.75      51.192
5    5.625     51.235
6    5.5625    51.243
7    5.5313    51.24354
8    5.5469    51.24352
9    5.5391    51.24360
10   5.5352    51.24359
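The table records successive approximations $\theta^{(k)}$ from a numerical search for the maximum likelihood estimate. The original data (Table 1.2 on tropical cyclones) are not reproduced in this excerpt, so the sketch below is only an illustration of the general idea, using a placeholder Poisson log-likelihood with made-up counts: an interval-halving search that keeps whichever half of the current bracket contains the point where the score changes sign.

```python
import numpy as np

# Placeholder Poisson counts (made up); the book's example uses the tropical
# cyclone data of Table 1.2, which is not reproduced here.
y = np.array([6, 5, 4, 7, 5, 6, 5])

def loglik(theta):
    # Poisson log-likelihood, omitting the constant term -sum(log y_i!)
    return np.sum(y) * np.log(theta) - len(y) * theta

def score(theta, eps=1e-6):
    # Numerical derivative of the log-likelihood
    return (loglik(theta + eps) - loglik(theta - eps)) / (2 * eps)

# Interval halving: keep the half of [lo, hi] in which the score changes sign
lo, hi = 5.0, 6.0
for k in range(10):
    mid = 0.5 * (lo + hi)
    if score(mid) > 0:
        lo = mid            # maximum lies to the right of mid
    else:
        hi = mid            # maximum lies to the left of mid
    print(k + 1, mid, loglik(mid))

print("approximate MLE:", 0.5 * (lo + hi), " exact (sample mean):", y.mean())
```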