Lecture 27 - Poisson Regression
or equivalently,
\[
\lambda_i = e^{\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}}.
\]
Together with the distributional assumption Yi ∼ Poisson(λi ), this is called the Poisson
log-linear model, or the Poisson regression model. It is a special case of what is known in
neuroscience as the linear-nonlinear Poisson cascade model.
More generally, the Poisson log-linear model is a model for n responses Y1 , . . . , Yn that
take integer count values. Each Yi is modeled as an independent Poisson(λi ) random variable,
where log λi is a linear combination of the covariates corresponding to the ith observation.
As in the cases of linear and logistic regression, we treat the covariates as fixed constants,
and the model parameters to be inferred are the regression coefficients β = (β0 , . . . , βp ).
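To make the model concrete, the following is a minimal simulation sketch in Python with NumPy (not part of the original notes); the sample size, coefficient values, and variable names are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 200
beta0, beta1 = 0.5, 1.2              # illustrative regression coefficients
x = rng.uniform(-1.0, 1.0, size=n)   # a single fixed covariate

lam = np.exp(beta0 + beta1 * x)      # lambda_i = e^{beta_0 + beta_1 x_i}
Y = rng.poisson(lam)                 # independent Y_i ~ Poisson(lambda_i)
```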
Since Y1 , . . . , Yn are independent Poisson random variables, the likelihood function is
given by
\[
\mathrm{lik}(\beta_0, \ldots, \beta_p) = \prod_{i=1}^n \frac{\lambda_i^{Y_i} e^{-\lambda_i}}{Y_i!},
\]
where λi is defined in terms of β0 , . . . , βp and the covariates xi1 , . . . , xip via equation (27.1).
Setting xi0 ≡ 1 for all i, the log-likelihood is then
\[
l(\beta_0, \ldots, \beta_p) = \sum_{i=1}^n \Bigl( Y_i \log \lambda_i - \lambda_i - \log Y_i! \Bigr)
= \sum_{i=1}^n \Biggl( Y_i \sum_{j=0}^p \beta_j x_{ij} - e^{\sum_{j=0}^p \beta_j x_{ij}} - \log Y_i! \Biggr)
\]
and the MLEs are the solutions to the system of score equations, for m = 0, . . . , p,
\[
0 = \frac{\partial l}{\partial \beta_m} = \sum_{i=1}^n x_{im} \Bigl( Y_i - e^{\sum_{j=0}^p \beta_j x_{ij}} \Bigr).
\]
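Since the score equations have no closed-form solution, the MLE is computed numerically. Below is a minimal sketch in Python with NumPy of Newton's method applied to these equations (illustrative code, not from the notes; the function names, zero initialization, and fixed iteration count are arbitrary choices, and a real implementation would monitor convergence).

```python
import numpy as np
from scipy.special import gammaln   # gammaln(Y + 1) = log(Y!)

def poisson_loglik(beta, X, Y):
    """Log-likelihood l(beta) for the Poisson log-linear model."""
    eta = X @ beta                                  # eta_i = sum_j beta_j x_ij
    return np.sum(Y * eta - np.exp(eta) - gammaln(Y + 1))

def poisson_mle(X, Y, n_iter=25):
    """Newton-Raphson for the score equations; X should include a column of ones."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        lam = np.exp(X @ beta)                      # lambda_i = e^{sum_j beta_j x_ij}
        score = X.T @ (Y - lam)                     # partial l / partial beta_m
        fisher = X.T @ (lam[:, None] * X)           # X^T W(beta) X = -Hessian
        beta = beta + np.linalg.solve(fisher, score)  # Newton update
    return beta
```

For instance, with the simulated data above one could call `poisson_mle(np.column_stack([np.ones(n), x]), Y)`.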
Writing Xj = (x1j , . . . , xnj ) as the jth column of the covariate matrix X and defining the diagonal matrix
\[
W = W(\beta) := \mathrm{diag}\Bigl( e^{\sum_{j=0}^p \beta_j x_{1j}}, \ldots, e^{\sum_{j=0}^p \beta_j x_{nj}} \Bigr),
\]
the second-order partial derivatives may be written as $\frac{\partial^2 l}{\partial \beta_m \,\partial \beta_l} = -X_m^T W X_l$, so $\nabla^2 l(\beta) = -X^T W X$ and $I_Y(\beta) = X^T W X$. For large n, if the Poisson log-linear model is correct, then the MLE vector β̂ is approximately distributed as $N(\beta, (X^T W X)^{-1})$. We may then estimate the standard error of β̂j by
\[
\widehat{\mathrm{se}}_j = \sqrt{\bigl( (X^T \hat{W} X)^{-1} \bigr)_{jj}}\,,
\]
where Ŵ = W (β̂) is the plugin estimate for W . These formulas are the same as for the case
of logistic regression in Lecture 26, except with a different form of the diagonal matrix W .
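Continuing the sketch above, the standard error estimates follow directly from the plug-in matrix X^T Ŵ X (again illustrative code; `poisson_mle` and the data arrays are the hypothetical names used earlier).

```python
import numpy as np

def poisson_se(X, beta_hat):
    """Plug-in standard errors sqrt(((X^T W_hat X)^{-1})_{jj})."""
    lam_hat = np.exp(X @ beta_hat)                  # diagonal of W_hat = W(beta_hat)
    fisher_hat = X.T @ (lam_hat[:, None] * X)       # X^T W_hat X
    cov_hat = np.linalg.inv(fisher_hat)             # approximate covariance of beta_hat
    return np.sqrt(np.diag(cov_hat))                # se_hat_j for j = 0, ..., p
```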
The modeling assumption of a Poisson distribution for Yi is rather restrictive, as it implies
that the variance of Yi must be equal to its mean. This is rarely true in practice, and it is
frequently the case that the observed variance of Yi is larger than its mean—this problem is
known as overdispersion. Nonetheless, the Poisson regression model is oftentimes used in
overdispersed settings: As long as Y1 , . . . , Yn are independent and
\[
E[Y_i] = e^{\sum_{j=0}^p \beta_j x_{ij}}
\]
for each i (so the model for the means of the Yi ’s is correct), it may be shown that the MLE β̂ in the Poisson regression model is still a consistent estimate of β, even if the distribution of Yi is not Poisson and the variance of Yi exceeds its mean. The above standard error estimate $\widehat{\mathrm{se}}_j$ and the associated confidence interval for βj , though, would not be correct in the overdispersed
setting. One may use instead the robust sandwich estimate of the covariance of β̂, given by
\[
(X^T \hat{W} X)^{-1} (X^T \tilde{W} X) (X^T \hat{W} X)^{-1}
\]
where
\[
\tilde{W} = \mathrm{diag}\bigl( (Y_1 - \hat\lambda_1)^2, \ldots, (Y_n - \hat\lambda_n)^2 \bigr)
\]
and $\hat\lambda_i = e^{\sum_{j=0}^p \hat\beta_j x_{ij}}$ is the fitted value of λ for the ith observation. Alternatively, one may
use the pairs bootstrap procedure as described in Lecture 26.
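As a sketch of how the sandwich estimate might be computed (continuing the earlier illustrative NumPy code; the function name is hypothetical):

```python
import numpy as np

def poisson_sandwich_cov(X, Y, beta_hat):
    """Sandwich covariance (X^T W_hat X)^{-1} (X^T W_tilde X) (X^T W_hat X)^{-1}."""
    lam_hat = np.exp(X @ beta_hat)                            # fitted values lambda_hat_i
    bread = np.linalg.inv(X.T @ (lam_hat[:, None] * X))       # (X^T W_hat X)^{-1}
    meat = X.T @ (((Y - lam_hat) ** 2)[:, None] * X)          # X^T W_tilde X
    return bread @ meat @ bread

# Robust standard errors are the square roots of the diagonal entries:
# np.sqrt(np.diag(poisson_sandwich_cov(X, Y, beta_hat)))
```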
Remark 27.2. The linear model, logistic regression model, and Poisson regression model
are all examples of the generalized linear model (GLM). In a generalized linear model,
Y1 , . . . , Yn are modeled as independent observations with distributions Yi ∼ f (y|θi ) for some
one-parameter family f (y|θ). The parameter θi is modeled as
g(θi ) = β0 + β1 xi1 + . . . + βp xip
for some one-to-one transformation g : R → R called the link function, where xi1 , . . . , xip
are covariates corresponding to Yi. In the linear model considered in Lecture 25, the parameter was θ ≡ µ where f(y|µ) was the PDF of the $N(\mu, \sigma_0^2)$ distribution (for a known variance $\sigma_0^2$), and g(µ) = µ. In logistic regression, the parameter was θ ≡ p where f(y|p) was the PMF of the Bernoulli(p) distribution, and $g(p) = \log \frac{p}{1-p}$. In Poisson regression,
the parameter was θ ≡ λ where f (y|λ) was the PMF of the Poisson(λ) distribution, and
g(λ) = log λ.
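In practice these three GLMs can be fit with standard software; the following sketch uses the statsmodels package (an assumption of this example, not something referenced in the notes), with placeholder data arrays.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 100
X = sm.add_constant(rng.normal(size=(n, 2)))    # design matrix with an intercept column

y_gauss = rng.normal(size=n)                    # continuous response (linear model)
y_binary = rng.integers(0, 2, size=n)           # 0/1 response (logistic regression)
y_count = rng.poisson(2.0, size=n)              # count response (Poisson regression)

gaussian_fit = sm.GLM(y_gauss, X, family=sm.families.Gaussian()).fit()   # identity link
logistic_fit = sm.GLM(y_binary, X, family=sm.families.Binomial()).fit()  # logit link
poisson_fit = sm.GLM(y_count, X, family=sm.families.Poisson()).fit()     # log link

print(poisson_fit.params)   # estimated coefficients beta_hat
print(poisson_fit.bse)      # their standard errors
```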
The choice of the link function g is an important modeling decision, as it determines which
transform of the model parameter should be modeled as linear in the observed covariates.
In each of the three examples discussed, we used what is called the natural link, which
is motivated by considering a change-of-variable for the parameter, θ ↦ η(θ), so that the
PDF/PMF f (y|η) in terms of the new parameter η has the form
\[
f(y \mid \eta) = e^{\eta y - A(\eta)} h(y)
\]
for some functions A and h. For example, the Bernoulli PMF is
\[
f(y) = p^y (1-p)^{1-y} = \left( \frac{p}{1-p} \right)^{y} (1-p) = e^{\left(\log \frac{p}{1-p}\right) y + \log(1-p)},
\]
so we may set $\eta = \log \frac{p}{1-p}$, $A(\eta) = -\log(1-p) = \log(1 + e^{\eta})$, and $h(y) = 1$. This is called
the exponential family form of the PDF/PMF, and η is called the natural parameter.
In each example, the natural link simply sets g(θ) = η(θ) (or equivalently, g(θ) = cη(θ) for
a constant c).
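For comparison (a short check not written out above), the Poisson(λ) PMF used in this lecture can be put in the same exponential family form:
\[
f(y) = \frac{\lambda^y e^{-\lambda}}{y!} = e^{(\log \lambda)\, y - \lambda} \cdot \frac{1}{y!},
\]
so $\eta = \log \lambda$, $A(\eta) = \lambda = e^{\eta}$, and $h(y) = 1/y!$; this is why g(λ) = log λ is the natural link for Poisson regression.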
Use of the natural link leads to some nice mathematical properties for likelihood-based
inference—for instance, since η is modeled as linear in β, the second-order partial derivatives
of
log f (Y |η) = ηY − A(η) + log h(Y )
with respect to β do not depend on Y , so the Fisher information is always given by −∇2 l(β)
without needing to take an expectation. (We sometimes say in this case that the “observed
and expected Fisher information matrices” are the same.) On the other hand, from the
modeling perspective, there is usually no intrinsic reason to believe that the natural link
g(θ) = η(θ) is the correct transformation of θ that is well-modeled as a linear combination
of the covariates, and other link functions are also commonly used, especially if they lead to
a better fit for the data.
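To spell out the computation behind the claim that the observed and expected Fisher information matrices coincide under the natural link: writing $\eta_i = \sum_{j=0}^p \beta_j x_{ij}$ for the natural parameter of the ith observation,
\[
\frac{\partial^2}{\partial \beta_m \,\partial \beta_l} \bigl( \eta_i Y_i - A(\eta_i) + \log h(Y_i) \bigr) = -A''(\eta_i)\, x_{im} x_{il},
\]
which involves no Yi, so −∇²l(β) is a nonrandom function of β alone and no expectation over Y is needed.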