
The Poisson Regression Model

The Poisson regression model aims at modeling a count variable Y, counting the
number of times that a certain event occurs during a given time period. We observe
a sample Y1, ..., Yn. Here, Yi can stand for the number of car accidents that person i
had during the last 5 years; the number of children of family i; the number of strikes
in company i over the last 3 years; or the number of patents filed by firm i during
the last year (as a measure of innovation). The Poisson regression model explains
this count variable Yi using explanatory variables xi, for 1 ≤ i ≤ n. This
p-dimensional vector xi contains the characteristics of the i-th observation.

1 The Poisson distribution


By definition, Y follows a Poisson distribution with parameter λ if and only if

    P(Y = k) = exp(−λ) λ^k / k!,    (1)

for k = 0, 1, 2, .... We recall that for a Poisson variable:

    E[Y] = λ  and  Var[Y] = λ.    (2)

The Poisson distribution is a discrete distribution; Figure 1 shows its shape for
several values of λ, visualized by plotting P(Y = k) versus k. For low values of λ,
the distribution is highly skewed; for large values of λ, the distribution of Y looks
more normal. In the examples given above, Yi counts rather rare events, so the value
of λ will be rather small. For example, the probabilities of having no or one car
accident are high, but the probabilities of having several car accidents decay
exponentially fast. The Poisson distribution is the simplest distribution for
modeling count data, but it is not the only one.

2 The Poisson regression model


As in a linear regression model, we model the conditional mean function using a
linear combination β^t xi of the explanatory variables:

    E[Yi | xi] = exp(β^t xi).    (3)

The use of the exponential function in (3) ensures that the right-hand side of the
equation is always positive, as is the expected value of the count variable Yi on
the left-hand side. The choice of this exponential "link" function is mainly for
reasons of simplicity. In principle, other "link" functions returning only positive
values could be used, but then we no longer speak of a Poisson regression model.
[Figure 1: The Poisson distribution, P(Y = k) versus k, for λ = 0.5, 1, 3, and 10.]


Moreover, to be able to use the maximum likelihood framework, we specify a
distribution for Yi, given the explanatory variables xi. We assume that every Yi,
conditional on xi, follows a Poisson distribution with parameter λi. Equations (2) and (3) give

    E[Yi | xi] = λi = exp(β^t xi).

The aim is then to estimate β, the unknown parameter in the model.

Note that an estimate of β induces an estimate of the whole conditional distribution
of Yi given xi. This allows us to estimate quantities like P(Yi = 0 | xi),
P(Yi > 5 | xi), and so on. We can therefore answer questions like "What is the
probability that somebody will have no car accident during a 5-year period, given
the person's characteristics xi?" or "What is the probability that a family, given
its characteristics xi, has more than 5 children?"

Interpretation of the parameters:

Knowledge of β tells us the influence of an explanatory variable on the expected
value of Yi. Suppose for example that xi = (xi1, xi2, 1)^t. Then the Poisson
regression model gives

    E[Yi | xi] = exp(β1 xi1 + β2 xi2 + β3).

The marginal effect of the first explanatory variable on the expected value of Yi,
keeping the other variables constant, is given by

    ∂E[Yi | xi] / ∂xi1 = β1 exp(β1 xi1 + β2 xi2 + β3).

We see that β1 has the same sign as this marginal effect, but the numerical value of
the effect depends on the value of xi. We could summarize the marginal effects by
replacing xi1 and xi2 in the above equation by the sample averages of the explanatory
variables. It is also possible to interpret β1 as a semi-elasticity:

    ∂ log E[Yi | xi] / ∂xi1 = β1.

3 The Maximum Likelihood estimator


We observe data {(xi, yi) | 1 ≤ i ≤ n}. The number yi is a realization of the random
variable Yi. Using independence, the total log-likelihood is given by

    Log L(y1, ..., yn | β, x1, ..., xn) = Σ_{i=1}^n log P(Yi = yi | β, xi),

with, according to (1),

    P(Yi = yi | β, xi) = exp(−λi) λi^{yi} / yi!    (4)

and λi = exp(β^t xi). Write now Log L(β) as shorthand notation for the total
log-likelihood. Then it follows that

    Log L(β) = Σ_{i=1}^n { −exp(β^t xi) + yi (β^t xi) − log(yi!) }.    (5)

The maximum likelihood (ML) estimator is then of course defined as

    β̂_ML = argmax_β Log L(β).

It is instructive to compute the first-order condition that the ML estimator needs
to fulfill. Differentiating (5) yields

    Σ_{i=1}^n (yi − ŷi) xi = 0,

with ŷi = exp(β̂_ML^t xi) the fitted value of yi. The predicted/fitted value has as
usual been taken as the estimated value of E[Yi | xi]. This first-order condition
tells us that the vector of residuals is orthogonal to the vectors of explanatory
variables.

The advantage of the maximum likelihood framework is that a formula for cov(β̂_ML)
is readily available:

    cov(β̂_ML) = ( Σ_{i=1}^n xi xi^t ŷi )^{−1}.

Hypothesis tests can now be carried out using Wald tests, Lagrange multiplier tests,
or likelihood ratio tests.
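The log-likelihood (5) is concave in β, so a generic numerical optimizer recovers β̂_ML. A minimal sketch using NumPy and SciPy on simulated data (so the true β is known up to sampling noise); the score function supplied to the optimizer is exactly the first-order condition above:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(0)
n, p = 500, 2
X = np.column_stack([rng.normal(size=n), np.ones(n)])  # one regressor plus a constant
beta_true = np.array([0.5, 0.2])
y = rng.poisson(np.exp(X @ beta_true))

def neg_loglik(beta):
    # Negative of (5); gammaln(y + 1) = log(y!)
    eta = X @ beta
    return -np.sum(-np.exp(eta) + y * eta - gammaln(y + 1))

def neg_score(beta):
    # Minus the score vector: the first-order condition sets X'(y - exp(X beta)) = 0
    return -(X.T @ (y - np.exp(X @ beta)))

res = minimize(neg_loglik, np.zeros(p), jac=neg_score, method="BFGS")
beta_hat = res.x

# At the optimum the residuals are orthogonal to the explanatory variables
score = X.T @ (y - np.exp(X @ beta_hat))
print(beta_hat, score)
```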

4 Overdispersion and the Negative binomial model


If we believe the Poisson regression model, then we have

    E[Yi | xi] = Var[Yi | xi],

implying that the conditional mean function equals the conditional variance function.
This is very restrictive. If E[Yi | xi] < Var[Yi | xi], respectively
E[Yi | xi] > Var[Yi | xi], then we speak of overdispersion, respectively
underdispersion. The Poisson model does not allow for over- or underdispersion. A
richer model is obtained by using the negative binomial distribution instead of the
Poisson distribution. Instead of (4), we then use

    P(Yi = yi | β, xi) = [Γ(θ + yi) / (Γ(yi + 1) Γ(θ))] (λi / (λi + θ))^{yi} (1 − λi / (λi + θ))^θ.

This negative binomial distribution can be shown to have conditional mean λi and
conditional variance λi (1 + η² λi), with η² := 1/θ. Note that the parameter η² is
not allowed to vary over the observations. As before, the conditional mean function
is modeled as

    E[Yi | xi] = λi = exp(β^t xi).

The conditional variance function is then given by

    Var[Yi | xi] = exp(β^t xi)(1 + η² exp(β^t xi)).

Using maximum likelihood, we can then estimate the regression parameter β, and
also the extra parameter η. The parameter η measures the degree of over- (or
under-)dispersion. The limit case η = 0 corresponds to the Poisson model.

Appendix: The Gamma function


The Gamma function is defined as

    Γ(x) = ∫₀^∞ s^{x−1} exp(−s) ds

for every x > 0. Its most important properties are

1. Γ(k + 1) = k! for every k = 0, 1, 2, 3, ...

2. Γ(x + 1) = xΓ(x) for every x > 0.

3. Γ(0.5) = √π

The Gamma function can be seen as an extension of the factorial function
k → k! = k(k − 1)(k − 2) · · · 1 to all positive real numbers. The Gamma function
increases to infinity faster than any polynomial or even exponential function.
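These three properties can be verified with the Gamma function from Python's standard library:

```python
from math import gamma, factorial, sqrt, pi

# Property 1: Gamma(k + 1) = k!
for k in range(10):
    assert abs(gamma(k + 1) - factorial(k)) <= 1e-12 * max(1.0, factorial(k))

# Property 2: Gamma(x + 1) = x * Gamma(x), at an arbitrary point
x = 3.7
assert abs(gamma(x + 1) - x * gamma(x)) < 1e-9 * gamma(x + 1)

# Property 3: Gamma(0.5) = sqrt(pi)
assert abs(gamma(0.5) - sqrt(pi)) < 1e-12

print("all Gamma properties verified")
```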

5 Homework
We are interested in the number of accidents per service month for a sample of ships.
The data can be found in the file "ships.wmf". The endogenous variable is called ACC.
The explanatory variables are:

• TYPE: there are 5 ship types, labeled as A-B-C-D-E or 1-2-3-4-5. TYPE is a
categorical variable, and 5 dummy variables can be created: TA, TB, TC, TD, TE.

• CONSTRUCTION YEAR: the ships are constructed in one of four periods, leading
to the dummy variables T6064, T6569, T7074, and T7579.

• SERVICE: a measure for the amount of service that the ship has already carried
out.

Questions:
1. Make a histogram of the variable ACC. Comment on its form. Is this the
histogram of the conditional or the unconditional distribution of ACC?

2. Estimate the Poisson regression model, including all explicative variables and a
constant term. (Use estimation method: COUNT- integer counting data).

3. Comment on the coefficient for the variable SERVICE. Is it significant?

4. Perform a Wald test to test for the joint significance of the construction year
dummy variables.

5. Given a ship of category A, constructed in the period 65-69, with SERVICE=1000,
predict the number of accidents per service month. Also estimate (a) the probability
that no accident will occur for this ship, and (b) the probability that at most one
accident will occur.

6. The computer output mentions: "Convergence achieved after 9 iterations". What
does this mean?

7. What do we learn from the value of “Probability(LR stat)”? What is the corre-
sponding null hypothesis?

8. Estimate now a Negative Binomial Model. EViews reports the log(η 2 ) as the
mixture parameter in the estimation output. (a) Compare the estimates of β
given by the two models. (b) Compare the pseudo R2 values of the two models.

9. Estimate now the Poisson model with only a constant term, so without explicative
variables (empty model). Derive mathematically a formula for this estimate of
the constant term (in the empty model), using the first order condition of the
ML-estimator.
