The Poisson Regression Model
The Poisson regression model aims at modeling a count variable Y , counting the
number of times that a certain event occurs during a given time period. We observe
a sample Y1 , . . . , Yn . Here, Yi can stand for the number of car accidents that person i
has had during the last 5 years; the number of children of family i; the number of strikes
in company i over the last 3 years; the number of patents filed by firm i during
the last year (as a measure of innovation); and so on. The Poisson regression model
explains this count variable Yi using explicative variables xi , for 1 ≤ i ≤ n. This
p-dimensional variable xi contains the characteristics of the i-th observation.
We recall that a Poisson variable Y with parameter λ > 0 satisfies
$$P(Y = k) = \frac{\exp(-\lambda)\,\lambda^k}{k!}, \qquad (1)$$
for k = 0, 1, 2, . . . .
The Poisson distribution is a discrete distribution, and we see the shape of its distribution
in Figure 1, for several values of λ. In Figure 1, the distribution is visualized by plotting
P (Y = k) versus k. For low values of λ, the distribution is highly skewed. For large
values of λ, the distribution of Y looks more normal. In the examples given above,
Yi counts a rather rare event, so that the value of λ will be rather small. For example,
we have high probabilities of having no or one car accident, but the probabilities of
having several car accidents decay exponentially fast. The Poisson distribution is the
simplest distribution for modeling count data, but it is not the only one.
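As a quick illustration, the probability mass function in (1) can be evaluated directly; a minimal Python sketch (the function name is ours):

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    # P(Y = k) for a Poisson variable with parameter lam, as in (1)
    return exp(-lam) * lam ** k / factorial(k)

# For small lambda the mass sits near zero (highly skewed); for larger
# lambda the distribution spreads out and looks more normal.
for lam in (0.5, 1.0, 3.0, 10.0):
    print(lam, [round(poisson_pmf(k, lam), 3) for k in range(6)])
```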
The use of the exponential function in (3) ensures that the right-hand side of the above
equation is always positive, as is the expected value of the count variable Yi on the
left-hand side of the above equation. The choice of this exponential “link” function
is mainly made for reasons of simplicity. In principle, other “link” functions returning only
positive values could be used, but then we no longer speak of a Poisson regression
model.
[Figure 1: The Poisson distribution, P (Y = k) plotted versus k, for λ = 0.5, 1, 3, and 10.]
The marginal effect of the first explicative variable on the expected value of Yi , keeping
the other variables constant, is given by
$$\frac{\partial E[Y_i \mid x_i]}{\partial x_{i1}} = \beta_1 \exp(\beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3).$$
We see that β1 has the same sign as this marginal effect, but the numerical value of
the effect depends on the value of xi . We could summarize the marginal effects by
replacing xi1 and xi2 in the above equation by the average values of the explicative variables
over the whole sample. It is also possible to interpret β1 as a semi-elasticity:
$$\frac{\partial \log E[Y_i \mid x_i]}{\partial x_{i1}} = \beta_1.$$
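These two interpretations can be checked numerically; a small Python sketch with made-up coefficient values (β1, β2, β3 below are illustrative, not estimated from any data):

```python
from math import exp, log

# Illustrative coefficients (not estimated from any data)
b1, b2, b3 = 0.8, -0.3, 0.1

def cond_mean(x1, x2):
    # E[Y | x] = exp(b1*x1 + b2*x2 + b3)
    return exp(b1 * x1 + b2 * x2 + b3)

x1, x2 = 1.0, 2.0
# Marginal effect at (x1, x2): b1 times the conditional mean, so it varies with x
marginal_effect = b1 * cond_mean(x1, x2)

# Semi-elasticity: numerical derivative of log E[Y | x] with respect to x1
h = 1e-6
semi_elasticity = (log(cond_mean(x1 + h, x2)) - log(cond_mean(x1, x2))) / h
print(marginal_effect, semi_elasticity)  # semi_elasticity equals b1, whatever x is
```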
It is instructive to compute the first-order condition that the ML estimator needs
to fulfill. Differentiating (5) yields
$$\sum_{i=1}^{n} (y_i - \hat{y}_i)\, x_i = 0,$$
with $\hat{y}_i = \exp(\hat{\beta}_{ML}^t x_i)$ the fitted value of yi . The predicted/fitted value has, as usual,
been taken as the estimated value of E[Yi | xi ]. This first-order condition tells us that
the vector of residuals is orthogonal to the vectors of explicative variables.
The advantage of the Maximum Likelihood framework is that a formula for cov(β̂M L )
is readily available:
$$\widehat{\mathrm{cov}}(\hat{\beta}_{ML}) = \left( \sum_{i=1}^{n} x_i x_i^t\, \hat{y}_i \right)^{-1}.$$
Hypothesis tests can now be carried out using Wald tests, Lagrange Multiplier tests, or
Likelihood Ratio tests.
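The first-order condition and the covariance formula can be verified numerically on simulated data; a sketch using Newton-Raphson, assuming numpy is available (data-generating values and variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data set (illustrative): a constant and one regressor
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 0.3])
y = rng.poisson(np.exp(X @ beta_true))

# Newton-Raphson on the Poisson log-likelihood:
# score = X'(y - yhat), negative Hessian = X' diag(yhat) X
beta = np.zeros(2)
for _ in range(25):
    yhat = np.exp(X @ beta)
    beta = beta + np.linalg.solve(X.T @ (X * yhat[:, None]), X.T @ (y - yhat))

yhat = np.exp(X @ beta)
print("first-order condition:", X.T @ (y - yhat))  # ~ 0: residuals orthogonal to X
cov = np.linalg.inv(X.T @ (X * yhat[:, None]))     # covariance formula above
print("standard errors:", np.sqrt(np.diag(cov)))
```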
implying that the conditional mean function equals the conditional variance function.
This is very restrictive. If E[Yi |xi ] < Var[Yi |xi ], respectively E[Yi |xi ] > Var[Yi |xi ], then
we speak of overdispersion, respectively underdispersion. The Poisson model does
not allow for over- or underdispersion. A richer model is obtained by using the negative
binomial distribution instead of the Poisson distribution. Instead of (4), we then use
$$P(Y_i = y_i \mid \beta, x_i) = \frac{\Gamma(\theta + y_i)}{\Gamma(y_i + 1)\,\Gamma(\theta)} \left( \frac{\lambda_i}{\lambda_i + \theta} \right)^{y_i} \left( 1 - \frac{\lambda_i}{\lambda_i + \theta} \right)^{\theta}.$$
This negative binomial distribution can be shown to have conditional mean λi and
conditional variance λi (1 + η 2 λi ), with η 2 := 1/θ. Note that the parameter η 2 is not
allowed to vary over the observations. As before, the conditional mean function is
modeled as
$$E[Y_i \mid x_i] = \lambda_i = \exp(\beta^t x_i).$$
The conditional variance function is then given by
$$\mathrm{Var}[Y_i \mid x_i] = \lambda_i (1 + \eta^2 \lambda_i).$$
Using maximum likelihood, we can then estimate the regression parameter β and
also the extra parameter η. The parameter η measures the degree of over- (or under-)
dispersion. The limit case η = 0 corresponds to the Poisson model.
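The stated conditional mean and variance can be verified numerically from the negative binomial pmf above; a short Python sketch (computed on the log scale to avoid overflow; the parameter values are arbitrary):

```python
from math import lgamma, exp, log

def nb_pmf(y, lam, theta):
    # Negative binomial pmf from the text, evaluated via log-gamma
    p = lam / (lam + theta)
    return exp(lgamma(theta + y) - lgamma(y + 1) - lgamma(theta)
               + y * log(p) + theta * log(1.0 - p))

lam, theta = 3.0, 2.0                 # so eta^2 = 1/theta = 0.5
probs = [nb_pmf(y, lam, theta) for y in range(200)]
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(mean, var)  # mean ~ lam, var ~ lam * (1 + lam / theta) > mean: overdispersion
```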
5 Homework
We are interested in the number of accidents per service month for a sample of ships.
The data can be found in the file “ships.wmf”. The endogenous variable is called ACC.
The explicative variables are:
• CONSTRUCTION YEAR: the ships are constructed in one of four periods, lead-
ing to the dummy variables T6064, T6569, T7074, and T7579.
• SERVICE: a measure for the amount of service that the ship has already carried
out.
Questions:
1. Make a histogram of the variable ACC. Comment on its form. Is this the
histogram of the conditional or unconditional distribution of ACC?
2. Estimate the Poisson regression model, including all explicative variables and a
constant term. (Use estimation method: COUNT – Integer Count Data.)
4. Perform a Wald test to test for the joint significance of the construction year
dummy variables.
7. What do we learn from the value of “Probability(LR stat)”? What is the corre-
sponding null hypothesis?
8. Estimate now a Negative Binomial Model. EViews reports the log(η 2 ) as the
mixture parameter in the estimation output. (a) Compare the estimates of β
given by the two models. (b) Compare the pseudo R2 values of the two models.
9. Estimate now the Poisson model with only a constant term, so without explicative
variables (empty model). Derive mathematically a formula for this estimate of
the constant term (in the empty model), using the first order condition of the
ML-estimator.