The Poisson Regression Model
The Poisson regression model aims at modeling a count variable Y , counting the
number of times that a certain event occurs during a given time period. We observe
a sample Y1 , . . . , Yn . Here, Yi can stand for the number of car accidents that person i
has had during the last 5 years; the number of children of family i; the number of strikes
in company i over the last 3 years; the number of patents filed by firm i during
the last year (as a measure of innovation); and so on. The Poisson regression model
explains this count variable Yi using explicative variables xi , for 1 ≤ i ≤ n. This
p-dimensional variable xi contains the characteristics of the i-th observation.
We recall that a Poisson variable Y with parameter λ > 0 satisfies
$$P(Y = k) = \frac{\exp(-\lambda)\,\lambda^k}{k!}, \qquad (1)$$
for k = 0, 1, 2, . . . .
The Poisson distribution is a discrete distribution, and we see the shape of its distribution
in Figure 1, for several values of λ. In Figure 1, the distribution is visualized by plotting
P (Y = k) versus k. For low values of λ, the distribution is highly skewed. For large
values of λ, the distribution of Y looks more normal. In the examples given above,
Yi counts a rather rare event, so that the value of λ will be rather small. For example,
we have high probabilities of having no or one car accident, but the probabilities of
having several car accidents decay exponentially fast. The Poisson distribution is the
simplest distribution for modeling count data, but it is not the only one.
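As a quick illustration, the probability mass function in (1) can be evaluated directly; a minimal Python sketch (the function name is ours):

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    # P(Y = k) for a Poisson variable with parameter lam, as in (1)
    return exp(-lam) * lam ** k / factorial(k)

# For small lambda the mass sits near zero (highly skewed); for larger
# lambda the distribution spreads out and looks more normal.
for lam in (0.5, 1.0, 3.0, 10.0):
    print(lam, [round(poisson_pmf(k, lam), 3) for k in range(6)])
```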
The use of the exponential function in (3) ensures that the right-hand side of the above
equation is always positive, as is the expected value of the count variable Yi on the
left-hand side of the above equation. The choice of this exponential “link” function
is mainly made for reasons of simplicity. In principle, other “link” functions returning only
positive values could be used, but then we no longer speak of a Poisson regression
model.
[Figure 1: The Poisson distribution, P (Y = k) plotted versus k, for λ = 0.5, 1, 3, and 10.]
The marginal effect of the first explicative variable on the expected value of Yi , keeping
the other variables constant, is given by
$$\frac{\partial E[Y_i \mid x_i]}{\partial x_{i1}} = \beta_1 \exp(\beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3).$$
We see that β1 has the same sign as this marginal effect, but the numerical value of
the effect depends on the value of xi . We could summarize the marginal effects by
replacing xi1 and xi2 in the above equation by the average values of the explicative variables
over the whole sample. It is also possible to interpret β1 as a semi-elasticity:
$$\frac{\partial \log E[Y_i \mid x_i]}{\partial x_{i1}} = \beta_1.$$
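These two interpretations can be checked numerically; a small Python sketch with made-up coefficient values (β1, β2, β3 below are illustrative, not estimated from any data):

```python
from math import exp, log

# Illustrative coefficients (not estimated from any data)
b1, b2, b3 = 0.8, -0.3, 0.1

def cond_mean(x1, x2):
    # E[Y | x] = exp(b1*x1 + b2*x2 + b3)
    return exp(b1 * x1 + b2 * x2 + b3)

x1, x2 = 1.0, 2.0
# Marginal effect at (x1, x2): b1 times the conditional mean, so it varies with x
marginal_effect = b1 * cond_mean(x1, x2)

# Semi-elasticity: numerical derivative of log E[Y | x] with respect to x1
h = 1e-6
semi_elasticity = (log(cond_mean(x1 + h, x2)) - log(cond_mean(x1, x2))) / h
print(marginal_effect, semi_elasticity)  # semi_elasticity equals b1, whatever x is
```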
It is instructive to compute the first-order condition that the ML estimator needs
to fulfill. Differentiating (5) yields
$$\sum_{i=1}^{n} (y_i - \hat{y}_i)\, x_i = 0,$$
with $\hat{y}_i = \exp(\hat{\beta}_{ML}^t x_i)$ the fitted value of yi . The predicted/fitted value has, as usual,
been taken as the estimated value of E[Yi | xi ]. This first-order condition tells us that
the vector of residuals is orthogonal to the vectors of explicative variables.
The advantage of the Maximum Likelihood framework is that a formula for cov(β̂M L )
is readily available:
$$\widehat{\mathrm{cov}}(\hat{\beta}_{ML}) = \left( \sum_{i=1}^{n} x_i x_i^t\, \hat{y}_i \right)^{-1}.$$
Hypothesis tests can now be carried out using Wald tests, Lagrange Multiplier tests, or
Likelihood Ratio tests.
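The first-order condition and the covariance formula can be verified numerically on simulated data; a sketch using Newton-Raphson, assuming numpy is available (data-generating values and variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data set (illustrative): a constant and one regressor
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 0.3])
y = rng.poisson(np.exp(X @ beta_true))

# Newton-Raphson on the Poisson log-likelihood:
# score = X'(y - yhat), negative Hessian = X' diag(yhat) X
beta = np.zeros(2)
for _ in range(25):
    yhat = np.exp(X @ beta)
    beta = beta + np.linalg.solve(X.T @ (X * yhat[:, None]), X.T @ (y - yhat))

yhat = np.exp(X @ beta)
print("first-order condition:", X.T @ (y - yhat))  # ~ 0: residuals orthogonal to X
cov = np.linalg.inv(X.T @ (X * yhat[:, None]))     # covariance formula above
print("standard errors:", np.sqrt(np.diag(cov)))
```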
implying that the conditional mean function equals the conditional variance function.
This is very restrictive. If E[Yi |xi ] < Var[Yi |xi ], respectively E[Yi |xi ] > Var[Yi |xi ], then
we speak of overdispersion, respectively underdispersion. The Poisson model does
not allow for over- or underdispersion. A richer model is obtained by using the negative
binomial distribution instead of the Poisson distribution. Instead of (4), we then use
$$P(Y_i = y_i \mid \beta, x_i) = \frac{\Gamma(\theta + y_i)}{\Gamma(y_i + 1)\,\Gamma(\theta)} \left( \frac{\lambda_i}{\lambda_i + \theta} \right)^{y_i} \left( 1 - \frac{\lambda_i}{\lambda_i + \theta} \right)^{\theta}.$$
This negative binomial distribution can be shown to have conditional mean λi and
conditional variance λi (1 + η 2 λi ), with η 2 := 1/θ. Note that the parameter η 2 is not
allowed to vary over the observations. As before, the conditional mean function is
modeled as
$$E[Y_i \mid x_i] = \lambda_i = \exp(\beta^t x_i).$$
The conditional variance function is then given by
$$\mathrm{Var}[Y_i \mid x_i] = \lambda_i (1 + \eta^2 \lambda_i).$$
Using maximum likelihood, we can then estimate the regression parameter β and
also the extra parameter η. The parameter η measures the degree of over- (or under-)
dispersion. The limit case η = 0 corresponds to the Poisson model.
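The stated conditional mean and variance can be verified numerically from the negative binomial pmf above; a short Python sketch (computed on the log scale to avoid overflow; the parameter values are arbitrary):

```python
from math import lgamma, exp, log

def nb_pmf(y, lam, theta):
    # Negative binomial pmf from the text, evaluated via log-gamma
    p = lam / (lam + theta)
    return exp(lgamma(theta + y) - lgamma(y + 1) - lgamma(theta)
               + y * log(p) + theta * log(1.0 - p))

lam, theta = 3.0, 2.0                 # so eta^2 = 1/theta = 0.5
probs = [nb_pmf(y, lam, theta) for y in range(200)]
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(mean, var)  # mean ~ lam, var ~ lam * (1 + lam / theta) > mean: overdispersion
```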
5 Homework
We are interested in the number of accidents per service month for a sample of ships.
The data can be found in the file “ships.wmf”. The endogenous variable is called ACC.
The explicative variables are:
• CONSTRUCTION YEAR: the ships are constructed in one of four periods, lead-
ing to the dummy variables T6064, T6569, T7074, and T7579.
• SERVICE: a measure for the amount of service that the ship has already carried
out.
Questions:
1. Make a histogram of the variable ACC. Comment on its form. Is this the
histogram of the conditional or unconditional distribution of ACC?
2. Estimate the Poisson regression model, including all explicative variables and a
constant term. (Use estimation method: COUNT – Integer Count Data.)
4. Perform a Wald test to test for the joint significance of the construction year
dummy variables.
7. What do we learn from the value of “Probability(LR stat)”? What is the corre-
sponding null hypothesis?
8. Estimate now a Negative Binomial Model. EViews reports the log(η 2 ) as the
mixture parameter in the estimation output. (a) Compare the estimates of β
given by the two models. (b) Compare the pseudo R2 values of the two models.
9. Estimate now the Poisson model with only a constant term, so without explicative
variables (empty model). Derive mathematically a formula for this estimate of
the constant term (in the empty model), using the first order condition of the
ML-estimator.