Review of Logistic and Poisson Regression Models
LECTURE 16
Generalized linear models are a class of regression models; they include the
standard linear regression model but also many other important models:
Motivating Example
Review of Generalized Linear Models for a Single
Response
Recall that the population intercept (for X1 = 1), β1, has interpretation as
the mean value of the response when all of the covariates take on the value
zero.
Review: Logistic Regression
Then the mean of the binary response variable, denoted π, is the proportion
of successes or the probability that the response takes on the value 1.
That is,
π = E(Y ) = Pr(Y = 1) = Pr(“success”)
Var(Y ) = π (1 − π)
Instead, we can consider a logistic regression model where
ln[π / (1 − π)] = β1X1 + β2X2 + . . . + βpXp
Under the assumption that the binary responses are Bernoulli random
variables, we can use ML estimation to obtain estimates of the logistic
regression parameters.
Odds = π / (1 − π);
π = Odds / (1 + Odds).
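The two conversions above can be checked numerically. The following is a minimal Python sketch; the function names are illustrative and do not come from the lecture.

```python
def odds_from_prob(p):
    """Return the odds p / (1 - p) for a probability 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)


def prob_from_odds(odds):
    """Return the probability odds / (1 + odds) for non-negative odds."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


# Round trip: probability -> odds -> probability.
for p in (0.1, 0.5, 0.75):
    o = odds_from_prob(p)
    print(p, o, prob_from_odds(o))
```

A probability of 0.75 corresponds to odds of 3 (three successes for every failure), and converting back recovers 0.75.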
In the logistic regression model, the population intercept, β1, has
interpretation as the log odds of success when all of the covariates take on
the value zero.
When one of the covariates is dichotomous, say X2, then β2 has a special
interpretation:
exp (β2) is the odds ratio or ratio of odds of success for the two possible
levels of X2 (given that all of the other covariates remain constant).
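The odds-ratio interpretation can be illustrated with a small numerical sketch. The coefficient values and the third covariate below are hypothetical, chosen only for illustration; the point is that the ratio of odds at X2 = 1 versus X2 = 0 equals exp(β2) regardless of the value at which the other covariate is held.

```python
import math


def odds(beta, x):
    """Odds of success under a logistic model: exp(beta1*x1 + ... + betap*xp)."""
    return math.exp(sum(b * xj for b, xj in zip(beta, x)))


# Hypothetical coefficients (not estimates from the lecture): beta1, beta2, beta3.
beta = [-1.0, 0.7, 0.02]
x3 = 35.0  # the other covariate, held constant at an arbitrary value

odds_x2_1 = odds(beta, [1, 1, x3])  # X1 = 1 (intercept), X2 = 1
odds_x2_0 = odds(beta, [1, 0, x3])  # X1 = 1 (intercept), X2 = 0
odds_ratio = odds_x2_1 / odds_x2_0

print(odds_ratio, math.exp(beta[1]))  # the two values agree up to rounding
```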
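Keep in mind that as π increases, the odds of success and the log odds of
success also increase.
Similarly, as π decreases, the odds of success and the log odds of success
also decrease.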
For the 223 infants in the sample, the estimated logistic regression (obtained
using ML) is
ln[π / (1 − π)] = 4.0343 − 0.0042 Weight
For example, the estimated odds of BPD for an infant weighing 1200 grams are
exp(4.0343 − 0.0042 × 1200) = exp(−1.0057) = 0.3658
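The calculation for the 1200 gram infant can be reproduced in a few lines. This is a minimal sketch that uses the estimated coefficients quoted above; the function names are illustrative. It also converts the odds to a probability using π = Odds / (1 + Odds).

```python
import math

# Estimated coefficients from the fitted model quoted in the text.
B1 = 4.0343    # intercept
B2 = -0.0042   # change in log odds per gram of birth weight


def bpd_odds(weight_grams):
    """Estimated odds of BPD at a given birth weight."""
    return math.exp(B1 + B2 * weight_grams)


def bpd_prob(weight_grams):
    """Estimated probability of BPD, obtained from the odds."""
    o = bpd_odds(weight_grams)
    return o / (1.0 + o)


odds_1200 = bpd_odds(1200)
prob_1200 = bpd_prob(1200)
print(round(odds_1200, 4), round(prob_1200, 4))
```

Odds of about 0.366 correspond to an estimated probability of BPD of roughly 0.27 for an infant of this weight.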
Review: Poisson Regression
That is, the count or absolute number of events is often not satisfactory
because any comparison depends almost entirely on the sizes of the groups
(or the “time at risk”) that generated the observations.
Note that λ is the expected count or number of events and the expected
rate is given by λ/t, where t is a relevant baseline measure (e.g., t might
be the number of persons or the number of person-years of observation).
2. ln(λ/t) = β1X1 + β2X2 + . . . + βpXp
Note that since ln(λ/t) = ln(λ) − ln(t), the Poisson regression model can
also be considered as
ln(λ) = ln(t) + β1X1 + β2X2 + . . . + βpXp
Note that the relationship between λ (or λ/t) and the covariates is
non-linear.
Given the Poisson regression model
ln(λ/t) = β1X1 + β2X2 + . . . + βpXp,
the population intercept, β1, has interpretation as the log expected rate
when all the covariates take on the value zero.
When one of the covariates is dichotomous, say X2, then β2 has a special
interpretation:
exp (β2) is the (incidence) rate ratio for the two possible levels of X2 (given
that all of the other covariates remain constant).
The study observed 3154 men aged 40-50 for an average of 8 years and
recorded incident cases of CHD.
Consider the model
ln(λ/t) = β1 + β2 Smoke
or
ln(λ) = ln(t) + β1 + β2 Smoke
Person-Years  Smoking  Blood Pressure  Behavior  CHD
      5268.2        0               0         0   20
      2542.0       10               0         0   16
      1140.7       20               0         0   13
       614.6       30               0         0    3
      4451.1        0               0         1   41
      2243.5       10               0         1   24
      1153.6       20               0         1   27
       925.0       30               0         1   17
      1366.8        0               1         0    8
       497.0       10               1         0    9
       238.1       20               1         0    3
       146.3       30               1         0    7
      1251.9        0               1         1   29
       640.0       10               1         1   21
       374.5       20               1         1    7
       338.2       30               1         1   12
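Before fitting any model, it can help to look at the crude rates implied by the table, that is, events divided by person-years within each smoking level. The sketch below is one way to do this in plain Python; the variable and function names are illustrative, and the smoking column is assumed to be coded in cigarettes per day.

```python
# Rows of the table: (person_years, smoking, blood_pressure, behavior, chd_events).
rows = [
    (5268.2, 0, 0, 0, 20), (2542.0, 10, 0, 0, 16),
    (1140.7, 20, 0, 0, 13), (614.6, 30, 0, 0, 3),
    (4451.1, 0, 0, 1, 41), (2243.5, 10, 0, 1, 24),
    (1153.6, 20, 0, 1, 27), (925.0, 30, 0, 1, 17),
    (1366.8, 0, 1, 0, 8), (497.0, 10, 1, 0, 9),
    (238.1, 20, 1, 0, 3), (146.3, 30, 1, 0, 7),
    (1251.9, 0, 1, 1, 29), (640.0, 10, 1, 1, 21),
    (374.5, 20, 1, 1, 7), (338.2, 30, 1, 1, 12),
]


def crude_rates(rows):
    """CHD events per 1000 person-years, pooled within each smoking level."""
    totals = {}
    for person_years, smoke, _bp, _behavior, chd in rows:
        t, y = totals.get(smoke, (0.0, 0))
        totals[smoke] = (t + person_years, y + chd)
    return {s: 1000.0 * y / t for s, (t, y) in sorted(totals.items())}


rates = crude_rates(rows)
for smoke, rate in rates.items():
    print(smoke, round(rate, 2))
```

The crude rate rises steadily with smoking level, from roughly 8 per 1000 person-years among non-smokers to roughly 19 per 1000 person-years at the highest level, which is why comparing raw counts alone would be misleading.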
In this model the ML estimate of β2 is 0.0318. That is, the rate of CHD
increases by a factor of exp(0.0318) = 1.032 for every additional cigarette
smoked per day.
Alternatively, the rate of CHD in smokers of one pack per day (20 cigarettes)
is estimated to be (1.032)^20 = 1.88 times the rate of CHD in non-smokers.
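The estimate of β2 can be reproduced from the tabulated data. The sketch below maximizes the Poisson log-likelihood for ln(λ) = ln(t) + β1 + β2 Smoke by Newton-Raphson, using only the standard library. It is an illustration under stated assumptions, not the software used in the lecture: the function name and starting values are my own choices, and in practice one would normally use a GLM routine with an offset of ln(t).

```python
import math

# (person_years, cigarettes_per_day, chd_events) for the 16 rows of the table.
rows = [
    (5268.2, 0, 20), (2542.0, 10, 16), (1140.7, 20, 13), (614.6, 30, 3),
    (4451.1, 0, 41), (2243.5, 10, 24), (1153.6, 20, 27), (925.0, 30, 17),
    (1366.8, 0, 8), (497.0, 10, 9), (238.1, 20, 3), (146.3, 30, 7),
    (1251.9, 0, 29), (640.0, 10, 21), (374.5, 20, 7), (338.2, 30, 12),
]


def fit_poisson_rate(rows, max_iter=50, tol=1e-12):
    """ML fit of ln(lambda) = ln(t) + b1 + b2 * x by Newton-Raphson."""
    total_t = sum(t for t, _, _ in rows)
    total_y = sum(y for _, _, y in rows)
    b1, b2 = math.log(total_y / total_t), 0.0  # start at the overall log rate
    for _ in range(max_iter):
        g1 = g2 = h11 = h12 = h22 = 0.0
        for t, x, y in rows:
            mu = t * math.exp(b1 + b2 * x)   # expected count for this row
            g1 += y - mu                      # score for b1
            g2 += x * (y - mu)                # score for b2
            h11 += mu                         # information matrix entries
            h12 += x * mu
            h22 += x * x * mu
        det = h11 * h22 - h12 * h12
        d1 = (h22 * g1 - h12 * g2) / det      # Newton step: information^-1 * score
        d2 = (h11 * g2 - h12 * g1) / det
        b1, b2 = b1 + d1, b2 + d2
        if abs(d1) + abs(d2) < tol:
            break
    return b1, b2


b1, b2 = fit_poisson_rate(rows)
rate_ratio_per_cig = math.exp(b2)
rate_ratio_per_pack = math.exp(20 * b2)
print(round(b2, 4), round(rate_ratio_per_cig, 3), round(rate_ratio_per_pack, 2))
```

Because the model contains only smoking, rows that share a smoking level contribute to the likelihood only through their pooled person-years and events, so collapsing the table over blood pressure and behavior type would give the same estimates.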
Now, the adjusted rate of CHD (controlling for blood pressure and behavior
type) increases by a factor of exp(0.027) = 1.027 for every additional
cigarette smoked per day.
The adjusted rate of CHD in smokers of one pack per day (20 cigarettes) is
estimated to be (1.027)^20 = 1.7 times the rate of CHD in non-smokers.
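Scaling a per-cigarette rate ratio to a per-pack rate ratio relies on the identity exp(β)^20 = exp(20β). The short sketch below applies it to the unadjusted and adjusted coefficients quoted in the text; the function name is illustrative.

```python
import math


def rate_ratio(beta, delta=1.0):
    """Rate ratio for a change of `delta` units in a covariate with coefficient beta."""
    return math.exp(beta * delta)


unadjusted_pack = rate_ratio(0.0318, 20)  # smoking only
adjusted_pack = rate_ratio(0.027, 20)     # adjusted for blood pressure and behavior
print(round(unadjusted_pack, 2), round(adjusted_pack, 2))
```

Working from the unrounded coefficient gives slightly different third digits than raising the rounded per-cigarette ratio to the 20th power, which is why the values quoted in the text are reported to only two or three significant figures.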
In that case, Poisson and logistic regression will not give discernibly
different results.
Overdispersion
Count data (or counts of the number of successes) often have variability that
far exceeds that predicted by the Poisson (or binomial) distribution.
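A quick informal check for overdispersion in a single group of counts is to compare the sample variance with the sample mean, since the two are equal under a Poisson distribution. The sketch below uses hypothetical counts invented purely for illustration; they are not data from the lecture.

```python
from statistics import mean, variance

# Hypothetical counts for one group (illustration only, not study data).
counts = [0, 1, 1, 2, 3, 5, 8, 12, 0, 4]

sample_mean = mean(counts)
sample_var = variance(counts)            # unbiased sample variance (divisor n - 1)
dispersion = sample_var / sample_mean    # about 1 for Poisson-like counts

print(sample_mean, round(sample_var, 2), round(dispersion, 2))
```

A ratio well above 1, as in this made-up example, suggests more variability than a Poisson model with a common mean would predict.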
Example: Clinical Trial of Antibiotics for Leprosy
Table 1: Mean count of leprosy bacilli at six sites of the body (and variance)
post-treatment.
Variances: (21.6), (37.9), (51.1)
Consider outcome (post-treatment) at end of study.
Var(Yi) = φ μi,