Lecture Notes 5
ST3241 Categorical Data Analysis I


Generalized Linear Models

Introduction and Some Examples


Introduction
• We have discussed methods for analyzing associations in
two-way and three-way tables.
• Now we will use models as the basis of such analyses.
• Models can handle more complicated situations than those discussed so far.
• We can also estimate parameters that describe the effects in a more informative way.


Example: Challenger O-ring


• For the 23 space shuttle flights that occurred before the
Challenger mission disaster in 1986, the following table shows
the temperature at the time of flight and whether at least one
primary O-ring suffered thermal distress.


The Data

Ft   Temp   TD      Ft   Temp   TD      Ft   Temp   TD
 1    66     0       9    57     1      17    70     0
 2    70     1      10    63     1      18    81     0
 3    69     0      11    70     1      19    76     0
 4    68     0      12    78     0      20    79     0
 5    67     0      13    67     0      21    75     1
 6    72     0      14    53     1      22    76     0
 7    73     0      15    67     0      23    58     1
 8    70     0      16    75     0

(Ft = flight number; Temp = temperature at launch, °F; TD = thermal distress: 1 = yes, 0 = no)

• Is there any association between temperature and thermal distress?
Fit From Linear Regression
[figure omitted]

Fit From Logistic Regression
[figure omitted]
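Both fits can be reproduced numerically. The following is a minimal sketch, assuming Python with numpy and statsmodels (neither is part of the original notes), applied to the 23 flights tabulated above:

```python
import numpy as np
import statsmodels.api as sm

# Challenger data: launch temperature (deg F) and thermal distress (1 = yes)
temp = np.array([66, 70, 69, 68, 67, 72, 73, 70, 57, 63, 70, 78,
                 67, 53, 67, 75, 70, 81, 76, 79, 75, 76, 58], dtype=float)
td = np.array([0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0,
               0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1])

X = sm.add_constant(temp)

# Fit from linear regression: E(Y) modeled as a linear function of temperature
linear_fit = sm.OLS(td, X).fit()

# Fit from logistic regression: binomial random component with logit link
logistic_fit = sm.GLM(td, X, family=sm.families.Binomial()).fit()

print(linear_fit.params)    # intercept and slope of the linear fit
print(logistic_fit.params)  # alpha and beta on the logit scale
```

Note that the linear fit can predict values outside [0, 1] at extreme temperatures, one motivation for the logistic model introduced below.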


Example: Horseshoe Crabs


• Each female horseshoe crab in the study had a male crab attached to her in her nest.
• The study investigated factors that affect whether the female crab had any other males, called satellites, residing nearby.
• Explanatory variables included the female crab’s color, spine
condition, weight, and carapace width.
• The response outcome for each female crab is her number of
satellites.


Example Continued
• We consider the width alone as a predictor.
• To obtain a clearer picture, we grouped the female crabs into a set of width categories (in cm): ≤ 23.25, 23.25–24.25, 24.25–25.25, 25.25–26.25, 26.25–27.25, 27.25–28.25, 28.25–29.25, > 29.25.
• We then calculated the sample mean number of satellites for the female crabs in each category.


Components of a GLM
• Random component
– Identifies the response variable Y and assumes a probability distribution for it
• Systematic component
– Specifies the explanatory variables used as predictors in the model
• Link
– Describes the functional relation between the systematic component and the expected value of the random component
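As an illustration of how the three components map onto software, here is a minimal sketch, assuming Python with statsmodels and using hypothetical data (the notes do not prescribe any particular package):

```python
import numpy as np
import statsmodels.api as sm

# hypothetical binary response and single predictor
y = np.array([0, 1, 0, 1, 1])
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

model = sm.GLM(
    y,                                    # random component: the response Y
    sm.add_constant(x),                   # systematic component: alpha + beta*x
    family=sm.families.Binomial(          # assumed probability distribution for Y
        link=sm.families.links.Logit()),  # link: g(mu) = logit(mu)
)
print(model.fit().params)
```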


Random Component
• Let Y1, · · · , YN denote the N observations on the response variable Y.
• The random component specifies a probability distribution for Y1, · · · , YN.
• If the potential outcomes for each observation Yi are binary, such as "success" or "failure", or, more generally, if each Yi is the number of "successes" out of a certain fixed number of trials, we can assume a binomial distribution for the random component.
• If each response observation is a non-negative count, such as a cell count in a contingency table, then we may assume a Poisson distribution for the random component.


Systematic Component
• The systematic component specifies the explanatory variables.
• It specifies the variables that play the roles of xj in the formula α + β1 x1 + · · · + βk xk.
• This linear combination of explanatory variables is called the linear predictor.
• Some xj may be based on others in the model; for instance, perhaps x3 = x1 x2, to allow interaction between x1 and x2 in their effects on Y, or perhaps x3 = x1^2, to allow a curvilinear effect of x1.


Link
• It specifies how µ = E(Y) relates to the explanatory variables in the linear predictor.
• The model formula states that

g(µ) = α + β1 x1 + · · · + βk xk

The function g(·) is called the link function.


Some Popular Link Functions


• Identity link

g(µ) = µ = α + β1 x1 + · · · + βk xk

• Log link

g(µ) = log(µ) = α + β1 x1 + · · · + βk xk

• Logit link

g(µ) = log[µ/(1 − µ)] = α + β1 x1 + · · · + βk xk
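Numerically, each link maps the mean scale onto the linear-predictor scale and has an inverse mapping back; a small sketch in Python/numpy (an illustration, not part of the original notes):

```python
import numpy as np

identity = lambda mu: mu                     # identity link
log_link = lambda mu: np.log(mu)             # log link; inverse is exp(eta)
logit    = lambda mu: np.log(mu / (1 - mu))  # logit link; inverse is 1/(1+exp(-eta))

eta = logit(0.25)             # mean 0.25 mapped to the linear-predictor scale
mu  = 1 / (1 + np.exp(-eta))  # inverse logit recovers 0.25
print(eta, mu)
```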


More On Link Functions · · ·


• Each potential probability distribution has one special function
of the mean that is called its natural parameter.
• For the normal distribution, it is the mean itself.
• For the Poisson, the natural parameter is the log of the mean.
• For the binomial, the natural parameter is the logit of the success probability.
• The link function that uses the natural parameter as g(µ) in the GLM is called the canonical link.
• Though other links are possible, in practice the canonical links are most common.


GLM For Binary Data: Random Component


• The distribution of a binary response is specified by
probabilities P (Y = 1) = π of success and P (Y = 0) = 1 − π of
failure.
• For n independent observations on a binary response with
parameter π, the number of successes has the binomial
distribution specified by parameters n and π.


Linear Probability Model


• To model the effect of X, use ordinary linear regression, in which the expected value of Y is a linear function of X.
• The model

π(x) = α + βx

is called a linear probability model.
• Probabilities fall between 0 and 1, but for large or small values of x the model may predict π(x) < 0 or π(x) > 1.
• This model is valid only for a finite range of x values.


Example: Snoring

                              Heart Disease
Snoring               Yes     No    Proportion Yes   Linear Fit
Never                  24   1355         0.017          0.017
Occasional             35    603         0.055          0.057
Nearly Every Night     21    192         0.099          0.096
Every Night            30    224         0.118          0.116


Example: Snoring, Continued
• We use scores (0, 2, 4, 5) for the snoring categories, treating the last two levels as closer together.
• The linear fit obtained by maximizing the likelihood is

π(x) = 0.0172 + 0.0198x

• The least squares fit is slightly different (see the sketch below).
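The ML fit can be reproduced by maximizing the binomial log-likelihood directly; a minimal sketch, assuming Python with numpy and scipy (not part of the original notes):

```python
import numpy as np
from scipy.optimize import minimize

x   = np.array([0.0, 2.0, 4.0, 5.0])    # snoring scores
yes = np.array([24, 35, 21, 30])        # heart disease: yes
n   = np.array([1379, 638, 213, 254])   # group totals (yes + no)

def negloglik(par):
    a, b = par
    pi = a + b * x                       # linear probability model
    return -np.sum(yes * np.log(pi) + (n - yes) * np.log(1 - pi))

fit = minimize(negloglik, x0=[0.02, 0.02], method="Nelder-Mead")
print(fit.x)   # approximately (0.0172, 0.0198), as quoted above

# Weighted least squares on the sample proportions (equivalent to ordinary
# least squares on the individual 0/1 responses) gives a slightly different fit:
print(np.polyfit(x, yes / n, 1, w=np.sqrt(n)))
```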

Logistic Regression Model
• The relationship between π(x) and x is usually nonlinear rather than linear. The most important function describing this nonlinear relationship has the form

log(π(x)/(1 − π(x))) = α + βx

• That is,

π(x) = F0(α + βx), where F0(x) = e^x/(1 + e^x) = 1/(1 + e^(−x)),

where F0(x) is the cdf of the logistic distribution. Its pdf is F0(x)(1 − F0(x)).
• The associated GLM is called the logistic regression model.
• Logistic regression models are often referred to as logit models, as the link in this GLM is the logit link.

Parameters
• The parameter β determines the rate of increase or decrease of
the curve.
• When β > 0, π(x) increases with x.
• When β < 0, π(x) decreases as x increases.
• The magnitude of β determines how fast the curve increases or
decreases.
• As |β| increases, the curve has a steeper rate of change.


Example: Snoring

                              Heart Disease
Snoring               Yes     No    Proportion Yes   Linear Fit   Logit Fit
Never                  24   1355         0.017          0.017       0.021
Occasional             35    603         0.055          0.057       0.044
Nearly Every Night     21    192         0.099          0.096       0.093
Every Night            30    224         0.118          0.116       0.132


Effect of Parameters
[figure omitted]


Alternative Binary Links


• For logistic regression curves, the probability of a success
increases or decreases continuously as x increases.
• Let X denote a random variable; its cumulative distribution function (cdf) F(x) is defined as

F(x) = P(X ≤ x), −∞ < x < ∞

• Such a function, plotted as a function of x, has an appearance like that of the logistic function in the previous figures.
• It suggests a class of models for binary responses of the form

π(x) = F (α + βx)

where F is a cdf for some distribution.


Alternative Binary Links


• The logistic regression curve has this form.
• When β > 0, π(x) = F (α + βx) has the shape of the cdf of the
two-parameter logistic distribution.
• When β < 0, 1 − π(x) = 1 − F (α + βx) has the shape of the cdf
of the two-parameter logistic distribution.
• Each choice of α and β > 0 corresponds to a different logistic
distribution.
• The logistic cdf F0(x) corresponds to a probability density F0(x)(1 − F0(x)) that is symmetric and bell-shaped, looking very similar to a normal distribution.


Probit Models
• The probability of success, π(x), has the form Φ(α + βx) where
Φ is the cdf of a standard normal distribution N (0, 1).
• The link function is known as the probit link: g(π) = Φ^(−1)(π).
• The probit transform maps π(x) so that the regression curve for π(x) (or for 1 − π(x), when β < 0) has the appearance of a normal cdf with mean µ = −α/β and standard deviation σ = 1/|β|.


Example: Snoring

                              Heart Disease
Snoring               Yes     No   Proportion Yes  Linear Fit  Logit Fit  Probit Fit
Never                  24   1355        0.017         0.017      0.021       0.020
Occasional             35    603        0.055         0.057      0.044       0.046
Nearly Every Night     21    192        0.099         0.096      0.093       0.095
Every Night            30    224        0.118         0.116      0.132       0.131
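The logit and probit fit columns can be reproduced from the grouped counts; a sketch assuming statsmodels, whose two-column response form passes (successes, failures):

```python
import numpy as np
import statsmodels.api as sm

x = np.array([0.0, 2.0, 4.0, 5.0])          # snoring scores
counts = np.array([[24, 1355], [35, 603],
                   [21, 192], [30, 224]])   # (yes, no) in each group
X = sm.add_constant(x)

logit_fit  = sm.GLM(counts, X, family=sm.families.Binomial()).fit()
probit_fit = sm.GLM(counts, X,
                    family=sm.families.Binomial(
                        link=sm.families.links.Probit())).fit()

print(logit_fit.predict(X))   # approx. (0.021, 0.044, 0.093, 0.132)
print(probit_fit.predict(X))  # approx. (0.020, 0.046, 0.095, 0.131)
```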


GLM for Count Data


• Many discrete response variables have counts as possible
outcomes.
– For a sample of cities worldwide, each observation might be
the number of automobile thefts in 2003.
– For a sample of silicon wafers used in computer chips, each
observation might be the number of imperfections on a
wafer.
• We have earlier seen the Poisson distribution as a sampling
model for counts.


Poisson Regression
• Assume a Poisson distribution for the random component.
• One can model the Poisson mean using the identity link.
• But it is more common to model the log of the mean.
• A Poisson loglinear model is a GLM that assumes a Poisson distribution for Y and uses the log link.


Poisson Regression - Continued


• Let µ denote the expected value of Y and let X denote an
explanatory variable.
• Then the Poisson log-linear model has the form

log(µ) = α + βx

• For this model, µ = exp(α + βx) = e^α (e^β)^x.
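So each unit increase in x multiplies the mean by e^β. A sketch of fitting such a model on simulated data, assuming statsmodels (the coefficients below are made up for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
width = rng.uniform(21.0, 34.0, size=100)   # hypothetical carapace widths
mu = np.exp(-3.0 + 0.16 * width)            # assumed true log-linear model
y = rng.poisson(mu)                         # simulated satellite counts

fit = sm.GLM(y, sm.add_constant(width),
             family=sm.families.Poisson()).fit()   # log link is the default
print(fit.params)             # estimates of (alpha, beta)
print(np.exp(fit.params[1]))  # multiplicative effect on the mean per unit width
```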

Example: Horseshoe Data
[figure omitted]


Poisson Regression For Rate Data


• For certain types of events that occur over time, space, or some other index of size, it is often relevant to model the rate at which the events occur.
• In modeling numbers of auto thefts in 2003 for a sample of
cities, we could form a rate for each city by dividing the
number of thefts by the city’s population size.
• The model describes how the rate depends on some other
explanatory variables.


Poisson Regression For Rate Data


• When a response count Yi has index (such as population size)
equal to ti , the sample rate of outcomes is Yi /ti .
• The expected value of the rate is µi /ti .
• A log-linear model for the expected rate has form:

log(µi /ti ) = α + βxi

• This has an equivalent representation:

log µi − log ti = α + βxi

• The adjustment term, −log ti, in the above equation is called an offset.
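In software the offset is supplied as a term with coefficient fixed at 1, since log µi = log ti + α + βxi. A sketch with made-up city data, assuming statsmodels:

```python
import numpy as np
import statsmodels.api as sm

thefts = np.array([120, 310, 45, 80])             # hypothetical counts Y_i
pop    = np.array([50000, 120000, 20000, 40000])  # index t_i (population size)
x      = np.array([1.2, 3.4, 0.8, 1.5])           # hypothetical covariate

fit = sm.GLM(thefts, sm.add_constant(x),
             family=sm.families.Poisson(),
             offset=np.log(pop)).fit()   # log t_i enters with coefficient 1
print(fit.params)                        # alpha and beta for the rate model
```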


Exponential Family
• The random variable Y has a distribution in the exponential family if its p.d.f. (or p.m.f.) can be written as

f(y; θ, ϕ) = exp{(yθ − b(θ))/a(ϕ) + c(y, ϕ)}

for some specific functions a(ϕ), b(θ), and c(y, ϕ).
• The parameter θ is called the natural parameter, and ϕ is called the dispersion (or scale) parameter.


Examples: Normal Distribution


• The p.d.f. of N(µ, σ²):

f(y; θ, ϕ) = (1/√(2πσ²)) exp{−(y − µ)²/(2σ²)}
           = exp{(yµ − µ²/2)/σ² − (y²/σ² + log(2πσ²))/2}

• Here θ = µ, ϕ = σ², a(ϕ) = ϕ, b(θ) = θ²/2, and c(y, ϕ) = −{y²/σ² + log(2πσ²)}/2.
• The canonical link: g(µ) = µ.


Examples: Binomial Distribution


• The p.m.f. of Bernoulli(π):

f(y; θ, ϕ) = π^y (1 − π)^(1−y) = (1 − π) (π/(1 − π))^y
           = exp{y log(π/(1 − π)) − log(1/(1 − π))}

• Here θ = log(π/(1 − π)), ϕ = 1, b(θ) = log(1 + e^θ), a(ϕ) = 1, and c(y, ϕ) = 0.
• The canonical link: g(π) = log(π/(1 − π)).


Examples: Poisson Distribution


• The p.m.f. of Poisson(λ):

f(y; θ, ϕ) = e^(−λ) λ^y / y!
           = exp{y log λ − λ − log y!}

• Here θ = log λ, ϕ = 1, b(θ) = e^θ, a(ϕ) = 1, and c(y, ϕ) = −log y!.
• The canonical link: g(λ) = log(λ).


Log-Likelihood Functions
• The log-likelihood function is

l(θ, ϕ; y) = log f(y; θ, ϕ) = (yθ − b(θ))/a(ϕ) + c(y, ϕ)

• We use general likelihood results applicable to exponential families:

E(∂l/∂θ) = 0 and E(∂²l/∂θ²) = −E[(∂l/∂θ)²]

• Here

∂l/∂θ = (y − b′(θ))/a(ϕ) and ∂²l/∂θ² = −b′′(θ)/a(ϕ)


Mean and Variances


• Now we have 0 = E(∂l/∂θ) = {E(Y) − b′(θ)}/a(ϕ).
• So E(Y) = b′(θ).
• Similarly,

var(Y)/a²(ϕ) = b′′(θ)/a(ϕ)

• So var(Y) = b′′(θ) a(ϕ).


Examples
• Normal distribution: b(θ) = θ²/2
– E(Y) = b′(θ) = θ = µ
– var(Y) = b′′(θ) a(ϕ) = ϕ = σ².
• Bernoulli distribution: b(θ) = log(1 + e^θ)
– E(Y) = b′(θ) = e^θ/(1 + e^θ) = π.
– var(Y) = b′′(θ) = e^θ/(1 + e^θ)² = π(1 − π).
• Poisson distribution: b(θ) = exp(θ)
– E(Y) = b′(θ) = exp(θ) = λ.
– var(Y) = b′′(θ) = exp(θ) = λ.
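These derivatives are easy to verify symbolically; a small sketch assuming Python with sympy (not part of the original notes):

```python
import sympy as sp

theta = sp.symbols('theta')

for b in (theta**2 / 2,               # normal
          sp.log(1 + sp.exp(theta)),  # Bernoulli
          sp.exp(theta)):             # Poisson
    mean = sp.simplify(sp.diff(b, theta))     # E(Y) = b'(theta)
    bpp  = sp.simplify(sp.diff(b, theta, 2))  # var(Y) = b''(theta) a(phi)
    print(mean, bpp)                          # a(phi) = 1 except for the normal
```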


Likelihood Equations in GLMs


• Let (y1 , · · · , yN ) denote responses for N independent
observations.
• Let (xi1 , · · · , xip ) denote values of p explanatory variables for
observation i.
• The systematic component for the i-th observation is

ηi = β1 xi1 + · · · + βp xip = Σj βj xij

• If E(Yi) = µi, then the link for the i-th observation is

ηi = g(µi)


Likelihood Equations in GLMs


• For N independent observations, the log-likelihood function is

L(β) = Σi Li = Σi log f(yi; θi, ϕ) = Σi (yi θi − b(θi))/a(ϕ) + Σi c(yi, ϕ)

where the sums run over i = 1, · · · , N.
• The notation L(β) reflects the dependence of θ on the model parameters β.


Likelihood Equations in GLMs


• The likelihood equations are

∂L(β)/∂βj = Σi ∂Li/∂βj = 0, for j = 1, · · · , p.

• Simplifying, we have

Σi [(yi − µi) xij / var(Yi)] (∂µi/∂ηi) = 0

for j = 1, · · · , p. Notice that ∂µi/∂ηi = 1/g′(µi).


Examples
• Logit model:

Σi (yi − πi) xij = 0, where πi = exp(Σj βj xij) / [1 + exp(Σj βj xij)]

• Probit model:

Σi [(yi − πi) xij / (πi(1 − πi))] φ(Σk βk xik) = 0, where πi = Φ(Σj βj xij)

(here φ and Φ denote the standard normal pdf and cdf)
• Log-linear model:

Σi (yi − exp(Σk βk xik)) xij = 0
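These equations have no closed-form solution in general, but for the logit model they are easily solved by Newton–Raphson; a minimal numpy sketch (an illustration, not part of the original notes) using the Challenger data tabulated at the start of these notes:

```python
import numpy as np

# Challenger data: temperature and thermal distress indicator
temp = np.array([66, 70, 69, 68, 67, 72, 73, 70, 57, 63, 70, 78,
                 67, 53, 67, 75, 70, 81, 76, 79, 75, 76, 58], dtype=float)
y = np.array([0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0,
              0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1], dtype=float)

X = np.column_stack([np.ones_like(temp), temp])
beta = np.zeros(2)
for _ in range(25):                       # Newton-Raphson iterations
    pi = 1 / (1 + np.exp(-X @ beta))      # current fitted probabilities
    score = X.T @ (y - pi)                # the likelihood equations
    W = pi * (1 - pi)                     # binomial variances var(Y_i)
    info = X.T @ (W[:, None] * X)         # Fisher information matrix
    beta += np.linalg.solve(info, score)

print(beta)   # solves sum_i (y_i - pi_i) x_ij = 0 for j = 1, 2
```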


Maximum Likelihood Estimates


• ML estimates of βj ’s are obtained by solving the likelihood
equations using numerical methods.
• The ML estimates β̂j ’s are approximately normally distributed.
• Thus, a confidence interval for a model parameter βj equals

β̂j ± zα/2 ASE

where ASE is the asymptotic standard error of β̂j .


Testing For Significance


• To test H0 : βj = 0.
• The test statistic Z = β̂j/ASE has an approximate standard normal distribution when H0 is true.
• Equivalently, Z² has a chi-squared distribution with d.f. = 1, which can be used for two-sided alternatives.
• This type of test is known as the Wald test (see the sketch following the likelihood-ratio test below).


Likelihood Ratio Test


• The likelihood-ratio test statistic equals

−2 log(L0/L1) = −2[log L0 − log L1] = −2[l0 − l1]

where L0 and L1 are the maximized likelihood functions under the null hypothesis and under the full model, respectively.
• Under H0, this test statistic also has a large-sample chi-squared distribution with d.f. = 1.
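Both tests are straightforward from fitted models; a sketch for H0 : β = 0 in the Challenger logistic regression, assuming statsmodels and scipy (not part of the original notes):

```python
import numpy as np
import scipy.stats as st
import statsmodels.api as sm

temp = np.array([66, 70, 69, 68, 67, 72, 73, 70, 57, 63, 70, 78,
                 67, 53, 67, 75, 70, 81, 76, 79, 75, 76, 58], dtype=float)
y = np.array([0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0,
              0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1])

full = sm.GLM(y, sm.add_constant(temp), family=sm.families.Binomial()).fit()
null = sm.GLM(y, np.ones((len(y), 1)), family=sm.families.Binomial()).fit()

z = full.params[1] / full.bse[1]     # Wald statistic Z = beta_hat / ASE
lrt = -2 * (null.llf - full.llf)     # -2[l0 - l1]
print(2 * st.norm.sf(abs(z)))        # two-sided Wald p-value
print(st.chi2.sf(lrt, df=1))         # LRT p-value, chi-squared with d.f. = 1
```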


Score Test
• The score statistic or efficient score statistic uses the size of the
derivative of the log-likelihood function evaluated at βj = 0.
• The score statistic is the square of the ratio of this derivative to
its ASE.
• It also has an approximate chi-squared distribution.


Example
• In simple logistic regression with one explanatory variable, the log-likelihood function is

l(α, β) = Σi {yi(α + βxi) − log(1 + exp(α + βxi))}

• Therefore, for H0 : β = 0 versus H1 : β ≠ 0, we have

l0 = Σi {yi log(ȳ/(1 − ȳ)) − log(1/(1 − ȳ))}
l1 = Σi {yi(α̂ + β̂xi) − log(1 + exp(α̂ + β̂xi))}

where the sums run over i = 1, · · · , N.


Model Residuals
• For the i-th observation, the raw residual is

ri = yi − µ̂i = observed − fitted,

where yi is the observed response and µ̂i is the fitted value from the model.
• The Pearson residual is defined as

Pearson residual = (observed − fitted)/√(estimated var(observed)) = (yi − µ̂i)/√(var̂(yi))

• For Poisson GLMs, it simplifies to

ei = (yi − µ̂i)/√µ̂i


Adjusted Residuals
• The Pearson residual divided by its estimated standard error is called the adjusted residual.
• Adjusted residuals have an approximate standard normal distribution.
• For Poisson GLMs, the general form of the adjusted residual is

(yi − µ̂i)/√(µ̂i(1 − hi)) = ei/√(1 − hi)

where hi is called the leverage of observation i.
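For a Poisson log-linear model, the leverages hi are the diagonal of the hat matrix H = W^(1/2) X (X′WX)^(−1) X′ W^(1/2) with W = diag(µ̂i), the weighted least-squares problem solved at convergence. A numpy sketch with made-up fitted values (in practice µ̂ comes from the fitted model):

```python
import numpy as np

# hypothetical design matrix, counts, and converged fitted means
X = np.column_stack([np.ones(5), np.array([1.0, 2.0, 3.0, 4.0, 5.0])])
y = np.array([2.0, 3.0, 6.0, 7.0, 12.0])
mu_hat = np.exp(X @ np.array([0.3, 0.45]))   # assumed fitted Poisson means

e = (y - mu_hat) / np.sqrt(mu_hat)           # Pearson residuals

Xw = np.sqrt(mu_hat)[:, None] * X            # W^(1/2) X
h = np.einsum('ij,ij->i',
              Xw @ np.linalg.inv(X.T @ (mu_hat[:, None] * X)), Xw)  # leverages

adjusted = e / np.sqrt(1 - h)                # adjusted residuals
print(adjusted)
```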
