Lecture Notes 5
ST3241 Categorical Data Analysis I


Generalized Linear Models

Introduction and Some Examples


Introduction
• We have discussed methods for analyzing associations in
two-way and three-way tables.
• Now we will use models as the basis of such analyses.
• Models can handle more complicated situations than those discussed so far.
• We can also estimate parameters that describe the effects in a more informative way.


Example: Challenger O-ring


• For the 23 space shuttle flights that occurred before the
Challenger mission disaster in 1986, the following table shows
the temperature at the time of flight and whether at least one
primary O-ring suffered thermal distress.


The Data

Ft   Temp   TD      Ft   Temp   TD      Ft   Temp   TD
 1    66     0       9    57     1      17    70     0
 2    70     1      10    63     1      18    81     0
 3    69     0      11    70     1      19    76     0
 4    68     0      12    78     0      20    79     0
 5    67     0      13    67     0      21    75     1
 6    72     0      14    53     1      22    76     0
 7    73     0      15    67     0      23    58     1
 8    70     0      16    75     0

(Ft = flight number; Temp = temperature at launch, °F; TD = thermal distress: 1 = yes, 0 = no)

• Is there any association between temperature and thermal distress?
Fit From Linear Regression
[figure omitted]

Fit From Logistic Regression
[figure omitted]
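Both fits can be reproduced numerically. The following is a minimal sketch, assuming Python with numpy and statsmodels (neither is part of the original notes), applied to the 23 flights tabulated above:

```python
import numpy as np
import statsmodels.api as sm

# Challenger data: launch temperature (deg F) and thermal distress (1 = yes)
temp = np.array([66, 70, 69, 68, 67, 72, 73, 70, 57, 63, 70, 78,
                 67, 53, 67, 75, 70, 81, 76, 79, 75, 76, 58], dtype=float)
td = np.array([0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0,
               0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1])

X = sm.add_constant(temp)

# Fit from linear regression: E(Y) modeled as a linear function of temperature
linear_fit = sm.OLS(td, X).fit()

# Fit from logistic regression: binomial random component with logit link
logistic_fit = sm.GLM(td, X, family=sm.families.Binomial()).fit()

print(linear_fit.params)    # intercept and slope of the linear fit
print(logistic_fit.params)  # alpha and beta on the logit scale
```

Note that the linear fit can predict values outside [0, 1] at extreme temperatures, one motivation for the logistic model introduced below.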


Example: Horseshoe Crabs


• Each female horseshoe crab in the study had a male crab attached to her in her nest.
• The study investigated factors that affect whether the female crab had any other males, called satellites, residing nearby.
• Explanatory variables included the female crab’s color, spine
condition, weight, and carapace width.
• The response outcome for each female crab is her number of
satellites.


Example Continued
• We consider the width alone as a predictor.
• To obtain a clearer picture, we grouped the female crabs into a set of width categories (in cm): ≤ 23.25, 23.25–24.25, 24.25–25.25, 25.25–26.25, 26.25–27.25, 27.25–28.25, 28.25–29.25, > 29.25.
• We then calculated the sample mean number of satellites for the female crabs in each category.


Components of a GLM
• Random component
– Identifies the response variable Y and assumes a probability distribution for it
• Systematic component
– Specifies the explanatory variables used as predictors in the model
• Link
– Describes the functional relation between the systematic component and the expected value of the random component
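As an illustration of how the three components map onto software, here is a minimal sketch, assuming Python with statsmodels and using hypothetical data (the notes do not prescribe any particular package):

```python
import numpy as np
import statsmodels.api as sm

# hypothetical binary response and single predictor
y = np.array([0, 1, 0, 1, 1])
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

model = sm.GLM(
    y,                                    # random component: the response Y
    sm.add_constant(x),                   # systematic component: alpha + beta*x
    family=sm.families.Binomial(          # assumed probability distribution for Y
        link=sm.families.links.Logit()),  # link: g(mu) = logit(mu)
)
print(model.fit().params)
```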


Random Component
• Let Y1, · · · , YN denote the N observations on the response variable Y.
• The random component specifies a probability distribution for Y1, · · · , YN.
• If the potential outcomes for each observation Yi are binary, such as "success" or "failure", or, more generally, if each Yi is the number of "successes" out of a certain fixed number of trials, we can assume a binomial distribution for the random component.
• If each response observation is a non-negative count, such as a cell count in a contingency table, then we may assume a Poisson distribution for the random component.


Systematic Component
• The systematic component specifies the explanatory variables.
• It specifies the variables that play the roles of xj in the formula α + β1 x1 + · · · + βk xk.
• This linear combination of explanatory variables is called the linear predictor.
• Some xj may be based on others in the model; for instance, perhaps x3 = x1 x2, to allow interaction between x1 and x2 in their effects on Y, or perhaps x3 = x1^2, to allow a curvilinear effect of x1.


Link
• It specifies how µ = E(Y) relates to the explanatory variables in the linear predictor.
• The model formula states that

g(µ) = α + β1 x1 + · · · + βk xk

The function g(·) is called the link function.


Some Popular Link Functions


• Identity link

g(µ) = µ = α + β1 x1 + · · · + βk xk

• Log link

g(µ) = log(µ) = α + β1 x1 + · · · + βk xk

• Logit link

g(µ) = log[µ/(1 − µ)] = α + β1 x1 + · · · + βk xk
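Numerically, each link maps the mean scale onto the linear-predictor scale and has an inverse mapping back; a small sketch in Python/numpy (an illustration, not part of the original notes):

```python
import numpy as np

identity = lambda mu: mu                     # identity link
log_link = lambda mu: np.log(mu)             # log link; inverse is exp(eta)
logit    = lambda mu: np.log(mu / (1 - mu))  # logit link; inverse is 1/(1+exp(-eta))

eta = logit(0.25)             # mean 0.25 mapped to the linear-predictor scale
mu  = 1 / (1 + np.exp(-eta))  # inverse logit recovers 0.25
print(eta, mu)
```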


More On Link Functions · · ·


• Each potential probability distribution has one special function
of the mean that is called its natural parameter.
• For the normal distribution, it is the mean itself.
• For the Poisson, the natural parameter is the log of the mean.
• For the binomial, the natural parameter is the logit of the success probability.
• The link function that uses the natural parameter as g(µ) in the GLM is called the canonical link.
• Though other links are possible, in practice the canonical links are most common.


GLM For Binary Data: Random Component


• The distribution of a binary response is specified by
probabilities P (Y = 1) = π of success and P (Y = 0) = 1 − π of
failure.
• For n independent observations on a binary response with
parameter π, the number of successes has the binomial
distribution specified by parameters n and π.


Linear Probability Model


• To model the effect of X, use ordinary linear regression, in which the expected value of Y is a linear function of X.
• The model

π(x) = α + βx

is called a linear probability model.
• Probabilities fall between 0 and 1, but for large or small values of x the model may predict π(x) < 0 or π(x) > 1.
• This model is valid only for a finite range of x values.


Example: Snoring

                              Heart Disease
Snoring               Yes     No    Proportion Yes   Linear Fit
Never                  24   1355         0.017          0.017
Occasional             35    603         0.055          0.057
Nearly Every Night     21    192         0.099          0.096
Every Night            30    224         0.118          0.116


Example: Snoring, Continued
• We use scores (0, 2, 4, 5) for the snoring categories, treating the last two levels as closer together.
• The linear fit obtained by maximizing the likelihood is

π(x) = 0.0172 + 0.0198x

• The least squares fit is slightly different (see the sketch below).
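The ML fit can be reproduced by maximizing the binomial log-likelihood directly; a minimal sketch, assuming Python with numpy and scipy (not part of the original notes):

```python
import numpy as np
from scipy.optimize import minimize

x   = np.array([0.0, 2.0, 4.0, 5.0])    # snoring scores
yes = np.array([24, 35, 21, 30])        # heart disease: yes
n   = np.array([1379, 638, 213, 254])   # group totals (yes + no)

def negloglik(par):
    a, b = par
    pi = a + b * x                       # linear probability model
    return -np.sum(yes * np.log(pi) + (n - yes) * np.log(1 - pi))

fit = minimize(negloglik, x0=[0.02, 0.02], method="Nelder-Mead")
print(fit.x)   # approximately (0.0172, 0.0198), as quoted above

# Weighted least squares on the sample proportions (equivalent to ordinary
# least squares on the individual 0/1 responses) gives a slightly different fit:
print(np.polyfit(x, yes / n, 1, w=np.sqrt(n)))
```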

Logistic Regression Model
• The relationship between π(x) and x is usually nonlinear rather than linear. The most important function describing this nonlinear relationship has the form

log(π(x)/(1 − π(x))) = α + βx

• That is,

π(x) = F0(α + βx), where F0(x) = e^x/(1 + e^x) = 1/(1 + e^(−x)),

where F0(x) is the cdf of the logistic distribution. Its pdf is F0(x)(1 − F0(x)).
• The associated GLM is called the logistic regression model.
• Logistic regression models are often referred to as logit models, as the link in this GLM is the logit link.

Parameters
• The parameter β determines the rate of increase or decrease of
the curve.
• When β > 0, π(x) increases with x.
• When β < 0, π(x) decreases as x increases.
• The magnitude of β determines how fast the curve increases or
decreases.
• As |β| increases, the curve has a steeper rate of change.


Example: Snoring

                              Heart Disease
Snoring               Yes     No    Proportion Yes   Linear Fit   Logit Fit
Never                  24   1355         0.017          0.017       0.021
Occasional             35    603         0.055          0.057       0.044
Nearly Every Night     21    192         0.099          0.096       0.093
Every Night            30    224         0.118          0.116       0.132


Effect of Parameters
[figure omitted]


Alternative Binary Links


• For logistic regression curves, the probability of a success
increases or decreases continuously as x increases.
• Let X denote a random variable; its cumulative distribution function (cdf) F(x) is defined as

F(x) = P(X ≤ x), −∞ < x < ∞

• Such a function, plotted as a function of x, has an appearance like that of the logistic function in the previous figures.
• It suggests a class of models for binary responses of the form

π(x) = F (α + βx)

where F is a cdf for some distribution.


Alternative Binary Links


• The logistic regression curve has this form.
• When β > 0, π(x) = F (α + βx) has the shape of the cdf of the
two-parameter logistic distribution.
• When β < 0, 1 − π(x) = 1 − F (α + βx) has the shape of the cdf
of the two-parameter logistic distribution.
• Each choice of α and β > 0 corresponds to a different logistic
distribution.
• The logistic cdf F0(x) corresponds to a probability density F0(x)(1 − F0(x)) that is symmetric and bell-shaped, looking very similar to a normal distribution.


Probit Models
• The probability of success, π(x), has the form Φ(α + βx) where
Φ is the cdf of a standard normal distribution N (0, 1).
• The link function is known as the probit link: g(π) = Φ^(−1)(π).
• The probit transform maps π(x) so that the regression curve for π(x) (or for 1 − π(x), when β < 0) has the appearance of a normal cdf with mean µ = −α/β and standard deviation σ = 1/|β|.


Example: Snoring

                              Heart Disease
Snoring               Yes     No   Proportion Yes  Linear Fit  Logit Fit  Probit Fit
Never                  24   1355        0.017         0.017      0.021       0.020
Occasional             35    603        0.055         0.057      0.044       0.046
Nearly Every Night     21    192        0.099         0.096      0.093       0.095
Every Night            30    224        0.118         0.116      0.132       0.131
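The logit and probit fit columns can be reproduced from the grouped counts; a sketch assuming statsmodels, whose two-column response form passes (successes, failures):

```python
import numpy as np
import statsmodels.api as sm

x = np.array([0.0, 2.0, 4.0, 5.0])          # snoring scores
counts = np.array([[24, 1355], [35, 603],
                   [21, 192], [30, 224]])   # (yes, no) in each group
X = sm.add_constant(x)

logit_fit  = sm.GLM(counts, X, family=sm.families.Binomial()).fit()
probit_fit = sm.GLM(counts, X,
                    family=sm.families.Binomial(
                        link=sm.families.links.Probit())).fit()

print(logit_fit.predict(X))   # approx. (0.021, 0.044, 0.093, 0.132)
print(probit_fit.predict(X))  # approx. (0.020, 0.046, 0.095, 0.131)
```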


GLM for Count Data


• Many discrete response variables have counts as possible
outcomes.
– For a sample of cities worldwide, each observation might be
the number of automobile thefts in 2003.
– For a sample of silicon wafers used in computer chips, each
observation might be the number of imperfections on a
wafer.
• We have earlier seen the Poisson distribution as a sampling
model for counts.


Poisson Regression
• Assume a Poisson distribution for the random component.
• One can model the Poisson mean using the identity link.
• But it is more common to model the log of the mean.
• A Poisson loglinear model is a GLM that assumes a Poisson distribution for Y and uses the log link.


Poisson Regression - Continued


• Let µ denote the expected value of Y and let X denote an
explanatory variable.
• Then the Poisson log-linear model has the form

log(µ) = α + βx

• For this model, µ = exp(α + βx) = e^α (e^β)^x.
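So each unit increase in x multiplies the mean by e^β. A sketch of fitting such a model on simulated data, assuming statsmodels (the coefficients below are made up for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
width = rng.uniform(21.0, 34.0, size=100)   # hypothetical carapace widths
mu = np.exp(-3.0 + 0.16 * width)            # assumed true log-linear model
y = rng.poisson(mu)                         # simulated satellite counts

fit = sm.GLM(y, sm.add_constant(width),
             family=sm.families.Poisson()).fit()   # log link is the default
print(fit.params)             # estimates of (alpha, beta)
print(np.exp(fit.params[1]))  # multiplicative effect on the mean per unit width
```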

Example: Horseshoe Data
[figure omitted]


Poisson Regression For Rate Data


• For certain types of events that occur over time, space, or some other index of size, it is often relevant to model the rate at which the events occur.
• In modeling numbers of auto thefts in 2003 for a sample of
cities, we could form a rate for each city by dividing the
number of thefts by the city’s population size.
• The model describes how the rate depends on some other
explanatory variables.


Poisson Regression For Rate Data


• When a response count Yi has index (such as population size)
equal to ti , the sample rate of outcomes is Yi /ti .
• The expected value of the rate is µi /ti .
• A log-linear model for the expected rate has form:

log(µi /ti ) = α + βxi

• This has an equivalent representation:

log µi − log ti = α + βxi

• The adjustment term, −log ti, in the above equation is called an offset.
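In software the offset is supplied as a term with coefficient fixed at 1, since log µi = log ti + α + βxi. A sketch with made-up city data, assuming statsmodels:

```python
import numpy as np
import statsmodels.api as sm

thefts = np.array([120, 310, 45, 80])             # hypothetical counts Y_i
pop    = np.array([50000, 120000, 20000, 40000])  # index t_i (population size)
x      = np.array([1.2, 3.4, 0.8, 1.5])           # hypothetical covariate

fit = sm.GLM(thefts, sm.add_constant(x),
             family=sm.families.Poisson(),
             offset=np.log(pop)).fit()   # log t_i enters with coefficient 1
print(fit.params)                        # alpha and beta for the rate model
```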


Exponential Family
• The random variable Y has a distribution in the exponential family if its p.d.f. (or p.m.f.) can be written as

f(y; θ, ϕ) = exp{(yθ − b(θ))/a(ϕ) + c(y, ϕ)}

for some specific functions a(ϕ), b(θ), and c(y, ϕ).
• The parameter θ is called the natural parameter, and ϕ is called the dispersion (or scale) parameter.


Examples: Normal Distribution


• The p.d.f. of N(µ, σ²):

f(y; θ, ϕ) = (1/√(2πσ²)) exp{−(y − µ)²/(2σ²)}
           = exp{(yµ − µ²/2)/σ² − (y²/σ² + log(2πσ²))/2}

• Here θ = µ, ϕ = σ², a(ϕ) = ϕ, b(θ) = θ²/2, and c(y, ϕ) = −{y²/σ² + log(2πσ²)}/2.
• The canonical link: g(µ) = µ.


Examples: Binomial Distribution


• The p.m.f. of Bernoulli(π):

f(y; θ, ϕ) = π^y (1 − π)^(1−y) = (1 − π) (π/(1 − π))^y
           = exp{y log(π/(1 − π)) − log(1/(1 − π))}

• Here θ = log(π/(1 − π)), ϕ = 1, b(θ) = log(1 + e^θ), a(ϕ) = 1, and c(y, ϕ) = 0.
• The canonical link: g(π) = log(π/(1 − π)).


Examples: Poisson Distribution


• The p.m.f. of Poisson(λ):

f(y; θ, ϕ) = e^(−λ) λ^y / y!
           = exp{y log λ − λ − log y!}

• Here θ = log λ, ϕ = 1, b(θ) = e^θ, a(ϕ) = 1, and c(y, ϕ) = −log y!.
• The canonical link: g(λ) = log(λ).


Log-Likelihood Functions
• The log-likelihood function is

l(θ, ϕ; y) = log f(y; θ, ϕ) = (yθ − b(θ))/a(ϕ) + c(y, ϕ)

• We use general likelihood results applicable to exponential families:

E(∂l/∂θ) = 0 and E(∂²l/∂θ²) = −E[(∂l/∂θ)²]

• Here

∂l/∂θ = (y − b′(θ))/a(ϕ) and ∂²l/∂θ² = −b′′(θ)/a(ϕ)


Mean and Variances


• Now we have 0 = E(∂l/∂θ) = {E(Y) − b′(θ)}/a(ϕ).
• So E(Y) = b′(θ).
• Similarly,

var(Y)/a²(ϕ) = b′′(θ)/a(ϕ)

• So var(Y) = b′′(θ) a(ϕ).


Examples
• Normal distribution: b(θ) = θ²/2
– E(Y) = b′(θ) = θ = µ
– var(Y) = b′′(θ) a(ϕ) = ϕ = σ².
• Bernoulli distribution: b(θ) = log(1 + e^θ)
– E(Y) = b′(θ) = e^θ/(1 + e^θ) = π.
– var(Y) = b′′(θ) = e^θ/(1 + e^θ)² = π(1 − π).
• Poisson distribution: b(θ) = exp(θ)
– E(Y) = b′(θ) = exp(θ) = λ.
– var(Y) = b′′(θ) = exp(θ) = λ.
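These derivatives are easy to verify symbolically; a small sketch assuming Python with sympy (not part of the original notes):

```python
import sympy as sp

theta = sp.symbols('theta')

for b in (theta**2 / 2,               # normal
          sp.log(1 + sp.exp(theta)),  # Bernoulli
          sp.exp(theta)):             # Poisson
    mean = sp.simplify(sp.diff(b, theta))     # E(Y) = b'(theta)
    bpp  = sp.simplify(sp.diff(b, theta, 2))  # var(Y) = b''(theta) a(phi)
    print(mean, bpp)                          # a(phi) = 1 except for the normal
```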


Likelihood Equations in GLMs


• Let (y1 , · · · , yN ) denote responses for N independent
observations.
• Let (xi1 , · · · , xip ) denote values of p explanatory variables for
observation i.
• The systematic component for the i-th observation is

ηi = β1 xi1 + · · · + βp xip = Σj βj xij

• If E(Yi) = µi, then the link for the i-th observation is

ηi = g(µi)


Likelihood Equations in GLMs


• For N independent observations, the log-likelihood function is

L(β) = Σi Li = Σi log f(yi; θi, ϕ) = Σi (yi θi − b(θi))/a(ϕ) + Σi c(yi, ϕ)

where the sums run over i = 1, · · · , N.
• The notation L(β) reflects the dependence of θ on the model parameters β.


Likelihood Equations in GLMs


• The likelihood equations are

∂L(β)/∂βj = Σi ∂Li/∂βj = 0, for j = 1, · · · , p.

• Simplifying, we have

Σi [(yi − µi) xij / var(Yi)] (∂µi/∂ηi) = 0

for j = 1, · · · , p. Notice that ∂µi/∂ηi = 1/g′(µi).


Examples
• Logit model:

Σi (yi − πi) xij = 0, where πi = exp(Σj βj xij) / [1 + exp(Σj βj xij)]

• Probit model:

Σi [(yi − πi) xij / (πi(1 − πi))] φ(Σk βk xik) = 0, where πi = Φ(Σj βj xij)

(here φ and Φ denote the standard normal pdf and cdf)
• Log-linear model:

Σi (yi − exp(Σk βk xik)) xij = 0
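These equations have no closed-form solution in general, but for the logit model they are easily solved by Newton–Raphson; a minimal numpy sketch (an illustration, not part of the original notes) using the Challenger data tabulated at the start of these notes:

```python
import numpy as np

# Challenger data: temperature and thermal distress indicator
temp = np.array([66, 70, 69, 68, 67, 72, 73, 70, 57, 63, 70, 78,
                 67, 53, 67, 75, 70, 81, 76, 79, 75, 76, 58], dtype=float)
y = np.array([0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0,
              0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1], dtype=float)

X = np.column_stack([np.ones_like(temp), temp])
beta = np.zeros(2)
for _ in range(25):                       # Newton-Raphson iterations
    pi = 1 / (1 + np.exp(-X @ beta))      # current fitted probabilities
    score = X.T @ (y - pi)                # the likelihood equations
    W = pi * (1 - pi)                     # binomial variances var(Y_i)
    info = X.T @ (W[:, None] * X)         # Fisher information matrix
    beta += np.linalg.solve(info, score)

print(beta)   # solves sum_i (y_i - pi_i) x_ij = 0 for j = 1, 2
```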


Maximum Likelihood Estimates


• ML estimates of βj ’s are obtained by solving the likelihood
equations using numerical methods.
• The ML estimates β̂j ’s are approximately normally distributed.
• Thus, a confidence interval for a model parameter βj equals

β̂j ± zα/2 ASE

where ASE is the asymptotic standard error of β̂j .


Testing For Significance


• To test H0 : βj = 0.
• The test statistic Z = β̂j/ASE has an approximate standard normal distribution when H0 is true.
• Equivalently, Z² has a chi-squared distribution with d.f. = 1, which can be used for two-sided alternatives.
• This type of test is known as the Wald test (see the sketch following the likelihood-ratio test below).


Likelihood Ratio Test


• The likelihood-ratio test statistic equals

−2 log(L0/L1) = −2[log L0 − log L1] = −2[l0 − l1]

where L0 and L1 are the maximized likelihood functions under the null hypothesis and under the full model, respectively.
• Under H0, this test statistic also has a large-sample chi-squared distribution with d.f. = 1.
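Both tests are straightforward from fitted models; a sketch for H0 : β = 0 in the Challenger logistic regression, assuming statsmodels and scipy (not part of the original notes):

```python
import numpy as np
import scipy.stats as st
import statsmodels.api as sm

temp = np.array([66, 70, 69, 68, 67, 72, 73, 70, 57, 63, 70, 78,
                 67, 53, 67, 75, 70, 81, 76, 79, 75, 76, 58], dtype=float)
y = np.array([0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0,
              0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1])

full = sm.GLM(y, sm.add_constant(temp), family=sm.families.Binomial()).fit()
null = sm.GLM(y, np.ones((len(y), 1)), family=sm.families.Binomial()).fit()

z = full.params[1] / full.bse[1]     # Wald statistic Z = beta_hat / ASE
lrt = -2 * (null.llf - full.llf)     # -2[l0 - l1]
print(2 * st.norm.sf(abs(z)))        # two-sided Wald p-value
print(st.chi2.sf(lrt, df=1))         # LRT p-value, chi-squared with d.f. = 1
```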


Score Test
• The score statistic or efficient score statistic uses the size of the
derivative of the log-likelihood function evaluated at βj = 0.
• The score statistic is the square of the ratio of this derivative to
its ASE.
• It also has an approximate chi-squared distribution.


Example
• In simple logistic regression with one explanatory variable, the log-likelihood function is

l(α, β) = Σi {yi(α + βxi) − log(1 + exp(α + βxi))}

• Therefore, for H0 : β = 0 versus H1 : β ≠ 0, we have

l0 = Σi {yi log(ȳ/(1 − ȳ)) − log(1/(1 − ȳ))}
l1 = Σi {yi(α̂ + β̂xi) − log(1 + exp(α̂ + β̂xi))}

where the sums run over i = 1, · · · , N.


Model Residuals
• For the i-th observation, the raw residual is

ri = yi − µ̂i = observed − fitted,

where yi is the observed response and µ̂i is the fitted value from the model.
• The Pearson residual is defined as

Pearson residual = (observed − fitted)/√(estimated var(observed)) = (yi − µ̂i)/√(var̂(yi))

• For Poisson GLMs, it simplifies to

ei = (yi − µ̂i)/√µ̂i


Adjusted Residuals
• The Pearson residual divided by its estimated standard error is called the adjusted residual.
• Adjusted residuals have an approximate standard normal distribution.
• For Poisson GLMs, the general form of the adjusted residual is

(yi − µ̂i)/√(µ̂i(1 − hi)) = ei/√(1 − hi)

where hi is called the leverage of observation i.
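For a Poisson log-linear model, the leverages hi are the diagonal of the hat matrix H = W^(1/2) X (X′WX)^(−1) X′ W^(1/2) with W = diag(µ̂i), the weighted least-squares problem solved at convergence. A numpy sketch with made-up fitted values (in practice µ̂ comes from the fitted model):

```python
import numpy as np

# hypothetical design matrix, counts, and converged fitted means
X = np.column_stack([np.ones(5), np.array([1.0, 2.0, 3.0, 4.0, 5.0])])
y = np.array([2.0, 3.0, 6.0, 7.0, 12.0])
mu_hat = np.exp(X @ np.array([0.3, 0.45]))   # assumed fitted Poisson means

e = (y - mu_hat) / np.sqrt(mu_hat)           # Pearson residuals

Xw = np.sqrt(mu_hat)[:, None] * X            # W^(1/2) X
h = np.einsum('ij,ij->i',
              Xw @ np.linalg.inv(X.T @ (mu_hat[:, None] * X)), Xw)  # leverages

adjusted = e / np.sqrt(1 - h)                # adjusted residuals
print(adjusted)
```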
