
A Generalized Linear Model for Bernoulli Response Data

Copyright © 2017 Dan Nettleton (Iowa State University), Statistics 510

Consider the Gauss-Markov linear model with normal errors:

$$y = X\beta + \epsilon, \quad \epsilon \sim N(0, \sigma^2 I).$$

Another way to write this model is

$$\forall\, i = 1, \dots, n, \quad y_i \sim N(\mu_i, \sigma^2), \quad \mu_i = x_i'\beta,$$

and $y_1, \dots, y_n$ are independent.


This is a special case of what is known as a
generalized linear model.

Here is another special case:

$$\forall\, i = 1, \dots, n, \quad y_i \sim \text{Bernoulli}(\pi_i), \quad \pi_i = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)},$$

and $y_1, \dots, y_n$ are independent.


In each example, all responses are independent, and
each response is a draw from one type of distribution
whose parameters may depend on explanatory
variables through a linear predictor $x_i'\beta$.


The second model, for the case of a binary response,
is often called a logistic regression model.

Binary responses are common (success/failure, survive/die, good customer/bad customer, win/lose, etc.).

The logistic regression model can help us understand how explanatory variables are related to the probability of “success.”


Example: Disease Outbreak Study

Source: Applied Linear Statistical Models, 4th edition, by Neter, Kutner, Nachtsheim, Wasserman (1996)

In a health study to investigate an epidemic outbreak of a disease that is spread by mosquitoes, individuals were randomly sampled within two sectors in a city to determine if the person had recently contracted the disease under study.


Response Variable

$y_i = 0$ (person $i$ does not have the disease)

$y_i = 1$ (person $i$ has the disease)


Potential Explanatory Variables

age in years

socioeconomic status

1 = upper
2 = middle
3 = lower

sector (1 or 2)


Questions of Interest

The potential explanatory variables and the response were recorded for 196 randomly selected individuals.

Are any of these variables associated with the probability of disease, and if so, how?


We will demonstrate how to use R to fit a logistic
regression model to this dataset.

Before delving more deeply into logistic regression, we will review the basic facts of the Bernoulli distribution.


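$y \sim \text{Bernoulli}(\pi)$ has probability mass function

$$\Pr(y = k) = f(k) = \begin{cases} \pi^k (1-\pi)^{1-k} & \text{for } k \in \{0, 1\} \\ 0 & \text{otherwise.} \end{cases}$$

Thus,

$$\Pr(y = 0) = f(0) = \pi^0 (1-\pi)^{1-0} = 1 - \pi$$

and

$$\Pr(y = 1) = f(1) = \pi^1 (1-\pi)^{1-1} = \pi.$$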
The variance of $y$ is a function of the mean of $y$:

$$E(y) = \sum_{k=0}^{1} k f(k) = 0 \cdot (1-\pi) + 1 \cdot \pi = \pi$$

$$E(y^2) = \sum_{k=0}^{1} k^2 f(k) = 0^2 \cdot (1-\pi) + 1^2 \cdot \pi = \pi$$

$$\text{Var}(y) = E(y^2) - [E(y)]^2 = \pi - \pi^2 = \pi(1-\pi)$$
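As a quick check of these formulas, the following R sketch (not part of the original slides) evaluates the Bernoulli pmf with `dbinom(k, size = 1, prob = p)` and computes $E(y)$ and $\text{Var}(y)$ directly from the definitions. The function name `bernoulli_moments` is hypothetical and the value of `p` is arbitrary.

```r
# Bernoulli(p) mean and variance computed from the pmf, as in the slide.
# dbinom(k, size = 1, prob = p) is the Bernoulli(p) pmf f(k).
bernoulli_moments <- function(p) {
  k  <- 0:1
  f  <- dbinom(k, size = 1, prob = p)  # f(0) = 1 - p, f(1) = p
  m1 <- sum(k * f)                     # E(y)   = sum_k k f(k)
  m2 <- sum(k^2 * f)                   # E(y^2) = sum_k k^2 f(k)
  c(mean = m1, var = m2 - m1^2)        # Var(y) = E(y^2) - [E(y)]^2
}

bernoulli_moments(0.3)  # mean should equal 0.3, variance 0.3 * 0.7
```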


The Logistic Regression Model

For $i = 1, \dots, n$, $y_i \sim \text{Bernoulli}(\pi_i)$, where

$$\pi_i = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)}$$

and $y_1, \dots, y_n$ are independent.
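To make the model concrete, here is a minimal R sketch that simulates independent Bernoulli responses from it and refits the model with `glm()`. The true coefficients, sample size, seed, and covariate distribution are hypothetical choices for illustration only; with a large `n`, the estimates should land near the true values.

```r
# Simulate y_i ~ Bernoulli(pi_i), pi_i = exp(x_i'beta) / (1 + exp(x_i'beta)),
# then refit with glm(). All settings below are hypothetical.
set.seed(510)
n    <- 5000
x    <- runif(n, min = -2, max = 2)
X    <- cbind(1, x)                    # design matrix with intercept column
beta <- c(-1, 0.5)                     # hypothetical true beta
eta  <- drop(X %*% beta)               # linear predictors x_i'beta
pi_i <- exp(eta) / (1 + exp(eta))      # success probabilities
y    <- rbinom(n, size = 1, prob = pi_i)  # independent Bernoulli draws

fit <- glm(y ~ x, family = binomial(link = "logit"))
coef(fit)  # should be close to c(-1, 0.5)
```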


The Logit Function

The function

$$g(\pi) = \log\left(\frac{\pi}{1-\pi}\right)$$

is called the logit function.

The logit function maps the interval $(0, 1)$ to the real line $(-\infty, \infty)$.

Because $\pi$ is a probability, $\log\left(\frac{\pi}{1-\pi}\right)$ is the log(odds), where the odds of an event $A \equiv \frac{\Pr(A)}{1 - \Pr(A)}$.
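In R, the logit and its inverse are available as `qlogis()` and `plogis()` (the quantile function and CDF of the standard logistic distribution). The sketch below writes both out from the formulas above, under the hypothetical names `logit` and `inv_logit`, and checks them against those built-ins.

```r
# logit(p) = log(p / (1 - p)) maps (0, 1) onto the real line;
# inv_logit(eta) = exp(eta) / (1 + exp(eta)) maps it back.
logit     <- function(p) log(p / (1 - p))
inv_logit <- function(eta) exp(eta) / (1 + exp(eta))

p <- c(0.01, 0.25, 0.5, 0.75, 0.99)
logit(p)             # negative below 0.5, zero at 0.5, positive above
inv_logit(logit(p))  # recovers p
qlogis(p)            # base R equivalent of logit(p)
```

In practice `plogis()` is the safer inverse: `exp(eta)` overflows for very large `eta`, which makes `inv_logit()` return `NaN`, while `plogis()` returns 1.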


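Note that

$$\begin{aligned}
g(\pi_i) &= \log\left(\frac{\pi_i}{1-\pi_i}\right) \\
&= \log\left[\frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)} \Big/ \frac{1}{1 + \exp(x_i'\beta)}\right] \\
&= \log[\exp(x_i'\beta)] = x_i'\beta.
\end{aligned}$$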


Thus, the logistic regression model says that

$$y_i \sim \text{Bernoulli}(\pi_i), \quad \text{where} \quad \log\left(\frac{\pi_i}{1-\pi_i}\right) = x_i'\beta.$$

In generalized linear model terminology, the logit is called the link function because it “links” the mean of $y_i$ (i.e., $\pi_i$) to the linear predictor $x_i'\beta$.


For generalized linear models, it is not necessary that the mean of $y_i$ be a linear function of $\beta$.

Rather, some function of the mean of $y_i$ is a linear function of $\beta$.

For logistic regression, that function is

$$\text{logit}(\pi_i) = \log\left(\frac{\pi_i}{1-\pi_i}\right) = x_i'\beta.$$


When the response is Bernoulli or, more generally, binomial, the logit link function is one natural choice. However, other link functions can be considered.

Some common choices (that are also available in R) include the following:


1. logit:
$$\log\left(\frac{\pi}{1-\pi}\right) = x'\beta.$$

2. probit:
$$\Phi^{-1}(\pi) = x'\beta,$$
where $\Phi^{-1}(\cdot)$ is the inverse of the $N(0, 1)$ CDF.

3. Complementary log-log (cloglog in R):
$$\log(-\log(1-\pi)) = x'\beta.$$
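Each of these links can be inspected through R's `make.link()`, which returns the link function (`linkfun`, mapping $\pi$ to $x'\beta$) and its inverse (`linkinv`). The same link names are what `binomial(link = ...)` accepts when fitting with `glm()`. A short sketch, with an arbitrary $\pi = 0.3$:

```r
# The three links for binary data via make.link():
# linkfun maps pi to the linear predictor; linkinv maps it back.
links <- c("logit", "probit", "cloglog")
p <- 0.3

eta <- sapply(links, function(l) make.link(l)$linkfun(p))
eta   # log(p/(1-p)), qnorm(p), and log(-log(1-p)), respectively

# Each inverse link recovers p
back <- sapply(links, function(l) make.link(l)$linkinv(eta[[l]]))
back
```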


Although any of these link functions (or others) can be used, the logit link has some advantages when it comes to interpreting the results (as we will discuss later).

Thus, the logit link is a good choice if it can provide a good fit to the data.


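The log likelihood function for logistic regression is

$$\begin{aligned}
\ell(\beta \mid y) &= \sum_{i=1}^{n} \log\left[\pi_i^{y_i} (1-\pi_i)^{1-y_i}\right] \\
&= \sum_{i=1}^{n} \left[y_i \log(\pi_i) + (1-y_i) \log(1-\pi_i)\right] \\
&= \sum_{i=1}^{n} \left[y_i \{\log(\pi_i) - \log(1-\pi_i)\} + \log(1-\pi_i)\right] \\
&= \sum_{i=1}^{n} \left[y_i \log\left(\frac{\pi_i}{1-\pi_i}\right) + \log(1-\pi_i)\right] \\
&= \sum_{i=1}^{n} \left[y_i x_i'\beta - \log(1 + \exp\{x_i'\beta\})\right]
\end{aligned}$$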
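The final line of the derivation translates directly into R. This sketch (the function name `loglik_logistic` and the toy data are hypothetical) evaluates $\ell(\beta \mid y)$, checks it against the Bernoulli form on the first line, and confirms that at the `glm()` estimate it agrees with `logLik()`. `log1p(z)` computes $\log(1 + z)$.

```r
# l(beta | y) = sum_i [ y_i x_i'beta - log(1 + exp(x_i'beta)) ]
loglik_logistic <- function(beta, X, y) {
  eta <- drop(X %*% beta)          # linear predictors x_i'beta
  sum(y * eta - log1p(exp(eta)))
}

# Hypothetical toy data: intercept column plus one covariate
X <- cbind(1, c(-1, 0, 1, 2, 3))
y <- c(0, 0, 1, 0, 1)

# Matches sum_i log[pi_i^y_i (1 - pi_i)^(1 - y_i)] at an arbitrary beta
beta <- c(-0.5, 0.8)
pi_i <- plogis(drop(X %*% beta))
c(loglik_logistic(beta, X, y), sum(dbinom(y, size = 1, prob = pi_i, log = TRUE)))

# At the MLE from glm(), it matches logLik()
fit <- glm(y ~ X[, 2], family = binomial)
c(loglik_logistic(coef(fit), X, y), as.numeric(logLik(fit)))
```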


For generalized linear models, Fisher’s scoring method is typically used to obtain an MLE for $\beta$, denoted $\hat\beta$.

Fisher’s scoring method is a variation of the Newton-Raphson algorithm in which the Hessian matrix (the matrix of second partial derivatives of the log likelihood) is replaced by its expected value (the negative of the Fisher information matrix).


For generalized linear models, Fisher’s scoring method results in an iteratively reweighted least squares procedure.

The algorithm is presented for the general case in Section 2.5 of Generalized Linear Models, 2nd Edition (1989), by McCullagh and Nelder.
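A minimal sketch of Fisher scoring for the logit link, written as iteratively reweighted least squares: each step solves $(X'WX)\beta = X'Wz$ with weights $w_i = \pi_i(1-\pi_i)$ and working response $z_i = x_i'\beta + (y_i - \pi_i)/w_i$. Because the logit is the canonical link for Bernoulli data, the observed and expected Hessians coincide, so here the iteration is also Newton-Raphson. The function name `irls_logistic`, the toy data, and the stopping rule are illustrative assumptions; the sketch lacks the safeguards `glm()` uses and assumes the MLE exists (no separation).

```r
# Fisher scoring / IRLS for logistic regression (illustrative sketch).
irls_logistic <- function(X, y, tol = 1e-10, maxit = 25) {
  beta <- rep(0, ncol(X))                # start at beta = 0 (all pi_i = 0.5)
  for (iter in seq_len(maxit)) {
    eta <- drop(X %*% beta)              # linear predictor
    mu  <- plogis(eta)                   # fitted probabilities pi_i
    w   <- mu * (1 - mu)                 # weights: Var(y_i) = pi_i (1 - pi_i)
    z   <- eta + (y - mu) / w            # working response
    # Weighted least squares step: solve (X'WX) beta = X'Wz
    beta_new <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
    if (max(abs(beta_new - beta)) < tol) return(unname(beta_new))
    beta <- beta_new
  }
  warning("IRLS did not converge")
  unname(beta)
}

# Hypothetical toy data; the 0s and 1s overlap in x, so the MLE exists
x <- 1:8
y <- c(0, 0, 1, 0, 1, 0, 1, 1)
X <- cbind(1, x)
irls_logistic(X, y)
coef(glm(y ~ x, family = binomial))      # should agree closely
```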


For sufficiently large samples, $\hat\beta$ is approximately normal with mean $\beta$ and a variance-covariance matrix that can be approximated by the estimated inverse of the Fisher information matrix; i.e.,

$$\hat\beta \,\dot\sim\, N\left(\beta, \hat I^{-1}(\hat\beta)\right).$$


Inference can be conducted using the Wald approach or via likelihood ratio testing, as discussed in our course notes on likelihood-related topics.

For example, a Wald confidence interval for $c'\beta$ with approximate coverage probability of 0.95 is given by

$$c'\hat\beta \pm 1.96 \sqrt{c'\hat I^{-1}(\hat\beta)\, c}.$$
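In R, `vcov()` on a fitted `glm` object returns the estimated inverse Fisher information $\hat I^{-1}(\hat\beta)$, so the Wald interval can be computed directly. This sketch (the function name `wald_ci` and the toy data are hypothetical) uses `qnorm(0.975)`, the exact value that 1.96 rounds. For a single coefficient it reproduces `confint.default()`, which computes Wald intervals, unlike `confint()`, which profiles the likelihood for `glm` fits.

```r
# Wald interval: c'beta_hat +/- z * sqrt(c' Ihat^{-1}(beta_hat) c)
wald_ci <- function(fit, cvec, level = 0.95) {
  est <- sum(cvec * coef(fit))                       # c'beta_hat
  se  <- sqrt(drop(t(cvec) %*% vcov(fit) %*% cvec))  # estimated SE of c'beta_hat
  z   <- qnorm(1 - (1 - level) / 2)                  # about 1.96 for 95%
  c(lower = est - z * se, upper = est + z * se)
}

# Hypothetical toy data
x <- 1:10
y <- c(0, 0, 0, 1, 0, 1, 0, 1, 1, 1)
fit <- glm(y ~ x, family = binomial)

wald_ci(fit, c(0, 1))  # slope; same as confint.default(fit)["x", ]
wald_ci(fit, c(1, 5))  # linear predictor x'beta at x = 5
```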


Interpretation of Logistic Regression Parameters

Let $x = [x_1, x_2, \dots, x_{j-1}, x_j, x_{j+1}, \dots, x_p]'$.

Let $\tilde x = [x_1, x_2, \dots, x_{j-1}, x_j + 1, x_{j+1}, \dots, x_p]'$.

In other words, $\tilde x$ is the same as $x$ except that the $j$th explanatory variable has been increased by one unit.

Let $\pi = \frac{\exp(x'\beta)}{1 + \exp(x'\beta)}$ and $\tilde\pi = \frac{\exp(\tilde x'\beta)}{1 + \exp(\tilde x'\beta)}$.


The Odds Ratio

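$$\begin{aligned}
\frac{\tilde\pi / (1-\tilde\pi)}{\pi / (1-\pi)}
&= \exp\left\{\log\left(\frac{\tilde\pi / (1-\tilde\pi)}{\pi / (1-\pi)}\right)\right\} \\
&= \exp\left\{\log\left(\frac{\tilde\pi}{1-\tilde\pi}\right) - \log\left(\frac{\pi}{1-\pi}\right)\right\} \\
&= \exp\{\tilde x'\beta - x'\beta\} \\
&= \exp\{(x_j + 1)\beta_j - x_j \beta_j\} \\
&= \exp\{\beta_j\}.
\end{aligned}$$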


Thus,

$$\frac{\tilde\pi}{1-\tilde\pi} = \exp(\beta_j)\, \frac{\pi}{1-\pi}.$$

All other explanatory variables held constant, the odds of success at $x_j + 1$ are $\exp(\beta_j)$ times the odds of success at $x_j$.

This is true regardless of the initial value $x_j$.
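A numerical check of this invariance, using hypothetical coefficient values: the ratio of the odds at $\tilde x$ to the odds at $x$ is computed at several starting values of $x_j$ and compared with $\exp(\beta_j)$. The helper names `odds` and `odds_ratio_at` are illustrative.

```r
# The odds ratio for a one-unit increase in x_j should equal exp(beta_j)
# whatever the starting value of x_j. Coefficients are hypothetical.
odds <- function(p) p / (1 - p)
beta <- c(-2, 0.03, 1.2)   # hypothetical intercept, beta_j, and one other coefficient

odds_ratio_at <- function(xj, other = 1) {
  x       <- c(1, xj, other)       # x, with x_j in the second position
  x_tilde <- c(1, xj + 1, other)   # identical except x_j increased by one
  odds(plogis(sum(x_tilde * beta))) / odds(plogis(sum(x * beta)))
}

ors <- sapply(c(0, 20, 40, 80), odds_ratio_at)
ors           # every entry equals exp(beta[2])
exp(beta[2])
```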


A one-unit increase in the $j$th explanatory variable (with all other explanatory variables held constant) is associated with a multiplicative change in the odds of success by the factor $\exp(\beta_j)$.


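If $(L_j, U_j)$ is a $100(1-\alpha)\%$ confidence interval for $\beta_j$, then

$$(\exp(L_j), \exp(U_j))$$

is a $100(1-\alpha)\%$ confidence interval for $\exp(\beta_j)$. This holds because $\exp(\cdot)$ is strictly increasing:

$$\Pr(L_j \le \beta_j \le U_j) = 1 - \alpha \implies \Pr(\exp(L_j) \le \exp(\beta_j) \le \exp(U_j)) = 1 - \alpha.$$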


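Also, note that

$$\pi = \frac{\exp(x'\beta)}{1 + \exp(x'\beta)} = \frac{1}{\frac{1}{\exp(x'\beta)} + 1} = \frac{1}{1 + \exp(-x'\beta)}.$$

Thus, if $(L_j, U_j)$ is a $100(1-\alpha)\%$ confidence interval for $x'\beta$, then a $100(1-\alpha)\%$ confidence interval for $\pi$ is

$$\left(\frac{1}{1 + \exp(-L_j)}, \frac{1}{1 + \exp(-U_j)}\right).$$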


The dataset also contains a savings variable, which is ignored in this example.

> d=read.delim("http://dnett.github.io/S510/Disease.txt")
> head(d)
  id age ses sector disease savings
1  1  33   1      1       0       1
2  2  35   1      1       0       1
3  3   6   1      1       0       0
4  4  60   1      1       0       1
5  5  18   3      1       1       0
6  6  26   3      1       0       0
>
> d$ses=factor(d$ses)
> d$sector=factor(d$sector)


> o=glm(disease~age+ses+sector,
+       family=binomial(link=logit),
+       data=d)
>
> summary(o)

Call:
glm(formula = disease ~ age + ses + sector,
    family = binomial(link = logit),
    data = d)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.6576  -0.8295  -0.5652   1.0092   2.0842

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.293933   0.436769  -5.252  1.5e-07 ***
age          0.026991   0.008675   3.111 0.001862 **
ses2         0.044609   0.432490   0.103 0.917849
ses3         0.253433   0.405532   0.625 0.532011
sector2      1.243630   0.352271   3.530 0.000415 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 236.33  on 195  degrees of freedom
Residual deviance: 211.22  on 191  degrees of freedom
AIC: 221.22

Number of Fisher Scoring iterations: 3

Notes on this output: the design matrix $X$ has columns 1, age, ses2, ses3, and sector2. Each z value is the Wald statistic for testing $H_0: \beta_j = 0$. The null deviance belongs to the null model $\text{logit}(\pi_i) = \mu$ (i.e., $\pi_1 = \pi_2 = \dots = \pi_n$), with $n - 1 = 195$ degrees of freedom; the residual deviance belongs to the full model $\text{logit}(\pi_i) = x_i'\beta$, with $n - 5 = 191$ degrees of freedom. For 0/1 responses, the residual deviance equals $-2\ell(\hat\beta \mid y)$, so $\text{AIC} = -2\ell(\hat\beta \mid y) + 2 \cdot 5 = 211.22 + 10 = 221.22$. Deviance residuals and deviance will be covered later.
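Several numbers in this output can be reproduced by hand. The sketch below recomputes each z value as estimate divided by standard error, the two-sided p-value $2\Pr(Z \ge |z|)$ for $Z \sim N(0, 1)$, and the AIC. Because it starts from the printed, rounded values, the results match only to the printed precision.

```r
# Estimates and standard errors copied from the summary(o) output above
est <- c(-2.293933, 0.026991, 0.044609, 0.253433, 1.243630)
se  <- c( 0.436769, 0.008675, 0.432490, 0.405532, 0.352271)

z <- est / se               # Wald statistic for H0: beta_j = 0
p <- 2 * pnorm(-abs(z))     # two-sided p-value
round(cbind(z, p), 6)

# For 0/1 responses, residual deviance = -2 * loglik(beta_hat),
# so AIC = residual deviance + 2 * (number of coefficients)
211.22 + 2 * 5
```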

> coef(o)
(Intercept)         age        ses2        ses3     sector2
-2.29393347  0.02699100  0.04460863  0.25343316  1.24363036

These are $\hat\beta_1, \hat\beta_2, \hat\beta_3, \hat\beta_4, \hat\beta_5$.

> round(vcov(o),3)
            (Intercept)    age   ses2   ses3 sector2
(Intercept)       0.191 -0.002 -0.083 -0.102  -0.080
age              -0.002  0.000  0.000  0.000   0.000
ses2             -0.083  0.000  0.187  0.072   0.003
ses3             -0.102  0.000  0.072  0.164   0.039
sector2          -0.080  0.000  0.003  0.039   0.124

This is the estimated variance-covariance matrix of $\hat\beta$, i.e., $\hat I^{-1}(\hat\beta)$. The square roots of its diagonal elements are the standard errors reported by summary(o).

> confint(o)
Waiting for profiling to be done...
                  2.5 %      97.5 %
(Intercept) -3.19560769 -1.47574975
age          0.01024152  0.04445014
ses2        -0.81499026  0.89014587
ses3        -0.53951033  1.05825383
sector2      0.56319260  1.94992969

The sector2 row gives an approximate 95% confidence interval for $\beta_5$, the sector2 coefficient.

Fit a reduced model that excludes socioeconomic status (that is, drop ses2 and ses3, leaving a design matrix with columns 1, age, and sector2):

> oreduced=glm(disease~age+sector,
+              family=binomial(link=logit),
+              data=d)
>
> anova(oreduced,o,test="Chisq")
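Analysis of Deviance Table

Model 1: disease ~ age + sector
Model 2: disease ~ age + ses + sector
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1       193     211.64
2       191     211.22  2   0.4193   0.8109

This is a likelihood ratio test of $H_0: \beta_3 = \beta_4 = 0$ (the ses2 and ses3 coefficients). The residual degrees of freedom are $n - 3 = 193$ and $n - 5 = 191$. The Deviance column is the likelihood ratio statistic $-2\ell(\hat\beta_0 \mid y) - [-2\ell(\hat\beta \mid y)] = 211.64 - 211.22 \approx 0.4193$, where $\hat\beta_0$ is the MLE under the reduced model, and the p-value is $\Pr(\chi^2_{5-3} \ge 0.4193) = 0.8109$.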
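The p-value in this table can be recovered from the printed deviances with `pchisq()`. The inputs are copied from the output above, so the difference of the rounded residual deviances (0.42) differs slightly from the unrounded statistic 0.4193 that `anova()` reports.

```r
# Likelihood ratio test of H0: beta_3 = beta_4 = 0 from the printed numbers
dev_reduced <- 211.64   # residual deviance, disease ~ age + sector (193 df)
dev_full    <- 211.22   # residual deviance, disease ~ age + ses + sector (191 df)
dev_reduced - dev_full  # about 0.42; anova() reports the unrounded 0.4193

# Upper-tail chi-square probability on 193 - 191 = 2 degrees of freedom
pchisq(0.4193, df = 193 - 191, lower.tail = FALSE)
```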

Switch to the reduced model, because the test above gives no evidence that ses is needed (p = 0.8109):

> o=oreduced

> anova(o,test="Chisq")
Analysis of Deviance Table

Model: binomial, link: logit

Response: disease

Terms added sequentially (first to last)

       Df Deviance Resid. Df Resid. Dev  Pr(>Chi)
NULL                     195     236.33
age     1   12.013       194     224.32 0.0005283 ***
sector  1   12.677       193     211.64 0.0003702 ***

The p-values are $\Pr(\chi^2_1 \ge 12.013)$ for age and $\Pr(\chi^2_1 \ge 12.677)$ for sector (added after age).
> head(model.matrix(o))
(Intercept) age sector2
1 1 33 0
2 1 35 0
3 1 6 0
4 1 60 0
5 1 18 0
6 1 26 0
>
> b=coef(o)
> b
(Intercept)         age     sector2
-2.15965912  0.02681289  1.18169345

Likelihood-based (profile) confidence intervals:

> ci=confint(o)
Waiting for profiling to be done...
> ci
2.5 % 97.5 %
(Intercept) -2.86990940 -1.51605906
age 0.01010532 0.04421365
sector2 0.52854584 1.85407936

> #How should we interpret our estimate of
> #the slope coefficient on age?
> exp(b[2])
     age
1.027176
> #All else equal, the odds of disease are about 1.027
> #times greater for someone age x+1 than for someone
> #age x. An increase of one year in age is associated
> #with an increase in the odds of disease by about 2.7%.
> #A 95% confidence interval for the multiplicative
> #increase factor is
> exp(ci[2,])
   2.5 %   97.5 %
1.010157 1.045206

> #How should we interpret our estimate of
> #the slope coefficient on sector?
> exp(b[3])
sector2
3.25989
> #All else equal, the odds of disease are about 3.26
> #times greater for someone living in sector 2 than for
> #someone living in sector 1.
> #A 95% confidence interval for the multiplicative
> #increase factor is
> exp(ci[3,])
   2.5 %   97.5 %
1.696464 6.385816
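The same reasoning extends to changes larger than one unit: a $k$-unit increase in $x_j$ multiplies the odds by $\exp(k\beta_j) = \exp(\beta_j)^k$, and the endpoints of an interval for $\beta_j$ transform the same way. A sketch for a hypothetical 10-year age difference, using the age estimate and profile interval printed above:

```r
# Odds multiplier for a k-unit increase: exp(k * beta_j) = exp(beta_j)^k.
# Values copied from the reduced-model output; k = 10 is a hypothetical choice.
b_age  <- 0.02681289
ci_age <- c(0.01010532, 0.04421365)   # profile-likelihood 95% CI for beta_age
k <- 10

exp(k * b_age)    # estimated odds multiplier for a 10-year age difference
exp(k * ci_age)   # 95% CI for that multiplier
```

Under this fitted model, a 10-year age difference corresponds to odds of disease about 1.31 times greater, all else equal.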

> #Estimate the probability that a randomly
> #selected 40-year-old living in sector 2
> #has the disease.
> x=c(1,40,1)
> 1/(1+exp(-t(x)%*%b))
          [,1]
[1,] 0.5236198
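> #Approximate 95% confidence interval
> #for the probability in question.
> sexb=sqrt(t(x)%*%vcov(o)%*%x)
> cixb=c(t(x)%*%b-2*sexb,t(x)%*%b+2*sexb)
> 1/(1+exp(-cixb))
[1] 0.3965921 0.6476635

Here sexb is the estimated standard error of $x'\hat\beta$, and the interval $x'\hat\beta \pm 2\,\widehat{\text{SE}}(x'\hat\beta)$ uses 2 as a rounded value of 1.96.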
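The same calculation can be done with `predict()`, which returns $x'\hat\beta$ and its standard error on the link scale when called with `type = "link", se.fit = TRUE`. This sketch uses toy data and a hypothetical new covariate value, uses `qnorm(0.975)` in place of 2, and checks the standard error against the matrix formula used in the transcript. Forming the interval on the linear-predictor scale and then applying the inverse logit keeps both endpoints inside $(0, 1)$.

```r
# Hypothetical toy data
x <- 1:10
y <- c(0, 0, 0, 1, 0, 1, 0, 1, 1, 1)
fit <- glm(y ~ x, family = binomial)

# Estimate and SE of x'beta_hat at a hypothetical new covariate value
new <- data.frame(x = 5.5)
pr  <- predict(fit, newdata = new, type = "link", se.fit = TRUE)

ci_link <- pr$fit + c(-1, 1) * qnorm(0.975) * pr$se.fit  # interval for x'beta
plogis(ci_link)                                           # interval for pi

# Same SE from the matrix formula sqrt(x' Vhat x)
xv <- c(1, 5.5)
se_manual <- sqrt(drop(t(xv) %*% vcov(fit) %*% xv))
```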

> #Plot estimated probabilities as a function
> #of age for each sector.
>
> x=1:85
plot(x,1/(1+exp(-(b[1]+b[2]*x))),ylim=c(0,1),
type="l",col=4,lwd=2,xlab="Age",
ylab="Estimated Probability of Disease", cex.lab=1.3)
lines(x,1/(1+exp(-(b[1]+b[2]*x+b[3]))),col=2,lwd=2)
legend("topleft", legend=c("Sector 1","Sector 2"),
col=c(4,2),lwd=2)

[Figure: estimated probability of disease (0 to 1) versus age, with one curve for Sector 1 and one for Sector 2.]
