
Ch7 Logistic Regression

The Basic Situation:

Observe a set of n independent binary responses {x1, ⋯, xn}, where

    xi = 0, if the ith outcome is a "failure"
    xi = 1, if the ith outcome is a "success"

In addition to the binary response xi, we observe values of explanatory variables, say z′i = (z1i, ⋯, zri), which provide information about the conditions under which the ith response was obtained.

OBJECTIVE:

Construct a model for the conditional probabilities (π1, ⋯, πn) given z′i = (z1i, ⋯, zri):

    πi = Pr{xi = 1 │ z′i = (z1i, ⋯, zri)}
    1 − πi = Pr{xi = 0 │ z′i = (z1i, ⋯, zri)}

Linear models?

You might consider πi = β0 + β1 z1i + ⋯ + βr zri, i = 1, ⋯, n, and use least squares to obtain parameter estimates (b0, ⋯, br). One problem is that

    π̂i = b0 + b1 z1i + ⋯ + br zri

can be less than 0 or larger than 1.
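To see the problem concretely, here is a minimal sketch with made-up binary data (the values in zv and x are hypothetical); the least-squares fitted "probabilities" fall outside [0, 1] at the ends:

```python
import numpy as np

# Hypothetical binary responses x observed at covariate values zv.
zv = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
x  = np.array([0,   0,   0,   1,   1,   1  ])

# Least-squares line: pi-hat = b0 + b1 * zv
b1, b0 = np.polyfit(zv, x, 1)
fitted = b0 + b1 * zv
print(fitted)   # first value < 0 and last value > 1
```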

Suppose instead we set

    πi = Pr{xi = 1 │ z′i = (z1i, ⋯, zri)} = F(β0 + ∑_{j=1}^{r} βj zji)

where F(·) is a cumulative distribution function (c.d.f.) and β = (β0, β1, ⋯, βr)′ is a vector of unknown parameters. Clearly, 0 ≤ πi ≤ 1 for all zi, and πi can achieve any value between 0 and 1 where F is continuous. F(·) links the conditional probabilities {πi} to the parameters β; it is a special kind of "link function".

(1) Probit model:

Let

    F(w) = Φ(w) = ∫_{−∞}^{w} (1/√(2π)) e^{−u²/2} du

be the c.d.f. of the standard normal distribution. Then

    πi = Φ(β0 + ∑_{j=1}^{r} βj zji) = ∫_{−∞}^{β0 + ∑_{j=1}^{r} βj zji} (1/√(2π)) e^{−u²/2} du
(2) Logit model (logistic regression):

The c.d.f. of the "standard" logistic distribution is

    F(y) = Pr(Y ≤ y) = 1/(1 + e^{−y}) = e^{y}/(1 + e^{y})

and the logistic regression model is

    πi = F(β0 + ∑_{j=1}^{r} βj zji) = exp(β0 + ∑_{j=1}^{r} βj zji) / (1 + exp(β0 + ∑_{j=1}^{r} βj zji))

which is equivalent to

    ln(πi/(1 − πi)) = β0 + ∑_{j=1}^{r} βj zji,  i = 1, ⋯, n.

On the left side is the natural logarithm of the odds of success given z′i, which is called a logit. On the right side is a linear function of the parameters.

Probit and logit models are quite similar except for values of πi close to 0 or 1.

    Logit link:  πi = exp(β0 + ∑_{j=1}^{r} βj zji) / (1 + exp(β0 + ∑_{j=1}^{r} βj zji))

    Probit link: πi = ∫_{−∞}^{β0 + ∑_{j=1}^{r} βj zji} (1/√(2π)) e^{−u²/2} du
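The similarity is easy to check numerically. A minimal sketch (scipy's norm.cdf supplies Φ; the standard logistic and standard normal have different scales, so the comparison is about shape rather than exact agreement):

```python
import numpy as np
from scipy.stats import norm

# Evaluate both links over a grid of linear-predictor values
# eta = beta0 + sum_j beta_j * z_ji.
eta = np.linspace(-3.0, 3.0, 7)

pi_logit = np.exp(eta) / (1 + np.exp(eta))  # logistic c.d.f.
pi_probit = norm.cdf(eta)                   # standard normal c.d.f.

for e, pl, pp in zip(eta, pi_logit, pi_probit):
    print(f"eta = {e:5.2f}   logit: {pl:.4f}   probit: {pp:.4f}")
```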
Ex: Mortality of a certain species of beetle after 5 hours of exposure to gaseous carbon disulfide.

Dose (mg/liter)  Z = ln(dose)  Number killed  Number surviving  Pi           Empirical logit = ln(Pi/(1−Pi))
49.057           3.893          6             53                6/(6+53)     −2.179
52.991           3.970         13             47                13/(13+47)   −1.285
56.991           4.043         18             44                18/(18+44)   −0.894
60.542           4.103         28             28                28/(28+28)    0
64.359           4.164         52             11                52/(52+11)    1.553
68.891           4.233         53              6                53/(53+6)     2.179
72.611           4.258         61              1                61/(61+1)     4.111
76.542           4.338         60              0                60/(60+0)     undefined (Pi = 1)

Logistic regression model:

    πi = Pr{an insect dies after exposure to zi = ln(dosei)} = exp(β0 + β1 zi) / (1 + exp(β0 + β1 zi))

or

    ln(πi/(1 − πi)) = β0 + β1 zi,  zi = ln(dosei),  i = 1, 2, …, 8.

Here

    Pi = xi/ni = (number of dead insects for the ith dose) / (number exposed to the ith dose)

is an unbiased estimate of πi.
Plot the empirical logit ln(Pi/(1 − Pi)) against zi = ln(dosei) to assess the fit of the proposed logistic regression model.
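A minimal sketch of that diagnostic plot, using the table above (the eighth dose is dropped because Pi = 1 makes its empirical logit undefined):

```python
import numpy as np
import matplotlib.pyplot as plt

# Beetle mortality data from the table (first seven doses).
z      = np.array([3.893, 3.970, 4.043, 4.103, 4.164, 4.233, 4.258])
killed = np.array([6, 13, 18, 28, 52, 53, 61])
n      = np.array([59, 60, 62, 56, 63, 59, 62])   # killed + surviving

p = killed / n                    # empirical mortality proportions P_i
emp_logit = np.log(p / (1 - p))   # empirical logits

plt.plot(z, emp_logit, "o")
plt.xlabel("z = ln(dose)")
plt.ylabel("empirical logit ln(P/(1-P))")
plt.show()   # rough linearity supports the logistic model
```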

Interpretation of model parameters:

We define ln(πi/(1 − πi)) = β0 + β1 zi, where zi = ln(dosei).

Since z = ln(1) = 0 when dose = 1 mg/liter,

    β0 = ln(odds that an insect is dead) = ln(π0/(1 − π0)) when dose = 1 mg/liter,

or equivalently, the conditional probability that a randomly selected insect is killed by exposure to 1 mg/liter of gaseous carbon disulfide is

    π0 = exp(β0) / (1 + exp(β0)).
Since ln(πi/(1 − πi)) = β0 + β1 zi, β1 represents the increase in the log-odds of mortality when the log-dose zi is increased to zi + 1:

    β1 = (β0 + β1(z + 1)) − (β0 + β1 z)

       = ln(π_{z+1}/(1 − π_{z+1})) − ln(π_z/(1 − π_z))   (a difference of conditional log-odds for mortality given exposure to z = ln(dose))

       = ln( [π_{z+1}/(1 − π_{z+1})] / [π_z/(1 − π_z)] )   (i.e., the ln of a ratio of odds)

Then

    π_{z+1}/(1 − π_{z+1}) = e^{β1} · π_z/(1 − π_z),

so e^{β1} is the multiplicative increase in the odds when z is increased by 1.
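A quick numerical check of this multiplicative relation, with hypothetical parameter values (β0 = −2 and β1 = 0.8 are made up for illustration):

```python
import numpy as np

beta0, beta1 = -2.0, 0.8   # hypothetical values for illustration

def odds(z):
    """Conditional odds pi/(1 - pi) under the logistic model."""
    pi = np.exp(beta0 + beta1 * z) / (1 + np.exp(beta0 + beta1 * z))
    return pi / (1 - pi)

z = 1.5
print(odds(z + 1) / odds(z), np.exp(beta1))   # both print e^{beta1}
```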

Parameter estimation:

Maximum likelihood estimation:

First, construct the likelihood function. Let ni denote the number of insects exposed to zi = ln(dosei). Assume each of these ni insects corresponds to an independent trial with conditional probability of mortality

    πi = exp(β0 + β1 zi) / (1 + exp(β0 + β1 zi)),  i = 1, 2, …, 8.

Let Xi = the observed number of dead insects ("successes") after exposure to zi = ln(dosei). Then Xi ~ Bin(ni, πi), i.e.,

    Pr(Xi = xi) = C(ni, xi) · πi^{xi} · (1 − πi)^{ni−xi},  xi = 0, ⋯, ni,

where C(ni, xi) = ni!/(xi!(ni − xi)!) is the binomial coefficient.

Assume that the results are independent across doses. Then the joint likelihood function is

    ∏_{i=1}^{8} C(ni, xi) · πi^{xi} · (1 − πi)^{ni−xi}

where πi = exp(β0 + β1 zi) / (1 + exp(β0 + β1 zi)).

Let β = (β0, β1)′ and x = (x1, ⋯, x8)′. The log-likelihood is

    l(β, x) = ∑_{i=1}^{8} {ln(ni!) − ln(xi!) − ln((ni − xi)!)} + ∑_{i=1}^{8} xi ln(πi) + ∑_{i=1}^{8} (ni − xi) ln(1 − πi)

            = ∑_{i=1}^{8} ln( ni!/(xi!(ni − xi)!) ) + β0 ∑_{i=1}^{8} xi + β1 ∑_{i=1}^{8} xi zi − ∑_{i=1}^{8} ni ln(1 + e^{β0 + β1 zi})
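A direct transcription of this log-likelihood (a sketch; gammaln gives ln k! = ln Γ(k+1), which avoids overflow in the factorials):

```python
import numpy as np
from scipy.special import gammaln

# Beetle data from the table above (all eight doses).
z = np.array([3.893, 3.970, 4.043, 4.103, 4.164, 4.233, 4.258, 4.338])
x = np.array([6, 13, 18, 28, 52, 53, 61, 60])    # number killed
n = np.array([59, 60, 62, 56, 63, 59, 62, 60])   # number exposed

def log_likelihood(beta, z, x, n):
    """l(beta, x) for the logistic model with grouped binomial counts."""
    eta = beta[0] + beta[1] * z                   # beta0 + beta1 * z_i
    log_binom = gammaln(n + 1) - gammaln(x + 1) - gammaln(n - x + 1)
    # second line of the display: ln C(n,x) + beta0*sum(x) + beta1*sum(x*z)
    #                             - sum n*ln(1 + e^eta)
    return np.sum(log_binom + x * eta - n * np.log1p(np.exp(eta)))
```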

Likelihood equations:

    ∂l(β, x)/∂β0 = ∑_{i=1}^{8} (xi − ni πi) = 0

    ∂l(β, x)/∂β1 = ∑_{i=1}^{8} (xi − ni πi) zi = 0

Matrix representation of the likelihood equations:

Let T be the 8×2 matrix whose ith row is (1, zi), and let

    x = (x1, x2, ⋯, x8)′,  n = (n1, n2, ⋯, n8)′,
    M = (n1π1, n2π2, ⋯, n8π8)′,  m = (n1π̂1, n2π̂2, ⋯, n8π̂8)′,

where π1, π2, ⋯, π8 are the true parameter values and π̂1, π̂2, ⋯, π̂8 are the estimates computed from the likelihood equations. Then the likelihood equations are

    (∂l/∂β0, ∂l/∂β1)′ = T′(x − m) = 0.

Matrix of second partial derivatives of the log-likelihood function (multiplied by −1):

    H = [ −∂²l/∂β0²      −∂²l/∂β0∂β1 ]  =  T′VT
        [ −∂²l/∂β1∂β0    −∂²l/∂β1²   ]

where

    V = diag( n1π1(1 − π1), ⋯, n8π8(1 − π8) ) = diag( Var(x1), ⋯, Var(x8) ).

Odds ratio:

    [π_{z+1}/(1 − π_{z+1})] / [π_z/(1 − π_z)] = ratio of conditional odds for mortality at levels z + 1 and z.

Newton-Raphson (Fisher scoring) algorithm:

To evaluate the m.l.e.'s for β = (β0, β1)′, we iterate

    β̂^{(new)} = β̂^{(old)} + α Ĥ^{−1} Q = β̂^{(old)} + α (T′VT)^{−1} T′(x − m)

where Q = T′(x − m) is the score vector.

(1) Take α = 1 unless α = 1 is too big to give an improvement in the log-likelihood.
(2) Evaluate the matrix V and the vector m on the right side at β̂^{(old)}.

For starting values you might use

    β̂^{(old)} = ( ln(P/(1 − P)), 0 )′

where P = (total # of successes)/(total # of trials) = ∑xi / ∑ni.

Continue the iterations until β̂^{(new)} is close enough to β̂^{(old)}.
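A compact implementation of this iteration (a sketch that reuses the z, x, n arrays defined above and fixes α = 1, omitting the step-halving safeguard):

```python
import numpy as np

def fit_logistic(z, x, n, tol=1e-10, max_iter=50):
    """Fisher scoring for the grouped logistic model."""
    T = np.column_stack([np.ones_like(z), z])      # rows (1, z_i)
    P = x.sum() / n.sum()
    beta = np.array([np.log(P / (1 - P)), 0.0])    # suggested starting values
    for _ in range(max_iter):
        pi = 1 / (1 + np.exp(-(T @ beta)))         # current fitted probabilities
        m = n * pi                                 # fitted counts n_i * pi_i
        V = np.diag(n * pi * (1 - pi))             # binomial variances
        step = np.linalg.solve(T.T @ V @ T, T.T @ (x - m))
        beta = beta + step                         # alpha = 1
        if np.max(np.abs(step)) < tol:
            break
    cov = np.linalg.inv(T.T @ V @ T)               # (T'VT)^{-1} at beta-hat
    return beta, cov

beta_hat, cov_hat = fit_logistic(z, x, n)
```

For the beetle data this should reproduce the estimates reported below, β̂ ≈ (−60.7, 14.9)′.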

Let β̂ denote the final numerical vector. Since β̂ is an m.l.e., we have the following large-sample result:

    β̂ ~ N( β, (T′VT)^{−1} )

where (T′VT)^{−1}, the inverse of the Fisher information matrix, is the covariance matrix for β̂ when V is evaluated at β̂.

For the above data with z = ln(dose), the m.l.e.'s are

    β̂ = (β̂0, β̂1)′ = (−60.717, 14.883)′,

and the m.l.e. for πi = Pr(mortality │ zi) is

    π̂i = exp(β̂0 + β̂1 zi) / (1 + exp(β̂0 + β̂1 zi)),  i = 1, 2, …, 8.

Thus we have π̂ = (0.0586, ⋯, 0.9791)′, and

    V̂(β̂) = (T′V̂T)^{−1} = [ ∑_{i=1}^{8} ni π̂i(1 − π̂i)      ∑_{i=1}^{8} ni zi π̂i(1 − π̂i)  ]^{−1} = [ 26.84   −6.55 ]
                           [ ∑_{i=1}^{8} ni zi π̂i(1 − π̂i)   ∑_{i=1}^{8} ni zi² π̂i(1 − π̂i) ]        [ −6.55    1.6  ]
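As a cross-check (a sketch assuming the statsmodels package; a grouped-binomial GLM with the default logit link fits the same model):

```python
import numpy as np
import statsmodels.api as sm

z = np.array([3.893, 3.970, 4.043, 4.103, 4.164, 4.233, 4.258, 4.338])
x = np.array([6, 13, 18, 28, 52, 53, 61, 60])
n = np.array([59, 60, 62, 56, 63, 59, 62, 60])

# Grouped binomial response: columns are (successes, failures).
endog = np.column_stack([x, n - x])
exog = sm.add_constant(z)

fit = sm.GLM(endog, exog, family=sm.families.Binomial()).fit()
print(fit.params)        # should be near (-60.717, 14.883)
print(fit.cov_params())  # should be near [[26.84, -6.55], [-6.55, 1.6]]
```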

To test H0: β0 = 0 vs. H1: β0 ≠ 0, we use

    χ² = [(β̂0 − 0)/S0]² = [−60.717/5.18]² = 137.36 ~ χ²(1)

where S0 = √26.84 = 5.18 is the estimated standard error of β̂0. The intercept is smaller than 0, which implies that the probability of mortality is less than 0.5 when ln(dose) = 0, i.e., when dose = 1 mg/liter.

To test H0: β1 = 0 vs. H1: β1 ≠ 0, we have

    χ² = [(β̂1 − 0)/S1]² = [14.883/1.265]² = 138.49 ~ χ²(1)

where S1 = √1.6 = 1.265. β1 is positive, which implies that the mortality rate increases as z = ln(dose) increases.

Note that β̂1 = 14.883 implies that the log-odds for mortality increase by about 15 when z = ln(dose) is increased to z + 1.
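Both Wald statistics follow directly from the reported estimates and covariance matrix; a quick sketch:

```python
import numpy as np

beta_hat = np.array([-60.717, 14.883])
cov = np.array([[26.84, -6.55], [-6.55, 1.6]])

se = np.sqrt(np.diag(cov))    # S0 = 5.18, S1 = 1.265
chi2 = (beta_hat / se) ** 2   # about 137.4 and 138.5, each compared to chi2(1)
print(se, chi2)
```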

When the dose is doubled, z = ln(dose) increases to ln(2 · dose) = ln 2 + ln(dose). Then

    ln( π̂_{z+ln2}/(1 − π̂_{z+ln2}) ) − ln( π̂_z/(1 − π̂_z) ) = β̂1 × ln 2

and the odds for mortality increase by a factor of

    [π̂_{z+ln2}/(1 − π̂_{z+ln2})] / [π̂_z/(1 − π̂_z)] = exp(β̂1 ln 2) = 2^{β̂1} ≈ 2^{15}.

An approximate 95% C.I. for β1 is β̂1 ± (1.96) · S1, or 14.883 ± (1.96)(1.265) ⇒ [12.40, 17.36].

An approximate 95% C.I. for (ln 2)β1 is [β̂1 × ln 2] ± 1.96[S1 × ln 2] ⇒ [8.60, 12.03].

The 95% C.I. for 2^{β1} = exp(β1 × ln 2) is therefore

    [e^{8.60}, e^{12.03}] ⇒ [5418, 167700].

So doubling the dose increases the odds of mortality by a factor between roughly 5,400 and 168,000.
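The interval arithmetic above, as a quick sketch:

```python
import numpy as np

# Approximate 95% C.I. for the doubled-dose odds ratio exp(beta1 * ln 2).
beta1_hat, s1 = 14.883, 1.265
lo = (beta1_hat - 1.96 * s1) * np.log(2)   # about 8.60
hi = (beta1_hat + 1.96 * s1) * np.log(2)   # about 12.03
print(np.exp([lo, hi]))                    # roughly [5.4e3, 1.7e5]
```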
