
The Mathematics Behind Logistic Regression:

An Illustration of the Newton-Raphson Method


This post is different from the others. It shows what is going on behind the
scenes in logistic regression rather than being an example of using Stata for
common tasks in archival research. The example used is a logistic regression
including a constant and one regressor. After the necessary mathematical
derivations, the Challenger data from Casella & Berger Example 12.3.1 is
used to bridge the gap from theory to practice via manual implementation
of the Newton-Raphson Method to obtain parameter estimates.
The following are derived:
1. the log likelihood function for logistic regression.
2. the first and second derivatives with respect to the constant (alpha).
3. the first and second derivatives with respect to the parameter of interest
(beta).
4. the Hessian.
5. manual (i.e., by hand) implementation of the Newton-Raphson Method,
which results in estimates of $\alpha$ and $\beta$ equivalent to those obtained
by running the logit command in Stata.
Plenty of detail is shown between each of the steps. The sources used are:
1. Casella and Berger's Statistical Inference, Section 12.3, 2nd Edition
2. Gujarati and Porter's Basic Econometrics, Appendix 15.A, 5th Edition
3. Leslie Stratton's notes on MLE, NLLS, & Asymptotic Hypothesis Testing
from Econ 642: Panel and Nonlinear Methods in Econometrics at
Virginia Commonwealth University.
4. A PDF available from Erik Barry Erhardt's website, http://statacumen.com/.
Casella pg. 593 shows that the logistic likelihood function is
$$L(\alpha, \beta \mid \mathbf{y}) = \prod_{i=1}^{n} \pi(x_i)^{y_i} \left(1 - \pi(x_i)\right)^{1 - y_i} \qquad (1)$$
Step 1: Take the log of the likelihood function
$$\ln L(\alpha, \beta \mid \mathbf{y}) = \sum_{i=1}^{n} \ln\left[\pi(x_i)^{y_i} \left(1 - \pi(x_i)\right)^{1 - y_i}\right]$$
$$= \sum_{i=1}^{n} \left[\ln\left(\pi(x_i)^{y_i}\right) + \ln\left(\left(1 - \pi(x_i)\right)^{1 - y_i}\right)\right]$$
$$= \sum_{i=1}^{n} \left[y_i \ln(\pi(x_i)) + (1 - y_i)\ln(1 - \pi(x_i))\right]$$
Step 2: Distribute $(1 - y_i)$ and rearrange
$$= \sum_{i=1}^{n} \left[y_i \ln(\pi(x_i)) + \ln(1 - \pi(x_i)) - y_i \ln(1 - \pi(x_i))\right]$$
$$= \sum_{i=1}^{n} \left[y_i \ln(\pi(x_i)) - y_i \ln(1 - \pi(x_i)) + \ln(1 - \pi(x_i))\right]$$
Step 3: Factor out $y_i$
$$= \sum_{i=1}^{n} \left[y_i \left(\ln(\pi(x_i)) - \ln(1 - \pi(x_i))\right) + \ln(1 - \pi(x_i))\right]$$
Step 4: Use Log Properties and Substitute
$$= \sum_{i=1}^{n} \left[y_i \ln\left(\frac{\pi(x_i)}{1 - \pi(x_i)}\right) + \ln(1 - \pi(x_i))\right]$$
Where, per Casella pgs. 591 and 592, respectively,
$$\ln\left(\frac{\pi(x_i)}{1 - \pi(x_i)}\right) = \alpha + \beta x_i$$
$$\pi(x_i) = \frac{e^{\alpha + \beta x_i}}{1 + e^{\alpha + \beta x_i}}$$
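These two expressions are consistent with each other. As a quick connecting step (not numbered in the sources), the odds reduce to the exponential of the linear index:
$$\frac{\pi(x_i)}{1 - \pi(x_i)} = \frac{e^{\alpha + \beta x_i}/(1 + e^{\alpha + \beta x_i})}{1/(1 + e^{\alpha + \beta x_i})} = e^{\alpha + \beta x_i}, \qquad \text{so} \qquad \ln\left(\frac{\pi(x_i)}{1 - \pi(x_i)}\right) = \alpha + \beta x_i$$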
Step 5: Multiply the 1 by $\frac{1 + e^{\alpha + \beta x_i}}{1 + e^{\alpha + \beta x_i}}$ to pull over a common denominator
$$= \sum_{i=1}^{n} \left[y_i (\alpha + \beta x_i) + \ln\left(\frac{1 + e^{\alpha + \beta x_i}}{1 + e^{\alpha + \beta x_i}} - \frac{e^{\alpha + \beta x_i}}{1 + e^{\alpha + \beta x_i}}\right)\right]$$
$$= \sum_{i=1}^{n} \left[y_i (\alpha + \beta x_i) + \ln\left(\frac{1}{1 + e^{\alpha + \beta x_i}}\right)\right]$$
Step 6: Use log properties
$$= \sum_{i=1}^{n} \left[y_i (\alpha + \beta x_i) + \ln(1) - \ln(1 + e^{\alpha + \beta x_i})\right]$$
$$= \sum_{i=1}^{n} \left[y_i (\alpha + \beta x_i) - \ln(1 + e^{\alpha + \beta x_i})\right]$$
The above log likelihood function agrees with Gujarati page 590.
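As a quick numerical sanity check on this final expression, it can be evaluated directly in Stata. The following is a minimal sketch, not part of the original derivation; it assumes the Challenger variables y and x are already in memory, and the scalar names a and b are invented here to hold candidate parameter values:

* sketch: evaluate lnL = sum of y*(a + b*x) - ln(1 + exp(a + b*x)) over observations
scalar a = 20
scalar b = -.30
generate double lnL_i = y*(a + b*x) - ln(1 + exp(a + b*x))
quietly summarize lnL_i
display "log likelihood at (a, b): " r(sum)

At the maximum likelihood estimates, this sum should match the log likelihood reported by the logit command.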
Step 7: Take first derivatives with respect to $\alpha$ and $\beta$. The derivatives are shown per observation; the spreadsheet below sums each term over the observations.
$$\frac{\partial}{\partial u}\ln u = \frac{1}{u}\,du \qquad\qquad \frac{\partial}{\partial x}e^{x} = e^{x}\,dx$$
$$\frac{\partial \ln L}{\partial \alpha} = y_i(1) + 0 - \frac{(1)\,e^{\alpha + \beta x_i}}{1 + e^{\alpha + \beta x_i}} = y_i - \frac{e^{\alpha + \beta x_i}}{1 + e^{\alpha + \beta x_i}}$$
$$\frac{\partial \ln L}{\partial \beta} = 0 + y_i(1)(x_i) - \frac{(x_i)\,e^{\alpha + \beta x_i}}{1 + e^{\alpha + \beta x_i}} = y_i x_i - \frac{x_i\, e^{\alpha + \beta x_i}}{1 + e^{\alpha + \beta x_i}}$$
Step 8: Take second derivatives with respect to $\alpha$ and $\beta$. For the second derivative with respect to $\alpha$, use the quotient rule. For the second derivative with respect to $\beta$, use both the product and quotient rules.
$$u = e^{\alpha + \beta x_i} \qquad\qquad v = 1 + e^{\alpha + \beta x_i}$$
$$u' = (1)\,e^{\alpha + \beta x_i} = e^{\alpha + \beta x_i} \qquad\qquad v' = (1)\,e^{\alpha + \beta x_i} = e^{\alpha + \beta x_i}$$
$$\frac{\partial^2 \ln L}{\partial \alpha^2} = 0 - \frac{vu' - uv'}{v^2} = -\frac{(1 + e^{\alpha + \beta x_i})(e^{\alpha + \beta x_i}) - (e^{\alpha + \beta x_i})(e^{\alpha + \beta x_i})}{(1 + e^{\alpha + \beta x_i})^2} = -\frac{e^{\alpha + \beta x_i}}{(1 + e^{\alpha + \beta x_i})^2}$$
For the second derivative with respect to $\beta$, first use the product rule for the derivative of the numerator term, $x_i e^{\alpha + \beta x_i}$, with respect to $\beta$.
$$u = x_i \qquad\qquad v = e^{\alpha + \beta x_i}$$
$$u' = 0 \qquad\qquad v' = (1)\,x_i e^{\alpha + \beta x_i}$$
$$uv' + vu' = x_i \cdot x_i e^{\alpha + \beta x_i} + e^{\alpha + \beta x_i}(0) = x_i^2 e^{\alpha + \beta x_i}$$
The above will be used as $u'$ in the application of the quotient rule to
$$\frac{x_i e^{\alpha + \beta x_i}}{1 + e^{\alpha + \beta x_i}}$$
$$u = x_i e^{\alpha + \beta x_i} \qquad\qquad v = 1 + e^{\alpha + \beta x_i}$$
$$u' = x_i^2 e^{\alpha + \beta x_i} \qquad\qquad v' = 0 + x_i e^{\alpha + \beta x_i}$$
$$\frac{\partial^2 \ln L}{\partial \beta^2} = 0 - \frac{vu' - uv'}{v^2} = -\frac{(1 + e^{\alpha + \beta x_i})(x_i^2 e^{\alpha + \beta x_i}) - (x_i e^{\alpha + \beta x_i})(x_i e^{\alpha + \beta x_i})}{(1 + e^{\alpha + \beta x_i})^2} = -\frac{x_i^2 e^{\alpha + \beta x_i}}{(1 + e^{\alpha + \beta x_i})^2}$$
Step 9: Calculate the last term needed for the Hessian using the quotient
rule.
$$\frac{\partial^2 \ln L}{\partial \beta\, \partial \alpha} = \frac{\partial}{\partial \beta}\left[y_i - \frac{e^{\alpha + \beta x_i}}{1 + e^{\alpha + \beta x_i}}\right] = -\frac{x_i e^{\alpha + \beta x_i}}{(1 + e^{\alpha + \beta x_i})^2}$$
Step 10: Collect the second derivatives for the Hessian.
$$\nabla^2 f(\theta) = \begin{bmatrix} -\dfrac{e^{\alpha + \beta x_i}}{(1 + e^{\alpha + \beta x_i})^2} & -\dfrac{x_i e^{\alpha + \beta x_i}}{(1 + e^{\alpha + \beta x_i})^2} \\[2ex] -\dfrac{x_i e^{\alpha + \beta x_i}}{(1 + e^{\alpha + \beta x_i})^2} & -\dfrac{x_i^2 e^{\alpha + \beta x_i}}{(1 + e^{\alpha + \beta x_i})^2} \end{bmatrix}$$
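Each element of the Hessian shares the factor $e^{\alpha + \beta x_i}/(1 + e^{\alpha + \beta x_i})^2$, which (as a connecting identity, not drawn from the cited sources) equals $\pi(x_i)(1 - \pi(x_i))$:
$$\pi(x_i)\left(1 - \pi(x_i)\right) = \frac{e^{\alpha + \beta x_i}}{1 + e^{\alpha + \beta x_i}} \cdot \frac{1}{1 + e^{\alpha + \beta x_i}} = \frac{e^{\alpha + \beta x_i}}{\left(1 + e^{\alpha + \beta x_i}\right)^2}$$
The elements are therefore $-\pi(x_i)(1 - \pi(x_i))$ times $1$, $x_i$, and $x_i^2$, respectively; this compact form is used in the Stata sketches below.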
The Newton-Raphson Method iterates via the following:
$$\theta_{m+1} = \theta_m - \left[\nabla^2 f(\theta)\Big|_{\theta_m}\right]^{-1} \nabla f(\theta)\Big|_{\theta_m}$$
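Before moving to the spreadsheet, the whole iteration can also be automated. The following is a minimal sketch, not the post's spreadsheet-plus-matrices workflow: it assumes the Challenger variables y and x from Casella and Berger Example 12.3.1 are already in memory, and the scalar and variable names (a, b, p, g1, h11, and so on) are invented for this sketch:

* sketch: loop the Newton-Raphson update, assuming y and x are in memory
scalar a = 20                                       // initial guess for alpha
scalar b = -.30                                     // initial guess for beta
forvalues m = 1/8 {
    capture drop p g1 g2 h11 h12 h22
    quietly generate double p   = exp(a + b*x)/(1 + exp(a + b*x))
    quietly generate double g1  = y - p             // dlnL/dalpha, per observation
    quietly generate double g2  = x*(y - p)         // dlnL/dbeta, per observation
    quietly generate double h11 = -p*(1 - p)        // d2lnL/dalpha2
    quietly generate double h12 = -x*p*(1 - p)      // cross partial
    quietly generate double h22 = -x^2*p*(1 - p)    // d2lnL/dbeta2
    foreach s in g1 g2 h11 h12 h22 {
        quietly summarize `s'
        scalar S_`s' = r(sum)                       // sum over the observations
    }
    matrix H     = (S_h11, S_h12 \ S_h12, S_h22)    // Hessian at current guess
    matrix g     = (S_g1 \ S_g2)                    // gradient at current guess
    matrix theta = (a \ b)
    matrix theta = theta - inv(H)*g                 // Newton-Raphson update
    scalar a = theta[1,1]
    scalar b = theta[2,1]
    display "iteration `m': alpha = " a "  beta = " b
}

After a few iterations the updates stop changing, mirroring the four manual guesses worked through below.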
At this point, an Excel spreadsheet is needed that has the following in each of the columns below:
(A) values for the dependent variable from Casella and Berger's Challenger Data in Example 12.3.1
(B) values for the independent variable from Casella and Berger's Challenger Data in Example 12.3.1
(C) the guess for $\alpha$
(D) the guess for $\beta$
(E) column intentionally left blank
(F) element 1,1 from the Hessian
(G) element 1,2 from the Hessian
(H) column intentionally left blank, but note that the Hessian is symmetric
(I) element 2,2 from the Hessian
(J) column intentionally left blank
(K) the first derivative with respect to $\alpha$ evaluated at the guess for each observation
(L) column intentionally left blank
(M) the first derivative with respect to $\beta$ evaluated at the guess for each observation
Using the first observation as an example, the formulas in each relevant column of the spreadsheet are:
Col F = -(EXP(C2 + (D2*B2))/((1 + EXP(C2 + D2*B2))^2))
Col G = -B2*(EXP(C2 + (D2*B2))/((1 + EXP(C2 + D2*B2))^2))
Col H = Symmetric
Col I = -(B2^2)*(EXP(C2 + (D2*B2))/((1 + EXP(C2 + D2*B2))^2))
Col K = A2 - (EXP(C2 + (D2*B2))/((1 + EXP(C2 + D2*B2))))
Col M = (A2*B2) - (B2*(EXP(C2 + (D2*B2))/((1 + EXP(C2 + D2*B2)))))
The results from columns F through I, K, and M are summed over the 23 observations, and the sums are used in the Stata code below to manually implement Newton-Raphson. The initial guesses used are $\alpha = 20$ and $\beta = -.30$.
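For readers who prefer to skip Excel, here is a minimal sketch of the same six sums computed in Stata, assuming y and x from the workbook are in memory; the names p0 and colF through colM are invented to mirror the spreadsheet columns:

* sketch: reproduce the six spreadsheet sums at the initial guess
scalar a0 = 20
scalar b0 = -.30
generate double p0   = exp(a0 + b0*x)/(1 + exp(a0 + b0*x))
generate double colF = -p0*(1 - p0)      // Hessian element 1,1
generate double colG = -x*p0*(1 - p0)    // Hessian element 1,2
generate double colI = -x^2*p0*(1 - p0)  // Hessian element 2,2
generate double colK = y - p0            // score for alpha
generate double colM = x*(y - p0)        // score for beta
tabstat colF colG colI colK colM, statistics(sum)

The resulting sums should match the values hard-coded in the matrices casella_c and casella_d below.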
Here is the manual implementation of the Newton-Raphson Method using
Stata:
clear
******* Casella Data *************
/* First guess */
matrix casella_c = (-3.13, -214.77 \ -214.77, -14786.82)
matrix invcasella_c = inv(casella_c)
matrix list invcasella_c
matrix casella_d = (-1.030, -65.869)
matrix transcasella_d = casella_d'
matrix list casella_d
matrix list transcasella_d
matrix casella_guess = (20, -.30)
matrix transcasella_guess = casella_guess'
matrix logit_coeff = transcasella_guess - (invcasella_c*transcasella_d)  // Newton-Raphson update
matrix list logit_coeff
matrix translogit_coeff = logit_coeff'
matrix list translogit_coeff
/* Second guess */
matrix casella_a_second = (-3.428, -232.656 \ -232.656, -15893.986)
matrix invcasella_a_second = inv(casella_a_second)
matrix list invcasella_a_second
matrix casella_d_two = (.155047958, 7.852477079)
matrix transcasella_d_two = casella_d_two'
matrix list casella_d_two
matrix list transcasella_d_two
matrix casella_guess_two = (13.07929, -.2039352)
matrix transcasella_guess_two = casella_guess_two'
matrix logit_coeff_two = transcasella_guess_two - ///
    (invcasella_a_second*transcasella_d_two)
matrix list logit_coeff_two
/* Third guess */
matrix casella_1 = (-3.27827, -222.83347 \ -222.83347, -15223.56817)
matrix invcasella_1 = inv(casella_1)
matrix casella_2 = (.007626928, .303416161)
matrix transcasella_2 = casella_2'
matrix guess_3 = (14.87091, -.2296669)
matrix transguess_3 = guess_3'
matrix logit_coeff_3 = transguess_3 - (invcasella_1*transcasella_2)
matrix list logit_coeff_3
/* Fourth guess */
matrix casella_4 = (-3.26061, -221.65395 \ -221.65395, -15153.07005)
matrix invcasella_4 = inv(casella_4)
matrix casella_5 = (.000058, .029397859)
matrix transcasella_5 = casella_5'
matrix guess_6 = (15.0632, -.2324616)
matrix transguess_6 = guess_6'
matrix logit_coeff_4 = transguess_6 - (invcasella_4*transcasella_5)
matrix list logit_coeff_4
/* Verify against Stata's built-in estimator */
import excel using C:\econometrics\logit\casella_data.xlsx, firstrow
logit y x