
Binary Dependent Variable Models
Econometrics II

Bruno Damásio
[email protected]
@bmpdamasio

Carolina Vasconcelos
[email protected]
@vasconceloscm

2022/2023
Nova Information Management School (NOVA IMS)
NOVA University of Lisbon
Instituto Superior de Estatística e Gestão da Informação
Universidade Nova de Lisboa
Table of contents

1. Linear Probability Model


Context
Interpreting the estimates
Estimating probabilities
Main Issues

2. Logit and Probit Models


Framework
Maximum Likelihood Estimation
Interpretation
Statistical Inference
Quality of predictions
Comparing models


Linear Probability Model


Context i

• Previously, we have studied the use of binary independent variables in
the multiple linear regression model;
• In all of the models up until now, the dependent variable y has had
quantitative meaning (for example, y is a dollar amount, a test score,
etc.);
• In the simplest case, and one that often arises in practice, the event
we would like to explain is a binary outcome. In other words, our
dependent variable, y, takes on only two values: zero and one.

Context ii

Example
The dependent variable can indicate:

• whether an adult has a high school education;


• whether a college student used illegal drugs during a given school
year;
• whether a firm was taken over by another firm during a given year.

Linear Probability Model i

• What does it mean to write down a multiple regression model, such as

y = β0 + β1 x1 + · · · + βk xk + u

when y is a binary variable?
• Because y can take on only two values, βj cannot be interpreted as
the change in y given a one-unit increase in xj , holding other factors
fixed.

Linear Probability Model ii

• The key point is that when y is a binary variable taking on the
values zero and one, it is always true that P(y = 1|x) = E(y|x): the
probability of “success” is the same as the expected value of y.
Thus, we have the important equation

P(y = 1|x) = β0 + β1 x1 + · · · + βk xk (1)

which says that the probability of success, say, p(x) = P(y = 1|x), is
a linear function of the xj .

Linear Probability Model iii

• The multiple linear regression model with a binary dependent
variable is called the linear probability model (LPM), because the
response probability P(y = 1|x) is linear in the parameters βj .
• In the LPM, βj measures the change in the probability of success when
xj changes, holding other factors fixed:

∆P(y = 1|x) = βj ∆xj

• The mechanics of OLS are the same as before.

Interpreting the estimates

Labour force participation


Suppose we aim to study the variables that influence labor force
participation by a married woman during 1975: inlf = 1 if the woman
reports working for a wage outside the home at some point during the
year, and zero otherwise.

Interpreting the estimates

library(wooldridge)
summary(lm(inlf ~ nwifeinc + educ + exper + expersq + age + kidslt6 + kidsge6, data = mroz))

##
## Call:
## lm(formula = inlf ~ nwifeinc + educ + exper + expersq + age +
## kidslt6 + kidsge6, data = mroz)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.93432 -0.37526 0.08833 0.34404 0.99417
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.5855192 0.1541780 3.798 0.000158 ***
## nwifeinc -0.0034052 0.0014485 -2.351 0.018991 *
## educ 0.0379953 0.0073760 5.151 3.32e-07 ***
## exper 0.0394924 0.0056727 6.962 7.38e-12 ***
## expersq -0.0005963 0.0001848 -3.227 0.001306 **
## age -0.0160908 0.0024847 -6.476 1.71e-10 ***
## kidslt6 -0.2618105 0.0335058 -7.814 1.89e-14 ***
## kidsge6 0.0130122 0.0131960 0.986 0.324415
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4271 on 745 degrees of freedom
## Multiple R-squared: 0.2642,  Adjusted R-squared: 0.2573
## F-statistic: 38.22 on 7 and 745 DF, p-value: < 2.2e-16

Interpreting the estimates

• To interpret the estimates, we must remember that a change in an
independent variable changes the probability that inlf = 1.
• The coefficient on educ means that, everything else fixed, another
year of education increases the probability of labor force
participation by 0.038.
• Taking the equation literally, 10 more years of education would
increase the probability of being in the labor force by 0.38, which
is a pretty large increase.

Interpreting the estimates

• Considering the variable kidslt6, the effect of going from zero
children to one young child reduces the probability of working by
0.262.
• This is also the predicted drop if the woman goes from having one
young child to two. Does it make sense?
• It seems more realistic that the first small child would reduce the
probability by a large amount, but subsequent children would have a
smaller marginal effect.

Estimating probabilities

• What is the probability that a 30-year-old woman with 12 years of
education, 1 year of experience, 15.51 thousand dollars of household
income (without her contribution), 1 kid younger than 6 years old,
and 2 kids older than 6 years old participates in the labor force?
• And a 31-year-old woman with 13 years of education, 3 years of
experience, 73.6 thousand dollars of household income (without her
contribution), 3 kids younger than 6 years old, and 1 kid older than
6 years old?

Estimating probabilities

reg <- lm(inlf ~ nwifeinc + educ + exper + expersq + age +
          kidslt6 + kidsge6, data = mroz)

# Coefficient order: intercept, nwifeinc, educ, exper, expersq,
# age, kidslt6, kidsge6
sum(coef(reg)*c(1, 15.51, 12, 1, 1, 30, 1, 2))

## [1] 0.3090346

sum(coef(reg)*c(1, 73.6, 13, 3, 9, 31, 3, 1))

## [1] -0.3292861
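Equivalently, the built-in predict() function reproduces these fitted values; a minimal sketch, assuming the reg model above and a small hypothetical data frame holding the two women's characteristics:

# Same two predictions via predict(); note the second fitted "probability"
# is negative, which previews the main issues discussed next
newdat <- data.frame(nwifeinc = c(15.51, 73.6), educ = c(12, 13),
                     exper = c(1, 3), expersq = c(1, 9), age = c(30, 31),
                     kidslt6 = c(1, 3), kidsge6 = c(2, 1))
predict(reg, newdata = newdat)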

Main Issues

• If we plug in certain combinations of values for the independent
variables, we can get predictions either less than zero or greater
than one.
• A probability cannot be linearly related to the independent variables
for all their possible values.

Main Issues

Due to the binary nature of y, the linear probability model violates
one of the Gauss-Markov assumptions: homoskedasticity. When y is a
binary variable, its variance, conditional on x, is

Var(y|x) = p(x)[1 − p(x)],

where p(x) is shorthand for the probability of success. Unless this
probability does not depend on any of the independent variables, there
is heteroskedasticity in the LPM.
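Heteroskedasticity does not bias the OLS estimates, but it invalidates the usual standard errors, so LPM inference typically relies on heteroskedasticity-robust standard errors. A minimal sketch, assuming the lmtest and sandwich packages and the reg model above:

library(lmtest)
library(sandwich)
# Coefficient tests with heteroskedasticity-robust (HC1) standard errors
coeftest(reg, vcov = vcovHC(reg, type = "HC1"))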


Logit and Probit Models


Limited Dependent Variable Models i

• Previously, we studied the linear probability model, which is simply
an application of the multiple regression model to a binary
dependent variable. A binary dependent variable is an example of a
limited dependent variable (LDV);
• An LDV is broadly defined as a dependent variable whose range of
values is restricted;
• Most economic variables we would like to explain are limited in some
way, often because they must be positive. For example, hourly wage,
housing price, and nominal interest rates must be greater than zero;

Limited Dependent Variable Models ii

• Not all such variables need special treatment. If a strictly positive
variable takes on many different values, a special econometric model
is rarely necessary. But when y is discrete and takes on a small
number of values, it makes no sense to treat it as an approximately
continuous variable;
• However, as we saw, the linear probability model has certain
drawbacks. In the following sections, we will present alternative
models that overcome the shortcomings of the LPM.

Logit and Probit Models

In the LPM, we assume that the response probability is linear in a set
of parameters, βj . To avoid the LPM limitations, we will consider a
class of binary response models of the form:

P(y = 1|x) = G(β0 + β1 x1 + · · · + βk xk ) = G(x′i β), (2)

where G is a function taking on values strictly between zero and one:
0 < G(z) < 1, for all real numbers z. This ensures that the estimated
response probabilities are strictly between zero and one.

Logit Models

In the logit model, G is the logistic function:

G(z) = exp(z)/[1 + exp(z)] = Λ(z) (3)

which is between zero and one for all real numbers z. This is the
cumulative distribution function (cdf) for a standard logistic random
variable.

Probit Models

In the probit model, G is the standard normal cdf, which is expressed
as an integral:

G(z) = Φ(z) = ∫_{−∞}^{z} ϕ(v) dv (4)

where ϕ(z) is the standard normal density

ϕ(z) = (2π)^{−1/2} exp(−z²/2) (5)
Logit and Probit Models

• Logit and probit models can be derived from an underlying latent
variable model. Let y∗ be an unobserved, or latent, variable, and
suppose that

y∗ = x′i β + e, y = 1[y∗ > 0] (6)

• The notation 1[·] is called an indicator function: it takes on the
value one if the event in brackets is true, and zero otherwise.
• We assume that e is independent of x and that e has either the
standard logistic distribution or the standard normal distribution.
• In each case, e is symmetrically distributed about zero, which means
that 1 − G(−z) = G(z) for all real numbers z.

Logit and Probit Models

[Figure: response probability P(y = 1|x1) as a function of x1 for the
logit and probit models; both curves rise from 0 to 1 over x1 ∈ [−1, 1].]
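A similar picture can be drawn directly from the two cdfs; a minimal sketch in base R, where the scaling of the index (here 4·x1) is an assumption chosen so the curves span the plot:

# Logit vs. probit response probabilities as functions of x1
curve(plogis(4*x), from = -1, to = 1, xlab = "x1", ylab = "P(y = 1 | x1)")
curve(pnorm(4*x), from = -1, to = 1, add = TRUE, lty = 2)
legend("topleft", legend = c("Logit model", "Probit model"), lty = c(1, 2))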
Logit and Probit Models

We can derive the response probability for y as:

P(y = 1|x) = P(y∗ > 0|x) = P[e > −(x′i β)|x]
           = 1 − G[−(x′i β)] = G(x′i β) (7)
MLE estimator

• Up until now, we have had little need for MLE, although we did note
that, under the classical linear model assumptions, the OLS
estimator is the maximum likelihood estimator (conditional on the
explanatory variables);
• Because of the nonlinear nature of E[y|x], OLS and WLS are not
applicable;
• For estimating limited dependent variable models, maximum
likelihood methods are indispensable.

MLE estimator

• Assume that we have a random sample of size n. To obtain the
maximum likelihood estimator, conditional on the explanatory
variables, we need the density of yi given xi . We can write this as:

f(y|xi ; β) = [G(x′i β)]^y [1 − G(x′i β)]^{1−y} , y = 0, 1, (8)

where, for simplicity, we absorb the intercept into the vector xi ;
• We can easily see that when y = 1, we get G(x′i β), and when y = 0,
we get 1 − G(x′i β);
• The log-likelihood function for observation i is a function of the
parameters and the data (xi , yi ) and is obtained by taking the log
of (8):

ℓi (β) = yi log[G(x′i β)] + (1 − yi ) log[1 − G(x′i β)] (9)

MLE estimator

• The log-likelihood for a sample of size n is obtained by summing
equation (9) across all observations: L(β) = Σ_{i=1}^{n} ℓi (β);
• The MLE of β, denoted by β̂, maximizes this log-likelihood;
• If G(·) is the standard logistic cdf, then β̂ is the logit estimator;
if G(·) is the standard normal cdf, then β̂ is the probit estimator.
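As a concrete check, the logit log-likelihood can be coded by hand and maximized with a generic optimizer; a minimal sketch on a deliberately small model, assuming the mroz data from the wooldridge package — optim() should reproduce glm()'s estimates:

library(wooldridge)
X <- model.matrix(~ nwifeinc + educ, data = mroz)  # regressors incl. intercept
y <- mroz$inlf
# Negative of the summed log-likelihood (9), with G the logistic cdf
negll <- function(beta) {
  p <- plogis(X %*% beta)
  -sum(y * log(p) + (1 - y) * log(1 - p))
}
opt <- optim(rep(0, ncol(X)), negll, method = "BFGS")
opt$par
coef(glm(inlf ~ nwifeinc + educ, family = binomial(link = 'logit'), data = mroz))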

MLE estimator - Example

# Simulate a probit model
set.seed(123)
n <- 1000
beta0 <- 0
beta1 <- 1.5
x1 <- runif(n = n, min = -1, max = 1)
u <- rnorm(n)
# round(pnorm(z), 0) is 1 exactly when z > 0, i.e., the latent-variable
# rule y = 1[y* > 0]
y <- round(pnorm(beta0 + beta1*x1 + u), 0)
glm(y ~ x1, family = binomial(link = 'probit'))

##
## Call: glm(formula = y ~ x1, family = binomial(link = "probit"))
##
## Coefficients:
## (Intercept) x1
## -0.0316 1.4416
##
## Degrees of Freedom: 999 Total (i.e. Null); 998 Residual
## Null Deviance:     1386
## Residual Deviance: 1048    AIC: 1052
MLE estimator

• Because of the nonlinear nature of the maximization problem, we
cannot write closed-form formulas for the logit or probit maximum
likelihood estimates;
• In addition to raising computational issues, this makes the
statistical theory for logit and probit much more difficult than for
OLS;
• Nevertheless, the general theory of MLE for random samples implies
that, under very general conditions, the MLE is consistent,
asymptotically normal, and asymptotically efficient.

Logit and Probit Models

• In most applications of binary response models, the primary goal is
to explain the effects of the xj on the response probability
P(y = 1|x).
• For logit and probit, the direction of the effect of xj on
E(y∗ |x) = x′i β and on E(y|x) = P(y = 1|x) = G(x′i β) is always the
same.
• But the latent variable rarely has a well-defined unit of
measurement, so the magnitudes of the βj are not, by themselves,
especially useful (in contrast to the linear probability model).

Logit and Probit Models

• To find the partial effect of roughly continuous variables on the
response probability, we must rely on calculus. If xj is a roughly
continuous variable, its partial effect on p(x) = P(y = 1|x) is
obtained from the partial derivative:

∂p(x)/∂xj = g(x′i β)βj , (10)

where g(z) ≡ dG(z)/dz.
• Because G is the cdf of a continuous random variable, g is a
probability density function (pdf).

Logit and Probit Models

• In the logit and probit cases, G(·) is a strictly increasing cdf, and so
g(z) > 0 for all z.
• Therefore, the partial effect of xj on p(x) depends on x through the
positive quantity g(x′i β), which means that the partial effect always
has the same sign as βj .

Logit and Probit Models

• If, say, x1 is a binary explanatory variable, then the partial effect
from changing x1 from zero to one, holding all other variables fixed,
is simply:

G(β0 + β1 + β2 x2 + · · · + βk xk ) − G(β0 + β2 x2 + · · · + βk xk ) (11)

Partial effects

• Often, we want to estimate the effects of the xj on the response
probabilities, P(y = 1|x). If xj is roughly continuous, then

∆P̂(y = 1|x) ≈ [g(x′i β̂)β̂j ]∆xj (12)

for “small” changes in xj .
• So, for ∆xj = 1, the change in the estimated success probability is
roughly g(x′i β̂)β̂j .
• Compared with the linear probability model, the cost of using probit
and logit models is that the partial effects in equation (12) are
harder to summarize because the scale factor, g(x′i β̂), depends on x
(that is, on all of the explanatory variables).

Partial effect at the average (PEA)

• One approach is to replace each explanatory variable with its sample
average:

g(β̂0 + β̂1 x̄1 + · · · + β̂k x̄k ) (13)

where g(·) is the standard normal density in the probit case and
g(z) = exp(z)/[1 + exp(z)]² in the logit case.
• The idea of (13) is that, when it is multiplied by β̂j , we obtain the
partial effect of xj for the “average” person in the sample. This is
called the partial effect at the average (PEA).

Partial effect at the average (PEA)

library(mfx)
logitmfx(inlf ~ nwifeinc + educ + exper +
I(exper^2) + age + kidslt6 + kidsge6, data = mroz)

## Call:
## logitmfx(formula = inlf ~ nwifeinc + educ + exper + I(exper^2) +
## age + kidslt6 + kidsge6, data = mroz)
##
## Marginal Effects:
## dF/dx Std. Err. z P>|z|
## nwifeinc -0.00519005 0.00204820 -2.5340 0.011278 *
## educ 0.05377731 0.01056074 5.0922 3.539e-07 ***
## exper 0.05005693 0.00782462 6.3974 1.581e-10 ***
## I(exper^2) -0.00076692 0.00024768 -3.0965 0.001959 **
## age -0.02140302 0.00353973 -6.0465 1.480e-09 ***
## kidslt6 -0.35094982 0.04963897 -7.0700 1.549e-12 ***
## kidsge6 0.01461621 0.01818832 0.8036 0.421625
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Partial effect at the average (PEA)

probitmfx(inlf ~ nwifeinc + educ + exper +
          I(exper^2) + age + kidslt6 + kidsge6, data = mroz)

## Call:
## probitmfx(formula = inlf ~ nwifeinc + educ + exper + I(exper^2) +
## age + kidslt6 + kidsge6, data = mroz)
##
## Marginal Effects:
## dF/dx Std. Err. z P>|z|
## nwifeinc -0.00469619 0.00192965 -2.4337 0.014945 *
## educ 0.05112843 0.00992310 5.1525 2.571e-07 ***
## exper 0.04817690 0.00734505 6.5591 5.413e-11 ***
## I(exper^2) -0.00073705 0.00023464 -3.1412 0.001683 **
## age -0.02064309 0.00330485 -6.2463 4.203e-10 ***
## kidslt6 -0.33914996 0.04634765 -7.3175 2.526e-13 ***
## kidsge6 0.01406306 0.01719895 0.8177 0.413546
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Partial effect at the average (PEA)

The partial effect at the average (PEA) presents two problems:

1. If some explanatory variables are discrete, their averages may
represent no one in the sample.
2. If a continuous explanatory variable appears as a nonlinear
function, e.g., a natural log or a quadratic, it is not clear whether
we want to average the nonlinear function (log(xj )) or plug the
average into the nonlinear function (log(x̄j )).

An alternative is to use the average partial effect (APE).

Average partial effect (APE)

• The average partial effect (APE) (or average marginal effect
(AME)) for a continuous explanatory variable xj is:

[n⁻¹ Σ_{i=1}^{n} g(x′i β̂)] β̂j (14)

• It uses as scale factor:

n⁻¹ Σ_{i=1}^{n} g(x′i β̂) (15)

which results from averaging the individual partial effects across the
sample.
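The scale factor (15) is straightforward to compute by hand; a minimal sketch for the logit case, which should match the educ marginal effect reported by logitmfx(..., atmean = FALSE) on the next slides:

# Manual APE of educ: average scale factor times the educ coefficient
mod <- glm(inlf ~ nwifeinc + educ + exper + I(exper^2) + age +
           kidslt6 + kidsge6,
           family = binomial(link = 'logit'), data = mroz)
sfac <- mean(dlogis(predict(mod, type = "link")))  # n^{-1} Σ g(x_i'β̂)
sfac * coef(mod)["educ"]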

Average partial effect (APE)

logitmfx(inlf ~ nwifeinc + educ + exper +
         I(exper^2) + age + kidslt6 + kidsge6,
         data = mroz, atmean = FALSE)

## Call:
## logitmfx(formula = inlf ~ nwifeinc + educ + exper + I(exper^2) +
## age + kidslt6 + kidsge6, data = mroz, atmean = FALSE)
##
## Marginal Effects:
## dF/dx Std. Err. z P>|z|
## nwifeinc -0.00381181 0.00153898 -2.4769 0.013255 *
## educ 0.03949652 0.00846811 4.6641 3.099e-06 ***
## exper 0.03676411 0.00655577 5.6079 2.048e-08 ***
## I(exper^2) -0.00056326 0.00018795 -2.9968 0.002728 **
## age -0.01571936 0.00293269 -5.3600 8.320e-08 ***
## kidslt6 -0.25775366 0.04263493 -6.0456 1.489e-09 ***
## kidsge6 0.01073482 0.01339130 0.8016 0.422769
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Average partial effect (APE)

probitmfx(inlf ~ nwifeinc + educ + exper +
          I(exper^2) + age + kidslt6 + kidsge6,
          data = mroz, atmean = FALSE)

## Call:
## probitmfx(formula = inlf ~ nwifeinc + educ + exper + I(exper^2) +
## age + kidslt6 + kidsge6, data = mroz, atmean = FALSE)
##
## Marginal Effects:
## dF/dx Std. Err. z P>|z|
## nwifeinc -0.00361618 0.00146972 -2.4604 0.013876 *
## educ 0.03937009 0.00726571 5.4186 6.006e-08 ***
## exper 0.03709734 0.00516823 7.1780 7.076e-13 ***
## I(exper^2) -0.00056755 0.00017708 -3.2050 0.001351 **
## age -0.01589566 0.00235868 -6.7392 1.592e-11 ***
## kidslt6 -0.26115346 0.03190239 -8.1860 2.700e-16 ***
## kidsge6 0.01082889 0.01322413 0.8189 0.412859
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Effects of discrete variables

• Because both of the scale factors described rely on a calculus
approximation, neither makes much sense for discrete explanatory
variables.
• Instead, we directly estimate the change in probability. For a change
in xk from ck to ck + 1, the discrete analog of the partial effect at
the average (PEA) is

G[β̂0 + β̂1 x̄1 + · · · + β̂k−1 x̄k−1 + β̂k (ck + 1)]
− G[β̂0 + β̂1 x̄1 + · · · + β̂k−1 x̄k−1 + β̂k ck ] (16)

• The discrete analog of the average partial effect (APE) is

n⁻¹ Σ_{i=1}^{n} { G[β̂0 + β̂1 xi1 + · · · + β̂k−1 xi,k−1 + β̂k (ck + 1)]
− G[β̂0 + β̂1 xi1 + · · · + β̂k−1 xi,k−1 + β̂k ck ] } (17)

Effects of discrete variables

# Effect of having a kid younger than 6 years old (logit)
# Fixing x at approximately the sample averages and assuming kidsge6 = 1
x1 <- c(1, 20.13, 12.3, 10.6, 10.6^2, 42.5, 1, 1)
x0 <- c(1, 20.13, 12.3, 10.6, 10.6^2, 42.5, 0, 1)

mod1 <- glm(inlf ~ nwifeinc + educ + exper + I(exper^2) + age +
            kidslt6 + kidsge6,
            family = binomial(link = 'logit'), data = mroz)

# Predicted probabilities via the logistic cdf G(z) = exp(z)/[1 + exp(z)]
p1 = exp(sum(coef(mod1)*x1))/(1+exp(sum(coef(mod1)*x1)))
p0 = exp(sum(coef(mod1)*x0))/(1+exp(sum(coef(mod1)*x0)))
p1-p0

## [1] -0.3444345

Effects of discrete variables

# Effect of having a kid younger than 6 years old (probit)
# Fixing x at approximately the sample averages and assuming kidsge6 = 1
x1 <- c(1, 20.13, 12.3, 10.6, 10.6^2, 42.5, 1, 1)
x0 <- c(1, 20.13, 12.3, 10.6, 10.6^2, 42.5, 0, 1)

mod2 <- glm(inlf ~ nwifeinc + educ + exper + I(exper^2) + age +
            kidslt6 + kidsge6,
            family = binomial(link = 'probit'), data = mroz)

# Predicted probabilities via the standard normal cdf
p1 = pnorm(sum(coef(mod2)*x1))
p0 = pnorm(sum(coef(mod2)*x0))
p1-p0

## [1] -0.334577
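The discrete analog of the APE, equation (17), averages this change over all women in the sample rather than fixing x at its means; a minimal sketch using predict() and the mod1 logit fit above:

# Discrete APE: average change in P(inlf = 1) from one additional young child
d1 <- transform(mroz, kidslt6 = kidslt6 + 1)
mean(predict(mod1, newdata = d1, type = "response") -
     predict(mod1, newdata = mroz, type = "response"))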

Statistical Inference

Consider a general set of restrictions to be tested,

H0 : c(β) = 0

where:

• β is the vector of parameters in the model;
• c(β) is a q × 1 vector of restrictions;
• L(β) is the log-likelihood for the unrestricted model;
• LR (β) is the log-likelihood for the restricted model;
• β̂MLE and β̂R are the unrestricted and restricted MLE, respectively.

3 classical test principles

• Likelihood ratio test: If the restriction c(β) is valid, then
imposing it should not lead to a large reduction in the log-likelihood
function. Therefore, we base the test on the difference between
L(β̂MLE ) and L(β̂R ).
• Wald test: If the restriction is valid, then c(β̂MLE ) should be close
to zero, because the MLE is consistent. Therefore, the test is based
on the comparison of c(β̂MLE ) with zero.
• Lagrange Multiplier test: If the restriction is valid, then the
restricted estimator should be near the point that maximizes the
log-likelihood. Therefore, the slope of the log-likelihood should be
near zero at the restricted estimator. The test compares the slope of
the log-likelihood at the point where the function is maximized
subject to the restriction with zero.

Wald test

• Knowing the estimates β̂j and the corresponding standard errors, we
can construct Wald tests and confidence intervals.
• In particular, to test H0 : βj = 0, we form the Wald statistic
Z = β̂j /se(β̂j ) and carry out the test in the usual way, once we have
decided on a one- or two-sided alternative.
• The test statistic is compared to the appropriate value from the
standard normal table.
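The z value column that summary() reports for a glm fit is exactly this Wald statistic; a small check using the mod1 logit fit from before:

# Wald statistic for H0: the educ coefficient is zero, computed by hand
est <- coef(summary(mod1))["educ", "Estimate"]
se  <- coef(summary(mod1))["educ", "Std. Error"]
est / se  # matches the "z value" column of summary(mod1)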

LR test

• To test multiple restrictions, in particular H0 : βj = βk = 0, where
j ̸= k, we use the likelihood ratio (LR) test;
• The LR test is based on the same concept as the F test in a linear
model;
• The F test measures the increase in the sum of squared residuals
when variables are dropped from the model. The LR test is based on
the difference in the log-likelihood functions for the unrestricted
and restricted models.
• The idea is this: Because the MLE maximizes the log-likelihood
function, dropping variables generally leads to a smaller—or at least
no larger—log-likelihood.
• The question is whether the fall in the log-likelihood is large
enough to conclude that the dropped variables are important.

LR test

• Let us consider the unrestricted model:

y = β0 + β1 x1 + · · · + βk xk + u

• To test H0 : βk−q+1 = βk−q+2 = · · · = βk = 0, we have the following
restricted model:

y = β0 + β1 x1 + · · · + βk−q xk−q + u

• For each model (restricted and unrestricted), we will extract the
value of the log-likelihood function.

LR test

• Formally, we have

LR = 2 log(ℒur /ℒr ) = 2(Lur − Lr ) ∼ χ²(q) asymptotically,

where ℒur and ℒr are the likelihoods of the unrestricted and
restricted models, Lur and Lr the corresponding log-likelihoods, and
q the number of restrictions under the null.

LR test

library(lmtest)
full <- glm(inlf ~ nwifeinc + educ + exper +
            I(exper^2) + age + kidslt6 + kidsge6,
            family = binomial(link = 'logit'), data = mroz)

reduced <- glm(inlf ~ nwifeinc + educ + exper + I(exper^2) + age,
               family = binomial(link = 'logit'), data = mroz)

lrtest(full, reduced)

## Likelihood ratio test
##
## Model 1: inlf ~ nwifeinc + educ + exper + I(exper^2) + age + kidslt6 +
## kidsge6
## Model 2: inlf ~ nwifeinc + educ + exper + I(exper^2) + age
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 8 -401.77
## 2 6 -432.78 -2 62.023 3.404e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

LR test

• The R output reports the Deviance instead of the log-likelihood.
The LR statistic can also be calculated through the difference in
Deviances.
• The Deviance of a GLM is defined to be

Deviance = 2(LS − LM )

where
• LS denotes the maximized log-likelihood value for the most complex
model possible (a saturated model);
• LM denotes the maximized log-likelihood value for the model M of
interest.
• It can easily be shown that

LR = 2(Lur − Lr ) = Deviancer − Devianceur
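This identity is easy to verify on the fitted objects; a small check using the full and reduced models from the previous slide:

# LR statistic two ways: deviance difference vs. log-likelihood difference
deviance(reduced) - deviance(full)
2 * (as.numeric(logLik(full)) - as.numeric(logLik(reduced)))  # both ~ 62.02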

LR test

full <- glm(inlf ~ nwifeinc + educ + exper +
            I(exper^2) + age + kidslt6 + kidsge6,
            family = binomial(link = 'logit'), data = mroz)

reduced <- glm(inlf ~ nwifeinc + educ + exper + I(exper^2) + age,
               family = binomial(link = 'logit'), data = mroz)

full

##
## Call: glm(formula = inlf ~ nwifeinc + educ + exper + I(exper^2) + age +
## kidslt6 + kidsge6, family = binomial(link = "logit"), data = mroz)
##
## Coefficients:
## (Intercept) nwifeinc educ exper I(exper^2) age
## 0.425452 -0.021345 0.221170 0.205870 -0.003154 -0.088024
## kidslt6 kidsge6
## -1.443354 0.060112
##
## Degrees of Freedom: 752 Total (i.e. Null); 745 Residual
## Null Deviance:     1030
## Residual Deviance: 803.5    AIC: 819.5

reduced

##
## Call: glm(formula = inlf ~ nwifeinc + educ + exper + I(exper^2) + age,
## family = binomial(link = "logit"), data = mroz)
##
## Coefficients:
LM test

• The LM test is based on the restricted estimation only. Let si (β̂R )
be the P × 1 score vector of ℓi (β) evaluated at the restricted
estimates β̂R .
• That is, we compute the partial derivatives of ℓi (β) with respect to
each of the parameters, but then we evaluate this vector of partials
at the restricted estimates. The test statistic is

LM = [Σ_{i=1}^{N} si (β̂R )]′ [Σ_{i=1}^{N} si (β̂R ) si (β̂R )′]⁻¹ [Σ_{i=1}^{N} si (β̂R )] ∼ χ²(q) asymptotically
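In R, a score (LM) test of nested glm models is available without hand-coding the score vector; a minimal sketch using anova()'s Rao option on the full and reduced fits above:

# Rao score (LM) test of H0: the kidslt6 and kidsge6 coefficients are zero
anova(reduced, full, test = "Rao")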

Classification Tables

• A classification table cross-classifies the binary outcome y with a
prediction of whether y = 0 or 1.
• The prediction for observation i is ŷ = 1 when its estimated
probability G(x′i β̂) ≥ 0.5 and ŷ = 0 when G(x′i β̂) < 0.5.
• From this table we can compute a goodness-of-fit measure called the
percent correctly predicted.
• However, if a low (high) proportion of observations has y = 1, the
fitted model may never (always) have G(x′i β̂) ≥ 0.5, in which case we
never (always) predict ŷ = 1.
• So, in these cases, instead of using a threshold value of 0.5, we can
use the sample proportion of 1 outcomes (ȳ) as the threshold.
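Building the table and the percent correctly predicted takes one line each; a minimal sketch for the mod1 logit fit with the 0.5 threshold:

# Cross-classify observed outcomes against 0.5-threshold predictions
yhat <- as.numeric(predict(mod1, type = "response") >= 0.5)
table(observed = mroz$inlf, predicted = yhat)
mean(mroz$inlf == yhat)  # percent correctly predicted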

Classification Tables

Table 1: Classification Table

                     Estimated values
Observed values      ŷi = 1    ŷi = 0    Total
yi = 1               n11       n10       n1·
yi = 0               n01       n00       n0·
Total                n·1       n·0       n

Classification Tables

• Two useful summaries of predictive power are

Sensitivity = P(ŷ = 1|y = 1) = n11 /n1·

and

Specificity = P(ŷ = 0|y = 0) = n00 /n0·

• The overall proportion of correct classifications is

p̂ = (n00 + n11 )/n

Information criteria

Two common measures that are based on the same logic as the adjusted
R-squared for the linear model are:

Akaike information criterion (AIC) = −2ℓ + 2K

Bayes (Schwarz) information criterion (BIC) = −2ℓ + K log n

where K is the number of parameters in the model and n the number of
observations.
We choose a model based on the lowest AIC/BIC value.
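R reports both criteria directly for glm fits; a quick comparison of the full and reduced logit models from the LR-test slides:

# Information criteria for the two fitted models; lower is preferred
AIC(full, reduced)
BIC(full, reduced)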

Pseudo R-squared

• McFadden (1974) suggested the measure 1 − Lur /L0 , where Lur is
the log-likelihood of the estimated model and L0 is the
log-likelihood of the model with only an intercept.
• Why does this measure make sense? Recall that the log-likelihoods
are negative, so Lur /L0 = |Lur |/|L0 |, and the measure equals
1 − |Lur |/|L0 |. Further, |Lur | ≤ |L0 |.
• If the covariates have no explanatory power, then Lur /L0 = 1, and
the pseudo R-squared is zero.
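A minimal sketch of the computation, where an intercept-only fit supplies L0:

# McFadden pseudo R-squared for the full logit model
null <- glm(inlf ~ 1, family = binomial(link = 'logit'), data = mroz)
1 - as.numeric(logLik(full)) / as.numeric(logLik(null))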
