
Binary Dependent Variable Models
Econometrics II

Bruno Damásio
[email protected]
@bmpdamasio

Carolina Vasconcelos
[email protected]
@vasconceloscm

2022/2023
Nova Information Management School (NOVA IMS)
NOVA University of Lisbon
Instituto Superior de Estatística e Gestão da Informação
Universidade Nova de Lisboa
Table of contents

1. Linear Probability Model


Context
Interpreting the estimates
Estimating probabilities
Main Issues

2. Logit and Probit Models


Framework
Maximum Likelihood Estimation
Interpretation
Statistical Inference
Quality of predictions
Comparing models


Linear Probability Model


Context i

• Previously, we have studied the use of binary independent variables in
the multiple linear regression model;
• In all of the models up until now, the dependent variable y has had
quantitative meaning (for example, y is a dollar amount, a test score,
etc.);
• In the simplest case, and one that often arises in practice, the event
we would like to explain is a binary outcome. In other words, our
dependent variable, y, takes on only two values: zero and one.

Context ii

Example
The dependent variable can indicate:

• whether an adult has a high school education;


• whether a college student used illegal drugs during a given school
year;
• whether a firm was taken over by another firm during a given year.

Linear Probability Model i

• What does it mean to write down a multiple regression model, such as

y = β0 + β1 x1 + · · · + βk xk + u

when y is a binary variable?
• Because y can take on only two values, βj cannot be interpreted as
the change in y given a one-unit increase in xj , holding other factors
fixed.

Linear Probability Model ii

• The key point is that when y is a binary variable taking on the
values zero and one, it is always true that P(y = 1|x) = E(y|x): the
probability of “success” is the same as the expected value of y.
Thus, we have the important equation

P(y = 1|x) = β0 + β1 x1 + · · · + βk xk (1)

which says that the probability of success, say, p(x) = P(y = 1|x), is
a linear function of the xj .

Linear Probability Model iii

• The multiple linear regression model with a binary dependent
variable is called the linear probability model (LPM), because the
response probability P(y = 1|x) is linear in the parameters βj .
• In the LPM, βj measures the change in the probability of success when
xj changes, holding other factors fixed:

∆P(y = 1|x) = βj ∆xj

• The mechanics of OLS are the same as before.

Interpreting the estimates

Labour force participation


Suppose we aim to study the variables that influence labor force
participation by a married woman during 1975: inlf = 1 if the woman
reports working for a wage outside the home at some point during the
year, and zero otherwise.

Interpreting the estimates

library(wooldridge)
summary(lm(inlf ~ nwifeinc + educ + exper + expersq + age + kidslt6 + kidsge6, data = mroz))

##
## Call:
## lm(formula = inlf ~ nwifeinc + educ + exper + expersq + age +
## kidslt6 + kidsge6, data = mroz)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.93432 -0.37526 0.08833 0.34404 0.99417
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.5855192 0.1541780 3.798 0.000158 ***
## nwifeinc -0.0034052 0.0014485 -2.351 0.018991 *
## educ 0.0379953 0.0073760 5.151 3.32e-07 ***
## exper 0.0394924 0.0056727 6.962 7.38e-12 ***
## expersq -0.0005963 0.0001848 -3.227 0.001306 **
## age -0.0160908 0.0024847 -6.476 1.71e-10 ***
## kidslt6 -0.2618105 0.0335058 -7.814 1.89e-14 ***
## kidsge6 0.0130122 0.0131960 0.986 0.324415
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4271 on 745 degrees of freedom
## Multiple R-squared: 0.2642,  Adjusted R-squared: 0.2573
## F-statistic: 38.22 on 7 and 745 DF, p-value: < 2.2e-16

Interpreting the estimates

• To interpret the estimates, we must remember that a change in an
independent variable changes the probability that inlf = 1.
• The coefficient on educ means that, everything else fixed, another
year of education increases the probability of labor force
participation by 0.038.
• Taking the equation literally, 10 more years of education would
increase the probability of being in the labor force by 0.38, which
is a pretty large increase.

Interpreting the estimates

• Considering the variable kidslt6, the effect of going from zero
children to one young child reduces the probability of working by
0.262.
• This is also the predicted drop if the woman goes from having one
young child to two. Does it make sense?
• It seems more realistic that the first small child would reduce the
probability by a large amount, but subsequent children would have a
smaller marginal effect.

Estimating probabilities

• What is the probability that a 30-year-old woman with 12 years of
education, 1 year of experience, 15.51 thousand dollars of household
income (without her contribution), 1 kid younger than 6 years old,
and 2 kids older than 6 years old participates in the labor force?
• And a 31-year-old woman with 13 years of education, 3 years of
experience, 73.6 thousand dollars of household income (without her
contribution), 3 kids younger than 6 years old, and 1 kid older than
6 years old?

Estimating probabilities

reg <- lm(inlf ~ nwifeinc + educ + exper + expersq + age +
          kidslt6 + kidsge6, data = mroz)

# Coefficient order: intercept, nwifeinc, educ, exper, expersq,
# age, kidslt6, kidsge6
sum(coef(reg)*c(1, 15.51, 12, 1, 1, 30, 1, 2))

## [1] 0.3090346

sum(coef(reg)*c(1, 73.6, 13, 3, 9, 31, 3, 1))

## [1] -0.3292861
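Equivalently, the built-in predict() function reproduces these fitted values; a minimal sketch, assuming the reg model above and a small hypothetical data frame holding the two women's characteristics:

# Same two predictions via predict(); note the second fitted "probability"
# is negative, which previews the main issues discussed next
newdat <- data.frame(nwifeinc = c(15.51, 73.6), educ = c(12, 13),
                     exper = c(1, 3), expersq = c(1, 9), age = c(30, 31),
                     kidslt6 = c(1, 3), kidsge6 = c(2, 1))
predict(reg, newdata = newdat)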

Main Issues

• If we plug in certain combinations of values for the independent
variables, we can get predictions either less than zero or greater
than one.
• A probability cannot be linearly related to the independent variables
for all their possible values.

Main Issues

Due to the binary nature of y, the linear probability model violates
one of the Gauss-Markov assumptions: homoskedasticity. When y is a
binary variable, its variance, conditional on x, is

Var(y|x) = p(x)[1 − p(x)],

where p(x) is shorthand for the probability of success. Unless this
probability does not depend on any of the independent variables, there
is heteroskedasticity in the LPM.
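Heteroskedasticity does not bias the OLS estimates, but it invalidates the usual standard errors, so LPM inference typically relies on heteroskedasticity-robust standard errors. A minimal sketch, assuming the lmtest and sandwich packages and the reg model above:

library(lmtest)
library(sandwich)
# Coefficient tests with heteroskedasticity-robust (HC1) standard errors
coeftest(reg, vcov = vcovHC(reg, type = "HC1"))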


Logit and Probit Models


Limited Dependent Variable Models i

• Previously, we studied the linear probability model, which is simply
an application of the multiple regression model to a binary
dependent variable. A binary dependent variable is an example of a
limited dependent variable (LDV);
• An LDV is broadly defined as a dependent variable whose range of
values is restricted;
• Most economic variables we would like to explain are limited in some
way, often because they must be positive. For example, hourly wage,
housing price, and nominal interest rates must be greater than zero;

Limited Dependent Variable Models ii

• Not all such variables need special treatment. If a strictly positive
variable takes on many different values, a special econometric model
is rarely necessary. But when y is discrete and takes on a small
number of values, it makes no sense to treat it as an approximately
continuous variable;
• However, as we saw, the linear probability model has certain
drawbacks. In the following sections, we will present alternative
models that overcome the shortcomings of the LPM.

Logit and Probit Models

In the LPM, we assume that the response probability is linear in a set
of parameters, βj . To avoid the LPM limitations, we will consider a
class of binary response models of the form:

P(y = 1|x) = G(β0 + β1 x1 + · · · + βk xk ) = G(x′i β), (2)

where G is a function taking on values strictly between zero and one:
0 < G(z) < 1, for all real numbers z. This ensures that the estimated
response probabilities are strictly between zero and one.

Logit Models

In the logit model, G is the logistic function:

G(z) = exp(z)/[1 + exp(z)] = Λ(z) (3)

which is between zero and one for all real numbers z. This is the
cumulative distribution function (cdf) for a standard logistic random
variable.

Probit Models

In the probit model, G is the standard normal cdf, which is expressed
as an integral:

G(z) = Φ(z) = ∫_{−∞}^{z} ϕ(v) dv (4)

where ϕ(z) is the standard normal density

ϕ(z) = (2π)^{−1/2} exp(−z²/2) (5)
Logit and Probit Models

• Logit and probit models can be derived from an underlying latent
variable model. Let y∗ be an unobserved, or latent, variable, and
suppose that

y∗ = x′i β + e, y = 1[y∗ > 0] (6)

• The notation 1[·] is called an indicator function: it takes on the
value one if the event in brackets is true, and zero otherwise.
• We assume that e is independent of x and that e has either the
standard logistic distribution or the standard normal distribution.
• In each case, e is symmetrically distributed about zero, which means
that 1 − G(−z) = G(z) for all real numbers z.

Logit and Probit Models

[Figure: response probability P(y = 1|x1) as a function of x1 for the
logit and probit models; both curves rise from 0 to 1 over x1 ∈ [−1, 1].]
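A similar picture can be drawn directly from the two cdfs; a minimal sketch in base R, where the scaling of the index (here 4·x1) is an assumption chosen so the curves span the plot:

# Logit vs. probit response probabilities as functions of x1
curve(plogis(4*x), from = -1, to = 1, xlab = "x1", ylab = "P(y = 1 | x1)")
curve(pnorm(4*x), from = -1, to = 1, add = TRUE, lty = 2)
legend("topleft", legend = c("Logit model", "Probit model"), lty = c(1, 2))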
Logit and Probit Models

We can derive the response probability for y as:

P(y = 1|x) = P(y∗ > 0|x) = P[e > −(x′i β)|x]
           = 1 − G[−(x′i β)] = G(x′i β) (7)
MLE estimator

• Up until now, we have had little need for MLE, although we did note
that, under the classical linear model assumptions, the OLS
estimator is the maximum likelihood estimator (conditional on the
explanatory variables);
• Because of the nonlinear nature of E[y|x], OLS and WLS are not
applicable;
• For estimating limited dependent variable models, maximum
likelihood methods are indispensable.

MLE estimator

• Assume that we have a random sample of size n. To obtain the
maximum likelihood estimator, conditional on the explanatory
variables, we need the density of yi given xi . We can write this as:

f(y|xi ; β) = [G(x′i β)]^y [1 − G(x′i β)]^{1−y} , y = 0, 1, (8)

where, for simplicity, we absorb the intercept into the vector xi ;
• We can easily see that when y = 1, we get G(x′i β), and when y = 0,
we get 1 − G(x′i β);
• The log-likelihood function for observation i is a function of the
parameters and the data (xi , yi ) and is obtained by taking the log
of (8):

ℓi (β) = yi log[G(x′i β)] + (1 − yi ) log[1 − G(x′i β)] (9)

MLE estimator

• The log-likelihood for a sample of size n is obtained by summing
equation (9) across all observations: L(β) = Σ_{i=1}^{n} ℓi (β);
• The MLE of β, denoted by β̂, maximizes this log-likelihood;
• If G(·) is the standard logistic cdf, then β̂ is the logit estimator;
if G(·) is the standard normal cdf, then β̂ is the probit estimator.
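As a concrete check, the logit log-likelihood can be coded by hand and maximized with a generic optimizer; a minimal sketch on a deliberately small model, assuming the mroz data from the wooldridge package — optim() should reproduce glm()'s estimates:

library(wooldridge)
X <- model.matrix(~ nwifeinc + educ, data = mroz)  # regressors incl. intercept
y <- mroz$inlf
# Negative of the summed log-likelihood (9), with G the logistic cdf
negll <- function(beta) {
  p <- plogis(X %*% beta)
  -sum(y * log(p) + (1 - y) * log(1 - p))
}
opt <- optim(rep(0, ncol(X)), negll, method = "BFGS")
opt$par
coef(glm(inlf ~ nwifeinc + educ, family = binomial(link = 'logit'), data = mroz))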

MLE estimator - Example

# Simulate a probit model
set.seed(123)
n <- 1000
beta0 <- 0
beta1 <- 1.5
x1 <- runif(n = n, min = -1, max = 1)
u <- rnorm(n)
# round(pnorm(z), 0) is 1 exactly when z > 0, i.e., the latent-variable
# rule y = 1[y* > 0]
y <- round(pnorm(beta0 + beta1*x1 + u), 0)
glm(y ~ x1, family = binomial(link = 'probit'))

##
## Call: glm(formula = y ~ x1, family = binomial(link = "probit"))
##
## Coefficients:
## (Intercept) x1
## -0.0316 1.4416
##
## Degrees of Freedom: 999 Total (i.e. Null); 998 Residual
## Null Deviance:     1386
## Residual Deviance: 1048    AIC: 1052
MLE estimator

• Because of the nonlinear nature of the maximization problem, we
cannot write closed-form formulas for the logit or probit maximum
likelihood estimates;
• In addition to raising computational issues, this makes the
statistical theory for logit and probit much more difficult than for
OLS;
• Nevertheless, the general theory of MLE for random samples implies
that, under very general conditions, the MLE is consistent,
asymptotically normal, and asymptotically efficient.

Logit and Probit Models

• In most applications of binary response models, the primary goal is
to explain the effects of the xj on the response probability
P(y = 1|x).
• For logit and probit, the direction of the effect of xj on
E(y∗ |x) = x′i β and on E(y|x) = P(y = 1|x) = G(x′i β) is always the
same.
• But the latent variable rarely has a well-defined unit of
measurement, so the magnitudes of the βj are not, by themselves,
especially useful (in contrast to the linear probability model).

Logit and Probit Models

• To find the partial effect of roughly continuous variables on the
response probability, we must rely on calculus. If xj is a roughly
continuous variable, its partial effect on p(x) = P(y = 1|x) is
obtained from the partial derivative:

∂p(x)/∂xj = g(x′i β)βj , (10)

where g(z) ≡ dG(z)/dz.
• Because G is the cdf of a continuous random variable, g is a
probability density function (pdf).

Logit and Probit Models

• In the logit and probit cases, G(·) is a strictly increasing cdf, and so
g(z) > 0 for all z.
• Therefore, the partial effect of xj on p(x) depends on x through the
positive quantity g(x′i β), which means that the partial effect always
has the same sign as βj .

Logit and Probit Models

• If, say, x1 is a binary explanatory variable, then the partial effect
from changing x1 from zero to one, holding all other variables fixed,
is simply:

G(β0 + β1 + β2 x2 + · · · + βk xk ) − G(β0 + β2 x2 + · · · + βk xk ) (11)

Partial effects

• Often, we want to estimate the effects of the xj on the response
probabilities, P(y = 1|x). If xj is roughly continuous, then

∆P̂(y = 1|x) ≈ [g(x′i β̂)β̂j ]∆xj (12)

for “small” changes in xj .
• So, for ∆xj = 1, the change in the estimated success probability is
roughly g(x′i β̂)β̂j .
• Compared with the linear probability model, the cost of using probit
and logit models is that the partial effects in equation (12) are
harder to summarize because the scale factor, g(x′i β̂), depends on x
(that is, on all of the explanatory variables).

Partial effect at the average (PEA)

• One approach is to replace each explanatory variable with its sample
average:

g(β̂0 + β̂1 x̄1 + · · · + β̂k x̄k ) (13)

where g(·) is the standard normal density in the probit case and
g(z) = exp(z)/[1 + exp(z)]² in the logit case.
• The idea of (13) is that, when it is multiplied by β̂j , we obtain the
partial effect of xj for the “average” person in the sample. This is
called the partial effect at the average (PEA).

Partial effect at the average (PEA)

library(mfx)
logitmfx(inlf ~ nwifeinc + educ + exper +
I(exper^2) + age + kidslt6 + kidsge6, data = mroz)

## Call:
## logitmfx(formula = inlf ~ nwifeinc + educ + exper + I(exper^2) +
## age + kidslt6 + kidsge6, data = mroz)
##
## Marginal Effects:
## dF/dx Std. Err. z P>|z|
## nwifeinc -0.00519005 0.00204820 -2.5340 0.011278 *
## educ 0.05377731 0.01056074 5.0922 3.539e-07 ***
## exper 0.05005693 0.00782462 6.3974 1.581e-10 ***
## I(exper^2) -0.00076692 0.00024768 -3.0965 0.001959 **
## age -0.02140302 0.00353973 -6.0465 1.480e-09 ***
## kidslt6 -0.35094982 0.04963897 -7.0700 1.549e-12 ***
## kidsge6 0.01461621 0.01818832 0.8036 0.421625
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Partial effect at the average (PEA)

probitmfx(inlf ~ nwifeinc + educ + exper +
          I(exper^2) + age + kidslt6 + kidsge6, data = mroz)

## Call:
## probitmfx(formula = inlf ~ nwifeinc + educ + exper + I(exper^2) +
## age + kidslt6 + kidsge6, data = mroz)
##
## Marginal Effects:
## dF/dx Std. Err. z P>|z|
## nwifeinc -0.00469619 0.00192965 -2.4337 0.014945 *
## educ 0.05112843 0.00992310 5.1525 2.571e-07 ***
## exper 0.04817690 0.00734505 6.5591 5.413e-11 ***
## I(exper^2) -0.00073705 0.00023464 -3.1412 0.001683 **
## age -0.02064309 0.00330485 -6.2463 4.203e-10 ***
## kidslt6 -0.33914996 0.04634765 -7.3175 2.526e-13 ***
## kidsge6 0.01406306 0.01719895 0.8177 0.413546
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Partial effect at the average (PEA)

The partial effect at the average (PEA) presents two problems:

1. If some explanatory variables are discrete, their averages may
represent no one in the sample.
2. If a continuous explanatory variable appears as a nonlinear
function, e.g., a natural log or a quadratic, it is not clear whether
we want to average the nonlinear function (log(xj )) or plug the
average into the nonlinear function (log(x̄j )).

An alternative is to use the average partial effect (APE).

Average partial effect (APE)

• The average partial effect (APE) (or average marginal effect
(AME)) for a continuous explanatory variable xj is:

[n⁻¹ Σ_{i=1}^{n} g(x′i β̂)] β̂j (14)

• It uses as scale factor:

n⁻¹ Σ_{i=1}^{n} g(x′i β̂) (15)

which results from averaging the individual partial effects across the
sample.
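The scale factor (15) is straightforward to compute by hand; a minimal sketch for the logit case, which should match the educ marginal effect reported by logitmfx(..., atmean = FALSE) on the next slides:

# Manual APE of educ: average scale factor times the educ coefficient
mod <- glm(inlf ~ nwifeinc + educ + exper + I(exper^2) + age +
           kidslt6 + kidsge6,
           family = binomial(link = 'logit'), data = mroz)
sfac <- mean(dlogis(predict(mod, type = "link")))  # n^{-1} Σ g(x_i'β̂)
sfac * coef(mod)["educ"]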

Average partial effect (APE)

logitmfx(inlf ~ nwifeinc + educ + exper +
         I(exper^2) + age + kidslt6 + kidsge6,
         data = mroz, atmean = FALSE)

## Call:
## logitmfx(formula = inlf ~ nwifeinc + educ + exper + I(exper^2) +
## age + kidslt6 + kidsge6, data = mroz, atmean = FALSE)
##
## Marginal Effects:
## dF/dx Std. Err. z P>|z|
## nwifeinc -0.00381181 0.00153898 -2.4769 0.013255 *
## educ 0.03949652 0.00846811 4.6641 3.099e-06 ***
## exper 0.03676411 0.00655577 5.6079 2.048e-08 ***
## I(exper^2) -0.00056326 0.00018795 -2.9968 0.002728 **
## age -0.01571936 0.00293269 -5.3600 8.320e-08 ***
## kidslt6 -0.25775366 0.04263493 -6.0456 1.489e-09 ***
## kidsge6 0.01073482 0.01339130 0.8016 0.422769
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Average partial effect (APE)

probitmfx(inlf ~ nwifeinc + educ + exper +
          I(exper^2) + age + kidslt6 + kidsge6,
          data = mroz, atmean = FALSE)

## Call:
## probitmfx(formula = inlf ~ nwifeinc + educ + exper + I(exper^2) +
## age + kidslt6 + kidsge6, data = mroz, atmean = FALSE)
##
## Marginal Effects:
## dF/dx Std. Err. z P>|z|
## nwifeinc -0.00361618 0.00146972 -2.4604 0.013876 *
## educ 0.03937009 0.00726571 5.4186 6.006e-08 ***
## exper 0.03709734 0.00516823 7.1780 7.076e-13 ***
## I(exper^2) -0.00056755 0.00017708 -3.2050 0.001351 **
## age -0.01589566 0.00235868 -6.7392 1.592e-11 ***
## kidslt6 -0.26115346 0.03190239 -8.1860 2.700e-16 ***
## kidsge6 0.01082889 0.01322413 0.8189 0.412859
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Effects of discrete variables

• Because both of the scale factors described rely on a calculus
approximation, neither makes much sense for discrete explanatory
variables.
• Instead, we directly estimate the change in probability. For a change
in xk from ck to ck + 1, the discrete analog of the partial effect at
the average (PEA) is

G[β̂0 + β̂1 x̄1 + · · · + β̂k−1 x̄k−1 + β̂k (ck + 1)]
− G[β̂0 + β̂1 x̄1 + · · · + β̂k−1 x̄k−1 + β̂k ck ] (16)

• The discrete analog of the average partial effect (APE) is

n⁻¹ Σ_{i=1}^{n} { G[β̂0 + β̂1 xi1 + · · · + β̂k−1 xi,k−1 + β̂k (ck + 1)]
− G[β̂0 + β̂1 xi1 + · · · + β̂k−1 xi,k−1 + β̂k ck ] } (17)

Effects of discrete variables

# Effect of having a kid younger than 6 years old (logit)
# Fixing x at approximately the sample averages and assuming kidsge6 = 1
x1 <- c(1, 20.13, 12.3, 10.6, 10.6^2, 42.5, 1, 1)
x0 <- c(1, 20.13, 12.3, 10.6, 10.6^2, 42.5, 0, 1)

mod1 <- glm(inlf ~ nwifeinc + educ + exper + I(exper^2) + age +
            kidslt6 + kidsge6,
            family = binomial(link = 'logit'), data = mroz)

# Predicted probabilities via the logistic cdf G(z) = exp(z)/[1 + exp(z)]
p1 = exp(sum(coef(mod1)*x1))/(1+exp(sum(coef(mod1)*x1)))
p0 = exp(sum(coef(mod1)*x0))/(1+exp(sum(coef(mod1)*x0)))
p1-p0

## [1] -0.3444345

Effects of discrete variables

# Effect of having a kid younger than 6 years old (probit)
# Fixing x at approximately the sample averages and assuming kidsge6 = 1
x1 <- c(1, 20.13, 12.3, 10.6, 10.6^2, 42.5, 1, 1)
x0 <- c(1, 20.13, 12.3, 10.6, 10.6^2, 42.5, 0, 1)

mod2 <- glm(inlf ~ nwifeinc + educ + exper + I(exper^2) + age +
            kidslt6 + kidsge6,
            family = binomial(link = 'probit'), data = mroz)

# Predicted probabilities via the standard normal cdf
p1 = pnorm(sum(coef(mod2)*x1))
p0 = pnorm(sum(coef(mod2)*x0))
p1-p0

## [1] -0.334577
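The discrete analog of the APE, equation (17), averages this change over all women in the sample rather than fixing x at its means; a minimal sketch using predict() and the mod1 logit fit above:

# Discrete APE: average change in P(inlf = 1) from one additional young child
d1 <- transform(mroz, kidslt6 = kidslt6 + 1)
mean(predict(mod1, newdata = d1, type = "response") -
     predict(mod1, newdata = mroz, type = "response"))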

Statistical Inference

Consider a general set of restrictions to be tested,

H0 : c(β) = 0

where:

• β is the vector of parameters in the model;
• c(β) is a q × 1 vector of restrictions;
• L(β) is the log-likelihood for the unrestricted model;
• LR (β) is the log-likelihood for the restricted model;
• β̂MLE and β̂R are the unrestricted and restricted MLE, respectively.

3 classical test principles

• Likelihood ratio test: If the restriction c(β) is valid, then
imposing it should not lead to a large reduction in the log-likelihood
function. Therefore, we base the test on the difference between
L(β̂MLE ) and L(β̂R ).
• Wald test: If the restriction is valid, then c(β̂MLE ) should be close
to zero, because the MLE is consistent. Therefore, the test is based
on the comparison of c(β̂MLE ) with zero.
• Lagrange Multiplier test: If the restriction is valid, then the
restricted estimator should be near the point that maximizes the
log-likelihood. Therefore, the slope of the log-likelihood should be
near zero at the restricted estimator. The test compares the slope of
the log-likelihood at the point where the function is maximized
subject to the restriction with zero.

Wald test

• Knowing the estimates β̂j and the corresponding standard errors, we
can construct Wald tests and confidence intervals.
• In particular, to test H0 : βj = 0, we form the Wald statistic
Z = β̂j /se(β̂j ) and carry out the test in the usual way, once we have
decided on a one- or two-sided alternative.
• The test statistic is compared to the appropriate value from the
standard normal table.
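The z value column that summary() reports for a glm fit is exactly this Wald statistic; a small check using the mod1 logit fit from before:

# Wald statistic for H0: the educ coefficient is zero, computed by hand
est <- coef(summary(mod1))["educ", "Estimate"]
se  <- coef(summary(mod1))["educ", "Std. Error"]
est / se  # matches the "z value" column of summary(mod1)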

LR test

• To test multiple restrictions, in particular H0 : βj = βk = 0, where
j ̸= k, we use the likelihood ratio (LR) test;
• The LR test is based on the same concept as the F test in a linear
model;
• The F test measures the increase in the sum of squared residuals
when variables are dropped from the model. The LR test is based on
the difference in the log-likelihood functions for the unrestricted
and restricted models.
• The idea is this: Because the MLE maximizes the log-likelihood
function, dropping variables generally leads to a smaller—or at least
no larger—log-likelihood.
• The question is whether the fall in the log-likelihood is large
enough to conclude that the dropped variables are important.

LR test

• Let us consider the unrestricted model:

y = β0 + β1 x1 + · · · + βk xk + u

• To test H0 : βk−q+1 = βk−q+2 = · · · = βk = 0, we have the following
restricted model:

y = β0 + β1 x1 + · · · + βk−q xk−q + u

• For each model (restricted and unrestricted), we will extract the
value of the log-likelihood function.

LR test

• Formally, we have

LR = 2 log(ℒur /ℒr ) = 2(Lur − Lr ) ∼ χ²(q) asymptotically,

where ℒur and ℒr are the likelihoods of the unrestricted and
restricted models, Lur and Lr the corresponding log-likelihoods, and
q the number of restrictions under the null.

LR test

library(lmtest)
full <- glm(inlf ~ nwifeinc + educ + exper +
            I(exper^2) + age + kidslt6 + kidsge6,
            family = binomial(link = 'logit'), data = mroz)

reduced <- glm(inlf ~ nwifeinc + educ + exper + I(exper^2) + age,
               family = binomial(link = 'logit'), data = mroz)

lrtest(full, reduced)

## Likelihood ratio test
##
## Model 1: inlf ~ nwifeinc + educ + exper + I(exper^2) + age + kidslt6 +
## kidsge6
## Model 2: inlf ~ nwifeinc + educ + exper + I(exper^2) + age
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 8 -401.77
## 2 6 -432.78 -2 62.023 3.404e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

LR test

• The R output reports the Deviance instead of the log-likelihood.
The LR statistic can also be calculated through the difference in
Deviances.
• The Deviance of a GLM is defined to be

Deviance = 2(LS − LM )

where
• LS denotes the maximized log-likelihood value for the most complex
model possible (a saturated model);
• LM denotes the maximized log-likelihood value for the model M of
interest.
• It can easily be shown that

LR = 2(Lur − Lr ) = Deviancer − Devianceur
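This identity is easy to verify on the fitted objects; a small check using the full and reduced models from the previous slide:

# LR statistic two ways: deviance difference vs. log-likelihood difference
deviance(reduced) - deviance(full)
2 * (as.numeric(logLik(full)) - as.numeric(logLik(reduced)))  # both ~ 62.02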

LR test

full <- glm(inlf ~ nwifeinc + educ + exper +
            I(exper^2) + age + kidslt6 + kidsge6,
            family = binomial(link = 'logit'), data = mroz)

reduced <- glm(inlf ~ nwifeinc + educ + exper + I(exper^2) + age,
               family = binomial(link = 'logit'), data = mroz)

full

##
## Call: glm(formula = inlf ~ nwifeinc + educ + exper + I(exper^2) + age +
## kidslt6 + kidsge6, family = binomial(link = "logit"), data = mroz)
##
## Coefficients:
## (Intercept) nwifeinc educ exper I(exper^2) age
## 0.425452 -0.021345 0.221170 0.205870 -0.003154 -0.088024
## kidslt6 kidsge6
## -1.443354 0.060112
##
## Degrees of Freedom: 752 Total (i.e. Null); 745 Residual
## Null Deviance:     1030
## Residual Deviance: 803.5    AIC: 819.5

reduced

##
## Call: glm(formula = inlf ~ nwifeinc + educ + exper + I(exper^2) + age,
## family = binomial(link = "logit"), data = mroz)
##
## Coefficients:
LM test

• The LM test is based on the restricted estimation only. Let si (β̂R )
be the P × 1 score vector of ℓi (β) evaluated at the restricted
estimates β̂R .
• That is, we compute the partial derivatives of ℓi (β) with respect to
each of the parameters, but then we evaluate this vector of partials
at the restricted estimates. The test statistic is

LM = [Σ_{i=1}^{N} si (β̂R )]′ [Σ_{i=1}^{N} si (β̂R ) si (β̂R )′]⁻¹ [Σ_{i=1}^{N} si (β̂R )] ∼ χ²(q) asymptotically
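In R, a score (LM) test of nested glm models is available without hand-coding the score vector; a minimal sketch using anova()'s Rao option on the full and reduced fits above:

# Rao score (LM) test of H0: the kidslt6 and kidsge6 coefficients are zero
anova(reduced, full, test = "Rao")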

Classification Tables

• A classification table cross-classifies the binary outcome y with a
prediction of whether y = 0 or 1.
• The prediction for observation i is ŷ = 1 when its estimated
probability G(x′i β̂) ≥ 0.5 and ŷ = 0 when G(x′i β̂) < 0.5.
• From this table we can compute a goodness-of-fit measure called the
percent correctly predicted.
• However, if a low (high) proportion of observations has y = 1, the
fitted model may never (always) have G(x′i β̂) ≥ 0.5, in which case we
never (always) predict ŷ = 1.
• So, in these cases, instead of using a threshold value of 0.5, we can
use the sample proportion of 1 outcomes (ȳ) as the threshold.
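Building the table and the percent correctly predicted takes one line each; a minimal sketch for the mod1 logit fit with the 0.5 threshold:

# Cross-classify observed outcomes against 0.5-threshold predictions
yhat <- as.numeric(predict(mod1, type = "response") >= 0.5)
table(observed = mroz$inlf, predicted = yhat)
mean(mroz$inlf == yhat)  # percent correctly predicted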

Classification Tables

Table 1: Classification Table

                     Estimated values
Observed values      ŷi = 1    ŷi = 0    Total
yi = 1               n11       n10       n1·
yi = 0               n01       n00       n0·
Total                n·1       n·0       n

Classification Tables

• Two useful summaries of predictive power are

Sensitivity = P(ŷ = 1|y = 1) = n11 /n1·

and

Specificity = P(ŷ = 0|y = 0) = n00 /n0·

• The overall proportion of correct classifications is

p̂ = (n00 + n11 )/n

Information criteria

Two common measures that are based on the same logic as the adjusted
R-squared for the linear model are:

Akaike information criterion (AIC) = −2ℓ + 2K

Bayes (Schwarz) information criterion (BIC) = −2ℓ + K log n

where K is the number of parameters in the model and n the number of
observations.
We choose a model based on the lowest AIC/BIC value.
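R reports both criteria directly for glm fits; a quick comparison of the full and reduced logit models from the LR-test slides:

# Information criteria for the two fitted models; lower is preferred
AIC(full, reduced)
BIC(full, reduced)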

Pseudo R-squared

• McFadden (1974) suggested the measure 1 − Lur /L0 , where Lur is
the log-likelihood of the estimated model and L0 is the
log-likelihood of the model with only an intercept.
• Why does this measure make sense? Recall that the log-likelihoods
are negative, so Lur /L0 = |Lur |/|L0 |, and the measure equals
1 − |Lur |/|L0 |. Further, |Lur | ≤ |L0 |.
• If the covariates have no explanatory power, then Lur /L0 = 1, and
the pseudo R-squared is zero.
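A minimal sketch of the computation, where an intercept-only fit supplies L0:

# McFadden pseudo R-squared for the full logit model
null <- glm(inlf ~ 1, family = binomial(link = 'logit'), data = mroz)
1 - as.numeric(logLik(full)) / as.numeric(logLik(null))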
