Slides Week4
Estimating a Probability
For binary responses y1, . . . , yn the probability of observing the data is
P[Y1 = y1, . . . , Yn = yn] = ∏_{i=1}^n p^{yi} (1 − p)^{1−yi} = L(p; y).
With any set of data, L(p; y) can be calculated for any value of
p between 0 and 1. The result is the probability of observing
the data to hand for each chosen value of p.
• One strategy for estimating p is to use the value that maximises
this probability. The resulting estimator is called the maximum
likelihood estimator (MLE) and the maximand, L(p; y), is called
the likelihood function.
Log Likelihood Function
• The maximum of the log likelihood function, l(p; y) = log L(p; y),
is at the same value of p as the maximum of the likelihood
function (because the log function is monotonic).
• It is often easier to maximise the log likelihood function (LLF).
For the problem considered here the LLF is
l(p; y) = (∑_{i=1}^n yi) log p + (∑_{i=1}^n (1 − yi)) log(1 − p).
Let
p̂ = arg max_p L(p; y) = arg max_p l(p; y).
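As a minimal numerical sketch (an addition to these notes, not from the original slides), the LLF above can be maximised directly; the function name and data are illustrative, and the maximiser coincides with the sample mean of the yi:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def bernoulli_llf(p, y):
    # l(p; y) = (sum y_i) log p + (sum (1 - y_i)) log(1 - p)
    return np.sum(y) * np.log(p) + np.sum(1 - y) * np.log(1 - p)

y = np.array([1, 0, 1, 1, 0, 1, 0, 1])  # illustrative binary data

# Maximise the LLF over p in (0, 1) by minimising its negative.
res = minimize_scalar(lambda p: -bernoulli_llf(p, y),
                      bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x, y.mean())  # the MLE p-hat equals the sample mean
```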
Likelihood Functions and Estimation in General
• For continuous Yi the probability of observing values in a small
neighbourhood of the data is f(y1, . . . , yn; θ) dy1 · · · dyn. Here only
the joint density function depends upon θ, and the value of θ that
maximises f(y1, . . . , yn; θ) also maximises this probability.
• In this case the likelihood function is defined to be the joint
density function of the Yi’s.
• When the Yi’s are discrete random variables the likelihood function
is the joint probability mass function of the Yi’s, and in cases in
which there are discrete and continuous elements the likelihood
function is a combination of probability density elements and
probability mass elements.
• In all cases the likelihood function is a function of the observed
data values that is equal to, or proportional to, the probability
of observing these particular values, where the constant of
proportionality does not depend upon the parameters which are
to be estimated.
Invariance
• Under a one-to-one transformation of the data, say from y to z,
the value of the likelihood function changes even though the same
model is being fitted. The reason for this is that we omit the
infinitesimals dy1, . . . , dyn from the likelihood function for
continuous variates, and these change when we move from y to z
because they are denominated in the units in which y or z are
measured.
Maximum Likelihood: Properties
Maximum Likelihood: Improving Numerical Properties
Properties Of Maximum Likelihood Estimators
n^{1/2}(θ̂ − θ0) →_d N(0, V0)
where
V0 = −plim_{n→∞}(n^{-1} lθθ(θ0; Y))^{-1}
and θ0 is the unknown parameter value. To get an approximate
distribution that can be used in practice we use
−(n^{-1} lθθ(θ̂; Y))^{-1}, or some other consistent estimator of V0,
in place of V0.
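Continuing the Bernoulli example, a small sketch (added here, not from the slides) of this variance approximation; for that model −(n^{-1} l_pp(p̂; Y))^{-1} works out to p̂(1 − p̂):

```python
import numpy as np

y = np.array([1, 0, 1, 1, 0, 1, 0, 1])  # illustrative binary data
n, p_hat = len(y), y.mean()

# Observed Hessian of the Bernoulli LLF at the MLE:
# l_pp(p; y) = -sum(y)/p^2 - sum(1-y)/(1-p)^2
l_pp = -y.sum() / p_hat**2 - (1 - y).sum() / (1 - p_hat)**2

V0_hat = -1.0 / (l_pp / n)  # -(n^{-1} l_pp)^{-1}, equals p_hat (1 - p_hat)
se = np.sqrt(V0_hat / n)    # approximate standard error of p_hat
print(V0_hat, p_hat * (1 - p_hat), se)
```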
Properties Of Maximum Likelihood Estimators
n^{1/2}(θ̂ − θ0) →_d N(0, A(θ0)^{-1} B(θ0) A(θ0)^{-1}′),
where A(θ0) = plim_{n→∞} n^{-1} lθθ′(θ0; Y) and
B(θ0) = plim_{n→∞} n^{-1} lθ(θ0; Y) lθ(θ0; Y)′.
Maximum Likelihood: Limiting Distribution
Differentiating again,
∂/∂θ′ ∫ lθ(θ; y) L(θ; y) dy = ∫ (lθθ′(θ; y) L(θ; y) + lθ(θ; y) Lθ′(θ; y)) dy
= ∫ (lθθ′(θ; y) + lθ(θ; y) lθ(θ; y)′) L(θ; y) dy   (using Lθ′ = lθ′ L)
= E[lθθ′(θ; Y) + lθ(θ; Y) lθ(θ; Y)′]
= 0,
and so
E[lθ(θ; Y) lθ(θ; Y)′] = −E[lθθ′(θ; Y)],
giving
B(θ0) = −plim_{n→∞} n^{-1} lθθ′(θ0; Y).
The matrix
I(θ) = −E[lθθ′(θ; Y)]
plays a central role in likelihood theory; it is called the
Information Matrix.
Finally, because B(θ0) = −A(θ0),
A(θ)^{-1} B(θ) A(θ)^{-1}′ = (−plim_{n→∞} n^{-1} lθθ′(θ; Y))^{-1}.
Of course a number of conditions are required for the results
above to hold. These include the boundedness of third order
derivatives of the log likelihood function, independence or
at most weak dependence of the Yi’s, existence of moments of
derivatives of the log likelihood, or at least of probability limits
of suitably scaled versions of them, and lack of dependence of
the support of the Yi’s on θ.
The result in equation (4) above leads, under suitable conditions
concerning convergence, to
plim_{n→∞}(n^{-1} lθ(θ; Y) lθ(θ; Y)′) = −plim_{n→∞}(n^{-1} lθθ′(θ; Y)).
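A small simulation sketch (an addition, not from the slides) illustrating this equality in the Bernoulli model, where both sides estimate 1/(p(1 − p)):

```python
import numpy as np

rng = np.random.default_rng(0)
p0, n = 0.3, 100_000
y = rng.binomial(1, p0, size=n)

# Per-observation score and Hessian of l_i(p) = y log p + (1 - y) log(1 - p)
score = y / p0 - (1 - y) / (1 - p0)
hess = -y / p0**2 - (1 - y) / (1 - p0)**2

print(np.mean(score**2))  # n^{-1} l_p l_p'  ->  1 / (p0 (1 - p0))
print(-np.mean(hess))     # -n^{-1} l_pp     ->  1 / (p0 (1 - p0))
```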
Estimating a Conditional Probability
• Both models are widely used. Note that in both cases a single
index model is specified, the probability functions are monotonic
increasing, probabilities arbitrarily close to zero or one are
obtained when x′θ is sufficiently large or small, and both models
are symmetric in the sense that p(−x, θ) = 1 − p(x, θ). Any or all
of these properties might be inappropriate in a particular
application, but there is rarely discussion of this in the applied
econometrics literature. The sketch below illustrates the symmetry
property.
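A short sketch (added; assuming the standard specifications p(x, θ) = exp(x′θ)/(1 + exp(x′θ)) for the logit and p(x, θ) = Φ(x′θ) for the probit) checking the symmetry p(−x, θ) = 1 − p(x, θ):

```python
import numpy as np
from scipy.stats import norm

def p_logit(index):
    # exp(x'θ) / (1 + exp(x'θ)), written in a numerically stable form
    return 1 / (1 + np.exp(-index))

def p_probit(index):
    return norm.cdf(index)  # Φ(x'θ)

v = 1.3  # an illustrative value of x'θ
print(p_logit(-v), 1 - p_logit(v))    # equal: symmetry holds for the logit
print(p_probit(-v), 1 - p_probit(v))  # equal: symmetry holds for the probit
```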
More on Logit and Probit
Yi* = Xi′θ + εi
pi = P(Yi = 1) = P(Yi* ≥ 0)
   = P(Xi′θ + εi ≥ 0)
   = P(εi ≥ −Xi′θ)
   = 1 − Fε(−Xi′θ)
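A simulation sketch (added, with illustrative parameter values): drawing the εi from a logistic distribution makes 1 − Fε(−Xi′θ) the logit probability, which the simulated frequency of Yi* ≥ 0 reproduces:

```python
import numpy as np

rng = np.random.default_rng(1)
theta = np.array([0.5, -1.0])  # illustrative coefficients
x = np.array([1.0, 0.4])       # illustrative covariates
index = x @ theta

eps = rng.logistic(size=1_000_000)  # logistic errors => logit model
p_sim = np.mean(index + eps >= 0)   # frequency of Y* >= 0
p_logit = 1 / (1 + np.exp(-index))  # 1 - F_eps(-x'θ) for logistic F_eps
print(p_sim, p_logit)               # close in a large sample
```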
Odds-Ratio
Marginal Effects
• Logit model:
∂pi/∂Xi = (θ exp(Xi′θ)(1 + exp(Xi′θ)) − θ exp(Xi′θ)²) / (1 + exp(Xi′θ))²
        = θ exp(Xi′θ) / (1 + exp(Xi′θ))²
        = θ pi(1 − pi)
• Probit model:
∂pi/∂Xi = θ φ(Xi′θ)
A one unit increase in Xi leads approximately to an increase of
θφ(Xi′θ) in pi.
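A short sketch (added, with made-up numbers) computing both marginal effects at a given Xi:

```python
import numpy as np
from scipy.stats import norm

theta = np.array([0.8, -0.3])
x_i = np.array([1.0, 2.0])  # made-up covariate values
v = x_i @ theta

p_i = 1 / (1 + np.exp(-v))
me_logit = theta * p_i * (1 - p_i)  # θ p_i (1 - p_i)
me_probit = theta * norm.pdf(v)     # θ φ(X_i'θ)
print(me_logit, me_probit)
```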
ML in Single Index Models
lθ(θ; y) = ∑_{i=1}^n ( yi gw(xi′θ) xi / g(xi′θ) − (1 − yi) gw(xi′θ) xi / (1 − g(xi′θ)) )
         = ∑_{i=1}^n (yi − g(xi′θ)) gw(xi′θ) xi / ( g(xi′θ)(1 − g(xi′θ)) )
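A sketch (added) of this score; in the logit case g(w) = exp(w)/(1 + exp(w)) and gw = g(1 − g), so the weight gw/(g(1 − g)) collapses to one:

```python
import numpy as np

def score_single_index(theta, X, y, g, g_w):
    # l_θ(θ; y) = Σ_i (y_i - g(x_i'θ)) g_w(x_i'θ) x_i / (g(x_i'θ)(1 - g(x_i'θ)))
    v = X @ theta
    w = (y - g(v)) * g_w(v) / (g(v) * (1 - g(v)))
    return X.T @ w

g = lambda w: 1 / (1 + np.exp(-w))  # logit link
g_w = lambda w: g(w) * (1 - g(w))   # its derivative

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 2))
theta0 = np.array([1.0, -0.5])
y = rng.binomial(1, g(X @ theta0))
print(score_single_index(theta0, X, y, g, g_w))  # near zero at the truth, on average
```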
Asymptotic Properties of the Probit Model
g(w) = Φ(w)
gw(w) = φ(w)
⇒ gw(w) / ( g(w)(1 − g(w)) ) = φ(w) / ( Φ(w)(1 − Φ(w)) ).
Therefore in the probit model the MLE satisfies
∑_{i=1}^n ( yi − Φ(xi′θ̂) ) φ(xi′θ̂) xi / ( Φ(xi′θ̂)(1 − Φ(xi′θ̂)) ) = 0.
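A sketch (added) solving this first-order condition on simulated data; scipy.optimize.root is one convenient way to do it:

```python
import numpy as np
from scipy.optimize import root
from scipy.stats import norm

rng = np.random.default_rng(3)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
theta_true = np.array([0.2, 0.7])
y = (X @ theta_true + rng.normal(size=n) >= 0).astype(float)

def probit_foc(theta):
    # Σ_i (y_i - Φ(x_i'θ)) φ(x_i'θ) x_i / (Φ(x_i'θ)(1 - Φ(x_i'θ)))
    v = X @ theta
    w = (y - norm.cdf(v)) * norm.pdf(v) / (norm.cdf(v) * (1 - norm.cdf(v)))
    return X.T @ w

print(root(probit_foc, x0=np.zeros(2)).x)  # close to theta_true in large samples
```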
Example: Logit and Probit
Concerni = β0 + β1 agei + β2 sexi + β3 log incomei + β4 smelli + ui
Multinomial Logit
pj = exp(X′θj) / ∑_{k=1}^J exp(X′θk)
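A minimal sketch (added) of these choice probabilities; subtracting the maximum index before exponentiating is a numerical-stability device, not part of the formula, and normalising one θj to zero is the usual identification device:

```python
import numpy as np

def mnl_probs(x, thetas):
    # p_j = exp(x'θ_j) / Σ_k exp(x'θ_k), j = 1, ..., J
    v = thetas @ x
    v -= v.max()  # stabilise the exponentials
    e = np.exp(v)
    return e / e.sum()

thetas = np.array([[0.0, 0.0],   # θ_1 normalised to zero (identification)
                   [0.5, -0.2],
                   [1.0, 0.3]])
x = np.array([1.0, 2.0])
print(mnl_probs(x, thetas), mnl_probs(x, thetas).sum())  # probabilities sum to one
```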
Identification
Independence of Irrelevant Alternatives
Offered a choice between apple pie and blueberry pie, Morgenbesser
orders the apple pie. After a few minutes the waitress returns and
says that they also have cherry pie, at which point Morgenbesser
says, “In that case I’ll have the blueberry pie.”
Independence of Irrelevant Alternatives
• However, IIA implies that odds ratios are the same whether
or not another alternative exists. The only probabilities for
which the three odds ratios are all equal to one are
p1 = p2 = p3 = 1/3.
Marginal Effects: Multinomial Logit
Example
Ordered Models
Ordered Probit
• Marginal Effects:
∂P(Yi = 0)/∂Xi = −θ φ(−Xi′θ)
∂P(Yi = 1)/∂Xi = θ ( φ(Xi′θ) − φ(µ − Xi′θ) )
∂P(Yi = 2)/∂Xi = θ φ(µ − Xi′θ)
• Note that if θ > 0, ∂P(Yi = 0)/∂Xi < 0 and ∂P(Yi = 2)/∂Xi > 0:
– If Xi has a positive effect on the latent variable, then
increasing Xi means fewer individuals stay in category 0.
– Similarly, more individuals will be in category 2.
– In the intermediate category, the fraction of individuals will
either increase or decrease, depending on the relative size
of the inflow from category 0 and the outflow to category 2.
A sketch of all three effects follows below.
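A sketch (added, with illustrative values for θ and the cut point µ) of these three marginal effects; they sum to zero because the three probabilities sum to one:

```python
import numpy as np
from scipy.stats import norm

theta = np.array([0.6, -0.4])  # illustrative coefficients
mu = 1.2                       # illustrative cut point
x_i = np.array([1.0, 0.5])
v = x_i @ theta

me_0 = -theta * norm.pdf(-v)                     # ∂P(Y=0)/∂X
me_1 = theta * (norm.pdf(v) - norm.pdf(mu - v))  # ∂P(Y=1)/∂X
me_2 = theta * norm.pdf(mu - v)                  # ∂P(Y=2)/∂X
print(me_0, me_1, me_2)
print(me_0 + me_1 + me_2)  # approximately zero component-wise
```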
Ordered Probit: Example
Models for Count Data
and
P[Y = j] = ( m! / (j!(m − j)!) ) p^j (1 − p)^{m−j},  j ∈ {0, 1, 2, . . . , m}
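A one-line check (added) of this binomial probability mass function against scipy.stats.binom:

```python
from math import comb
from scipy.stats import binom

m, p, j = 10, 0.25, 3
print(comb(m, j) * p**j * (1 - p)**(m - j))  # m!/(j!(m-j)!) p^j (1-p)^(m-j)
print(binom.pmf(j, m, p))                    # same value
```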
Models for Count Data
where, note, the first term has expected value zero. Therefore
the Information Matrix for this conditional Poisson model is
I(θ) = ∑_{i=1}^n ( λw(xi′θ)² / λ(xi′θ) ) xi xi′,
with V0 estimated by
V̂0 = ( n^{-1} ∑_{i=1}^n ( λw(xi′θ̂)² / λ(xi′θ̂) ) xi xi′ )^{-1}.
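A sketch (added) of this Information Matrix for the common exponential mean λ(w) = e^w, for which λw = λ and the summand reduces to λ(xi′θ) xi xi′:

```python
import numpy as np

def poisson_information(theta, X, lam=np.exp, lam_w=np.exp):
    # I(θ) = Σ_i λ_w(x_i'θ)^2 / λ(x_i'θ) x_i x_i'
    v = X @ theta
    w = lam_w(v) ** 2 / lam(v)  # per-observation weight
    return (X * w[:, None]).T @ X

rng = np.random.default_rng(4)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
theta_hat = np.array([0.1, 0.5])    # an illustrative estimate

I = poisson_information(theta_hat, X)
V0_hat = np.linalg.inv(I / len(X))  # (n^{-1} I(θ̂))^{-1}
print(V0_hat)
```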