
Chapter 5

Limited Dependent Variable Models


Limited dependent variables
• Appropriate estimation of relations between variables depends on selecting an appropriate statistical model.
• If dependent variables are qualitative, they require different modeling.
• Models of this type are called qualitative response models, because the dependent variables are discrete.
• Discrete variables – for example, we might model labor force participation, purchase or not purchase, etc.
• There are several types of such models.
Types of qualitative response models
• Qualitative dichotomy/binomial – we equate "no" with zero and "yes" with 1. However, these are qualitative choices and the 0–1 coding is arbitrary (e.g., purchase/no-purchase type variables).
• Ordered multinomial (ranking) – e.g., opinions about job satisfaction: strongly satisfied (5), satisfied (4), neither/nor (3), dissatisfied (2), strongly dissatisfied (1).
• Qualitative multichotomy/multinomial – nominal categories, e.g., occupational choice by an individual: let 0 be a clerk, 1 an engineer, 2 an attorney, 3 a politician, 4 a college professor, and 5 other.
Dichotomous Dependent Variables - Dummy
• We have seen how to deal with binary (dummy) independent variables.
• There are, however, various problems associated with estimating a dichotomous dependent variable using OLS.
• Obviously the statistical experiment is not drawn from a normal distribution, but from something called a Bernoulli distribution.
• Thus, estimation is likely to be inefficient. It is also theoretically inconsistent with the nature of the statistical experiment.
• The dependent variable is discrete and bounded at both ends, at 0 and 1. This leads to a number of other serious problems.


The Linear Probability Model
• We will first examine a simple and obvious, but unfortunately flawed, method for dealing with binary dependent variables, known as the linear probability model (LPM).
• It is based on the assumption that the probability of an event occurring, Pi, is linearly related to a set of explanatory variables:

      Pi = p(yi = 1) = β1 + β2·x2i + β3·x3i + … + βk·xki + ui

• This is then a linear regression model and would be estimated by OLS, as in the sketch below.
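As a concrete illustration of estimating an LPM by OLS, here is a minimal Stata sketch; the built-in auto dataset and the foreign/price/mpg specification are illustrative assumptions, not the dividend example that follows:

    * LPM: fit a binary outcome by OLS (illustrative data)
    sysuse auto, clear
    regress foreign price mpg        // linear probability model
    predict phat, xb                 // fitted "probabilities"
    count if phat < 0 | phat > 1     // how many predictions fall outside [0,1]?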
The Linear Probability Model
• Suppose, for example, that we wanted to model the probability that a firm i will pay a dividend, p(yi = 1), as a function of its size (x2i, total assets measured in millions of US dollars).
• We fit the following line:

      P̂i = −0.3 + 0.012·x2i

  where P̂i denotes the fitted or estimated probability for firm i.
• How do we interpret the slope coefficient? Each additional $1m of total assets raises the estimated probability of paying a dividend by 0.012, i.e. by 1.2 percentage points.


The Linear Probability Model
• Graphically this is:

  [Figure: the fitted line P̂i = −0.3 + 0.012·x2i plotted against firm size (millions of US dollars)]


Disadvantages of the Linear Probability Model
• While the linear probability model is simple to estimate and intuitive to interpret, the diagram on the previous slide should immediately signal a problem with this setup.
• For any firm whose assets are less than $25m, the model-predicted probability of paying a dividend is negative, while for any firm worth more than $108.3m, the predicted probability is greater than one.
• Clearly, such predictions cannot be allowed to stand, since probabilities must lie within the range [0,1].
Disadvantages of the Linear Probability Model
• An obvious solution is to truncate the probabilities at 0 and 1, so that any estimated probability above one is set to one and any below zero is set to zero.
• However, there are reasons why this is still not adequate:
  – The process of truncation will result in too many observations for which the estimated probabilities are exactly zero or one.
  – It is also not plausible to suggest that a firm's probability of paying a dividend is either exactly zero or exactly one. Are we really certain that very small firms will definitely never pay a dividend and that large firms will always make a payout?
Disadvantages of the Linear Probability Model
• The LPM also suffers from more standard econometric problems that we examined in the previous chapters:
  – Since the dependent variable takes only one of two values, for given (fixed in repeated samples) values of the explanatory variables the disturbance term can also take only one of two values – normality is violated.
  – Since the disturbance term changes systematically with the explanatory variables, it will also be heteroscedastic.
Logit and Probit
• Both the logit and probit approaches are able to overcome the limitation of the LPM that it can produce estimated probabilities that are negative or greater than one.
• They do this by using a function that effectively transforms the regression model so that the fitted values are bounded within the (0,1) interval.
• Note that the observed dependent variable is a Bernoulli (or binary) variable, but what we are really interested in is predicting the probability that the event occurs (i.e., the probability that y = 1).


Logit and Probit

[Figure: an S-shaped curve of fitted probabilities, bounded between 0 and 1, plotted against firm size (millions of US dollars)]


Redefining the Dependent Var.
• How do we solve this problem?
  – We need to transform the dichotomous Y into a continuous variable Y′ ∈ (−∞, ∞).
  – So we need a link function F(Y) that takes a dichotomous Y and gives us a continuous, real-valued Y′.
  – Then we can run F(Y) = Y′ = Xb + e.


Redefining the Dependent Var…

[Figure: transformation of the dichotomous Y onto the real line via a link function]


Probit Model
• What function F(Y) goes from the [0,1] interval to the real line?
• Well, we know at least one function that goes the other way around:
  – that is, given any real value, it produces a number (a probability) between 0 and 1.
• This is the cumulative normal distribution Φ:
  – that is, given any Z-score, Φ(Z) ∈ [0,1].
Probit Model
• So we would say that
      Y = Φ(Xb + e)
      Φ⁻¹(Y) = Xb + e
      Y′ = Xb + e
• Then our link function is F(Y) = Φ⁻¹(Y).
• This link function is known as the probit link.
  – The term was coined in the 1930s by biologists studying the dosage–cure-rate link.
  – It is short for "probability unit".
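A quick numerical check of this pair of mappings in Stata, using the built-in standard normal CDF normal() and its inverse invnormal():

    display normal(-1)             // .15865525 – a real value mapped into (0,1)
    display invnormal(.15865525)   // -1, back onto the real line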
Probit Estimation
• Say that for a given observation, Xβ = −1. The fitted probability is then P(y = 1) = Φ(−1) ≈ 0.159.

[Figure: standard normal CDF evaluated at Z = −1]
Probit Estimation …
• Say that for a given observation, Xβ = 2. The fitted probability is then P(y = 1) = Φ(2) ≈ 0.977.

[Figure: standard normal CDF evaluated at Z = 2]


Probit Estimation …
• In a probit model, the value of Xβ is taken to be the z-value of a standard normal distribution.
• Higher values of Xβ mean that the event is more likely to happen.
• A one-unit change in Xi leads to a βi change in the z-score of Y.
• The estimated curve is an S-shaped cumulative normal distribution; a minimal sketch of fitting one follows below.
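A minimal probit fit in Stata, reusing the illustrative auto-data specification from the earlier LPM sketch (an assumption, not the deck's own example):

    sysuse auto, clear
    probit foreign price mpg
    predict pprobit, pr       // fitted probabilities, now bounded inside (0,1)
    summarize pprobit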


PDF and CDF
• Recall the probability density function and cumulative distribution function for a standard normal:

[Figure: standard normal PDF (bell curve) and CDF (S-shaped curve rising from 0 to 1)]


Logit Model
• The logit model is so called because it uses the cumulative logistic distribution to transform the model so that the probabilities follow the S-shape given on the previous slide.
• With the logistic model, 0 and 1 are asymptotes to the function, and thus the probabilities will never actually fall to exactly zero or rise to exactly one, although they may come infinitesimally close.
• The logit model is not linear (and cannot be made linear by a transformation) and thus is not estimable using OLS.
• Instead, maximum likelihood is usually used to estimate the parameters of the model [more on this later].
Redefining the Dependent Var.
• Let's return to the problem of transforming Y from {0,1} to the real line.
• We'll look at an alternative approach based on the odds.
• If some event occurs with probability p, then the odds of it happening are O(p) = p/(1−p):
  – p = 0  →  O(p) = 0
  – p = ¼  →  O(p) = 1/3 (odds are 3-to-1 against)
  – p = ½  →  O(p) = 1 (even odds)
  – p = ¾  →  O(p) = 3 (odds are 3-to-1 in favor)
  – p = 1  →  O(p) = ∞
Redefining the Dependent Var.…

• So taking the odds of Y occurring moves us from the [0,1] interval to the half-line [0, ∞).
• The odds are always non-negative.
• As a final step, then, take the log of the odds, which moves us onto the whole real line.
Redefining the Dependent Var.…

[Figure: the log-odds transformation mapping p ∈ (0,1) onto the real line]


Logit Function
• This is called the logit function:
  – logit(p) = log[p/(1−p)]
• Why would we want to do this?
  – At first, this was computationally easier than working with normal distributions.
  – It also has some nice properties that we'll investigate later with multinomial dependent variables.
• The density function associated with it is very close to a standard normal distribution.
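Stata ships both mappings as logit() and invlogit(); a quick sketch of the round trip:

    display logit(0.75)       // 1.0986123 = ln(3), the log odds of p = 3/4
    display invlogit(ln(3))   // .75, back to the probability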
Logit vs. Probit

The logistic distribution is similar to the normal, but has slightly fatter tails.

[Figure: logistic density overlaid on the standard normal density]
Logistic Regression: S-shaped curve

[Figure: the logistic (sigmoid) curve of fitted probabilities, bounded between 0 and 1]


Logistic Regression—Model fit
• Recall that in OLS we minimized the sum of squared residuals in order to find the line that best fits the data.
• In logistic regression analysis, we instead use maximum likelihood estimation (MLE).
• Through an iterative process, MLE finds the coefficients that maximize our ability to predict the probability of y based on what we know about x.


Maximum Likelihood Estimation
• MLE starts with an initial (arbitrary) guesstimate of what the coefficients will be, and then determines the direction and size of the change that will increase the log-likelihood (goodness of fit):
  – how likely is it that the observed values of the dependent variable can be predicted from the observed values of the independent variables?


Maximum Likelihood Estimation
• After estimating an initial function, the program keeps re-estimating with new values to reach an improved function – until convergence is reached (that is, until the log-likelihood, or goodness of fit, no longer changes significantly).
• The higher the likelihood L, the higher the probability of observing the sample values.


Maximum Likelihood Estimation (MLE)
• MLE involves finding the coefficients (α, β) that make the log of the likelihood function (LL < 0) as large as possible.
• The maximum likelihood estimates solve the following condition:
      Σi {Yi − p(Yi = 1)}·Xi = 0, summed over all observations, i = 1, …, n
• Since
      ln[p/(1−p)] = α + βX + e,
  the slope coefficient (β) is interpreted as the rate of change in the "log odds" as X changes … not very useful. A hand-coded sketch of this likelihood follows below.
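To make the mechanics concrete, here is a hedged sketch of coding the logit log-likelihood by hand with Stata's ml machinery; the auto data and two-regressor specification are illustrative assumptions, and in practice you would simply type logit:

    * per-observation log likelihood of the logit model:
    * ln Li = yi*ln(invlogit(xb)) + (1-yi)*ln(invlogit(-xb))
    capture program drop mylogit
    program define mylogit
        args lnf xb
        quietly replace `lnf' = ln(invlogit( `xb')) if $ML_y1 == 1
        quietly replace `lnf' = ln(invlogit(-`xb')) if $ML_y1 == 0
    end

    sysuse auto, clear
    ml model lf mylogit (foreign = price mpg)
    ml maximize    // iterates until the log likelihood stops improving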
Maximum Likelihood Estimation (MLE)
• OLS adapts the model to the data you have: you only get one model, derived from your data.
• MLE instead supposes there is an infinity of candidate models, and chooses the one most likely to have generated your data.


Example: Binary Dependent Variable
• We want to explore the factors affecting the probability of being a successful innovator (inno = 1).
  – 352 firms (81.7%) innovate and 79 (18.3%) do not.
• The odds of carrying out a successful innovation are about 4.5 to 1 in favor (352/79 = 4.45).
• The log of the odds is ln(4.45) = 1.494.
• For the sample (and the population?) of firms, the probability of being innovative is about 4.5 times the probability of NOT being innovative.
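The arithmetic can be checked directly in Stata:

    display 352/79       // 4.4557
    display ln(352/79)   // 1.4942 – the constant the logit model below recovers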
Logistic Regression
• Let's start by running a constant-only model:

. logit inno

Iteration 0:  log likelihood = -205.30803

Logistic regression                     Number of obs =       431
                                        LR chi2(0)    =      0.00
                                        Prob > chi2   =         .
Log likelihood = -205.30803             Pseudo R2     =    0.0000

        inno |     Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |  1.494183   .1244955    12.00   0.000     1.250177     1.73819


Interpretation of Coefficients

• What does this simple model tell us?
• Remember that we need to use the logistic formula to transform the log-odds into a probability:

      P(Y = 1 | X) = e^(Xβ) / (1 + e^(Xβ))


Interpretation of Coefficients
      P = e^1.494 / (1 + e^1.494) = 0.817

• The constant 1.494 must be interpreted as the log of the odds ratio.
• Using the logit link function, the average probability of innovating is
      dis exp(_b[_cons])/(1+exp(_b[_cons]))
• We find exactly the empirical sample value: 81.7%.
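Equivalently, Stata's built-in invlogit() performs the same transformation in one step:

    display invlogit(_b[_cons])   // .8167 – exactly the sample share of innovators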


Interpretation of Coefficients
• A positive coefficient indicates that the probability of innovation success increases with the corresponding explanatory variable.
• A negative coefficient implies that the probability of innovating decreases with the corresponding explanatory variable.
• Warning! One of the problems encountered in interpreting probabilities is their non-linearity: the probabilities do not vary in the same way at different levels of the regressors.
• This is the reason why, in practice, it is usual to calculate the probability of the event occurring at the average point of the sample, as in the sketch below.
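In Stata this evaluation at the sample mean is automated by margins; a minimal sketch, to be run after the logit fit:

    * marginal effect of each regressor on P(y = 1), evaluated at the means
    margins, dydx(*) atmeans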


Interpretation of Coefficients
• Let's run a more complete model:

. logit inno lrdi lassets spe biotech

Iteration 0:  log likelihood = -205.30803
Iteration 1:  log likelihood = -167.71312
Iteration 2:  log likelihood = -163.57746
Iteration 3:  log likelihood = -163.45376
Iteration 4:  log likelihood = -163.45352

Logistic regression                     Number of obs =       431
                                        LR chi2(4)    =     83.71
                                        Prob > chi2   =    0.0000
Log likelihood = -163.45352             Pseudo R2     =    0.2039

        inno |     Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lrdi |  .7527497   .2110683     3.57   0.000     .3390634    1.166436
     lassets |   .997085   .1368534     7.29   0.000     .7288574    1.265313
         spe |  .4252844   .4204924     1.01   0.312    -.3988654    1.249434
     biotech |  3.799953    .577509     6.58   0.000     2.668056     4.93185
       _cons | -11.63447   1.937191    -6.01   0.000    -15.43129   -7.837643


Interpretation of Coefficients
-11.63 0.75rdi 0.99lassets 0.43spe3.79 biotech
e
P -11.63 0.75rdi 0.99lassets 0.43spe3.79 biotech
1 e
• Using the sample mean values of rdi, lassets, spe and
biotech, we compute the conditional probability :
-11.63 0.75rdi  0.99lassets  0.43spe 3.79biotech
e
P -11.63 0.75rdi  0.99lassets  0.43spe 3.79biotech
1 e
1.953
e
  0.8758
1 e 1.953

Zenegnaw Abiy Hailu (PhD) 39


Evaluating the Performance of the Model
• There are several statistics which can be used for comparing alternative models or evaluating the performance of a single model:
  – model chi-square
  – pseudo-R2


Model Chi-Square
• Test of the overall model (model chi-square test): compares the researcher's model to a reduced model (the baseline model with the constant only).
• The model's likelihood ratio (LR) statistic is
      LR = −2[LL(α) − LL(α, β)]
         = [−2LL (of beginning model)] − [−2LL (of ending model)]
• The LR statistic is distributed chi-square with i degrees of freedom, where i is the number of independent variables.
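For the innovation example above this reproduces the reported statistic; a sketch using the log-likelihoods stored by Stata's logit (e(ll_0) is the constant-only value, e(ll) the full-model value):

    display -2*(-205.30803 - (-163.45352))   // 83.71 – the LR chi2(4) above
    display -2*(e(ll_0) - e(ll))             // same number, straight after logit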


Pseudo-R2
• One pseudo-R2 statistic is McFadden's R2:

      McFadden's R2 = 1 − [LL(α, β)/LL(α)]

  where this R2 is a scalar measure which varies between 0 and (somewhat close to) 1, much like the R2 in an OLS model.
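Checking McFadden's R2 against the same output:

    display 1 - (-163.45352)/(-205.30803)   // .2039 – the reported Pseudo R2
    display 1 - e(ll)/e(ll_0)               // equivalent, straight after logit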


Categorical Outcomes
• Ordered models assume there is some underlying, unobservable true outcome variable, occurring on an interval scale.
• We don't observe that interval-level information about the outcome, but only whether the unobserved value crosses some threshold(s) that put the outcome into a lower or a higher category – categories which are ranked, revealing ordinal but not interval-level information.
• Standard logistic regression is not applicable unless you 'threshold' the data or collapse categories.
Ordinal Logistic Regression

• Ordinal dependent variables:
  – degree of agreement
  – level of satisfaction
  – level of motivation
  – ability level (e.g., literacy, reading)


Ordered Logit: Motivation
• Linear regression is often used for ordered categorical outcomes.
  – Ex: strongly disagree = 0, disagree = 1, agree = 2, etc.
• This makes arbitrary – usually unjustifiable – assumptions about the distance between categories.
  – Why not: strongly disagree = 0, disagree = 3, agree = 3.5?
• If the numerical values assigned to categories do not accurately reflect the true distances, linear regression may not be appropriate.


Ordered Logit: Motivation
• Strategies to deal with ordered categorical variables:
  1. Use OLS regression anyway.
     – Commonly done, but can give incorrect results.
     – Possibly check robustness by varying the coding of the intervals between outcomes.
  2. Collapse the variable to a dichotomy and use a binary model.
     – Combine "strongly disagree" & "disagree" and "strongly agree" & "agree", then model "disagree" vs. "agree".
     – Works fine, but "throws away" useful information.
Ordered Logit
• Ordered logit is often conceptualized as a latent variable model.
• Observed responses result from individuals falling within ranges on an underlying continuous measure.
  – Example: there is some underlying variable "agreement"…
    • If you fall below a certain (unobserved) threshold, you'll respond "strongly disagree".
  – Whereas logit looks at P(Y=1), ologit looks at the probability of falling in particular ranges…
• If you are using ordered logit, you will get results that include "cut points" (intercepts) and coefficients.
Logistic Regression—ordered logit
• OLR essentially runs multiple equations – one fewer than the number of options on one's scale.
• For example, assume that you have a 4-point scale: 1 = not at all optimistic, 2 = not very optimistic, 3 = somewhat optimistic, and 4 = very optimistic.
  – The first equation compares the likelihood that y = 1 to the likelihood that y ≠ 1 (that is, y = 2, 3 or 4).
  – The second equation compares the likelihood that y = 1 or 2 to the likelihood that y = 3 or 4.
  – The third equation compares the likelihood that y = 1, 2 or 3 to the likelihood that y = 4.
Ordered Logit Example: Environment Spending

• Government spending on the environment.
• GSS question: is the government spending too little money, about the right amount, or too much?
• GSS variable, here named envspend.
• Recoded: 1 = too little, 2 = about right, 3 = too much.


Ordered logit Example
. ologit envspend educ incomea female age dblack class city suburb attendchurch

Ordered logistic regression             Number of obs =      5169
                                        LR chi2(9)    =    192.88
                                        Prob > chi2   =    0.0000
Log likelihood = -4191.1232             Pseudo R2     =    0.0225

------------------------------------------------------------------------------
envspend | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .0419784 .0108409 3.87 0.000 .0207307 .0632261
income | .0023984 .0057545 0.42 0.677 -.0088802 .013677
female | .2753095 .0591542 4.65 0.000 .1593693 .3912496
age | -.012762 .0017667 -7.22 0.000 -.0162247 -.0092994
dblack | .2898025 .0930178 3.12 0.002 .1074911 .472114
class | -.0719344 .0485173 -1.48 0.138 -.1670266 .0231578
city | .227895 .080983 2.81 0.005 .0691711 .3866188
suburb | .0752643 .0695921 1.08 0.279 -.0611337 .2116624
attendchurch | -.086372 .0109998 -7.85 0.000 -.1079312 -.0648128
-------------+----------------------------------------------------------------
/cut1 | -2.872315 .1930206 -3.250628 -2.494001
/cut2 | -.8156047 .1867621 -1.181652 -.4495577

Instead of a constant, ologit reports "cutpoints", which can be used to compute the probabilities of falling into each particular value of Y, as in the sketch below.
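As a hedged illustration of how the cutpoints generate probabilities, take a hypothetical respondent whose covariates give Xβ = 0 (not a case from the output; in Stata's parameterization, P(Y ≤ k) = invlogit(cut_k − Xβ)):

    display invlogit(-2.872315)                        // P(too little)  = .0536
    display invlogit(-.8156047) - invlogit(-2.872315)  // P(about right) = .2531
    display 1 - invlogit(-.8156047)                    // P(too much)    = .6933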
Ordered logit Example
• From the same output: the coefficient on female is .2753, so women have exp(.2753) = 1.32 times the odds of falling in a higher category than men – a difference of (1.32 − 1) × 100 ≈ 32%.
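The odds ratio quoted here can be requested by refitting with the or option added to the ologit command above, or computed from the coefficient directly:

    display exp(.2753095)   // 1.3169 – women's odds relative to men's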
Proportional Odds Assumption
• The fact that you can calculate odds ratios highlights a key assumption of ordered logit: the "proportional odds assumption".
• Also known as the "parallel regression assumption" (which also applies to ordered probit).
• The model assumes that each variable's effect on the odds of lower vs. higher outcomes is consistent:
  – the effect on the odds of "too little" vs. "about right" is the same as for "about right" vs. "too much",
  – controlling for all other variables in the model.
• If this assumption doesn't seem reasonable, consider stereotype logit or multinomial logit.
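One common diagnostic is the Brant test; a hedged sketch, assuming the user-written spost add-ons by Long and Freese are installed (brant is not part of official Stata):

    * run after the ologit fit; tests each coefficient for parallel regressions
    brant, detail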


Ologit Interpretation
• Like logit, interpretation is difficult because the effect of the Xs on Y is nonlinear.
• Effects vary with the values of all X variables.
• Interpretation strategies are similar to logit:
  – You can produce predicted probabilities
    • for each category of Y: Y = 1, Y = 2, Y = 3,
    • for real or hypothetical cases.
  – You can look at the effect of a change in X on the predicted probabilities of Y, given particular values of the X variables.
  – You can present marginal effects.
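In Stata, per-category predicted probabilities come straight from predict after ologit; a sketch for the three-category envspend model above (p1–p3 are illustrative new variable names):

    predict p1 p2 p3      // P(Y=1), P(Y=2), P(Y=3) for every observation
    summarize p1 p2 p3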


Ordered logit vs. OLS
. reg envspend educ incomea female age dblack class city suburb attendchur

Source | SS df MS Number of obs = 5169


-------------+------------------------------ F( 9, 5159) = 21.27
Model | 71.1243142 9 7.90270158 Prob > F = 0.0000
Residual | 1916.7124 5159 .371527894 R-squared = 0.0358
-------------+------------------------------ Adj R-squared = 0.0341
Total | 1987.83672 5168 .384643328 Root MSE = .60953

------------------------------------------------------------------------------
envspend | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .012701 .0032069 3.96 0.000 .0064141 .0189878
income | .0006037 .0016821 0.36 0.720 -.002694 .0039013
female | .0900251 .0173081 5.20 0.000 .0560938 .1239563
age | -.0038736 .0005258 -7.37 0.000 -.0049044 -.0028428
dblack | .0726494 .0261632 2.78 0.006 .0213585 .1239403
class | -.0165553 .0142495 -1.16 0.245 -.0444904 .0113797
city | .0555329 .0229917 2.42 0.016 .0104594 .1006065
suburb | .031217 .0205407 1.52 0.129 -.0090515 .0714855
attendchur | -.0243782 .0032213 -7.57 0.000 -.0306934 -.0180631
_cons | 2.618234 .0547459 47.83 0.000 2.510909 2.72556
In this case, OLS produced similar results to ordered logit. But that doesn't always happen… and you won't know if you don't check.
Multinomial Logistic Regression
• What if you have a dependent variable with several non-ordinal outcomes?
  – Ex: Mullen, Goyette, Soares (2003): what kind of grad school?
    • None vs. MA vs. MBA vs. Prof'l School vs. PhD.


Multinomial Logistic Regression
• Multinomial logit strategy: contrast outcomes with a common "reference point".
• Similar to conducting a series of 2-outcome logit models comparing pairs of categories.
• The "reference category" is like the reference group when using dummy variables in regression:
  – it serves as the contrast point for all analyses.


Multinomial Logistic Regression
• The choice of "reference" category drives the interpretation of multinomial logit results.
• Similar to when you use dummy variables:
  1. Choose the contrast(s) that makes the most sense.
     • Try out different possible contrasts.
  2. Be aware of the reference category when interpreting results.
     • Otherwise, you can make BIG mistakes.
• Effects are always in reference to the contrast category; a sketch of choosing the base category follows below.
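A hedged sketch of setting the reference category in Stata, anticipating the vacation example below (the assumption that Train is coded 1 is illustrative; baseoutcome() selects the contrast):

    * contrast every outcome against Train rather than the default base
    mlogit mode income familysize, baseoutcome(1)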
MLogit Example: Family Vacation
. mlogit mode income familysize

Multinomial logistic regression         Number of obs =       152
                                        LR chi2(4)    =     42.63
                                        Prob > chi2   =    0.0000
Log likelihood = -138.68742             Pseudo R2     =    0.1332

Note: large families are less likely to take the bus (vs. the train) – see the family size coefficient below.
------------------------------------------------------------------------------
mode | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Bus |
income | .0311874 .0141811 2.20 0.028 .0033929 .0589818
family size | -.6731862 .3312153 -2.03 0.042 -1.322356 -.0240161
_cons | -.5659882 .580605 -0.97 0.330 -1.703953 .5719767
-------------+----------------------------------------------------------------
Car |
income | .057199 .0125151 4.57 0.000 .0326698 .0817282
family size | .1978772 .1989113 0.99 0.320 -.1919817 .5877361
_cons | -2.272809 .5201972 -4.37 0.000 -3.292377 -1.253241
------------------------------------------------------------------------------
(mode==Train is the base outcome)

Note: It is hard to directly compare Car vs. Bus in this table


MLogit Example: Family Vacation
• Looking again at the same output, the pattern is clearer: wealthy and large families use cars.
End
