Mlogit 2004
multinomial logit
The data for this exercise come from the 1991 General Social Survey. The categorical
dependent variable occ is coded as follows:
The independent variables are: educ (years of schooling); age (age in years); sexx
(coded 1 male, 0 female); and rural (coded 1 if the respondent grew up in a rural area, 0 otherwise).
1. tab occ
2. mlogit occ,base(0)
------------------------------------------------------------------------------
occ | Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
1 |
_cons | .3659343 .0992281 3.688 0.000 .1714508 .5604177
---------+--------------------------------------------------------------------
2 |
_cons | .2137977 .1025124 2.086 0.037 .0128771 .4147183
------------------------------------------------------------------------------
(Outcome occ==0 is the comparison group)
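As a check on what these constants mean, here is a small Python sketch. Python is used only to redo the arithmetic; it is not part of the Stata session. In an intercept-only multinomial logit, each constant is ln(P(occ=k)/P(occ=0)). Exponentiating the two constants and normalizing should therefore recover the occupation shares implied by the model, and these should match the percentages from tab occ:

```python
import math

# Constants from the intercept-only model: mlogit occ, base(0)
cons = {1: 0.3659343, 2: 0.2137977}

# Each constant is ln(P(occ=k) / P(occ=0)); exponentiate to get the odds versus occ 0
odds_vs_0 = {k: math.exp(b) for k, b in cons.items()}

# Invert the multinomial logit: P(0) = 1 / (1 + sum of odds), P(k) = odds_k * P(0)
denom = 1 + sum(odds_vs_0.values())
shares = {0: 1 / denom}
shares.update({k: o / denom for k, o in odds_vs_0.items()})

for k in sorted(shares):
    print(f"occ={k}: implied share {shares[k]:.4f}")
```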
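The coefficients above are on the log-odds scale. In particular, they are the log odds
of being in occupation 1 versus 0 and of being in occupation 2 versus 0. Because the model has no
predictors, they should equal ln(n1/n0) and ln(n2/n0), where nk is the number of respondents
in occupation k in the tab occ output.
Next, add years of schooling as a predictor, still with occupation 0 as the base (mlogit occ educ, base(0)):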
------------------------------------------------------------------------------
occ | Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
1 |
educ | .2175129 .0495753 4.388 0.000 .120347 .3146788
_cons | -2.341483 .6221847 -3.763 0.000 -3.560943 -1.122024
---------+--------------------------------------------------------------------
2 |
educ | .7404903 .0630034 11.753 0.000 .6170059 .8639747
_cons | -9.937645 .8608307 -11.544 0.000 -11.62484 -8.250448
------------------------------------------------------------------------------
(Outcome occ==0 is the comparison group)
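The fitted model can also be written out as equations. This is a restatement of the estimates above. The symbols \eta_1 and \eta_2 are shorthand for the two linear predictors and are introduced here only for convenience; they do not appear in Stata's output:

```latex
\begin{aligned}
\eta_1 &= \ln\frac{P(\mathrm{occ}=1)}{P(\mathrm{occ}=0)} = -2.341483 + 0.2175129\,\mathrm{educ} \\
\eta_2 &= \ln\frac{P(\mathrm{occ}=2)}{P(\mathrm{occ}=0)} = -9.937645 + 0.7404903\,\mathrm{educ} \\
P(\mathrm{occ}=0) &= \frac{1}{1 + e^{\eta_1} + e^{\eta_2}}, \qquad
P(\mathrm{occ}=k) = \frac{e^{\eta_k}}{1 + e^{\eta_1} + e^{\eta_2}}, \quad k = 1, 2
\end{aligned}
```

The first two lines are what mlogit reports. The last line inverts them, and it is the basis for the probability calculations later in these notes.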
To get the coefficients on the odds-ratio scale (Stata labels them RRR, relative-risk ratios), we just add the rrr option (mlogit occ educ, base(0) rrr), like so:
------------------------------------------------------------------------------
occ | RRR Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
1 |
educ | 1.242981 .0616212 4.388 0.000 1.127888 1.369819
---------+--------------------------------------------------------------------
2 |
educ | 2.096963 .1321158 11.753 0.000 1.853371 2.372572
------------------------------------------------------------------------------
(Outcome occ==0 is the comparison group)
The interpretation of these odds ratios is analogous to logistic regression: each is the
exponentiated log-odds coefficient. For category 1, exp(.2175129) = 1.242981, and for category 2,
exp(.7404903) = 2.096963. This means that one additional year of schooling multiplies the odds
of being in occupation 1 rather than 0 by 1.243. In other words, each year of schooling increases
the odds of being in category 1 instead of 0 by about 24%. Similarly, the odds of being in
category 2 instead of 0 more than double (a factor of 2.097) for each additional year of schooling.
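The conversion just described can be reproduced in a short Python sketch. This is plain arithmetic on the educ coefficients printed above and not a Stata command. It prints each odds ratio together with the same effect expressed as a percent change in the odds:

```python
import math

# educ coefficients (log-odds scale) from mlogit occ educ, base(0)
b_educ = {1: 0.2175129, 2: 0.7404903}

for k, b in b_educ.items():
    rrr = math.exp(b)      # factor change in the odds of occ k vs. occ 0 per extra year
    pct = 100 * (rrr - 1)  # the same effect as a percent change in those odds
    print(f"occ {k} vs 0: odds ratio {rrr:.4f}, change {pct:+.1f}% per year of schooling")
```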
It follows that if one additional year of schooling increases the log odds of occupation 2 versus 0
by .7404903, and increases the log odds of occupation 1 versus 0 by .2175129, then it increases the
log odds of occupation 2 versus 1 (i.e., taking occupation 1 as the base category) by
.7404903 - .2175129 = .5229774.
To get the odds ratio, we just take exp(.5229774) = 1.687. Note that this is identical
(aside from rounding error) to the ratio of the odds ratio for category 2 to the odds
ratio for category 1 from the regression above with 0 as the base category:
2.097/1.243 = 1.687. Re-estimating the model with occupation 1 as the base category
(mlogit occ educ, base(1)) confirms this:
------------------------------------------------------------------------------
occ | Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
0 |
educ | -.2175129 .0495753 -4.388 0.000 -.3146788 -.120347
_cons | 2.341483 .6221847 3.763 0.000 1.122024 3.560943
---------+--------------------------------------------------------------------
2 |
educ | .5229774 .0514263 10.169 0.000 .4221837 .6237711
_cons | -7.596161 .7404896 -10.258 0.000 -9.047494 -6.144828
------------------------------------------------------------------------------
(Outcome occ==1 is the comparison group)
Note that the education coefficient for the comparison of occupation 0 to occupation 1
(-.2175129) is identical in magnitude but opposite in sign to the education coefficient for the
comparison of occupation 1 to occupation 0 (.2175129). The same holds for the constant.
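Changing the base category is just subtraction: every equation is re-expressed relative to the new base by subtracting the base's constant and slope. The Python sketch below does this with the base-0 estimates and reproduces the base(1) table, both coefficients and constants. The helper name rebase is hypothetical and used only for illustration:

```python
import math

# Estimates with occupation 0 as the base: (constant, educ slope) per outcome.
# The base outcome's equation is (0, 0) by definition.
base0 = {0: (0.0, 0.0), 1: (-2.341483, 0.2175129), 2: (-9.937645, 0.7404903)}

def rebase(coefs, new_base):
    """Hypothetical helper: express every equation relative to new_base."""
    c0, s0 = coefs[new_base]
    return {k: (c - c0, s - s0) for k, (c, s) in coefs.items() if k != new_base}

base1 = rebase(base0, 1)
for k, (cons, slope) in base1.items():
    print(f"occ {k} vs 1: educ {slope:+.7f} (odds ratio {math.exp(slope):.4f}), _cons {cons:+.6f}")

# The 2-vs-1 odds ratio equals the ratio of the two base-0 odds ratios
print(f"{math.exp(0.7404903) / math.exp(0.2175129):.4f}")
```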
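Now that you know how to recover coefficients and odds ratios by hand, here’s a
command that does it automatically for every pair of outcomes. listcoef is not built into
Stata: it is part of the user-written SPost package by J. Scott Long and Jeremy Freese, and it
has to be installed first. In its output, b is the log-odds coefficient, e^b is the odds ratio,
and e^bStdX is the factor change in the odds for a one-standard-deviation increase in educ: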
6. listcoef educ
Odds comparing |
Group 1 - Group 2 | b z P>|z| e^b e^bStdX
------------------+---------------------------------------------
1 -2 | -0.52298 -10.169 0.000 0.5928 0.2415
1 -0 | 0.21751 4.388 0.000 1.2430 1.8056
2 -1 | 0.52298 10.169 0.000 1.6870 4.1403
2 -0 | 0.74049 11.753 0.000 2.0970 7.4758
0 -1 | -0.21751 -4.388 0.000 0.8045 0.5538
0 -2 | -0.74049 -11.753 0.000 0.4769 0.1338
----------------------------------------------------------------
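The b and e^b columns above follow from the two base-0 educ coefficients alone. The Python sketch below loops over every ordered pair of outcomes to reproduce them. It does not attempt the z or e^bStdX columns, which also need the standard errors and the standard deviation of educ:

```python
import math
from itertools import permutations

# educ slopes with occupation 0 as the base (the base outcome's slope is 0)
slope = {0: 0.0, 1: 0.2175129, 2: 0.7404903}

# For each ordered pair (g1, g2): b = slope[g1] - slope[g2], and e^b is the factor
# change in the odds of g1 versus g2 for one more year of schooling
contrasts = {(g1, g2): slope[g1] - slope[g2] for g1, g2 in permutations(slope, 2)}

for (g1, g2), b in contrasts.items():
    print(f"{g1} -{g2} | b = {b:+.5f}  e^b = {math.exp(b):.4f}")
```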
Probability interpretations
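How about computing the probability of being in each occupation for a given value of
schooling? To do this, ask Stata to compute the predicted probabilities with the following
command. After mlogit, predict with three new variable names stores the probabilities of
outcomes 0, 1, and 2, in that order: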
7. predict p0 p1 p2
To get these for each value of years of schooling, I reduced the data to one row per value of
educ and created summ_p = p0 + p1 + p2 as a check (note that this step destroys your original data file):
      educ       p0       p1       p2   summ_p
  1.     3   0.8438   0.1559   0.0004        1
  2.     4   0.8127   0.1866   0.0008        1
  3.     5   0.7768   0.2217   0.0015        1
  4.     6   0.7359   0.2611   0.0030        1
  5.     7   0.6899   0.3042   0.0059        1
  6.     8   0.6385   0.3499   0.0115        1
  7.     9   0.5817   0.3963   0.0220        1
  8.    10   0.5192   0.4396   0.0412        1
  9.    11   0.4506   0.4743   0.0751        1
 10.    12   0.3763   0.4923   0.1314        1
 11.    13   0.2977   0.4842   0.2181        1
 12.    14   0.2194   0.4435   0.3371        1
 13.    15   0.1485   0.3731   0.4784        1
 14.    16   0.0919   0.2871   0.6210        1
 15.    17   0.0525   0.2038   0.7437        1
 16.    18   0.0281   0.1358   0.8360        1
 17.    19   0.0144   0.0866   0.8990        1
 18.    20   0.0072   0.0536   0.9392        1
Notice that for each value of education, the probabilities (as given by summ_p) sum to
1.
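The listing can be reproduced from the two base-0 equations alone. Because the model contains only educ, everyone with the same schooling gets the same predicted probabilities. The Python sketch below uses the rounded coefficients as printed, so it may differ from predict in far decimal places, but it should agree to the four digits shown. The function name probs is illustrative only:

```python
import math

# Base-0 estimates from mlogit occ educ, base(0): (constant, educ slope)
eq = {1: (-2.341483, 0.2175129), 2: (-9.937645, 0.7404903)}

def probs(educ):
    """Return (p0, p1, p2) for a given number of years of schooling."""
    # exp of each linear predictor; occ 0 is the base, so its term is exp(0) = 1
    expo = [1.0] + [math.exp(c + s * educ) for c, s in (eq[1], eq[2])]
    total = sum(expo)
    return tuple(e / total for e in expo)

print(" educ      p0      p1      p2  summ_p")
for educ in range(3, 21):
    p = probs(educ)
    print(f"{educ:5d} {p[0]:7.4f} {p[1]:7.4f} {p[2]:7.4f} {sum(p):7.4f}")
```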
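Here’s an example of computing the probabilities by hand for the case of educ = 16 years,
using the estimates with occupation 0 as the base:
  log odds of 1 vs 0: -2.341483 + .2175129*16 = 1.138723, and exp(1.138723) = 3.1228
  log odds of 2 vs 0: -9.937645 + .7404903*16 = 1.910200, and exp(1.910200) = 6.7544
  p0 = 1/(1 + 3.1228 + 6.7544) = 0.0919
  p1 = 3.1228/(1 + 3.1228 + 6.7544) = 0.2871
  p2 = 6.7544/(1 + 3.1228 + 6.7544) = 0.6210
These match the educ = 16 row of the listing above.
As an exercise, try computing the probabilities at some other levels of education
to make sure you know how.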
Note that the effect of a one-year change in schooling on the probability of, say,
being in occupation 2 depends on the value of schooling you start from. In the listing above,
going from 8 to 9 years raises p2 by about .01, while going from 16 to 17 years raises it by about .12.
As in the binary case, this is because the probabilities are a nonlinear function of schooling.
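The point can be made concrete in a Python sketch. It computes the one-year change in P(occ=2) at several starting values. It also computes the instantaneous slope, using the standard multinomial logit result dPj/d(educ) = Pj(bj - sum over k of Pk bk). Both depend on where you start. The function names are illustrative only:

```python
import math

# Base-0 estimates: (constant, educ slope); slopes listed for outcomes 0, 1, 2
eq = {1: (-2.341483, 0.2175129), 2: (-9.937645, 0.7404903)}
slopes = (0.0, 0.2175129, 0.7404903)

def probs(educ):
    """Predicted (p0, p1, p2) at a given number of years of schooling."""
    expo = [1.0] + [math.exp(c + s * educ) for c, s in (eq[1], eq[2])]
    total = sum(expo)
    return [e / total for e in expo]

def discrete_change_p2(educ):
    """Change in P(occ=2) when schooling goes from educ to educ + 1."""
    return probs(educ + 1)[2] - probs(educ)[2]

def marginal_effect_p2(educ):
    """Instantaneous slope dP2/d(educ) = P2 * (b2 - sum_k Pk * bk)."""
    p = probs(educ)
    avg_slope = sum(pk * bk for pk, bk in zip(p, slopes))
    return p[2] * (slopes[2] - avg_slope)

for educ in (8, 12, 16, 19):
    print(f"educ {educ:2d} -> {educ + 1:2d}: change in P(occ=2) = "
          f"{discrete_change_p2(educ):.4f}; slope at {educ} = {marginal_effect_p2(educ):.4f}")
```

The discrete change corresponds to the row-to-row differences in the listing. The derivative describes the same effect for an infinitesimal change in educ.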