STATA - Logit-Probit-Tobit - IInd Sem 23-24
STATA Command:
logit admit gre gpa i.rank
Interpretation:
⚫ The logistic regression coefficients give the change in the log odds of the outcome
for a one unit increase in the predictor variable.
– For every one unit change in gre, the log odds of admission (versus non-
admission) increases by 0.002.
– For a one unit increase in gpa, the log odds of being admitted to graduate school
increases by 0.804.
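The arithmetic behind this interpretation can be checked directly: exponentiating a logistic coefficient converts the change in log odds into an odds ratio. A minimal Python illustration (not a Stata command), using the coefficient values reported above:

```python
import math

# Logistic coefficients are changes in log odds; exponentiating one converts it
# into an odds ratio. Coefficient values are those reported in the slides.
b_gre = 0.002   # slide value: change in log odds of admission per point of gre
b_gpa = 0.804   # slide value: change in log odds of admission per point of gpa

# A one-unit increase in gpa multiplies the odds of admission by exp(0.804).
or_gpa = math.exp(b_gpa)
print(round(or_gpa, 2))  # 2.23 -- the odds ratio reported later by `logit ..., or`
```

This is the same 2.23 factor quoted later when the model is refit with the or option.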
To test the overall effect of rank:
⚫ We can test for an overall effect of rank using the test command.
– Below we see that the overall effect of rank is statistically significant.
STATA Command:
To test an additional hypothesis:
⚫ We can also test additional hypotheses about the differences in the coefficients for different levels of rank.
– Below we test that the coefficient for rank=2 is equal to the coefficient for rank=3.
STATA Command:
ODDS RATIOS IN LOGISTIC
REGRESSION
⚫ Odds are defined as the ratio of the probability of success and the probability of failure.
– This looks a little strange but it is really saying that the odds of failure are 1 to 4.
Another example [Pedhazur (1997)]
⚫ Suppose that seven out of 10 males are admitted to an engineering school while three of 10 females are admitted.
⚫ The probabilities for admitting a male are,
– p = 7/10 = .7 q = 1 – .7 = .3
– If you are male, the probability of being admitted is 0.7 and the probability of not being admitted is 0.3.
⚫ Here are the same probabilities for females,
– p = 3/10 = .3 q = 1 – .3 = .7
– If you are female it is just the opposite, the probability of being admitted is 0.3 and the probability of not being admitted is 0.7.
⚫ Now we can use the probabilities to compute the odds of admission for both males and females,
– Odds(male) = .7/.3 = 2.33333
– Odds(female) = .3/.7 = .42857
⚫ Next, we compute the odds ratio for admission,
– OR = 2.3333/.42857 = 5.44
⚫ Thus, for a male, the odds of being admitted are 5.44 times larger than the odds for a female being admitted.
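The whole calculation can be reproduced in a few lines. The snippet below is a Python illustration of the slide's arithmetic; the final value also shows how the odds ratio connects back to the logistic coefficient for gender (log 5.44 ≈ 1.69):

```python
import math

# Reproducing the Pedhazur (1997) arithmetic for odds and the odds ratio.
p_male, p_female = 0.7, 0.3

odds_male = p_male / (1 - p_male)        # .7/.3  = 2.33333...
odds_female = p_female / (1 - p_female)  # .3/.7  = 0.42857...
odds_ratio = odds_male / odds_female     # 2.3333/.42857 = 5.44...

# The logistic coefficient for gender is the log of this odds ratio.
print(round(odds_ratio, 2), round(math.log(odds_ratio), 2))  # 5.44 1.69
```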
Logistic regression in Stata
⚫ Data: In this example admit is coded 1 for yes and 0 for no and gender is coded 1 for
male and 0 for female.
⚫ STATA Command:
input admit gender freq
1 1 7
1 0 3
0 1 3
0 0 7
end
⚫ This data represents a 2×2 table that looks like this:

                Admission
                 1    0
    Gender  1    7    3
            0    3    7
Logistic regression in Stata
STATA Command:
⚫ Logit
logit admit gender [fweight=freq], nolog or
⚫ Note that z = 1.74 for the coefficient for gender and for the odds ratio for gender.
Note:
⚫ Many Stata commands fit a model by maximum likelihood, and in so doing, they
include a report on the iterations of the algorithm towards (it is hoped) eventual
convergence.
⚫ There may be tens or even hundreds or thousands of such lines in a report, which
are faithfully recorded in any log file you may have open.
⚫ Those lines are of little or no statistical interest in most examples and may be omitted by adding the nolog option.
About Logits
⚫ The end result of all the mathematical manipulations is that the odds ratio can be
computed by raising e to the power of the logistic coefficient,
– OR = e^b = e^1.69 = 5.44
⚫ Logit/Logistic regression for original problem:
– Now, coming back to our original problem (binary data)
– Dependent Variable: admit
– Independent Variables: gre, gpa & rank
STATA Command:
logit admit gre gpa i.rank
logit admit gre gpa i.rank, or
⚫ Interpretation:
– Now we can say that for a one unit increase in gpa, the odds of being admitted to
graduate school (versus not being admitted) increase by a factor of 2.23.
– For the rank variable, the odds ratios can be interpreted as follows:
⚫ The odds of being admitted decrease for a student who attended an undergraduate institution with a rank of 2 versus an institution with a rank of 1, and so on.
⚫ General Rule:
⚫ For the Odds Ratio (OR):
– If OR > 1: odds of being admitted increase
– If OR < 1: odds of being admitted decrease
⚫ For Log(OR) or Logit:
– If Logit > 0: log odds of admission increase
– If Logit < 0: log odds of admission decrease
Margins
⚫ You can also use predicted probabilities to help you understand the model.
– You can calculate predicted probabilities using the margins command.
– Below we use the margins command to calculate the predicted probability of
admission at each level of rank, holding all other variables in the model at their means.
STATA Command:
margins rank, atmeans
Margins - Interpretation
⚫ In the above output we see that the predicted probability of being accepted
into a graduate program is 0.51 for the highest prestige undergraduate
institutions (rank=1), and
– 0.18 for the lowest ranked institutions (rank=4), holding gre and gpa at
their means.
⚫ Please note that the predicted probability of admission decreases as the prestige of the institution decreases (from rank=1 to rank=4).
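Behind margins is the inverse logit link: the fitted linear predictor is mapped to a probability. A minimal Python sketch; the linear-predictor values below are hypothetical, chosen only to reproduce the probabilities reported on the slide:

```python
import math

def invlogit(xb):
    """Inverse logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-xb))

# `margins rank, atmeans` applies this transformation to the fitted linear
# predictor. The xb values here are hypothetical illustrations, not fitted output.
print(round(invlogit(0.04), 2))   # ~0.51, like rank=1 at the means of gre and gpa
print(round(invlogit(-1.52), 2))  # ~0.18, like rank=4 at the means of gre and gpa
```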
Things to consider
⚫ Pseudo-R-squared:
– Many different measures of pseudo-R-squared exist.
– They all attempt to provide information similar to that provided by R-squared in OLS regression;
however, none of them can be interpreted exactly as R-squared in OLS regression is interpreted.
⚫ Sample size:
– Both logit and probit models require more cases than OLS regression because they use
maximum likelihood estimation techniques.
– It is sometimes possible to estimate models for binary outcomes in datasets with only a small
number of cases using exact logistic regression (using the exlogistic command).
– It is also important to keep in mind that when the outcome is rare, even if the overall dataset is
large, it can be difficult to estimate a logit model.
PROBIT REGRESSION
Example 2:
⚫ A researcher is interested in how variables such as GRE (Graduate Record Exam scores), GPA (grade point average) and prestige of the undergraduate institution affect admission into graduate school.
⚫ The response variable, admit/don’t admit, is a binary variable.
Probit Model
⚫ You can also use predicted probabilities to help you understand the model.
– Below we use the margins command to calculate the predicted
probability of admission at each level of rank, holding all other variables
in the model at their means.
STATA Commands:
margins rank, atmeans
⚫ Margin Interpretation:
– In the above output we see that the predicted probability of being accepted into a
graduate program is 0.52 for the highest prestige undergraduate institutions
(rank=1), and 0.19 for the lowest ranked institutions (rank=4),
holding gre and gpa at their means.
Things to consider
⚫ Same as LOGIT
MULTINOMIAL LOGISTIC
REGRESSION
Examples of multinomial logistic regression
Description of the data: hsbdemo
STATA Command:
tab prog ses, chi2
Conclusion: Reject H0 (program type and ses are not independent)
Multinomial logistic regression: STATA Example
⚫ mlogit Command:
– Below we use the mlogit command to estimate a multinomial logistic regression
model.
⚫ The i. before ses indicates that ses is an indicator variable (i.e., a categorical variable) and that it should be included in the model as a series of indicator variables.
⚫ We have also used the base() option to indicate the category we want to use as the baseline comparison group.
– In the model below, we have chosen to use the academic program type
as the baseline category.
⚫ Stata Commands:
mlogit prog i.ses write, base(2)
⚫ Output Interpretation:
– In the output above, we first see the iteration log, indicating how quickly the model
converged.
⚫ The log likelihood (-179.98173) can be used in comparisons of nested models, but we won’t show an example of comparing models here.
– The likelihood ratio chi-square of 48.23 with a p-value < 0.0001 tells us that our model as a whole fits significantly better than an empty model (i.e., a model with no predictors).
⚫ The output above has two parts, labeled with the categories of the outcome variable prog.
– They correspond to the two equations below:
⚫ Coefficient Interpretation:
– write:
⚫ A one-unit increase in the variable write is associated with a .058 decrease in the relative log odds of being in the general program vs. the academic program.
⚫ A one-unit increase in the variable write is associated with a .1136 decrease in the relative log odds of being in the vocation program vs. the academic program.
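The two-equation structure behind these numbers can be sketched numerically. In the Python illustration below, the slopes on write are the slide values, while the intercepts are hypothetical placeholders; the point is that the gap in relative log odds per unit of write is constant at .058:

```python
import math

# Multinomial logit: each non-base outcome gets its own equation of relative
# log odds vs. the base category (academic). Slopes on write are the slide
# values; the intercepts are hypothetical placeholders for illustration only.
b_general, b_vocation = -0.058, -0.1136
a_general, a_vocation = 1.0, 2.0   # hypothetical intercepts

def probs(write):
    """Category probabilities from the two relative-log-odds equations (softmax)."""
    xb = [0.0,                              # academic: the base category
          a_general + b_general * write,    # log P(general)/P(academic)
          a_vocation + b_vocation * write]  # log P(vocation)/P(academic)
    exps = [math.exp(v) for v in xb]
    total = sum(exps)
    return [e / total for e in exps]        # [P(academic), P(general), P(vocation)]

# Relative log odds of general vs. academic fall by .058 per unit of write:
rll_50 = math.log(probs(50)[1] / probs(50)[0])
rll_51 = math.log(probs(51)[1] / probs(51)[0])
print(round(rll_50 - rll_51, 3))  # 0.058
```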
⚫ Coefficient Interpretation:
– ses:
⚫ The relative log odds of being in the general program vs. the academic program will decrease by 1.163 if moving from the lowest level of ses (ses==1) to the highest level of ses (ses==3).
Objective: To test for an overall effect of ses
⚫ We can test for an overall effect of ses using the test command.
– Below we see that the overall effect of ses is statistically significant.
Stata Commands:
test 2.ses 3.ses
Result:
– chi2( 4) = 10.82
– Prob > chi2 = 0.0287
⚫ More specifically, we can also test whether the effect of 3.ses in predicting general vs. academic equals the effect of 3.ses in predicting vocation vs. academic, using the test command again.
– The test shows that the effects are not statistically different from each other.
Stata Commands:
test [general]3.ses = [vocation]3.ses
Result:
– [general]3.ses - [vocation]3.ses = 0
– chi2( 1) = 0.08
– Prob > chi2 = 0.7811
Margins
⚫ You can also use predicted probabilities to help you understand the model.
– You can calculate predicted probabilities using the margins command.
Stata Commands:
margins ses, atmeans predict(outcome(1))
margins ses, atmeans predict(outcome(2))
margins ses, atmeans predict(outcome(3))
Censored and Truncated
Regression: Tobit Regression
Examples of tobit regression
⚫ Example 1:
– In the 1980s there was a federal law restricting speedometer
readings to no more than 85 mph.
– So if you wanted to try to predict a vehicle’s top speed from a combination of horsepower and engine size, you would get a reading no higher than 85, regardless of how fast the vehicle was really traveling.
– This is a classic case of right-censoring (censoring from
above) of the data.
– The only thing we are certain of is that those vehicles were
traveling at least 85 mph.
⚫ Example 2:
– A research project is studying the level of lead in home
drinking water as a function of the age of a house and
family income.
– The water testing kit cannot detect lead concentrations below
5 parts per billion (ppb).
– The EPA considers levels above 15 ppb to be dangerous.
– These data are an example of left-censoring (censoring
from below).
⚫ Example 3:
– Consider the situation in which we have a measure of
academic aptitude (scaled 200-800) which we want to model
using reading and math test scores, as well as, the type of
program the student is enrolled in (academic, general, or
vocational).
– The problem here is that students who answer all questions on
the academic aptitude test correctly receive a score of 800,
even though it is likely that these students are not “truly” equal
in aptitude.
– The same is true of students who answer all of the questions
incorrectly. All such students would have a score of 200,
although they may not all be of equal aptitude.
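The censoring mechanism in this example can be sketched in a couple of lines of Python (an illustration of how the observed scores arise, not of the tobit estimator itself):

```python
# Censoring in Example 3: the latent aptitude y* is observed only within
# [200, 800]; values beyond a limit are recorded at that limit.
def censor(y_star, lower=200, upper=800):
    """Observed score: the latent value clipped at the test's floor and ceiling."""
    return max(lower, min(upper, y_star))

print(censor(640))  # interior value, observed as-is -> 640
print(censor(845))  # right-censored at the ceiling  -> 800
print(censor(150))  # left-censored at the floor     -> 200
```

Tobit regression models the latent y* directly while accounting for this clipping.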
Description of the data
⚫ Descriptive Statistics:
⚫ Stata Commands:
summarize apt read math
tabulate prog
histogram apt, normal bin(10) xline(800)
histogram apt, discrete freq
⚫ First Histogram:
⚫ Looking at the first histogram showing the distribution of apt, we can see the
censoring in the data, that is, there are far more cases with scores of 750
to 800 than one would expect looking at the rest of the distribution.
⚫ Second Histogram:
⚫ Alternative histogram that further highlights the excess of cases where apt=800.
– In the second histogram, the discrete option produces a histogram where each
unique value of apt has its own bar.
– The freq option causes the y-axis to be labeled with the frequency for each value,
rather than the density.
– Because apt is continuous, most values of apt are unique in the dataset, although
close to the center of the distribution there are a few values of apt that have two or
three cases.
– The spike on the far right of the histogram is the bar for cases where apt=800; the height of this bar relative to all the others clearly shows the excess number of cases with this value.
⚫ Next we’ll explore the bivariate relationships in our dataset.
Stata Commands:
⚫ Below we run the tobit model, using read, math, and prog to
predict apt.
⚫ The ul( ) option in the tobit command indicates the value at which
the right-censoring begins (i.e., the upper limit).
– There is also a ll( ) option to indicate the value of the left-
censoring (the lower limit) which was not needed in this
example.
⚫ STATA Commands:
tobit apt read math i.prog, ul(800)
Output Interpretation:
⚫ The final log likelihood (-1041.0629) is shown at the top of the output; it can be used in comparisons of nested models, but we won’t show an example of that here.
⚫ Also at the top of the output we see that all 200 observations in our data set were
used in the analysis (fewer observations would have been used if any of our
variables had missing values).
– The likelihood ratio chi-square of 188.97 (df=4) with a p-value of 0.0001 tells us
that our model as a whole fits significantly better than an empty model (i.e., a
model with no predictors).
⚫ Coefficient Interpretation:
⚫ In the table we see the coefficients, their standard errors, the t-statistic,
associated p-values, and the 95% confidence interval of the coefficients.
– The coefficients for read and math are statistically significant, as is the
coefficient for prog=3.
⚫ Coefficient Interpretation:
⚫ Tobit regression coefficients are interpreted in a similar manner to OLS regression coefficients; however, the linear effect is on the uncensored latent variable, not the observed outcome.
– For a one unit increase in read, there is a 2.7 point increase in the predicted value
of apt.
– A one unit increase in math is associated with a 5.91 unit increase in the predicted
value of apt.
– The terms for prog have a slightly different interpretation.
⚫ The predicted value of apt is 46.14 points lower for students in a vocational program (prog=3) than for students in an academic program (prog=1).
⚫ The ancillary statistic /sigma is analogous to the square root of the
residual variance in OLS regression.
– The value of 65.67 can be compared to the standard deviation of
academic aptitude which was 99.21, a substantial reduction.
– The output also contains an estimate of the standard error of /sigma as
well as the 95% confidence interval.
⚫ Objective: To test for an overall effect of prog using
the test command.
– Below we see that the overall effect of prog is statistically
significant.
⚫ STATA Commands:
test 2.prog 3.prog
⚫ Objective: To test for additional hypotheses about the differences in
the coefficients for different levels of prog.
– Below we test that the coefficient for prog=2 is equal to the
coefficient for prog=3.
– In the output below we see that the coefficient for prog=2 is significantly different from the coefficient for prog=3.
⚫ STATA Commands:
test 2.prog = 3.prog
ORDERED LOGISTIC
REGRESSION
⚫ Example 1:
– A marketing research firm wants to investigate what factors
influence the size of soda (small, medium, large or extra large)
that people order at a fast-food chain.
⚫ These factors may include what type of sandwich is ordered
(burger or chicken), whether or not fries are also ordered,
and age of the consumer.
⚫ Example 2:
– A study looks at factors that influence the decision of
whether to apply to graduate school.
⚫ College juniors are asked if they are unlikely, somewhat
likely, or very likely to apply to graduate school.
⚫ Hence, our outcome variable has three categories.
– Data on parental educational status, whether the undergraduate
institution is public or private, and current GPA is also collected.
Descriptive Statistics:
⚫ Stata Commands:
tab apply
tab apply, nolab                      // nolab: suppress value labels
table apply, cont(mean gpa sd gpa)    // cont(): contents of the table
Ordered logistic regression: STATA
⚫ ologit Command:
– Below we use the ologit command to estimate an ordered
logistic regression model.
⚫ The i. before pared indicates that pared is a factor variable
(i.e., categorical variable), and that it should be included in
the model as a series of indicator variables.
⚫ The same goes for i.public.
⚫ STATA Commands:
ologit apply i.pared i.public gpa
⚫ Output Interpretation:
– In the output above, we first see the iteration log.
⚫ At iteration 0, Stata fits a null model, i.e. the intercept-only model.
⚫ It then moves on to fit the full model and stops the iteration process once the difference in log likelihood between successive iterations becomes sufficiently small.
⚫ Coefficient Interpretation:
– In the table we see the coefficients, their standard errors, z-tests
and their associated p-values, and the 95% confidence interval of
the coefficients.
⚫ Both pared and gpa are statistically significant; public is not.
⚫ Ordinal Logistic Regression (Odds Ratios):
– We can obtain odds ratios using the or option after
the ologit command.
⚫ STATA Commands:
ologit apply i.pared i.public gpa, or
⚫ Coefficient Interpretation:
– In the output above the results are displayed as proportional odds ratios.
⚫ We would interpret these pretty much as we would odds ratios from a binary
logistic regression.
– For pared, we would say that for a one unit increase in pared, i.e., going from 0 to
1, the odds of high apply versus the combined middle and low categories
are 2.85 greater, given that all of the other variables in the model are held
constant.
⚫ Likewise, the odds of the combined middle and high categories versus
low apply is 2.85 times greater, given that all of the other variables in the
model are held constant.
– For a one unit increase in gpa, the odds of the high category of apply versus
the low and middle categories of apply are 1.85 times greater, given that the
other variables in the model are held constant.
⚫ Because of the proportional odds assumption, the same increase, 1.85 times, is found between low apply and the combined categories of middle and high apply.
Proportional odds assumption
⚫ Because the relationship between all pairs of groups is the same, there is only
one set of coefficients (only one model).
– If this was not the case, we would need different models to describe the
relationship between each pair of outcome groups.
⚫ We need to test the proportional odds assumption, and there are tests that can be used to do so.
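The assumption can also be illustrated numerically: with a single coefficient for pared, the ratio of cumulative odds between pared=0 and pared=1 is the same at every cutpoint. A Python sketch, using the slide's odds ratio of 2.85 for pared and hypothetical cutpoints:

```python
import math

def invlogit(z):
    """Inverse logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

b_pared = math.log(2.85)   # the slide's odds ratio for pared, back on the logit scale
cut1, cut2 = 0.38, 2.45    # hypothetical cutpoints, for illustration only

def cum_odds(cut, pared):
    """Odds of being at or below a cutpoint: P(apply <= j) / P(apply > j)."""
    p = invlogit(cut - b_pared * pared)
    return p / (1 - p)

# One coefficient shifts every cumulative logit by the same amount, so the
# ratio of cumulative odds between pared=0 and pared=1 is identical at each cut:
r1 = cum_odds(cut1, 0) / cum_odds(cut1, 1)
r2 = cum_odds(cut2, 0) / cum_odds(cut2, 1)
print(round(r1, 2), round(r2, 2))  # 2.85 2.85
```

If this equality failed in the data, we would need a separate coefficient per cutpoint, which is exactly what the generalized ordered logit model allows.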
⚫ Omodel test:
– First, we need to download a user-written command
called omodel (type search omodel).
– The first test that we will show does a likelihood ratio test.
⚫ The null hypothesis is that there is no difference in the
coefficients between models, so we "hope" to get a non-
significant result.
– Please note that the omodel command does not recognize factor variables, so the i. prefix is omitted.
omodel Tests
⚫ STATA Commands:
search omodel
omodel logit apply pared public gpa
⚫ The above tests indicate that we have not violated the
proportional odds assumption.
– If we had, we would want to run our model as a generalized
ordered logistic model using gologit2.
Margins
⚫ STATA Commands:
margins, at(pared=(0/1)) predict(outcome(0)) atmeans
margins, at(pared=(0/1)) predict(outcome(1)) atmeans
margins, at(pared=(0/1)) predict(outcome(2)) atmeans
⚫ As you can see, the predicted probability of being in the lowest
category of apply is 0.59 if neither parent has a graduate level
education and 0.34 otherwise.
– For the middle category of apply, the predicted probabilities are
0.33 and 0.47, and for the highest category of apply, 0.078 and
0.196.
Things to consider
⚫ Perfect prediction:
– Perfect prediction means that one value of a predictor variable is associated with
only one value of the response variable.
– If this happens, Stata will usually issue a note at the top of the output and will drop
the cases so that the model can run.
⚫ Sample size:
– Both ordered logistic and ordered probit, using maximum likelihood estimates,
require sufficient sample size.
– How big is big is a topic of some debate, but they almost always require more cases
than OLS regression.
⚫ Empty cells or small cells:
– You should check for empty or small cells by doing a crosstab between categorical
predictors and the outcome variable.
– If a cell has very few cases, the model may become unstable or it might not run at
all.
⚫ Pseudo-R-squared:
– There is no exact analog of the R-squared found in OLS.
– There are many versions of pseudo-R-squares.