STATA - Logit-Probit-Tobit - IInd Sem 23-24

The document discusses logistic regression, which models binary outcome variables. It describes how to perform logistic regression in STATA and interpret the results, specifically focusing on odds ratios. Key predictor variables for graduate school admission are examined, including GRE scores, GPA, and undergraduate institution rank.


Logit-Probit-Tobit

McGraw-Hill/Irwin ©The McGraw-Hill Companies, Inc. 2008


LOGISTIC REGRESSION


⚫ Logistic regression, also called a logit model, is used to model dichotomous
outcome variables.
– In the logit model the log odds of the outcome is modeled as a linear
combination of the predictor variables.

Examples of logistic regression
⚫ A researcher is interested in how variables, such as GRE (Graduate Record
Exam scores), GPA (grade point average) and prestige of the undergraduate
institution, affect admission into graduate school.
– The response variable, admit/don’t admit, is a binary variable.

Description of the data: binary
⚫ This data set has a binary response (outcome, dependent) variable called admit.
⚫ There are three predictor variables: gre, gpa and rank.
– We will treat the variables gre and gpa as continuous.
– The variable rank takes on the values 1 through 4 [Institutions with a rank
of 1 have the highest prestige, while those with a rank of 4 have the lowest].
Logistic regression: STATA Command

STATA Command:
logit admit gre gpa i.rank

Interpretation:

⚫ The logistic regression coefficients give the change in the log odds of the outcome
for a one unit increase in the predictor variable.
– For every one unit change in gre, the log odds of admission (versus non-
admission) increases by 0.002.

– For a one unit increase in gpa, the log odds of being admitted to graduate school
increases by 0.804.

– The indicator variables for rank have a slightly different interpretation.


⚫ For example, having attended an undergraduate institution with rank of 2,
versus an institution with a rank of 1, decreases the log odds of admission by
0.675.

To test overall effect of rank
⚫ We can test for an overall effect of rank using the test command.
– Below we see that the overall effect of rank is statistically significant.

STATA Command:

To test for additional hypotheses
⚫ We can also test additional hypotheses about the differences in the coefficients
for different levels of rank.
– Below we test that the coefficient for rank=2 is equal to the coefficient
for rank=3.

STATA Command:
ODDS RATIOS IN LOGISTIC
REGRESSION



Introduction

⚫ Let’s begin with probability.
– Let’s say that the probability of success is .8, thus
⚫ p = .8
– Then the probability of failure is
⚫ q = 1 – p = .2

⚫ Odds are defined as the ratio of the probability of success and the
probability of failure.

⚫ The odds of success are
– odds(success) = p/(1-p) or p/q = .8/.2 = 4,
– that is, the odds of success are 4 to 1.

⚫ The odds of failure would be
– odds(failure) = q/p = .2/.8 = .25.
– This looks a little strange but it is really saying that the odds of failure
are 1 to 4.

⚫ The odds of success and the odds of failure are just reciprocals of one
another, i.e., 1/4 = .25 and 1/.25 = 4.
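⚫ The probability arithmetic above can be verified with a short script (a minimal sketch in Python; it is only an illustration of the algebra, not part of the Stata workflow):

```python
from fractions import Fraction

p = Fraction(8, 10)   # probability of success, p = .8
q = 1 - p             # probability of failure, q = .2

odds_success = p / q  # p/(1-p) = .8/.2 = 4
odds_failure = q / p  # q/p     = .2/.8 = .25

print(float(odds_success))  # 4.0
print(float(odds_failure))  # 0.25

# The odds of success and failure are reciprocals of one another.
assert odds_success * odds_failure == 1
```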
Another example [Pedhazur (1997)]
⚫ Suppose that seven out of 10 males are admitted to an engineering school
while three of 10 females are admitted.

⚫ The probabilities for admitting a male are,
– p = 7/10 = .7 q = 1 – .7 = .3
– If you are male, the probability of being admitted is 0.7 and the
probability of not being admitted is 0.3.

⚫ Here are the same probabilities for females,
– p = 3/10 = .3 q = 1 – .3 = .7
– If you are female it is just the opposite: the probability of being admitted
is 0.3 and the probability of not being admitted is 0.7.

⚫ Now we can use the probabilities to compute the odds of admission for both
males and females,
– Odds(male) = .7/.3 = 2.33333
– Odds(female) = .3/.7 = .42857

⚫ Next, we compute the odds ratio for admission,
– OR = 2.3333/.42857 = 5.44

⚫ Thus, for a male, the odds of being admitted are 5.44 times larger than the
odds for a female being admitted.
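⚫ The same kind of sketch checks the admission example; again, Python is used only to verify the arithmetic:

```python
from fractions import Fraction

p_male = Fraction(7, 10)    # probability a male is admitted
p_female = Fraction(3, 10)  # probability a female is admitted

odds_male = p_male / (1 - p_male)        # .7/.3 = 2.33333
odds_female = p_female / (1 - p_female)  # .3/.7 = .42857

odds_ratio = odds_male / odds_female     # 2.3333/.42857
print(round(float(odds_ratio), 2))       # 5.44
```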
Logistic regression in Stata

⚫ Data: In this example admit is coded 1 for yes and 0 for no and gender is coded 1 for
male and 0 for female.

⚫ STATA Command:
input admit gender freq
1 1 7
1 0 3
0 1 3
0 0 7
end

⚫ This data represents a 2×2 table that looks like this:

                 Admission
                  1    0
Gender   1        7    3
         0        3    7
Logistic regression in Stata

⚫ In Stata, the logistic command produces results in terms of odds ratios,
while logit produces results in terms of coefficients scaled in log odds.

STATA Command:
⚫ logit
logit admit gender [fweight=freq], nolog or

⚫ The above command is equivalent to:

⚫ logistic
logistic admit gender [fweight=freq], nolog

⚫ Note that z = 1.74 both for the coefficient for gender and for the odds ratio
for gender.
Note:
⚫ Many Stata commands fit a model by maximum likelihood, and in so doing, they
include a report on the iterations of the algorithm towards (it is hoped) eventual
convergence.
⚫ There may be tens or even hundreds or thousands of such lines in a report, which
are faithfully recorded in any log file you may have open.
⚫ Those lines are of little or no statistical interest in most examples and may be
omitted by adding the nolog option.
About Logits

⚫ There is a direct relationship between the coefficients produced by logit and
the odds ratios produced by logistic.

⚫ First, let’s define what is meant by a logit:
– A logit is defined as the log base e (log) of the odds:
– [1] logit(p) = log(odds) = log(p/q)
– The range is negative infinity to positive infinity.

⚫ Logistic regression is in reality an ordinary regression using the logit as the
response variable.

⚫ The logit transformation allows for a linear relationship between the
response variable and the coefficients:
– [2] logit(p) = a + bX or
– [3] log(p/q) = a + bX

⚫ This means that the coefficients in logistic regression are in terms of the log
odds; that is, the coefficient 1.69 implies that a one unit change in gender
results in a 1.69 unit change in the log of the odds.
⚫ [3] log(p/q) = a + bX

⚫ Equation [3] can be expressed in odds by getting rid of the log.
– This is done by raising e to the power of both sides of the equation.
– [4] e^log(p/q) = e^(a + bX) or
– [5] p/q = e^(a + bX)

⚫ The end result of all the mathematical manipulations is that the odds ratio can be
computed by raising e to the power of the logistic coefficient,
– [6] OR = e^b = e^1.69 ≈ 5.44
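⚫ Equation [6] can be checked numerically. Because 1.69 is itself a rounded coefficient, exponentiating it gives approximately 5.42 rather than exactly the 5.44 obtained from the raw odds:

```python
import math

b = 1.69                  # logit coefficient for gender (rounded)
odds_ratio = math.exp(b)  # OR = e^b

print(round(odds_ratio, 2))  # 5.42, i.e. the reported 5.44 up to rounding of b

# Conversely, the coefficient is the natural log of the odds ratio.
assert math.isclose(math.log(odds_ratio), b)
```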
⚫ Logit/Logistic regression for original problem:
– Now, coming back to our original problem (binary data)
– Dependent Variable: admit
– Independent Variables: gre, gpa & rank

STATA Command:
logit admit gre gpa i.rank
logit admit gre gpa i.rank, or

⚫ Interpretation:
– Now we can say that for a one unit increase in gpa, the odds of being admitted to
graduate school (versus not being admitted) increase by a factor of 2.23.
– Also, the odds can be interpreted as (for rank variable)
⚫ Odds of being admitted decrease for the student who has attended an
undergraduate institution with rank of 2, versus an institution with a rank of 1
and so on.

⚫ General Rule:
⚫ For Odds Ratio (OR)
– If OR > 1: Odds of being admitted increase
– If OR < 1: Odds of being admitted decrease
⚫ For Log (OR) or Logit
– If Logit > 0: Log odds of admission increase
– If Logit < 0: Log odds of admission decrease
Margins
⚫ You can also use predicted probabilities to help you understand the model.
– You can calculate predicted probabilities using the margins command.
– Below we use the margins command to calculate the predicted probability of
admission at each level of rank, holding all other variables in the model at their means.

STATA Command:
margins rank, atmeans

Margins - Interpretation

⚫ In the above output we see that the predicted probability of being accepted
into a graduate program is 0.51 for the highest prestige undergraduate
institutions (rank=1), and
– 0.18 for the lowest ranked institutions (rank=4), holding gre and gpa at
their means.

⚫ Please note that the predicted probability decreases as the prestige of the
institution decreases (i.e., moving from rank 1 to rank 4).
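⚫ Under the hood, margins converts the linear predictor (the log odds) into a probability with the inverse logit. A minimal sketch follows; the two linear-predictor values are hypothetical back-calculations chosen only to reproduce the reported probabilities (the real values come from the fitted model):

```python
import math

def inv_logit(xb):
    """Convert a linear predictor (log odds) into a probability."""
    return 1 / (1 + math.exp(-xb))

# Hypothetical linear predictors at the means of gre and gpa,
# chosen only to illustrate the reported probabilities.
xb_rank1 = 0.04    # -> about 0.51, as reported for rank=1
xb_rank4 = -1.52   # -> about 0.18, as reported for rank=4

print(round(inv_logit(xb_rank1), 2))  # 0.51
print(round(inv_logit(xb_rank4), 2))  # 0.18
```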
Things to consider

⚫ Empty cells or small cells:


– You should check for empty or small cells by doing a crosstab between categorical predictors
and the outcome variable.
– If a cell has very few cases (a small cell), the model may become unstable or it might not run at all.

⚫ Pseudo-R-squared:
– Many different measures of pseudo-R-squared exist.
– They all attempt to provide information similar to that provided by R-squared in OLS regression;
however, none of them can be interpreted exactly as R-squared in OLS regression is interpreted.

⚫ Sample size:
– Both logit and probit models require more cases than OLS regression because they use
maximum likelihood estimation techniques.
– It is sometimes possible to estimate models for binary outcomes in datasets with only a small
number of cases using exact logistic regression (using the exlogistic command).
– It is also important to keep in mind that when the outcome is rare, even if the overall dataset is
large, it can be difficult to estimate a logit model.
PROBIT REGRESSION



Probit regression
Probit regression, also called a probit model, is used to model dichotomous or
binary outcome variables.

Examples of Probit regression


Example 1:
⚫ Suppose that we are interested in the factors that influence whether a political candidate
wins an election.
⚫ The outcome (response) variable is binary (0/1); win or lose.
⚫ The predictor variables of interest are the amount of money spent on the campaign, the
amount of time spent campaigning negatively and whether the candidate is an incumbent.

Example 2:
⚫ A researcher is interested in how variables, such as GRE (Graduate Record Exam scores),
GPA (grade point average) and prestige of the undergraduate institution, affect admission
into graduate school.
⚫ The response variable, admit/don’t admit, is a binary variable.
Probit Model

Description of the data: binary

⚫ For our data analysis below, we are going to expand on Example 2 about
getting into graduate school.
– This data set has a binary response (outcome, dependent) variable
called admit.

⚫ There are three predictor variables: gre, gpa and rank.
– We will treat the variables gre and gpa as continuous.
– The variable rank is ordinal; it takes on the values 1 through 4
[Institutions with a rank of 1 have the highest prestige, while those with a
rank of 4 have the lowest].
⚫ We will treat rank as categorical.

STATA Commands:
probit admit gre gpa i.rank
Output Interpretation:
⚫ In the output above, we first see the iteration log, indicating how quickly the model
converged.
– The log likelihood (-229.20658) can be used in comparisons of nested models, but we won’t
show an example of that here.
⚫ Also at the top of the output we see that all 400 observations in our data set were used in the
analysis (fewer observations would have been used if any of our variables had missing values).
– The likelihood ratio chi-square of 41.56 with a p-value of 0.0001 tells us that our model
as a whole is statistically significant, that is, it fits significantly better than a model with no
predictors.
Coefficient Interpretation:
⚫ gre, gpa, and the three indicator variables for rank are all statistically significant.
⚫ The probit regression coefficients give the change in the z-score or probit index for a one
unit change in the predictor.
– For a one unit increase in gre, the z-score increases by 0.001.
– For each one unit increase in gpa, the z-score increases by 0.478.
– The indicator variables for rank have a slightly different interpretation.
⚫ For example, having attended an undergraduate institution with a rank of 2, versus an
institution with a rank of 1 (the reference group), decreases the z-score by 0.415.
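⚫ The probit index is converted to a probability with the standard normal CDF, Φ. A small sketch (the 0.478 gpa coefficient is from the output above; the baseline index of 0 is an arbitrary illustration, not a fitted value):

```python
import math

def norm_cdf(z):
    """Standard normal CDF, Phi(z), computed via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# A one unit increase in gpa raises the probit index by 0.478.
# From an (arbitrary) baseline index of 0, the implied probability
# moves from Phi(0) = 0.5 to Phi(0.478):
print(norm_cdf(0.0))              # 0.5
print(round(norm_cdf(0.478), 2))  # 0.68
```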
Margins

⚫ You can also use predicted probabilities to help you understand the model.
– Below we use the margins command to calculate the predicted
probability of admission at each level of rank, holding all other variables
in the model at their means.

STATA Commands:
margins rank, atmeans

⚫ Margin Interpretation:
– In the above output we see that the predicted probability of being accepted into a
graduate program is 0.52 for the highest prestige undergraduate institutions
(rank=1), and 0.19 for the lowest ranked institutions (rank=4),
holding gre and gpa at their means.
Things to consider

⚫ Same as LOGIT

MULTINOMIAL LOGISTIC
REGRESSION



Introduction

⚫ Multinomial logistic regression is used to model nominal outcome variables, in


which the log odds of the outcomes are modeled as a linear combination of the
predictor variables.

Examples of multinomial logistic regression

Example 1:
⚫ People’s occupational choices might be influenced by their parents’ occupations
and their own education level.
⚫ We can study the relationship of one’s occupation choice with education level
and father’s occupation.
⚫ The occupational choices will be the outcome variable, which consists of
categories of occupations.

Example 2:
⚫ Entering high school students make program choices among general program,
vocational program and academic program.
⚫ Their choice might be modeled using their writing score and their
socioeconomic status.

Description of the data: hsbdemo
⚫ For our data analysis example, we will expand the second example using
the hsbdemo data set.
– The data set contains variables on 200 students.
– The outcome variable is prog, program type.
– The predictor variables are socioeconomic status, ses, a three-level
categorical variable, and writing score, write, a continuous variable.
⚫ Objective: To test whether there is any association between the two variables
(prog and ses).
⚫ H0: Assumes that there is no association between the two variables.
⚫ H1: Assumes that there is an association between the two variables.

STATA Command:
tab prog ses, chi2

Conclusion: Reject H0

STATA Command (variation of write across type of program):
table prog, con(mean write sd write)
Multinomial logistic regression: STATA Example

⚫ mlogit Command:
– Below we use the mlogit command to estimate a multinomial logistic regression
model.
⚫ The i. before ses indicates that ses is an indicator variable (i.e., categorical
variable), and that it should be included in the model.
⚫ We have also used the option “base” to indicate the category we would want
to use for the baseline comparison group.
– In the model below, we have chosen to use the academic program type
as the baseline category.

⚫ Stata Commands:
mlogit prog i.ses write, base(2)

⚫ Output Interpretation:
– In the output above, we first see the iteration log, indicating how quickly the model
converged.
⚫ The log likelihood (-179.98173) can be used in comparisons of nested models, but we
won’t show an example of comparing models here

– The likelihood ratio chi-square of 48.23 with a p-value < 0.0001 tells us that our model as a
whole fits significantly better than an empty model (i.e., a model with no predictors)
⚫ The output above has two parts, labeled with the categories of the outcome variable prog.
– They correspond to the two equations below:

⚫ Coefficient Interpretation:
– write:
⚫ A one-unit increase in the variable write is associated with a .058 decrease in the relative
log odds of being in general program vs. academic program.
⚫ A one-unit increase in the variable write is associated with a .1136 decrease in the
relative log odds of being in vocational program vs. academic program.
⚫ Coefficient Interpretation:
– ses:
⚫ The relative log odds of being in general program vs. in academic program will decrease
by 1.163 if moving from the lowest level of ses (ses==1) to the highest level
of ses (ses==3).
Objective: To test for an overall effect of ses
⚫ We can test for an overall effect of ses using the test command.
⚫ Below we see that the overall effect of ses is statistically significant.

Stata Commands:
test 2.ses 3.ses

Result:
– chi2( 4) = 10.82
– Prob > chi2 = 0.0287

⚫ More specifically, we can also test if the effect of 3.ses in predicting general vs.
academic equals the effect of 3.ses in predicting vocation vs. academic using
the test command again.
– The test shows that the effects are not statistically different from each other.

Stata Commands:
test [general]3.ses = [vocation]3.ses

Result:
– [general]3.ses - [vocation]3.ses = 0
– chi2( 1) = 0.08
– Prob > chi2 = 0.7811
Margins

⚫ You can also use predicted probabilities to help you understand the model.
– You can calculate predicted probabilities using the margins command.

⚫ Below we use the margins command to calculate the predicted probability of


choosing each program type at each level of ses, holding all other variables in the model
at their means.
– Since there are three possible outcomes, we will need to use the margins command
three times, once for each outcome value.

Stata Commands:
margins ses, atmeans predict(outcome(1))
margins ses, atmeans predict(outcome(2))
margins ses, atmeans predict(outcome(3))

Censored and Truncated
Regression: Tobit Regression



TOBIT ANALYSIS

⚫ The tobit model, also called a censored regression model, is


designed to estimate linear relationships between variables
when there is either left- or right-censoring in the dependent
variable (also known as censoring from below and above,
respectively).
– Censoring from above takes place when cases with a value at
or above some threshold all take on the value of that threshold,
so that the true value might be equal to the threshold, but it might
also be higher.
– In the case of censoring from below, values that fall at or
below some threshold are censored.
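⚫ Right-censoring can be mimicked in one line: every latent value at or above the threshold is recorded as the threshold itself. (The latent scores below are made up for illustration; the ceiling of 800 anticipates the aptitude example used later.)

```python
# Hypothetical latent scores; values above the ceiling cannot be observed.
CEILING = 800
latent = [640, 795, 812, 860, 700]

observed = [min(score, CEILING) for score in latent]
print(observed)  # [640, 795, 800, 800, 700]
```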
Examples of tobit regression

⚫ Example 1:
– In the 1980s there was a federal law restricting speedometer
readings to no more than 85 mph.
– So if you wanted to try and predict a vehicle’s top-speed
from a combination of horse-power and engine size, you
would get a reading no higher than 85, regardless of how fast
the vehicle was really traveling.
– This is a classic case of right-censoring (censoring from
above) of the data.
– The only thing we are certain of is that those vehicles were
traveling at least 85 mph.

⚫ Example 2:
– A research project is studying the level of lead in home
drinking water as a function of the age of a house and
family income.
– The water testing kit cannot detect lead concentrations below
5 parts per billion (ppb).
– The EPA considers levels above 15 ppb to be dangerous.
– These data are an example of left-censoring (censoring
from below).

⚫ Example 3:
– Consider the situation in which we have a measure of
academic aptitude (scaled 200-800) which we want to model
using reading and math test scores, as well as, the type of
program the student is enrolled in (academic, general, or
vocational).
– The problem here is that students who answer all questions on
the academic aptitude test correctly receive a score of 800,
even though it is likely that these students are not “truly” equal
in aptitude.
– The same is true of students who answer all of the questions
incorrectly. All such students would have a score of 200,
although they may not all be of equal aptitude.

Description of the data

⚫ We have a hypothetical data file, tobit.dta with 200 observations.


– The academic aptitude variable is apt, the reading and math
test scores are read and math respectively.
– The variable prog is the type of program the student is in; it is a
categorical (nominal) variable that takes on three values,
academic (prog = 1), general (prog = 2), and vocational
(prog = 3).

⚫ Descriptive Statistics:

⚫ Stata Commands:
summarize apt read math
tabulate prog
histogram apt, normal bin(10) xline(800)
histogram apt, discrete freq

⚫ Note that in this dataset, the lowest value of apt is 352.


– No students received a score of 200 (i.e. the lowest score
possible), meaning that even though censoring from below
was possible, it does not occur in the dataset.

⚫ First Histogram:
⚫ Looking at the first histogram showing the distribution of apt, we can see the
censoring in the data, that is, there are far more cases with scores of 750
to 800 than one would expect looking at the rest of the distribution.
[Figure: histogram of apt on a density scale (x-axis 300–800) with a normal
curve overlaid and a reference line at apt = 800.]
[Figure: discrete histogram of apt on a frequency scale (x-axis 300–800) with a
tall bar at apt = 800.]

⚫ Second Histogram:
⚫ Alternative histogram that further highlights the excess of cases where apt=800.
– In the second histogram, the discrete option produces a histogram where each
unique value of apt has its own bar.
– The freq option causes the y-axis to be labeled with the frequency for each value,
rather than the density.
– Because apt is continuous, most values of apt are unique in the dataset, although
close to the center of the distribution there are a few values of apt that have two or
three cases.
– The spike on the far right of the histogram is the bar for cases where apt=800;
the height of this bar relative to all the others clearly shows the excess number
of cases with this value.
⚫ Next we’ll explore the bivariate relationships in our dataset.

Stata Commands:
correlate read math apt
graph matrix read math apt, half jitter(2)

[Figure: lower-triangular scatterplot matrix of read, math and apt.]

⚫ In the last row of the scatterplot matrix shown above, we see the
scatterplots showing read and apt, as well as math and apt.
– Note the collection of cases at the top of each scatterplot due to the
censoring in the distribution of apt.
Tobit Regression: STATA

⚫ Below we run the tobit model, using read, math, and prog to
predict apt.

⚫ The ul( ) option in the tobit command indicates the value at which
the right-censoring begins (i.e., the upper limit).
– There is also a ll( ) option to indicate the value of the left-
censoring (the lower limit) which was not needed in this
example.

⚫ STATA Commands:
tobit apt read math i.prog, ul(800)

Output Interpretation:
⚫ The final log likelihood (-1041.0629) is shown at the top of the output; it can be used
in comparisons of nested models, but we won’t show an example of that here.
⚫ Also at the top of the output we see that all 200 observations in our data set were
used in the analysis (fewer observations would have been used if any of our
variables had missing values).
– The likelihood ratio chi-square of 188.97 (df=4) with a p-value of 0.0001 tells us
that our model as a whole fits significantly better than an empty model (i.e., a
model with no predictors).
⚫ Coefficient Interpretation:
⚫ In the table we see the coefficients, their standard errors, the t-statistic,
associated p-values, and the 95% confidence interval of the coefficients.
– The coefficients for read and math are statistically significant, as is the
coefficient for prog=3.
⚫ Coefficient Interpretation:
⚫ Tobit regression coefficients are interpreted in a similar manner to OLS regression
coefficients; however, the linear effect is on the uncensored latent variable, not the
observed outcome.
– For a one unit increase in read, there is a 2.7 point increase in the predicted value
of apt.
– A one unit increase in math is associated with a 5.91 unit increase in the predicted
value of apt.
– The terms for prog have a slightly different interpretation.
⚫ The predicted value of apt is 46.14 points lower for students in a vocational
program (prog=3) than for students in an academic program (prog=1).
⚫ The ancillary statistic /sigma is analogous to the square root of the
residual variance in OLS regression.
– The value of 65.67 can be compared to the standard deviation of
academic aptitude which was 99.21, a substantial reduction.
– The output also contains an estimate of the standard error of /sigma as
well as the 95% confidence interval.

⚫ Finally, the output provides a summary of the number of left-censored,


uncensored and right-censored values.

⚫ Objective: To test for an overall effect of prog using
the test command.
– Below we see that the overall effect of prog is statistically
significant.

⚫ STATA Commands:
test 2.prog 3.prog

⚫ Objective: To test for additional hypotheses about the differences in
the coefficients for different levels of prog.
– Below we test that the coefficient for prog=2 is equal to the
coefficient for prog=3.
– In the output below we see that the coefficient for prog=2 is
significantly different from the coefficient for prog=3.

⚫ STATA Commands:
test 2.prog = 3.prog

THANK YOU

ORDERED LOGISTIC
REGRESSION



Examples of ordered logistic regression

⚫ Example 1:
– A marketing research firm wants to investigate what factors
influence the size of soda (small, medium, large or extra large)
that people order at a fast-food chain.
⚫ These factors may include what type of sandwich is ordered
(burger or chicken), whether or not fries are also ordered,
and age of the consumer.

– While the outcome variable, size of soda, is obviously ordered,


the difference between the various sizes is not consistent.
⚫ The difference between small and medium is 10 ounces,
between medium and large 8, and between large and extra
large 12.

⚫ Example 2:
– A study looks at factors that influence the decision of
whether to apply to graduate school.
⚫ College juniors are asked if they are unlikely, somewhat
likely, or very likely to apply to graduate school.
⚫ Hence, our outcome variable has three categories.
– Data on parental educational status, whether the undergraduate
institution is public or private, and current GPA is also collected.

– The researchers have reason to believe that the "distances"


between these three points are not equal.
⚫ For example, the "distance" between "unlikely" and
"somewhat likely" may be shorter than the distance between
"somewhat likely" and "very likely".
Description of the data: ologit

⚫ For our data analysis below, we are going to expand on Example 2


about applying to graduate school.
– This hypothetical data set has a three-level variable
called apply (coded 0, 1, 2) that we will use as our outcome
variable.
– We also have three variables that we will use as predictors:
⚫ pared, which is a 0/1 variable indicating whether at least one
parent has a graduate degree;
⚫ public, which is a 0/1 variable where 1 indicates that the
undergraduate institution is public and 0 private, and
⚫ gpa, which is the student’s grade point average.

Descriptive Statistics:

⚫ Stata Commands:
tab apply
tab apply, nolab // nolab: suppress value labels
table apply, cont(mean gpa sd gpa) // cont(): contents of table

Ordered logistic regression: STATA

⚫ ologit Command:
– Below we use the ologit command to estimate an ordered
logistic regression model.
⚫ The i. before pared indicates that pared is a factor variable
(i.e., categorical variable), and that it should be included in
the model as a series of indicator variables.
⚫ The same goes for i.public.

⚫ STATA Commands:
ologit apply i.pared i.public gpa

⚫ Output Interpretation:
– In the output above, we first see the iteration log.
⚫ At iteration 0, Stata fits a null model, i.e. the intercept-only model.
⚫ It then moves on to fit the full model and stops the iteration process
once the difference in log likelihood between successive iterations
becomes sufficiently small.

– The final log likelihood (-358.51244) is displayed again.


⚫ It can be used in comparisons of nested models.
– Also at the top of the output we see that all 400 observations in our data
set were used in the analysis.

– The likelihood ratio chi-square of 24.18 with a p-value of 0.0000 tells us


that our model as a whole is statistically significant, as compared to
the null model with no predictors.
⚫ The pseudo-R-squared of 0.0326 is also given.
Interpretation:

⚫ Coefficient Interpretation:
– In the table we see the coefficients, their standard errors, z-tests
and their associated p-values, and the 95% confidence interval of
the coefficients.
⚫ Both pared and gpa are statistically significant; public is not.

– So for pared, we would say that for a one unit increase


in pared (i.e., going from 0 to 1), we expect a 1.05 increase in
the log odds of being in a higher level of apply, given all of the
other variables in the model are held constant.

– For a one unit increase in gpa, we would expect a 0.62 increase


in the log odds of being in a higher level of apply, given that
all of the other variables in the model are held constant.
⚫ The cutpoints shown at the bottom of the output indicate where the
latent variable is cut to make the three groups that we observe in
our data.
– Note that this latent variable is continuous.
– In general, these are not used in the interpretation of the
results.
– Latent variables are variables that are not directly observed
but are rather inferred (through a mathematical model) from
other variables that are observed.
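⚫ The role of the cutpoints can be sketched numerically: each cutpoint turns the continuous latent index into a cumulative probability via the inverse logit, and differencing gives the category probabilities. The cutpoints and linear predictor below are hypothetical, for illustration only:

```python
import math

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

def ologit_probs(xb, cuts):
    """Category probabilities for an ordered logit:
    P(y <= k) = inv_logit(cut_k - xb); differencing yields P(y = k)."""
    cum = [inv_logit(c - xb) for c in cuts] + [1.0]
    probs, prev = [], 0.0
    for c in cum:
        probs.append(c - prev)
        prev = c
    return probs

# Hypothetical cutpoints and linear predictor.
probs = ologit_probs(xb=0.8, cuts=[2.2, 4.3])
assert abs(sum(probs) - 1) < 1e-12  # three probabilities, summing to one
```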
⚫ Ordinal Logistic Regression (Odds Ratios):
– We can obtain odds ratios using the or option after
the ologit command.

⚫ STATA Commands:
ologit apply i.pared i.public gpa, or

⚫ Coefficient Interpretation:
– In the output above the results are displayed as proportional odds ratios.
⚫ We would interpret these pretty much as we would odds ratios from a binary
logistic regression.

– For pared, we would say that for a one unit increase in pared, i.e., going from 0 to
1, the odds of high apply versus the combined middle and low categories
are 2.85 greater, given that all of the other variables in the model are held
constant.
⚫ Likewise, the odds of the combined middle and high categories versus
low apply is 2.85 times greater, given that all of the other variables in the
model are held constant.
– For a one unit increase in gpa, the odds of the high category of apply versus
the low and middle categories of apply are 1.85 times greater, given that the
other variables in the model are held constant.
⚫ Because of the proportional odds assumption, the same increase, 1.85
times, is found between low apply and the combined categories of
middle and high apply.
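⚫ The or option simply exponentiates each coefficient. Because the coefficients shown earlier are rounded to two decimals, redoing the exponentiation reproduces the odds ratios only approximately:

```python
import math

b_pared, b_gpa = 1.05, 0.62  # rounded log-odds coefficients from the output

print(round(math.exp(b_pared), 2))  # 2.86, vs. the reported 2.85
print(round(math.exp(b_gpa), 2))    # 1.86, vs. the reported 1.85
```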
Proportional odds assumption

⚫ One of the assumptions underlying ordered logistic (and ordered probit)


regression is that the relationship between each pair of outcome groups is the
same.
– In other words, ordered logistic regression assumes that the coefficients that
describe the relationship between, say, the lowest versus all higher categories of
the response variable are the same as those that describe the relationship
between the next lowest category and all higher categories, etc.

⚫ This is called the proportional odds assumption or the parallel regression


assumption.

⚫ Because the relationship between all pairs of groups is the same, there is only
one set of coefficients (only one model).
– If this was not the case, we would need different models to describe the
relationship between each pair of outcome groups.

⚫ We need to test the proportional odds assumption, and there are tests that can
be used to do so.
⚫ Omodel test:
– First, we need to download a user-written command
called omodel (type search omodel).
– The first test that we will show does a likelihood ratio test.
⚫ The null hypothesis is that there is no difference in the
coefficients between models, so we "hope" to get a non-
significant result.
– Please note that the omodel command does not recognize
factor variables, so the i. prefix is omitted.
omodel Tests

⚫ STATA Commands:
search omodel
omodel logit apply pared public gpa

⚫ The above tests indicate that we have not violated the
proportional odds assumption.
– If we had, we would want to run our model as a generalized
ordered logistic model using gologit2.

Margins

⚫ We can also obtain predicted probabilities, which are usually


easier to understand than the coefficients or the odds ratios.
– We will use the margins command.

⚫ This can be used with either a categorical variable or a continuous


variable and shows the predicted probability for each of the
values of the variable specified.
– We will use pared as an example with a categorical predictor.
– Here we will see how the probabilities of membership to each
category of apply change as we vary pared, holding the other
variables at their means.
⚫ STATA Commands:
margins, at(pared=(0/1)) predict(outcome(0)) atmeans
margins, at(pared=(0/1)) predict(outcome(1)) atmeans
margins, at(pared=(0/1)) predict(outcome(2)) atmeans

⚫ As you can see, the predicted probability of being in the lowest
category of apply is 0.59 if neither parent has a graduate level
education and 0.34 otherwise.
– For the middle category of apply, the predicted probabilities are
0.33 and 0.47, and for the highest category of apply, 0.078 and
0.196.

⚫ Hence, if neither of a respondent’s parents has a graduate-level
education, the predicted probability of applying to graduate
school is lower.
Things to consider
⚫ Perfect prediction:
– Perfect prediction means that one value of a predictor variable is associated with
only one value of the response variable.
– If this happens, Stata will usually issue a note at the top of the output and will drop
the cases so that the model can run.
⚫ Sample size:
– Both ordered logistic and ordered probit, using maximum likelihood estimates,
require sufficient sample size.
– How big is big is a topic of some debate, but they almost always require more cases
than OLS regression.
⚫ Empty cells or small cells:
– You should check for empty or small cells by doing a crosstab between categorical
predictors and the outcome variable.
– If a cell has very few cases, the model may become unstable or it might not run at
all.
⚫ Pseudo-R-squared:
– There is no exact analog of the R-squared found in OLS.
– There are many versions of pseudo-R-squared.
