Limited Dependent Variables Models-1
Com Finance
Lectured by
Edson Mbedzi
Session outline
At the end of this session, students must be able to:
• Understand and differentiate the main limited dependent variable models: LPM, logit and probit.
• Apply limited dependent variable models using selected computer software packages and
interpret the results for a given set of data.
• When a dummy variable is used as an explanatory (independent) variable in a regression model, it causes few
problems (apart from avoiding the dummy variable trap).
• However, there are many cases in finance or economics where the explained (dependent) variable is
qualitative.
• The qualitative information is coded as a categorical variable (i.e. a dummy variable or a variable with several
alternatives). Such a variable is referred to as a limited dependent variable and needs to be treated differently.
• The term refers to any problem where the values that the dependent variable takes are limited to certain integers, e.g.
0, 1, 2, 3, 4 for types of financing options, or where the outcome is binary (only 0 or 1, e.g. loan approval or denial).
• There are several examples of real life situations like this in finance e.g.
1. Why firms choose to list their shares on the NASDAQ rather than on the NYSE (a 2 level dependent – binary
variable).
2. What factors affect whether countries default on their foreign debt or not default (a 2 level dependent – binary
variable).
3. Why some firms choose to issue new stock while others issue bonds or use debt to finance expansions (a 3 level
dependent – multiple dependent variable).
In all the above cases, the dependent variable is not numeric but categorical with varying category levels.
LPM, Logit and Probit Models
• There are three models that can be used when we have limited dependent variables.
1. Linear Probability Model (LPM) (uses linear multiple regression model and estimated by OLS)
2. Nonlinear Probability Models
• Probit (Uses cumulative standard normal distribution)
• Logit (Uses cumulative standard logistic distribution)
• Both the logit and probit approaches overcome the limitation of the LPM that it can produce
estimated probabilities that are negative or greater than one. The logit
and probit functions transform the regression model so that the fitted values are bounded
within the (0,1) interval.
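The bounding problem can be sketched in a few lines of Python. The data below are made up purely for illustration (they are not the mortgage data), and the variable names are hypothetical. The sketch fits an LPM by OLS and then passes the same linear index through the logistic CDF. This only shows the effect of the transformation; it is not a fitted logit model.

```python
import math

# Made-up illustrative data (an assumption, not from the lecture):
# one regressor x and a binary outcome y.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

# LPM: ordinary least squares with one regressor (closed-form solution).
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Fitted "probabilities" from the linear model.
lpm_fitted = [intercept + slope * xi for xi in x]


def logistic_cdf(z):
    """Cumulative standard logistic distribution; always strictly in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Passing the same linear index through the logistic CDF bounds it in (0, 1).
bounded = [logistic_cdf(v) for v in lpm_fitted]

print(f"LPM fitted value at x=0: {lpm_fitted[0]:.3f}")
print(f"LPM fitted value at x=9: {lpm_fitted[-1]:.3f}")
print(f"Logistic transform range: {min(bounded):.3f} to {max(bounded):.3f}")
```

With these made-up data the LPM fitted value is negative at x = 0 and above 1 at x = 9, while every transformed value stays inside the (0,1) interval.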
Example: Probability of mortgage denial given the ratio
of debt payments to income (P/I ratio) on same data
Nonlinear Probability Models
• Fitted probabilities must not be less than 0 or greater than 1 for a model to be credible.
• To address this problem we consider nonlinear probability models of a binary variable
taking values 1 or 0. We know that the expected value of a binary variable Y is
• E [Y] = 1* Pr(Y = 1) + 0 * Pr(Y = 0) = Pr(Y = 1)
• Pr(Y_i = 1) = F(Z), where F is a cumulative distribution function
Example
• Suppose we have only 1 regressor and Z = -2 + 3x_i
• We want to know the probability that y_i = 1 when x_i = 0.4
• Z = -2 + 3(0.4) = -0.8
• Pr(Y_i = 1) = P(Z ≤ -0.8) = F(-0.8)
Nonlinear Probability Models
2. Probit Model
• Probit regression models the probability that y_i = 1 using the cumulative standard normal distribution.
Example
• Suppose we have only 1 regressor and Z = -2 + 3x_i
• We want to know the probability that y_i = 1 when x_i = 0.4
• Z = -2 + 3(0.4) = -0.8
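The example can be evaluated numerically. The sketch below assumes the index is Z = -2 + 3x with x = 0.4 (the signs of the coefficients are an assumption here) and computes the probit probability from the cumulative standard normal distribution and, for comparison, the logit probability from the cumulative standard logistic distribution. The function names are illustrative only.

```python
import math


def standard_normal_cdf(z):
    """Cumulative standard normal distribution (used by probit)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def standard_logistic_cdf(z):
    """Cumulative standard logistic distribution (used by logit)."""
    return 1.0 / (1.0 + math.exp(-z))


# Assumed coefficients for the single-regressor example: Z = -2 + 3x.
b0, b1 = -2.0, 3.0
x = 0.4

z = b0 + b1 * x                      # linear index, about -0.8
probit_p = standard_normal_cdf(z)    # Pr(y = 1) under probit
logit_p = standard_logistic_cdf(z)   # Pr(y = 1) under logit

print(f"Z = {z:.1f}")
print(f"Probit: Pr(y = 1) = {probit_p:.4f}")
print(f"Logit:  Pr(y = 1) = {logit_p:.4f}")
```

Under these assumptions the probit probability is about 0.21 and the logit probability about 0.31. The same index gives different probabilities because the two distributions have different spreads, which is why logit and probit coefficients are not directly comparable.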
Research Question.
What factors affect pass rate at ‘O’ Level and how effective are these factors in influencing pupils’ pass rate?
The following data were collected from 15 770 students across Zimbabwe who wrote the November 2018 ZIMSEC
‘O’ Level examinations.
Dependent variable: Pass 5 or more ‘O’ Level subjects or Fail (binary variable = two levels).
Independent variables: Ethnic group, Gender, Parents’ Social Economic Class, Parent status, Tuition paid
for private classes.
A dependent variable like this one, with only two levels, is called a limited dependent variable, and logistic
regression or probit regression can be used.
Where,
Y = Categories of performance of a student.
X = Characteristics of parents of student with factors SEC, and parent status.
Z = Characteristics of a child with factors gender and attending paid fees extra lessons.
= Error term.
Variable definitions
Variable        Definition
Pass OL         A proxy for PASSING 5 ‘O’ LEVELS SUBJECTS by the student; a binary variable.
                Takes a value of 1 if a student passed 5 ‘O’ Levels and above and 0 otherwise.
Gender          A proxy for “GENDER OF STUDENT” measured in binary form. Takes a value of 1 if male
                and 0 if female.
Parent status   A proxy for “PARENT STATUS OF CHILD’S HOME” denoted as 1 if the child is from a
                single parent household and 0 if coming from a household with both parents.
SEC             A proxy for the SOCIAL ECONOMIC STATUS OF PARENTS; a nominal variable. Takes
                values: 1 = Higher managerial & Executives, 2 = Lower managerial & professional,
                3 = Intermediate occupations, 4 = Small employers & own account,
                5 = Lower supervisory & technical, 6 = Semi-routine occupations,
                7 = Routine occupations and 8 = Never worked/long term unemployed.
Tuition         A proxy for “PARENT PAYING FEES FOR STUDENT TO ATTEND EXTRA LESSONS”
                denoted as 1 if the child attends extra lessons and 0 otherwise.
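Because SEC is nominal, it enters a regression as a set of dummy variables with one category left out as the reference, which is how the dummy variable trap is avoided. The sketch below shows one possible coding in Python, assuming category 1 (Higher managerial & Executives) is the reference; the function and constant names are hypothetical.

```python
# SEC codes as listed in the variable definitions.
SEC_LABELS = {
    1: "Higher managerial & Executives",
    2: "Lower managerial & professional",
    3: "Intermediate occupations",
    4: "Small employers & own account",
    5: "Lower supervisory & technical",
    6: "Semi-routine occupations",
    7: "Routine occupations",
    8: "Never worked/long term unemployed",
}

# Assumption: category 1 is the omitted reference category.
REFERENCE = 1


def sec_dummies(sec_code):
    """Return 7 indicator values, one for each SEC category except the reference."""
    if sec_code not in SEC_LABELS:
        raise ValueError(f"unknown SEC code: {sec_code}")
    return [
        1 if sec_code == level else 0
        for level in sorted(SEC_LABELS)
        if level != REFERENCE
    ]


# A student in the reference category has all dummies equal to 0.
print(sec_dummies(1))
# A student in "Intermediate occupations" has only that dummy equal to 1.
print(sec_dummies(3))
```

Eight categories give seven dummies. Including all eight alongside a constant would make the regressors perfectly collinear.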
Presentation and discussion of quantitative results
Presenting and interpreting results
• The results section of your project/dissertation/thesis must cover the following:
1. Descriptive statistics
2. Assumptions diagnosis
3. Results
4. Discussion of results
• Before presenting your main results, it is important first to understand your data. So,
present simple descriptive statistics of the data as an indicator of the expected results as
well as of the nature and scope of your data.
The results report the extent to which different parent and child characteristics affect the
pass rate of students (Table 3). They were estimated using binary logistic regression
since the dependent variable, passing ‘O’ Level, is dichotomous, i.e. the student either
passes or fails. The results follow a stepwise process, with more independent variables
added to the model from model (1) to model (3). This
procedure was repeated several times to select the combination of independent variables
that best suits the data and that also produces a model with a high level of robustness.
Model (1) captures the constant and only three independent variables, namely the socio-
economic status of the parents, household ownership and gender of the child. In Model (2)
the ethnic group of the child is added, while in Model (3) the last independent variable,
whether or not the child attended private extra lessons, is added. In each model,
three statistics are presented: the regression beta coefficients (B), the
standard errors (SE) and the odds ratios (OR) for each category of the
independent variables. The level of statistical significance of each coefficient in
the model is denoted by asterisks, as explained in the note at the bottom of Table 3.
Table 3: Factors affecting students’ pass rate
                                   Model 1                  Model 2                  Model 3
Variables                     B         SE    OR       B         SE    OR       B         SE    OR
Constant                      1.02***   0.06  2.76     0.11      0.71  1.12     0.04      0.71  1.04
Social Economic Class
Lower managerial             -0.75***   0.07  0.52    -0.65***   0.07  0.52    -0.64***   0.07  0.53
Intermediate Occupations     -1.05***   0.09  0.35    -1.06***   0.09  0.35    -1.04***   0.09  0.36
Small employers              -1.17***   0.08  0.31    -1.21***   0.08  0.30    -1.19***   0.08  0.31
Lower supervisory            -1.58***   0.08  0.21    -1.61***   0.08  0.20    -1.56***   0.08  0.21
Semi-routine occupations     -1.70***   0.08  0.18    -1.74***   0.08  0.18    -1.70***   0.08  0.18
Routine occupations          -1.97***   0.09  0.14    -2.02***   0.09  0.13    -1.97***   0.09  0.14
Unemployment                 -2.11***   0.10  0.12    -2.21***   0.10  0.11    -2.15***   0.10  0.12
Nagelkerke R²                 15.7%                    16.7%                    17.2%
                              p<0.001                  p<0.001                  p<0.001
Hosmer & Lemeshow
goodness of fit test          P=0.742                  P=0.816                  P=0.769
Classification accuracy       64.6%                    64.8%                    65.3%
* = 10% significance, ** = 5% significance and *** = 1% significance
4.2.1 Model diagnostic for pass rate of students
To test the robustness of the model, several diagnostics were used. First, the -2 Log Likelihood
falls from model (1) through model (3) (15532.485, 15427.278 and 15377.095
respectively), and all three models are significant according to the corresponding Omnibus
chi-square tests (χ²(9, N=15770) = 1549.878, p<0.001; χ²(17, N=15770) = 1655.085, p<0.001;
and χ²(18, N=15770) = 1705.267, p<0.001). This means that model (2) fits the data better than
model (1) and model (3) fits the data better than model (2), and thus model (3) is
the best model to adopt. Second, the Hosmer and Lemeshow goodness of fit test is not
significant at any of the three model levels from model (1) to model (3) (p=0.742, p=0.816
and p=0.769 respectively), so the null hypothesis that the model fits the data is not rejected.
Third, the Nagelkerke R² improves from 15.7% to 16.7% to 17.22% as one progresses from
model (1) to model (3), which
puts model (3) as the better model in terms of the extent to which variation in the dependent
variable is explained by the independent variables in the model. Finally, the classification
accuracy increases from 64.6% to 64.8% to 65.35% as one moves from model (1) to model
(3), again putting model (3) as the model that classifies cases most accurately. Given the results of the
above diagnostic tests, discussion of the final results is based on model (3).
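The comparison between nested models can be made explicit with a likelihood ratio test: the drop in -2 Log Likelihood between two nested models is chi-square distributed, with degrees of freedom equal to the number of added parameters. The sketch below uses only the figures reported above. The helper `chi2_sf` is a hypothetical stand-in for a statistics library function and, as an assumption made to keep the example free of dependencies, handles only 1 or an even number of degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (df = 1 or even df only)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df % 2 == 0:
        half = x / 2.0
        terms = (half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * sum(terms)
    raise ValueError("this sketch only handles df = 1 or even df")


# -2 Log Likelihood and Omnibus test degrees of freedom for models (1) to (3).
neg2ll = [15532.485, 15427.278, 15377.095]
model_df = [9, 17, 18]

# Likelihood ratio statistic = drop in -2LL between successive nested models.
lr_stats = [neg2ll[i] - neg2ll[i + 1] for i in range(2)]
lr_dfs = [model_df[i + 1] - model_df[i] for i in range(2)]
p_values = [chi2_sf(s, d) for s, d in zip(lr_stats, lr_dfs)]

for i, (s, d, p) in enumerate(zip(lr_stats, lr_dfs, p_values), start=1):
    print(f"Model ({i}) vs model ({i + 1}): LR = {s:.3f}, df = {d}, p < 0.001: {p < 0.001}")
```

The two statistics, about 105.2 on 8 degrees of freedom and about 50.2 on 1 degree of freedom, match the differences between the successive Omnibus chi-square values, and both are significant at the 0.1% level.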
4.2.2 Effect of Socio-economic class of parents on pass rate of students
With reference to the socio-economic class variables, the higher managerial category is the
reference category, with a base odds ratio of 1. The odds ratio for each of the other
socio-economic classes is compared against this reference category one at a time,
holding the other variables constant. The odds ratios indicate that the odds of passing for
students whose parents are in lower managerial positions are 47% lower (0.53 - 1 = -0.47) than for
those whose parents are in higher managerial positions. Similarly, the odds are 64% lower for
students whose parents are in intermediate occupations compared to students whose parents are in
higher managerial positions. The trend in the results is that the odds of passing decrease as the
socio-economic class of the child’s family falls. Since all the coefficients are statistically
significant (p<0.01), students from families in higher socio-economic classes are more likely to
pass than students from lower socio-economic classes.
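The arithmetic behind these statements can be sketched briefly. An odds ratio is the exponential of the logit coefficient, OR = exp(B), and the percentage change in the odds relative to the reference category is (OR - 1) × 100. The function names below are illustrative, and the inputs are the Model 3 values reported in Table 3.

```python
import math


def odds_ratio(b):
    """Odds ratio implied by a logit coefficient B."""
    return math.exp(b)


def pct_change_in_odds(odds_ratio_value):
    """Percentage change in the odds relative to the reference category."""
    return (odds_ratio_value - 1.0) * 100.0


# Model 3, lower managerial: B = -0.64, reported OR = 0.53.
print(f"exp(-0.64) = {odds_ratio(-0.64):.2f}")
print(f"Change in odds for OR 0.53: {pct_change_in_odds(0.53):.0f}%")

# Model 3, intermediate occupations: reported OR = 0.36.
print(f"Change in odds for OR 0.36: {pct_change_in_odds(0.36):.0f}%")
```

Recomputing an odds ratio from a rounded B can differ from the reported value in the second decimal place, so the percentage changes here use the reported odds ratios. These figures describe changes in the odds of passing, not in the pass rate itself.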
4.2.3 Effect of parent status and gender of child on pass rate of students
With reference to parent status, a child with both parents was used as the reference
category, with a base odds ratio of 1. The odds ratios indicate that the odds of passing for students who live
with a single parent are 36% lower (0.64 - 1 = -0.36) than for those with both parents, and the result
is statistically significant (p<0.01). Female was the reference category for gender, and the results show that
the odds of passing for male students are 1.49 times those of female students, i.e. the odds are 49% higher
for boys than for girls. Finally, students whose parents do not pay for extra
lessons were used as the reference category. In this case, the odds of passing for students who attend extra
classes paid for by parents are 1.69 times those of students who do not attend extra lessons
at all. The results indicate that parent status, the gender of the child and extra lessons all have an effect on
performance, and these results are significant in all cases.
4.5 Discussion of Results
1. Summarise what you found out, i.e. what is the effect on the pass rate of a child of the social
economic class, parent status and ability of parents to pay for extra classes, and of gender?
2. For each of the results, always compare what you get in your results with what is said in the
literature, clearly stating the sources of that literature.
3. Also show how these findings support or disagree with the main theories you have identified in your
literature.
4. If the results disagree with common theories, try to find reasons why this is the case in that
population: what is unique about the characteristics of the population that makes it different from
other populations used in other areas or studies?
5. Based on these results and what is said in the literature, you must recommend appropriate policy
interventions, e.g. in this case there is clearly a problem with single parent households, low social
economic class families and female students. Any policy intervention must therefore aim
at addressing the shortcomings associated with each of these three groups.