
B.Com Finance

Financial econometrics and data analysis

Limited Dependent Variables Models

Lectured by

Edson Mbedzi

Department of Finance & Fiscal Sciences

Session outline
At the end of this session, students must be able to:

• Define limited dependent variables.

• Formulate limited dependent variable models for a given set of economic problems.

• Understand and differentiate the various limited dependent variable models: LPM, Logit and Probit.

• Test the assumptions of limited dependent variable models.

• Apply limited dependent variable models using selected computer software packages and
interpret the results for a given set of data.

• Present the results of limited dependent variable models academically.


Limited dependent variables models
• There are times when we use dummy variables to capture numerically the information in qualitative variables, e.g. day-of-the-week effects, gender, types of banks, credit ratings, dealing methods etc.

• When a dummy variable is used as an explanatory (independent) variable in a regression model, it causes few problems (apart from the dummy variable trap).

• However, there are many cases in finance and economics where the explained (dependent) variable is qualitative.

• The qualitative information is then coded as a categorical variable (a dummy variable, or a variable with more than two categories). Such a variable is referred to as a limited dependent variable and needs to be treated differently.

• The term refers to any problem where the values taken by the dependent variable are limited to certain integers, e.g. 0, 1, 2, 3, 4 for types of financing options, or where the situation is binary (only 0 or 1, e.g. loan approval or denial).

• There are several real-life examples of this in finance, e.g.
1. Why firms choose to list their shares on the NASDAQ rather than on the NYSE (a two-level, binary dependent variable).
2. What factors affect whether countries default on their foreign debt or not (a two-level, binary dependent variable).
3. Why some firms choose to issue new stock while others issue bonds or use other debt to finance expansions (a three-level, multinomial dependent variable).

In all the above cases, the dependent variable is not numeric but categorical, with varying numbers of categories.
LPM, Logit and Probit Models
• There are three models that can be used when we have limited dependent variables.
1. Linear Probability Model (LPM) (uses a linear multiple regression model estimated by OLS)
2. Nonlinear Probability Models
• Probit (uses the cumulative standard normal distribution)
• Logit (uses the cumulative standard logistic distribution)
• Both the logit and probit approaches overcome a limitation of the LPM: it can produce estimated probabilities that are negative or greater than one. The logit and probit functions instead transform the regression model so that the fitted values are bounded within the (0, 1) interval.
[Figure: Example – probability of mortgage denial given the ratio of debt payments to income (P/I ratio), on the same data]
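To see the bounding problem numerically, the short Python sketch below pushes one linear index through both an LPM and a logit link. The coefficients `BETA0` and `BETA1` are hypothetical values chosen only for illustration, not estimates from the mortgage data; in practice the LPM and logit coefficients would be estimated separately and would differ in scale.

```python
import math

# Hypothetical coefficients for illustration only (not estimated from any data).
BETA0, BETA1 = -0.1, 0.6


def lpm_prob(x: float) -> float:
    """LPM: the fitted 'probability' is the linear index itself, so it is unbounded."""
    return BETA0 + BETA1 * x


def logit_prob(x: float) -> float:
    """Logit: the same index is passed through the standard logistic CDF."""
    z = BETA0 + BETA1 * x
    return 1.0 / (1.0 + math.exp(-z))


for x in (0.0, 1.0, 2.0, 3.0):
    print(f"x = {x:.1f}   LPM = {lpm_prob(x):+.3f}   logit = {logit_prob(x):.3f}")
```

With these made-up values the LPM gives -0.100 at x = 0 and +1.700 at x = 3, both impossible probabilities, whereas the logit values stay strictly between 0 and 1.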
Nonlinear Probability Models
• A model can only produce valid probabilities if its fitted values are never less than 0 or greater than 1.
• To address this problem, we consider nonlinear probability models of a binary variable taking the values 1 or 0. We know that the expected value of a binary variable Y is
• E[Y] = 1 × Pr(Y = 1) + 0 × Pr(Y = 0) = Pr(Y = 1)

$$\Pr(Y_i = 1) = F(Z_i)$$

with $Z_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + \varepsilon_i$

and $0 \le F(Z_i) \le 1$
• We will consider two nonlinear functions.
1. The logit model: let's say Y has two levels, 0 or 1. Logistic regression then models the probability that $y_i = 1$ using the logistic function

$$F(Z_i) = \frac{1}{1 + e^{-Z_i}}$$

where $e$ is the exponential constant (the base of the natural logarithm) and F, which can be evaluated at any value of the variable Z, is the cumulative standard logistic distribution function under the logit approach.

The logistic model would be

$$P_i = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + \varepsilon_i)}}$$

where $P_i$ is the probability that $y_i = 1$. Logit uses these probability values in the regression.
Nonlinear Probability Models
• Logistic regression models the probability that $y_i = 1$

• using the cumulative standard logistic distribution function

$$F(Z_i) = \frac{1}{1 + e^{-Z_i}}$$

• evaluated at $Z_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + \varepsilon_i$.

• Since $F(z) = P(Z \le z)$ is a cumulative probability, the predicted probabilities of the logit model lie between 0 and 1.

Example
• Suppose we have only one regressor and $Z_i = -2 + 3x_i$.
• We want to know the probability that $y_i = 1$ when $x_i = 0.4$:

$$Z = -2 + 3(0.4) = -0.8$$

$$\Pr(Y_i = 1) = P(Z \le -0.8) = F(-0.8) = \frac{1}{1 + e^{0.8}} \approx 0.31$$
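The worked example can be checked with a few lines of Python. This is a minimal sketch: `logistic_cdf` and `index` are hypothetical helper names, and the index function simply restates the example's $Z = -2 + 3x$.

```python
import math


def logistic_cdf(z: float) -> float:
    """Cumulative standard logistic distribution: F(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def index(x: float) -> float:
    """Index function from the example: Z = -2 + 3x."""
    return -2.0 + 3.0 * x


z = index(0.4)        # -2 + 3(0.4) = -0.8
p = logistic_cdf(z)   # Pr(Y = 1) = F(-0.8)
print(f"Z = {z:.1f}, Pr(Y = 1) = {p:.4f}")  # Z = -0.8, Pr(Y = 1) = 0.3100
```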

Nonlinear Probability Models
2. Probit Model
• Probit regression models the probability that $y_i = 1$

• using the cumulative standard normal distribution function

$$F(Z_i) = \Phi(Z_i)$$

• evaluated at $Z_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + \varepsilon_i$.

• Since $\Phi(z) = P(Z \le z)$, the predicted probabilities of the probit model lie between 0 and 1.

Example
• Suppose we have only one regressor and $Z_i = -2 + 3x_i$.
• We want to know the probability that $y_i = 1$ when $x_i = 0.4$:

$$Z = -2 + 3(0.4) = -0.8$$

$$\Pr(Y_i = 1) = P(Z \le -0.8) = \Phi(-0.8) \approx 0.21$$
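The probit version of the example only swaps the link function. A minimal Python sketch, assuming the standard normal CDF is written through the error function, $\Phi(z) = \tfrac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$; `std_normal_cdf` is a hypothetical helper name.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Cumulative standard normal distribution Phi(z), written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


z = -2.0 + 3.0 * 0.4      # index from the example: Z = -0.8
p = std_normal_cdf(z)     # Pr(Y = 1) = Phi(-0.8)
print(f"Z = {z:.1f}, Pr(Y = 1) = {p:.4f}")  # Z = -0.8, Pr(Y = 1) = 0.2119
```

Python 3.8+ also offers `statistics.NormalDist().cdf(z)`, which returns the same value. For the same index the probit probability (about 0.21) is lower than the logistic one, F(-0.8) ≈ 0.31, because the standard logistic distribution has variance π²/3 rather than 1; this is also why, when both models are estimated on the same data, logit coefficients tend to be larger in absolute value than probit coefficients.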



Logistic regression for categorical variables
Types of logistic regressions
1. Binary logistic regression – used when the categorical dependent variable is nominal and has only two levels.
• E.g. you want to find out how getting employed is influenced by a number of factors (education, experience, background, gender, ethnicity etc.).
• Getting employed is a binary dependent variable – either one is employed or not employed.
2. Ordinal logistic regression – used when the dependent variable is ranked or ordered.
• E.g. you want to test how the level of satisfaction of a bank's customers is influenced by a number of factors (staff gender, number of visits, types of bank services etc.).
• Satisfaction may have four ranks – 1 Very highly satisfied, 2 Highly satisfied, 3 Lowly satisfied, 4 Very lowly satisfied.
3. Multinomial logistic regression – used when the categorical dependent variable is nominal with more than two levels.
• E.g. which subject major is chosen by postgraduate students, and what factors influence that choice (grade, mathematical aptitude, IQ, parents' profile etc.)?
• Subject major has three levels – Sciences, Arts or Commerce – and here there is no order in the categories of the dependent variable.
Example of binary logistic regression
Research Problem:
There is wide variation in the pass rate of ZIMSEC 'O' Level examinations among students from the different regions of Zimbabwe. Passing or not passing these exams determines whether students can proceed with further education, which in turn influences the local development of each region. The Government wants to understand what accounts for these variations and what policy interventions it can take to reduce them and the associated inequalities at national level.

Research Question.
What factors affect pass rate at ‘O’ Level and how effective are these factors in influencing pupils’ pass rate?

The following data were collected from 15 770 students who wrote 'O' Level examinations across Zimbabwe in the November 2018 ZIMSEC examinations.
Dependent variable: Pass 5 or more 'O' Level subjects or Fail (binary variable = two levels).
Independent variables: Ethnic group, Gender, Parents' Social Economic Class, Parent status, Tuition paid for private classes.

A dependent variable like this one, with only two levels, is called a limited dependent variable, and logistic regression or probit regression can be used.

Let's do it, but first see the next two slides!


Methods and Data

1. Measuring pass rate of students

a) Binary logistic model

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + \varepsilon_i \qquad (1)$$

$$Y_i = \begin{cases} 1 & \text{if a student passed} \\ 0 & \text{if a student failed} \end{cases}$$

Where,
Y = category of performance of a student.
X = characteristics of the student's parents (SEC and parent status).
Z = characteristics of the child (gender and attending extra lessons paid for by parents).
ε = error term.
Variables definitions
Variable | Definition
Pass OL | A proxy for PASSING 5 'O' LEVEL SUBJECTS by the student; a binary variable. Takes a value of 1 if a student passed 5 or more 'O' Level subjects and 0 otherwise.
Gender | A proxy for "GENDER OF STUDENT", measured in binary form. Takes a value of 1 if male and 0 if female.
Parent status | A proxy for "PARENT STATUS OF CHILD'S HOME", denoted as 1 if the child is from a single-parent household and 0 if the child comes from a household with both parents.
SEC | A proxy for the SOCIAL ECONOMIC STATUS OF PARENTS; a nominal variable. Takes values: 1 = Higher managerial & executives, 2 = Lower managerial & professional, 3 = Intermediate occupations, 4 = Small employers & own account, 5 = Lower supervisory & technical, 6 = Semi-routine occupations, 7 = Routine occupations and 8 = Never worked/long-term unemployed.
Tuition | A proxy for "PARENT PAYING FEES FOR STUDENT TO ATTEND EXTRA LESSONS", denoted as 1 if the child attends extra lessons and 0 otherwise.
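Because SEC is nominal, its codes 1 to 8 cannot enter the logit as a single number; each category other than the base enters as its own dummy, which is why Table 3 reports seven SEC coefficients against the "Higher managerial" base and how the dummy variable trap is avoided. A minimal Python sketch of that coding, assuming code 1 is the base; `sec_dummies` is a hypothetical helper, not part of any package.

```python
# SEC codes from the variable definitions; code 1 (Higher managerial & executives)
# is treated as the base category, matching the base used in Table 3.
SEC_LABELS = {
    1: "Higher managerial & executives",
    2: "Lower managerial & professional",
    3: "Intermediate occupations",
    4: "Small employers & own account",
    5: "Lower supervisory & technical",
    6: "Semi-routine occupations",
    7: "Routine occupations",
    8: "Never worked/long term unemployed",
}
BASE_CODE = 1
DUMMY_CODES = [code for code in sorted(SEC_LABELS) if code != BASE_CODE]


def sec_dummies(code: int) -> list[int]:
    """Return one 0/1 indicator per non-base SEC category (7 in total)."""
    if code not in SEC_LABELS:
        raise ValueError(f"unknown SEC code: {code}")
    return [1 if code == k else 0 for k in DUMMY_CODES]


print(sec_dummies(1))  # [0, 0, 0, 0, 0, 0, 0]  -> base category
print(sec_dummies(3))  # [0, 1, 0, 0, 0, 0, 0]  -> Intermediate occupations
```

Dropping the base column is the design choice that matters: including all eight dummies alongside the constant would make them perfectly collinear.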
Presentation and discussion of quantitative results
Presenting and interpreting results
• The results section of your project/dissertation/thesis must cover the following:
1. Descriptive statistics
2. Assumptions diagnosis
3. Results
4. Discussion of results
• Before presenting your main results, it is important to understand your data. So, present simple descriptive statistics of the data as an indicator of the expected results, as well as of the nature and scope of your data.

• As you discuss the descriptive statistics, comment on the following:
1. Common trends in the data
2. Categories with the most and least influence on the dependent variable
3. The relationship between these results and theory
Descriptive statistics for pass rate of students
4.0 Introduction
Before any analysis of the impact of various factors on the pass rate of students, it is important to understand the pattern of the sample data in descriptive terms and the extent to which it conforms to the literature. For that purpose, descriptive statistics on the nature and scope of the students' pass rate and of each of the independent factors are presented in the following sections.

4.1 Descriptive statistics on pass rate of students

An assessment of the students' pass rate by the different parent and child characteristics was derived through cross-tabulations. Students from different socio-economic classes, genders and family parent statuses have different pass rates (Table 1). At a glance, students from affluent families perform better than students from disadvantaged families, boys perform better than girls, children with both parents do better than those with single parents, and those attending extra lessons do better than those who do not. Overall, more students failed to pass 'O' Level (53.3%) than passed (46.7%). However, the pass rate varies across socio-economic classes and is directly related to the socio-economic class of the child's parents. These variations are not surprising, because a child's upbringing affects their performance at school (Nkuah, Tanyeh and Kala 2013).
Table 1: Pass rate of students by parent and child factors (cells are percentages of the total sample)
Factor | Category | Passed 5 'O' Levels: No | Passed 5 'O' Levels: Yes | Total
Parent socio-economic class (SEC) | Higher managerial & executive | 3.0% | 9.3% | 12.3%
 | Lower managerial & professional | 9.5% | 14.7% | 24.2%
 | Intermediate occupations | 3.8% | 3.5% | 7.3%
 | Small employers & own account | 6.6% | 6.4% | 13.0%
 | Lower supervisory & technical | 7.0% | 4.4% | 11.4%
 | Semi-routine hand occupations | 8.4% | 4.2% | 12.7%
 | Routine hand occupations | 7.7% | 3.2% | 10.9%
 | Unemployed | 6.3% | 2.0% | 8.3%
Gender | Male | 29.0% | 21.7% | 50.8%
 | Female | 24.3% | 24.9% | 49.2%
Parent status | Single parent | 16.9% | 8.0% | 24.9%
 | Both parents | 36.8% | 38.3% | 75.1%
Extra lessons | Did not attend | 4.8% | 7.9% | 12.7%
 | Attended | 48.8% | 38.5% | 87.3%
Total | | 53.3% | 46.7% | 100.0%
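The cells of Table 1 are shares of the whole sample, so categories of different sizes are hard to compare directly. A minimal Python sketch that converts the SEC rows into within-category pass rates (passed share divided by the category's row total); `pass_rate_within` is a hypothetical helper, and the row total is taken as No + Yes rather than the printed Total column, which can differ by 0.1 because of rounding.

```python
# Shares of the whole sample (%) from Table 1: (did not pass, passed)
SEC_SHARES = {
    "Higher managerial & executive": (3.0, 9.3),
    "Lower managerial & professional": (9.5, 14.7),
    "Intermediate occupations": (3.8, 3.5),
    "Small employers & own account": (6.6, 6.4),
    "Lower supervisory & technical": (7.0, 4.4),
    "Semi-routine hand occupations": (8.4, 4.2),
    "Routine hand occupations": (7.7, 3.2),
    "Unemployed": (6.3, 2.0),
}


def pass_rate_within(no: float, yes: float) -> float:
    """Pass rate inside one category: passed share / (not passed + passed) share."""
    return yes / (no + yes)


for label, (no, yes) in SEC_SHARES.items():
    print(f"{label:<35} {100 * pass_rate_within(no, yes):5.1f}%")
```

Read this way, about 76% of students from higher managerial and executive families passed, against about 24% of students whose parents are unemployed.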
4.2 Effects of child and parent factors on students' pass rate

Table 3 reports the extent to which different parent and child characteristics affect the pass rate of students. These effects were estimated using a binary logistic regression, since the dependent variable, passing 'O' Level, is dichotomous, i.e. the student either passes or fails. The results followed a stepwise process that added more independent variables to the model from model (1) to model (3). This procedure was repeated several times to select the combination of independent variables that best suits the data and produces a robust model.

Model (1) contains the constant and only three independent variables: the socio-economic status of the parents, the parent status of the household (single or both parents) and the gender of the child. In Model (2) the ethnic group of the child is added, while in Model (3) the last independent variable, whether the child attended private extra lessons, is added. For each model, three statistics are presented: the regression beta coefficients (B), the standard errors (SE) and the odds ratios (OR) for each category of the independent variables. The level of statistical significance of each coefficient is denoted by asterisks, as explained at the bottom of Table 3.
Table 3: Factors affecting students' pass rate
Variables | Model 1: B | SE | OR | Model 2: B | SE | OR | Model 3: B | SE | OR
Constant | 1.02*** | 0.06 | 2.76 | 0.11 | 0.71 | 1.12 | 0.04 | 0.71 | 1.04
Social economic class (base = High managerial) | | | | | | | | |
Lower managerial | -0.75*** | 0.07 | 0.52 | -0.65*** | 0.07 | 0.52 | -0.64*** | 0.07 | 0.53
Intermediate occupations | -1.05*** | 0.09 | 0.35 | -1.06*** | 0.09 | 0.35 | -1.04*** | 0.09 | 0.36
Small employers | -1.17*** | 0.08 | 0.31 | -1.21*** | 0.08 | 0.30 | -1.19*** | 0.08 | 0.31
Lower supervisory | -1.58*** | 0.08 | 0.21 | -1.61*** | 0.08 | 0.20 | -1.56*** | 0.08 | 0.21
Semi-routine occupations | -1.70*** | 0.08 | 0.18 | -1.74*** | 0.08 | 0.18 | -1.70*** | 0.08 | 0.18
Routine occupations | -1.97*** | 0.09 | 0.14 | -2.02*** | 0.09 | 0.13 | -1.97*** | 0.09 | 0.14
Unemployment | -2.11*** | 0.10 | 0.12 | -2.21*** | 0.10 | 0.11 | -2.15*** | 0.10 | 0.12
Single parent household: Yes (base = No) | -0.53*** | 0.05 | 0.59 | -0.46*** | 0.05 | 0.63 | -0.53*** | 0.05 | 0.59
Gender of child: Male (base = Female) | | | | 0.40*** | 0.04 | 1.50 | 0.39*** | 0.04 | 1.48
Private class tuition paid: Yes (base = No) | | | | | | | 0.43*** | 0.06 | 1.54

Model fit | Model 1 | Model 2 | Model 3
χ² | 106.238 (df = 1) | 105.207 (df = 8) | 50.183 (df = 18)
-2LL test | 15532.485 | 15427.278 | 15377.095
Omnibus test | p < 0.001 | p < 0.001 | p < 0.001
Nagelkerke R² | 15.7% | 16.7% | 17.2%
Hosmer & Lemeshow goodness-of-fit test | p = 0.742 | p = 0.816 | p = 0.769
Classification accuracy | 64.6% | 64.8% | 65.3%
* = 10% significance, ** = 5% significance and *** = 1% significance
4.2.1 Model diagnostics for pass rate of students
To test the robustness of the models, several diagnostic tools were used. First, the −2 Log Likelihood falls from model (1) through model (3) (15532.485, 15427.278 and 15377.095 respectively), and all three models are significant according to the corresponding Omnibus chi-square tests (χ²(9, N = 15770) = 1549.878, p < 0.001; χ²(17, N = 15770) = 1655.085, p < 0.001; and χ²(18, N = 15770) = 1705.267, p < 0.001). Each drop in −2LL means that model (2) fits the data better than model (1), and model (3) in turn fits better than model (2), so model (3) is the best model to adopt. Second, the Hosmer and Lemeshow goodness-of-fit test is not significant for any of the three models (p = 0.742, p = 0.816 and p = 0.769 respectively), so we fail to reject the null hypothesis that the model fits the data well. Third, the Nagelkerke R² improves from 15.7% to 16.7% and 17.2% as one moves from model (1) to model (3), which makes model (3) the better model in terms of the extent to which variation in the dependent variable is explained by the independent variables (as measured by this pseudo-R²). Fourth, classification accuracy increases from 64.6% to 64.8% and 65.3% from model (1) to model (3), again making model (3) the most accurately classified model. Given these diagnostic tests, the discussion of final results is based on model (3).
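The improvement in fit between nested models can be checked with a likelihood-ratio (LR) test: under the null that the added coefficients are zero, the drop in −2LL is approximately χ²-distributed with degrees of freedom equal to the number of added parameters. The Python sketch below assumes the model degrees of freedom reported with the Omnibus tests in the text (9, 17 and 18) and uses closed-form χ² tail probabilities so that it needs only the standard library; `chi2_sf` and `lr_test` are hypothetical helper names, and in practice `scipy.stats.chi2.sf` computes the same tail probability.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(chi-square with df degrees of freedom > x), integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{j=1}^{(df-1)/2} (x/2)^(j-1/2) / Gamma(j+1/2)
    total = math.erfc(math.sqrt(half))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def lr_test(m2ll_small: float, m2ll_large: float, df_small: int, df_large: int):
    """Likelihood-ratio test of a smaller model nested in a larger one."""
    stat = m2ll_small - m2ll_large   # drop in -2 log likelihood
    dof = df_large - df_small        # number of added parameters
    return stat, dof, chi2_sf(stat, dof)


# (-2LL, model df) for models (1)-(3), as reported in the text
MODELS = [(15532.485, 9), (15427.278, 17), (15377.095, 18)]

for (m2ll_a, df_a), (m2ll_b, df_b) in zip(MODELS, MODELS[1:]):
    stat, dof, p = lr_test(m2ll_a, m2ll_b, df_a, df_b)
    print(f"LR = {stat:.3f}, df = {dof}, p = {p:.1e}")
```

The statistics it prints, 105.207 and 50.183, match the χ² values shown in Table 3 for models (2) and (3), and both p-values are far below 0.001.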
4.2.2 Effect of socio-economic class of parents on pass rate of students
For the socio-economic class variables, the higher managerial category is the reference category, with a base odds ratio of 1. The odds ratio for each of the other socio-economic classes is compared with this reference category, holding the other explanatory variables constant. The odds ratios indicate that the odds of passing for students whose parents are in lower managerial positions are 47% lower (0.53 − 1 = −0.47) than for those whose parents are in higher managerial positions. Similarly, the odds of passing are 64% lower (0.36 − 1) for students whose parents are in intermediate occupations than for students whose parents are in higher managerial positions. The trend in the results is that the odds of passing decline as the socio-economic class of the child's family falls. Since all the coefficients are statistically significant (p < 0.01), students from families in higher socio-economic classes are more likely to pass than students from lower socio-economic classes.
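The odds ratios in Table 3 are the exponentiated logit coefficients, OR = exp(B), and the percentages quoted above are (OR − 1) × 100. A minimal Python sketch using the Model 3 SEC coefficients as printed; `odds_ratio` and `pct_change_in_odds` are hypothetical helper names. Because B is rounded to two decimals in the table, exp(B) can differ from the printed OR in the second decimal (Intermediate occupations gives 0.35 rather than 0.36, and Small employers 0.30 rather than 0.31).

```python
import math

# Model 3 SEC coefficients (B) from Table 3; base category = Higher managerial.
MODEL3_SEC_B = {
    "Lower managerial": -0.64,
    "Intermediate occupations": -1.04,
    "Small employers": -1.19,
    "Lower supervisory": -1.56,
    "Semi-routine occupations": -1.70,
    "Routine occupations": -1.97,
    "Unemployment": -2.15,
}


def odds_ratio(b: float) -> float:
    """Odds ratio for a logit coefficient: OR = exp(B)."""
    return math.exp(b)


def pct_change_in_odds(b: float) -> float:
    """Percentage change in the odds relative to the base category: (OR - 1) x 100."""
    return (odds_ratio(b) - 1.0) * 100.0


for label, b in MODEL3_SEC_B.items():
    print(f"{label:<26} OR = {odds_ratio(b):.2f}   change in odds = {pct_change_in_odds(b):+.0f}%")
```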
4.2.3 Effect of parent status and gender of child on pass rate of students
With regard to the parent status of the child, a child with both parents was used as the reference category, with a base odds ratio of 1. The odds ratio indicates that the odds of passing for students who live with a single parent are 41% lower (0.59 − 1) than for those with both parents, and the result is statistically significant (p < 0.01). Female was the reference category for gender, and the results show that the odds of passing for male students are 1.48 times those of female students, i.e. 48% higher for boys than for girls. Finally, students whose parents do not pay for extra lessons were used as the reference category for tuition. Students who attend extra classes paid for by their parents have 1.54 times the odds of passing of students who do not attend extra lessons at all. The results indicate that parent status, the gender of the child and paid extra lessons all affect performance, and these results are significant in all cases.
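Statements such as "times more likely to pass" describe odds, not probabilities. The Python sketch below shows what the Model 3 odds ratio for male students (1.48) implies for pass probabilities; the baseline pass probabilities for female students (0.2, 0.5 and 0.8) are hypothetical values chosen for illustration, not figures from the data, and the helper names are likewise hypothetical.

```python
def prob_to_odds(p: float) -> float:
    """Convert a probability to odds p / (1 - p)."""
    return p / (1.0 - p)


def odds_to_prob(odds: float) -> float:
    """Convert odds back to a probability odds / (1 + odds)."""
    return odds / (1.0 + odds)


def prob_after_odds_ratio(p_base: float, odds_ratio: float) -> float:
    """Probability in the comparison group implied by a baseline probability and an OR."""
    return odds_to_prob(prob_to_odds(p_base) * odds_ratio)


OR_MALE = 1.48  # Model 3 odds ratio for male (base = female), Table 3

# Hypothetical baseline pass probabilities for female students (illustration only).
for p_female in (0.2, 0.5, 0.8):
    p_male = prob_after_odds_ratio(p_female, OR_MALE)
    print(f"female {p_female:.2f} -> male {p_male:.3f} (probability ratio {p_male / p_female:.2f})")
```

The same odds ratio implies a probability ratio of about 1.35, 1.19 and 1.07 at these three baselines; only when the baseline probability is small does the probability ratio approach the odds ratio itself.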
4.5 Discussion of Results
1. Summarise what you found out, i.e. what is the effect on a child's pass rate of the parents' socio-economic class, parent status and ability to pay for extra classes, and of the child's gender?

2. For each of the results, always compare what you get with what is said in the literature, clearly stating the sources of that literature.

3. Also show how these findings support or disagree with the main theories you have identified in your literature.

4. If the results disagree with common theories, try to find reasons why this is the case in that population: what is unique about the characteristics of the population that makes it different from the populations used in other areas or studies?

5. Based on these results and what is said in the literature, you must recommend appropriate policy interventions. In this case, for example, there is clearly a problem with single-parent households, low socio-economic class families and female students. Therefore, any policy intervention must aim at addressing the shortcomings associated with the incapacities of each of these three groups.
