Econometrics II Chapter One

1. This document discusses regression analysis using dummy variables to represent qualitative variables with two or more categories. 2. It explains that when a qualitative variable has multiple categories, the number of dummy variables used should be one less than the number of categories. 3. Key aspects of dummy variable regression are illustrated through examples of regressing salary on factors like sex, experience, and education level. Comparisons are made between categories by looking at differences in intercept coefficients.

Uploaded by

Genemo Fitala

CHAPTER ONE:
REGRESSION ON DUMMY VARIABLES
1. Introduction

• In regression analysis, we often face a qualitative response (dependent) variable of the “yes” or “no” type.

• Discrete choice models that deal with such binary responses are called binary choice models.

• Technically, it is possible to estimate binary choice models using OLS.

• A linear model for binary choices that is estimated by OLS is called a linear probability model (LPM).

• Such models are very useful in that they allow us to address questions for which there is a “yes or no” answer.

• When working with a dummy dependent variable, we need to ensure that the sample contains sufficient numbers of both 0s and 1s.
Regression on Dummy Variables
The nature of dummy variables: In regression analysis the dependent
variable is frequently influenced not only by variables that can be readily
quantified on some well-defined scale (e.g., income, output, prices, costs,
height, and temperature), but also by variables that are essentially
qualitative in nature (e.g., sex, race, color, religion, nationality, wars,
earthquakes, strikes, political upheavals, and changes in government
economic policy). For example, holding all other factors constant, female
college professors are found to earn less than their male counterparts,
and nonwhites are found to earn less than whites.
• This pattern may result from sex or racial discrimination, but whatever the reason, qualitative variables such as sex and race do influence the dependent variable and clearly should be included among the explanatory variables.

• Since such qualitative variables usually indicate the presence or absence of a “quality” or an attribute, such as male or female, black or white, or Christian or Muslim, one method of “quantifying” such attributes is to construct artificial variables that take on values of 1 or 0, with 0 indicating the absence of an attribute and 1 indicating the presence (or possession) of that attribute.

• For example, 1 may indicate that a person is male, and 0 may designate a female; or 1 may indicate that a person is a college graduate, and 0 that he or she is not, and so on. Variables that assume such 0 and 1 values are called dummy variables.

• Alternative names are indicator variables, binary variables, categorical variables, and dichotomous variables.
• Dummy variables can be used in regression models just as easily as quantitative variables. As a matter of fact, a regression model may contain explanatory variables that are exclusively dummy, or qualitative, in nature.

• Example: $Y_i = a + \beta D_i + U_i$   (5.01)

where $Y_i$ = annual salary of a college professor
$D_i$ = 1 if male college professor
= 0 otherwise (i.e., female professor)
Note that (5.01) is like the two-variable regression models encountered previously, except that instead of a quantitative X variable we have a dummy variable D (hereafter, we shall designate all dummy variables by the letter D).

• Model (5.01) may enable us to find out whether sex makes any difference in a college professor's salary, assuming, of course, that all other variables such as age, degree attained, and years of experience are held constant. Assuming that the disturbances satisfy the usual assumptions of the classical linear regression model, we obtain from (5.01):

• Mean salary of female college professors: $E(Y_i \mid D_i = 0) = a$

• Mean salary of male college professors: $E(Y_i \mid D_i = 1) = a + \beta$

• That is, the intercept term $a$ gives the mean salary of female college professors, and the slope coefficient $\beta$ tells by how much the mean salary of a male college professor differs from the mean salary of his female counterpart, with $a + \beta$ reflecting the mean salary of male college professors.
• A test of the null hypothesis that there is no sex discrimination ($H_0: \beta = 0$) can easily be made by running regression (5.01) in the usual manner and checking, on the basis of the t test, whether the estimated $\beta$ is statistically significant.
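The interpretation of (5.01) can be checked numerically. The sketch below is only an illustration: the salary figures are hypothetical and are not taken from the slides, and OLS is computed with NumPy's least-squares routine. It shows that the estimated intercept equals the female mean and the estimated dummy coefficient equals the male-female difference in means.

```python
import numpy as np

# Hypothetical salaries (in thousands), for illustration only
salary = np.array([22.0, 19.0, 18.0, 21.7, 18.5, 21.0, 20.5, 17.0, 17.5, 21.2])
male = np.array([1, 0, 0, 1, 0, 1, 1, 0, 0, 1])  # D_i = 1 if male, 0 if female

# Design matrix [1, D_i] for Y_i = a + beta * D_i + U_i
X = np.column_stack([np.ones_like(salary), male])
coef, *_ = np.linalg.lstsq(X, salary, rcond=None)
a_hat, beta_hat = coef

mean_female = salary[male == 0].mean()
mean_male = salary[male == 1].mean()

print(f"a_hat    = {a_hat:.3f}   female mean        = {mean_female:.3f}")
print(f"beta_hat = {beta_hat:.3f}   male - female mean = {mean_male - mean_female:.3f}")
```

With a single dummy regressor, OLS does nothing more than compare two group means, which is why the t test on the dummy coefficient is a test of equal mean salaries.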
1.2. Regression on one quantitative variable and one qualitative variable with two classes or categories
• Consider the model: $Y_i = \alpha_1 + \alpha_2 D_i + \beta X_i + U_i$   (5.03)

• where $Y_i$ = annual salary of a college professor

• $X_i$ = years of teaching experience

• $D_i$ = 1 if male, 0 otherwise (i.e., if female)

• Model (5.03) contains one quantitative variable (years of teaching experience) and one qualitative variable (sex) that has two classes (or levels, classifications, or categories), namely, male and female. What is the meaning of this equation? Assuming, as usual, that $E(U_i) = 0$, we see that
• Mean salary of female college professors: $E(Y_i \mid D_i = 0, X_i) = \alpha_1 + \beta X_i$   (5.04)
• Mean salary of male college professors: $E(Y_i \mid D_i = 1, X_i) = (\alpha_1 + \alpha_2) + \beta X_i$   (5.05)
• Geometrically, we have the situation shown in fig. 5.01 (for illustration, it is assumed that $\alpha_1 > 0$).

• In words, model (5.03) postulates that the male and female college professors' salary functions in relation to years of teaching experience have the same slope ($\beta$) but different intercepts.

• In other words, it is assumed that the level of the male professors' mean salary differs from that of the female professors' mean salary by $\alpha_2$, but the rate of change in mean annual salary with years of experience is the same for both sexes.

• If the assumption of common slopes is valid, a test of the hypothesis that the two regressions (5.04) and (5.05) have the same intercept (i.e., there is no sex discrimination) can be made easily by running regression (5.03) and noting the statistical significance of the estimated $\alpha_2$ on the basis of the traditional t test.

• If the t test shows that $\alpha_2$ is statistically significant, we reject the null hypothesis that the male and female college professors' levels of mean annual salary are the same.
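The t test on the differential intercept can be sketched as follows. Everything here is an assumption made for illustration: the data are simulated, the "true" values 20, 3 and 0.5 are arbitrary, and the standard errors use the usual homoscedastic OLS formula.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
experience = rng.uniform(0, 30, n)   # X_i, years of teaching experience
male = rng.integers(0, 2, n)         # D_i = 1 if male, 0 if female

# Simulated salaries under assumed values alpha1 = 20, alpha2 = 3, beta = 0.5
salary = 20 + 3 * male + 0.5 * experience + rng.normal(0, 2, n)

# OLS for Y_i = alpha1 + alpha2 * D_i + beta * X_i + U_i
X = np.column_stack([np.ones(n), male, experience])
XtX_inv = np.linalg.inv(X.T @ X)
coef = XtX_inv @ X.T @ salary

# Classical standard errors: sigma^2 * (X'X)^{-1}
resid = salary - X @ coef
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(sigma2 * XtX_inv))
t_alpha2 = coef[1] / se[1]

print("estimates (alpha1, alpha2, beta):", np.round(coef, 3))
print("t statistic for alpha2:", round(float(t_alpha2), 2))
```

A large absolute t statistic for the dummy coefficient leads to rejecting the hypothesis of equal intercepts, given that the common-slope assumption holds.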
Before proceeding further, note the following features of the dummy variable regression model considered previously.

• 1. To distinguish the two categories, male and female, we have introduced only one dummy variable $D_i$. For if $D_i = 1$ always denotes a male, when $D_i = 0$ we know that it is a female, since there are only two possible outcomes.

• Hence, one dummy variable suffices to distinguish two categories. The general rule is this: if a qualitative variable has m categories, introduce only m - 1 dummy variables.

• In our example, sex has two categories, and hence we introduced only a single dummy variable. If this rule is not followed in a model that contains an intercept, we shall fall into what might be called the dummy variable trap, that is, the situation of perfect multicollinearity.

• 2. The assignment of 1 and 0 values to two categories, such as male and female, is arbitrary in the sense that in our example we could have assigned D = 1 for female and D = 0 for male.

• 3. The group, category, or classification that is assigned the value of 0 is often referred to as the base, benchmark, control, comparison, reference, or omitted category. It is the base in the sense that comparisons are made with that category.

• 4. The coefficient $\alpha_2$ attached to the dummy variable D can be called the differential intercept coefficient because it tells by how much the value of the intercept term of the category that receives the value of 1 differs from the intercept coefficient of the base category.
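The dummy variable trap mentioned in feature 1 can be seen directly from the rank of the design matrix. This is a minimal sketch with made-up observations: when an intercept is combined with a dummy for every category, the dummy columns sum to the intercept column, so the matrix loses full column rank.

```python
import numpy as np

male = np.array([1, 0, 0, 1, 0, 1, 1, 0, 0, 1])
female = 1 - male
const = np.ones(len(male))

X_ok = np.column_stack([const, male])            # intercept + (m - 1) = 1 dummy
X_trap = np.column_stack([const, male, female])  # intercept + m = 2 dummies

rank_ok = np.linalg.matrix_rank(X_ok)
rank_trap = np.linalg.matrix_rank(X_trap)

print("m - 1 dummies: rank", rank_ok, "of", X_ok.shape[1], "columns")
print("m dummies:     rank", rank_trap, "of", X_trap.shape[1], "columns")
```

Because `male + female` equals the constant column in every row, X'X is singular in the second case and the OLS coefficients are not uniquely determined.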
1.3. Regression on one quantitative variable and one qualitative variable with more than two classes

• Suppose that, on the basis of cross-sectional data, we want to regress the annual expenditure on health care by an individual on the income and education of the individual.

• Since the variable education is qualitative in nature, suppose we consider three mutually exclusive levels of education: less than high school, high school, and college.

• Now, unlike the previous case, we have more than two categories of the qualitative variable education.

• Therefore, following the rule that the number of dummies be one less than the number of categories of the variable, we should introduce two dummies to take care of the three levels of education.

• Assuming that the three educational groups have a common slope but different intercepts in the regression of annual expenditure on health care on annual income, we can use the following model:
• $Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \beta X_i + U_i$   (5.06)

• where $Y_i$ = annual expenditure on health care

• $X_i$ = annual income

• $D_{2i}$ = 1 if high school education, 0 otherwise

• $D_{3i}$ = 1 if college education, 0 otherwise

• Note that in the preceding assignment of the dummy variables we are arbitrarily treating the “less than high school education” category as the base category. Therefore, the intercept $\alpha_1$ will reflect the intercept for this category.

• The differential intercepts $\alpha_2$ and $\alpha_3$ tell by how much the intercepts of the other two categories differ from the intercept of the base category, which can be readily checked as follows. Assuming $E(U_i) = 0$, we obtain from (5.06)

• $E(Y_i \mid D_{2i} = 0, D_{3i} = 0, X_i) = \alpha_1 + \beta X_i$

• $E(Y_i \mid D_{2i} = 1, D_{3i} = 0, X_i) = (\alpha_1 + \alpha_2) + \beta X_i$

• $E(Y_i \mid D_{2i} = 0, D_{3i} = 1, X_i) = (\alpha_1 + \alpha_3) + \beta X_i$

• which are, respectively, the mean health care expenditure functions for the three levels of education, namely, less than high school, high school, and college. Geometrically, the situation is shown in fig 5.2 (for illustrative purposes it is assumed that $\alpha_3 > \alpha_2$).
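The construction of m - 1 = 2 dummies from a three-category variable can be sketched as below. The category labels, incomes and parameter values are all hypothetical, and the dependent variable is generated without noise so that least squares recovers the assumed parameters exactly.

```python
import numpy as np

# Hypothetical sample: education level and annual income (thousands)
education = ["less_hs", "hs", "college", "hs", "less_hs",
             "college", "hs", "college", "less_hs"]
income = np.array([20.0, 30.0, 50.0, 35.0, 25.0, 60.0, 28.0, 45.0, 22.0])

# Two dummies for three categories; "less_hs" is the base category
D2 = np.array([1.0 if e == "hs" else 0.0 for e in education])
D3 = np.array([1.0 if e == "college" else 0.0 for e in education])

# Noise-free health spending built from assumed parameter values
alpha1, alpha2, alpha3, beta = 1.0, 0.5, 1.2, 0.05
health = alpha1 + alpha2 * D2 + alpha3 * D3 + beta * income

X = np.column_stack([np.ones(len(income)), D2, D3, income])
coef, *_ = np.linalg.lstsq(X, health, rcond=None)
a1, a2, a3, b = coef

print("intercept, less than high school:", round(float(a1), 3))
print("intercept, high school:          ", round(float(a1 + a2), 3))
print("intercept, college:              ", round(float(a1 + a3), 3))
print("common slope on income:          ", round(float(b), 3))
```

The three printed intercepts correspond to the three conditional mean functions derived from (5.06); only the intercept shifts across groups, while the slope on income is shared.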
1.4. Regression on one quantitative variable and two qualitative variables

• The technique of dummy variables can easily be extended to handle more than one qualitative variable. Let us revert to the college professors' salary regression (5.03), but now assume that, in addition to years of teaching experience and sex, the skin color of the teacher is also an important determinant of salary.

• For simplicity, assume that color has two categories: black and white. We can now write (5.03) as:

• $Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \beta X_i + U_i$   (5.07)

• where $Y_i$ = annual salary

• $X_i$ = years of teaching experience

• $D_{2i}$ = 1 if male

•   = 0 if female

• $D_{3i}$ = 1 if white

•   = 0 otherwise
• Notice that each of the two qualitative variables, sex and color, has two categories and hence needs one dummy variable. Note also that the omitted, or base, category now is “black female professor.”
• Assuming $E(U_i) = 0$, we can obtain the following regressions from (5.07):
Mean salary for black female professors:
• $E(Y_i \mid D_{2i} = 0, D_{3i} = 0, X_i) = \alpha_1 + \beta X_i$
Mean salary for black male professors:
• $E(Y_i \mid D_{2i} = 1, D_{3i} = 0, X_i) = (\alpha_1 + \alpha_2) + \beta X_i$
Mean salary for white female professors:
• $E(Y_i \mid D_{2i} = 0, D_{3i} = 1, X_i) = (\alpha_1 + \alpha_3) + \beta X_i$
Mean salary for white male professors:
• $E(Y_i \mid D_{2i} = 1, D_{3i} = 1, X_i) = (\alpha_1 + \alpha_2 + \alpha_3) + \beta X_i$
• Once again, it is assumed that the preceding regressions differ only in the intercept coefficient but not in the slope coefficient $\beta$.

• An OLS estimation of (5.07) will enable us to test a variety of hypotheses. Thus, if $\alpha_3$ is statistically significant, it will mean that color does affect a professor's salary.
• Similarly, if $\alpha_2$ is statistically significant, it will mean that sex also affects a professor's salary.
• If both these differential intercepts are statistically significant, it would mean that sex as well as color is an important determinant of professors' salaries.

• From the preceding discussion it follows that we can extend our model to include more than one quantitative variable and more than two qualitative variables.

• The only precaution to be taken is that the number of dummies for each qualitative variable should be one less than the number of categories of that variable.
1.5. Testing for structural stability of regression models
• Until now, in the models considered in this chapter we assumed that the qualitative variables affect the intercept but not the slope coefficient of the various subgroup regressions.

• But what if the slopes are also different? If the slopes are in fact different, testing for differences in the intercepts may be of little practical significance.

• Therefore, we need to develop a general methodology to find out whether two (or more) regressions are different, where the difference may be in the intercepts or the slopes or both.
1.6. Interaction effects
• Consider the following model: $Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \beta X_i + U_i$   (5.08)

• where $Y_i$ = annual expenditure on clothing

• $X_i$ = income

• $D_{2i}$ = 1 if female

•   = 0 if male

• $D_{3i}$ = 1 if college graduate

•   = 0 otherwise
• Implicit in this model is the assumption that the differential effect of the sex dummy $D_2$ is constant across the two levels of education and the differential effect of the education dummy $D_3$ is also constant across the two sexes. That is, if, say, the mean expenditure on clothing is higher for females than for males, this is so whether they are college graduates or not.

• Likewise, if, say, college graduates on average spend more on clothing than non-graduates, this is so whether they are female or male.

• In many applications such an assumption may be untenable. A female college graduate may spend more on clothing than a male graduate.

• In other words, there may be interaction between the two qualitative variables $D_2$ and $D_3$, and therefore their effect on mean Y may not be simply additive as in (5.08) but multiplicative as well, as in the following model:
$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \alpha_4 (D_{2i} D_{3i}) + \beta X_i + U_i$   (5.09)

• From (5.09) we obtain

• $E(Y_i \mid D_{2i} = 1, D_{3i} = 1, X_i) = (\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4) + \beta X_i$   (5.10)

• which is the mean clothing expenditure of graduate females. Notice that

• $\alpha_2$ = differential effect of being a female

• $\alpha_3$ = differential effect of being a college graduate

• $\alpha_4$ = differential effect of being a female graduate, which shows that the mean clothing expenditure of graduate females differs (by $\alpha_4$) from what the separate effects of being female and of being a college graduate would imply.

• If $\alpha_2$, $\alpha_3$, and $\alpha_4$ are all positive, the average clothing expenditure of females is higher than that of the base category (which here is male non-graduates), but it is much more so if the females also happen to be graduates.

• Similarly, the average expenditure on clothing by a college graduate tends to be higher than that of the base category, but much more so if the graduate happens to be a female.
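The role of the interaction coefficient can be made concrete by evaluating the conditional mean for each of the four sex-education cells. The coefficient values and the income level below are hypothetical, chosen only to show that the female differential is no longer the same for graduates and non-graduates once the product term is included.

```python
def mean_clothing(female, graduate, income,
                  alpha1=2.0, alpha2=0.8, alpha3=1.0, alpha4=0.5, beta=0.1):
    """Conditional mean under the interaction model (hypothetical coefficients)."""
    return (alpha1 + alpha2 * female + alpha3 * graduate
            + alpha4 * female * graduate + beta * income)

income = 30.0  # hypothetical income level held fixed across cells

base = mean_clothing(0, 0, income)           # male non-graduate (base category)
female_nongrad = mean_clothing(1, 0, income)
male_grad = mean_clothing(0, 1, income)
female_grad = mean_clothing(1, 1, income)

# Female differential among non-graduates is alpha2;
# among graduates it is alpha2 + alpha4
gap_nongrad = female_nongrad - base
gap_grad = female_grad - male_grad

print("female differential, non-graduates:", round(gap_nongrad, 3))
print("female differential, graduates:    ", round(gap_grad, 3))
```

Setting `alpha4=0` makes the two differentials equal, which is exactly the additive assumption built into (5.08).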
1.7. The use of dummy variables in seasonal analysis

• Many economic time series based on monthly or quarterly data exhibit seasonal patterns (regular oscillatory movements).

• Examples are sales of department stores at Christmastime, demand for money (cash balances) by households at holiday times, demand for ice cream and soft drinks during the summer, and prices of crops right after the harvesting season.
• Often it is desirable to remove the seasonal factor, or component, from a time series so that one may concentrate on the other components, such as the trend.
• The process of removing the seasonal component from a time series is known as deseasonalization, or seasonal adjustment, and the time series thus obtained is called the deseasonalized, or seasonally adjusted, time series.
• Important economic time series, such as the consumer price index, the wholesale price index, and the index of industrial production, are usually published in seasonally adjusted form.
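One simple way to deseasonalize with dummy variables is to regress the series on seasonal dummies and keep the residuals, re-centered at the overall mean. The slides do not spell out a procedure, so the following is only a sketch under that assumption, using hypothetical quarterly sales and the first quarter as the base category.

```python
import numpy as np

# Hypothetical quarterly sales over three years
sales = np.array([10.0, 12.0, 11.0, 20.0,
                  11.0, 13.0, 12.0, 22.0,
                  12.0, 14.0, 13.0, 24.0])
quarter = np.tile(np.arange(1, 5), 3)  # 1, 2, 3, 4, 1, 2, ...

# Intercept plus three dummies (Q2, Q3, Q4); Q1 is the base category
dummies = [(quarter == q).astype(float) for q in (2, 3, 4)]
X = np.column_stack([np.ones(len(sales))] + dummies)
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)

# Remove the estimated seasonal means and restore the overall level
seasonal_fit = X @ coef
deseasonalized = sales - seasonal_fit + sales.mean()

print("seasonal coefficients:", np.round(coef, 3))
print("deseasonalized series:", np.round(deseasonalized, 3))
```

After the adjustment every quarter has the same average, so what remains is the non-seasonal movement. A trend term could be added to the regression if the series is trending, as this one is.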
1.8. Piecewise linear regression
• To illustrate yet another use of dummy variables, consider fig 5.3, which shows how a hypothetical company remunerates its sales representatives.

• It pays commissions based on sales in such a manner that up to a certain level, the target or threshold level X*, there is one (stochastic) commission structure and beyond that level another. (Note: Besides sales, other factors affect sales commission. Assume that these other factors are represented by the stochastic disturbance term.)

• More specifically, it is assumed that sales commission increases linearly with sales until the threshold level X*, after which it also increases linearly with sales but at a much steeper rate.

• Thus, we have a piecewise linear regression consisting of two linear pieces or segments, which are labeled I and II in fig. 5.3, and the commission function changes its slope at the threshold value.

• Given the data on commission, sales, and the value of the threshold level X*, the technique of dummy variables can be used to estimate the (differing) slopes of the two segments of the piecewise linear regression shown in fig. 5.3. We proceed as follows:
• Consider the model: $Y_i = \alpha_1 + \beta_1 X_i + \beta_2 (X_i - X^*) D_i + U_i$   (5.11)

• where $Y_i$ = sales commission

• $X_i$ = volume of sales generated by the salesperson

• $X^*$ = threshold value of sales, also known as a knot (known in advance)

• $D_i$ = 1 if $X_i > X^*$

•   = 0 if $X_i < X^*$
• Assuming $E(U_i) = 0$, we see at once that
• $E(Y_i \mid D_i = 0, X_i, X^*) = \alpha_1 + \beta_1 X_i$   (5.12)
which gives the mean sales commission up to the target level X*, and
• $E(Y_i \mid D_i = 1, X_i, X^*) = \alpha_1 - \beta_2 X^* + (\beta_1 + \beta_2) X_i$   (5.13)

• which gives the mean sales commission beyond the target level X*.

• Thus, $\beta_1$ gives the slope of the regression line in segment I, and $(\beta_1 + \beta_2)$ gives the slope of the regression line in segment II of the piecewise linear regression shown in fig 5.3.

• A test of the hypothesis that there is no break in the regression at the threshold value X* can be conducted easily by noting the statistical significance of the estimated differential slope coefficient $\beta_2$.
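The piecewise specification can be sketched numerically. The knot, the sales figures and the parameter values are assumptions made for illustration, and the commission is generated without noise so that the fit reproduces the assumed parameters. Expanding the model for $D_i = 1$ gives an intercept of $\alpha_1 - \beta_2 X^*$ and a slope of $\beta_1 + \beta_2$ in segment II, which the code computes from the estimates.

```python
import numpy as np

x_star = 10.0  # assumed knot, known in advance
sales = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0])
D = (sales > x_star).astype(float)

# Noise-free commissions under assumed alpha1 = 1, beta1 = 0.5, beta2 = 1
alpha1, beta1, beta2 = 1.0, 0.5, 1.0
commission = alpha1 + beta1 * sales + beta2 * (sales - x_star) * D

# Regressors: constant, X_i, and (X_i - X*) * D_i
X = np.column_stack([np.ones(len(sales)), sales, (sales - x_star) * D])
coef, *_ = np.linalg.lstsq(X, commission, rcond=None)
a1_hat, b1_hat, b2_hat = coef

slope_I = b1_hat
slope_II = b1_hat + b2_hat
intercept_II = a1_hat - b2_hat * x_star

print("segment I : intercept", round(float(a1_hat), 3), "slope", round(float(slope_I), 3))
print("segment II: intercept", round(float(intercept_II), 3), "slope", round(float(slope_II), 3))
```

The two segments meet at the knot: at X = X* both lines give the same mean commission, so the function is continuous but kinked.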
2. REGRESSION ON DUMMY DEPENDENT VARIABLE
• Binary dependent variables are extremely common in the social sciences. Suppose we want to study the labor-force participation of adult males as a function of the unemployment rate, average wage rate, family income, education, etc.

• A person either is in the labor force or not. Hence, the dependent variable, labor-force participation, can take only two values: 1 if the person is in the labor force and 0 if he or she is not. We can consider another example. A family may or may not own a house. If it owns a house, the variable takes the value 1, and 0 if it does not.
• There are several such examples where the dependent variable is dichotomous. The distinguishing feature of all the examples is that the dependent variable is of the type that elicits a yes or no response; that is, it is dichotomous in nature.

• Now, before we discuss the estimation of models involving dichotomous response variables, let us briefly discuss the concept of qualitative response models.
These are models in which the dependent variable is a discrete outcome.
• Example 1. $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
• Y = 1 if individual i attended college
•   = 0 otherwise
• In the above example the dependent variable Y takes on only two values (i.e., 0 and 1).
• Conventional regression is generally not appropriate for analyzing a model with a qualitative dependent variable.
• Such models are analyzed in the general framework of probability models.
2.1. Categories of Qualitative Response Models (QRM)
Two broad categories of QRM:
1. Binomial models
• The choice is between two alternatives.
2. Multinomial models
• The choice is between more than two alternatives.
• Example: Y = 1 if occupation is farming
•   = 2 if occupation is carpentry
•   = 3 if occupation is fishing
• Let us define some important terms.
• Binary variables: variables that have two categories and are often used to indicate that an event has occurred or that some characteristic is present.
• Example: - Decision to participate or not to participate in the labor force
•   - Decision to vote or not to vote
Ordinal variables: variables whose categories can be ranked.
Example: - Rank to indicate political orientation
• Y = 1, radical
•   = 2, liberal
•   = 3, conservative
•   - Rank according to educational attainment
• Y = 1, primary education
•   = 2, secondary education
•   = 3, university education
Nominal variables: these variables occur when there are multiple outcomes that cannot be ordered.

Example: Occupation can be grouped as farming, fishing, carpentry, etc.

Note that the numbers are assigned arbitrarily.

Y = 1 farming

= 2 fishing

= 3 carpentry

= 4 livestock
• Count variables: these variables indicate the number of times some event has occurred.
• Example: the number of strikes that have occurred.
2.3. Types of Binomial Models

The four most commonly used approaches to estimating binary response models (types of binomial models) are:

1. Linear probability models

2. The logit model

3. The probit model

4. The tobit (censored regression) model

2.3.1. THE LINEAR PROBABILITY MODEL (LPM)
The linear probability model is the regression model applied to a binary dependent variable. To fix ideas, consider the following simple model:
$Y_i = \beta_0 + \beta_1 X_i + U_i$   (1)
• where $X_i$ = family income
$Y_i$ = 1 if the family owns a house
= 0 if the family does not own a house
$U_i$ is the disturbance term
• The independent variable $X_i$ can be a discrete or a continuous variable. The model can be extended to include additional explanatory variables.
• The above model expresses the dichotomous $Y_i$ as a linear function of the explanatory variable $X_i$.

• Such models are called linear probability models (LPM) since $E(Y_i \mid X_i)$, the conditional expectation of $Y_i$ given $X_i$, can be interpreted as the conditional probability that the event will occur given $X_i$; that is, $\Pr(Y_i = 1 \mid X_i)$.

• Thus, in the preceding case, $E(Y_i \mid X_i)$ gives the probability that a family whose income is the given amount $X_i$ owns a house. The justification of the name LPM can be seen as follows.

Assuming $E(U_i) = 0$, as usual (to obtain unbiased estimators), we obtain

• $E(Y_i \mid X_i) = \beta_0 + \beta_1 X_i$   (2)

• Since $Y_i$ takes only the values 0 and 1, its conditional expectation is $E(Y_i \mid X_i) = 1 \cdot P_i + 0 \cdot (1 - P_i) = P_i$, where $P_i = \Pr(Y_i = 1 \mid X_i)$, so the right-hand side of (2) is a probability.
2.3.1.1. Problems with the LPM
• While the interpretation of the parameters is unaffected by having a binary outcome, several of the assumptions underlying the LPM are necessarily violated.

2. Non-normality of Ui
• Although OLS does not require the disturbances (the U's) to be normally distributed, we assumed them to be so distributed for the purpose of statistical inference, that is, hypothesis testing, etc. But the assumption of normality for $U_i$ is no longer tenable for the LPM because, like $Y_i$, $U_i$ takes on only two values.
$U_i = Y_i - \beta_0 - \beta_1 X_i$

Now when $Y_i = 1$, $U_i = 1 - \beta_0 - \beta_1 X_i$

and when $Y_i = 0$, $U_i = -\beta_0 - \beta_1 X_i$
Obviously $U_i$ cannot be assumed to be normally distributed.
Recall, however, that normality is not required for the OLS estimates to be unbiased.
3. Nonsensical predictions
• The LPM can produce predicted values outside the admissible range of probabilities [0, 1]: it can predict values of Y that are negative or greater than 1. This is the real problem with the OLS estimation of the LPM.
4. Functional form
• Since the model is linear, a unit increase in X results in a constant change of $\beta_1$ in the probability of the event, holding all other variables constant. The increase is the same regardless of the current value of X. In many applications, this is unrealistic. When the outcome is a probability, it is often substantively reasonable that the effects of independent variables will have diminishing returns as the predicted probability approaches 0 or 1.
• Remark: Because of the above-mentioned problems, the LPM is not recommended for empirical work.
2.3.2. THE LOGIT MODEL
• We have seen that the LPM has many problems, such as non-normality of $U_i$, heteroscedasticity of $U_i$, the possibility of predicted values lying outside the 0-1 range, and generally lower $R^2$ values. But these problems are surmountable. The fundamental problem with the LPM is that it is not logically a very attractive model because it assumes that $P_i = E(Y = 1 \mid X)$ increases linearly with X, that is, the marginal or incremental effect of X remains constant throughout.
• Example: The LPM estimated by OLS (on home ownership) is given as follows:
$\hat{Y}_i = -0.9457 + 0.1021 X_i$
se = (0.1228) (0.0082)
t = (-7.6984) (12.515)
$R^2 = 0.8048$
The above regression is interpreted as follows:

• The intercept of -0.9457 gives the “probability” that a family with zero income will own a house. Since this value is negative, and since probability cannot be negative, we treat this value as zero.

• The slope value of 0.1021 means that for a unit change in income, on average the probability of owning a house increases by 0.1021, or about 10 percentage points. This is so regardless of the level of income. This seems patently unrealistic. In reality one would expect $P_i$ to be nonlinearly related to $X_i$.
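The two difficulties just described can be seen by plugging values into the estimated LPM. The coefficients below are the ones reported in the example; the income values are hypothetical, since the slides do not state the units or range of income.

```python
def lpm_predict(income, b0=-0.9457, b1=0.1021):
    """Predicted 'probability' of home ownership from the estimated LPM."""
    return b0 + b1 * income

# Hypothetical income levels, in the same (unstated) units as the example
for income in (0, 6, 10, 15, 20):
    p = lpm_predict(income)
    note = "" if 0.0 <= p <= 1.0 else "  <- outside [0, 1]"
    print(f"income = {income:>2}: predicted P = {p:7.4f}{note}")

# The marginal effect is the same at every income level
step_low = lpm_predict(11) - lpm_predict(10)
step_high = lpm_predict(21) - lpm_predict(20)
print("change per unit of income:", round(step_low, 4), round(step_high, 4))
```

Low incomes give negative fitted values and high incomes give fitted values above one, while the increment per unit of income never changes.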
Therefore, what we need is a (probability) model that has the following two features:

• 1. As $X_i$ increases, $P_i = E(Y = 1 \mid X_i)$ increases but never steps outside the 0-1 interval.

• 2. The relationship between $P_i$ and $X_i$ is nonlinear, that is, “one which approaches zero at slower and slower rates as $X_i$ gets small and approaches one at slower and slower rates as $X_i$ gets very large.”

• Therefore, one can easily use a cumulative distribution function (CDF) to model regressions where the response variable is dichotomous, taking 0-1 values.

• The CDFs commonly chosen to represent the 0-1 response models are:

a. the logistic, which gives rise to the logit model

b. the normal, which gives rise to the probit (or normit) model

• Now let us see how one can estimate and interpret the logit model.

Recall that the LPM was (for home ownership)

$P_i = E(Y = 1 \mid X_i) = \beta_0 + \beta_1 X_i$

where X is income and Y = 1 means the family owns a house.
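In the logit model the linear index is passed through the logistic CDF, $P_i = 1 / (1 + e^{-(\beta_0 + \beta_1 X_i)})$, which is the standard logistic form. The sketch below uses hypothetical coefficients (not estimates from the slides) to show that the resulting probabilities stay strictly between 0 and 1, that the marginal effect $\beta_1 P_i (1 - P_i)$ varies with income, and that the log of the odds is linear in X.

```python
import math

def logistic_cdf(z):
    """Standard logistic CDF, 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def logit_prob(x, b0, b1):
    """P(Y = 1 | X = x) under a logit model with index b0 + b1 * x."""
    return logistic_cdf(b0 + b1 * x)

b0, b1 = -4.0, 0.4  # hypothetical coefficients, for illustration only

for income in (0, 5, 10, 15, 20):
    p = logit_prob(income, b0, b1)
    marginal = b1 * p * (1.0 - p)        # dP/dX, largest where P is near 0.5
    log_odds = math.log(p / (1.0 - p))   # equals b0 + b1 * income
    print(f"X = {income:>2}: P = {p:.4f}, dP/dX = {marginal:.4f}, log-odds = {log_odds:.2f}")
```

Unlike the LPM, the effect of a unit change in X is small when P is close to 0 or 1 and largest in the middle of the range.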

3. THE PROBIT MODEL
• The estimating model that emerges from the normal CDF is popularly known as the probit model.
• Here the observed dependent variable Y takes on one of the values 0 and 1 according to the following criterion:
Y = 1 if $X_i > X^*$
Y = 0 if $X_i < X^*$
• The latent variable Y* is continuous (-∞ < Y* < ∞). It generates the observed binary variable Y.
• The observed variable Y can be in one of two states:
i) if an event occurs it takes a value of 1
ii) if an event does not occur it takes a value of 0
• The latent variable is assumed to be a linear function of the observed X's through the structural model.
• Example: Let Y measure whether one is employed or not. It is a binary variable taking values 0 and 1.

• Y* measures the willingness to participate in the labor market. It changes continuously and is unobserved. If X is the wage rate, then as X increases the willingness to participate in the labor market will increase. The individual's decision changes (Y becomes zero) if the wage rate is below the critical point.

• Since Y* is continuous, the model avoids the problems inherent in the LPM (i.e., non-normality of the error term and heteroscedasticity).
• However, since the latent dependent variable is unobserved, the model cannot be estimated using OLS. Maximum likelihood can be used instead.

• Most often, the choice is between normal errors and logistic errors, resulting in the probit (normit) and logit models, respectively. If we assume a normal distribution, the coefficients obtained by maximizing the likelihood function are the coefficients of the probit model.

• If we assume that the appropriate distribution of the error term is the logistic distribution, the coefficients obtained from the ML function will be the coefficients of the logit model. In both cases, as with the LPM, it is assumed that $E[U_i \mid X_i] = 0$.
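The idea that the same likelihood yields probit or logit coefficients depending on the assumed error distribution can be sketched as follows. The data are hypothetical, and a coarse grid search stands in for a proper numerical optimizer, so the reported values are only approximate maximizers of the binary-response log-likelihood $\sum_i [Y_i \ln F(\beta_0 + \beta_1 X_i) + (1 - Y_i) \ln(1 - F(\beta_0 + \beta_1 X_i))]$.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, xs, ys, cdf):
    """Binary-response log-likelihood for a given CDF (normal or logistic)."""
    total = 0.0
    for x, y in zip(xs, ys):
        # Clamp to avoid log(0) at extreme index values
        p = min(max(cdf(b0 + b1 * x), 1e-12), 1.0 - 1e-12)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def grid_search_mle(xs, ys, cdf):
    """Crude ML: best (loglik, b0, b1) on a fixed grid; a sketch, not an optimizer."""
    best = None
    for i in range(-20, 21):        # b0 from -5.00 to 5.00 in steps of 0.25
        for j in range(-40, 41):    # b1 from -2.00 to 2.00 in steps of 0.05
            b0, b1 = i * 0.25, j * 0.05
            ll = log_likelihood(b0, b1, xs, ys, cdf)
            if best is None or ll > best[0]:
                best = (ll, b0, b1)
    return best

# Hypothetical data: X could be a wage rate, Y = 1 if employed
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

probit_fit = grid_search_mle(xs, ys, normal_cdf)
logit_fit = grid_search_mle(xs, ys, logistic_cdf)
print("probit (loglik, b0, b1):", probit_fit)
print("logit  (loglik, b0, b1):", logit_fit)
```

The two sets of coefficients are on different scales because the normal and logistic distributions have different variances, so they should not be compared directly.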
4. THE TOBIT MODEL
• An extension of the probit model is the tobit model, developed by James Tobin. To explain this model, let us consider the home ownership example.

• Suppose we want to find out the amount of money a consumer spends in buying a house in relation to his or her income and other economic variables. Now we have a problem.

• If a consumer does not purchase a house, obviously we have no data on housing expenditure for such a consumer; we have such data only on consumers who actually purchase a house.
• Thus consumers are divided into two groups, one consisting of, say, N1 consumers about whom we have information on the regressors (say income, interest rate, etc.) as well as the regressand (amount of expenditure on housing), and another consisting of, say, N2 consumers about whom we have information only on the regressors but not on the regressand.

• A sample in which information on the regressand is available only for some observations is known as a censored sample. Therefore, the tobit model is also known as a censored regression model.
• Mathematically, we can express the tobit model as

$Y_i = \beta_0 + \beta_1 X_i + U_i$ if RHS > 0

= 0 otherwise

where RHS = right-hand side.

• The method of maximum likelihood can be used to estimate the parameters of such models.
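A sketch of the likelihood that ML maximizes may help here. It assumes, as the standard tobit model does but the slides do not state, that $U_i$ is normal with mean 0 and variance $\sigma^2$ and that censoring occurs at zero. Uncensored observations then contribute the normal density of the residual, and censored ones contribute the probability that the right-hand side is non-positive. The data and parameter values are hypothetical.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_loglik(b0, b1, sigma, xs, ys):
    """Log-likelihood of a tobit model censored at zero (normal errors assumed)."""
    total = 0.0
    for x, y in zip(xs, ys):
        xb = b0 + b1 * x
        if y > 0:
            # Uncensored: density of the observed value
            total += math.log(normal_pdf((y - xb) / sigma) / sigma)
        else:
            # Censored: probability that b0 + b1*x + U <= 0
            total += math.log(normal_cdf(-xb / sigma))
    return total

# Hypothetical latent spending Y* = -2 + 1 * X + U with hand-picked disturbances
xs = [1, 2, 3, 4, 5, 6]
us = [0.3, -0.5, 0.2, -0.1, 0.4, -0.3]
latent = [-2.0 + 1.0 * x + u for x, u in zip(xs, us)]
ys = [max(0.0, v) for v in latent]   # observed spending is zero when Y* <= 0

ll_near_truth = tobit_loglik(-2.0, 1.0, 0.5, xs, ys)
ll_far = tobit_loglik(0.0, 0.0, 0.5, xs, ys)
print("observed Y:", [round(v, 2) for v in ys])
print("log-likelihood near generating values:", round(ll_near_truth, 3))
print("log-likelihood at (0, 0):             ", round(ll_far, 3))
```

In practice the parameters and $\sigma$ would be found by a numerical optimizer; the comparison above only shows that the likelihood ranks parameter values sensibly.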
END OF CHAPTER ONE

57 pages
Chapter 3
100% (1)
Chapter 3
25 pages
Violation of OLS Assumption - Multicollinearity
No ratings yet
Violation of OLS Assumption - Multicollinearity
18 pages
Development Economics 1 &2
No ratings yet
Development Economics 1 &2
43 pages
Chapter 1 Econometrics
No ratings yet
Chapter 1 Econometrics
21 pages
Econ 3051-Dynamic Optimization
No ratings yet
Econ 3051-Dynamic Optimization
11 pages
Data Sheet - Carrier Chiller
No ratings yet
Data Sheet - Carrier Chiller
4 pages
Chapter 1
No ratings yet
Chapter 1
39 pages
Macroecon II Handout Haramaya
No ratings yet
Macroecon II Handout Haramaya
107 pages
Microeconomics 1
No ratings yet
Microeconomics 1
231 pages
Chapter 6
No ratings yet
Chapter 6
27 pages
Multiple Choice Questions (1-5) 1 Tick For Each Correct Answer PDF
No ratings yet
Multiple Choice Questions (1-5) 1 Tick For Each Correct Answer PDF
2 pages
Chapter Four
No ratings yet
Chapter Four
35 pages
Final Economic Development Ch.3.2023
No ratings yet
Final Economic Development Ch.3.2023
8 pages
Chapter Five
No ratings yet
Chapter Five
39 pages
Public Goods, Externalities, and Information Asymmetries: Multiple Choice Questions
No ratings yet
Public Goods, Externalities, and Information Asymmetries: Multiple Choice Questions
52 pages
Differential Equations For Engineering Science 2014 by Serdar Yüksel
100% (1)
Differential Equations For Engineering Science 2014 by Serdar Yüksel
52 pages
ECO - Chapter 01 The Subject Matter of Econometrics
No ratings yet
ECO - Chapter 01 The Subject Matter of Econometrics
42 pages
Chapter 1 Exam - The Nature of Economics
No ratings yet
Chapter 1 Exam - The Nature of Economics
7 pages
Exit Exam
No ratings yet
Exit Exam
50 pages
Internl Economics II
No ratings yet
Internl Economics II
113 pages
Misge Micro I Chap 1,2,3
No ratings yet
Misge Micro I Chap 1,2,3
84 pages
Econ 3010 Final Exam Multiple Choice (100 Points)
100% (3)
Econ 3010 Final Exam Multiple Choice (100 Points)
9 pages
Chapter Three QM
No ratings yet
Chapter Three QM
77 pages
Chapt 2 MIC
No ratings yet
Chapt 2 MIC
13 pages
Chapter Six: Diversification, Integration and Merger: by Belaynew B
No ratings yet
Chapter Six: Diversification, Integration and Merger: by Belaynew B
25 pages
2022 Econometrics I Chapter Four
No ratings yet
2022 Econometrics I Chapter Four
83 pages
Fee Structure Agm Current
No ratings yet
Fee Structure Agm Current
2 pages
Orson Welles' Memo On by Lawrence French
100% (1)
Orson Welles' Memo On by Lawrence French
41 pages
Brosur Master Steel
No ratings yet
Brosur Master Steel
4 pages
Financial E Chapter Two
No ratings yet
Financial E Chapter Two
25 pages
Taxi Book
No ratings yet
Taxi Book
4 pages
Isolated Footing Excel Computation
No ratings yet
Isolated Footing Excel Computation
27 pages
Inspection Preparation For Ships
No ratings yet
Inspection Preparation For Ships
3 pages
Table Showing Current Ratio: List of Tables
No ratings yet
Table Showing Current Ratio: List of Tables
37 pages
PSL50 Protection Datasheet
No ratings yet
PSL50 Protection Datasheet
5 pages
Defence Standard 00-970 Part 1 Section 1: Issue 13 Date: 13 Jul 2015
No ratings yet
Defence Standard 00-970 Part 1 Section 1: Issue 13 Date: 13 Jul 2015
23 pages
Synthetic
No ratings yet
Synthetic
6 pages
Prefinal-1 Model Paper (2024-25)
No ratings yet
Prefinal-1 Model Paper (2024-25)
4 pages
Better Homes & Gardens 8 Cube Organizer EN
No ratings yet
Better Homes & Gardens 8 Cube Organizer EN
26 pages
Amer Shield
No ratings yet
Amer Shield
4 pages
Soil Mechanics Formula 1700830319
No ratings yet
Soil Mechanics Formula 1700830319
3 pages
IoT Quantum Computing A Future Concept
No ratings yet
IoT Quantum Computing A Future Concept
8 pages
A CMOS Self-Regulating VCO With Low Supply Sensitivity 4
No ratings yet
A CMOS Self-Regulating VCO With Low Supply Sensitivity 4
7 pages
Multiple Choice Questions: University of Cape Town School of Economics Eco1010F Tutorial 8
No ratings yet
Multiple Choice Questions: University of Cape Town School of Economics Eco1010F Tutorial 8
6 pages
AllPack Cataloque - 11.10.24
No ratings yet
AllPack Cataloque - 11.10.24
8 pages
Spring Lighting 2013 - HKD1800 Travel Reimbursement
No ratings yet
Spring Lighting 2013 - HKD1800 Travel Reimbursement
1 page
Agricultural Economics Chapter 5 & 6
No ratings yet
Agricultural Economics Chapter 5 & 6
85 pages
A Shani 2020
No ratings yet
A Shani 2020
9 pages
API FR - INR.RINR DS2 en Excel v2 2917298
No ratings yet
API FR - INR.RINR DS2 en Excel v2 2917298
74 pages
Balancing The Old With The New
No ratings yet
Balancing The Old With The New
4 pages
B.ing Kls XII
No ratings yet
B.ing Kls XII
1 page
Homework 6: Math 308 Due: 8 March
No ratings yet
Homework 6: Math 308 Due: 8 March
3 pages
Circuit Design Powerful Blad Tinkercad
No ratings yet
Circuit Design Powerful Blad Tinkercad
1 page