Solution Manual
(ECO3C11)
STUDY MATERIAL
CORE COURSE XI
III SEMESTER
M.A. ECONOMICS
(2019 Admission)
UNIVERSITY OF CALICUT
SCHOOL OF DISTANCE EDUCATION
CALICUT UNIVERSITY P.O.
MALAPPURAM - 673 635, KERALA
School of Distance Education
University of Calicut
Study Material
III Semester
M.A. ECONOMICS
(2019 Admission)
Core Course XI
ECO3C11 : BASIC ECONOMETRICS
Prepared by:
Dr. RASEENA K.K.
Assistant Professor of Economics
Sri. C. Achutha Menon Govt. College
Thrissur.
Scrutinized by:
Dr. R. RAMYA
Head of the Department
Department of Economics
Sri. C. Achutha Menon Govt. College, Thrissur.
DISCLAIMER
"The author(s) shall be solely responsible
for the content and views
expressed in this book".
ECO3C11 : Basic Econometrics
BASIC ECONOMETRICS
(Syllabus)
Module I
Simple Linear Regression Model
This module is divided into two parts. The first part deals with an introduction to the discipline of econometrics: its nature, scope, methodology and uses. The second part describes the simple linear regression model, its concepts, methods and properties.
1.1: Introduction to Econometrics
The term econometrics was coined in 1926 by Ragnar A. K.
Frisch, a Norwegian economist who shared the first Nobel
Prize in Economics in 1969 with Jan Tinbergen, an
econometrics pioneer. Although many economists had used data and made calculations long before 1926, Frisch felt the need for a new term for the interpretation and use of data in economics. Today, econometrics is a broad area of study within economics, and the field changes constantly with the emergence of new tools and techniques.
Econometrics deals with the measurement of economic
relationships and it is an integration of economics,
mathematical economics and statistics with an objective to
provide numerical values to the parameters of economic
relationships. The relationships of economic theory are usually expressed in mathematical form and combined with empirical economics. Econometric methods are used to obtain the values of the parameters, which are essentially the coefficients of the mathematical form of the economic relationships. The econometric relationships depict the random behaviour of economic data: although theory may postulate an exact relationship, say Y = β0 + β1X, we need to account for the fact that in observed data it may not be so. We add an error term, u, to the equation above. It is also called
a random variable or stochastic variable. It represents other
non-quantifiable or unknown factors that affect Y. It also
represents errors of measurements that may have entered the
data. The econometric equation is:
Y = β0 + β1X + u
The error term, u, is assumed to follow some statistical distribution.
4. Collection of Data
We need data for the variables above. These can be obtained from government statistical agencies and other sources. A lot of data can also be found on the Internet these days, but we need to learn the art of finding appropriate data amid the ever-increasing flood of data.
Various types of data are used in the estimation of the model.
A) Time series data
Time series data give the numerical values of variables collected over a period of time. For example, monthly income data for the years 1990-2010 constitute a time series.
B) Cross-section data
Cross-section data give information on variables concerning individual agents (e.g., consumers or producers) at a given point of time. For example, data collected from a sample of consumers on their family budgets, showing expenditure on various commodities by each family, together with family income, family composition and other demographic characteristics.
Table 1.1
X (weekly family income)        10   12   15   18   20   23   25   28   30   35
Y (weekly family expenditure)    5    5    8   11   11   15   16   16   20   22
                                 6    8    9   12   14   16   17   18   21   23
                                 7    9   10   13   15   17   18   20   23   24
                                 8   10   11   14   16   18   21   22   24   26
                                 9    -   13   15    -   19   23   24    -   30
                                 -    -   15    -    -   23    -   26    -    -
Total                           35   32   66   65   56  108   95  126   88  125
Table 1.2
X                               10   12   15   18   20   23   25   28   30   35
Conditional probabilities
P(Y | Xi), the same for each
Y value observed at that X     1/5  1/4  1/6  1/5  1/4  1/6  1/5  1/6  1/4  1/5
Conditional means of Y           7    8   11   13   14   18   19   21   22   25
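The conditional probabilities and conditional means in Table 1.2 can be verified with a short Python sketch (purely illustrative; the dictionary below just re-keys the entries of Table 1.1 by income level):

import numpy as np

# Weekly family income (X) and the expenditures observed at each
# income level, taken from Table 1.1 above.
data = {
    10: [5, 6, 7, 8, 9],
    12: [5, 8, 9, 10],
    15: [8, 9, 10, 11, 13, 15],
    18: [11, 12, 13, 14, 15],
    20: [11, 14, 15, 16],
    23: [15, 16, 17, 18, 19, 23],
    25: [16, 17, 18, 21, 23],
    28: [16, 18, 20, 22, 24, 26],
    30: [20, 21, 23, 24],
    35: [22, 23, 24, 26, 30],
}

for x, ys in data.items():
    # Each Y value observed at a given X is equally likely: P(Y|X) = 1/n
    cond_mean = np.mean(ys)          # E(Y | X), the conditional mean
    print(f"X={x:>2}: P(Y|X)=1/{len(ys)}, E(Y|X)={cond_mean:.0f}")

Each printed line reproduces the corresponding column of Table 1.2.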
ûi = Yi - Ŷi = Yi - β̂0 - β̂1Xi
ûi² = (Yi - β̂0 - β̂1Xi)²
Sum of squares,
Σûi² = Σ(Yi - β̂0 - β̂1Xi)²
∂Σûi²/∂β̂0 = -2Σ(Yi - β̂0 - β̂1Xi)
∂Σûi²/∂β̂0 = 0 => Σ(Yi - β̂0 - β̂1Xi) = 0
That is,
ΣYi = nβ̂0 + β̂1ΣXi ...............(1)
Similarly,
∂Σûi²/∂β̂1 = -2ΣXi(Yi - β̂0 - β̂1Xi)
∂Σûi²/∂β̂1 = 0 => ΣXi(Yi - β̂0 - β̂1Xi) = 0
That is,
ΣXiYi = β̂0ΣXi + β̂1ΣXi² ...............(2)
Solving the two normal equations (1) and (2) simultaneously gives,
β̂1 = [nΣXiYi - (ΣXi)(ΣYi)] / [nΣXi² - (ΣXi)²]
    = Σxiyi / Σxi²     where xi = Xi - X̄ and yi = Yi - Ȳ
And,
β̂0 = Ȳ - β̂1X̄
Therefore we have the two OLS estimators of the Simple Linear Regression model as,
β̂0 = Ȳ - β̂1X̄   and
β̂1 = Σxiyi / Σxi²
Note that the formula for β̂0 and the definition of the weights ki = xi/Σxi² (so that β̂1 = ΣkiYi) imply that β̂0 is also a linear function of the Yi, such that β̂0 = Σ(1/n - X̄ki)Yi. This is the linearity property of the OLS estimators.
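A minimal Python sketch of these estimator formulas, using numpy (the sample data are hypothetical; here the conditional means of Table 1.2 are regressed on income):

import numpy as np

def ols_simple(X, Y):
    """OLS estimates for Y = b0 + b1*X + u using the formulas above."""
    x = X - X.mean()                        # xi = Xi - X̄
    y = Y - Y.mean()                        # yi = Yi - Ȳ
    b1 = (x * y).sum() / (x ** 2).sum()     # β̂1 = Σxiyi / Σxi²
    b0 = Y.mean() - b1 * X.mean()           # β̂0 = Ȳ - β̂1X̄
    return b0, b1

X = np.array([10, 12, 15, 18, 20, 23, 25, 28, 30, 35])
Y = np.array([7, 8, 11, 13, 14, 18, 19, 21, 22, 25])
print(ols_simple(X, Y))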
Property 2: Unbiasedness
If you look at the regression equation, you will find an error
term associated with the regression equation that is estimated.
This makes the dependent variable also random. If an
estimator uses the dependent variable, then that estimator
would also be a random number. Therefore, before describing
what unbiasedness is, it is important to mention that
unbiasedness property is a property of the estimator and not of
any sample.
Unbiasedness is one of the most desirable properties of any
estimator. The estimator should ideally be an unbiased
estimator of true parameter/population values.
Consider a simple example: Suppose there is a population of size 1000, and you are taking samples of 50 from this population to estimate the population parameters. Every time you take a sample, it will have a different set of 50 observations and, hence, you will estimate different values of β̂0 and β̂1. The unbiasedness property of the OLS method says that when you take samples of 50 repeatedly, then over many repeated attempts the average of all the β̂0 and β̂1 from the samples will equal the actual (or population) values of β0 and β1.
In short, E(β̂0) = β0 and E(β̂1) = β1: the OLS estimators are unbiased.
Property 3: Minimum variance
The best estimator is one that is both unbiased and has the least variance. In short:
1. If the estimator is unbiased but doesn't have the least variance - it's not the best!
2. If the estimator has the least variance but is biased - it's again not the best!
3. If the estimator is both unbiased and has the least variance - it's the best estimator.
Now, talking about OLS: OLS estimators have the least variance among the class of all linear unbiased estimators (they are BLUE). This property is less strict than the efficiency property, which requires least variance among all unbiased estimators; OLS estimators have the least variance among all linear and unbiased estimators.
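The repeated-sampling idea behind unbiasedness can be illustrated with a small Monte Carlo sketch in Python (all numbers below, including the "true" β0 = 2 and β1 = 0.5, are assumptions chosen for illustration):

import numpy as np

rng = np.random.default_rng(0)
beta0, beta1 = 2.0, 0.5                      # true population parameters
population_x = rng.uniform(10, 50, size=1000)

estimates = []
for _ in range(5000):                        # repeated samples of size 50
    idx = rng.choice(1000, size=50, replace=False)
    X = population_x[idx]
    Y = beta0 + beta1 * X + rng.normal(0, 2, size=50)  # add error u
    x, y = X - X.mean(), Y - Y.mean()
    estimates.append((x * y).sum() / (x ** 2).sum())

# The average of the sample estimates approaches the true β1 = 0.5
print(np.mean(estimates))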
That is,
r² = ESS/TSS
Or in other words,
r² = 1 - (RSS/TSS) since ESS + RSS = TSS
1.2.8.1 t test
To test H0: β1 = β1,0 against the two-sided alternative H1: β1 ≠ β1,0, we compute the statistic
t0 = (β̂1 - β1,0) / se(β̂1)
and reject H0 if t0 > tα/2,n−2 or t0 < −tα/2,n−2, where tα/2,n−2 and −tα/2,n−2 are the critical values for the two-sided hypothesis. tα/2,n−2 is the percentile of the t distribution corresponding to a cumulative probability of (1−α/2), and α is the significance level.
If the value of β1,0 is zero, then the hypothesis tests for the
significance of regression. In other words, the test indicates if
the fitted regression model is significant in explaining
variations in the observations or if you are trying to impose a
regression model when no true relationship exists
between x and Y. Failure to reject H0:β1=0 implies that no
linear relationship exists between x and Y. This result may be
obtained when the scatter plot of Y against x shows no discernible linear pattern, as in Figure 1.11.
Figure 1.11: Scatter patterns indicating no linear relationship between x and Y
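A hedged Python sketch of this two-sided t test (the helper name t_test_slope and the default α = 0.05 are illustrative choices, not from the text):

import numpy as np
from scipy import stats

def t_test_slope(X, Y, beta1_null=0.0, alpha=0.05):
    n = len(X)
    x, y = X - X.mean(), Y - Y.mean()
    b1 = (x * y).sum() / (x ** 2).sum()
    b0 = Y.mean() - b1 * X.mean()
    resid = Y - b0 - b1 * X
    s2 = (resid ** 2).sum() / (n - 2)           # σ̂² with n-2 df
    se_b1 = np.sqrt(s2 / (x ** 2).sum())        # se(β̂1)
    t0 = (b1 - beta1_null) / se_b1
    t_crit = stats.t.ppf(1 - alpha / 2, n - 2)  # tα/2,n−2
    return t0, t_crit, abs(t0) > t_crit         # True => reject H0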
1.2.8.2 F test
F-test is any statistical test in which the test statistic follows
an F-distribution under the null hypothesis. It is most often
used when comparing statistical models that have been fitted to
a data set, in order to identify the model that best fits
the population from which the data were sampled. Exact "F-
tests" mainly arise when the models have been fitted to the
data using least squares. The name was coined by George W.
Snedecor, in honour of Sir Ronald A. Fisher. Fisher initially
developed the statistic as the variance ratio in the 1920s.
Common examples of the use of F-tests include the study of
the following cases:
For checking the overall significance of the fitted
regression model.
The hypothesis that the means of a given set of normally
distributed populations, all having the same standard
deviation, are equal. This is perhaps the best-known F-test,
and plays an important role in the analysis of
variance (ANOVA).
The hypothesis that a proposed regression model fits
the data well. See Lack-of-fit sum of squares.
The hypothesis that a data set in a regression
analysis follows the simpler of two proposed linear models
that are nested within each other.
In addition, some statistical procedures, such as Scheffé's
method for multiple comparisons adjustment in linear models,
also use F-tests.
In the simple linear regression model we use the F test for testing the overall significance of the model. The F test of overall significance tests H0: β1 = 0 and, with a single regressor, is equivalent to the two-sided t test, since F = t².
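As a sketch of that equivalence, the following Python function computes the overall F statistic for a simple regression; with one regressor it equals the square of the slope's t statistic (function name and layout are illustrative):

import numpy as np
from scipy import stats

def f_test_regression(X, Y, alpha=0.05):
    """Overall-significance F test for the simple model Y = b0 + b1*X."""
    n = len(X)
    x, y = X - X.mean(), Y - Y.mean()
    b1 = (x * y).sum() / (x ** 2).sum()
    ess = b1 ** 2 * (x ** 2).sum()          # explained sum of squares
    rss = (y ** 2).sum() - ess              # residual sum of squares
    F = (ess / 1) / (rss / (n - 2))         # 1 and n-2 degrees of freedom
    F_crit = stats.f.ppf(1 - alpha, 1, n - 2)
    return F, F_crit, F > F_crit            # True => model is significant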
Module II:
Multiple Regression Analysis
So far, we have seen the concept of simple linear regression
where a single independent/predictor variable X was used to
model the dependent/response variable Y. Practically, there
will be more than one independent variable that influences the
response variable. Multiple regression models thus predict how
a single response variable Y depends linearly on a number of
predictor variables. Examples:
• The selling price of a house can depend on the desirability
of the location, the number of bedrooms, the number of
bathrooms, the year the house was built, the square footage
and a number of other factors.
• The height of a child can depend on the height of the
mother, the height of the father, nutrition, and
environmental factors.
That is, we use the adjective "simple" to denote that our model has only one predictor, and we use the adjective "multiple" to indicate that our model has at least two predictors. The models
have similar "LINE" assumptions. The only real difference is
that whereas in simple linear regression we think of the
distribution of errors at a fixed value of the single predictor,
with multiple linear regressions we have to think of the
distribution of errors at a fixed set of values for all the
predictors. The model-checking procedures we learned earlier remain useful in the multiple linear regression framework, although the process becomes more involved since we now have multiple predictors. A population model for a multiple linear regression model that relates a y-variable to k − 1 predictor variables can be written as,
PRF: Yi = β1 + β2X2i + β3X3i + ... + βkXki + ui
and the corresponding sample regression function is,
SRF: Yi = β̂1 + β̂2X2i + β̂3X3i + ... + β̂kXki + ûi ..............(5)
From this we have,
ûi = Yi - Ŷi ...............(6)
Σûi² = Σ(Yi - β̂1 - β̂2X2i - ... - β̂kXki)² ...............(7)
The most straightforward procedure to obtain the estimators that minimise the residual sum of squares (RSS) is to differentiate (7) partially with respect to each β̂ and set the resulting normal equations to zero; in matrix notation the solution is b = (X′X)⁻¹X′y.
The goodness of fit is measured by the coefficient of determination,
R² = SSreg / SST = 1 - SSres / SST
Where;
SSres = sum of squares due to residuals,
SST = total sum of squares,
SSreg = sum of squares due to regression.
R² measures the explanatory power of the model, which in turn reflects its goodness of fit. It indicates model adequacy in the sense of how much of the variation in y the explanatory variables account for.
The limits of R2 are 0 and 1, i.e.,
0 ≤ R2 ≤1.
R2 = 0 indicates the poorest fit of the model.
R2 =1 indicates the best fit of the model
R² = 0.95 indicates that 95% of the variation in y is explained by the explanatory variables; in simple words, the model fits the data well. Similarly, any other value of R² between 0 and 1 indicates the degree of adequacy of the fitted model.
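A compact Python sketch of the matrix formula b = (X′X)⁻¹X′y together with R² and, anticipating the next subsection, adjusted R²; the function assumes X already contains a column of ones for the intercept:

import numpy as np

def ols_multiple(X, y):
    """b = (X'X)^(-1) X'y, R² and adjusted R² for y on k parameters."""
    n, k = X.shape                   # X includes the column of ones
    b = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ b
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
    return b, r2, adj_r2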
Adjusted R²
If more explanatory variables are added to the model, then R² increases. Even when the added variables are irrelevant, R² will still increase, so R² by itself is a poor criterion for comparing models. The adjusted R² penalises the addition of regressors:
R̄² = 1 - (1 - R²)(n - 1)/(n - k)
where n is the number of observations and k the number of parameters.
For testing the overall significance of the model we use,
H0: β2 = β3 = ... = βk = 0
Here, β1 = 0 is excluded because that would involve the additional implication that the mean level of y is zero. Our main concern is to know whether the explanatory variables help to explain the variation in y around its mean value or not.
This is an overall or global test of model adequacy. Rejection
of the null hypothesis indicates that at least one of the
explanatory variables among X2, X3,......Xk contributes
significantly to the model. This is tested within the analysis of variance (ANOVA) framework.
Consider, for example, the Cobb-Douglas production function,
Yi = β1 X2i^β2 X3i^β3 e^ui .........(1)
Where,
Yi = output
X2 = labour input
X3 = capital input
Taking the natural logarithm on both sides we have,
ln Yi = ln β1 + β2 ln X2i + β3 ln X3i + ui ln e ....(2)
ln Yi = β0 + β2 ln X2i + β3 ln X3i + ui .........(3)
Where,
β0 = ln β1 and
ln e = 1
The simplest procedure is to estimate equation (3) by running OLS directly. This is called the unrestricted or unconstrained regression.
Now if there are constant returns to scale, economic theory
would suggest that,
β2 + β3 =1......(4)
This is a linear equality restriction.
When equation (3) is estimated subject to this linear equality restriction (equation 4), the procedure is called restricted least squares.
The 't' test approach
We can test the restriction directly with a t test:
t = [(β̂2 + β̂3) - (β2 + β3)] / se(β̂2 + β̂3) ............(5)
or
t = [(β̂2 + β̂3) - 1] / se(β̂2 + β̂3) ...... (6)
Where; β2 + β3 = 1 under the null hypothesis of constant returns to scale.
If the t value computed from equation (6) exceeds the critical t value at the chosen level of significance, we reject the hypothesis of constant returns to scale; otherwise we do not reject it.
The ‘F’ test approach
The 't' test procedure is a kind of post-mortem examination, because we try to find out whether the linear restriction is satisfied only after estimating the unrestricted regression. The F test approach, by contrast, builds the restriction into the estimation: we compare the restricted and unrestricted residual sums of squares using
F = [(RSSR - RSSUR)/m] / [RSSUR/(n - k)]
where m is the number of linear restrictions and k the number of parameters in the unrestricted model.
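A sketch of the restricted-versus-unrestricted F test for constant returns to scale in Python (hypothetical data; the restricted form substitutes β3 = 1 − β2 as described above):

import numpy as np
from scipy import stats

def crs_f_test(lnY, lnX2, lnX3, alpha=0.05):
    """F test of β2 + β3 = 1 via restricted vs unrestricted regressions."""
    n = len(lnY)
    # Unrestricted: lnY on constant, lnX2, lnX3
    Xu = np.column_stack([np.ones(n), lnX2, lnX3])
    bu = np.linalg.lstsq(Xu, lnY, rcond=None)[0]
    rss_ur = ((lnY - Xu @ bu) ** 2).sum()
    # Restricted (β3 = 1 - β2): ln(Y/X3) on constant and ln(X2/X3)
    Xr = np.column_stack([np.ones(n), lnX2 - lnX3])
    br = np.linalg.lstsq(Xr, lnY - lnX3, rcond=None)[0]
    rss_r = ((lnY - lnX3 - Xr @ br) ** 2).sum()
    # m = 1 restriction; k = 3 parameters in the unrestricted model
    F = (rss_r - rss_ur) / (rss_ur / (n - 3))
    return F, stats.f.ppf(1 - alpha, 1, n - 3)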
The Chow test examines whether the true coefficients in two linear regressions on different data sets are equal. It was invented by economist Gregory Chow. In econometrics, the Chow test is
most commonly used in time series analysis to test for the
presence of a structural break. In program evaluation, the
Chow test is often used to determine whether the independent
variables have different impacts on different subgroups of the
population. These two are shown in Figure 2.1
Figure 2.1 Chow Test
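A minimal Python sketch of the Chow test (assuming each sub-sample design matrix already includes the intercept column; names are illustrative):

import numpy as np
from scipy import stats

def rss(X, y):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return e @ e

def chow_test(X1, y1, X2, y2, alpha=0.05):
    """Chow test for a structural break between two sub-samples."""
    k = X1.shape[1]                        # number of parameters
    n1, n2 = len(y1), len(y2)
    rss_pooled = rss(np.vstack([X1, X2]), np.concatenate([y1, y2]))
    rss_separate = rss(X1, y1) + rss(X2, y2)
    F = ((rss_pooled - rss_separate) / k) / (rss_separate / (n1 + n2 - 2 * k))
    return F, stats.f.ppf(1 - alpha, k, n1 + n2 - 2 * k)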
(ii) Bias
Since X is assumed to be nonstochastic and E(ϵ) = 0,
E(b) = E[(X′X)⁻¹X′y] = (X′X)⁻¹X′Xβ + (X′X)⁻¹X′E(ϵ) = β,
so the OLS estimator b is unbiased.
(iv) Variance
The variance of b can be obtained as the sum of the variances of b1, b2, ..., bk, which is the trace of the covariance matrix of b. Thus,
Var(b) = tr[σ²(X′X)⁻¹]
Module III
Econometric Problems
Consider the two-variable model,
Yi = β1 + β2Xi + ui .........(1)
Assume that the true error variance σi² is known; that is, the error variance for each observation is known. Now consider the following transformation of the model,
Yi/σi = β1(1/σi) + β2(Xi/σi) + ui/σi ...... (2)
The transformed error term ui/σi has a constant variance of one, so OLS applied to (2), known as weighted least squares (WLS), is BLUE.
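A short Python sketch of this transformation, i.e., weighted least squares with known σi (illustrative helper name; sigma is the vector of known error standard deviations):

import numpy as np

def wls_known_variance(X, Y, sigma):
    """WLS when σi is known: divide every term of (1) by σi as in (2)."""
    Y_star = Y / sigma
    const_star = 1 / sigma               # transformed intercept regressor
    X_star = X / sigma
    Z = np.column_stack([const_star, X_star])
    b = np.linalg.lstsq(Z, Y_star, rcond=None)[0]
    return b                             # (β̂1, β̂2) with homoscedastic errors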
From Figure 3.3, parts (a) to (d), the errors follow some systematic pattern; hence there is autocorrelation. But part (e) reveals no such pattern, and hence there is no autocorrelation.
Figure 3.3 Patterns of Autocorrelation
Consider, for example, a demand function in which
X2 = price of beef
X3 = consumer income
X4 = price of pork
t = time
After running the regression (with, say, Y = consumption and X = income), the residuals can be plotted against time. Both figures (Figures 3.6 and 3.7) clearly show that the residuals follow some systematic pattern, and hence there is autocorrelation.
2. Runs test
Initially, we have several residuals that are negative, then there
is a series of positive residuals, and then there are several
residuals that are negative. If these residuals were purely
random, could we observe such a pattern? Intuitively, it seems unlikely. The runs test counts the number of runs (uninterrupted sequences of residuals of the same sign) and asks whether that count is too small or too large to be consistent with pure randomness.
3. Durbin-Watson d test
The most celebrated test for detecting serial correlation is based on the Durbin-Watson d statistic,
d = Σ(ût - ût−1)² / Σût²
Now let us define the coefficient of autocorrelation, ρ, which can be estimated with the help of the sample first-order coefficient of autocorrelation,
ρ̂ = Σût ût−1 / Σût²
The d statistic then becomes;
d ≈ 2(1 - ρ̂)
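The d statistic is easy to compute from the residuals, as in this small Python sketch:

import numpy as np

def durbin_watson(resid):
    """d = Σ(ût − ût−1)² / Σût²; roughly d ≈ 2(1 − ρ̂)."""
    diff = np.diff(resid)
    return (diff ** 2).sum() / (resid ** 2).sum()

# d near 2 => no first-order autocorrelation;
# d near 0 => positive, d near 4 => negative autocorrelation.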
3.3 Multicollinearity
In its original, strict sense, multicollinearity refers to an exact linear relationship among the explanatory variables:
λ1X1 + λ2X2 + ... + λkXk = 0 .........(1)
Where λ1, λ2, ..., λk are constants such that not all of them are zero simultaneously.
But the chances of obtaining a sample of values where the regressors are related in this fashion are rare in practice. Today, however, the term multicollinearity is used in a broader sense to include the case of less than perfect multicollinearity, where the X variables are intercorrelated but not perfectly so:
λ1X1 + λ2X2 + ... + λkXk + vi = 0 .........(2)
where vi is a stochastic error term.
Assuming λ2 ≠ 0, equation (1) can be written as,
X2i = -(λ1/λ2)X1i - (λ3/λ2)X3i - ... - (λk/λ2)Xki .............(3)
Equation (3) shows that X2 is exactly linearly related to the other variables. In this situation the coefficient of correlation between X2 and the linear combination on the right side of equation (3) is found to be unity.
But in the case of less than perfect multicollinearity, by assuming λ2 ≠ 0, equation (2) can also be written as,
X2i = -(λ1/λ2)X1i - (λ3/λ2)X3i - ... - (λk/λ2)Xki - (1/λ2)vi .........(4)
Equation (4) shows that X2 is not an exact linear combination of the other Xs, because it is also determined by the stochastic error term vi.
If multicollinearity is perfect, the regression coefficients of the X variables are indeterminate and their standard errors are infinite. If multicollinearity is less than perfect, the regression coefficients, although determinate, have large standard errors (in relation to the coefficients themselves), which means the coefficients cannot be estimated with great precision. In particular:
4. One may obtain a high R² but few significant t ratios. Suppose we test H0: β2 = β3 = ... = βk = 0. The individual t tests may each fail to reject a zero coefficient; but if R² is high, say 0.9, then on the basis of the F test one can reject H0. This is one of the signals of multicollinearity: insignificant 't' values, but a high overall R² and a significant F value.
5. The OLS estimators and their standard errors can be
sensitive to small changes in the data.
3.3.4 Detection of Multicollinearity
Here we are going to discuss how to detect multicollinearity. Multicollinearity is a question of degree and not of kind: the meaningful distinction is not between the presence and the absence of multicollinearity, but between its various degrees. Since multicollinearity refers to the degree of relationship between explanatory variables that are assumed to be non-stochastic, it is a feature of the sample and not of the population. Because it is a sample phenomenon, we do not have one unique method of detecting it or of measuring its strength. But we have some rules of thumb, which are useful all the same. Some of them are;
1. High R2 but few significant t-ratios
This is the classic symptom of multicollinearity. If R² is high (R² > 0.8), the F test will in most cases reject H0 (H0: the β's are jointly zero), but the individual t ratios are insignificant, so each individual H0 is accepted. Although this diagnostic is sensible, its disadvantage is that it is too strong, in the sense that multicollinearity is considered harmful only when all of the influence of the explanatory variables on Y cannot be disentangled.
In the auxiliary-regression approach, Rj² denotes the coefficient of determination from regressing Xj on the remaining explanatory variables; a high Rj² signals that Xj is nearly a linear combination of the others.
The condition index is defined as,
CI = √(maximum eigenvalue / minimum eigenvalue) = √(condition number)
Finally, the variance of an OLS coefficient can be written as,
Var(β̂j) = (σ²/Σxj²) · [1/(1 - Rj²)] = (σ²/Σxj²) · VIFj
The Variance Inflation Factor (VIF) shows the speed with which the variances and covariances of the estimators increase as collinearity rises; VIFj = 1/(1 - Rj²), and values of VIF exceeding 10 are commonly taken to indicate serious multicollinearity.
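VIFj can be computed directly from the auxiliary regressions, as in this Python sketch (X here is the matrix of explanatory variables excluding the constant column; the threshold of 10 in the comment above is only a common rule of thumb):

import numpy as np

def vif(X):
    """VIFj = 1/(1 − Rj²), Rj² from regressing Xj on the other columns."""
    n, k = X.shape                  # X excludes the constant column
    out = []
    for j in range(k):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b = np.linalg.lstsq(Z, y, rcond=None)[0]
        r2 = 1 - ((y - Z @ b) ** 2).sum() / ((y - y.mean()) ** 2).sum()
        out.append(1 / (1 - r2))
    return out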
3.3.5 Remedial Measures
1. A priori information
It is possible that we have some knowledge of the values of one or more parameters from previous empirical work. This knowledge can be profitably utilised in the current sample to reduce multicollinearity.
2. Combining cross sectional and time series data
Another technique to reduce the effect of multicollinearity is to
combine cross-sectional and time series data, that is, pooling
the data.
3. Dropping a variable(s) and specification bias
The simplest remedy when faced with severe multicollinearity is to drop one of the collinear variables; the remaining coefficients then often become significant. But dropping a variable from the model to alleviate multicollinearity may introduce specification bias. Hence the remedy may be worse than the disease in some situations: whereas multicollinearity may prevent precise estimation of the parameters of the model, omitting a variable may seriously mislead us as to the true values of the parameters. Recall that the OLS estimators remain BLUE despite near collinearity.
4. Transformation of the variables
In Economics, we have time series data and we know that one
reason for high multicollinearity between economic variables
is that over time these variables tend to move in the same
direction. Therefore, transforming the model can minimise, if not solve, the problem of collinearity. A commonly used transformation is the first-difference form.
5. Additional or new data
Multicollinearity is a sample feature, not a population problem; obtaining more data (a larger or a different sample) may therefore reduce its severity.
Module IV
Extensions of Two Variables and Dummy
Variable Regression Model
The log-linear model
Consider the model,
ln Yi = α + β2 ln Xi + ui
If the assumptions of the classical linear regression model are fulfilled, OLS applied to this model yields best linear unbiased estimators of α and β2 respectively.
An important feature of the log-linear regression model is that the slope coefficient β2 measures the elasticity of Y with respect to X, that is, the percentage change in Y for a given percentage change in X. Thus, if Y represents the quantity of a commodity demanded and X its unit price, β2 measures the price elasticity of demand, which is of considerable importance in economics.
Two special features of the log-linear model are;
- It is a constant elasticity model, and
- although α̂ and β̂2 are unbiased estimators of α and β2, the parameter β1 of the original multiplicative model, estimated as β̂1 = antilog(α̂), is a biased estimator.
Contrast this with the linear model.
From this,
slope = dY/dX = β2 in the linear model.
In order to obtain the price elasticity we have to multiply the slope by (X/Y). But the question is which values of X and Y should be used. If we take the average values of X and Y (X̄ and Ȳ) for this purpose we have,
E = β2 (X̄ / Ȳ)
But this result generally differs from the price elasticity derived from the log-linear model, which is constant. Therefore we usually rely on the log-linear model for calculating elasticity. The basic difference between a linear model and a log-linear model is that for the log-linear model the slope and the elasticity are the same.
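A minimal Python sketch: the OLS slope of ln Y on ln X is the constant elasticity β2 (hypothetical data assumed):

import numpy as np

def log_log_elasticity(X, Y):
    """Slope of ln(Y) on ln(X) is the (constant) elasticity β2."""
    lx, ly = np.log(X), np.log(Y)
    x, y = lx - lx.mean(), ly - ly.mean()
    return (x * y).sum() / (x ** 2).sum()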
The log-lin model: measuring the growth rate
Suppose Y grows at a compound rate r:
Yt = Y0 (1 + r)^t ..................(1)
Where, r = the compound rate of growth of Y.
Taking the natural logarithm on both sides,
ln Yt = ln Y0 + t ln (1 + r) ...................(2)
Substituting,
β1 = ln Y0 and
β2 = ln (1 + r)
We have,
ln Yt = β1 + β2 t .............(3)
From the estimate of β2, the compound growth rate is
r = antilog(β̂2) - 1 = e^β̂2 - 1
This gives the compound growth rate over the period that we are considering.
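A short Python sketch of this calculation (the series below is constructed to grow at exactly 5% per period, so the function should return approximately 0.05):

import numpy as np

def compound_growth_rate(Y):
    """Fit ln Yt = β1 + β2·t by OLS; then r = exp(β2) − 1."""
    t = np.arange(len(Y), dtype=float)
    ly = np.log(Y)
    x, y = t - t.mean(), ly - ly.mean()
    b2 = (x * y).sum() / (x ** 2).sum()
    return np.exp(b2) - 1

print(compound_growth_rate(100 * 1.05 ** np.arange(10)))   # ≈ 0.05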
The lin-log model
A model in which the regressand (dependent variable) is linear but the regressors are logarithmic is called a lin-log model. A lin-log model can be expressed as;
Yi = β1 + β2 ln (Xi) + ui............ (1)
The lin-log model is used to find the absolute change in the dependent variable for a percentage change in the independent variable, whereas the log-lin model is used to find the percentage growth in the dependent variable for an absolute unit change in the independent variable.
In the lin-log model,
Yi = β1 + β2 ln (Xi) + ui
We can interpret the slope coefficient β2 as,
β2 = Change in Y / Change in ln X
= Change in Y / Relative change in X
That is, a change in the log of a number is a relative change.
Symbolically we have;
β2 = ΔY / (ΔX/X) ...............(2)
That is,
ΔY = β2 (ΔX/X) .......................(3)
Equation (3) states that the absolute change in Y equals β2 times the relative change in X; dividing β̂2 by 100 gives the change in Y for a one per cent change in X.
The reciprocal model
A reciprocal model has the form,
Yt = β1 + β2 (1/Xt) + ut .................(1)
That is, the dependent variable Yt is a function of the reciprocal of the independent variable Xt. This model is non-linear in the variable X, because X enters inversely or reciprocally, but the model is linear in β1 and β2 and is therefore a linear regression model.
The basic feature of a reciprocal model is that as X increases
indefinitely the term β2 (1/X) approaches zero and Y
approaches the limiting or asymptotic value of β1. The
reciprocal models have built in them an asymptote or limit
value that the dependent variable will take when the value of
the X variable increases indefinitely. Some likely shapes of the
curve corresponding to the reciprocal models are shown as in
Figure 4.2.
Figure 4.2 Reciprocal Models
Regression through the origin
There are occasions when the model takes the following form;
Yi = β2 Xi + ui ................(1)
In this model the intercept term is absent or zero, hence the name regression through the origin.
For example, in the Capital Asset Pricing Model (CAPM) of modern portfolio theory, the risk premium may be expressed as;
(ERi – rf ) = βi (ERm – rf) ..............(2)
Where;
ERi = Expected rate of return on security 'i'
ERm = Expected rate of return on the market portfolio
rf = Risk free rate of return
βi = Beta coefficient, a measure of systematic risk
If capital markets work efficiently, then the capital asset pricing model postulates that security i's expected risk premium (ERi − rf) is equal to that security's β coefficient times the expected market risk premium (ERm − rf).
For empirical purposes, equation (2) can be expressed as;
Ri – rf = βi (Rm-rf) + ui ...............(3) or
Ri – rf =αi+ βi (Rm-rf) + ui ...............(4)
Equation (4) is known as the market model. If the capital asset pricing model holds, αi is expected to be zero, and the regression in (3) is a regression through the origin.
4.2 Dummy variable regression model
In regression analysis the dependent variable is frequently influenced not only by variables that can be readily quantified on some well-defined scale (e.g., income, output, prices) but also by variables that are essentially qualitative in nature (e.g., sex, race, religion). Consider, for example, the model,
Yi = α + βDi + ui........................(1)
Where;
Yi = salary of a worker
Di = 1, if male
= 0, if female
In (1), instead of a quantitative X variable we have a dummy variable D. Model (1) enables us to find out whether sex makes any difference in the salary of a worker, holding all other variables such as age, education and years of experience constant. If ui satisfies all the assumptions of the CLRM, we obtain from (1),
Mean salary of a female worker,
E(Yi | Di = 0) = α ....................(2)
Mean salary of a male worker,
E(Yi | Di = 1) = α + β .................(3)
That is, the intercept term α gives the mean salary of a female worker, and the slope coefficient β tells by how much the mean salary of a male worker differs from that of his female counterpart, with α + β giving the mean salary of a male worker.
A test of H0: β = 0, that is, of no sex discrimination, can easily be made by running regression (1) in the usual manner and finding out, on the basis of the 't' test, whether the estimated β is statistically significant.
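A small Python sketch of model (1); with a single dummy regressor, the OLS estimates reduce to the two group means, which the code exploits (names are illustrative):

import numpy as np

def dummy_regression(salary, is_male):
    """Estimate Yi = α + β·Di + ui with Di = 1 for male, 0 for female."""
    D = is_male.astype(float)
    alpha = salary[D == 0].mean()            # α̂ = mean female salary
    beta = salary[D == 1].mean() - alpha     # β̂ = male-female difference
    return alpha, beta

# With a single dummy, OLS reproduces the two group means exactly.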
In most economic research, a regression model contains some
explanatory variables that are quantitative and some that are
qualitative. Regression models containing a mixture of
quantitative and qualitative variables are called Analysis of
Co-Variance (ANCOVA) models. These models can be estimated by OLS in the usual way; if the 't' statistic on the dummy coefficient is significant, we reject the null hypothesis that the male and female workers' mean salary levels are the same.
4.2.3 Dummy variable trap
To distinguish the two categories of a dummy variable, we have introduced only one dummy variable Di. If Di = 1 always denotes a male, then Di = 0 must denote a female, since there are only two possible outcomes. Hence one dummy variable suffices to distinguish two categories. Now suppose instead we specify the model as,
Yi = α1 + α2 D2i + α3 D3i +βXi + ui.............(7)
Where;
Yi = salary of a worker
Xi = Years of experience
D2i = 1, if male
= 0, if female
D3i = 1, if female
= 0, if male
The model (7) cannot be estimated because of perfect
collinearity between D2 and D3. To see this, suppose we have a
sample of three male workers and two female workers as
follows,
           Y     α1 (intercept)   D2   D3   X
Male       Y1         1            1    0   X1
Male       Y2         1            1    0   X2
Female     Y3         1            0    1   X3
Male       Y4         1            1    0   X4
Female     Y5         1            0    1   X5
Since D2 + D3 = 1 for every observation, the two dummies sum exactly to the intercept column of ones: a case of perfect collinearity, known as the dummy variable trap.
Module V
Model Specification and Diagnostic Testing
In this module, we are discussing two important topics related
to regression analysis. These are model specification errors and
Qualitative Response Regression Models.
5.1 Specification Errors
One important assumption of Classical Linear Regression
Model is that the regression model is correctly specified or
there is no specification bias in the chosen regression model.
With this assumption we are estimating the parameters of the
chosen regression model and testing hypotheses about them using R², F, 't', etc. If the tests are satisfactory, the regression model is considered a good fit. If the tests are unsatisfactory, there may be specification errors or bias in the chosen model. The relevant questions include;
- Are some important variables omitted from the model?
- Are some superfluous variables included in the model?
- Is the functional form of the chosen model correct?
- Is the specification of the stochastic error term correct?
- Is there more than one specification error?
If these kinds of specification errors are there, the traditional
econometric methodology used is Average Economic
Regression (AER).
If, for example, the bias results from the omission of variables, the researcher starts adding new variables to the model and tries to locate the relevant ones. The opposite error, including an irrelevant variable, has the following consequences:
1. The OLS estimators of the coefficients of the relevant regressors remain unbiased and consistent, i.e. E(α̂1) = β1, E(α̂2) = β2 and E(α̂3) = β3, while the expected coefficient of the irrelevant variable is zero.
2. The error variance σ² is correctly estimated.
3. The usual confidence interval and hypothesis testing procedures remain valid.
4. The estimated α's will generally be inefficient; that is, their variances will generally be larger than those of the estimators of the true model.
Yi = β1 + β2Xi + β3Ŷi² + β4Ŷi³ + ui .............(2)
- Obtain R² from (1) and (2) as R²old and R²new. Then test the statistical significance of the increase in R² using the F statistic,
F = [(R²new - R²old) / number of new regressors] / [(1 - R²new) / (n - number of parameters in the new model)]
This is Ramsey's RESET test of specification error.
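A hedged Python sketch of this F test (it follows the Ŷ², Ŷ³ augmentation above, in the spirit of Ramsey's RESET; helper names are illustrative):

import numpy as np
from scipy import stats

def r2(Z, Y):
    b = np.linalg.lstsq(Z, Y, rcond=None)[0]
    return 1 - ((Y - Z @ b) ** 2).sum() / ((Y - Y.mean()) ** 2).sum()

def reset_test(X, Y, alpha=0.05):
    """RESET: add powers of the fitted values and F-test the gain in R²."""
    n = len(Y)
    Z_old = np.column_stack([np.ones(n), X])
    b = np.linalg.lstsq(Z_old, Y, rcond=None)[0]
    yhat = Z_old @ b
    Z_new = np.column_stack([Z_old, yhat ** 2, yhat ** 3])  # add Ŷ², Ŷ³
    r2_old, r2_new = r2(Z_old, Y), r2(Z_new, Y)
    # 2 new regressors; 4 parameters in the new model
    F = ((r2_new - r2_old) / 2) / ((1 - r2_new) / (n - 4))
    return F, stats.f.ppf(1 - alpha, 2, n - 4)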
Non-fulfilment of 0 ≤ E(Yi | X) ≤ 1
Since E(Yi | X) in the linear probability model measures the conditional probability of the event Y occurring given X, it must necessarily lie between 0 and 1. Although this is true a priori, there is no guarantee that Ŷi, the estimator of E(Yi | Xi), will fulfil this restriction, and this is the real problem with the OLS estimation of the LPM. There are two ways of finding out whether the estimated Ŷi lie between 0 and 1. One is to estimate the LPM by the usual OLS method and check the estimated Ŷi directly: if some are less than 0 (that is, negative), Ŷi is set to zero for those cases; if they are greater than 1, they are set to 1.
The second procedure is to devise an estimating technique that
will guarantee that the estimated conditional probabilities Ŷi
will lie between 0 and 1. The logit and probit models discussed
later will guarantee that the estimated probabilities will indeed
lie between the logical limits 0 and 1.
5.2.2 Logit and Probit Models
As we have seen, the LPM is plagued by several problems,
such as (1) non-normality of ui, (2) heteroscedasticity of ui, (3)
possibility of Ŷi lying outside the 0–1 range, and (4) the
generally lower R2 values. But these problems are
surmountable. For example, we can use WLS to resolve the
heteroscedasticity problem or increase the sample size to
minimize the non-normality problem. By resorting to restricted
least-squares or mathematical programming techniques we can
even make the estimated probabilities lie in the 0–1 interval.
But even then the fundamental problem with the LPM is that it is not logically a very attractive model, because it assumes that Pi = E(Y = 1 | X) increases linearly with X. What we need is a model in which, as Xi increases, Pi also increases but never steps outside the 0-1 interval, and in which the relationship between Pi and Xi is non-linear. Such an S-shaped curve is provided by a cumulative distribution function (CDF); the two most commonly chosen CDFs are the logistic and the normal. The former gives rise to the logit model and the latter to the probit (or normit) model.
THE LOGIT MODEL:
Consider again the linear probability model,
Pi = E(Y = 1 | Xi) = β1 + β2Xi ....................(1)
Where X is income and Y = 1 means the family owns a house.
But now consider the following representation of home ownership:
Pi = 1 / (1 + e^-(β1 + β2Xi)) ....................(2)
which can be written as,
Pi = 1 / (1 + e^-Zi) = e^Zi / (1 + e^Zi) ....................(3)
Where Zi = β1 + β2Xi.
Equation (3) represents what is known as the cumulative logistic distribution function.
It is easy to verify that as Zi ranges from -∞ to +∞, Pi ranges between 0 and 1, and that Pi is non-linearly related to Zi (i.e., to Xi), thus satisfying the two requirements considered earlier. But in satisfying these requirements we seem to have created an estimation problem, because Pi is non-linear not only in X but also in the β's, as can be seen clearly from (2). This means that we cannot use the familiar OLS procedure to estimate the parameters. The problem is more apparent than real, however, because (2) can be linearised, as follows.
If Pi, the probability of owning a house, is given by (3), then (1 - Pi), the probability of not owning a house, is
1 - Pi = 1 / (1 + e^Zi) ........................(4)
Therefore the odds ratio in favour of owning a house is,
Pi / (1 - Pi) = e^Zi ........................(5)
Taking the natural log of (5) we obtain,
Li = ln[Pi / (1 - Pi)] = Zi = β1 + β2Xi ........................(6)
That is, L, the log of the odds ratio, is not only linear in X but also (from the estimation viewpoint) linear in the parameters. L is called the logit, and hence the name logit model for models like (6). Notice these features of the logit model.
1. As P goes from 0 to 1 (i.e. as Z varies from - ∞ to +
∞), the logit L goes from - ∞ to + ∞. That is, although the
probabilities (of necessity) lie between 0 and 1, the logits are
not so bounded.
2. Although L is linear in X, the probabilities themselves are not. This property is in contrast with the LPM (1), where the probabilities increase linearly with X.
3. Although we have included only a single X variable,
or regressor, in the preceding model, one can add as many
regressors as may be dictated by the underlying theory.
4. If L, the logit, is positive, it means that when the value of the regressor(s) increases, the odds that the regressand equals 1 (that the event occurs) increase; if L is negative, the odds decrease as X increases.
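The logit model itself is estimated by maximum likelihood rather than OLS; one common way is statsmodels, sketched below on simulated data (the "true" β1 = -4 and β2 = 0.08 are assumptions chosen for illustration):

import numpy as np
import statsmodels.api as sm

# Hypothetical data: income (X) and home ownership (Y = 1 if owns).
rng = np.random.default_rng(1)
income = rng.uniform(10, 100, size=200)
z = -4.0 + 0.08 * income                       # Zi = β1 + β2·Xi
owns = rng.binomial(1, 1 / (1 + np.exp(-z)))   # Pi from the logistic CDF

X = sm.add_constant(income)
result = sm.Logit(owns, X).fit(disp=0)
print(result.params)     # estimates of β1, β2 on the logit (log-odds) scale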
THE PROBIT MODEL:
In the probit model the logistic CDF is replaced by the normal CDF, F. The probability of owning a house is modelled as
Pi = F(Ii) = F(β1 + β2Xi) ......(3)
so that
Ii = F⁻¹(Pi) .........(4)
Where F⁻¹ is the inverse of the normal CDF and Ii = β1 + β2Xi is an unobservable index. What all this means can be made clear from the accompanying figure: in panel (a) we obtain from the ordinate the (cumulative) probability of owning a house given I* ≤ Ii, whereas in panel (b) we obtain from the abscissa the value of Ii given the value of Pi, which is simply the reverse of the former.
In the logit model the dependent variable is the log of the odds ratio, whereas in the probit model it is the inverse-normal transform of the probability; in practice the two models usually yield very similar conclusions.