Econometrics CH 1-4
May, 2020
Woliso, Ethiopia
CHAPTER ONE
from machine C.
β1 is the mean difference in output associated with machine A.
β2 is the mean difference in output associated with machine B.
Exercise: Interpret the following model
$Y_i = \beta_0 + \beta_1\,\text{Gender}_i + \beta_2\,\text{Institution}_i + u_i$
where
Gender = 1 if male, 0 if female
Institution = 1 if government, 0 if private
1.2.2 ANOVA Analysis
Example 2: Wage differential between males and females
Two possible ways: a male or a female dummy.
1. Define a male dummy (male = 1 & female = 0).
reg wage male
Result: Yi = 9.45 + 172.84*D + ûi
p-value: (0.000) (0.000)
Interpretation: the monthly wage of a male worker is, on
average, $172.84 higher than that of a female worker.
This difference is significant at the 1% level.
2. Define a female dummy (female = 1 & male = 0)
reg wage female
Result: Yi = 182.29 – 172.84*D + ûi
p-value: (0.000) (0.000)
Interpretation: ?
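The two regressions above carry the same information. A minimal sketch, using only the coefficients reported in the example, shows how the female-dummy results follow from the male-dummy results (the variable names are illustrative):

```python
# Coefficients reported for the male-dummy regression (male = 1, female = 0).
intercept_male_dummy = 9.45     # mean wage of the base group (females)
slope_male_dummy = 172.84       # mean male wage minus mean female wage

# Group means implied by that regression.
mean_female = intercept_male_dummy
mean_male = intercept_male_dummy + slope_male_dummy

# With a female dummy (female = 1, male = 0) males become the base group,
# so the intercept is the male mean and the slope changes sign.
intercept_female_dummy = mean_male
slope_female_dummy = mean_female - mean_male

print(round(intercept_female_dummy, 2), round(slope_female_dummy, 2))
```

The choice of base group changes the intercept and the sign of the dummy coefficient, but not the estimated wage gap or its significance.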
1.2.3 Analysis of Covariance (ANCOVA)
In ANOVA models, the regressors are all exclusively dummy, or
qualitative, in nature. ANCOVA, by contrast, is regression on a
mixture of qualitative and quantitative
independent variables.
A. Single dummy independent variable
Example: Suppose gender and level of education are the only observed
factors that affect the wage of a given employee.
$\text{wage}_i = \beta_0 + \beta_1\,\text{gend}_i + \beta_2\,\text{educ}_i + u_i$
where
gend = 1 if female, 0 if male
wage = wage rate of the individual
educ = level of education
Interpretation
$\beta_2$ measures the slope: the change in the wage for one more
unit of education, for either gender.
$\beta_1$ is the difference in hourly wage between females
and males, given the amount of education. Hence,
the sign and significance of this coefficient indicate whether
there is discrimination against women.
If $\beta_1 < 0$, then for the same level of education,
women earn less than men, on average.
If we assume the zero conditional mean
assumption, $E(u \mid \text{gend}, \text{educ}) = 0$, then:
$\beta_1 = E(\text{wage} \mid \text{gend} = 1, \text{educ}) - E(\text{wage} \mid \text{gend} = 0, \text{educ})$
Key: the level of education is the same for both
individuals; the difference, $\beta_1$, is due to gender only.
The intercept for males is $\beta_0$.
Interpretation
Mean salary of a female college professor:
$E(Y \mid D = 1, \text{educ}) = (\beta_0 + \beta_1) + \beta_2\,\text{educ}$
If a qualitative variable has more than one category,
the choice of the benchmark category is strictly up to
the researcher.
There is a way to avoid the dummy variable trap: introduce as
many dummy variables as there are categories of
that variable, provided we do not include the intercept
(constant term) in such a model.
$Y_i = \beta_1 D_{1i} + \beta_2 D_{2i} + \beta_3 X_i + u_i$
When we run the regression, we use the no-intercept option of the
regression package (suppressed intercept).
i. Dummy Variables for Multiple Categories: No Intercept Case
If there is no intercept, there is no comparison (base) group
and no category is omitted.
$\hat{Y}_i = 13{,}124\,D_{1i} + 12{,}244\,D_{2i} + 10{,}453\,D_{3i}$
se = (546) (425) (462)
t = (13) (14) (12)
$R^2 = 0.2546$
where Y is the salary of teachers and
D1 = 1 if west, 0 otherwise
D2 = 1 if north, 0 otherwise
D3 = 1 if south, 0 otherwise
Interpretation
β1 = the mean salary of teachers in the west = 13,124
β2 = the mean salary of teachers in the north = 12,244
β3 = the mean salary of teachers in the south = 10,453
ii. Dummy Variables for Multiple Categories: Case
when the constant term is present
Redo the above example, now taking west
as the base category.
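With only dummy regressors, the coefficients of the intercept version can be worked out from the group means of the no-intercept version. The sketch below uses the three estimated means quoted above; the function name is illustrative, and the mapping holds because each coefficient in the no-intercept regression is a group mean:

```python
# Group means estimated by the no-intercept regression above.
group_means = {"west": 13124, "north": 12244, "south": 10453}

def to_intercept_form(means, base):
    """Return (intercept, differentials) when `base` is the omitted category."""
    intercept = means[base]  # mean of the base group
    differentials = {g: m - intercept for g, m in means.items() if g != base}
    return intercept, differentials

intercept, differentials = to_intercept_form(group_means, "west")
print(intercept)        # mean salary in the west
print(differentials)    # north and south relative to the west
```

Under these figures the intercept is the west mean, and the north and south dummies carry negative coefficients because both regions have lower mean salaries than the west.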
β 1 = ( β 0 + β 1 + β 2 educ ) − ( β 0 + β 2 educ )
$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \beta X_i + u_i$
where
Yi = annual expenditure on clothing
Xi = income
D2 = 1 if female, 0 if male
D3 = 1 if college graduate, 0 otherwise
• In many applications there may be interaction between the two
qualitative variables D2 and D3, so their effect on mean Y
may not be simply additive. The interaction is captured by adding
the product of the two dummies:
$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \alpha_4 (D_{2i} D_{3i}) + \beta X_i + u_i$
$E(Y_i \mid D_2 = 1, D_3 = 1, X_i) = (\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4) + \beta X_i$
α2 = differential effect of being a female
α3 = differential effect of being a college graduate
α4 = differential effect of being a female graduate
* The above equation shows that the mean clothing expenditure
of graduate females differs by α4 from what the separate female
and college graduate differentials would add up to.
* If α2, α3 and α4 are all positive, the average clothing expenditure
of females is higher (than the base category, which here is
male non-graduate), and it is higher still if the females also
happen to be graduates.
1.2.4 Interactions among Dummy Variables
1.2.5 Slope indicator variables
If we assume that house location affects both the
intercept and the slope, then both effects can be
incorporated into a single model.
The model specification will be:
$\text{prhou}_i = \beta_0 + \psi\,\text{neib}_i + \beta_1\,\text{nbdr}_i + \omega\,(\text{nbdr}_i \times \text{neib}_i) + u_i$
$E(\text{prhou}) = (\beta_0 + \psi) + (\beta_1 + \omega)\,\text{nbdr}$ when neib = 1 (D = 1)
$E(\text{prhou}) = \beta_0 + \beta_1\,\text{nbdr}$ when neib = 0 (D = 0)
X = [constant X D1 D2]
$X = \begin{bmatrix} 1 & X_{11} & 1 & 0 \\ 1 & X_{12} & 1 & 0 \\ 1 & X_{13} & 0 & 1 \end{bmatrix}$
1.4 Structural Stability
2. Chow’s test
One approach for testing the presence of structural change
(structural instability) is by means of Chow’s test. The steps
involved in this procedure:
Step 1: Estimate the regression equation for the whole period
(pre-reform plus post-reform periods) and find the restricted error sum of
squares ($RSS_R$, also written $ESS_R$).
Step 2: Estimate the equation using the data available for the pre-reform
period (say, of size $n_1$), and find the error sum of squares $RSS_1$ (or $ESS_1$).
Step 3: Estimate the equation using the data available for the post-reform
period (say, of size $n_2$), and find the error sum of squares $RSS_2$ (or $ESS_2$).
Step 4: Calculate the unrestricted error sum of squares, $RSS_U = RSS_1 + RSS_2$.
Step 5: Calculate the Chow test statistic
$F_c = \dfrac{(RSS_R - RSS_U)/k}{RSS_U/(n_1 + n_2 - 2k)}$
where k is the number of estimated regression coefficients in each regression.
$F_\alpha(k,\, n_1 + n_2 - 2k)$ is the critical value from the
F-distribution with k (in our case k = 2) and $n_1 + n_2 - 2k$
degrees of freedom at a given significance level α.
Decision rule: Reject the null hypothesis of
identical intercepts and slopes for the pre-reform
and post-reform periods, that is,
$H_0: \beta_0 = \beta_3 \text{ and } \beta_2 = \beta_4$,
if $F_c > F_\alpha(k,\, n_1 + n_2 - 2k)$.
That is, rejecting $H_0$ means there is a structural
change.
Example: $RSS_1$ = 64,499,436.865 (error sum of
squares in the pre-reform period); $n_1$ = 12;
$RSS_2$ = 2,726,652,790.434 (error sum of squares in
the post-reform period); $n_2$ = 11;
$RSS_R$ = 13,937,337,067.461 (error sum of squares
for the whole period)
$RSS_U = RSS_1 + RSS_2$ = 2,791,152,227.299
The test statistic is:
$F_c = \dfrac{(RSS_R - RSS_U)/k}{RSS_U/(n_1 + n_2 - 2k)} = \dfrac{(13{,}937{,}337{,}067.461 - 2{,}791{,}152{,}227.299)/2}{2{,}791{,}152{,}227.299/(12 + 11 - 2(2))} \approx 37.9$
The tabulated value from the F-distribution with 2
and 19 degrees of freedom at the 5% level of
significance is 3.52.
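The five steps of the Chow test reduce to a few lines of arithmetic once the three error sums of squares are known. The sketch below plugs in the figures from this example; the function name is illustrative, and the critical value 3.52 is the tabulated one quoted above. With these inputs the ratio evaluates to roughly 37.9, well above the critical value.

```python
def chow_f(rss_restricted, rss_1, rss_2, n1, n2, k):
    """Chow F statistic and its denominator degrees of freedom."""
    rss_unrestricted = rss_1 + rss_2            # Step 4
    df_denominator = n1 + n2 - 2 * k
    numerator = (rss_restricted - rss_unrestricted) / k
    denominator = rss_unrestricted / df_denominator
    return numerator / denominator, df_denominator

f_stat, df2 = chow_f(
    rss_restricted=13_937_337_067.461,  # whole period
    rss_1=64_499_436.865,               # pre-reform period
    rss_2=2_726_652_790.434,            # post-reform period
    n1=12, n2=11, k=2,
)
critical_value = 3.52                   # F(2, 19) at the 5% level, from the text
reject_h0 = f_stat > critical_value
print(round(f_stat, 2), df2, reject_h0)
```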
Decision: Since the calculated value of F exceeds
the tabulated value, we reject the null hypothesis of
identical intercepts and slopes for the pre-reform
and post reform periods at the 5% level of
significance.
Hence, we can conclude that there is a structural
break.
Drawback:
Chow's test does not tell us whether the difference
(change) is in the slope only, in the intercept only, or
in both the intercept and the slope.
2.2 "Dummy dependent variable": Qualitative Response Model
2.2.1 Introduction
A qualitative response model describes situations in
which the dependent variable in a regression
equation represents a discrete choice and
takes only a limited number of values.
Such a model is also called a
Limited dependent variable model
Discrete dependent variable model
Qualitative response model
For example,
Yi = 1 if the i th individual is working, 0 if the i th individual is not working,
where i = 1, 2, …, n.
Qualitative choice analysis
The independent variables (called factors) that are expected to
affect an individual’s choice may be X1 = age, X2 = marital
status, X3 = gender, X4 = education, and the like.
These are represented by a matrix X.
Regression Approach
The economic interpretation of discrete choice models is
typically based on the principle of utility maximization leading to
the choice of, say, A over B if the utility of A exceeds that of B.
Let U1 be the utility from working/seeking work and let U0 be
the utility from not working. Then an individual will choose to
be part of the labour force if U1 -U0 > 0 , and this decision
depends on a number of factors X.
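The probability that the i th individual chooses
alternative 1 (i.e. works), given his/her individual
characteristics $X_i$, is:
$P_i = \Pr(Y_i = 1 \mid X_i) = \Pr[(U_1 - U_0)_i > 0] = G(X_i, \beta)$
$Y_i = P_i + \varepsilon_i$, so that $\varepsilon_i = Y_i - P_i$
With the logistic form, $P = P(Y = 1 \mid X) = \dfrac{e^{X\beta}}{1 + e^{X\beta}}$, and
the non-response probability $P(Y = 0 \mid X)$ is
evaluated as:
$1 - P = P(Y = 0 \mid X) = 1 - \dfrac{e^{X\beta}}{1 + e^{X\beta}} = \dfrac{1}{1 + e^{X\beta}}$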
Note that both the response and non-response
probabilities lie in the interval [0, 1] and
hence are interpretable.
Odds ratio: the ratio of the response probability
($P_i$) to the non-response probability ($1 - P_i$).
Logit model
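For the logit model, the odds ratio is given by:
$\dfrac{P}{1-P} = \dfrac{P(Y=1 \mid X)}{P(Y=0 \mid X)} = \dfrac{e^{X\beta}/(1+e^{X\beta})}{1/(1+e^{X\beta})} = e^{X\beta} = e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k} = e^{\beta_0}\, e^{\beta_1 X_1}\, e^{\beta_2 X_2} \cdots e^{\beta_k X_k}$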
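A small numerical sketch can confirm that the odds collapse to e^{Xβ}, which factors into a product (not a sum) of exponentials. The coefficients and regressor values below are hypothetical and chosen only for illustration:

```python
import math

def response_probability(xb):
    """P(Y = 1 | X) under the logistic form e^{Xb} / (1 + e^{Xb})."""
    return math.exp(xb) / (1.0 + math.exp(xb))

def odds(xb):
    """Ratio of the response to the non-response probability."""
    p = response_probability(xb)
    return p / (1.0 - p)

# Hypothetical values, not taken from the text.
beta = [-1.0, 0.5, 0.2]   # beta_0, beta_1, beta_2
x = [1.0, 2.0, 3.0]       # constant, X_1, X_2
xb = sum(b * v for b, v in zip(beta, x))   # the index X*beta

# Product of the individual exponentials e^{beta_j X_j}.
product_of_exponentials = 1.0
for b, v in zip(beta, x):
    product_of_exponentials *= math.exp(b * v)

print(response_probability(xb), odds(xb), math.exp(xb), product_of_exponentials)
```

The last three printed numbers agree up to rounding, and the two probabilities always sum to one.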
$Y_i = \begin{cases} 1 & \text{if } Y_i^* > 0 \\ 0 & \text{if } Y_i^* \le 0 \end{cases}$
Probit model
The latent variable Y* is continuous (-∞ < Y* < ∞).
It generates the observed binary variable Y.
An observed variable, Y can be observed in two
states:
if an event occurs it takes a value of 1
if an event does not occur it takes a value of 0
The latent variable is assumed to be a linear
function of the observed X’s through the
structural model.
However, since the latent dependent variable is
unobserved, the model cannot be estimated using
OLS. Maximum likelihood can be used instead.
Maximization of the likelihood function for
either the probit or the logit model is
accomplished by nonlinear estimation methods.
Most often, the choice is between normal errors
and logistic errors, resulting in the probit
(normit) and logit models, respectively.
The coefficients derived from the maximum
likelihood (ML) function will be the coefficients
for the probit model, if we assume a normal
distribution.
If we assume that the appropriate distribution
of the error term is a logistic distribution, the
coefficients that we get from the ML function
will be the coefficient of the logit model.
In both cases, as with the LPM, it is assumed
that $E[\varepsilon_i \mid X_i] = 0$.
In the probit model, it is assumed that
$Var(\varepsilon_i \mid X_i) = 1$; in the logit model, it is assumed that
$Var(\varepsilon_i \mid X_i) = \pi^2/3$.
Hence, the estimates of the parameters (β ’s)
from the two models are not directly
comparable.
But as Amemiya suggests, a logit estimate of a
parameter multiplied by 0.625 gives a fairly
good approximation of the probit estimate of the
same parameter.
Similarly the coefficients of LPM and logit
models are related as follows:
$\beta_{LPM} \approx 0.25\,\beta_{Logit}$, except for the intercept
$\beta_{LPM} \approx 0.25\,\beta_{Logit} + 0.5$ for the intercept
The standard normal cdf has a shape very
similar to that of the logistic cdf.
The estimating model that emerges from the
normal CDF is popularly known as the probit
model, although sometimes it is also known as
the normit model.
Note that both the probit and the logit models
are estimated by Maximum Likelihood
Estimation.
Interpreting the Probit and Logit Model Estimates
The coefficients give the signs of the partial effects of
each Xj on the response probability, and the statistical
significance of Xj is determined by whether we can
reject $H_0: \beta_j = 0$ at a sufficiently small significance level.
However, the magnitude of the estimated parameters (dZ/dX)
has no particular interpretation. We care about
the magnitude of dProb(Y)/dX.
From the computer output for a probit or logit
estimation, you can interpret the statistical significance
and sign of each coefficient directly.
In the linear regression model, the slope
coefficient measures the change in the average
value of the regressand for a unit change in the
value of a regressor, with all other variables
held constant.
In the LPM, the slope coefficient measures
directly the change in the probability of an event
occurring as the result of a unit change in the
value of a regressor, with the effect of all other
variables held constant.
In the logit model the slope coefficient of a
variable gives the change in the log of the odds
associated with a unit change in that variable,
again holding all other variables constant.
But as noted previously, for the logit model the
rate of change in the probability of an event
happening is given by βj Pi(1 − Pi ), where βj is
the (partial regression) coefficient of the jth
regressor. But in evaluating Pi , all the variables
included in the analysis are involved.
In the probit model, as we saw earlier, the rate of
change in the probability is somewhat complicated
and is given by βj f (Zi ), where f (Zi) is the density
function of the standard normal variable and Zi =
β1 + β2X2i + · · · +βkXki , that is, the regression
model used in the analysis.
Thus, in both the logit and probit models all the
regressors are involved in computing the changes in
probability, whereas in the LPM only the jth
regressor is involved. This difference may be one
reason for the early popularity of the LPM model.
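The two rate-of-change formulas above, βj Pi(1 − Pi) for the logit and βj f(Zi) for the probit, can be evaluated directly. The sketch below is illustrative only: the coefficient value and the index values Z are hypothetical, and the function names are not from any package.

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def standard_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def logit_marginal_effect(beta_j, z):
    """beta_j * P * (1 - P), with P evaluated at the index z."""
    p = logistic_cdf(z)
    return beta_j * p * (1.0 - p)

def probit_marginal_effect(beta_j, z):
    """beta_j * f(z), with f the standard normal density."""
    return beta_j * standard_normal_pdf(z)

beta_j = 0.8  # hypothetical coefficient
for z in (-2.0, 0.0, 2.0):
    print(z, logit_marginal_effect(beta_j, z), probit_marginal_effect(beta_j, z))
```

Because the index Z depends on every regressor, the same coefficient produces a different change in probability at different points, and the effect is largest at Z = 0, where the logit factor P(1 − P) equals 0.25.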
Probit vs logit model
Is the logit or the probit model preferable?
In most applications the models are quite similar, the main
difference being that the logistic distribution has slightly
fatter tails.
That is to say, the conditional probability Pi approaches
zero or one at a slower rate in logit than in probit.
Therefore, there is no compelling reason to choose one over
the other.
In practice many researchers choose the logit model
because of its comparative mathematical simplicity.
The probit and logit models differ in the
specification of the distribution of the error term
u.
The difference between this specification and
the linear probability model is that in the linear
probability model we analyse the dichotomous
variable as it is, whereas here we assume the
existence of an underlying latent variable for
which we observe a dichotomous realization.
The probit model and the logit model are not
directly comparable. The reason is that,
although the standard logistic (the basis of logit)
and the standard normal distributions (the basis
of probit) both have a mean value of zero, their
variances are different: 1 for the standard
normal (as we already know) and $\pi^2/3$ for the
logistic distribution, where π ≈ 22/7.
Therefore, if you multiply the probit coefficient
by about 1.81 (which is approximately $\pi/\sqrt{3}$),
you will get approximately the logit coefficient.
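The scaling factors quoted in these slides can be checked with a few lines of arithmetic. The sketch below only evaluates the numbers; the variable names are illustrative. It shows that the variance-based factor π/√3 and Amemiya's 0.625 rule are two different rules of thumb that give somewhat different conversions.

```python
import math

# Standard deviation of the standard logistic distribution: sqrt(pi^2 / 3).
logistic_sd = math.pi / math.sqrt(3.0)

# Variance-based conversion: logit coefficient ~ logistic_sd * probit coefficient.
probit_from_logit_variance_rule = 1.0 / logistic_sd

# Amemiya's rule of thumb quoted earlier: probit ~ 0.625 * logit (0.625 = 1 / 1.6).
probit_from_logit_amemiya = 0.625

# The rough approximation pi ~ 22/7 changes the factor only in the third decimal.
logistic_sd_with_22_over_7 = (22.0 / 7.0) / math.sqrt(3.0)

print(round(logistic_sd, 4), round(probit_from_logit_variance_rule, 4),
      probit_from_logit_amemiya, round(logistic_sd_with_22_over_7, 4))
```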
The R2’s for the linear probability model are
significantly lower than those for the logit and
probit models. Alternative ways of comparing
the models would be:
To calculate the sum of squared deviations
from predicted probabilities
To compare the percentages correctly
predicted
To look at the derivatives of the probabilities
with respect to a particular independent
variable.
Tobit Model
An extension of the probit model is the tobit model
developed by James Tobin.
Let us consider the home ownership example.
Suppose we want to find out the amount of money the
consumer spends in buying a house in relation to his or
her income and other economic variables.
If a consumer does not purchase a house, obviously we
have no data on housing expenditure for such
consumers; we have such data only on consumers who
actually purchase a house.
Thus, consumers are divided into two groups, one
consisting of say, N1 consumers about whom we have
information on the regressors (say income, interest rate
etc.) as well as the regressand (amount of expenditure
on housing) and another consisting of, say, N2
consumers about whom we have information only on
the regressors but not on the regressand.
A sample in which information on regressand is
available only for some observations is known as a
censored sample. Therefore, the tobit model is also
known as a censored regression model.
Mathematically, we can express the tobit model
as
$Y_i = \begin{cases} \beta_0 + \beta_1 X_{1i} + u_i & \text{if RHS} > 0 \\ 0 & \text{otherwise} \end{cases}$
where RHS = right-hand side.
The method of maximum likelihood can be used
to estimate the parameters of such models.
Measuring goodness of fit
The conventional measure of goodness of fit, $R^2$, is
not particularly meaningful in binary
regressand models.
Measures similar to $R^2$, called pseudo $R^2$, are
available, and there are a variety of them.
Measures based on likelihood ratios
Let $L_{UR}$ be the maximum of the likelihood function when
maximized with respect to all the parameters and $L_R$ be
the maximum of the likelihood function when maximized
with the restrictions $\beta_i = 0$.
$R^2 = 1 - \left(\dfrac{L_R}{L_{UR}}\right)^{2/n}$
In the qualitative dependent variable model, the
likelihood function attains an absolute
maximum of 1. This means that $L_R \le L_{UR} \le 1$.
Count $R^2$ = (number of correct predictions) / (total number of observations)
… maximum likelihood estimators.
5. The variance that the logit model assumes is $\pi^2/3$,
whereas the probit model assumes 1.
Lab session
CHAPTER THREE
3.1 Nature of the Time Series data
A sequence of random variables indexed by time is
called a stochastic process or a time series process.
(“Stochastic” is a synonym for random.)
When we collect a time series data set, we obtain
one possible outcome, or realization, of the
stochastic process.
We can only see a single realization, because we
cannot go back in time and start the process over
again.
(This is analogous to cross-sectional analysis, where
we can collect only one random sample.)
Important terminology :
Univariate analysis examines a single data
series.
Bivariate analysis examines a pair of series.
The term vector indicates that we are
considering a number of series: two, three, or
more.
The term ‘‘vector’’ is a generalization of the
univariate and bivariate cases.
3.2 Stationary and non-stationary Stochastic
Processes
$E(Y_t) = E\left(Y_0 + \sum u_t\right) = Y_0$
$Var(Y_t) = t\sigma^2$
$(Y_t - Y_{t-1}) = \Delta Y_t = u_t$
where Δ is the first difference operator. It is
easy to show that, while Yt is nonstationary, its
first difference is stationary. In other words, the
first differences of a random walk time series are
stationary.
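The claim that first differencing turns a random walk into its white noise shocks can be illustrated with a short simulation. This is a sketch under simple assumptions (Gaussian shocks with unit variance, a fixed seed, a starting value of zero); none of these choices come from the text.

```python
import random

def simulate_random_walk(n, y0=0.0, sigma=1.0, seed=0):
    """Return the path Y_0..Y_n of Y_t = Y_{t-1} + u_t and the shocks u_1..u_n."""
    rng = random.Random(seed)
    shocks = [rng.gauss(0.0, sigma) for _ in range(n)]
    path = [y0]
    for u in shocks:
        path.append(path[-1] + u)   # accumulate the shocks
    return path, shocks

path, shocks = simulate_random_walk(200)

# First differences Delta Y_t = Y_t - Y_{t-1} recover the shocks u_t.
first_diff = [path[t] - path[t - 1] for t in range(1, len(path))]
max_gap = max(abs(d - u) for d, u in zip(first_diff, shocks))
print(max_gap)   # essentially zero, apart from floating point rounding
```

The level series wanders without settling around a fixed mean, while the differenced series is just the stationary shock sequence.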
2. Random Walk with Drift
Let us modify $Y_t = Y_{t-1} + u_t$ as follows:
$Y_t = \delta + Y_{t-1} + u_t$
where δ is known as the drift parameter. The
name drift comes from the fact that if we write
the preceding equation as
$Y_t - Y_{t-1} = \Delta Y_t = \delta + u_t$
it shows that $Y_t$ drifts upward or downward,
depending on δ being positive or negative. It is
also an AR(1) model.
Following the procedure discussed for the random
walk without drift, it can be shown that for the
random walk with drift model:
$E(Y_t) = Y_0 + t\delta$
$Var(Y_t) = t\sigma^2$
As you can see, for the RWM with drift the mean
as well as the variance increases over time.
Thus, it violates the conditions of (weak)
stationarity. In short, the RWM, with or without
drift, is a nonstationary stochastic process.
The random walk model is an example of what
is known in the literature as a unit root
process.
Unit Root Stochastic Process
Let us write the RWM as:
$Y_t = \rho Y_{t-1} + u_t$, with $-1 \le \rho \le 1$
This model resembles the Markov first-order autoregressive
model discussed under autocorrelation.
If ρ = 1, the model becomes a RWM (without drift). If ρ is in
fact 1, we face what is known as the unit root
problem, that is, a situation of nonstationarity; we
already know that in this case the variance of Yt
is not constant over time.
The name unit root is due to the fact that ρ = 1.
Thus the terms nonstationarity, random walk, and
unit root can be treated as synonymous. If,
however, |ρ| < 1, that is, if the absolute value of ρ
is less than one, then it can be shown that the
time series Yt is stationary in the sense we have
defined it.
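The contrast between ρ = 1 and |ρ| < 1 can be seen from the variance recursion Var(Y_t) = ρ² Var(Y_{t-1}) + σ², which follows from Y_t = ρY_{t-1} + u_t when u_t is uncorrelated with Y_{t-1}. The sketch below assumes a fixed starting value Y_0 (so its variance is zero); the function name is illustrative.

```python
def ar1_variance_path(rho, sigma2, periods):
    """Var(Y_1), ..., Var(Y_T) for Y_t = rho * Y_{t-1} + u_t with fixed Y_0."""
    variance = 0.0          # Y_0 is treated as a known constant
    path = []
    for _ in range(periods):
        variance = rho ** 2 * variance + sigma2
        path.append(variance)
    return path

unit_root = ar1_variance_path(rho=1.0, sigma2=1.0, periods=5)
stationary = ar1_variance_path(rho=0.5, sigma2=1.0, periods=200)

print(unit_root)        # grows as t * sigma^2
print(stationary[-1])   # settles near sigma^2 / (1 - rho^2)
```

With ρ = 1 the variance keeps growing with t, whereas with ρ = 0.5 it converges to a constant, which is what weak stationarity requires.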
3.3 Trend Stationary and Difference Stationary Stochastic Processes
If the trend in a time series is completely predictable and not variable, we call it a
deterministic trend, whereas if it is not predictable, we call it a stochastic trend.
To make the definition more formal, consider the following model of the time series
Yt.
$Y_t = \beta_0 + \beta_1 t + \beta_2 Y_{t-1} + u_t$ ------- (a)
Where ut is a white noise error term and where t is time measured chronologically.
Now we have the following possibilities:
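RWM without drift (pure random walk): If in (a) $\beta_0 = 0$, $\beta_1 = 0$, $\beta_2 = 1$, we get
$Y_t = Y_{t-1} + u_t$ ------- (b), which is nonstationary
$\Delta Y_t = u_t$ ------- (c), which is stationary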
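RWM with drift: If in (a) $\beta_0 \ne 0$, $\beta_1 = 0$, $\beta_2 = 1$, we get
$Y_t = \beta_0 + Y_{t-1} + u_t$ ------- (d), which is nonstationary
$\Delta Y_t = \beta_0 + u_t$ ------- (e), which is stationary. This means
$Y_t$ will exhibit a positive ($\beta_0 > 0$) or negative
($\beta_0 < 0$) trend. Such a trend is called a
stochastic trend. The process in (d) is a difference stationary process (DSP)
because the nonstationarity in $Y_t$ can be
eliminated by taking first differences of the time
series, as (e) shows.
Deterministic trend: If in (a) $\beta_0 \ne 0$, $\beta_1 \ne 0$, $\beta_2 = 0$, we get
$Y_t = \beta_0 + \beta_1 t + u_t$, which is a trend stationary process (TSP):
it is stationary around the deterministic trend.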
Random walk with drift and deterministic
trend: If in (a) $\beta_0 \ne 0$, $\beta_1 \ne 0$, $\beta_2 = 1$, we get
$Y_t = \beta_0 + \beta_1 t + Y_{t-1} + u_t$ ------- (f), which is nonstationary
Deterministic trend with stationary AR(1)
component:
If in (a) $\beta_0 \ne 0$, $\beta_1 \ne 0$, $\beta_2 < 1$, we get
$Y_t = \beta_0 + \beta_1 t + \beta_2 Y_{t-1} + u_t$, which is
stationary around the deterministic trend.
3.4 Integrated Stochastic Process
3.5 Tests of Stationarity: The Unit Root Test
In ADF we still test whether δ = 0 and the ADF
test follows the same asymptotic distribution as
the DF statistic, so the same critical values can be
used.
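The hypotheses used in this test are:
$H_0: \delta = 0$ (the series has a unit root, i.e. it is nonstationary)
$H_1: \delta < 0$ (the series is stationary)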
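To make the test of δ = 0 concrete, here is a minimal sketch of the simplest Dickey-Fuller regression, ΔY_t = δY_{t-1} + u_t, with no constant, no trend and no lagged difference terms (so it is the plain DF case, not the augmented one). The simulated series, the seed and the function names are assumptions for illustration; the value -1.95 is the approximate 5% critical value for this no-constant case in large samples.

```python
import math
import random

def simulate_ar1(rho, n, seed=1):
    """Simulate Y_t = rho * Y_{t-1} + u_t with standard normal shocks."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n):
        y.append(rho * y[-1] + rng.gauss(0.0, 1.0))
    return y

def df_t_statistic(y):
    """OLS estimate of delta and its t ratio in Delta Y_t = delta * Y_{t-1} + u_t."""
    lagged = y[:-1]
    diffs = [y[t] - y[t - 1] for t in range(1, len(y))]
    sxx = sum(v * v for v in lagged)
    delta_hat = sum(l * d for l, d in zip(lagged, diffs)) / sxx
    residuals = [d - delta_hat * l for l, d in zip(lagged, diffs)]
    s2 = sum(e * e for e in residuals) / (len(diffs) - 1)
    return delta_hat, delta_hat / math.sqrt(s2 / sxx)

DF_CRITICAL_5PCT = -1.95   # approximate, no constant and no trend

delta_hat, t_stat = df_t_statistic(simulate_ar1(rho=0.5, n=500))
reject_unit_root = t_stat < DF_CRITICAL_5PCT
print(delta_hat, t_stat, reject_unit_root)
```

For this clearly stationary series (ρ = 0.5, so δ = ρ − 1 = −0.5) the t ratio falls far below the critical value and the unit root null is rejected. The t ratio must be compared with Dickey-Fuller critical values, not with the usual t table.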
Phillips and Perron use nonparametric statistical methods
to take care of the serial correlation in the error terms
without adding lagged difference terms.
The Phillips-Perron test involves fitting the following
regression: $Y_t = \beta_0 + \beta_1 t + \rho Y_{t-1} + u_t$
Under the null hypothesis of a unit root (ρ = 1 in this levels form), the PP Z(t) and Z(ρ)
statistics have the same asymptotic distributions as the
ADF t-statistic and normalized bias statistics.
One advantage of the PP tests over the ADF tests is that
the PP tests are robust to general forms of
heteroscedasticity in the error term $u_t$. Another
advantage is that the user does not have to specify a lag
length for the test regression.
Next assignment
The sequence of steps to follow in a
time series analysis:
Unit root test → if all series are
stationary at level → optimum lag
length → run the VAR model directly.
Unit root test → if all series are stationary
at first difference → optimum lag length
→ Johansen cointegration
→ VECM → IRF & VDF → Granger
causality
Lab Session
CHAPTER FOUR
INTRODUCTION TO SIMULTANEOUS
EQUATION MODELS
4.1 Nature of Simultaneous Equation models
4.2 Simultaneity Bias & Inconsistency of OLS
estimators
4.3 Solution to Simultaneous Equations
4.4 Identification problem
4.5 Formal Rules (Conditions) for Identification
4.6 Estimation of Simultaneous Equations Models
4.1 Nature of Simultaneous Equation models
So far we have focused exclusively
on the problems and estimation of single equation
regression models. In such models, a dependent
variable is expressed as a linear function of one or
more explanatory variables.
That is, there was a single dependent variable Y and one or
more explanatory variables, the X's.
The cause-and-effect relationship in single equation
models between the dependent and independent
variables is unidirectional.
That is, the explanatory variables are the cause and
the dependent variable is the effect.
But there are situations where such one-way or
unidirectional causation in the function is not
meaningful.
This occurs if, for instance, Y (dependent
variable) is not only function of X’s (explanatory
variables) but also all or some of the X’s are, in
turn, determined by Y.
There is, therefore, a two-way flow of influence
between Y and (some of) the X’s which in turn
makes the distinction between dependent and
independent variables a little doubtful.
In a simultaneous equation model there is more than one
equation: one for each of the mutually, or
jointly, dependent or endogenous variables.
The number of equations in such models is equal
to the number of jointly dependent or
endogenous variables involved in the
phenomenon under analysis.
Unlike the single equation models, in
simultaneous equation models it is not usually
possible (possible only under specific
assumptions) to estimate a single equation of
the model without taking into account the
information provided by other equation of the
system.
If one applies OLS to estimate the parameters
of each equation disregarding other equations
of the model, the estimates so obtained are not
only biased but also inconsistent, i.e. even if the
sample size increases indefinitely, the
estimators do not converge to their true values.
Example: the classic example of simultaneous
causality in economics is supply and demand.
Both Prices and quantities adjust until supply
and demand are in equilibrium.
A shock to demand or supply causes both
prices and quantities to move.
As is well known, the price P of a commodity
and the quantity Q sold are determined by the
intersection of the demand and supply curves
for that commodity.
See the graph of the demand (dd) and supply (ss) curves
from the class discussion.
4.2 Model specification
Supply function: $Q_t^s = \beta_0 + \beta_1 P_t + \beta_2 Y_t + U_{2t}$, with $\beta_1 > 0$
Equilibrium condition: $Q_t^s = Q_t^d$
4.3 Definitions of Some Concepts
Example: The following simple Keynesian model
of income determination can be considered as a
structural model.
$C = \alpha + \beta Y + U$ ------------ (1)
$Y = C + Z$ ------------ (2)
where: C = consumption expenditure; Z = non-
consumption expenditure; Y = national income; C
and Y are endogenous variables while Z is an
exogenous variable.
Find the reduced form of the above structural
model. Since C and Y are endogenous variables
and only Z is exogenous, we have to
express C and Y in terms of Z.
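Substituting (2) into (1) and solving for C gives
$C = \dfrac{\alpha}{1-\beta} + \dfrac{\beta}{1-\beta} Z + \dfrac{U}{1-\beta}$ ------- (3)
Substituting (3) into (2) we get:
$Y = \dfrac{\alpha}{1-\beta} + \dfrac{1}{1-\beta} Z + \dfrac{U}{1-\beta}$ ---------- (4)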
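A quick numerical check can confirm that the reduced form is consistent with the structural model. The sketch assumes the structural equations are C = α + βY + U and Y = C + Z, which is the form implied by equation (4); the parameter values are hypothetical.

```python
def reduced_form(alpha, beta, z, u=0.0):
    """Reduced form values of C and Y for given exogenous Z and disturbance U."""
    c = alpha / (1 - beta) + beta / (1 - beta) * z + u / (1 - beta)   # eq. (3)
    y = alpha / (1 - beta) + 1 / (1 - beta) * z + u / (1 - beta)      # eq. (4)
    return c, y

# Hypothetical parameter values, chosen only for illustration.
alpha, beta, z, u = 50.0, 0.8, 100.0, 5.0
c, y = reduced_form(alpha, beta, z, u)

# The reduced form solution must satisfy both structural equations.
consumption_gap = c - (alpha + beta * y + u)   # C = alpha + beta*Y + U
identity_gap = y - (c + z)                     # Y = C + Z
print(c, y, consumption_gap, identity_gap)
```

Both gaps are zero up to rounding, and the coefficient on Z in the income equation, 1/(1 − β), is the familiar multiplier.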
Biasedness:
The two-way causation in a relationship leads to violation
of an important assumption of the linear regression model,
namely that the explanatory variables are uncorrelated with the error term:
a variable can be the dependent variable in one
equation and also an explanatory variable in the
other equations of the simultaneous-equation model.
In this case $E[X_i U_i]$ may be different from zero. To
show simultaneity bias, let's consider the following simple
simultaneous equation model.
Y = α 0 + α1X + U
X = β0 + β 1Y + β 2 Z + V
Here $Y = f(X)$ and $X = f(Y)$: this shows that the two-way causation in a
relationship leads to violation of an important
assumption of the linear regression model.
Suppose that the following assumptions hold:
$E(U) = 0$, $E(V) = 0$
$E(U^2) = \sigma_u^2$, $E(V^2) = \sigma_v^2$
$E(U_i U_j) = 0$ and $E(V_i V_j) = 0$ for $i \ne j$, and also $E(U_i V_i) = 0$
$\hat{\beta}_1 = \beta_1 + \dfrac{\beta_2}{1 - \beta_1 \beta_2} \cdot \dfrac{\sigma_v^2}{\sigma_X^2}$
4.5 Solution to the Simultaneous Equations
The obvious solution is to apply other methods of
estimation, which give better estimates of the parameters.
1. the reduced form method or indirect least
squares (ILS)
2. the method of instrumental variables
3. two stage least squares (2SLS)
4. limited information maximum likelihood (LIML)
5. the mixed estimation
6. Three stage least squares
7. Full information maximum likelihood (FIML)
N.B.: Methods 1-5 are single-equation methods, since they can be applied to one equation at a time;
methods 6-7 are systems methods because they are applied
to all equations of the system simultaneously.
4.6 Direct estimation of the reduced form coefficients
Direct Method: Express the three endogenous
variables ($C_t$, $I_t$, and $Y_t$) as functions of the
two predetermined variables ($G_t$ and $Y_{t-1}$)
directly, using π's as the parameters of the
reduced form model as follows.
$C_t = \pi_{11} Y_{t-1} + \pi_{12} G_t + V_1$
$I_t = \pi_{21} Y_{t-1} + \pi_{22} G_t + V_2$
$Y_t = \pi_{31} Y_{t-1} + \pi_{32} G_t + V_3$
Note: $\pi_{11}, \pi_{12}, \pi_{21}, \pi_{22}, \pi_{31}$, and $\pi_{32}$ are
reduced form parameters.
The reduced form π's may be estimated by
the method of least squares with no restrictions
(LSNR).
This means we can apply OLS to each reduced
form equation, because all the
endogenous variables are expressed in terms of
predetermined (exogenous and lagged endogenous) variables only.
4.7 Indirect estimation of the reduced form
coefficients
It is known that there is a relationship
between the reduced form coefficients &
the structural parameters (explained in the
table).
Therefore, to obtain the values of the reduced form coefficients
indirectly, estimate the structural parameters by any
appropriate econometric technique and
then substitute these estimates into the
system of parameter relationships.
This indirect method involves three steps.
4.9 Recursive models
OLS is not applicable if there is interdependence
between the explanatory variables and the error term.
In the simultaneous equation models, the endogenous
variables may depend on the error terms of the
model.
Hence, the OLS technique is not appropriate for
estimation of an equation in a simultaneous equations
model.
However, in a special type of simultaneous equations
model called a recursive, triangular or causal model,
the use of the OLS procedure of estimation is appropriate.
Consider the following three equation system to
understand the nature of such models:
Note that:
In the above illustration, the X’s and Y’s are exogenous and
endogenous variables respectively.
The disturbance terms follow the following assumptions.
If this does not hold, the above system is no
longer recursive and OLS is also no longer
valid.
The first equation of the above system
contains only the exogenous variables on the
right hand side.
Since by assumption, the exogenous variable is
independent of U1 , the first equation satisfies
the critical assumption of the OLS procedure.
Hence, OLS can be applied straight forwardly
to this equation.
Let us build a hypothetical recursive model for
an agricultural commodity, say wheat.
The production of wheat, Y1, may be assumed
to depend on exogenous factors: X2 = climatic
conditions and X3 = last season's price. The retail
price, Y2, may be assumed to be a function of the
production level Y1 and the exogenous factor X4 =
disposable income.
Finally, the price obtained by the producer, Y3,
can be expressed in terms of the retail price Y2
and the exogenous factor X5 = the cost of marketing
the product.
The relevant equations of the model may be
described as under:
Y1 = α 1 + α 2 X 2 + α 3 X 3 + U 1
Y2 = α 4 + β 1Y 1 + α 5 X 4 + U 2
Y3 = α 6 + β 2Y 2 + α 7 X 5 + U 3
Y1 − α 1 − α 2 X 2 − α 3 X 3 = U 1
− β 1Y1 + Y2 − α 4 − α 5 X 4 = U 2
− β 2Y2 + Y3 − α 6 − α 7 X 5 = U 3
We can again rewrite this in matrix form as
follows:
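One way to write the stacked system, offered here as a sketch that simply collects the coefficients of the three rearranged equations above term by term (with the constant treated as a regressor equal to 1), is:

```latex
\begin{bmatrix}
 1        & 0        & 0 \\
 -\beta_1 & 1        & 0 \\
 0        & -\beta_2 & 1
\end{bmatrix}
\begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \end{bmatrix}
+
\begin{bmatrix}
 -\alpha_1 & -\alpha_2 & -\alpha_3 & 0         & 0 \\
 -\alpha_4 & 0         & 0         & -\alpha_5 & 0 \\
 -\alpha_6 & 0         & 0         & 0         & -\alpha_7
\end{bmatrix}
\begin{bmatrix} 1 \\ X_2 \\ X_3 \\ X_4 \\ X_5 \end{bmatrix}
=
\begin{bmatrix} U_1 \\ U_2 \\ U_3 \end{bmatrix}
```

The coefficient matrix of the endogenous variables is lower triangular with ones on the diagonal, which is why such systems are called triangular.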
Under identification (SEP>REP)
Equations 10.37 and 10.39 are the two reduced-form
equations derived from the structural equations 10.32
and 10.33.
Comparing the two forms, the structural equations contain
four coefficients (α0, α1, β0 and β1), whereas the
reduced-form equations contain only two (π0 and π1).
The reduced-form coefficients are functions of the
structural coefficients, i.e. α0, α1, β0 and β1 all appear
in π0 and π1.
But how can we recover the values of α0, α1, β0 and β1 from
π0 and π1 alone? We cannot.
Since the structural coefficients outnumber the reduced-form
coefficients, the four structural coefficients cannot be
computed from the two reduced-form coefficients, and the
equations are under-identified.
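The counting argument can be made concrete with a short derivation. Equations 10.32 to 10.39 are not reproduced here, so the block below assumes the usual textbook form of the model: a demand and a supply equation in price P and quantity Q only, with disturbances u1 and u2. The symbols v and w for the reduced-form disturbances are likewise assumptions.

```latex
% Assumed structural form (demand and supply, market clearing):
%   Q = \alpha_0 + \alpha_1 P + u_1   (demand)
%   Q = \beta_0  + \beta_1  P + u_2   (supply)
\begin{align*}
\alpha_0 + \alpha_1 P + u_1 &= \beta_0 + \beta_1 P + u_2 \\
P &= \underbrace{\frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1}}_{\pi_0}
     + \underbrace{\frac{u_2 - u_1}{\alpha_1 - \beta_1}}_{v} \\
Q &= \underbrace{\frac{\alpha_1 \beta_0 - \alpha_0 \beta_1}{\alpha_1 - \beta_1}}_{\pi_1}
     + \underbrace{\frac{\alpha_1 u_2 - \beta_1 u_1}{\alpha_1 - \beta_1}}_{w}
\end{align*}
```

Under this assumed form, estimating the reduced form yields only the two numbers π0 and π1, which gives two equations in the four unknowns α0, α1, β0 and β1. Infinitely many structural coefficient values are consistent with the same π0 and π1.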
Why does under-identification occur?
Exact /Just/ Identification (SEP=REP)
It occurs when the number of structural coefficients
equals the number of reduced-form coefficients.
Now let's incorporate an additional variable in the
demand function in order to solve the above
problem.
In the demand function, however, there are three
structural coefficients (α0, α1 and α2) but only two
reduced-form coefficients.
Because the reduced form (10.45) has fewer coefficients
than the structural equation (10.40), the demand function
is under-identified: two coefficients (π2, π3) cannot
determine three (α0, α1 and α2).
For the supply function, the two reduced-form coefficients
(π2, π3) match the two structural coefficients (β0, β1) in
number, so it is just identified.
In conclusion, the supply function is identified but the
demand function is not. On this basis, the system as a whole
is not identified.
Over identification (SEP<REP)
It occurs when the number of coefficients (parameters)
of the structural equations is less than the number of
coefficients (parameters) of the reduced form.
Let's modify the demand function by
incorporating wealth (R) and the supply function by
incorporating the lagged price. We then have the
following equations.
From equations 10.47 and 10.48 we have
seven structural coefficients, but in equations 10.49
and 10.45 we have eight reduced-form coefficients.
Since the reduced-form coefficients outnumber the
structural coefficients, we can say that the system
as a whole is over-identified.
An equation belonging to a system of
simultaneous equations is identified if it has a
unique statistical form, i.e. if no other equation
in the system, and no equation formed by algebraic
manipulation of the other equations of the system,
contains the same variables as the equation in
question.
4.11 Formal Rules (Conditions) for Identification
Identification problems do not arise only in
two-equation models.
Using the above procedure, we can check for
identification problems easily if there are two or
three equations in a given simultaneous-equation
model.
However, for a simultaneous-equation model with n
equations, such a procedure is very cumbersome.
In general, for any number of equations in a given
simultaneous-equation model, there are two conditions
that must be checked to decide whether the model is
identified.
In the following section we will see the formal
conditions for identification.
Actually the term ‘identification’ was originally used to
denote the possibility (or impossibility) of deducing the
values of the parameters of the structural relations from
a knowledge of the reduced form parameters.
However, we think that the reduced-form approach is
conceptually confusing and computationally more
difficult than the structural-model approach, because it
requires deriving the reduced form first and then
examining the values of the determinant formed
from some of the reduced-form coefficients.
Deriving the reduced form is a time-consuming process.
The structural-form approach is simpler and more useful.
Thus, the so-called order and rank conditions of
identification lighten the task by providing a systematic
way.
There are two conditions which must be fulfilled for
an equation to be identified. These are:
1. the order condition for identification
2. the rank condition for identification
The identification of a system means the
identification of each equation.
Identification of the parameters of an equation
means that there is a unique value for each parameter
in that equation.
An equation is under-identified when its statistical
form is not unique. When one or more equations of
the model are under-identified, we say that
the system as a whole is under-identified.
Identified equations: a system is identified when all
of its equations are identified.
An identified equation may be either exactly identified or
over-identified.
If an equation is under-identified, it is impossible to
estimate all of its parameters with any econometric
technique. If the equation is identified, however, its
coefficients (parameters) can be estimated statistically.
If the equation is exactly identified, the appropriate
estimation method is Indirect Least Squares (ILS).
If the equation is over-identified, ILS cannot be used
because it does not yield unique estimates of the
structural parameters.
In this case other methods are used, such as:
2SLS (Two-Stage Least Squares) or
ML (Maximum Likelihood) methods
A. The order condition for identification
This condition is based on a counting rule of the variables
included in and excluded from the particular equation.
It is a necessary but not sufficient condition for the
identification of an equation.
The order condition may be stated as follows.
For an equation to be identified, the total number of variables (endogenous
and exogenous) excluded from it must be equal to or greater than the
number of endogenous variables in the model less one.
Let G = total number of equations (= total number of
endogenous variables)
K = total number of variables in the model (endogenous
and predetermined)
M = number of variables, endogenous and exogenous,
included in the particular equation.
Then the order condition for identification may
be symbolically expressed as:
(K - M) ≥ (G - 1)
that is, [number of excluded variables] ≥ [total number of equations - 1].
• Take equation (3).
• Given: the number of variables (endogenous and exogenous)
in this equation is M = 4 (y3, y1, y2 and x3); K = 6;
G = 3.
• Compare (K - M) with (G - 1):
• 6 - 4 = 2 and 3 - 1 = 2
• Since 2 = 2, the order condition holds with equality, so
this equation is identified, and it is exactly identified.
Example 3: if a system contains 10 equations with
15 variables, ten endogenous and five exogenous, an
equation containing 11 variables is not identified,
while another containing 5 variables is identified.
For the 1st equation we have:
G = 10; K = 15; M = 11
Order condition:
(K - M) ≥ (G - 1)
15 - 11 = 4 and 10 - 1 = 9
4 < 9, so the order condition is not
satisfied.
For the 2nd equation we have:
G = 10; K = 15; M = 5
Order condition:
(K - M) ≥ (G - 1)
15 - 5 = 10 and 10 - 1 = 9
10 > 9, so the order condition is satisfied.
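The counting rule is simple enough to sketch in a few lines of Python. The function name `order_condition` and the returned labels are hypothetical choices made for this illustration. The function checks the order condition only, which is necessary but not sufficient for identification.

```python
def order_condition(G, K, M):
    """Classify an equation by the order condition: compare (K - M) with (G - 1).

    G: total number of equations (= number of endogenous variables)
    K: total number of variables in the model
    M: number of variables included in the equation being examined
    The result reflects the order condition only; the rank condition
    must still be checked before concluding that an equation is identified.
    """
    excluded = K - M   # variables excluded from this equation
    required = G - 1   # number of equations less one
    if excluded < required:
        return "under-identified"
    if excluded == required:
        return "exactly identified"
    return "over-identified"


# The worked examples from the text: equation (3) with G=3, K=6, M=4,
# and the two equations of Example 3 with G=10, K=15.
for G, K, M in [(3, 6, 4), (10, 15, 11), (10, 15, 5)]:
    print(f"G={G}, K={K}, M={M}: K-M={K - M}, G-1={G - 1} -> {order_condition(G, K, M)}")
```

With equality the equation is classed as exactly identified, and with strict inequality as over-identified, matching the SEP/REP terminology used earlier.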
B. The rank condition for identification
The rank condition states that: in a system of G
equations any particular equation is identified if
and only if it is possible to construct at least one
nonzero determinant of order (G-1) from the
coefficients of the variables excluded from that
particular equation but contained in the other
equations of the model.
The practical steps for tracing the identifiability of
an equation of a structural model may be outlined
as follows.
Firstly, write the parameters of all the equations of
the model in a separate table, noting that the
parameter of a variable excluded from an equation
is equal to zero.
1st equ. -1 3 0 -2 1 0
2nd equ. 0 1 1 0 0 1
3rd equ. 1 1 1 0 0 2
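Secondly, strike out the row of coefficients of the equation being
examined.
Thirdly, strike out the columns in which the equation being examined
has a non-zero coefficient.
By deleting the relevant row and columns we are left with the
coefficients of variables not included in the particular equation, but
contained in the other equations of the model.
For example, if we are examining the second equation of the system for
identification, we strike out the second row and the second, third and
sixth columns of the above table, thus obtaining the following table.
1st equ. -1 -2 1
3rd equ. 1 0 0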
Fourthly, form the determinant(s) of order (G-1) and examine
their value.
Guideline:
If at least one of these determinants is non-zero, the equation
is identified.
If all the determinants of order (G-1) are zero, the equation is
under identified.
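The guideline can be sketched as a small routine. The code below is an illustration only: the function names are hypothetical, the coefficient table is copied as printed above (its column labels are not shown in the source, so columns are referred to by position), and the determinant is computed by simple cofactor expansion, which is adequate for matrices of this size.

```python
from itertools import combinations


def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total


def rank_condition_minors(table, eq):
    """Return every determinant of order G-1 for the equation at 0-based index `eq`."""
    G = len(table)
    # Columns where the examined equation has a zero coefficient (excluded variables).
    excluded_cols = [j for j, c in enumerate(table[eq]) if c == 0]
    # Rows of the other equations, restricted to those columns.
    rest = [[row[j] for j in excluded_cols] for i, row in enumerate(table) if i != eq]
    minors = []
    for cols in combinations(range(len(excluded_cols)), G - 1):
        sub = [[row[c] for c in cols] for row in rest]
        minors.append(det(sub))
    return minors


def is_identified(table, eq):
    """Rank condition: at least one determinant of order G-1 is non-zero."""
    return any(d != 0 for d in rank_condition_minors(table, eq))


# Coefficient table as printed in the text (rows = equations).
TABLE = [
    [-1, 3, 0, -2, 1, 0],
    [0, 1, 1, 0, 0, 1],
    [1, 1, 1, 0, 0, 2],
]

second = 1  # 0-based index of the second equation
print("determinants:", rank_condition_minors(TABLE, second))
print("identified by the rank condition:", is_identified(TABLE, second))
```

If fewer than G-1 variables are excluded from an equation, no determinant of order G-1 can be formed and the routine reports the equation as not identified, which agrees with the order condition being necessary.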
In the above example of exploration of the identifiability of the
second structural equation we have three determinants of order
(G-1)=3-1=2. They are:
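Working from the coefficient table as printed above, striking out the second row and the second, third and sixth columns leaves the rows of the first and third equations in columns 1, 4 and 5. The labels Δ1, Δ2 and Δ3 are introduced here for illustration only:

```latex
\[
\Delta_1 = \begin{vmatrix} -1 & -2 \\ 1 & 0 \end{vmatrix} = (-1)(0) - (-2)(1) = 2,
\qquad
\Delta_2 = \begin{vmatrix} -1 & 1 \\ 1 & 0 \end{vmatrix} = (-1)(0) - (1)(1) = -1,
\qquad
\Delta_3 = \begin{vmatrix} -2 & 1 \\ 0 & 0 \end{vmatrix} = (-2)(0) - (1)(0) = 0
\]
```

Since Δ1 and Δ2 are non-zero, at least one determinant of order G-1 is non-zero and the second equation satisfies the rank condition.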
The identification of a function is achieved by
assuming that some variables of the model have
zero coefficient in this equation, that is, we
assume that some variables do not directly affect
the dependent variable in this equation.
This, however, is an assumption which can be
tested with the sample data.
We will examine some tests of identifying
restrictions in a subsequent section.
Some examples will illustrate the application of
the two formal conditions for identification.
Example:
Estimation of Simultaneous Equations Models
1. Ordinary Least Squares
In equation 10.51 the endogenous variable appears on the left-hand side
and only exogenous variables appear on the right-hand side.
Hence, OLS can be applied straightforwardly to this equation, provided
all the assumptions of OLS hold.
In equation 10.52 we can apply OLS provided that Y1 and U2 are
uncorrelated.
Again, we can apply OLS to the last equation if both Y1 and Y2 are
uncorrelated with U3.
In this recursive system OLS can be applied to each equation
separately and we do not face a simultaneous-equation problem.
The reason is clear: there is no interdependence
among the endogenous variables.
Thus, Y1 affects Y2 and Y2 affects Y3, without either being influenced by Y3.
In other words, each equation exhibits a unilateral causal dependence.
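A small simulation can illustrate the point. The sketch below generates data from a three-equation recursive system with the same structure as the wheat example and mutually independent disturbances, then applies OLS to each equation separately. All numerical parameter values, the sample size and the helper names `solve` and `ols` are assumptions chosen for illustration, not values from the text.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ols(y, *regressors):
    """OLS with an intercept via the normal equations: [intercept, slope_1, ...]."""
    X = [[1.0] + [reg[i] for reg in regressors] for i in range(len(y))]
    k = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    return solve(XtX, Xty)


random.seed(0)
n = 5000
# Exogenous variables and mutually independent disturbances (hypothetical data).
X2, X3, X4, X5 = ([random.gauss(0, 1) for _ in range(n)] for _ in range(4))
U1, U2, U3 = ([random.gauss(0, 1) for _ in range(n)] for _ in range(3))

# Recursive structure: Y1 depends on X's only, Y2 on Y1, Y3 on Y2.
Y1 = [1.0 + 0.5 * a - 0.3 * b + u for a, b, u in zip(X2, X3, U1)]
Y2 = [2.0 + 0.8 * y + 0.4 * x + u for y, x, u in zip(Y1, X4, U2)]
Y3 = [0.5 + 0.6 * y - 0.2 * x + u for y, x, u in zip(Y2, X5, U3)]

# OLS applied equation by equation.
est1 = ols(Y1, X2, X3)
est2 = ols(Y2, Y1, X4)
est3 = ols(Y3, Y2, X5)
for name, est in [("eq. 1", est1), ("eq. 2", est2), ("eq. 3", est3)]:
    print(name, [round(v, 3) for v in est])
```

Because the disturbances are drawn independently, Y1 is uncorrelated with U2 and Y2 is uncorrelated with U3, so the equation-by-equation estimates should lie close to the values used to generate the data. Drawing correlated disturbances instead would break this property.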
2. Indirect Least Squares (ILS) method