Chapter 2 SEM 2025
$$
\begin{aligned}
Y_i &= \alpha_0 + \alpha_1 X_i + \alpha_2 Z_{1i} + \mu_i \\
X_i &= \beta_0 + \beta_1 Y_i + \beta_2 Z_{2i} + \varepsilon_i
\end{aligned}
\qquad (2.1)
$$
❖ Where, 𝒁𝟏𝒊 is a variable which affects 𝒀𝒊 but not 𝑿𝒊 and the variable 𝒁𝟐𝒊
affects only 𝑿𝒊 but not 𝒀𝒊 .
❖ 𝝁𝒊 and Ɛ𝒊 are the stochastic disturbance terms.
❖ From the first line of equation (2.1) we can notice that 𝒀𝒊 is a function
of 𝑿𝒊 and 𝒁𝟏𝒊 .
❖ In the second equation, on the other hand, 𝑿𝒊 (which serves as an
independent variable in the first equation) becomes a function of 𝒀𝒊 (the
dependent variable of the first equation) and 𝒁𝟐𝒊.
❖ So, we cannot classify 𝒀𝒊 and 𝑿𝒊 as simply dependent or independent in
this SEM: both are determined jointly by the system.
❖ A model constitutes a system of simultaneous equations if all the
relationships involved are needed for determining the value of at least
one of the endogenous variables included in the model.
❖ This implies that at least one of the relationships includes more than one
endogenous variable.
❖ Since 𝒁𝟏𝒊 and 𝒁𝟐𝒊 are observable, they are called observed shifters in
the SEM.
❖ On the other hand, 𝝁𝒊 and Ɛ𝒊 are not observed but shift the first and the
second equation, respectively; they are thus called unobserved shifters.
❖ Example:
❖ Consider the following two-equation SEM.
$$
\begin{aligned}
Y_1 &= \alpha_0 + \alpha_1 Y_2 + \alpha_2 Z_1 + \mu \\
Y_2 &= \beta_0 + \beta_1 Y_1 + \beta_2 Z_2 + \varepsilon
\end{aligned}
\qquad (2.4)
$$
❖ Where, 𝒀𝟏 and 𝒀𝟐 are endogenous variables (because they are correlated
with the error terms), and 𝒁𝟏 and 𝒁𝟐 are exogenous (because they are
assumed to be independent of, or uncorrelated with the error terms).
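A quick simulation can make the endogeneity claim concrete. The sketch below assumes specific parameter values and standard normal shifters and disturbances (none of these come from the text), solves system (2.4) jointly for 𝒀𝟏 and 𝒀𝟐, and reports the sample correlation of each variable with μ. Under these assumptions, 𝒀𝟏 and 𝒀𝟐 should show clearly nonzero correlations with μ, while 𝒁𝟏 and 𝒁𝟐 should be close to zero.

```python
import numpy as np

# Hypothetical parameter values for system (2.4); not taken from the text.
a0, a1, a2 = 1.0, 0.5, 1.0    # first equation:  Y1 = a0 + a1*Y2 + a2*Z1 + mu
b0, b1, b2 = 2.0, -0.4, 0.8   # second equation: Y2 = b0 + b1*Y1 + b2*Z2 + eps

rng = np.random.default_rng(0)
n = 100_000
z1, z2 = rng.normal(size=n), rng.normal(size=n)    # observed shifters (exogenous)
mu, eps = rng.normal(size=n), rng.normal(size=n)   # unobserved shifters

# Solve the two equations simultaneously: first Y1, then Y2 (requires a1*b1 != 1).
y1 = (a0 + a1 * b0 + a1 * b2 * z2 + a2 * z1 + a1 * eps + mu) / (1 - a1 * b1)
y2 = b0 + b1 * y1 + b2 * z2 + eps

def corr(u, v):
    """Sample correlation coefficient between two arrays."""
    return np.corrcoef(u, v)[0, 1]

for name, var in [("Y1", y1), ("Y2", y2), ("Z1", z1), ("Z2", z2)]:
    print(f"corr({name}, mu) = {corr(var, mu):+.3f}")
```

Because μ enters 𝒀𝟏 directly and 𝒀𝟐 through 𝒀𝟏, both endogenous variables move with μ, which is exactly why OLS on either equation is problematic.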
❖ For the first equation to be identified, there should be at least one
variable that is excluded from the first equation, is included in the other
equation, and is uncorrelated with the error term of the first equation.
❖ This requirement is known as the exclusion restriction.
❖ Therefore, since 𝒁𝟐 is excluded from the first equation and is
uncorrelated with the error term μ (because it is assumed to be
exogenous), the first equation is identified.
❖ Similarly, for the second equation to be identified there should be at least
one variable which is excluded from the second equation, and is
uncorrelated with the error term (Ɛ) of this equation.
❖ Since 𝒁𝟏 is excluded from the second equation, and since 𝒁𝟏 is assumed
to be exogenous (uncorrelated with the error term, Ɛ ) the second
equation is identified.
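Take another SEM, which involves a three-equation system.
$$
\begin{aligned}
Y_1 &= \alpha_0 + \alpha_1 Y_2 + \alpha_2 Y_3 + \alpha_3 Z_1 + \mu \\
Y_2 &= \beta_0 + \beta_1 Y_1 + \beta_2 Z_1 + \beta_3 Z_2 + \beta_4 Z_3 + \varepsilon \\
Y_3 &= \gamma_0 + \gamma_1 Y_2 + \gamma_2 Z_1 + \gamma_3 Z_2 + \gamma_4 Z_3 + \gamma_5 Z_4 + v
\end{aligned}
\qquad (2.5)
$$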
❖ Where 𝒀𝟏, 𝒀𝟐 and 𝒀𝟑 are endogenous variables; 𝒁𝟏, 𝒁𝟐, 𝒁𝟑 and 𝒁𝟒
are exogenous variables; and 𝜶𝒊, β𝒊 and γ𝒊 are the parameters of the three
structural equations, respectively.
❖ It is generally difficult to show that an equation is identified in an SEM
with more than two equations, but it is easy to see when certain equations
are not identified.
❖ The first equation is identified (or at least looks promising) because three
exogenous variables, 𝒁𝟐, 𝒁𝟑 and 𝒁𝟒, are excluded from it.
❖ The second equation is also identified (or at least looks promising). Why?
❖ But the third equation is not identified, because no exogenous variable is
excluded from it.
❖ That is, 𝒁𝟏 (which appears in the first and second equations), 𝒁𝟐 and 𝒁𝟑
(which appear in the second equation) and 𝒁𝟒 all appear in the third
equation itself.
❑ Formally speaking, there are two conditions that an equation must satisfy
to be identified: the order condition and the rank condition.
1) The order condition for identification
❖ Note from the above that in the first equation the number of excluded
exogenous variables (= 3) is greater than the number of right-hand-side
endogenous variables (= 2: 𝒀𝟐 and 𝒀𝟑).
❖ Such an equation is called over-identified.
❖ In the second equation, the number of excluded exogenous variables (= 1: 𝒁𝟒)
is exactly equal to the number of right-hand-side endogenous variables (= 1: 𝒀𝟏).
❖ In this case, the equation is called just-identified.
❖ In the third equation, the number of excluded exogenous variables (= 0) is
less than the number of right-hand-side endogenous variables (= 1: 𝒀𝟐), and
the equation is called under-identified (unidentified).
❖ An under-identified equation is one whose structural parameters cannot be
recovered from the estimated reduced-form coefficients.
✓ Order condition is a necessary (but not sufficient) condition for
identification.
❖ For example, in equation (2.5) we said that the second equation is
identified because a variable (𝒁𝟒) is excluded from it.
❖ But if 𝜸𝟓 = 𝟎, then 𝒁𝟒 has no effect on 𝒀𝟏, 𝒀𝟐 or 𝒀𝟑 and drops out of the
model entirely, so the second equation becomes unidentified.
❖ This again illustrates that identification of an equation depends not only
on the presence of an excluded variable but also on the values of the
parameters (which we can never know for sure) in the other equations.
✓ The necessary and sufficient condition for identification is called the rank condition.
2) The rank condition for identification
❖ The rank condition states that in a SEM containing G equations any
particular equation is identified if and only if it is possible to construct at
least one non-zero determinant of order (G-1) from the coefficients of the
variables excluded from that particular equation but contained in the other
equations of the model.
❖ Recall from linear algebra that the rank of a matrix is the order of the
largest square submatrix (contained in the given matrix) whose determinant
is nonzero.
❖ Alternatively, the rank of a matrix is the largest number of linearly
independent rows or columns of a matrix.
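The two definitions of rank can be compared directly. The sketch below searches square submatrices from the largest order downward and stops at the first one with a nonzero determinant, then checks the answer against NumPy's SVD-based `numpy.linalg.matrix_rank`. The helper name `rank_by_minors` and the tolerance `1e-12` for treating a floating-point determinant as zero are illustrative choices, not part of the text.

```python
from itertools import combinations
import numpy as np

def rank_by_minors(a, tol=1e-12):
    """Rank = order of the largest square submatrix with a nonzero determinant."""
    a = np.asarray(a, dtype=float)
    rows, cols = a.shape
    for order in range(min(rows, cols), 0, -1):
        for r in combinations(range(rows), order):
            for c in combinations(range(cols), order):
                if abs(np.linalg.det(a[np.ix_(r, c)])) > tol:
                    return order
    return 0  # every entry is zero

example = np.array([[1.0, 2.0, 3.0],
                    [2.0, 4.0, 6.0],   # twice the first row
                    [0.0, 1.0, 5.0]])
print(rank_by_minors(example), np.linalg.matrix_rank(example))  # 2 2
```

The minor search is exponential in the matrix size, so it only suits the small coefficient matrices used in identification exercises; the row/column definition (via SVD) is what software uses in practice.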
❑ To understand the order and rank conditions, let’s introduce the
following notations:
❖ Let,
M = number of endogenous variables in the model
m = number of endogenous variables in a given equation
K = number of exogenous variables in the model including the
intercept
k = number of exogenous variables in a given equation
I) Order Condition:
✓ In a model of M simultaneous equations in order for an equation to be
identified, it must exclude at least 𝑴 − 𝟏 variables (endogenous as
well as exogenous) appearing in the model.
✓ If it excludes exactly 𝑴 − 𝟏 variables, the equation is just identified.
✓ If it excludes more than 𝑴 − 𝟏 variables, it is over-identified.
✓ In a model of M simultaneous equations, in order for an equation to be
identified, the number of exogenous variables excluded from the
equation must not be less than the number of endogenous variables
included in that equation less 1, that is, 𝑲 − 𝒌 ≥ 𝒎 − 𝟏.
✓ If 𝑲 − 𝒌 = 𝒎 − 𝟏, the equation is just identified, but if 𝑲 − 𝒌 >
𝒎 − 𝟏, it is over identified.
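Applied to model (2.5), the rule 𝑲 − 𝒌 ≥ 𝒎 − 𝟏 reproduces the earlier verbal classification. The function name `order_condition` is hypothetical, used only for illustration; passing it is necessary but not sufficient for identification.

```python
def order_condition(K, k, m):
    """Classify one equation by comparing K - k with m - 1.

    K: exogenous variables in the whole model (including the intercept)
    k: exogenous variables included in this equation (including the intercept)
    m: endogenous variables included in this equation (both sides)
    """
    excluded, needed = K - k, m - 1
    if excluded > needed:
        return "over-identified"
    if excluded == needed:
        return "just identified"
    return "under-identified"

# Model (2.5): exogenous variables are the intercept, Z1, Z2, Z3 and Z4, so K = 5.
K = 5
print(order_condition(K, k=2, m=3))  # Y1 equation -> over-identified
print(order_condition(K, k=4, m=2))  # Y2 equation -> just identified
print(order_condition(K, k=5, m=2))  # Y3 equation -> under-identified
```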
II) Rank condition:
✓ In a model containing M equations in M endogenous variables, an
equation is identified if and only if at least one nonzero determinant of
order (𝑴 − 𝟏) × (𝑴 − 𝟏) can be constructed from the coefficients of the
variables (both endogenous and predetermined) excluded from that
particular equation but included in the other equations of the model.
✓ Let A denote the matrix of these coefficients. In a model containing
simultaneous equations:
1) If 𝑲 − 𝒌 > 𝒎 − 𝟏 and the rank of A is 𝑴 − 𝟏, the
equation is over-identified.
2) If 𝑲 − 𝒌 = 𝒎 − 𝟏, and the rank of the matrix A is 𝑴 − 𝟏,
the equation is exactly identified.
3) If 𝑲 − 𝒌 ≥ 𝒎 − 𝟏, and the rank of the matrix A is less than
𝑴 − 𝟏, the equation is under-identified.
4) If 𝑲 − 𝒌 < 𝒎 − 𝟏, the structural equation is unidentified.
The rank of the A matrix in this case is bound to be less than 𝑴 −
𝟏. (Why?)
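Rules 1 to 4 can be written as a small decision function. This is a sketch: the name `classify` and its arguments are hypothetical, and the rank of A must be computed separately.

```python
def classify(K, k, m, M, rank_A):
    """Apply rules 1-4: order condition (K - k vs m - 1) plus rank of A vs M - 1.

    K, k   : exogenous variables (incl. intercept) in the model / in this equation
    m, M   : endogenous variables in this equation / in the model
    rank_A : rank of the matrix A of excluded-variable coefficients
    """
    excluded, needed = K - k, m - 1
    if excluded < needed:
        return "unidentified"        # rule 4 (rank of A is then below M - 1 anyway)
    if rank_A < M - 1:
        return "under-identified"    # rule 3
    if excluded > needed:
        return "over-identified"     # rule 1
    return "exactly identified"      # rule 2

print(classify(K=4, k=2, m=3, M=4, rank_A=2))  # under-identified
print(classify(K=5, k=2, m=3, M=3, rank_A=2))  # over-identified
print(classify(K=5, k=5, m=2, M=3, rank_A=1))  # unidentified
```

The inputs are illustrative: the first line mimics an equation that passes the order condition with equality but has a rank-deficient A, and the last two resemble the first and third equations of model (2.5), with the rank of A assumed.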
❑ Steps of checking rank condition
1) Move all terms of each equation, except the error term, to the left of
the equals sign
2) Put all the endogenous and exogenous variables in a row
3) Put the corresponding coefficients of each variable beneath each
variable
4) Construct a matrix from excluded variables (both exogenous and
endogenous) and check for its rank.
5) Then,
✓ If we can form more than one (M − 1) × (M − 1) matrix with a non-zero
determinant, the equation is said to be over-identified.
✓ If we can form exactly one (M − 1) × (M − 1) matrix with a non-zero
determinant, the equation is said to be just identified.
✓ If we cannot form even one (M − 1) × (M − 1) matrix with a non-zero
determinant, the equation is said to be under-identified.
Illustration
✓ Given the following simultaneous equation model,
𝒀𝟏 = 𝜶𝟎 + 𝜶𝟏 𝒀𝟐 + 𝜶𝟐 𝒀𝟑 + 𝜶𝟑 𝒁𝟏 + 𝛍
𝒀𝟐 = 𝜷𝟎 + 𝜷𝟏 𝒀𝟑 + 𝜷𝟐 𝒁𝟏 + 𝜷𝟑 𝒁𝟐 + Ɛ
𝒀𝟑 = 𝜸𝟎 + 𝜸𝟏 𝒀𝟏 + 𝜸𝟐 𝒁𝟏 + 𝜸𝟑 𝒁𝟐 + 𝐯
𝒀𝟒 = θ𝟎 + θ𝟏 𝒀𝟏 + θ𝟐 𝒀𝟐 + θ𝟑 𝒁𝟑 + 𝒘
✓ To check the order condition for identification, look at the following table.
Equation | Endogenous variables on the RHS (m − 1) | Excluded exogenous variables (K − k) | Order condition
1st | 2 (𝒀𝟐 and 𝒀𝟑) | 2 (𝒁𝟐 and 𝒁𝟑) | 2 = 2
2nd | 1 (𝒀𝟑) | 1 (𝒁𝟑) | 1 = 1
3rd | 1 (𝒀𝟏) | 1 (𝒁𝟑) | 1 = 1
4th | 2 (𝒀𝟏 and 𝒀𝟐) | 2 (𝒁𝟏 and 𝒁𝟐) | 2 = 2
✓ By the order condition, all four equations are exactly (just) identified.
❖ Now, let us recheck with the rank condition.
✓ First, bring all items, except error terms, to the left
𝒀𝟏 − 𝜶𝟎 − 𝜶𝟏 𝒀𝟐 − 𝜶𝟐 𝒀𝟑 − 𝜶𝟑 𝒁𝟏 = 𝛍
𝒀𝟐 − 𝜷𝟎 −𝜷𝟏 𝒀𝟑 − 𝜷𝟐 𝒁𝟏 − 𝜷𝟑 𝒁𝟐 = Ɛ
𝒀𝟑 − 𝜸𝟎 − 𝜸𝟏 𝒀𝟏 − 𝜸𝟐 𝒁𝟏 − 𝜸𝟑 𝒁𝟐 = 𝐯
𝒀𝟒 − θ𝟎 − θ𝟏 𝒀𝟏 − θ𝟐 𝒀𝟐 − θ𝟑 𝒁𝟑 = 𝒘
✓ Second, put all the endogenous and exogenous variables in a row and
put the corresponding coefficients beneath each variable.
Equation | 1 (intercept) | 𝒀𝟏 | 𝒀𝟐 | 𝒀𝟑 | 𝒀𝟒 | 𝐙𝟏 | 𝐙𝟐 | 𝐙𝟑
1st | −𝜶𝟎 | 1 | −𝜶𝟏 | −𝜶𝟐 | 0 | −𝜶𝟑 | 0 | 0
2nd | −𝜷𝟎 | 0 | 1 | −𝜷𝟏 | 0 | −𝜷𝟐 | −𝜷𝟑 | 0
3rd | −𝜸𝟎 | −𝜸𝟏 | 0 | 1 | 0 | −𝜸𝟐 | −𝜸𝟑 | 0
4th | −θ𝟎 | −θ𝟏 | −θ𝟐 | 0 | 1 | 0 | 0 | −θ𝟑
✓ Consider the first equation, which excludes variables 𝒀𝟒 , 𝐙𝟐 , and 𝐙𝟑
✓ For this equation to be identified, we must obtain at least one nonzero
determinant of order 3 × 3 from the coefficients of the variables excluded from
this equation but included in other equations.
✓ To obtain the determinant we first obtain the relevant matrix of coefficients of
variables 𝒀𝟒 , 𝐙𝟐 , and 𝐙𝟑 included in the other equations.
✓ In the present case there is only one such matrix, call it A, defined as follows.
$$
A = \begin{bmatrix} 0 & -\beta_3 & 0 \\ 0 & -\gamma_3 & 0 \\ 1 & 0 & -\theta_3 \end{bmatrix}
$$
✓ The determinant of matrix A is zero, because its first two rows, (0, −𝜷𝟑, 0)
and (0, −𝜸𝟑, 0), are proportional.
✓ This implies that the rank of A is less than 3, so the first equation is not identified.
✓ Therefore, although the order condition shows that the first equation is
identified, the rank condition shows that it is not.
✓ Note: As noted, the rank condition is both a necessary and sufficient condition for
identification.
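The rank check can be automated. The sketch below fills the coefficient table with arbitrary nonzero numbers (hypothetical values, since the text keeps the coefficients symbolic), builds A for each equation from the columns that equation excludes, and compares its rank with M − 1 = 3. For the first equation it should report rank 2, matching the zero determinant above. It also reports rank 2 for the second and third equations, because in each of their A matrices two rows are proportional whatever nonzero values are used, so only the fourth equation satisfies the rank condition.

```python
import numpy as np

# Hypothetical nonzero coefficient values (assumptions, not from the text).
a0, a1, a2, a3 = 1.0, 0.5, 0.3, 0.4
b0, b1, b2, b3 = 2.0, 0.6, 0.2, 0.7
g0, g1, g2, g3 = 0.5, 0.8, 0.1, 0.9
t0, t1, t2, t3 = 1.0, 0.25, 0.35, 1.2

columns = ["1", "Y1", "Y2", "Y3", "Y4", "Z1", "Z2", "Z3"]
# Coefficient table after moving every term except the error to the left.
table = np.array([
    [-a0, 1.0, -a1, -a2, 0.0, -a3, 0.0, 0.0],
    [-b0, 0.0, 1.0, -b1, 0.0, -b2, -b3, 0.0],
    [-g0, -g1, 0.0, 1.0, 0.0, -g2, -g3, 0.0],
    [-t0, -t1, -t2, 0.0, 1.0, 0.0, 0.0, -t3],
])
M = table.shape[0]  # number of equations (= number of endogenous variables)

ranks = []
for i in range(M):
    excluded = np.flatnonzero(table[i] == 0.0)    # columns absent from equation i
    others = [r for r in range(M) if r != i]      # rows of the other equations
    A = table[np.ix_(others, excluded)]           # matrix of excluded coefficients
    rank = np.linalg.matrix_rank(A)
    ranks.append(rank)
    names = [columns[j] for j in excluded]
    verdict = "identified" if rank == M - 1 else "NOT identified"
    print(f"equation {i + 1}: excludes {names}, rank(A) = {rank} -> {verdict}")
```

`numpy.linalg.matrix_rank` decides rank from singular values with a floating-point tolerance, so with symbolic coefficients a computer-algebra system would be the more rigorous tool.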
3.3.2. Indirect Least Squares (ILS), Instrumental Variable (IV) and Two Stage Least Squares (2SLS) estimation of structural equations
❖ Once the SEM has been specified, the next task is estimation.
❖ Yet estimation is rather complex, because there is a variety of
estimation techniques with different statistical properties.
❖ In an SEM, two approaches may be adopted to estimate the
structural equations, namely:
✓ single-equation methods, also known as limited
information methods, and
✓ system methods, also known as full information
methods.
❖ In the single-equation methods, we estimate each equation in the system
(of simultaneous equations) individually, taking into account any
restrictions placed on that equation (such as exclusion of some
variables) without worrying about the restrictions on the other equations
in the system, hence the name limited information methods.
❖ In the system methods, on the other hand, we estimate all the equations
in the model simultaneously, taking due account of all restrictions on
such equations by the omission or absence of some variables (recall that
for identification such restrictions are essential), hence the name full
information methods.
❖ Although system methods, such as full information maximum
likelihood (FIML), preserve the spirit of simultaneous-equation models
better, in practice they are not commonly used, for several reasons.
✓ Some of these reasons include:
1) High computational burden: for example, a comparatively small
20-equation model of the US economy built in 1955 had 151 coefficients
to estimate.
2) The systems methods, such as FIML, lead to solutions that are highly
non-linear in the parameters and are, therefore, often difficult to
determine.
3) If there is a specification error (e.g., a wrong functional form or
exclusion of relevant variables) in one or more equations of the
system, that error is transmitted to the rest of the system. As a result, the
systems methods become very sensitive to specification errors.
❖ Due to the above problems, therefore, single-equation methods are often
used in practice.
❖ These include: Ordinary least squares (OLS); Indirect least squares
(ILS); and Two-stage least squares (2SLS)
1) Indirect least squares (ILS)
❖ For a just (exactly) identified structural equation, the method of
obtaining the estimates of the structural coefficients from the OLS
estimates of the reduced-form coefficients is known as indirect least
squares (ILS), and the estimates obtained are known as indirect least
squares estimates.
Step 1: Obtain the reduced-form equations; solve for the endogenous
variable in each equation in terms solely of the exogenous variables and the
stochastic error term.
✓ This gives the reduced-form equations.
Step 2: Apply OLS to the reduced-form equations individually.
✓ Since the explanatory variables in these equations are
predetermined/exogenous variables which are uncorrelated with the
stochastic disturbances, this operation is permissible, and the
estimates obtained are consistent.
Step 3: Obtain estimates of the original structural coefficients from the
estimated reduced-form coefficients obtained in Step 2.
✓ If an equation is exactly identified, there is a one-to-one correspondence
between the structural and reduced-form coefficients; that is, one can
derive unique estimates of the former from the latter.
Example:
$$
\begin{aligned}
Q_t &= \alpha_0 + \alpha_1 P_t + \alpha_2 I_t + \mu_t \\
Q_t &= \beta_0 + \beta_1 P_t + \varepsilon_t
\end{aligned}
\qquad (2.6)
$$
✓ Where 𝑄𝑡 and 𝑃𝑡 are quantity and price, which are endogenous; 𝐼𝑡 is
income, which is exogenous and appears only in the first equation; 𝜇𝑡 is the
error term of the first equation and Ɛ𝑡 is the error term of the second equation.
✓ Then, given equation (2.6):
1) Determine the identification status of each equation.
2) Find the estimators of the structural parameters using ILS.
Solution:
1) The first equation is the demand function (Why?). As a result, the second
equation has to be the supply function.
✓ In the first equation there is no excluded variable; hence the demand
function is under-identified. But income, 𝑰𝒕, is excluded from the second
equation; hence the supply function is just identified.
2) Since the second equation is just identified, we can apply ILS to it. First,
solve the system for 𝑷𝒕 and 𝑸𝒕 in terms of the exogenous variable 𝑰𝒕, as follows.
✓ At equilibrium, Supply = Demand.
✓ Hence, set the demand for 𝑸𝒕 (first equation) equal to the supply of 𝑸𝒕
(second equation):
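$$
\begin{aligned}
\alpha_0 + \alpha_1 P_t + \alpha_2 I_t + \mu_t &= \beta_0 + \beta_1 P_t + \varepsilon_t \\
\alpha_1 P_t - \beta_1 P_t &= \beta_0 - \alpha_0 - \alpha_2 I_t + \varepsilon_t - \mu_t \\
P_t(\alpha_1 - \beta_1) &= \beta_0 - \alpha_0 - \alpha_2 I_t + \varepsilon_t - \mu_t
\end{aligned}
$$
Where 𝜶𝟏 ≠ 𝜷𝟏, so that
$$
P_t = \frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1} + \frac{-\alpha_2}{\alpha_1 - \beta_1} I_t + \frac{\varepsilon_t - \mu_t}{\alpha_1 - \beta_1}
$$
$$
P_t = \theta_0 + \theta_1 I_t + v_t \qquad (2.7)
$$
✓ Where
$$
\theta_0 = \frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1}; \quad \theta_1 = \frac{-\alpha_2}{\alpha_1 - \beta_1}; \quad v_t = \frac{\varepsilon_t - \mu_t}{\alpha_1 - \beta_1}
$$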
✓ To obtain the reduced form for 𝑸𝒕, substitute 𝑷𝒕 from equation (2.7) into either
the demand or the supply function of equation (2.6); using the supply function:
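$$
\begin{aligned}
Q_t &= \beta_0 + \beta_1 P_t + \varepsilon_t \\
&= \beta_0 + \beta_1\left(\frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1} + \frac{-\alpha_2}{\alpha_1 - \beta_1} I_t + \frac{\varepsilon_t - \mu_t}{\alpha_1 - \beta_1}\right) + \varepsilon_t \\
&= \beta_0 + \beta_1\frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1} + \frac{-\beta_1\alpha_2}{\alpha_1 - \beta_1} I_t + \beta_1\frac{\varepsilon_t - \mu_t}{\alpha_1 - \beta_1} + \varepsilon_t \\
&= \frac{\beta_0\alpha_1 - \beta_0\beta_1 + \beta_0\beta_1 - \beta_1\alpha_0}{\alpha_1 - \beta_1} + \frac{-\beta_1\alpha_2}{\alpha_1 - \beta_1} I_t + \frac{\beta_1\varepsilon_t - \beta_1\mu_t + \alpha_1\varepsilon_t - \beta_1\varepsilon_t}{\alpha_1 - \beta_1}
\end{aligned}
$$
❖ Simplifying this gives:
$$
Q_t = \frac{\beta_0\alpha_1 - \beta_1\alpha_0}{\alpha_1 - \beta_1} + \frac{-\beta_1\alpha_2}{\alpha_1 - \beta_1} I_t + \frac{\alpha_1\varepsilon_t - \beta_1\mu_t}{\alpha_1 - \beta_1}
$$
$$
Q_t = \gamma_0 + \gamma_1 I_t + w_t \qquad (2.8)
$$
Where
$$
\gamma_0 = \frac{\beta_0\alpha_1 - \beta_1\alpha_0}{\alpha_1 - \beta_1}; \quad \gamma_1 = \frac{-\beta_1\alpha_2}{\alpha_1 - \beta_1}; \quad w_t = \frac{\alpha_1\varepsilon_t - \beta_1\mu_t}{\alpha_1 - \beta_1}
$$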
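Sign slips are easy in this derivation, so a numerical check helps. The sketch below picks hypothetical parameter values, solves the two structural equations in (2.6) directly as a 2 × 2 linear system for several random draws of 𝑰𝒕, 𝛍𝒕 and Ɛ𝒕, and confirms that the solution agrees with the reduced forms (2.7) and (2.8).

```python
import numpy as np

# Hypothetical structural parameters for (2.6); alpha1 != beta1 is required.
a0, a1, a2 = 10.0, -1.0, 0.5   # demand: Q = a0 + a1*P + a2*I + mu
b0, b1 = 2.0, 1.5              # supply: Q = b0 + b1*P + eps

# Reduced-form coefficients implied by (2.7) and (2.8).
d = a1 - b1
theta0, theta1 = (b0 - a0) / d, -a2 / d
gamma0, gamma1 = (b0 * a1 - b1 * a0) / d, -b1 * a2 / d

rng = np.random.default_rng(1)
for _ in range(5):
    I, mu, eps = rng.normal(20, 5), rng.normal(), rng.normal()
    # Structural equations rearranged as a linear system in (Q, P):
    #   Q - a1*P = a0 + a2*I + mu   and   Q - b1*P = b0 + eps
    Q, P = np.linalg.solve([[1.0, -a1], [1.0, -b1]], [a0 + a2 * I + mu, b0 + eps])
    v = (eps - mu) / d
    w = (a1 * eps - b1 * mu) / d
    assert np.isclose(P, theta0 + theta1 * I + v)
    assert np.isclose(Q, gamma0 + gamma1 * I + w)
print("reduced forms (2.7) and (2.8) match the direct solution")
```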
❖ Look once again at the structural equations (2.6).
❖ They contain five structural coefficients (parameters), namely
𝜶𝟎, 𝜶𝟏, 𝜶𝟐, 𝜷𝟎 and 𝜷𝟏.
❖ But there are only four reduced-form coefficients, θ𝟎, θ𝟏, γ𝟎 and γ𝟏,
from which to recover them.
❖ Since the number of structural coefficients (= 5) is greater than the
number of reduced-form coefficients (= 4), a unique solution for all the
structural coefficients is not possible.
✓ As a result, only the parameters of the supply function can be
identified (derived from the four reduced-form parameters):
$$
\beta_1 = \frac{\gamma_1}{\theta_1} \quad\text{and}\quad \beta_0 = \gamma_0 - \beta_1\theta_0 \qquad (2.9)
$$
✓ Recall from Econometrics I that estimating equation (2.7) by OLS gives:
$$
\hat\theta_1 = \frac{\sum p_t i_t}{\sum i_t^2} \quad\text{and}\quad \hat\theta_0 = \bar P - \hat\theta_1 \bar I
$$
✓ Similarly, estimating equation (2.8) by OLS gives:
$$
\hat\gamma_1 = \frac{\sum q_t i_t}{\sum i_t^2} \quad\text{and}\quad \hat\gamma_0 = \bar Q - \hat\gamma_1 \bar I
$$
✓ Where, as usual, the lowercase letters denote deviations from the mean
values, and $\bar Q$, $\bar P$ and $\bar I$ are the sample means of Q, P and I.
✓ Substituting the estimates of the reduced-form coefficients into equation
(2.9) gives the ILS estimators of the supply-function parameters:
$$
\hat\beta_1 = \frac{\hat\gamma_1}{\hat\theta_1} \quad\text{and}\quad \hat\beta_0 = \hat\gamma_0 - \hat\beta_1\hat\theta_0 = \hat\gamma_0 - \frac{\hat\gamma_1}{\hat\theta_1}\hat\theta_0 \qquad (2.10)
$$
✓ The demand function, however, is under-identified: there is no unique way
of recovering its parameters from the reduced-form estimates, so ILS cannot be applied to it.
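The three ILS steps can be tried on simulated data where the true supply parameters are known. The sketch below assumes a hypothetical data-generating process for (2.6) (parameter values, normal income and normal errors are all made up), estimates the reduced forms with the deviation formulas above, and recovers 𝜷̂𝟏 and 𝜷̂𝟎 with (2.10). For contrast it also runs naive OLS of Q on P, which should not recover the supply slope because price is correlated with Ɛ𝒕.

```python
import numpy as np

# Hypothetical "true" parameters and data-generating process (assumptions).
a0, a1, a2 = 10.0, -1.0, 0.5   # demand: Q = a0 + a1*P + a2*I + mu
b0, b1 = 2.0, 1.5              # supply: Q = b0 + b1*P + eps
rng = np.random.default_rng(42)
n = 50_000
I = rng.normal(20.0, 5.0, n)
mu, eps = rng.normal(0.0, 1.0, n), rng.normal(0.0, 1.0, n)

# Step 1: equilibrium price and quantity, i.e. the reduced forms (2.7) and (2.8).
P = (b0 - a0 - a2 * I + eps - mu) / (a1 - b1)
Q = b0 + b1 * P + eps

# Step 2: OLS on each reduced form, using deviations from means.
i, p, q = I - I.mean(), P - P.mean(), Q - Q.mean()
theta1_hat = (p * i).sum() / (i ** 2).sum()
theta0_hat = P.mean() - theta1_hat * I.mean()
gamma1_hat = (q * i).sum() / (i ** 2).sum()
gamma0_hat = Q.mean() - gamma1_hat * I.mean()

# Step 3: recover the supply parameters with (2.10).
beta1_hat = gamma1_hat / theta1_hat
beta0_hat = gamma0_hat - beta1_hat * theta0_hat

# For comparison: naive OLS of Q on P.
beta1_ols = (q * p).sum() / (p ** 2).sum()
print(f"ILS: beta1 = {beta1_hat:.3f}, beta0 = {beta0_hat:.3f}; OLS beta1 = {beta1_ols:.3f}")
```

With these assumptions the ILS estimates should land close to the true values 𝜷𝟏 = 1.5 and 𝜷𝟎 = 2, while the OLS slope settles noticeably lower.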
2) Two-stage least squares (2SLS)
✓ As the name implies, estimating an SEM by 2SLS involves two stages.
These are:
Stage 1: Estimate the endogenous variable using all the exogenous
variables of the SEM
✓ To get rid of the likely correlation between the endogenous variable (income)
and the error term, Ɛ𝑖, we find the best linear combination of all the
exogenous variables in the whole system, not just those in that equation.
✓ The linear combination of Experience and Education in equation (2.12),
which we call income*, is:
$$
income^{*} = \pi_0 + \pi_1\,Experience + \pi_2\,Education \qquad (2.13)
$$
✓ But since we do not know the exact value of income*, we can only
estimate it, using OLS, by regressing income on experience and education:
$$
\widehat{Income} = \hat\pi_0 + \hat\pi_1\,Experience + \hat\pi_2\,Education \qquad (2.14)
$$
Or,
$$
Income = \widehat{Income} + \hat v \qquad (2.15)
$$
✓ Then, test the joint significance of the instruments in equation (2.13) with
an F-test. If they are found to be jointly significant (p-value no larger than 5%),
use the fitted values of income, $\widehat{Income}$, as an IV.
✓ All the above tasks make up Stage I.
Stage 2: Substitute the estimated value of the endogenous variable
obtained in Stage I and estimate the over-identified equation of
the SEM
✓ Substituting equation (2.15) into the second equation of (2.11) gives:
$$
Asset = \beta_0 + \beta_1 Income + \mu_i = \beta_0 + \beta_1(\widehat{Income} + \hat v) + \mu_i
$$
$$
Asset = \beta_0 + \beta_1 \widehat{Income} + w_i \qquad (2.16)
$$
Where $w_i = \beta_1 \hat v + \mu_i$.
✓ Then, directly estimate the asset model in (2.16) using OLS method.
✓ Equation (2.16) looks very similar to the second equation of (2.11), the
only difference being that the actual value of income is replaced by its
estimated value, $\widehat{Income}$, obtained from the exogenous variables.
❖ Therefore, the advantage of 2SLS (as in this case) is that the error term,
Ɛ𝒊, is correlated with income but not with $\widehat{Income}$. Why?
✓ As a result, OLS estimation of equation (2.16) gives consistent estimates,
unlike OLS applied directly to the original asset equation (although, like
other IV estimators, 2SLS can be biased in small samples).
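Equations (2.11) and (2.12) are not reproduced in this section, so the sketch below uses a hypothetical simultaneous system with the same variable names (income, asset, experience, education) and made-up coefficients; it is not the text's model. It runs the two stages with NumPy least squares and compares the 2SLS slope with direct OLS on the asset equation. Under this data-generating process, 2SLS should land near the true slope of 0.6, while OLS should drift upward because income responds to the asset shock μ.

```python
import numpy as np

# Hypothetical simultaneous system (an assumption, not the text's model (2.11)):
#   income = d0 + d1*asset + d2*experience + d3*education + eps
#   asset  = b0 + b1*income + mu          <- equation to be estimated
d0, d1, d2, d3 = 1.0, 0.5, 0.5, 0.5
b0, b1 = 2.0, 0.6
rng = np.random.default_rng(7)
n = 50_000
experience = rng.normal(10.0, 1.0, n)
education = rng.normal(12.0, 1.0, n)
eps, mu = rng.normal(size=n), rng.normal(size=n)

# Equilibrium values: solve the two equations jointly (requires d1*b1 != 1).
income = (d0 + d1 * b0 + d2 * experience + d3 * education + d1 * mu + eps) / (1 - d1 * b1)
asset = b0 + b1 * income + mu

def ols(X, y):
    """OLS coefficient vector via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
# Stage 1: regress income on ALL exogenous variables of the system.
Z = np.column_stack([ones, experience, education])
income_hat = Z @ ols(Z, income)
# Stage 2: replace income by its fitted value and run OLS.
beta_2sls = ols(np.column_stack([ones, income_hat]), asset)
beta_ols = ols(np.column_stack([ones, income]), asset)
print("2SLS:", beta_2sls.round(3), " OLS:", beta_ols.round(3), " true:", (b0, b1))
```

Running the stages by hand gives the right 2SLS coefficients, but the standard errors printed by a plain second-stage OLS would need the adjustment discussed under feature 6 below.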
Features of 2SLS
1) It can be applied to an individual equation in the SEM without
directly taking into account any other equation(s) in the system.
✓ For this reason, the method has been used extensively in practice for
estimating econometric models involving a large number of equations.
2) Whereas ILS provides multiple estimates of a parameter in an
over-identified equation, 2SLS provides only one.
3) It is easy to apply, because all one needs to know is the total number
of exogenous (predetermined) variables in the system, without knowing any
other variables in the system.
4) It can also be applied to exactly identified equations, in which case it
gives estimates identical to those of ILS.
5) If the 𝑹𝟐 values in the reduced-form (Stage-1) regressions are very
high, say in excess of 0.8, the classical OLS estimates and the 2SLS
estimates will be very close. Why?
6) In reporting the ILS regression, we did not state the standard errors of
the estimated coefficients.
✓ We can do this for the 2SLS estimates, because the structural
coefficients are estimated directly from the second-stage (OLS)
regression, though the reported standard errors need some modification:
the second-stage error term is $w_i = \beta_1 \hat v + \mu_i$ rather than
$\mu_i$ (see equation (2.16) above).
Numerical example: Undertake 2SLS on the following simple Keynesian model
Consumption function: $C_t = \alpha_0 + \alpha_1 Y_t + u_t, \quad 0 < \alpha_1 < 1$ (EQ 1)
Income identity: $Y_t = C_t + I_t \quad (I_t = S_t)$ (EQ 2)
Use the data below to show that OLS is a biased and inconsistent estimator
here, while 2SLS is consistent.
Time I C Y
1 2 3 5
2 3 4 7
3 2 2 4
4 4 5 9
5 2 3 5
6 3 4 7
7 2 4 6
First, to understand the difference between OLS and 2SLS, regress C on
income by OLS (in Stata, reg C Y, as follows):
. reg C Y
Now, let's see the 2SLS estimates for the same data. Steps for 2SLS:
• Run the first-stage regression (Y on I).
• Get the fitted values Ŷ.
• Run the second-stage regression (C on Ŷ).
• Interpret the coefficient on Ŷ (α₁).
• Reflect: Why can’t we just use OLS? What problem does 2SLS solve?
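For readers without Stata, the OLS regression and the two 2SLS stages can be reproduced by hand on the seven observations above. This sketch computes only point estimates with the textbook slope and intercept formulas; it does not reproduce Stata's output tables.

```python
import numpy as np

# Data from the table above.
I = np.array([2, 3, 2, 4, 2, 3, 2], dtype=float)
C = np.array([3, 4, 2, 5, 3, 4, 4], dtype=float)
Y = C + I  # income identity (EQ 2); gives the Y column 5, 7, 4, 9, 5, 7, 6

def simple_ols(x, y):
    """Intercept and slope of an OLS regression of y on x."""
    slope = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
    return y.mean() - slope * x.mean(), slope

# OLS of C on Y (what `reg C Y` estimates).
a0_ols, a1_ols = simple_ols(Y, C)

# 2SLS: first stage Y on I, then C on the fitted values Y_hat.
p0, p1 = simple_ols(I, Y)
Y_hat = p0 + p1 * I
a0_2sls, a1_2sls = simple_ols(Y_hat, C)

print(f"first stage: Y_hat = {p0:.3f} + {p1:.3f} I")  # 1.000 + 2.000 I
print(f"OLS : C = {a0_ols:.4f} + {a1_ols:.4f} Y")     # 0.1356 + 0.5593 Y
print(f"2SLS: C = {a0_2sls:.4f} + {a1_2sls:.4f} Y")   # 0.5000 + 0.5000 Y
```

The 2SLS slope of 0.5 also equals the ILS estimate for this exactly identified equation: from EQ 1 and EQ 2 the reduced form is $Y_t = \frac{\alpha_0}{1-\alpha_1} + \frac{1}{1-\alpha_1} I_t + \frac{u_t}{1-\alpha_1}$, so a first-stage slope of 2 gives $\hat\alpha_1 = 0.5$ and an intercept of 1 gives $\hat\alpha_0 = 0.5$, which illustrates feature 4 above.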
Shortcut method for 2SLS in Stata
. ivregress 2sls C ( Y = I )
Endogenous: Y
Exogenous: I
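Feature 6 above warns that second-stage standard errors need modification. The sketch below shows the idea on the same seven observations: the coefficient variance uses $(\hat X'\hat X)^{-1}$ from the second stage, but $\hat\sigma^2$ should come from the structural residuals $C_t - \hat\alpha_0 - \hat\alpha_1 Y_t$ (actual Y), not from the second-stage residuals (fitted Ŷ). The divisor n − 2 is an assumption; software differs on whether it divides by n or n − k, so its reported figures may not match exactly.

```python
import numpy as np

I = np.array([2, 3, 2, 4, 2, 3, 2], dtype=float)
C = np.array([3, 4, 2, 5, 3, 4, 4], dtype=float)
Y = C + I
n = len(C)

# First stage (Y on I) and second stage (C on Y_hat), via least squares.
X1 = np.column_stack([np.ones(n), I])
Y_hat = X1 @ np.linalg.lstsq(X1, Y, rcond=None)[0]
X2 = np.column_stack([np.ones(n), Y_hat])
a = np.linalg.lstsq(X2, C, rcond=None)[0]   # [alpha0_hat, alpha1_hat]

XtX_inv = np.linalg.inv(X2.T @ X2)
df = n - 2                                  # degrees-of-freedom choice (assumption)

# Naive: residuals from the second-stage regression itself (uses Y_hat).
resid_naive = C - X2 @ a
se_naive = np.sqrt(resid_naive @ resid_naive / df * XtX_inv[1, 1])

# Corrected: residuals use the ACTUAL Y, i.e. estimates of the structural error u_t.
resid_struct = C - (a[0] + a[1] * Y)
se_2sls = np.sqrt(resid_struct @ resid_struct / df * XtX_inv[1, 1])

print(f"alpha1_hat = {a[1]:.3f}, naive SE = {se_naive:.4f}, corrected SE = {se_2sls:.4f}")
```

With this data the naive standard error (about 0.164) is exactly twice the corrected one (about 0.082), because the second-stage residual sum of squares is four times the structural one.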
Discussion Questions
1. Why can't we use OLS directly to estimate the above relationship?
2. What makes Y endogenous in this model?
3. How does 2SLS correct for endogeneity?
Exercises (practice the following two questions)
1) Given an Econometric model,
𝒚 = 𝜷𝒙 + Ɛ
a) Show that if x is correlated with Ɛ, the OLS estimator of β, $\hat\beta$,
is a biased and inconsistent estimator.
b) If we find a variable Z that is correlated with x and uncorrelated with Ɛ,
then Z can be used as an instrument for x. In this case, show that the IV
estimator $\hat\beta_{IV}$ is a consistent estimator of β.
2) Consider a two equation model:
𝑦1 = α1 𝑦2 + α2 𝑥 + 𝑒1
𝑦2 = β1 𝑦1 + β2 𝑥 + 𝑒2
a) Identify which variables are exogenous and which are endogenous.
b) What will happen to the identification of the equations if β2 = 0 ?
c) Find the reduced-form equations and find the formulas for estimating
the structural coefficients if β2 = 0
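As a numerical illustration (not a proof) of what Exercise 1 asks you to show, the sketch below simulates a hypothetical process for 𝒚 = 𝜷𝒙 + Ɛ in which x = z + 0.5Ɛ + noise, with all variables standard normal. Under these assumptions cov(x, Ɛ) = 0.5 and var(x) = 2.25, so as n grows OLS should settle near β + 0.5/2.25 ≈ 2.22, while the simple IV estimator Σzy / Σzx should settle near β = 2.

```python
import numpy as np

# Hypothetical data-generating process for y = beta*x + eps (no intercept):
# x depends on the instrument z AND on eps, so x is endogenous.
beta = 2.0
rng = np.random.default_rng(3)
for n in (100, 10_000, 1_000_000):
    z = rng.normal(size=n)
    eps = rng.normal(size=n)
    x = z + 0.5 * eps + rng.normal(size=n)
    y = beta * x + eps
    b_ols = (x @ y) / (x @ x)   # OLS through the origin
    b_iv = (z @ y) / (z @ x)    # simple IV estimator using z
    print(f"n = {n:>9}: OLS = {b_ols:.3f}, IV = {b_iv:.3f}")
```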
Assignment ONE
Use the attached Stata file (FERTIL2.RAW) and the following information to answer questions about IV and 2SLS (submit
your results as a log file along with the do-file). This question is taken from the Wooldridge textbook.