Quantitative Methods in Economics With Application
Contents
1.1. Nature and Importance of Mathematical Economics ............................................... 5
1.3. Advantages of the Mathematical Approaches ....................................................... 5
1.4. Review of Economic Model and Analysis .............................................................. 6
1.5. Static (Equilibrium) Analysis in Economics ........................................................... 6
1.5.1. Partial Market Equilibrium - A Linear Model .......................................................6
1.5.2. Partial Market Equilibrium - A Nonlinear Model ..................................................7
1.5.3. General Market Equilibrium ..............................................................................8
1.5.4. Equilibrium in National-Income Analysis ........................................................... 8
CHAPTER TWO .........................................................................................................9
REVISION ON CALCULUS AND LINEAR ALGEBRA ........................................................9
2.1. Differential Calculus: Fundamental Techniques .................................................... 9
2.1.1. The Concept of the Derivative. .........................................................................9
2.1.2. The Rules or Theorems of Derivatives ...............................................10
a) The constant rule .................................................................................................10
b) The Simple Power rules ........................................................................................10
c) The coefficient rules ............................................................................................. 10
d) The sum or difference rule ...................................................................................10
e) The product rules ................................................................................................ 10
g) Differentiation of a composite function ..................................................................11
i) Derivatives of Logarithmic and Exponential Functions ............................................ 12
2.2. Integral calculus: Techniques and applications of integral calculus ...................... 14
2.2.1. The Indefinite Integral ...................................................................................14
2.2.2. Techniques of Integration & the General Power Rule .......................................14
2.2.3. Definite Integrals .......................................................................................... 16
2.2.4. Continuous Money Streams ............................................................................17
2.2.4.1. Continuous Money Streams within a Period ..................................................17
2.2.5. Improper Integrals ........................................................................................ 18
2.3.1. Matrix Operations ..........................................................................................20
2.3.2. Determinants and Inverse of a Matrix .............................................................26
2.3.2.1. The Concept of Determinants, Minor and Cofactor ........................................26
2.3.2.2. The Inverse of a Matrix ...............................................................................31
2.3.3. Matrix Representation of Linear Equations ...................................................... 32
2.3.4. Input-output analysis (Leontief model) .......................................... 33
2.3.5. Linear Programming ...................................................................................... 35
3.2.2. Linear Approximation .................................................................................... 39
3.2.6. The Intermediate Value Theorem ..................................................... 48
3.3.2. The Multivariate Chain Rule ............................................................. 53
CHAPTER FOUR ...................................................................................................... 55
UNCONSTRAINED OPTIMIZATION ........................................................................... 55
4.1. Functions with one variable ................................................................................ 55
4.1.1. The concept of optimum (extreme) value ........................................................... 55
4.3 Implicit functions and unconstrained Envelope theory ......................................... 71
CHAPTER FIVE ....................................................................................................... 74
5. CONSTRAINED OPTIMIZATION ........................................................................... 74
CHAPTER SIX ......................................................................................................... 96
COMPARATIVE STATIC ANALYSIS .....................................................................96
CHAPTER SEVEN ...................................................................................................108
DYNAMIC OPTIMIZATION. .................................................................................... 108
7.1.1 First –Order Linear Differential Equations .......................................................109
7.2. The Dynamic Stability of Equilibrium ................................................................113
ECONOMETRICS ONE ............................................................................................121
1.1 Definition and scope of econometrics ............................................................... 121
1.5 Methodology of econometrics ...........................................................................123
Chapter Two .........................................................................................................125
2.1 THE CLASSICAL REGRESSION ANALYSIS ..........................................................125
2.1. Stochastic and Non-stochastic Relationships .................................................... 126
2.2. Simple Linear Regression model. .....................................................................127
2.2.1 Assumptions of the Classical Linear Stochastic Regression Model .....................127
2.2.2 Methods of estimation .........................................................................128
2.2.2.1 The ordinary least square (OLS) method .....................................................129
2.2.2.2 Estimation of a function with zero intercept .................................................130
2.2.2.3. Statistical Properties of Least Square Estimators .........................................131
2.2.2.4. Statistical test of Significance of the OLS Estimators (First Order tests) ........ 133
2.6 Confidence Intervals and Hypothesis Testing .................................................... 134
CHAPTER THREE ................................................................................................141
The Multiple Linear Regression Analysis ................................................................. 141
3.1 Introduction ....................................................................................................142
3.2 Assumptions of Multiple Regression Models .......................................................142
3.3. A Model with Two Explanatory Variables ..........................................................143
3.3.1 Estimation of parameters of two-explanatory variables model ....................... 143
3.3.2 Coefficient of Multiple Determination(R2) ....................................................... 144
Chapter Four ........................................................................................................ 150
Violations of basic Classical Assumptions ................................................................ 150
4.3.4.1 Test Based on Auxiliary Regressions: .......................................................... 172
4.3.4.3 Test of multicollinearity using Eigen values and condition index: ...................174
4.3.4.4 Test of multicollinearity using Tolerance and variance inflation factor .......... 174
PART TWO ........................................................................................................... 176
ECONOMETRICS TWO ...........................................................................................176
CHAPTER ONE ...................................................................................................... 176
1.2.3. Regression on One Quantitative Variable and Two Qualitative Variables ..........180
1.2.4. Interactions Among Dummy Variables .......................................................... 181
1.2.5. Testing for Structural Stability of Regression Models ......................................182
1.3. Dummy as Dependent Variable ..................................................................... 182
1.3.2. The Logit and Probit Models .........................................................................184
CHAPTER TWO ..................................................................................................... 189
2.6 The Phenomenon of Spurious Regression ......................................................... 197
CHAPTER THREE .................................................................................................. 200
CHAPTER FOUR .................................................................................................... 208
CHAPTER ONE
INTRODUCTION TO MATHEMATICAL ECONOMICS
The major difference between these two approaches is that in the former, the assumptions and conclusions are stated in mathematical symbols rather than words and in equations rather than sentences. Symbols are more convenient to use in deductive reasoning, and certainly are more conducive to conciseness and preciseness of statement. The choice between literary logic and mathematical logic is a matter of little importance, but mathematics has the advantage of forcing analysts to make their assumptions explicit at every stage of reasoning.
But an economist equipped with the tools of mathematics is like a person with a motorboat or a ship rather than a rowboat: which one he uses depends on his personal inclination, but the equipment widens his reach. As a result, most economic researchers extensively apply the tools of mathematics to economic reasoning.
The term mathematical economics also differs from econometrics in that the former is concerned with the application of mathematics to the purely theoretical aspects of economic analysis, with little or no concern for statistical problems such as errors of measurement of the variables under study, while the latter focuses on the measurement and analysis of economic data; hence it deals with the study of empirical observations using statistical methods of estimation and hypothesis testing.
c) By forcing us to state explicitly all our assumptions as a prerequisite to the use of mathematical theorems, it keeps us from the pitfall of an unintentional adoption of unwanted implicit assumptions.
d) It helps us to understand relationships among two or more economic variables simply and neatly, where the geometric and literary approaches run a high risk of error.
e) It allows us to treat the general n-variable case.
For the present analysis we will focus on two types of equilibrium. One is the equilibrium attained by a market under given demand and supply conditions. The other is the equilibrium of national income under given conditions of consumption and investment patterns.
1.5.1. Partial Market Equilibrium - A Linear Model
The model involves three variables:
Qd = the quantity demanded of the commodity;
Qs = the quantity supplied of the commodity; and
P = the price of the commodity.
The model is
Qd = Qs (equilibrium condition)
Qd = a - bP (a, b > 0)
Qs = -c + dP (c, d > 0)
Note that, contrary to the usual practice, quantity rather than price has been plotted vertically
in the Figure.
a - bP = -c + dP
and thus
(b + d)P = a + c
P* = (a + c) / (b + d)
Substituting P* into either the demand or the supply equation gives the equilibrium quantity
Q* = (ad - bc) / (b + d)
Since the denominator (b + d) is positive, the positivity of Q* requires that the numerator (ad - bc) > 0. Thus, to be economically meaningful, the model should contain the additional restriction that ad > bc.
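As a quick numerical check, the short Python sketch below solves the linear model for illustrative parameter values (a, b, c, d are assumed here, not taken from the text):

# Linear partial-market-equilibrium model: Qd = a - b*P, Qs = -c + d*P
a, b = 10.0, 2.0   # demand parameters (illustrative)
c, d = 2.0, 3.0    # supply parameters (illustrative)

P_star = (a + c) / (b + d)            # equilibrium price
Q_star = (a * d - b * c) / (b + d)    # equilibrium quantity

print(P_star, Q_star)                 # 2.4 5.2
assert abs((a - b * P_star) - (-c + d * P_star)) < 1e-9  # Qd = Qs holds

Note that ad = 30 > bc = 4 here, so the equilibrium quantity is indeed positive.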
1.5.2. Partial Market Equilibrium - A Nonlinear Model
Consider the nonlinear market model
Qd = 4 - P^2
Qs = 4P - 1
As previously stated, this system of three equations can be reduced to a single equation by setting Qd = Qs:
4 - P^2 = 4P - 1, that is, P^2 + 4P - 5 = 0
In general, given a quadratic equation in the form ax^2 + bx + c = 0 (a ≠ 0), its two roots can be obtained from the quadratic formula:
x = (-b ± sqrt(b^2 - 4ac)) / 2a
where the "+" part of the "±" sign yields x1 and the "-" part yields x2.
Thus, by applying the quadratic formula to P^2 + 4P - 5 = 0, we have P1 = 1 and P2 = -5, but only
the first is economically admissible, as negative prices are ruled out.
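The same roots can be checked numerically; the sketch below feeds the coefficients of p^2 + 4p - 5 to numpy's polynomial root finder:

import numpy as np

roots = np.roots([1, 4, -5])   # coefficients of p^2 + 4p - 5
print(roots)                   # [-5.  1.]

p_star = max(roots)            # only the positive root is admissible
q_star = 4 - p_star**2         # substitute back into the demand equation
print(p_star, q_star)          # 1.0 3.0 (supply check: 4*1 - 1 = 3)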
1.5.3. General Market Equilibrium
So far we have assumed that the quantities demanded and supplied of a commodity are functions of the price of that commodity alone. In the real world, there would normally exist many substitutes and complementary goods. Thus, a more realistic model for the demand and supply functions of a commodity should take into account the effects not only of the price of the commodity itself but also of the prices of other commodities.
1.5.4. Equilibrium in National-Income Analysis
Y = C + I0 + G0 (equilibrium condition)
C = Ca + bY (Ca > 0, 0 < b < 1) (consumption function)
where Y and C stand for the endogenous variables national income and consumption expenditure, respectively, Ca stands for autonomous consumption, and I0 and G0 represent the exogenously determined investment and government expenditures.
Solving these two linear equations, we obtain the equilibrium national income and consumption expenditure. Substituting the consumption function into the equilibrium condition gives Y = Ca + bY + I0 + G0; by collecting like terms we have
Y* = (Ca + I0 + G0) / (1 - b)
C* = Ca + bY* = Ca + b(Ca + I0 + G0) / (1 - b)
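A minimal numerical sketch of this income model, with illustrative values for Ca, b, I0 and G0 (not taken from the text):

Ca, b = 100.0, 0.8    # autonomous consumption, marginal propensity to consume
I0, G0 = 50.0, 30.0   # exogenous investment and government expenditure

Y_star = (Ca + I0 + G0) / (1 - b)   # equilibrium national income
C_star = Ca + b * Y_star            # equilibrium consumption

print(Y_star, C_star)               # 900.0 820.0
assert abs(Y_star - (C_star + I0 + G0)) < 1e-9  # Y = C + I0 + G0 holds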
CHAPTER TWO
REVISION ON CALCULUS AND LINEAR ALGEBRA
2.1. Differential Calculus: Fundamental Techniques
As you might remember from the previous discussion, the derivative is also the slope of the
tangent line to f (x) at a point. Before we directly go to the discussion of the rules of derivatives,
let’s see some examples.
Example: For f(x) = x^2, find f'(x) using the definition of the derivative.
Solution
f'(x) = lim (h→0) [f(x + h) - f(x)] / h = lim (h→0) [(x + h)^2 - x^2] / h = lim (h→0) (2x + h) = 2x
Note that,
i. The derivative, which is one of the most fundamental concepts in calculus, is the same
as the following two concepts.
a. Slope of a line tangent to a curve at x
b. Instantaneous rate of change of f(x) at x
ii. The process of obtaining f'(x) from f(x) is known as differentiation, and if the derivative exists, then the function is differentiable at a point or over an interval. In the following section, we will present the rules of derivatives, which will largely simplify our calculation of the derivative.
2.1.2. The Rules or Theorems of Derivatives
In this section, we formally state the different rules of derivatives that are useful in many problems of computing the derivative. These are the constant rule, the simple power rule, the coefficient rule, the sum/difference rule, the product rule, the quotient rule, the chain rule, and others.
a) The constant rule
If c is a constant and if f(x) = c, then f'(x) = 0
Example: If f(x) = 2, then f'(x) = 0
b) The simple power rule
If f(x) = x^n, where n is any real number, then f'(x) = n·x^(n-1)
Example: If f(x) = x^3, then f'(x) = 3x^2
c) The coefficient rule
For any constant c and differentiable function m(x), if f(x) = c·m(x), then f'(x) = c·m'(x)
Example: If f(x) = 5x^3, then f'(x) = 5(3x^2) = 15x^2
d) The sum or difference rule
If f(x) = u(x) ± v(x), then f'(x) = u'(x) ± v'(x)
e) The product rule
If f(x) = u(x)·v(x), then f'(x) = u'(x)·v(x) + u(x)·v'(x)
Example: If f(x) = x^2(2x + 1), then f'(x) = 2x(2x + 1) + x^2(2) = 6x^2 + 2x
f) The quotient rule
If u(x) and v(x) are two functions, with v(x) ≠ 0, and if f(x) is the quotient f(x) = u(x)/v(x), then
f'(x) = [u'(x)·v(x) - u(x)·v'(x)] / [v(x)]^2
Example: If y = x^2 / (x + 1), then
y' = [2x(x + 1) - x^2(1)] / (x + 1)^2 = (x^2 + 2x) / (x + 1)^2
g) Differentiation of a composite function (the chain rule)
If y = f(u) and u = g(x), then dy/dx = (dy/du)·(du/dx) = f'(u)·g'(x).
h) Implicit differentiation
Some functions are not written explicitly in the form y = f(x), so we cannot directly come up with dy/dx. But using implicit differentiation one can calculate dy/dx.
Suppose we have an implicit function given by F(x, y) = 0.
Then, totally differentiating the equation we get Fx·dx + Fy·dy = 0, so that
dy/dx = -Fx / Fy (for Fy ≠ 0)
See the expansion from the previous discussion.
i. Derivatives of logarithmic functions
If y = ln u(x), then dy/dx = u'(x) / u(x). In particular, if y = ln x, then dy/dx = 1/x.
j. Derivatives of exponential functions
If y = a^u(x), using the chain rule we have dy/dx = a^u(x) · ln a · u'(x).
The exponential function e^x is a unique function with special behavior. That is,
d(e^x)/dx = e^x
However, in case we have e^f(x), we can use the chain rule to evaluate its derivative. That is, let u = f(x) and we will have y = e^u.
If y = e^u and u = f(x), then
dy/dx = e^u · du/dx = e^f(x) · f'(x)
2.2. Integral Calculus: Techniques and Applications of Integral Calculus
Up to this time, we have been concerned with finding the derivative of a function. However, some problems in economics require us to recover the original function given its rate of change. These problems are common in the areas of social welfare, the distribution of income of a country, etc. The technique of finding the original function from its derivative is known as integration or antidifferentiation. More specifically, you will learn that the concept of integration is the exact opposite of differentiation.
2.2.1. The Indefinite Integral
Given the primitive function F (x) which yields the derivative f (x) , you have the following
definitions.
You have the following notation that denotes the integration of f(x) with respect to x:
∫ f(x) dx
The symbol ∫ is called the integral sign. The f(x) part is known as the integrand (that is, the function to be integrated). The dx part tells that the operation is to be carried out with respect to the variable x.
One important point that you should note is that while the primitive function F(x) invariably yields a unique derivative f(x), the derived function can have an infinite number of possible primitive functions through integration. Why? Because if F(x) is an integral of f(x), then so is F(x) plus any constant, since the derivative of a constant is zero.
2.2.2. Techniques of Integration and the General Power Rule
a) The Constant Rule: ∫ k dx = kx + C
Example: ∫ 2 dx = 2x + C
b) The Simple Power Rule: ∫ x^n dx = x^(n+1) / (n+1) + C, for n ≠ -1
Example: ∫ x^3 dx = x^4 / 4 + C
c) Sum or Difference Rule: For two functions f(x) and g(x),
∫ [f(x) ± g(x)] dx = ∫ f(x) dx ± ∫ g(x) dx
Example: ∫ (x^2 + 2x) dx = x^3/3 + x^2 + C
d) The General Power Rule: If u is a function of x, then
∫ u^n · u'(x) dx = u^(n+1) / (n+1) + C iff n ≠ -1.
But the following formula applies when n = -1. That is,
∫ (u'(x) / u) dx = ln |u| + C
We use the absolute value to protect the argument from being negative and the whole logarithm from being undefined. This rule is the basis of integration by substitution.
Example: Evaluate the indefinite integral of the following.
1) ∫ (4x + 1)^3 dx
Solution
1) Let 4x + 1 = u, so that du = 4 dx and dx = du/4.
Thus, ∫ (4x + 1)^3 dx = (1/4) ∫ u^3 du = u^4/16 + C = (4x + 1)^4/16 + C
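The substitution can be verified with sympy; the integrand below is the assumed one from the example above:

import sympy as sp

x = sp.symbols('x')
F = sp.integrate((4*x + 1)**3, x)   # antiderivative (sympy omits the constant C)
print(F)                            # a polynomial equal to (4x + 1)^4/16 - 1/16
print(sp.expand(sp.diff(F, x) - (4*x + 1)**3))   # 0: differentiating recovers the integrand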
The second type of non-algebraic function we saw was the exponential function. In relation to exponential functions, we have seen that d(e^x)/dx = e^x.
Thus, ∫ e^x dx = e^x + C
Moreover, we have seen that when u is a function of x, that is, u = f(x), differentiating e^u gives e^u · u'(x). Hence,
∫ e^u · u'(x) dx = e^u + C
2.2.3. Definite Integrals
Let f(x) be a function and a and b be real numbers. The definite integral of f(x) over the interval from x = a to x = b, denoted by ∫ from a to b of f(x) dx, is the net change of an antiderivative of f(x) over that interval. By the fundamental theorem of calculus, if F(x) is any antiderivative of f(x), then
∫ from a to b of f(x) dx = F(b) - F(a)
Note that this theorem is used to find the value of a definite integral, not the antiderivative of a function.
Now it is time to look at an example so that the idea of the definite integral and the fundamental theorem of calculus will be well planted in your mind.
Example: ∫ from 0 to 2 of 2x dx = [x^2] evaluated from 0 to 2 = 4 - 0 = 4
2.2.4. Continuous Money Streams
You will formally be introduced to the notion of continuous compounding of interest in monetary economics. If P dollars is invested at an annual rate r compounded continuously, then its value F at the end of t years is given by the equation
F = P·e^(rt)
Solving this equation for P in terms of F gives
P = F·e^(-rt)
Written in this form, the equation tells us the present value P of F dollars that will be received t years from now, assuming that interest is compounded continuously at the annual rate r.
Example
Bethel knows that she will need to replace her car in 3 years. How much would she have to put in the bank today at 8% interest compounded continuously in order to have the $12,000 she expects to need 3 years from now?
Solution
We apply the boxed formula, remembering that the interest rate of 8% must be written in its decimal form:
P = 12,000·e^(-0.08×3) = 12,000·e^(-0.24) ≈ $9,439.53
2.2.4.1. Continuous Money Streams within a Period
There are many situations in business and industry where it is useful to think of money as
flowing continuously into an account. For example, the ABC Company plans to buy an
expensive machine which it estimates will increase the company’s net income by $10,000 per
year. But this income won’t come at the end of each year; it will come in dribbles throughout the
year. As a model for what will happen, it is convenient to think of the machine as if it will
produce income continuously. Our next example raises an important question for the company
to answer.
Suppose that the annual rate of income of an income stream is R(t). Suppose further that this income stream will last over the period 0 ≤ t ≤ T. If interest is at rate r compounded continuously, then the present value PV of this income stream is given by
PV = ∫ from 0 to T of R(t)·e^(-rt) dt
Example
Suppose the ABC Company takes a more optimistic view and estimates that the new machine
will produce income at the rate R (t) = 10,000 + 200t. What is the present value of this income
stream, again assuming interest at 9% compounded continuously and a lifetime of 8 years?
Solution
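A minimal numerical sketch of this computation, evaluating PV = ∫ from 0 to 8 of (10000 + 200t)e^(-0.09t) dt with scipy's quadrature routine:

import numpy as np
from scipy.integrate import quad

r, T = 0.09, 8.0
R = lambda t: 10_000 + 200 * t            # income rate from the example

PV, _ = quad(lambda t: R(t) * np.exp(-r * t), 0, T)
print(round(PV, 2))                       # roughly 61046.9

So the present value of this income stream is roughly $61,047.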
Let us modify the problem in the above example in another way by assuming that the machine will last indefinitely. More generally, let us ask for the present value PV of a perpetual (meaning infinitely long) income stream which produces income at an annual rate R(t), assuming that interest is at an annual rate r compounded continuously. The result in this case is
PV = ∫ from 0 to ∞ of R(t)·e^(-rt) dt
Here, we face a new kind of integral, one with an infinite limit. Such integrals are called improper integrals and must be given a clear definition. We do this by defining
∫ from a to ∞ of f(x) dx = lim (b→∞) ∫ from a to b of f(x) dx
This point leads us to the concept known as improper integrals, which we will handle right now!
2.2.5. Improper Integrals
With the counter-revolution of new classical macroeconomics, there has been a shift by economists toward considering an infinitely lived agent. This agent is widely used in perfect-foresight models, rational expectations models, the Ricardian equivalence theorem, and endogenous economic growth models, among others. Hence, the saving and investment decisions of households are governed by the utility functional agents maximize throughout, that is,
U = ∫ from 0 to ∞ of u(c)·e^(-βt)·L dt, with u(c) = c^(1-δ) / (1-δ)
Where
u(c) is the utility function the social planner maximizes,
β is the discount rate agents assign to future consumption,
c is per capita consumption,
L is the size of the population,
and δ is the inverse of the intertemporal elasticity of substitution.
Now we have faced a somewhat new, but still definite, integral problem in this functional. Such a type of integral is known as an improper integral. An improper integral is an integral in which at least one limit of integration is infinite.
a) ∫ from a to ∞ of f(x) dx = lim (b→∞) ∫ from a to b of f(x) dx
b) ∫ from -∞ to b of f(x) dx = lim (a→-∞) ∫ from a to b of f(x) dx
c) ∫ from -∞ to ∞ of f(x) dx = ∫ from -∞ to c of f(x) dx + ∫ from c to ∞ of f(x) dx, for any real number c
Note: In each of the three cases above, if the limit exists, then the improper integral is said to converge; otherwise the improper integral diverges.
Examples
1. Find ∫ from 1 to ∞ of (1/x^2) dx.
Solution
∫ from 1 to ∞ of x^(-2) dx = lim (b→∞) [-1/x] from 1 to b = lim (b→∞) (1 - 1/b) = 1, so the integral converges to 1.
2. Find ∫ from 1 to ∞ of (1/x) dx.
Solution
∫ from 1 to ∞ of (1/x) dx = lim (b→∞) [ln x] from 1 to b = lim (b→∞) ln b = ∞, so the integral diverges.
3. Now it is time for you to check whether you have grasped the concept. To this end, try a similar convergence problem on your own. Use your paper and pen right now!
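Convergence of improper integrals like those above can also be checked symbolically; a small sympy sketch:

import sympy as sp

x = sp.symbols('x')
print(sp.integrate(1 / x**2, (x, 1, sp.oo)))   # 1  -> the integral converges
print(sp.integrate(1 / x, (x, 1, sp.oo)))      # oo -> the integral diverges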
Definitions
A matrix is defined as a rectangular array of numbers, parameters, or variables. It is usually
enclosed in brackets and sometimes in parentheses or double vertical lines.
The members of the array are referred to as the elements of the matrix.
The number of rows and the number of columns in a matrix together define the dimension of the matrix. If a matrix has m rows and n columns, then that matrix is said to be of dimension (m × n).
For instance, a matrix A consisting of three rows and three columns is said to be of dimension (3 × 3), while column vectors x and d with three rows and a single column each are of dimension (3 × 1). You will discuss such types of matrices in the next subsection. You should remember that in writing the dimension of a matrix, the row number always precedes the column number.
In general, if
A = [ a11  a12  ...  a1n
      a21  a22  ...  a2n
      ...  ...  ...  ...
      am1  am2  ...  amn ]
then the matrix is said to have m rows and n columns and is of dimension (m × n). The subscripts under each element of the matrix show the row and the column where each element belongs.
2.3.1. Matrix Operations
a) Equality of Matrices
Before going into matrix operations (addition, subtraction and multiplication), let us first study
what is meant by equality of two matrices. Suppose you have two matrices. The two matrices are
said to be equal if and only if they have the same dimension and identical elements in the
corresponding locations in the arrays. Consider the following matrices.
A = [ 1  3  5      B = [ 2  1  1      C = [ 1  3  5
      4  1  2            4  1  2            4  1  2
      2  3  0 ]          2  3  0 ]          2  3  0 ]
It can be concluded that A = C because the two matrices have the same dimension and identical elements. Though matrices A and B have the same dimension, they are not equal since the elements in their first row are different. Thus, we conclude A ≠ B.
b) Addition and Subtraction of Matrices
Two matrices can be added if and only if they have the same dimension (i.e. they have the same
number of rows and columns). If this condition is satisfied, then the two matrices are said to be
conformable for addition. The sum of the matrices is obtained by adding the corresponding
elements of the matrices.
Similarly, subtraction of a matrix from another matrix is possible if and only if both matrices are of the same dimension (i.e., they have the same number of rows and columns). The difference between two matrices is obtained by subtracting each corresponding element.
Generally, if
A = [ a11  a12  a13      and B = [ b11  b12  b13
      a21  a22  a23                b21  b22  b23
      a31  a32  a33 ]              b31  b32  b33 ]
then
A + B = [ a11+b11  a12+b12  a13+b13
          a21+b21  a22+b22  a23+b23
          a31+b31  a32+b32  a33+b33 ]
A - B = [ a11-b11  a12-b12  a13-b13
          a21-b21  a22-b22  a23-b23
          a31-b31  a32-b32  a33-b33 ]
Examples
a) Compute A + B and A - B given
A = [ 2  4  3      and B = [ 1  4  5
      1  3  1 ]              1  2  2 ]
Solution
Since both matrices have the same dimension, they are conformable for addition and subtraction. Thus, we can compute A + B and A - B.
Hence,
A + B = [ 2+1  4+4  3+5    =  [ 3  8  8
          1+1  3+2  1+2 ]       2  5  3 ]
A - B = [ 2-1  4-4  3-5    =  [ 1  0  -2
          1-1  3-2  1-2 ]       0  1  -1 ]
b) Find A + B if
A = [ 2  1      and B = [ 1  2
      2  1 ]              1  2 ]
Solution
Since the matrices have identical dimensions, you can compute A + B. Hence,
A + B = [ 2+1  1+2    =  [ 3  3
          2+1  1+2 ]       3  3 ]
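Both examples are easy to reproduce with numpy arrays, which add and subtract element by element exactly as above:

import numpy as np

A = np.array([[2, 4, 3], [1, 3, 1]])
B = np.array([[1, 4, 5], [1, 2, 2]])
print(A + B)   # [[3 8 8] [2 5 3]]
print(A - B)   # [[ 1  0 -2] [ 0  1 -1]]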
c) Matrix Multiplication
i) Multiplication by a Scalar
If k is a real number and A is a matrix, then multiplying the matrix A by the number k is referred to
as the multiplication of matrix A by scalar. In this case, every element of the matrix is multiplied
by the constant k .
Examples
a) If k = 3 and
A = [ 4  2  2  10
      2  0  7  2
      6  5  4  7
      1  3  1  0 ]
then find kA.
Solution
kA = 3A = [ 3(4)  3(2)  3(2)  3(10)      [ 12  6   6   30
            3(2)  3(0)  3(7)  3(2)    =    6   0   21  6
            3(6)  3(5)  3(4)  3(7)         18  15  12  21
            3(1)  3(3)  3(1)  3(0) ]       3   9   3   0 ]
b) If k = 1/2 and
A = [ 2  3  9
      6  4  5 ]
find kA.
Solution
kA = [ (1/2)(2)  (1/2)(3)  (1/2)(9)    =  [ 1  3/2  9/2
       (1/2)(6)  (1/2)(4)  (1/2)(5) ]       3  2    5/2 ]
In other words, in order to multiply two matrices, the first matrix (the "lead" matrix) must have as many columns as the second matrix (the "lag" matrix) has rows. Matrices that fulfill this condition are said to be conformable for multiplication.
If matrix A is of dimension (m × n) and matrix B is of dimension (n × p), then multiplication of matrix A by matrix B is possible because the number of columns of matrix A (that is, n) equals the number of rows of matrix B.
What is the dimension of the resulting matrix? The dimension of the resulting matrix is (m × p). This implies that it has the same number of rows as the first matrix (that is, A) and the same number of columns as the second matrix (that is, B).
Symbolically, A(m×n) · B(n×p) = AB(m×p), where the subscripts denote dimensions.
How can you multiply two matrices? So far, you have seen the conditions that must be fulfilled
for matrix multiplication. Now, you will discuss the procedure of multiplication of matrices. In
multiplying two matrices, you add the products of elements of the rows of the 1st matrix and
elements of the columns of the 2nd matrix. The sum of the products of elements of the 1st row of
the 1st matrix and elements of the 1st column of the 2nd matrix yields the 1st row-1st column
element for the product matrix. Similarly, the sum of products of elements of 1st row of the 1st
matrix and elements of the 2nd column of the 2nd matrix forms the 1st row-2nd column element for
the product matrix and so on. In other words, matrix multiplication is row by column.
c21 = a21(b11) + a22(b21) + a23(b31) (that is, multiplying the second-row elements of matrix A with the corresponding first-column elements of matrix B and then adding the products)
c22 = a21(b12) + a22(b22) + a23(b32) (that is, multiplying the second-row elements of matrix A with the corresponding second-column elements of matrix B and then adding the products)
Examples
a) Given
A = [ 2  1      and B = [ 3  1
      1  3 ]              2  0 ]
find AB.
Solution
AB is defined since the condition for multiplication is satisfied. That is, the column number of the first matrix (that is, A) equals the row number of the second matrix (that is, B). The product matrix will have dimension (2 × 2). Therefore,
A·B = [ 2(3)+1(2)  2(1)+1(0)    =  [ 8  2
        1(3)+3(2)  1(1)+3(0) ]       9  1 ]
b) Given
A = [ 1  2
      3  0          and B = [ 2  2  0
      2  1 ](3×2)             3  1  4 ](2×3)
find AB.
Solution
The matrices are conformable for multiplication and the resulting matrix has dimension (3 × 3). Thus,
A·B = [ 1(2)+2(3)  1(2)+2(1)  1(0)+2(4)      [ 8  4  8
        3(2)+0(3)  3(2)+0(1)  3(0)+0(4)   =    6  6  0
        2(2)+1(3)  2(2)+1(1)  2(0)+1(4) ]      7  5  4 ]
c) Given
A = [ 1  1  0            and B = [ 0  1    0
      0  2  1                      0  0    1
      1  0  0 ](3×3)               1  1/2  0 ](3×3)
find AB.
Solution
The conformability condition for multiplication is satisfied. Hence,
A·B = [ 1(0)+1(0)+0(1)  1(1)+1(0)+0(1/2)  1(0)+1(1)+0(0)      [ 0  1    1
        0(0)+2(0)+1(1)  0(1)+2(0)+1(1/2)  0(0)+2(1)+1(0)   =    1  1/2  2
        1(0)+0(0)+0(1)  1(1)+0(0)+0(1/2)  1(0)+0(1)+0(0) ]      0  1    0 ]
d) Given
A = [ 2  1          and B = [ 3
      2  1 ](2×2)             2 ](2×1)
find AB.
Solution
The conformability condition is satisfied. Hence,
A·B = [ 2(3)+1(2)    =  [ 8
        2(3)+1(2) ]       8 ]
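Example (a) can be reproduced with numpy, where the @ operator performs the row-by-column multiplication described above:

import numpy as np

A = np.array([[2, 1], [1, 3]])
B = np.array([[3, 1], [2, 0]])
print(A @ B)           # [[8 2] [9 1]]
print((A @ B).shape)   # (2, 2): rows of A by columns of B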
iii) Multiplication of Vectors
If A is an (m × 1) column vector and B is a (1 × n) row vector, then their product AB yields a matrix of dimension (m × n).
Example
a) Given
A = [ 1
      2
      3 ](3×1)   and B = [ 4  5 ](1×2)
find AB.
Solution
A·B = [ 1(4)  1(5)      [ 4   5
        2(4)  2(5)   =    8   10
        3(4)  3(5) ]      12  15 ]
Therefore, the product of vectors A and B yields another matrix with dimension (3 × 2).
2.3.2. Determinants and Inverse of a Matrix
In the previous section, you have been introduced to the basic concepts of matrix algebra. In this
section, you will study the concept of a determinant and its use in determining the inverse of a
matrix.
2.3.2.1. The Concept of Determinants, Minor and Cofactor
a) Determinants
The determinant of a square matrix is a uniquely defined scalar (number) associated with that matrix. The determinant of a matrix A is denoted by |A|. Determinants are defined only for square matrices. The determinant of an (n × n) matrix is known as a determinant of order n. How can we find the determinant? Here are the answers.
i) Determinants of Order One
Let matrix A = [a11], that is, a matrix with only one element. Then the determinant of matrix A is |A| = a11.
ii) Determinants of Order Two
Let A be the (2 × 2) matrix
A = [ a11  a12
      a21  a22 ]
The determinant of this matrix is defined as the difference of two terms as shown below:
|A| = a11·a22 - a21·a12
That is, it is obtained by multiplying the two elements in the principal diagonal of matrix A and then subtracting the product of the two remaining elements. Since matrix A is a (2 × 2) matrix, the determinant is called a second-order determinant (determinant of order two).
Examples
a) Find the determinant of the matrix
A = [ 10  4
      8   5 ]
Solution
Using the formula given above, the determinant of matrix A is
|A| = 10(5) - 8(4) = 50 - 32 = 18
iii) Determinants of Order Three
The determinant of a (3 × 3) matrix
A = [ a11  a12  a13
      a21  a22  a23
      a31  a32  a33 ]
can be expanded as
|A| = a11·|a22 a23; a32 a33| - a12·|a21 a23; a31 a33| + a13·|a21 a22; a31 a32|
Here, you have a sum of three terms. The vertical bracket in the first term is the determinant of the matrix obtained after removing the 1st row and the 1st column. The second bracket is the determinant of the matrix obtained by removing the 1st row and the 2nd column. The vertical bracket in the third term is the determinant of the matrix obtained by removing the 1st row and the 3rd column. The elements in each term are the 1st, 2nd and 3rd elements of the first row of matrix A. In other words, each term is a product of a first-row element and a particular second-order determinant. Therefore, you have
|A| = a11(a22·a33 - a32·a23) - a12(a21·a33 - a31·a23) + a13(a21·a32 - a31·a22)
After rearranging, you get
|A| = a11·a22·a33 + a12·a23·a31 + a13·a21·a32 - a31·a22·a13 - a32·a23·a11 - a33·a21·a12
Laplace Expansion
The determinant of a (3 × 3) matrix can be obtained alternatively using the expanded-array scheme. In this approach, the first two columns of the matrix are written again to the right to form an expanded array. The illustration that follows shows how the determinant can be found using this approach. Consider the (3 × 3) matrix given below.
A = [ a11  a12  a13
      a21  a22  a23
      a31  a32  a33 ]
The expanded array is
a11  a12  a13 | a11  a12
a21  a22  a23 | a21  a22
a31  a32  a33 | a31  a32
Then, you multiply the elements along the three lines falling to the right (along the downward-pointing diagonals) and give all these products a plus sign. Thus, you have
+ a11·a22·a33 + a12·a23·a31 + a13·a21·a32
Similarly, you multiply the elements along the three lines rising to the right (along the upward-pointing diagonals) and give all these products a negative sign. As a result, you get
- a31·a22·a13 - a32·a23·a11 - a33·a21·a12
The sum of the above two sets of terms is exactly the same as the expression for the determinant of matrix A shown above. That is,
|A| = a11·a22·a33 + a12·a23·a31 + a13·a21·a32 - a31·a22·a13 - a32·a23·a11 - a33·a21·a12
Examples
a) Given
A = [ 2  1  3
      4  5  6
      7  8  9 ]
compute the determinant.
Solution
|A| = 2·|5 6; 8 9| - 1·|4 6; 7 9| + 3·|4 5; 7 8|
|A| = 2(5·9 - 8·6) - 1(4·9 - 7·6) + 3(4·8 - 7·5)
|A| = 2(45 - 48) - 1(36 - 42) + 3(32 - 35)
|A| = 2(-3) - 1(-6) + 3(-3)
|A| = -9
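The same determinant from numpy (floating-point, so -9 appears up to rounding):

import numpy as np

A = np.array([[2, 1, 3], [4, 5, 6], [7, 8, 9]])
print(np.linalg.det(A))   # -9.000000000000002, i.e. -9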
b) Minors and Cofactors
Deleting the 1st row and the 1st column of the determinant of matrix A gives the subdeterminant
| a22  a23 |
| a32  a33 |
This subdeterminant is called the minor of element a11 and is denoted by M11. Therefore, the minor of element a11 is written as
M11 = |a22 a23; a32 a33| = a22·a33 - a32·a23
In general, if matrix A is a square matrix of dimension (3 × 3) or more, then canceling the i-th row and the j-th column of the determinant of that matrix gives the minor of the element aij, which is a subdeterminant denoted by Mij.
Having discussed what a minor is, it is time to move on to the concept of the cofactor. The cofactor, denoted by Cij, is obtained by
Cij = (-1)^(i+j) · Mij
where Mij is the minor of the element aij obtained by canceling the i-th row and the j-th column of the determinant of a particular matrix; i and j refer respectively to the i-th row and the j-th column.
Therefore, this suggests that a cofactor is a minor with a prescribed algebraic sign attached to it. For instance, considering matrix A given above, the cofactor C11 is
C11 = (-1)^(1+1) · M11 = M11 = |a22 a23; a32 a33| = a22·a33 - a32·a23
Similarly, the cofactor C23 is
C23 = (-1)^(2+3) · M23 = -M23 = -(a11·a32 - a31·a12) = a31·a12 - a11·a32
Note that the minor M23 is obtained by canceling the 2nd row and the 3rd column of the determinant of matrix A.
The determinant of a matrix can be found by using the concept of the cofactor. In this case, the
determinant of a matrix is obtained by first taking either a row or a column of that matrix and
multiplying each element in that row or column by its cofactor. Then adding these products
yields the determinant of the matrix. For instance, you can calculate the determinant of
matrix A given above as follows. Suppose you take the first row. Thus, the determinant of
matrix A is
|A| = a11·C11 + a12·C12 + a13·C13
Since
C11 = (-1)^(1+1) · M11 = M11 = |a22 a23; a32 a33|
C12 = (-1)^(1+2) · M12 = -M12 = -|a21 a23; a31 a33|
C13 = (-1)^(1+3) · M13 = M13 = |a21 a22; a31 a32|
Therefore,
|A| = a11·|a22 a23; a32 a33| - a12·|a21 a23; a31 a33| + a13·|a21 a22; a31 a32|
This is exactly what you had when we defined the determinant of a (3 × 3) matrix.
Examples
Compute the cofactor of the element at the 2nd row-2nd column of matrix
A = [ 1  2  1
      3  4  0
      2  1  1 ]
Solution
First you need to determine the minor of the element at the 2nd row-2nd column. The minor in this case is obtained by canceling the 2nd row and the 2nd column. Therefore,
M22 = |1 1; 2 1| = (1)(1) - (2)(1) = -1
Hence, the cofactor is C22 = (-1)^(2+2) · M22 = (1)(-1) = -1.
2.3.2.2. The Inverse of a Matrix
To find the inverse of a square matrix A, we first find its cofactor matrix C (the matrix whose elements are the cofactors Cij), and then transpose it to obtain the adjoint of A: adj(A) = C'.
The last step in finding the inverse of matrix A is dividing the adjoint of matrix A by the determinant. Therefore, it follows that
A^(-1) = (1/|A|) · [ C11  C21  C31
                     C12  C22  C32
                     C13  C23  C33 ]
The inverse of a matrix, say A, exists if and only if the determinant of that matrix is different from zero, i.e., |A| ≠ 0. Otherwise, with |A| = 0, the inverse becomes undefined and hence there is no need to proceed with the subsequent steps of matrix inversion.
Here is the summary of the steps involved in matrix inversion. Given matrix A,
1) find the determinant of A and proceed to the next steps if |A| ≠ 0
2) find the cofactor matrix for A
3) obtain the adjoint of matrix A
4) multiply the adjoint of matrix A by 1/|A| and this gives you the inverse of matrix A
Examples
a) Given
A = [ 3  2
      1  0 ]
find the inverse.
Solution
First, find the determinant. Thus, you have
|A| = (3)(0) - (1)(2) = -2 (Since |A| ≠ 0, the inverse of matrix A is defined and hence you can proceed to the next steps.)
Then, determining the cofactor matrix, you get
C = [ C11  C12    =  [ 0   -1
      C21  C22 ]       -2  3 ]
Note that since the matrix is (2 × 2), the minor of each element in this case is a (1 × 1) determinant.
Transposing the cofactor matrix, we get the adjoint matrix
adj(A) = C' = [ 0   -2
                -1  3 ]
Lastly, dividing the adjoint matrix by the determinant gives us the inverse matrix. That is,
A^(-1) = (1/|A|)·adj(A) = (1/(-2)) · [ 0   -2    =  [ 0    1
                                       -1  3 ]       1/2  -3/2 ]
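A quick numpy check of the inverse just computed:

import numpy as np

A = np.array([[3.0, 2.0], [1.0, 0.0]])
A_inv = np.linalg.inv(A)
print(A_inv)        # [[ 0.   1. ] [ 0.5 -1.5]]
print(A @ A_inv)    # the (2 x 2) identity matrix, up to rounding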
2.3.3. Matrix Representation of Linear Equations
Having studied the basics of matrix algebra and matrix inversion, it is now time to revise how matrix algebra is applied in solving systems of linear equations. You are expected to revise the inverse method, Cramer's rule, and the Gauss-Jordan elimination technique as your reading assignments from the linear algebra course.
Matrix representation
Suppose you have the following system of linear equations.
a11·x1 + a12·x2 + a13·x3 = b1
a21·x1 + a22·x2 + a23·x3 = b2
a31·x1 + a32·x2 + a33·x3 = b3
The system can be written in matrix form as
[ a11  a12  a13     [ x1       [ b1
  a21  a22  a23   ·   x2    =    b2
  a31  a32  a33 ]     x3 ]       b3 ]
or compactly as A·x = b. Now, you can calculate the unknown values by the equation x = A^(-1)·b, using any of the above three methods.
Example: Solve the system
2x1 + x2 = 3
2x1 + 2x2 = 4
Subtracting the first equation from the second gives x2 = 1, and substituting back gives x1 = 1.
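The same system solved numerically (np.linalg.solve computes x = A^(-1)b without forming the inverse explicitly):

import numpy as np

A = np.array([[2.0, 1.0], [2.0, 2.0]])
b = np.array([3.0, 4.0])
print(np.linalg.solve(A, b))   # [1. 1.]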
2.3.4. Input-output analysis (Leontief model)
Input-output analysis attempts to establish the equilibrium conditions under which the industries in an economy have just enough output to satisfy each other's demands in addition to final (outside) demand. Generally speaking, the production of one good requires the input of many other goods as intermediate inputs in the production process. Hence, the total demand for a product is the summation of all the intermediate demands for the product plus the final demand for the product arising from consumers, investors, government, exporters, etc., as ultimate users.
Assume that in a hypothetical economy there are n sectors, each producing its own homogeneous product x1, x2, ..., xn.
Therefore, total demand (xi) = demand from intermediate users in the economy + ultimate users' demand for final consumption, i.e.,
x1 = x11 + x12 + ... + x1n + f1
x2 = x21 + x22 + ... + x2n + f2
...
xn = xn1 + xn2 + ... + xnn + fn
where
xi refers to the total output produced by sector i,
xij refers to the part of the output produced by sector i and used by sector j, that is, the delivery of sector i's output to sector j as input, and
fi refers to the output of sector i required for final demand.
Let's introduce a new variable into the model:
aij = xij / xj
where aij refers to an element of the coefficient matrix, i.e., the input required from sector i by sector j to produce one unit of sector j's output xj.
Therefore, from the above relationship, xij = aij·xj, and the system can be written as
[ x1       [ a11  a12  ...  a1n     [ x1       [ f1
  x2    =    a21  a22  ...  a2n   ·   x2    +    f2
  ...        ...  ...  ...  ...       ...        ...
  xn ]       an1  an2  ...  ann ]     xn ]       fn ]
or compactly x = A·x + f, so that (I - A)·x = f and x = (I - A)^(-1)·f.
Example: Given
A = [ 0.3  0.2      and F = [ 15
      0.5  0.6 ]              20 ]
find the equilibrium output levels of the sectors.
Solution
x = (I - A)^(-1) · F
I - A = [ 1  0    -  [ 0.3  0.2    =  [ 0.7   -0.2
          0  1 ]       0.5  0.6 ]       -0.5  0.4 ]
(I - A)^(-1) = adj(I - A) / |I - A|
So the determinant is |I - A| = (0.7)(0.4) - (-0.2)(-0.5) = 0.28 - 0.10 = 0.18.
The cofactor matrix is
C = [ 0.4  0.5
      0.2  0.7 ]
and its transpose gives the adjoint, adj(I - A) = C':
C' = [ 0.4  0.2
      0.5  0.7 ]
Thus, (I - A)^(-1) = (1/0.18) · [ 0.4  0.2
                                  0.5  0.7 ]
Finally,
x = (I - A)^(-1) · F = (1/0.18) · [ 0.4(15) + 0.2(20)    =  (1/0.18) · [ 10      ≈  [ 55.56
                                    0.5(15) + 0.7(20) ]                  21.5 ]       119.44 ]
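The whole example reduces to one line of numpy; solving (I - A)x = F reproduces the equilibrium outputs:

import numpy as np

A = np.array([[0.3, 0.2], [0.5, 0.6]])
F = np.array([15.0, 20.0])
x = np.linalg.solve(np.eye(2) - A, F)
print(np.round(x, 2))   # [ 55.56 119.44]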
2.3.5. Linear Programming
Dear Distance learners! In order to refresh your memory solve the following linear programming
problem.
Example:
Given the primal problem
Min C = 20Z1 + 30Z2 + 16Z3
Subject to: 2.5Z1 + 3Z2 + Z3 >= 3
            Z1 + 3Z2 + 2Z3 >= 4
            Z1, Z2, Z3 >= 0
form the dual counterpart of the problem.
Solution
The dual problem is: Max C* = 3X1 + 4X2
Subject to: 2.5X1 + X2 <= 20
            3X1 + 3X2 <= 30
            X1 + 2X2 <= 16
            X1, X2 >= 0
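By the duality theorem, the primal and dual share the same optimal value. A sketch with scipy (linprog minimizes and expects <= constraints, so the primal's >= rows are negated):

from scipy.optimize import linprog

c = [20, 30, 16]                       # primal objective coefficients
A_ub = [[-2.5, -3, -1], [-1, -3, -2]]  # >= constraints negated into <= form
b_ub = [-3, -4]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3)
print(res.x, res.fun)   # roughly Z = (0, 0.667, 1) with minimum cost 36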
CHAPTER THREE
DERIVATIVE IN USE
Given y = f(x), its derivative f'(x) = dy/dx is itself a function of x. Since this derivative is also a function, we can also find its derivative, d[f'(x)]/dx = f''(x) = d^2y/dx^2, the second derivative.
We call d[f''(x)]/dx the third derivative function and write it as f'''(x) or f^(3)(x) to indicate that this function is found by three successive operations of differentiation, starting with the function f. Of course, this process may continue indefinitely, and so we use the general notation f^(n)(x) to indicate the n-th derivative of the function f.
Example: Find the first four derivatives of the function f(x) = x^4.
Solution: f'(x) = 4x^3, f''(x) = 12x^2, f'''(x) = 24x, f^(4)(x) = 24.
If the first two derivatives of a function exist, we say the function is twice differentiable.
Concentrating on the first and second derivatives leads us to the concepts of concavity and convexity. Consider the function f(x) = x^2 on the domain x ≥ 0. On x ≥ 0, this function is upward sloping, and we can see that its slope increases as x increases. This means that the first derivative function is increasing in x, and so the derivative of this, the function's second derivative, must be positive-valued for all x ≥ 0. Upon finding the derivatives, we get f'(x) = 2x and f''(x) = 2. Thus the second derivative is indeed positive for any value of x.
Figure 3.4: Function f(x) = x^2 (x ≥ 0) and its derivatives, f'(x) = 2x and f''(x) = 2.
Now consider the graph of the same function defined on x < 0. On this set of values the function is negatively sloped:
f'(x) = 2x < 0 on x < 0. The greater the value of x, the less steep is the curve in absolute value.
Figure 3.5: Function f(x) = x^2 for x < 0 and its derivatives, f'(x) = 2x and f''(x) = 2.
Defining this function on the domain R, we see that the second derivative is positive throughout. A function with this shape, as determined by the second derivative being positive, is convex.
Definition: A twice differentiable function f(x) is convex if, at all points on its domain, f''(x) ≥ 0. According to this definition, a linear function is convex. To exclude linear functions, we come to the concept of strict convexity. This is done by replacing the weak inequality (≥) with a strict inequality (>).
Definition: A twice differentiable function f(x) is strictly convex if f''(x) > 0 except possibly at a single point.
Notice that the function f(x) = x^4 has the second derivative f''(x) = 12x^2, which is positive for all x except x = 0, where the second derivative becomes zero. This function is, however, strictly convex; hence the qualification in the above definition requiring f''(x) > 0 except possibly at one point.
We may have strictly convex functions that are monotonically increasing, monotonically decreasing, or non-monotonic.
Definition: A twice differentiable function f(x) is concave if f''(x) ≤ 0 at all points of its domain.
As in the case of convex functions, a linear function satisfies the definition of concavity (f''(x) = 0 for all x). To exclude it, we define strict concavity.
Definition: A twice differentiable function f(x) is strictly concave if f''(x) < 0 at all points of its domain except possibly at a single point.
Alternatively, since multiplying a function by (-1) reverses the inequality, we could say that f(x)
is (strictly) concave if –f(x) is (strictly) convex.
A function whose second derivative is sometimes positive and sometimes negative is neither
convex nor concave everywhere. However, we can sometimes find intervals over which the
function is convex or concave.
Example: f(x) = (1/3)x^3 - 3x^2 + 5x + 10 on x ≥ 0
Solution
f'(x) = x^2 - 6x + 5 and
f''(x) = 2x - 6
Thus, since f''(x) ≤ 0 for x ≤ 3 and f''(x) ≥ 0 for x ≥ 3, the function is concave on [0, 3] and convex on [3, +∞).
Activity: Show that the production function f(x) = (-2/3)x^3 + 10x^2 + 5x has both a concave and a convex section.
Note: Production functions being concave means additional labor units yield less and less product, reflecting the law of diminishing marginal product. If production functions were convex, it would be difficult to explain them against this economic background.
Example: A single input, x, is used to produce output y. Show that if the production function is y = x^(1/3), x ≥ 0, then the cost function c(y) is convex while the production function is concave.
Solution
Production function: f(x) = x^(1/3)
First derivative: f'(x) = (1/3)x^(-2/3)
Second derivative: f''(x) = (-2/9)x^(-5/3) = -2 / (9x^(5/3)) < 0
Thus f''(x) < 0 whenever x is greater than zero, so the production function is concave.
Take the short-run cost function
c(x) = c0 + rx, in terms of fixed cost c0 and variable cost rx needed to produce output y. But since y = x^(1/3),
x = y^3 and we can write the cost function in terms of y:
c(y) = c0 + r·y^3
c'(y) = 3r·y^2
c''(y) = 6r·y > 0 for y > 0
and so the cost function is convex.
Example: Given a total cost function c(x) whose marginal cost is c'(x) = 3x^2 - 18x (for instance, c(x) = x^3 - 9x^2 plus a fixed cost), determine the slope of marginal cost at x = 7.
Solution:
c'(x) = 3x^2 - 18x measures the slope of the total cost curve; to find the slope of the marginal cost curve, take the derivative of MC, which is the second-order derivative of the total cost function:
c''(x) = 6x - 18
c''(7) = 6(7) - 18 = 24
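The same result with sympy, using the assumed cubic cost function named above:

import sympy as sp

x = sp.symbols('x')
c = x**3 - 9*x**2          # assumed total cost (any added constant drops out)
MC = sp.diff(c, x)         # marginal cost: 3x^2 - 18x
print(sp.diff(MC, x).subs(x, 7))   # 24, the slope of MC at x = 7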
3.2.2. Linear Approximation
If f(x) is differentiable at x = a, then for x close to a the function can be approximated by its tangent line at a:
f(x) ≈ f(a) + f'(a)(x - a)
Figure: The tangent line to y = f(x) at the point (a, f(a)).
Examples
1) Find the linear approximation of f(x) = x^(1/3) about x = 1, and use it to estimate the cube root of 1.03.
f'(x) = (1/3)x^(-2/3)
f'(1) = (1/3)(1)^(-2/3) = 1/3
f(1) = 1
y = f(1) + f'(1)(x - 1)
f(x) ≈ 1 + (1/3)(x - 1)
f(x) ≈ (1/3)x + 2/3
If x = 1.03,
x^(1/3) ≈ 1 + (1/3)(1.03 - 1) = 1.01
But the actual value of the original function at x = 1.03 is f(1.03) = 1.03^(1/3) = 1.0099 ≈ 1.01.
2) Find the linear approximation of the function
f(x) = (1 + (3/2)x + (1/2)x^2)^(1/2)
at x = 0.
Answer: f(x) ≈ 1 + (3/4)x
3) Find the linear approximation value of (1.001)^50.
This can be written as a function of x as
f(x) = x^50 at x = 1
To linearly approximate it we use the formula f(x) ≈ f(a) + f'(a)(x - a):
f(x) = x^50 ⇒ f(1) = 1^50 = 1
f'(x) = 50x^49 ⇒ f'(1) = 50(1)^49 = 50
f(x) ≈ 1 + 50(x - 1) = 50x - 49
Evaluating this result near x = 1, at x = 1.001, we get the linear estimate
f(1.001) ≈ 50(1.001) - 49 = 1.05
But the actual (correct) value of f(x) = x^50 at x = 1.001 is about 1.0512.
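Comparing the approximation with the exact value in Python shows how small the error is near the expansion point:

f = lambda x: x**50
approx = lambda x: 50 * x - 49   # tangent-line approximation about a = 1

print(approx(1.001))   # 1.05
print(f(1.001))        # 1.0512... (the error grows as x moves away from 1)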
3.2.3. The Differential Function
Given y = f(x), we know that dy/dx = f'(x). It follows that dy = f'(x)dx; this is the differential of the function f(x). It measures the change in the value of y resulting from a small change in the value of x.
Let's compare dy and Δy. dy and dx are proportional to each other, since dy = f'(x)dx.
Let x change from x to x + dx; then Δy = f(x + dx) - f(x).
Given the linear approximation formula f(x) ≈ f(a) + f'(a)(x - a) for x close to a, substituting a by x and x by x + dx, we have
f(x + dx) ≈ f(x) + f'(x)(x + dx - x)
f(x + dx) - f(x) ≈ f'(x)dx, that is, Δy ≈ dy.
Therefore, in linear approximation, we approximate (estimate) Δy by dy.
Figure: The differential. Starting from a point P on the curve y = f(x), a change dx in x moves us to a point Q along the tangent line and to a corresponding point on the curve.
When x changes by dx, the actual change in y is given by Δy (movement along the curve). But when we consider a tangent or linear approximation, that is, if we approximate the function by its tangent at point P, the change in the value of y becomes dy (movement along the tangent line from P to Q).
dy ≈ Δy
The line segment between Q and the corresponding point on the curve measures the difference between Δy and dy (the error term). The error term tends to zero as dx → 0.
Rules of Differentials
All the rules of differentiation can be applied in the case of differentials.
Given two differentiable functions (u and v), the following rules hold true:
1) d(u ± v) = du ± dv
2) d(au + bv) = a·du + b·dv, for constants a and b
3) d(uv) = u·dv + v·du
4) d(u/v) = (v·du - u·dv) / v^2, for v ≠ 0
All other rules of differentiation, discussed in your Calculus for Economists course you have taken earlier, are all applicable in this context too.
Example:
Given y = (2x - 5)^2, then
dy = f'(x)dx
f'(x) = 2(2x - 5)·2 = 4(2x - 5)
dy = 4(2x - 5)dx
3.2.4. Polynomial Approximation
Approximation by a linear function may be insufficiently accurate, so it is natural to try quadratic approximation, or approximation by a polynomial of higher order.
1. Quadratic Approximation
We have the general quadratic formula
p(x) = A + B(x - a) + C(x - a)^2 (for x close to a)
In the above equation we have three unknown coefficients: A, B, and C. We impose three conditions on the polynomial to determine the values of the three unknowns.
Assume at x = a,
p(a) = f(a)
p'(a) = f'(a)
p''(a) = f''(a)
Given p(x) = A + B(x - a) + C(x - a)^2,
p'(x) = B + 2C(x - a)
p''(x) = 2C
When x = a,
p(a) = A
p'(a) = B
p''(a) = 2C ⇒ C = (1/2)f''(a)
Hence, the quadratic approximation of f(x) is given by
f(x) ≈ p(x) = A + B(x - a) + C(x - a)^2
            = f(a) + f'(a)(x - a) + (1/2)f''(a)(x - a)^2
This is just similar to the linear approximation except for one additional term, (1/2)f''(a)(x - a)^2.
Examples
Find the quadratic approximation to f(x) = x^(1/3) about x = 1.
Solution
f(x) = x^(1/3)
f(1) = 1
f'(x) = (1/3)x^(-2/3) ⇒ f'(1) = 1/3
f''(x) = (-2/9)x^(-5/3) ⇒ f''(1) = -2/9
Then, the quadratic approximation will be
f(x) ≈ 1 + (1/3)(x - 1) - (1/9)(x - 1)^2
If x = 1.03, then 1.03^(1/3) is quadratically approximated to be
1.03^(1/3) ≈ 1 + (1/3)(1.03 - 1) - (1/9)(1.03 - 1)^2 = 1.0099
which is nearer to the exact value than the previous linear approximation.
Higher-Order Approximation
Functions with higher-order derivatives can be better approximated near a point by using polynomials of higher degree.
Suppose we want to approximate a function f(x) around x = a by a polynomial of the form
p(x) = A0 + A1(x - a) + A2(x - a)^2 + A3(x - a)^3 + ... + An(x - a)^n
To determine the n + 1 coefficients, we impose the n + 1 conditions p(a) = f(a), p'(a) = f'(a), ..., p^(n)(a) = f^(n)(a). Successive differentiation gives
p^(n)(x) = n(n - 1)(n - 2)···(2)(1)·An = n!·An
Therefore,
p(a) = 0!·A0 = f(a) ⇒ A0 = f(a)/0!
p'(a) = 1!·A1 = f'(a) ⇒ A1 = f'(a)/1!
p''(a) = 2!·A2 = f''(a) ⇒ A2 = f''(a)/2!
p'''(a) = 3!·A3 = f'''(a) ⇒ A3 = f'''(a)/3!
...
p^(n)(a) = n!·An = f^(n)(a) ⇒ An = f^(n)(a)/n!
Substituting these values of the coefficients, we get the following approximation to f(x) at x = a by an n-th degree polynomial:
f(x) ≈ f(a) + f'(a)(x - a)/1! + f''(a)(x - a)^2/2! + f'''(a)(x - a)^3/3! + ... + f^(n)(a)(x - a)^n/n!
Example: Find the third-order approximation of f(x) = (1 + x)^(1/2) about x = 0.
f(0) = 1, f'(0) = 1/2,
f''(0) = -1/4,
f'''(0) = 3/8
f(x) ≈ f(0) + f'(0)x/1! + f''(0)x^2/2! + f'''(0)x^3/3!
     = 1 + (1/2)x + (-1/4)x^2/2 + (3/8)x^3/6
Therefore, f(x) ≈ 1 + (1/2)x - (1/8)x^2 + (1/16)x^3
3.2.5. Estimation of Functions (Maclaurin and Taylor Series)
This is the expansion of a function f(x) by Maclaurin and Taylor series.
Maclaurin Series
Consider the polynomial
f(x) = a0 + a1·x + a2·x^2 + a3·x^3 + ... + an·x^n
Successive differentiation of the function yields:
f'(x) = a1 + 2a2·x + 3a3·x^2 + ... + n·an·x^(n-1)
f''(x) = 2a2 + 6a3·x + ... + n(n-1)·an·x^(n-2)
f'''(x) = 6a3 + ... + n(n-1)(n-2)·an·x^(n-3)
...
f^(n)(x) = n(n-1)(n-2)(n-3)···(3)(2)(1)·an
Evaluating the function and its derivatives at x = 0 yields
f(0) = a0 = 0!·a0 ⇒ a0 = f(0)/0!
f'(0) = a1 = 1!·a1 ⇒ a1 = f'(0)/1!
f''(0) = 2a2 = 2!·a2 ⇒ a2 = f''(0)/2!
f'''(0) = 6a3 = 3!·a3 ⇒ a3 = f'''(0)/3!
...
f^(n)(0) = n!·an ⇒ an = f^(n)(0)/n!
Therefore, the above polynomial function can be rewritten as
f(x) = f(0)/0! + f'(0)x/1! + f''(0)x^2/2! + f'''(0)x^3/3! + ... + f^(n)(0)x^n/n!
This power series representation is called the Maclaurin series.
Note: Expansion of a polynomial using the Maclaurin series yields exactly the same function as the original one.
Example 1: Find the Maclaurin series for the following function:
f(x) = 8 + 36x + 12x^2 + x^3
Solution
Given the above function, evaluating it around x = 0 based on the Maclaurin series, we expand until the third derivative, because the highest power is 3.
f(0) = 8
f'(x) = 36 + 24x + 3x^2 ⇒ f'(0) = 36
f''(x) = 24 + 6x ⇒ f''(0) = 24
f'''(0) = 6
In the Maclaurin series,
f(x) = f(0)/0! + f'(0)x/1! + f''(0)x^2/2! + f'''(0)x^3/3!
     = 8/0! + (36/1!)x + (24/2!)x^2 + (6/3!)x^3
f(x) = 8 + 36x + 12x^2 + x^3
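sympy reproduces both the polynomial example and a genuinely infinite series truncated to the same order:

import sympy as sp

x = sp.symbols('x')
f = 8 + 36*x + 12*x**2 + x**3
print(sp.series(f, x, 0, 4).removeO())      # x**3 + 12*x**2 + 36*x + 8, the function itself

print(sp.series(sp.sqrt(1 + x), x, 0, 4))   # 1 + x/2 - x**2/8 + x**3/16 + O(x**4)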
Taylor's Series
This is the expansion of a function around x = x0. Suppose now we want to expand an n-th degree polynomial at some arbitrary point x = x0:
f(x) = a0 + a1·x + a2·x^2 + a3·x^3 + ... + an·x^n
Rewriting the function in terms of the variable (x - x0), we get
f(x) = A0 + A1(x - x0) + A2(x - x0)^2 + A3(x - x0)^3 + ... + An(x - x0)^n
Let's find the successive derivatives as follows:
f'(x) = A1 + 2A2(x - x0) + 3A3(x - x0)^2 + ... + n·An(x - x0)^(n-1)
f''(x) = 2A2 + 6A3(x - x0) + ... + n(n-1)·An(x - x0)^(n-2)
f'''(x) = 6A3 + ... + n(n-1)(n-2)·An(x - x0)^(n-3)
...
f^(n)(x) = n(n-1)(n-2)···(3)(2)(1)·An
Evaluating these at x = x0 gives Ak = f^(k)(x0)/k!, so that
f(x) = f(x0) + f'(x0)(x - x0)/1! + f''(x0)(x - x0)^2/2! + ... + f^(n)(x0)(x - x0)^n/n!
This representation is called the Taylor series of f around x = x0.
3.2.6. The Intermediate Value Theorem
The Intermediate Value Theorem: Let f(x) be continuous on the closed interval [a, b]. Then for every number d between f(a) and f(b), there exists at least one c in (a, b) such that f(c) = d.
Figure: A continuous function f(x) on [a, b] attains every intermediate value d between f(a) and f(b) at some point c.
The theorem implies that the graph of the continuous function f(x) must intersect the horizontal line y = d at least at one point (c, d).
An important result of the theorem: let f(x) be a function continuous on [a, b] and assume that f(a) and f(b) have different signs; then there is at least one c in (a, b) such that f(c) = 0.
Newton's Method
The intermediate value theorem shows that a given equation f(x) = 0 has a solution in a given interval; however, it doesn't provide additional information about the location of the solution. Newton's method produces successively better approximations: starting from an initial guess x0, each iteration applies the formula
x(n+1) = x(n) - f(x(n)) / f'(x(n))
Figure: One step of Newton's method. The tangent at (x0, f(x0)) crosses the x-axis at the improved estimate x1.
Example: Use Newton's method twice to approximate the solution of x^15 = 2.
First of all, we need to rewrite the equation in the form f(x) = 0, so that our reformulated equation is written as
f(x) = x^15 - 2 = 0
Besides, we have to try manually to find the nearest integer that makes our function approximately equal to zero, and that integer is 1, so our x0 = 1.
Finding and evaluating the function and its first derivative at the initial value of x (x0 = 1), we get:
f(1) = 1^15 - 2 = -1
f'(x) = 15x^14 ⇒ f'(1) = 15(1)^14 = 15
Hence, we find x1 by using the Newton's method formula once:
x1 = x0 - f(x0)/f'(x0) = 1 - (-1/15) = 16/15
Finding and evaluating the function and its first derivative at x1 = 16/15, we get:
f(16/15) = (16/15)^15 - 2 = 0.633
f'(x) = 15x^14 ⇒ f'(16/15) = 15(16/15)^14 = 37.025
Once we find x1, it is easy to find x2 by using Newton's method formula a second time, as follows:
x2 = x1 - f(x1)/f'(x1) = 16/15 - 0.633/37.025 = 1.05
Therefore, the approximate value of x in the equation x^15 = 2 using Newton's method twice is found to be equal to 1.05 to the hundredth digit.
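The iteration generalizes to any differentiable function; a minimal sketch reproducing the example:

def newton(f, f_prime, x0, n_iter):
    # Newton's formula x_{k+1} = x_k - f(x_k)/f'(x_k), applied n_iter times
    x = x0
    for _ in range(n_iter):
        x = x - f(x) / f_prime(x)
    return x

f = lambda x: x**15 - 2
f_prime = lambda x: 15 * x**14

print(newton(f, f_prime, 1.0, 2))   # about 1.0496, i.e. 1.05
print(2 ** (1 / 15))                # the exact root, about 1.0473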
3.3. Multivariate Calculus
Calculus for Multivariate Functions
Though functions of a single variable are simple to handle, they are not the most common in economics, since the functions that describe the real economic world are multivariate. For example, a rational consumer will not consume one commodity only; rather, s/he consumes different combinations of goods to maximize her/his utility. The representative production function of a firm is not only a function of labor but also a function of capital. The overall income of nations is not a function of physical capital stocks only; it is also a function of human capital, factor productivity, institutions, and government policies. Hence, we have to extend the concepts of univariate calculus to multivariate ones. For the time being, let's begin with a function of two independent variables so that we can simply extend it to the n-variable case.
Let's take the Cobb-Douglas production function of a hypothetical firm,
Y = A·L^α·K^β
Where:
Y is the level of output,
A is the state of technology, which reflects the productivity of factors,
L and K represent the units of labor and capital, and
α and β are the elasticities of Y with respect to labor and capital, respectively.
Example
b. To check for returns to scale, add the exponents of labor and capital. If the sum is greater than one, it is increasing returns to scale; less than one, decreasing returns to scale; and if it is equal to one, constant returns to scale. Since 3/4 + 1/4 = 1, this function exhibits constant returns to scale. For the proof, you can see the following clarification.
Proof: For a general production function, when we change all the inputs by a given proportion and output changes by an equal proportion, the production function illustrates constant returns to scale. If output changes by a greater/smaller proportion, the production function illustrates increasing/decreasing returns to scale, respectively.
For Y = A·L^α·K^β with initial output Y0, multiply both inputs by λ > 0:
Y = A·(λL)^α·(λK)^β
  = A·λ^α·L^α·λ^β·K^β
  = λ^(α+β)·A·L^α·K^β
  = λ^(α+β)·Y0
If α + β > 1, it shows increasing returns to scale; if α + β < 1, it shows decreasing returns to scale; and if α + β = 1, it is constant returns to scale.
3.3.1. Partial Differentiation
The format y = f(x) represents a function of one variable. Consider z = f(x, y), which is a function of two variables.
The first-order partial derivatives are given as:
fx = ∂z/∂x = fx(x, y)
fy = ∂z/∂y = fy(x, y)
Second-order partial derivatives
We have two types of second-order partial derivatives:
1. Direct second-order partial derivatives
fxx = ∂(fx)/∂x = fxx(x, y)
fyy = ∂(fy)/∂y = fyy(x, y)
2. Mixed (cross) second-order partial derivatives
fxy = ∂(fx)/∂y = fxy(x, y)
fyx = ∂(fy)/∂x = fyx(x, y)
Young's Theorem
The mixed (cross) partials for a given function will always be equal if both cross-partials exist and are continuous.
Given a function f(x1, x2), Young's theorem asserts that f12 = f21, which can alternatively be written as fx1x2 = fx2x1.
Example: Given the function z = f(x1, x2) = x1·ln(x2), show that Young's theorem holds.
Solution
The first task will be to find the first and second derivatives of the function one by one, and the second to check the equality of the mixed second partials.
f1 = ln(x2)          f2 = x1/x2
f11 = 0              f22 = -x1/x2^2
f12 = 1/x2           f21 = 1/x2
So, as Young's theorem predicts, f12 = f21, and our result also confirms that this is the case.
Activity 1: Given the production function Q = f(L, K) = A·L^α·K^β, prove that Young's theorem is true.
Activity 2: For the function z = f(x1, x2) = x1^2·e^(x1+x2), show that f12 = f21.
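Young's theorem for the worked example can be verified mechanically with sympy:

import sympy as sp

x1, x2 = sp.symbols('x1 x2', positive=True)
z = x1 * sp.ln(x2)

f12 = sp.diff(z, x1, x2)   # differentiate with respect to x1, then x2
f21 = sp.diff(z, x2, x1)   # differentiate with respect to x2, then x1
print(f12, f21, sp.simplify(f12 - f21) == 0)   # 1/x2 1/x2 True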
3.3.2. The Multivariate Chain Rule
Many economic models involve composite functions. Differentiation of composite functions requires the application of the chain rule.
(i) For functions of one variable: if z = f(y) and y = g(x), then dz/dx is given by
dz/dx = (dz/dy)·(dy/dx)
(ii) For functions of several variables: if z = f(x, y) with x = g(t) and y = h(t), then
dz/dt = (∂z/∂x)·(dx/dt) + (∂z/∂y)·(dy/dt)
A function is said to be homogeneous of degree r if multiplying all of its arguments by a constant λ multiplies the value of the function by λ^r, that is, f(λx, λy, λz) = λ^r f(x, y, z).
Example 1: Check the homogeneity of f(x, y, z) = x/y + 2z/3y.
f(λx, λy, λz) = λx/λy + 2λz/3λy
= x/y + 2z/3y = λ⁰ f(x, y, z)
Therefore, the function is homogeneous of degree zero.
Example 2: Consider the Cobb-Douglas production function Y = f(K, L) = A K^α L^(1−α).
f(λK, λL) = A(λK)^α (λL)^(1−α)
= A λ^α K^α λ^(1−α) L^(1−α)
= λ^(α+1−α) A K^α L^(1−α)
= λ¹ f(K, L)
Therefore, this function is homogeneous of degree 1. Such functions are also called linear
homogeneous functions.
Activity: Check the homogeneity of f(x, y) = 3x²y − y³.
In economics, the degree of homogeneity of a function has one important implication. Given
f(λx₁, λx₂, …, λxₙ) = λ^r f(x₁, x₂, …, xₙ):
If r = 1 ⇒ constant returns to scale
If r > 1 ⇒ increasing returns to scale
If r < 1 ⇒ decreasing returns to scale
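Where the scaling algebra gets tedious, the degree of homogeneity can be checked symbolically. The following Python sketch is an illustrative addition (it assumes SymPy) and confirms that the Cobb-Douglas function above is homogeneous of degree one:

import sympy as sp

A, K, L, a, lam = sp.symbols('A K L a lam', positive=True)
f = A * K**a * L**(1 - a)                       # Cobb-Douglas with exponents a and 1-a

scaled = f.subs({K: lam*K, L: lam*L}, simultaneous=True)   # scale both inputs by lam
print(sp.simplify(scaled / f))                  # prints lam: homogeneous of degree 1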
Euler’s Theorem
Euler's theorem states that if a function is homogeneous of degree r, then
(∂f/∂x₁)·x₁ + (∂f/∂x₂)·x₂ + … + (∂f/∂xₙ)·xₙ = r·f(x₁, x₂, …, xₙ)
where r measures the degree of homogeneity.
Example:
Suppose f(x, y) = x⁴ + x²y²; check Euler's theorem.
Solution:
According to Euler's theorem, f_x·x + f_y·y is equal to some value (r) times the original function, and the question is what this value is.
∂f/∂x = f_x = 4x³ + 2xy²
∂f/∂y = f_y = 2x²y
Then, f_x·x + f_y·y = (4x³ + 2xy²)x + (2x²y)y
= 4x⁴ + 2x²y² + 2x²y²
= 4x⁴ + 4x²y²
= 4(x⁴ + x²y²)
= 4·f(x, y)
Hence, f_x·x + f_y·y = 4·f(x, y), so the function is homogeneous of degree r = 4, exactly as Euler's theorem asserts. (A function for which r = 1 is called a linearly homogeneous function.)
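The Euler sum can also be verified mechanically. The Python sketch below is an illustrative addition (assuming SymPy) and confirms that the sum equals 4 times the original function:

import sympy as sp

x, y = sp.symbols('x y')
f = x**4 + x**2*y**2

euler_sum = sp.diff(f, x)*x + sp.diff(f, y)*y   # f_x*x + f_y*y
print(sp.simplify(euler_sum - 4*f))             # prints 0, so r = 4 as expected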
CHAPTER FOUR
UNCONSTRAINED OPTIMIZATION
4.1. Functions with one variable
Given any function y = f(x), its derivative, written dy/dx, y′ or f′(x) (the First Order Condition, or FOC, works with this derivative), is the instantaneous rate of change of f(x), which is given by:
f′(x) = lim (h→0) [f(x + h) − f(x)] / h
In general, a function f has a relative maximum at x₀ if there exists an interval around x₀ on which f(x₀) ≥ f(x) for every x in that interval. Hence, we can find relative maxima and minima of a function by finding the values of x for which f′(x) = 0 or f′(x) is undefined (the critical values). Once we get a critical value, we can use the behavior of f′(x) near the critical point to know whether it is a relative minimum or a relative maximum:
a. If f′(x) > 0 to the left and f′(x) < 0 to the right of the critical value, the critical point is a relative maximum.
b. If f′(x) < 0 to the left and f′(x) > 0 to the right of the critical value, the critical point is a relative minimum.
Example: Find the relative extrema of f(x) = x³ − 3x + 6.
Solution
f′(x) = 3x² − 3 = 3(x − 1)(x + 1)
Setting f′(x) = 0 gives x = −1 and x = 1, with f(−1) = 8 and f(1) = 4.
Thus, the critical points are (−1, 8) and (1, 4). To find out which one is a relative minimum or maximum, use the number line for the sign of f′(x):
f′(x):  + 0 −    − 0 +
          −1        1
To the left of x = −1, f′ > 0 and to its right f′ < 0, so (−1, 8) is a relative maximum.
To the left of x = 1, f′ < 0 and to its right f′ > 0, so (1, 4) is a relative minimum.
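Since sign charts are easy to slip on, here is a short Python sketch (an illustrative addition, assuming SymPy) that recovers the same critical points and classifies them with the second derivative test:

import sympy as sp

x = sp.symbols('x')
f = x**3 - 3*x + 6

crit = sp.solve(sp.diff(f, x), x)        # critical values: [-1, 1]
for c in crit:
    print(c, f.subs(x, c), sp.diff(f, x, 2).subs(x, c))
# -1 -> f(-1) = 8, f''(-1) = -6 < 0: relative maximum
#  1 -> f(1)  = 4, f''(1)  =  6 > 0: relative minimum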
Absolute Extrema
An absolute extreme value is an extreme value that the function attains over its entire domain. The largest value that the function attains on its domain is called the absolute maximum of the function. Similarly, the smallest value that the function attains over its domain is known as the absolute minimum of the function. Note that a function may or may not have absolute extreme values, just as relative extreme values may or may not exist. For example, consider the function y = 2x + 1. As we increase the value of x, the value of y increases continuously; as we decrease the value of x, the value of y decreases continuously. However, if the domain is restricted to a closed interval [a, b], meaning a ≤ x ≤ b, and the function is continuous everywhere within the interval, the function will have an absolute minimum and an absolute maximum on the interval. To locate them:
a. Find the critical values and critical points and check whether they are within the interval.
b. Calculate f(a) and f(b).
c. The largest and smallest values obtained from the above steps represent the absolute maximum and the absolute minimum of the function over the interval, respectively.
Example
1. Find the absolute maximum and minimum of f(x) over the interval [0, 5].
Solution
Setting f′(x) = 0 gives the interior critical value x = 3; comparing the function values at the critical value and at the endpoints shows that there is an absolute maximum at x = 0 and an absolute minimum at x = 3 within [0, 5].
2. Find the absolute maximum and minimum of f(x) over the interval [−4, 2].
Solution
The absolute maximum is at x = 2, the point (2, 24), and the absolute minimum is at x = −4, the point (−4, −12).
Dear student! You studied optimization in section 4.1, where the discussion was restricted to a function with a single explanatory variable. But economic phenomena in reality involve the study and analysis of a large number of variables simultaneously. Therefore, in this section we extend the discussion of optimization to functions with more than a single explanatory variable.
4.2.1. First and Second Order Conditions for a Function with Two Explanatory Variables
This section is a simple extension to your study on optimization of a function with a single
variable. You will begin your study with the first order necessary condition for optimization of a
function of two explanatory variables and then discuss the second order sufficient condition.
Before going into the discussion, you should note that a maximum and a minimum in this section
refer to a relative maximum and a relative minimum.
Consider the function z = f(x, y). In this case, the first order necessary condition for an extremum (either a maximum or a minimum) requires the first partial derivatives of the function z = f(x, y) to equal zero simultaneously. This involves finding the values of x and y for which the first partial derivatives are zero at the same time, that is,
f_x = f_y = 0, or equivalently ∂z/∂x = ∂z/∂y = 0.
Can you draw a conclusion based upon this first order condition for an extremum? That is, is it possible to conclude whether a particular point is a maximum or a minimum on the basis of the first order condition alone? The answer is no, because there are points which satisfy the first order condition but are neither maxima nor minima; one good example is an inflection point. Therefore, similar to what you studied in single-variable optimization, the first order condition is only a necessary, not a sufficient, condition. As a result, you need to make use of additional criteria. With this comes the second order test.
Let (x₀, y₀) be the critical values of the function (i.e., the values at which f_x = f_y = 0), and let the second partial derivatives be evaluated at this point.
The conditions can be summarized as follows:
First order necessary condition: f_x = f_y = 0 (for both a maximum and a minimum).
Second order sufficient condition: for a relative maximum, f_xx < 0, f_yy < 0 and f_xx·f_yy > (f_xy)²; for a relative minimum, f_xx > 0, f_yy > 0 and f_xx·f_yy > (f_xy)².
Examples
a) Find the critical values of the function z = x² + 2y² and determine whether they represent a maximum or a minimum.
Solution
Using the first order condition, you can find the critical values. Thus, you need to find the first partial derivatives f_x and f_y:
f_x = ∂z/∂x = 2x
f_y = ∂z/∂y = 4y
Then, find the values of x and y (the critical or optimal values) which satisfy the condition f_x = f_y = 0. Hence, 2x = 0 and 4y = 0, so x = 0 and y = 0, and the critical value is (0, 0).
Now, you go to the second order test. Here, you need to find the second order partial derivatives:
f_xx = ∂²z/∂x² = 2
f_yy = ∂²z/∂y² = 4
f_xy = ∂²z/∂x∂y = 0
Evaluating the second partial derivatives at the critical value, you have
f_xx(0, 0) = 2
f_yy(0, 0) = 4
f_xy(0, 0) = 0
Therefore, f_xx > 0 and f_yy > 0, and f_xx·f_yy > (f_xy)². Hence, you conclude that the critical value (0, 0) is a relative minimum.
b) Given the function z = 2x² − xy + 3y², find the stationary values and examine whether they represent a maximum or a minimum.
Solution
f_x = 4x − y
f_y = 6y − x
4x − y = 0
6y − x = 0
Solving the above equations simultaneously, you get x = 0 and y = 0; hence, the critical value is (0, 0).
f_xx = 4
f_yy = 6
f_xy = −1
Since f_xx > 0 and f_yy > 0, and f_xx·f_yy > (f_xy)² (because f_xx·f_yy = 24 and (f_xy)² = 1), the critical value (0, 0) is a relative minimum.
A quadratic form is a polynomial expression in which each component term has a uniform (second) degree:
q = au² + 2huv + bv²
Different cases
We can check whether the form q is positive definite, negative definite or indefinite using a determinant test for sign definiteness. Writing
q = au² + 2huv + bv²
= au² + huv + huv + bv²
= u(au + hv) + v(hu + bv)
and noting that for two matrices to be multiplied the conformability condition must hold, q can be expressed in matrix form as
q = [u v] |a h| |u|
          |h b| |v|
with dimensions (1×2)(2×2)(2×1), which multiplies out to u(au + vh) + v(uh + vb) = au² + 2huv + bv².
The symmetric coefficient matrix
|a h|
|h b|
is the Hessian of the form. The first principal minor of the Hessian, |H₁|, is the determinant of the submatrix containing the first principal diagonal element; the second principal minor, |H₂|, is the determinant of the submatrix containing the first two principal diagonal elements, and so on. In terms of second order partials,
|a h|   |f_xx f_xy|
|h b| = |f_yx f_yy|
|H₁| = |f_xx| = f_xx
|H₂| = |f_xx f_xy; f_yx f_yy| = f_xx·f_yy − f_xy·f_yx
Notes
1. The second order differential d²z > 0 iff |H₁| > 0 and |H₂| > 0; the form is positive definite and indicates a minimum point.
2. d²z < 0 iff |H₁| < 0 but |H₂| > 0; the form is negative definite and indicates a maximum point.
3. d²z is indefinite iff |H₂| < 0.
Example
Optimize the function z = 6x² − 9x − 3xy − 7y + 5y² by using the Hessian determinant, state its sign definiteness, and find the maximum or minimum value.
Solution
F.O.C.: Z_x = 12x − 9 − 3y = 0
Z_y = −3x − 7 + 10y = 0
Solving the two equations simultaneously gives x = 1, y = 1.
S.O.C.: Z_xx = 12
Z_xy = Z_yx = −3
Z_yy = 10
H = |Z_xx Z_xy|   |12 −3|
    |Z_yx Z_yy| = |−3 10|
|H₁| = Z_xx = 12 > 0
|H₂| = Z_xx·Z_yy − Z_xy·Z_yx = 12×10 − (−3×−3) = 120 − 9 = 111 > 0
Since both |H₁| and |H₂| are greater than zero, d²z is positive definite and the function is at a minimum point. The minimum value is z(1, 1) = 6 − 9 − 3 − 7 + 5 = −8.
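The sign-definiteness test can be reproduced numerically. The sketch below is an illustrative addition (it assumes the NumPy library) and checks the leading principal minors of the Hessian above:

import numpy as np

H = np.array([[12.0, -3.0],
              [-3.0, 10.0]])
H1 = H[0, 0]                       # first principal minor
H2 = np.linalg.det(H)              # second principal minor
print(H1, round(H2, 6))            # 12.0 111.0 -> both positive: a minimum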
This approach can also be applied to functions of three or more choice variables. Given z = f(x₁, x₂, x₃), the Hessian takes the form
H = |f₁₁ f₁₂ f₁₃|
    |f₂₁ f₂₂ f₂₃|
    |f₃₁ f₃₂ f₃₃|
with principal minors
|H₁| = f₁₁
|H₂| = f₁₁f₂₂ − f₂₁f₁₂
|H₃| = |H|, the determinant of the full 3×3 Hessian.
For n choice variables,
H = |f₁₁ f₁₂ f₁₃ … f₁ₙ|
    |f₂₁ f₂₂ f₂₃ … f₂ₙ|
    |f₃₁ f₃₂ f₃₃ … f₃ₙ|
    | …               |
    |fₙ₁ fₙ₂ fₙ₃ … fₙₙ|
d²z is positive definite iff |H₁| > 0, |H₂| > 0, |H₃| > 0, …, |Hₙ| = |H| > 0 (a minimum).
d²z is negative definite iff |H₁| < 0, |H₂| > 0, |H₃| < 0, …, with signs alternating so that (−1)ⁿ|Hₙ| > 0 (a maximum).
Example
A firm produces three products Q₁, Q₂ and Q₃, and its profit function is given by:
π = 180Q₁ + 200Q₂ + 150Q₃ − 3Q₁Q₂ − 2Q₂Q₃ − 2Q₁Q₃ − 4Q₁² − 5Q₂² − 4Q₃²
Optimize the function: find the critical values at which the function is at a maximum or minimum point, and also the maximum or minimum value.
Solution
F.O.C.: π₁ = 180 − 8Q₁ − 3Q₂ − 2Q₃ = 0
π₂ = 200 − 3Q₁ − 10Q₂ − 2Q₃ = 0
π₃ = 150 − 2Q₁ − 2Q₂ − 8Q₃ = 0
In matrix form:
|8  3  2| |Q₁|   |180|
|3 10  2| |Q₂| = |200|
|2  2  8| |Q₃|   |150|
Here it is difficult to solve for the critical values by the simultaneous (elimination) method; thus, instead, we use Cramer's rule.
First, find the determinant of the coefficient matrix:
|A| = |8  3  2|
      |3 10  2|
      |2  2  8|
|A| = 8(80 − 4) − 3(24 − 4) + 2(6 − 20)
= 8(76) − 3(20) + 2(−14)
= 608 − 60 − 28
= 520 ≠ 0
Q₁ = |A₁|/|A|, where |A₁| = |180  3  2|
                            |200 10  2|
                            |150  2  8|
|A₁| = 180(80 − 4) − 3(1600 − 300) + 2(400 − 1500)
= 180(76) − 3(1300) + 2(−1100)
= 13,680 − 3,900 − 2,200
= 7,580
So Q₁ = |A₁|/|A| = 7580/520 ≈ 14.58
Similarly, Q₂ = |A₂|/|A|, where |A₂| = |8 180 2|
                                       |3 200 2|
                                       |2 150 8|
|A₂| = 6,900, so Q₂ = 6900/520 ≈ 13.27
Q₃ = |A₃|/|A|, where |A₃| = |8  3 180|
                            |3 10 200|
                            |2  2 150|
|A₃| = 6,130, so Q₃ = 6130/520 ≈ 11.79
Therefore, the critical values are Q₁ ≈ 14.58, Q₂ ≈ 13.27 and Q₃ ≈ 11.79.
S.O.C.: the second order partial derivatives of the profit function are
π₁₁ = −8, π₁₂ = −3, π₁₃ = −2
π₂₁ = −3, π₂₂ = −10, π₂₃ = −2
π₃₁ = −2, π₃₂ = −2, π₃₃ = −8
H = |π₁₁ π₁₂ π₁₃|   |−8  −3  −2|
    |π₂₁ π₂₂ π₂₃| = |−3 −10  −2|
    |π₃₁ π₃₂ π₃₃|   |−2  −2  −8|
|H₁| = −8 < 0
|H₂| = |−8 −3; −3 −10| = 80 − 9 = 71 > 0
|H₃| = |H| = −520 < 0
Since |H₁| < 0, |H₂| > 0 and |H₃| < 0, the Hessian is negative definite, so profit is at a maximum when evaluated at the optimal values Q₁ = 14.58, Q₂ = 13.27 and Q₃ = 11.79.
The maximum profit is obtained by substituting the optimal values into the original profit function (π):
π = 180(14.58) + 200(13.27) + 150(11.79) − 3(14.58×13.27) − 2(13.27×11.79) − 2(14.58×11.79) − 4(14.58)² − 5(13.27)² − 4(11.79)²
= 2624.4 + 2654 + 1768.5 − 580.43 − 312.91 − 343.80 − 850.31 − 880.46 − 556.02
= 7046.90 − 3523.93
= 3522.97
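The whole example can be reproduced in a few lines. The sketch below is an illustrative addition (it assumes NumPy) and replaces the hand-worked Cramer's rule with a direct linear solve, then repeats the second-order check:

import numpy as np

A = np.array([[8.0,  3.0, 2.0],
              [3.0, 10.0, 2.0],
              [2.0,  2.0, 8.0]])
b = np.array([180.0, 200.0, 150.0])

Q = np.linalg.solve(A, b)          # solves the FOC system A @ Q = b
print(Q.round(2))                  # approx [14.58 13.27 11.79]

H = -A                             # Hessian of the profit function
minors = [np.linalg.det(H[:k, :k]) for k in (1, 2, 3)]
print([round(m, 2) for m in minors])   # [-8.0, 71.0, -520.0]: alternating -> maximum

Q1, Q2, Q3 = Q
profit = (180*Q1 + 200*Q2 + 150*Q3 - 3*Q1*Q2 - 2*Q2*Q3 - 2*Q1*Q3
          - 4*Q1**2 - 5*Q2**2 - 4*Q3**2)
print(round(profit, 2))            # approx 3523, matching the value above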
Let's consider the effect of a change in one parameter on the optimal solution of a given problem
y = f(x₁, x₂, α), where x₁ and x₂ are choice variables and α is the parameter.
The FOC (f₁ = 0, f₂ = 0) yields critical values for x₁ and x₂ which result in the stationary value of y. But this is possible if and only if α is held fixed. Thus, the critical values of the choice variables are functions of α: x₁* = x₁*(α) and x₂* = x₂*(α); i.e., if we change the value of α, we will find different optimal values for x₁ and x₂. So the optimized objective function is also a function of the parameter.
Thus, π(α) = f(x₁*(α), x₂*(α), α) is the indirect objective function. This function shows the effect of α on the (optimized) objective function y.
For instance, if α takes the value α₀, we have optimal values x₁ = x₁⁰ and x₂ = x₂⁰, and we evaluate f (the direct objective function) using x₁⁰, x₂⁰ and α₀. Similarly, if α changes from α₀ to α₁, we will have new equilibrium (optimal) values x₁ = x₁¹ and x₂ = x₂¹. The function π(α) is tangent to f(x₁, x₂, α) whenever x₁ and x₂ are used optimally for the given α; if f is not optimized at a given α, the tangency does not occur. Thus, for each α we have a single point where f equals π, and hence the inequality f ≤ π(α). This implies that π(α) is an envelope of the function f at different values of α.
This theorem is used to determine the change in the optimum value of an objective function due to a change in the value of a parameter.
Therefore, the rate of change of the optimum value Z* as the parameter α changes is equal to the partial derivative of the objective function with respect to α, evaluated at the optimum:
dZ*/dα = ∂f/∂α, evaluated at x = x*(α)
The above result is known as the envelope theorem.
Example 1: Determine the effect of an increase in the value of a on the optimum value of the function y = 2x² − ax + a².
FOC:
dy/dx = 4x − a = 0, so x* = a/4
d²y/dx² = 4 > 0 .................. y is at a minimum when x = a/4
y* = 2(a/4)² − a(a/4) + a²
= a²/8 − a²/4 + a²
= 7a²/8
dy*/da = 7a/4
Or, using the envelope theorem,
dy*/da = ∂y/∂a evaluated at x = x*: −x + 2a = −a/4 + 2a = 7a/4
y* increases by 7a/4 units as a increases.
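Both routes to dy*/da can be verified symbolically. The following Python sketch is an illustrative addition (assuming SymPy):

import sympy as sp

x, a = sp.symbols('x a')
y = 2*x**2 - a*x + a**2

xstar = sp.solve(sp.diff(y, x), x)[0]       # x* = a/4
ystar = sp.simplify(y.subs(x, xstar))       # y* = 7*a**2/8
print(xstar, ystar)
print(sp.diff(ystar, a))                    # 7*a/4, differentiating y* directly
print(sp.diff(y, a).subs(x, xstar))         # 7*a/4 again, via the envelope theorem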
Example 2: A firm producing a certain product Q wants to maximize its profit. Suppose a tax of t birr per unit of output is imposed. Determine the effect of the tax on optimal profit.
A. Optimize the profit function.
Solution
π = R(Q) − C(Q) − tQ
FOC:
π′(Q) = R′(Q) − C′(Q) − t = 0
⇒ MR − MC − t = 0
⇒ t = MR − MC
B. The effect of the tax on optimal profit, π* = R(Q*) − C(Q*) − tQ*, is
dπ*/dt = R′(Q*)·(dQ*/dt) − C′(Q*)·(dQ*/dt) − Q* − t·(dQ*/dt)
= [R′(Q*) − C′(Q*) − t]·(dQ*/dt) − Q*
= −Q*, by the FOC.
That is, optimal profit falls by Q* units per unit increase in the tax rate, which is again the envelope theorem: dπ*/dt = ∂π/∂t = −Q*.
CHAPTER FIVE
5. CONSTRAINED OPTIMIZATION
5.1 Constrained optimization with equality constraint:
Functions of Two Variables and Equality Constraints:
5.1.1 Techniques of Constrained Optimization
The Techniques of constrained optimization presented in this chapter are built on the method for
identifying the stationary point in unconstrained optimization.
a) Substitution Method
One method in constrained optimization is to substitute the equality constraint(s) into the objective function. This converts a constrained optimization problem into an unconstrained one by internalizing the constraint(s) directly into the objective function.
The constraint is internalized when we express it as a function of one of the arguments of the objective function and then substitute out that argument using the constraint. We can then solve the internalized objective function by using unconstrained optimization techniques. Such a technique of solving constrained optimization problems is called the substitution method.
In order to develop a procedure for the determination of constrained extrema of a function, we consider the maximization of the utility function U = xy of a consumer subject to the budget constraint 400 = 20x + 25y.
Given the budget constraint, we note that x and y cannot take values independently of each other. For example, if x = 5, then the value of y must be 12, and so on.
One way of tackling this problem is to eliminate one of the independent variables, x or y, from the utility function by the use of the given constraint. Then U can be maximized or minimized in the usual unconstrained manner.
To eliminate y (say) from the utility function, we first solve the constraint for y. On rearranging terms, the constraint can be written as y = 16 − (4/5)x. Substituting this value of y into the utility function U = xy, we have
U = x(16 − (4/5)x) = 16x − (4/5)x²
Thus, dU/dx = 16 − (8/5)x = 0 (for a maximum), so x = 10, and hence y = 16 − (4/5)(10) = 8.
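The substitution just performed can be reproduced in a few lines of Python (an illustrative addition; it assumes SymPy):

import sympy as sp

x = sp.symbols('x')
y = 16 - sp.Rational(4, 5)*x        # the constraint solved for y
U = x*y                             # internalized objective: 16x - (4/5)x**2

xstar = sp.solve(sp.diff(U, x), x)[0]
print(xstar, y.subs(x, xstar), U.subs(x, xstar))   # 10, 8, 80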
In what follows, an attempt is made to introduce you to the various techniques of constrained optimization, beginning with further examples of the substitution method.
Example 1
Let us consider a consumer with a simple utility function
U = x₁x₂ + 2x₁
subject to the budget constraint
4x₁ + 2x₂ = 60
The constraint can be rewritten as
x₂ = 30 − 2x₁ .......................................................(1)
Substituting (1) into the objective function:
U(x₁, x₂) = x₁(30 − 2x₁) + 2x₁
= 32x₁ − 2x₁² ……………..the internalized objective function
Optimize the internalized objective function.
Solution
First order condition [FOC]:
dU/dx₁ = 32 − 4x₁ = 0
x₁* = 8, and from (1), x₂* = 30 − 2(8) = 14
Solve for the internalized objective function in the same way, proceeding through the first and second order conditions.
First order condition:
f′(x₁) = 1/x₁ − 1/(20 − 4x₁) = 0. Solving for x₁ yields x₁* = 4.
Thus, the maximum of f(x₁, x₂) subject to the constraint 4x₁ + x₂ = 20 occurs at x₁* = 4 and x₂* = 20 − 4(4) = 4.
b) The Lagrange Multiplier Method
When the constraint is a complicated function, or when there are several constraints under consideration, the substitution method can become very cumbersome. This has led to the development of another, simpler method of finding the extrema of a function: the Lagrangian method.
This method involves forming a Lagrangian function that includes the objective function, the constraint function and a new variable, λ, called the Lagrange multiplier. The essence of this method is to convert a constrained extremum problem into a form such that the first order conditions of an unconstrained optimization problem can still be applied.
We may note here that the necessary condition obtained above under the substitution method can also be obtained from an auxiliary function, termed the Lagrange function. This function is formed by the use of the objective function and the constraint. The Lagrange function corresponding to the objective function Z = f(x, y) and the constraint g(x, y) = 0 can be written as
L = f(x, y) + λg(x, y), where λ is an undetermined multiplier known as the Lagrange multiplier.
We note that, since g(x, y) = 0, the value of L at any point satisfying the constraint is the same as the value of Z = f(x, y) at that point. Thus, the extrema of Z = f(x, y) and of L occur at the same point. The necessary conditions [first order conditions] for the extrema of L are:
∂L/∂x = f_x + λg_x = 0 ………………………………….. (1)
∂L/∂y = f_y + λg_y = 0 …………………………………… (2)
∂L/∂λ = g(x, y) = 0 ……………………………………… (3)
The simultaneous solution of (1), (2) and (3) gives the stationary (or critical, or equilibrium) point.
Note that the stationary point obtained above will satisfy equation (3), i.e., the constraint. Thus, the unconstrained extremum of L is equivalent to the extremum of Z = f(x, y) subject to the constraint g(x, y) = 0.
On eliminating λ from (1) and (2), we get
f_x/g_x = f_y/g_y ………………………………………………. (4)
This is the same condition as obtained earlier. Equations (3) and (4) can be solved simultaneously to get the coordinates of the stationary point.
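For comparison with the substitution method, the following Python sketch (an illustrative addition, assuming SymPy) solves the earlier problem, U = xy subject to 400 = 20x + 25y, by the Lagrangian route:

import sympy as sp

x, y, lam = sp.symbols('x y lam')
L = x*y + lam*(400 - 20*x - 25*y)       # Lagrangian with multiplier lam

foc = [sp.diff(L, v) for v in (x, y, lam)]
sol = sp.solve(foc, [x, y, lam], dict=True)[0]
print(sol)                              # {x: 10, y: 8, lam: 2/5}

It reproduces the same optimum (x, y) = (10, 8) found by substitution, with λ = 2/5 as a by-product.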
c) Total Differential Approach
When we discussed the unconstrained optimization of Z = f(x, y), the FOC was stated in terms of the total differential:
dz = f_x dx + f_y dy = 0
This statement remains valid after a constraint g(x, y) = c is added. The constraint implies
dc = dg(x, y) = 0
dg(x, y) = g_x dx + g_y dy = 0
⇒ dy = −(g_x/g_y) dx
Substituting dy = −(g_x/g_y) dx into the differential of the objective function:
dz = f_x dx − (g_x/g_y) f_y dx = 0
⇒ f_x = (g_x/g_y) f_y
⇒ f_x/g_x = f_y/g_y, which is the first order condition (with g_x, g_y ≠ 0).
Note: This total differential approach yields the same result (first order condition) as the Lagrangian method. Writing the Lagrangian as
L(x, y, λ) = f(x, y) + λ[c − g(x, y)]
∂L/∂x = f_x − λg_x = 0, so f_x = λg_x and λ = f_x/g_x
∂L/∂y = f_y − λg_y = 0, so λ = f_y/g_y
⇒ f_x/g_x = f_y/g_y
Interpretation of the Lagrange multiplier: at the optimum, x* = x*(c) and y* = y*(c) satisfy the constraint g(x*(c), y*(c)) = c. Totally differentiating the constraint with respect to c:
g_x·(dx*/dc) + g_y·(dy*/dc) = 1
Also, totally differentiating the objective function f(x*(c), y*(c)) with respect to c:
df/dc = f_x·(dx*/dc) + f_y·(dy*/dc)
= λg_x·(dx*/dc) + λg_y·(dy*/dc), using the FOC
= λ[g_x·(dx*/dc) + g_y·(dy*/dc)] = λ·1 = λ
That is, the Lagrange multiplier λ represents the change in the optimum value of the objective function resulting from a small change in the constraint constant c.
Generalization: the Lagrange Function with Multiple Constraints
We can solve a constrained optimization problem with more than one constraint by using a Lagrangian function with a Lagrange multiplier corresponding to each constraint.
Let an n-variable function f be subject, simultaneously, to the two constraints
g(x₁, x₂, …, xₙ) = c and h(x₁, x₂, …, xₙ) = d
Then, adopting λ and μ as the two undetermined multipliers, we may construct the Lagrange function as:
L(x₁, x₂, …, xₙ, λ, μ) = f(x₁, …, xₙ) + λ[c − g(x₁, …, xₙ)] + μ[d − h(x₁, …, xₙ)]
FOC:
∂L/∂λ = c − g(x₁, …, xₙ) = 0
∂L/∂xᵢ = fᵢ − λgᵢ − μhᵢ = 0, i = 1, 2, …, n
∂L/∂μ = d − h(x₁, x₂, …, xₙ) = 0
The extension of this method to objective functions of more arguments is straightforward. For a single constraint the Lagrangian takes the form
L(x₁, x₂, …, xₙ, λ) = f(x₁, x₂, …, xₙ) + λ[c − g(x₁, x₂, …, xₙ)]
∂L/∂xᵢ = 0 for i = 1, 2, …, n
∂L/∂λ = c − g(x₁, x₂, …, xₙ) = 0, i.e., g(x₁, …, xₙ) = c
In general, the Lagrangian function for a constrained optimization problem with an objective function of n variables subject to m equality constraints is written as:
L(x₁, …, xₙ, λ₁, …, λₘ) = f(x₁, x₂, …, xₙ) + Σᵢ₌₁..ₘ λᵢ[cᵢ − gⁱ(x₁, …, xₙ)]
From the budget constraint:
(1/4)x₁ + (1/2)x₂ = 6
(1/4)x₁ + (1/2)(8) = 6
x₁* = 8
Second order condition:
U″ = −1/(24 − 2x₂)² − 1/(2x₂²)
Substituting x₂* = 8 for x₂ in the second order derivative, we get
U″ = −1/(24 − 16)² − 1/(2·64) = −1/64 − 1/128 < 0
Therefore, the optimum is a maximum: the optimal amounts of salad (x₁) and soup (x₂) to be purchased are 8 and 8 ounces, and the maximum satisfaction from consuming these is
U(8, 8) = (1/4)ln 8 + (1/2)ln 8 = (3/4)ln 8 ≈ 1.56
Graphically: the budget line, with intercept 24 on the x₁ axis, is tangent at point T to the indifference curve U = 1.56; the curve U = 1.04 lies below the optimum, while the curve U = 2.29 is unattainable.
We know that the consumer maximizes his utility at the point where the budget line is tangent to the highest attainable indifference curve, that is, where
P₁/P₂ = MU_x₁/MU_x₂
Here P₁/P₂ = 1/2, MU_x₁ = 1/(4x₁) and MU_x₂ = 1/(2x₂), so that
MU_x₁/MU_x₂ = (1/(4x₁))/(1/(2x₂)) = x₂/(2x₁)
At the tangency point T, x₁ = 8 and x₂ = 8, so x₂/(2x₁) = 8/16 = 1/2 = P₁/P₂.
In other words, at the optimal point, any decrease in utility due to a small reduction in the consumption of one good is just matched by the increase in utility due to an increase in the consumption of the other good. If soup consumption falls by dx₂, utility decreases by
MU_x₂·dx₂ = (1/(2x₂))dx₂ = (1/16)dx₂ at x₂ = 8
Evaluated at the optimal point, where the MRS (the marginal rate of substitution, dx₁/dx₂) is −2, the reduction in soup allows salad consumption to increase by 2dx₂, because
dx₁/dx₂ = −2
And utility rises by
MU_x₁·2dx₂ = (1/(4x₁))·2dx₂ = (1/16)dx₂ at x₁* = 8
From the first order conditions, after some rearrangement, we have
MU_x₂·dx₂ = MU_x₁·dx₁, i.e., dU = 0
We may note here that when the constraint is a complicated function, or when there are several equality constraints to be considered, the substitution method may be difficult or simply impossible to carry out in practice. In that case, you need other techniques, and this has led to the development of alternative methods of finding the extrema of a function under constraints, such as the Lagrange multiplier method discussed above. We now turn to the second order conditions for constrained problems, which are stated in terms of the second order total differential, d²z.
To derive d²z, treat dz as a function of x and y and differentiate again:
d²z = d(dz) = ∂(dz)/∂x·dx + ∂(dz)/∂y·dy
= ∂(f_x dx + f_y dy)/∂x·dx + ∂(f_x dx + f_y dy)/∂y·dy
When y is not independent of x (as under a constraint), dy is not constant, and the differentiation must allow for a term f_y·d(dy):
d²z = f_xx dx² + 2f_xy dx dy + f_yy dy² + f_y d²y ………… (1)
Here d²y ≠ dy², and the presence of the d²y term disqualifies d²z from being a quadratic form. However, d²z can be transformed into a quadratic form by virtue of the constraint g(x, y) = c. Taking the second order differential of the constraint:
d²g = g_xx dx² + 2g_xy dx dy + g_yy dy² + g_y d²y = 0
⇒ d²y = −(g_xx dx² + 2g_xy dx dy + g_yy dy²)/g_y ………… (2)
Substituting equation (2) into equation (1):
d²z = [f_xx − (f_y/g_y)g_xx]dx² + 2[f_xy − (f_y/g_y)g_xy]dx dy + [f_yy − (f_y/g_y)g_yy]dy² ………… (3)
But λ = f_y/g_y, and writing
Z_xx = f_xx − λg_xx, Z_xy = f_xy − λg_xy, Z_yy = f_yy − λg_yy
we obtain
d²z = Z_xx dx² + 2Z_xy dx dy + Z_yy dy² ………… (4)
which is a quadratic form in dx and dy.
C) The Second Order Condition for the Lagrangian Function
The discussion of the Lagrange multiplier method has so far focused on necessary conditions for identifying an extreme point. But the necessary conditions do not distinguish between maxima and minima.
In this section, we present sufficient conditions for determining whether an optimal set of values represents a maximum or a minimum of the objective function in the context of the Lagrange multiplier method.
The sufficient condition for classifying the stationary point of a multivariate function in the free-extremum case is based on whether the Hessian matrix of the function is positive definite or negative definite when evaluated at the stationary point. Similarly, the sufficient condition for a constrained optimization problem depends upon whether the second order differential of the Lagrangian function is positive definite or negative definite when evaluated at the stationary point.
But we must be careful not to apply the SOC developed for the unconstrained problem. As we shall see, the new conditions must be stated in terms of the second order total differential, d²z.
For a constrained extremum of Z = f(x, y) subject to g(x, y) = c, the second order necessary and sufficient conditions still revolve around the sign of d²z evaluated at the stationary point. But in the present context we are concerned with the sign definiteness (or semi-definiteness) of d²z not for all possible values of dx and dy, but only for those values of dx and dy (not both zero) satisfying the linear constraint:
g_x dx + g_y dy = 0
The second order necessary conditions are:
For a maximum of z: d²z is negative semidefinite (d²z ≤ 0) subject to dg = 0
For a minimum of z: d²z is positive semidefinite (d²z ≥ 0) subject to dg = 0
The second order sufficient conditions are:
For a maximum: d²z negative definite (d²z < 0) subject to dg = 0
For a minimum: d²z positive definite (d²z > 0) subject to dg = 0
5.1.3 The Concepts of Bordered Hessian and Jacobian Determinants
a) The Bordered Hessian (H̄)
To determine the conditions under which d²z > 0 or d²z < 0, we consider the quadratic form
d²z = f_xx dx² + 2f_xy dx dy + f_yy dy², subject to g_x dx + g_y dy = 0
Let a = f_xx, b = f_yy, h = f_xy, u = dx, v = dy, α = g_x and β = g_y. Then
q = au² + 2huv + bv², subject to αu + βv = 0, i.e., v = −(α/β)u
q = au² − 2h(α/β)u² + b(α²/β²)u² = (aβ² − 2hαβ + bα²)·u²/β²
Since u²/β² > 0 (for u ≠ 0), the sign of q is determined by the sign of aβ² − 2hαβ + bα², which can be written in determinant form:
aβ² − 2hαβ + bα² = − |0 α β|
                     |α a h|
                     |β h b|
Thus:
q is positive definite iff the bordered determinant |0 α β; α a h; β h b| < 0
q is negative definite iff the bordered determinant |0 α β; α a h; β h b| > 0
Replacing α, β, a, h and b by their values, the relevant determinant is
|0   g_x  g_y |
|g_x f_xx f_xy|
|g_y f_xy f_yy|
Since this determinant can be obtained by bordering the determinant of the SOC for unconstrained extrema (the Hessian determinant) with a row and a column of the first partials of the constraint, it is called a bordered Hessian and is denoted by |H̄₂|.
Given the Lagrange function L = f(x, y) + λg(x, y), we can write
L_x = f_x + λg_x
L_y = f_y + λg_y
L_xx = f_xx + λg_xx
L_xy = f_xy + λg_xy
L_yy = f_yy + λg_yy
Thus, d²z = L_xx dx² + 2L_xy dx dy + L_yy dy²
Taking a = L_xx, h = L_xy, b = L_yy, α = g_x, β = g_y, u = dx and v = dy, we conclude that the stationary point corresponds to a maximum (or a minimum) if the sign of the determinant
|0   g_x  g_y |
|g_x L_xx L_xy|
|g_y L_xy L_yy|
is positive (or negative). Since this determinant can be obtained by bordering the determinant of the second order conditions for the unconstrained extremum with a row and a column, it is called a bordered Hessian and is denoted by |H̄₂|.
The Multivariable Case
When the objective function takes the form
Z = f(x₁, x₂, …, xₙ) subject to g(x₁, …, xₙ) = c
the bordered Hessian becomes
     |0  g₁  g₂  …  gₙ |
     |g₁ z₁₁ z₁₂ … z₁ₙ|
H̄ = |g₂ z₂₁ z₂₂ … z₂ₙ|
     |…                |
     |gₙ zₙ₁ zₙ₂ … zₙₙ|
Its successive bordered principal minors can be defined as:
|H̄₂| = |0 g₁ g₂; g₁ z₁₁ z₁₂; g₂ z₂₁ z₂₂|
|H̄₃| = |0 g₁ g₂ g₃; g₁ z₁₁ z₁₂ z₁₃; g₂ z₂₁ z₂₂ z₂₃; g₃ z₃₁ z₃₂ z₃₃|, etc.,
with |H̄ₙ| = |H̄|.
Remarks: Given the objective function z = f(x₁, x₂, …, xₙ) and the constraint g(x₁, x₂, …, xₙ) = c, the second order sufficient conditions are:
i) If |H̄₂| < 0, |H̄₃| < 0, |H̄₄| < 0, …, |H̄ₙ| < 0, the stationary point is a minimum.
ii) If |H̄₂| > 0, |H̄₃| < 0, |H̄₄| > 0, …, with alternating signs so that (−1)ⁿ|H̄ₙ| > 0, the stationary point is a maximum,
where |H̄₂|, |H̄₃|, …, |H̄ₙ| are the successive bordered principal minors defined above.
Example: Optimize f(x, y) = 5x² + 6y² − xy subject to the constraint x + 2y = 24.
Solution
While forming the Lagrange function, it should be kept in mind that the constraint should first be expressed in implicit form, i.e., in the form g(x, y) = 0. We can write
V = 5x² + 6y² − xy + λ(24 − x − 2y)
First order conditions:
∂V/∂x = 10x − y − λ = 0 …………………………………….. (1)
∂V/∂y = 12y − x − 2λ = 0 …………………………………… (2)
∂V/∂λ = 24 − x − 2y = 0 …………………………………..… (3)
Eliminating λ from (1) and (2):
12y − x = 2(10x − y), or 14y = 21x, or 2y = 3x
Combining this with the constraint x + 2y = 24 gives x + 3x = 24, so x = 6 and y = 9.
Example: Minimize the cost function
C = 0.1x₁² + 0.2x₂² + 0.2x₁x₂ + 180x₁ + 60x₂ + 25,000
subject to x₁ + x₂ = 1000.
L = 0.1x₁² + 0.2x₂² + 0.2x₁x₂ + 180x₁ + 60x₂ + 25,000 + λ(1000 − x₁ − x₂)
FOC:
∂L/∂x₁ = 0.2x₁ + 0.2x₂ + 180 − λ = 0
∂L/∂x₂ = 0.4x₂ + 0.2x₁ + 60 − λ = 0
∂L/∂λ = 1000 − x₁ − x₂ = 0
Subtracting the second equation from the first gives 0.2x₂ = 0.4x₂ − 120, so x₂* = 600, and from the constraint x₁* = 400.
SOC: with C_x₁ = 0.2x₁ + 0.2x₂ + 180 and C_x₂ = 0.4x₂ + 0.2x₁ + 60, the bordered Hessian is
        |0 1   1  |
|H̄₂| = |1 0.2 0.2| = −0.2 < 0
        |1 0.2 0.4|
The stationary point is a minimum.
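The bordered-Hessian determinant above is easy to check numerically. The sketch below is an illustrative addition (it assumes NumPy):

import numpy as np

H_bar = np.array([[0.0, 1.0, 1.0],
                  [1.0, 0.2, 0.2],
                  [1.0, 0.2, 0.4]])
print(round(np.linalg.det(H_bar), 6))   # -0.2 < 0 -> the stationary point is a minimum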
b) Jacobian Determinants (|J|)
A Jacobian determinant permits testing for functional dependence among a set of functions, in both the linear and the nonlinear case. It is composed of all the first order partial derivatives of a system of equations, arranged in ordered sequence.
If the equations are linear, the Jacobian determinant coincides with the determinant of the coefficient matrix of the system; for nonlinear equations, the ordinary Hessian determinant is not applicable.
Let Y₁ = f¹(X₁, X₂, X₃), Y₂ = f²(X₁, X₂, X₃) and Y₃ = f³(X₁, X₂, X₃). The Jacobian determinant of first order partial derivatives is:
      |∂Y₁/∂X₁ ∂Y₁/∂X₂ ∂Y₁/∂X₃|
|J| = |∂Y₂/∂X₁ ∂Y₂/∂X₂ ∂Y₂/∂X₃|
      |∂Y₃/∂X₁ ∂Y₃/∂X₂ ∂Y₃/∂X₃|
The functions are functionally dependent if and only if |J| = 0 identically.
Example 1: Linear case
Y₁ = 6X₁ + 4X₂
Y₂ = 4X₁ + 5X₂
Solution
The Jacobian determinant is given as
|J| = |6 4|
      |4 5|
Thus, |J| = |H| = 30 − 16 = 14 ≠ 0, which implies there is a unique solution for X₁ and X₂, and it can be found by applying Cramer's rule.
Example 2: Nonlinear case
Y₁ = 4X₁ − X₂
Y₂ = 16X₁² + 8X₁X₂ + X₂²
|J| = |∂Y₁/∂X₁ ∂Y₁/∂X₂|   |4          −1       |
      |∂Y₂/∂X₁ ∂Y₂/∂X₂| = |32X₁ + 8X₂ 8X₁ + 2X₂|
|J| = 4(8X₁ + 2X₂) + (32X₁ + 8X₂) = 64X₁ + 16X₂, which is not identically zero; hence |J| ≠ 0 and the two functions are functionally independent.
5.1.4. Constrained Optimization and the Envelope Theorem
Let Z = f(x, y, a) be the objective function and g(x, y, a) = 0 the constraint, where a is a parameter. The Lagrange function is
L = f(x, y, a) + λg(x, y, a)
FOC:
∂L/∂x = f_x + λg_x = 0
∂L/∂y = f_y + λg_y = 0
∂L/∂λ = g(x, y, a) = 0
The solutions x = x(a), y = y(a) and λ = λ(a) depend on the parameter, and the optimal value function is
z* = f(x(a), y(a), a)
Differentiating with respect to a:
dz*/da = f_x·(dx/da) + f_y·(dy/da) + f_a ………………………… (1)
Differentiating the constraint g(x(a), y(a), a) = 0 with respect to a:
g_x·(dx/da) + g_y·(dy/da) + g_a = 0 ………………………………… (2)
Multiplying equation (2) by λ and adding it to equation (1):
dz*/da = (f_x + λg_x)·(dx/da) + (f_y + λg_y)·(dy/da) + f_a + λg_a
The terms in parentheses vanish by the FOC, so
dz*/da = f_a + λg_a = ∂L/∂a, evaluated at
x = x(a)
y = y(a)
λ = λ(a)
Example:
Let U = f(x₁, x₂) be the utility function of a consumer and M = x₁P₁ + x₂P₂ his budget constraint, where M, P₁ and P₂ are parameters. Determine the rate of change of optimal utility with respect to each of the three parameters.
Solution:
L = f(x₁, x₂) + λ(M − x₁P₁ − x₂P₂)
By the envelope theorem:
dU*/dM = ∂L/∂M = λ; U* increases by λ units per unit increase in M.
dU*/dP₁ = ∂L/∂P₁ = −λx₁; U* decreases by λx₁ units per unit increase in P₁.
dU*/dP₂ = ∂L/∂P₂ = −λx₂; U* decreases by λx₂ units per unit increase in P₂.
5.2. Constrained Optimization with Inequality Constraints
Sometimes optimization problems have constraints that take the form of inequalities rather than equalities; indeed, some constraints are more naturally expressed as inequalities. For example, consumption of a good is non-negative.
Maximization problems
Let us maximize f(x₁, x₂, …, xₙ) subject to the constraints gⁱ(x₁, x₂, …, xₙ) ≤ cᵢ, where the cᵢ are constants. This can be written out explicitly as:
Maximize f(x₁, x₂, …, xₙ)
subject to g¹(x₁, x₂, …, xₙ) ≤ c₁
g²(x₁, x₂, …, xₙ) ≤ c₂
⋮
gᵐ(x₁, …, xₙ) ≤ cₘ
and the non-negativity restrictions x₁, x₂, …, xₙ ≥ 0.
This is a case with m constraints and n choice variables; thus we have three ingredients: the objective function, the constraint functions and the non-negativity restrictions.
Dear student! What if you have a constraint with a ≥ sign? If we encounter an optimization problem with a ≥ constraint, we can simply multiply both sides of it by (−1), which reverses the direction of the inequality, and then proceed as usual.
5.2.1. The Kuhn–Tucker Conditions
Unlike the first order conditions of classical optimization, the Kuhn–Tucker conditions are not always necessary conditions; with appropriate adjustments (constraint qualifications), however, they serve as necessary conditions, and under suitable concavity assumptions they are both necessary and sufficient.
The three ingredients for solving inequality-constrained problems using the Kuhn–Tucker approach are:
1. The non-negativity restrictions
2. The inequality constraints
3. The complementary slackness conditions
Each is discussed below.
1. Effect of the Non-negativity Restriction
Assume an objective function with only one variable:
Maximize f(x₁) subject to the non-negativity restriction x₁ ≥ 0
There are three possibilities for a maximum: an interior maximum with f′(x₁) = 0 and x₁ > 0; a boundary maximum with f′(x₁) = 0 and x₁ = 0; and a boundary maximum with f′(x₁) < 0 and x₁ = 0.
Thus, from the above discussion, the condition for a maximum is
f′(x₁) ≤ 0, x₁ ≥ 0 and x₁·f′(x₁) = 0
Generalization: for maximizing f(x₁, …, xₙ) subject to x₁, x₂, …, xₙ ≥ 0,
fᵢ ≤ 0, xᵢ ≥ 0 and xᵢ·fᵢ = 0, where fᵢ = ∂f/∂xᵢ, i = 1, …, n.
CHAPTER SIX
COMPARATIVE STATIC ANALYSIS
Comparative statics is concerned with the comparison of different equilibrium states that are associated with different sets of values of parameters or exogenous variables. We have two classes of variables:
1. Choice variables, also called endogenous variables, are variables that are determined within the model; they are under the control of the decision makers or firms.
2. Exogenous variables, or parameters, are variables determined outside the model; they are beyond the control of the firms (decision makers) and are determined by factors outside the model.
In comparative statics, there is a need for point of reference or starting point so as to make the
comparison clear and visible. For example, initial equilibrium states are most commonly used as
starting points.
Example: In the theory of demand and supply, price is exogenous to each individual agent: Q_d = f(P) and Q_s = g(P).
In both cases P is exogenous in the sense that it is not influenced by the actions of individual decision makers; it is rather determined by factors outside the control of both consumers and suppliers in the market.
In comparative statics, we disregard the process of adjustment of the variables; we simply compare the pre-change and the post-change equilibrium states, and the way through which the change has come about is outside the concern of the field. Hence, comparative statics is essentially concerned with finding the rate of change of the equilibrium value of the choice (endogenous) variables with respect to the change in a particular parameter or exogenous variable.
For example, what happens to the quantity demanded of a good when the price of the good, the prices of related commodities, or income changes is a representative question in comparative statics.
A comparative static analysis can be either qualitative or quantitative in nature. If the interest is to know the direction of change of the endogenous variables as a result of the change in the parameters, the analysis is of the qualitative type. However, if the concern is with both the direction and the magnitude of the change, the analysis is quantitative.
In economics, theories are commonly tested on the basis of such changes in the variables, which may or may not turn out to be in line with the assumptions and predictions of the theory.
In comparative statics we make use of derivatives in assessing the rate of change of one
endogenous variable as a result of the change in one or more parameters or exogenous variables.
1. Maximize π(x) = R(x) − C(x) − tx, where t is a per-unit tax.
FOC: π′(x) = R′(x) − C′(x) − t = 0, i.e., MR = MC + t. If the firm is in a perfectly competitive market, MR = P, so the firm chooses the level of output such that P = MC + t.
SOC: π″ = R″(x) − C″(x) < 0 ⇒ R″(x) < C″(x), i.e., the slope of marginal revenue should be less than that of marginal cost.
In any case, the optimal x is a function of t: a change in t changes the optimal level of output produced, so x = x*(t). Inserting this into the FOC, R′(x*(t)) − C′(x*(t)) − t = 0, and differentiating with respect to t:
R″(x*)·(dx*/dt) − C″(x*)·(dx*/dt) − 1 = 0
⇒ dx*/dt = 1/(R″(x*) − C″(x*)) < 0
dx*/dt < 0 implies the optimal level of output and the tax rate are negatively related; i.e., the output of the firm decreases as the tax rate the firm faces increases, and vice versa.
Conclusion: a prediction about the size and direction of the change of the choice variable (output in the above case) can be made by looking at the change in the parameter facing the decision maker; this is the goal of comparative statics.
2. Maximize π(x) = R(x) − C(x) = Px − C(x)
Here the assumption is that P is an exogenous variable; i.e., P is taken as a parameter beyond the control of the decision maker.
FOC: π′(x) = P − C′(x) = 0 ….. (1)
SOC: π″(x) = −C″(x) ≤ 0 ….. (2), i.e., C″(x) > 0: the MC curve must be upward sloping.
But we know the optimal level of x is a function of P, i.e., x = x*(P), and from (1), C′(x) = MC = P.
x = x*(P) is the supply function of the firm, starting from the point where MC = P; it tells us how much the firm offers to the market for every market-determined level of price. Inserting x = x*(P) into the FOC, P − C′(x*(P)) = 0, and differentiating with respect to P:
1 − C″(x*)·(dx*/dP) = 0 ⇒ dx*/dP = 1/C″(x*), and since C″(x*) > 0,
dx*/dP > 0
This shows there is a positive relationship between the parameter P and the endogenous variable output. In other words, the supply function of a competitive firm is upward sloping; i.e., x = x*(P) has a positively sloped graph.
[Figure: demand curve D and supply curves S and S′ in the price-quantity plane, with equilibrium quantities x̄₁ and x̄₂ at prices P̄₁ and P̄₂.]
ii) Similarly, an increase in 'c' shifts the supply curve to the right. That is, as c increases, Q̄ increases.
[Figure: the supply curve shifts from S to S′, moving the equilibrium quantity from x̄₁ to x̄₂ at prices P̄₁ and P̄₂.]
iii) An increase in 'b' increases the (absolute) slope of the demand curve. That is, as b increases, Q̄ decreases.
[Figure: the demand curve rotates from D to D′, lowering the equilibrium quantity from x̄₁ to x̄₂ and the price from P̄₁ to P̄₂.]
iv) An increase in 'd' increases the slope of the supply curve. That is, as d increases, Q̄ decreases.
[Figure: the supply curve rotates from S to S′, lowering the equilibrium quantity from x̄₁ to x̄₂.]
Activity: Express the relationship between the parameters and the equilibrium level of the
output both graphically and algebraically.
4. National Income Model: Comparative static analysis can also be conducted on a national income model. We have
Y = C + I₀ + G₀
C = a + b(Y − T); a > 0, 0 < b < 1
T = d + tY; d > 0, 0 < t < 1
where
a shows the autonomous consumption level;
b measures the marginal propensity to consume (MPC);
d represents non-income (lump-sum) tax revenue;
t is the income tax rate.
Also, Y − T is usually referred to as disposable income (that is, income after tax). Y and C stand for the endogenous variables national income and consumption expenditure respectively, while I₀ and G₀ are exogenously determined investment and government expenditures. The first equation is an equilibrium condition (national income = total expenditure), while the second and third are behavioral equations, that is, the consumption and tax functions. Moreover, the equations show that the model is of the closed type because the trade terms are not incorporated.
NB: The equations are neither functionally dependent nor inconsistent with each other. Thus, we can determine the equilibrium levels of the endogenous variables Y, C and T in terms of the exogenous variables I₀ and G₀ and the parameters a, b, d and t.
Substituting the third equation into the second, and then the second into the first:
C = a + b(Y − d − tY) = a − bd + b(1 − t)Y
Y = a − bd + b(1 − t)Y + I₀ + G₀
Y[1 − b(1 − t)] = a − bd + I₀ + G₀
Ȳ = (a − bd + I₀ + G₀) / (1 − b + bt)
C̄ = a − bd + b(1 − t)·Ȳ = a − bd + b(1 − t)·(a − bd + I₀ + G₀)/(1 − b + bt)
The interest in comparative statics is to see the effect of a change in one of the exogenous variables (or parameters) on the endogenous variables. To do so, we take first order derivatives. For example:
∂Ȳ/∂G₀ = 1/(1 − b + bt) > 0, the government expenditure multiplier.
∂Ȳ/∂t = −(1 − b + bt)⁻²·(a − bd + I₀ + G₀)·b
= [−b/(1 − b + bt)]·[(a − bd + I₀ + G₀)/(1 − b + bt)]
= −bȲ/(1 − b + bt) < 0
This result is the income tax multiplier.
Activity: Find ∂Ȳ/∂b and ∂Ȳ/∂d, and determine their signs.
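As a numerical illustration, the multipliers can be evaluated for assumed parameter values. The Python sketch below is our own addition, with a = 100, b = 0.8, d = 10, t = 0.25, I₀ = 50 and G₀ = 60 chosen purely for demonstration:

a, b, d, t, I0, G0 = 100.0, 0.8, 10.0, 0.25, 50.0, 60.0

Y = (a - b*d + I0 + G0) / (1 - b + b*t)    # equilibrium income
gov_multiplier = 1 / (1 - b + b*t)          # dY/dG0
tax_multiplier = -b*Y / (1 - b + b*t)       # dY/dt

print(Y, gov_multiplier, tax_multiplier)    # 505.0  2.5  -1010.0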
6.3 Jacobian and Hessian Determinants
Both follow the same procedure in their computation: we take the first order partial derivatives of the system as the elements of the determinant in the case of the Jacobian, and the second order derivatives of the objective (or Lagrangian) function with respect to the choice variables as the elements of the determinant in the case of the Hessian.
The two differ in their purpose: while Jacobians are used to check whether a given system has a (unique) solution before an attempt is made to solve it, Hessians are used as sufficient conditions for classifying a given point as a relative maximum or minimum.
Note: The detailed concepts and mathematical applications of Jacobian and Hessian determinants were addressed in the previous chapter. The relevance of this subtopic here is to remind students to relate these concepts to their comparative statics applications.
6.4 Comparative Static and General Function Model
In all the above models the equilibrium values of the endogenous variables could be explicitly
expressed in terms of the exogenous variables. Accordingly the technique of simple partial
differentiation was all we needed to obtain the desired comparative static information.
However, when a model contains functions expressed in general form, explicit solutions are not available; i.e., the equilibrium values of the endogenous variables cannot be expressed explicitly in terms of the parameters and/or exogenous variables. In such cases a new technique must be adopted that makes use of concepts such as total differentials and the implicit function rule.
Let's try to illustrate the point with a market model. Consider a market model where Qd is a function of both price and an exogenously determined income Y₀, and Qs is a function of price alone:
Qd = D(P, Y₀), where ∂D/∂P < 0 and ∂D/∂Y₀ > 0
Qs = S(P), where dS/dP > 0
And the equilibrium position of the market is defined by Qd = Qs. This implies
D(P, Y₀) = S(P)
D(P, Y₀) − S(P) = 0
Even though the above equation cannot be solved explicitly for the equilibrium price P̄, we assume that there does exist a static equilibrium both before and after the change in the exogenous variable Y₀.
Say we have obtained P̄; if the income of consumers changes, the whole equilibrium will be upset. This indicates that the optimal value of P which sets the market at equilibrium is a function of the exogenous variable Y₀; that is, P̄ = P̄(Y₀).
The change in income upsets the equilibrium by causing a shift in the demand function, implying that every value of Y₀ yields a unique value of P̄.
Applying the implicit function rule to F(P, Y₀) = D(P, Y₀) − S(P) = 0:
dP̄/dY₀ = −(∂F/∂Y₀)/(∂F/∂P) = (∂D/∂Y₀)/(dS/dP − ∂D/∂P) = (positive)/(positive − negative) > 0
Hence, dP̄/dY₀ > 0. This indicates that an increase in income results in an increase in the equilibrium price, and vice versa.
To answer the second question, that is, to find dQ̄/dY₀, note that in equilibrium Q̄ = S(P̄(Y₀)), so
dQ̄/dY₀ = (dS/dP)·(dP̄/dY₀)
Therefore, dQ̄/dY₀ > 0, implying that an increase in income increases the equilibrium level of output.
0
Generally, the comparative static results convey the proposition that an upward shift of the
demand curve (due to a rise in Y), results in a higher equilibrium price and equilibrium quantity.
The above derivation of the relationship between P̄ and Q̄ and the exogenous variable Y₀ can also be carried out by a simultaneous equation approach:
Qd = D(P, Y₀); Qs = S(P)
At equilibrium, Qd = Qs = Q ⇒ Qd − Q = 0 and Qs − Q = 0:
F¹(P, Q, Y₀) = D(P, Y₀) − Q = 0
F²(P, Q, Y₀) = S(P) − Q = 0
In the above two equations we have two endogenous variables, P and Q, and one exogenous variable, Y₀. Applying partial differentiation to F¹ and F², we can form the Jacobian, one row for F¹ and the other for F²:
      |∂F¹/∂P ∂F¹/∂Q|   |∂D/∂P −1|
|J| = |∂F²/∂P ∂F²/∂Q| = |dS/dP −1| = dS/dP − ∂D/∂P > 0, because dS/dP > 0 and ∂D/∂P < 0.
Since |J| ≠ 0, we have P̄ = P̄(Y₀), and the equilibrium conditions can be written as the identities
D(P̄, Y₀) − Q̄ = 0 and S(P̄) − Q̄ = 0
Taking the total differential of each:
(∂D/∂P)·dP̄ + (∂D/∂Y₀)·dY₀ − dQ̄ = 0
(dS/dP)·dP̄ − dQ̄ = 0
Dividing both equations through by dY₀ yields:
(∂D/∂P)·(dP̄/dY₀) − (dQ̄/dY₀) = −∂D/∂Y₀
(dS/dP)·(dP̄/dY₀) − (dQ̄/dY₀) = 0
Now, taking dP̄/dY₀ and dQ̄/dY₀ as the variables and writing the above equations in matrix form:
|∂D/∂P −1| |dP̄/dY₀|   |−∂D/∂Y₀|
|dS/dP −1| |dQ̄/dY₀| = |   0    |
In order to solve for the variables we can apply Cramer's rule, but before that let us check that |J| ≠ 0.
      |∂D/∂P −1|
|J| = |dS/dP −1| = dS/dP − ∂D/∂P > 0, which is different from zero. Hence, it is possible to apply Cramer's rule:
           |−∂D/∂Y₀ −1|
dP̄/dY₀ = |   0     −1| / |J| = (∂D/∂Y₀)/(dS/dP − ∂D/∂P) > 0
           |∂D/∂P −∂D/∂Y₀|
dQ̄/dY₀ = |dS/dP    0    | / |J| = (dS/dP)·(∂D/∂Y₀)/(dS/dP − ∂D/∂P) > 0
Therefore, both dP̄/dY₀ and dQ̄/dY₀ are greater than zero, identical with the results obtained before.
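To make the general result concrete, one can impose assumed linear forms and let a computer do the implicit differentiation. The sketch below is an illustrative addition (it assumes SymPy), with Qd = a + m·Y₀ − b·P and Qs = −c + h·P as our own choice of functional forms:

import sympy as sp

P, Y0, a, b, c, h, m = sp.symbols('P Y0 a b c h m', positive=True)
F = (a + m*Y0 - b*P) - (-c + h*P)          # excess demand D - S

dP_dY0 = -sp.diff(F, Y0) / sp.diff(F, P)   # implicit function rule
print(sp.simplify(dP_dY0))                 # m/(b + h), positive as the theory predicts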
Comparative statics is helpful in finding out how a disequilibrating change in a parameter will affect the equilibrium state of a model. However, by its very nature, it has the following limitations:
1. It ignores the process of adjustment from the old equilibrium to the new one.
2. It neglects the time element involved in the adjustment process from one equilibrium to another.
3. It always assumes that a change in a parameter and/or exogenous variable results in a new equilibrium, disregarding the possibility that the new equilibrium may never be attained because of the inherent instability of the model.
CHAPTER SEVEN
DYNAMIC OPTIMIZATION.
7.1 Definitions and Concepts
I. Discrete-time case: time is treated as a discrete variable, in which case the variable undergoes a change only once within each period. This case utilizes the methods of difference equations.
Example: interest compounding per year, month, etc. The time path is not a smooth curve here.
II. Continuous-time case: time is treated as a continuous variable, in which case something is happening to the variable at each point in time. Here, integral calculus and differential equations are used. Example: population growth.
7.1.1 First –Order Linear Differential Equations
Differential equations are equations involving derivatives (differentials). They express the rates of change of continuous functions over time. The objective in working with differential equations is to find a function, free of derivatives (differentials), which satisfies the differential equation. Such a function is called the solution or integral of the equation.
Note: The order of a differential equation is the order of the highest derivative in the equation.
The degree of a differential equation is the highest power to which the derivative of the highest
order is raised.
Example: 1) dy/dt = 2t + 6: first order, first degree
2) d²y/dt² + (dy/dt)³ + t² = 0: second order, first degree
3) (d²y/dt²)⁷ + (d³y/dt³)⁵ = 75y: third order, fifth degree
General form: dy/dt + Vy = Z, where V and Z may be constants or functions of time.
When V and Z are constants: e.g., dy/dt + 2y = 3
When they are functions of time: dy/dt + V(t)y = Z(t)
Note: For a first order linear differential equation, dy/dt and y must appear to no higher than the first degree, and no product y·(dy/dt) may occur.
General solution: y(t) = Ae^(−Vt) + Z/V
Where A is an arbitrary constant,
Ae^(−Vt) ≡ y_c, the complementary function, and
Z/V ≡ y_p, the particular integral.
Proof: Given the general linear differential equation:
dy/dt + Vy = Z ⇒ dy/dt = Z − Vy
Separating the variables: dy/(Z − Vy) = dt
Integrating both sides: ∫ dy/(Z − Vy) = ∫ dt
(−1/V)·ln(Z − Vy) = t + c₁
ln(Z − Vy) = −Vt + c₂, where c₂ = −Vc₁
Z − Vy = e^(−Vt + c₂), since ln x = a ⇒ x = e^a
Z − Vy = e^(−Vt)·e^(c₂) = C·e^(−Vt), where C = e^(c₂)
−Vy = C·e^(−Vt) − Z
y = (−C/V)·e^(−Vt) + Z/V = Ae^(−Vt) + Z/V, where A = −C/V
Hence, for the above linear differential equation, the general solution is given as y = Ae^(−Vt) + Z/V. This form of the solution applies when V and Z are constants; when they are functions of time, the solution is obtained analogously using the integrating factor e^(∫V dt).
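A crude numerical integration confirms the closed form. The Python sketch below is an illustrative addition, using V = 2, Z = 3 and the assumed initial value y(0) = 5 (so that A = y(0) − Z/V = 3.5):

import math

V, Z, y0 = 2.0, 3.0, 5.0
steps = 10000
dt = 1.0 / steps

y = y0
for _ in range(steps):          # Euler's method from t = 0 to t = 1
    y += (Z - V*y) * dt         # dy/dt = Z - V*y

A = y0 - Z/V
print(y, A*math.exp(-V*1.0) + Z/V)   # both approx 1.9737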
First order linear difference equations are used to analyze changes over time when what happens in one period depends upon what happened in the previous period. A difference equation expresses a relationship between a dependent variable and its lagged values, changing at discrete intervals of time. Example: consumption in one period depends on the previous period's income, Cₜ = f(Yₜ₋₁). (We write yₜ in the discrete-time case and y(t) in the continuous-time case.)
Note: The order of a difference equation is determined by the greatest number of periods lagged. A first order difference equation expresses a time lag of one period, a second order difference equation a two-period time lag, and so on.
Since Δt = 1 between consecutive periods,
Δyₜ/Δt = Δyₜ = yₜ₊₁ − yₜ (first order)
Most of the time the delta (Δ) is omitted, and we write the equation directly in terms of yₜ₊₁ and yₜ.
The solution of a difference equation defines y for every value of t and does not contain a difference expression.
Note: For the equation to be linear, the dependent variable must not appear raised to a power higher than one or as a cross product.
1. Iterative method
A first order difference equation describes the pattern of change of y between two consecutive periods only. Hence, once the difference equation is specified and an initial value y₀ is given, it is possible to find y₁ from the equation. Similarly, once y₁ is found, y₂ is immediately obtained, and so on, by repeated application (iteration) of the pattern of change specified in the difference equation. The results of the iteration enable us to infer the time path of the variable under consideration.
Example:
Find the solution of the difference equation Δyₜ = 2, assuming an initial value y₀ = 15.
Solution:
Since Δyₜ = yₜ₊₁ − yₜ, Δyₜ = 2 ⇒ yₜ₊₁ = yₜ + 2. Then, by successive substitution of t = 0, 1, 2, 3, etc., we obtain
t = 0: y₁ = y₀ + 2
t = 1: y₂ = y₁ + 2 = y₀ + 2 + 2 = y₀ + 2(2)
t = 2: y₃ = y₂ + 2 = y₀ + 2(2) + 2 = y₀ + 3(2)
t = 3: y₄ = y₃ + 2 = y₀ + 3(2) + 2 = y₀ + 4(2)
For any period t: yₜ = y₀ + t(2) = 15 + 2t
For the general first order equation yₜ₊₁ − byₜ = a, the solution has two parts.
a) The particular solution: try a stationary solution yₜ = k. If y is to take the same value over time, we must also have yₜ₊₁ = k. Then, substituting these values into the complete equation:
yₜ₊₁ − byₜ = a ⇒ k − bk = a ⇒ k(1 − b) = a ⇒ k = a/(1 − b)
Hence, the particular solution becomes yₚ = k, i.e., yₚ = a/(1 − b), b ≠ 1.
Note that since a/(1 − b) is a constant, a stationary equilibrium is indicated in this case.
Since yₚ is undefined at b = 1, we need to find some other solution for the non-homogeneous equation in that case. So, let's try a solution of the form yₜ = kt, which indicates a moving equilibrium; yₜ = kt implies yₜ₊₁ = k(t + 1).
Substituting these values into the complete equation, we obtain
yₜ₊₁ − byₜ = a ⇒ k(t + 1) − bkt = a
⇒ k(t + 1) − kt = a, since b = 1
⇒ k(t + 1 − t) = a ⇒ k = a. Thus, yₜ = kt ⇒ yₚ = at, for b = 1.
b) The complementary solution of the homogeneous equation yₜ₊₁ − byₜ = 0 is y_c = Abᵗ, where A is an arbitrary constant.
c) Finally, adding y_c and yₚ, we arrive at the general solution:
yₜ = y_c + yₚ ⇒ yₜ = Abᵗ + a/(1 − b) ……. (1) General solution when b ≠ 1
yₜ = Abᵗ + at ⇒ yₜ = A + at ………. (2) General solution when b = 1
Eliminating the arbitrary constant, the definite solution can be written as:
From (1), at t = 0: y₀ = A + a/(1 − b) ⇒ A = y₀ − a/(1 − b)
yₜ = [y₀ − a/(1 − b)]bᵗ + a/(1 − b) …… Definite solution, when b ≠ 1
From (2), at t = 0: y₀ = A + a(0) = A, so yₜ = y₀ + at …… Definite solution, when b = 1
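The definite solution can be checked against direct iteration. The Python sketch below is an illustrative addition, with assumed values b = 0.5, a = 4 and y₀ = 10:

b, a, y0 = 0.5, 4.0, 10.0

y = y0
for t in range(6):
    closed = (y0 - a/(1 - b)) * b**t + a/(1 - b)   # the definite solution
    print(t, y, closed)                            # the two columns agree
    y = b*y + a                                    # iterate one period forward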
As in the continuous-time case, the dynamic stability of the equilibrium can be examined using stability conditions. Using the definite solution of the difference equation, yₜ = [y₀ − a/(1 − b)]bᵗ + a/(1 − b), it can be expressed in the more general form
yₜ = Abᵗ + k, where A = y₀ − a/(1 − b) and k = a/(1 − b)
The time path yₜ will be dynamically stable only if the complementary function Abᵗ → 0 as t → ∞, i.e.,
lim yₜ = k as t → ∞
Assuming for the moment A = 1 and k = 0, the exponential expression bᵗ generates seven different time paths depending on the value of b (which can range from −∞ to ∞).
1. If b > 1, bᵗ increases at an increasing rate as t increases. Thus, the time path explodes (diverges) and moves farther and farther away from the horizontal (equilibrium) line. Example: if b = 3,
t:  0  1  2  3   4
bᵗ: 1  3  9  27  81
2. If b = 1, bᵗ = 1 for all values of t; the time path is represented by a horizontal line.
3. If 0 < b < 1 (b is a positive fraction), bᵗ decreases as t increases; the time path is damped and moves towards the equilibrium (horizontal) line. Example: if b = 1/3,
t:  0  1    2    3     4
bᵗ: 1  1/3  1/9  1/27  1/81
4. If b = 0, bᵗ = 0 for all t ≥ 1, so the time path coincides with the equilibrium line from the first period onwards.
5. If −1 < b < 0 (b is a negative fraction), bᵗ oscillates between positive and negative values, and the time path draws closer and closer to the horizontal (equilibrium) line. Example: if b = −1/3,
t:  0  1     2    3      4
bᵗ: 1  −1/3  1/9  −1/27  1/81
6. If b = −1, bᵗ oscillates between +1 and −1. Example:
t:  0  1   2  3   4  5
bᵗ: 1  −1  1  −1  1  −1
7. If b < −1, bᵗ oscillates and moves farther and farther away from the horizontal (equilibrium) line.
Differential equations are used to determine the conditions for dynamic stability in micro-
economic models of market equilibria and to trace the time path of growth under various
conditions in macro-models.
Given the growth rate of a function, differential equations enable the economist to find the function whose growth is described; furthermore, from a point elasticity they enable the economist to estimate the demand function.
Example: Given the demand and supply functions Qd = a + bP (with b < 0) and Qs = c + hP (with h > 0), find the time path of price when the market is out of equilibrium.
Solution:
First find the equilibrium price P̄: Qd = Qs ⇒ a + bP = c + hP ⇒ (h − b)P = a − c ⇒ P̄ = (a − c)/(h − b)
[Figure: demand and supply curves in the price-quantity plane; above the equilibrium price P̄ there is excess supply, below it excess demand, and at P̄ demand equals supply.]
Assume that the rate of change of P in the market, dP/dt, is a positive linear function of excess demand (Qd − Qs):
dP/dt = j(Qd − Qs), where j > 0
j ≡ the adjustment coefficient. Substituting the parameterized forms of Qd and Qs, we get:
dP/dt = j(a + bP − c − hP) = j(a − c + (b − h)P)
Rearranging to fit the general format of the differential equation dy/dt + Vy = Z:
dP/dt + j(h − b)P = j(a − c)
Then, since V = j(h − b) and Z = j(a − c), the general solution y(t) = Ae^(−Vt) + Z/V becomes
P(t) = Ae^(−j(h−b)t) + (a − c)/(h − b) = [P(0) − P̄]e^(−j(h−b)t) + P̄
Since b < 0 and h > 0, we have h − b > 0, so the exponential term dies out as t → ∞. Consequently, the time path will indeed lead the price towards the equilibrium position; in this case, the equilibrium is said to be dynamically stable.
But, depending on the relative magnitude of P(0) and P̄, the above solution yields three possible time paths.
To illustrate the use of difference equations in economic analysis, we use a market model of a single commodity: the cobweb model.
For many products (such as agricultural commodities) which are planted a year before marketing, current supply depends on last year's price: Q_{s,t+1} = S(Pₜ), or equivalently Q_{s,t} = S(Pₜ₋₁). When such a supply function interacts with a demand function of the form Q_{d,t} = D(Pₜ), interesting dynamic price patterns result.
Using the linear version of the lagged supply and unlagged demand functions,
Q_{d,t} = a − bPₜ and Q_{s,t} = −c + hPₜ₋₁ (a, b, c, h > 0),
the model can be reduced to a single first order difference equation by using the equilibrium condition:
Q_{d,t} = Q_{s,t}
a − bPₜ = −c + hPₜ₋₁ ⇒ Pₜ = (a + c)/b − (h/b)Pₜ₋₁ ⇒ Pₜ + (h/b)Pₜ₋₁ = (a + c)/b
Shifting the periods forward by one period: Pₜ₊₁ + (h/b)Pₜ = (a + c)/b
The definite solution formula yₜ = [y₀ − a/(1 − b)]bᵗ + a/(1 − b), with the coefficient b replaced by −h/b and the constant a replaced by (a + c)/b, gives
Pₜ = [P₀ − (a + c)/(b + h)]·(−h/b)ᵗ + (a + c)/(b + h)
Hence, writing the intertemporal equilibrium price as P̄ = (a + c)/(b + h),
Pₜ = (P₀ − P̄)·(−h/b)ᵗ + P̄
Two points may be observed in regard to this time path: first, since −h/b is negative, the time path is oscillatory; second, its convergence depends on whether h/b is less than, equal to, or greater than one.
Graphic Illustration
[Figure: three cobweb diagrams in the price-quantity plane, each with demand curve D and lagged supply curve S: (a) an explosive cobweb spiraling outward from P̄; (b) a uniform oscillation around P̄; (c) a damped cobweb converging to P̄.]
Fig (a): assume that the intersection of D and S yields the intertemporal equilibrium price P̄. Given an initial price P₀ (where P₀ > P̄), the quantity supplied in the next period (period 1) will be Q₁. But at this quantity, supply exceeds demand, so there is a tendency for price to fall; consequently, the market clears at P₁.
At P₁, demand exceeds supply in the following period, so there is a tendency for price to rise; as a result, the market clears at P₂.
Again, with P₂ prevailing, the supply in period 3 will be Q₃ (where Q₃ > Q₂). At Q₃, since supply exceeds demand, the market clears at P₃. Repeating this reasoning, we can trace out the prices and quantities in subsequent periods by following the arrowheads, thereby spinning a cobweb around the demand and supply curves.
The time path of price is oscillatory and explosive when S is steeper than D, i.e., when h > b.
Fig (b): Pₜ oscillates uniformly, so that equilibrium is unstable; Pₜ neither diverges nor converges.
Fig (c): Pₜ is oscillatory and convergent when S is flatter than D (h < b), so the equilibrium time path Pₜ is dynamically stable.
Examples:
ℎ 3
Hence, �� = �0 − � ( )� + � ⇒ > 1
� 2
25 3
a. From ��� = ��� ⇒ 20 − 2 �� ==− 5 + 3 ��−1 ⇒ �� = 2
− 2 ��−1
Shifting the time periods forward one period and rearranging:
3 25
��+1 + �� =
2 2
Definite solution:
25 25
�� = 4 − 2
( − 3 2 )� +
=(4-5) ( − 3 2 )� + 5
2
1+3 2 1+3 2
�� =− ( − 3 2 )� + 5 �� �� = 5 − ( − 3 2 )�
Substitution into the demand function yields:
Qt = 20 − 2[5 − (−3/2)^t] = 10 + 2(−3/2)^t, or Qt = 2(−3/2)^t + 10
b. If there is an equilibrium (i.e., if the price is constant in every period, meaning Pt = Pt−1 = Pt−2 = … = Pt−n), then it follows that Pt = Pt−1 = P̄ and Qt = Q̄.
Then, from Qd,t = Qs,t:
20 − 2P̄ = −5 + 3P̄ ⇒ P̄ = 5
[Figure: cobweb diagram for the example, with demand and supply curves in the (Price, Quantity) plane; starting from P0 = 4, the path P0 → P1 → P2 → P3 … spirals outward, away from the equilibrium P̄ = 5.]
The equilibrium is unstable because the time path is oscillatory and divergent.
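This divergence is easy to confirm numerically. The short sketch below iterates the example's market-clearing recursion Pt = 25/2 − (3/2)Pt−1 from P0 = 4 and checks it against the closed-form solution Pt = 5 − (−3/2)^t (plain Python; nothing beyond the example's own numbers is assumed):

# Cobweb model of the example: Qd,t = 20 - 2*Pt, Qs,t = -5 + 3*P(t-1), P0 = 4
P = 4.0
for t in range(1, 8):
    P = 25/2 - (3/2) * P                 # market-clearing recursion P_t = 25/2 - (3/2)P_{t-1}
    closed_form = 5 - (-3/2) ** t        # definite solution P_t = 5 - (-3/2)^t
    print(t, round(P, 4), round(closed_form, 4))
# The two columns agree, and |P_t - 5| grows by a factor of 3/2 each period:
# the oscillation around the equilibrium price 5 is explosive.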
PART I
ECONOMETRICS ONE
Chapter 1
Introduction
1.1 Definition and scope of econometrics
What Is Econometrics?
Literally, econometrics means “economic measurement”. It is basically concerned with
measuring economic relationships.
But the scope of econometrics is much broader, and different econometricians have defined it in different ways.
“Econometrics is the science which integrates and applies economic theory, economic statistics, and mathematical economics to investigate the empirical support of the general laws established by economic theory.”
It is a special type of economic analysis and research.
- Starting from economic theory, we express the relationships in mathematical terms so that they can be measured by the methods called econometric methods, in order to obtain numerical estimates of the coefficients of the economic relationships.
1.2 Why a Separate Discipline?
Econometrics integrates and applies economic theory, mathematical economics, economic statistics and mathematical statistics to provide numerical values for the parameters of economic relationships and to verify economic theories.
Economic theory makes statements or hypotheses that are mostly qualitative in nature, but it does not provide a numerical measure of the relationship between economic variables. It does not tell by how much the quantity of one variable will go up or down as a result of a certain change in another variable.
Therefore, it is the job of the econometrician to provide such numerical estimates.
Econometrics gives empirical content to most economic theory.
Mathematical economics expresses economic theory in mathematical form (equations) without regard to measurability or empirical verification of the theory.
There is no essential difference between mathematical economics and economic theory: economic theory uses verbal exposition while mathematical economics uses symbols.
Both express economic relationships in an exact or deterministic form; neither allows for the random elements which might affect the relationship and make it stochastic.
Furthermore, they do not provide numerical values for the coefficients of economic
relationships.
However, econometric methods do not assume exact or deterministic relationships. They introduce a random disturbance variable into relationships among economic variables, which captures deviations from the exact behavioral patterns suggested by economic theory and mathematical economics. Furthermore, econometric methods provide numerical values of the coefficients of economic relationships.
Economic statistics- is mainly concerned with collecting, processing, and presenting economic
data in the form of charts and tables and describing the pattern of economic data over time. The
data thus collected constitute the raw data for econometric work. But the economic statistician does not go any further to test economic theories; the econometrician, however, does.
Mathematical (or inferential) statistics- deals with the method of measurement which is
developed on the basis of controlled experiments. But statistical methods of measurement are
not appropriate for a number of economic relationships because for most economic relationships
controlled or carefully planned experiments cannot be designed due to the fact that the nature of
relationships among economic variables are stochastic or random. Yet the fundamental ideas of
inferential statistics are applicable in econometrics, but they must be adapted to the problem of
economic life. Econometric methods are adjusted so that they may become appropriate for the
measurement of economic relationships which are stochastic. The adjustment consists primarily
in specifying the stochastic (random) elements that are supposed to operate in the real world and
enter into the determination of the observed data.
1.4 Economic models vs econometric models
i) Economic models:
- The simplified analytical framework of economic theory is called an economic model. It is an organized set of relationships that describes the functioning of an economic entity under a set of simplifying assumptions. All economic reasoning is ultimately based on models.
Economic models consist of the following three basic structural elements.
1. A set of variables
2. A list of fundamental relationships and
3. A number of strategic coefficients
ii) Econometric models:
The most important characteristic of econometric relationships is that they contain a random
element which is ignored by mathematical economic models.
Example: Microeconomic theory postulates that the demand for a commodity depends on its
price, on the prices of other related commodities, on consumers’ income and on tastes. This is
an exact relationship which can be written mathematically as:
Q = b0 + b1P + b2P0 + b3Y + b4t → exact relationship.
However, many more factors may affect demand for the commodity. In econometrics the influence of these ‘other’ factors is taken into account by introducing a random variable into the economic relationships.
In our example, the demand function studied with the tools of econometrics would be of the
stochastic form:
Q = b0 + b1P + b2P0 + b3Y + b4t + u, where u stands for the random factors which affect the quantity demanded.
iv. Examination of the degree of correlation between the explanatory variables (i.e.
examination of the problem of multicollinearity).
v. Choice of appropriate econometric techniques for estimation, i.e. to decide a specific
econometric method to be applied in estimation; such as, OLS, MLM, Logit, and
Probit.
3. Evaluation of the estimates
This stage consists of deciding whether the estimates of the parameters are theoretically meaningful and statistically satisfactory, and of assessing the reliability of the results.
For this purpose we use three groups of criteria which may be classified into:
i. Economic a priori criteria: - determined by economic theory and refer to the size and
sign of the parameters of economic relationships.
ii. Statistical criteria (first-order tests): - determined by statistical theory which focuses
on the evaluation of the statistical reliability of the estimates of the parameters of the
model. Correlation coefficient test, standard error test, t-test, F-test, and R2-test are
some of the most commonly used statistical tests.
iii. Econometric criteria (second-order tests): set by the theory of econometrics, these investigate whether the assumptions of the econometric method employed are satisfied or not.
They serve as second-order tests (tests of the statistical tests), i.e. they determine the reliability of the statistical criteria;
they help us to establish whether the estimates have the desirable properties (unbiasedness, efficiency and consistency); and
they aim at the detection of the violation or validity of the assumptions of the various econometric techniques.
4) Evaluation of the forecasting power of the model:
- Forecasting is one of the aims of econometric research. It is possible that the model
may be economically meaningful and statistically and econometrically correct for the
sample period for which the model has been estimated; however, it may not be
suitable for forecasting due to various factors (reasons).
- Therefore, this stage involves the investigation of the stability of the estimates and
their sensitivity to changes in the size of the sample.
1.6) Desirable properties of an econometric model
An econometric model is a model whose parameters are estimated with an appropriate econometric technique.
The ‘goodness’ of the model is judged according to the following desirable properties.
1. Theoretical plausibility:- compatible with the postulates of economic theory.
- describe the economic phenomena to which it relates.
2. Explanatory ability:- should be able to explain the observations of the actual world.
- must be consistent with the observed behavior of the economic
variables whose relationship it determines.
3. Accuracy of the estimates:- coefficients should be accurate, approximate to the true
parameters of the structural model.
- should possess the desirable properties of unbiasedness,
consistency and efficiency.
4. Forecasting ability:- the model should be able to predict future values of the dependent
(endogenous) variables.
5. Simplicity:- the model should represent the economic relationships with as few equations and as simple a mathematical form as possible, without sacrificing the other desirable properties.
1.7) Goals of Econometrics
Three main goals of Econometrics are:
i) Analysis - testing economic theory
ii) Policy making - obtaining numerical estimates of the coefficients of economic
relationships for policy simulations.
iii) Forecasting - using the numerical estimates of the coefficients in order to
forecast the future values of economic magnitudes.
1.8) The Sources, Types and Nature of Data
The success of any econometric analysis ultimately depends on the availability of the
appropriate data. It is therefore essential that we spend some time discussing the nature,
sources, and limitations of the data that one may encounter in empirical analysis.
Types of Data
Three types of data may be available for empirical analysis.
1) Time Series Data
- is a set of observations or values that a variable takes at different times.
- may be collected at regular time intervals (daily, weekly, monthly, quarterly,
annually).
- most empirical work based on time series data assumes stationarity (the series’ mean and variance do not vary systematically over time).
2) Cross-Section Data
- data on one or more variables collected at the same point in time, such as the census
of population conducted by the Census Bureau every 10 years.
- Just as time series data create their own special problems (stationarity issue), cross-
sectional data have their own problems (the problem of heterogeneity).
3) Pooled Data:-
- Pooled, or combined, data contain elements of both time series and cross-section data.
- Panel, Longitudinal, or Micro panel Data- is a special type of pooled data in which the same
cross-sectional unit (say, a family or a firm) is surveyed over time.
Chapter Two
2.1 THE CLASSICAL REGRESSION ANALYSIS
[The Simple Linear Regression Model]
Economic theories are mainly concerned with the relationships among economic variables. When phrased in mathematical terms, they can predict the effect of one variable on another.
The functional relationships of these variables define the dependence of one variable upon
the other variable (s) in the specific form.
The specific functional forms may be linear, quadratic, logarithmic, exponential,
hyperbolic, or any other form.
In this chapter we shall consider a simple linear regression model, i.e. a relationship between two
variables related in a linear form. We shall first discuss two important forms of relation:
stochastic and non-stochastic, among which we shall be using the former in econometric analysis.
2.1. Stochastic and Non-stochastic Relationships
A relationship between X and Y, as Y = f(X) is said to be:
Deterministic - if for each value of the independent variable (X) there is one and only one
corresponding value of dependent variable (Y).
Stochastic - if for a particular value of X there is a whole probabilistic distribution of values
of Y. In such a case, for any given value of X, the dependent variable Y assumes some
specific value only with some probability.
Let’s illustrate the distinction between stochastic and non stochastic relationships with the help
of a supply function.
Assuming that the supply for a certain commodity depends on its price (other determinants taken
to be constant) and the function being linear, the relationship can be put as:
Q = f(P) = α + βP ………………………………………………(2.1)
From the above relationship for a particular value of P, there is only one corresponding
value of Q. Therefore, it is a deterministic (non-stochastic) relationship since for each
price there is always only one corresponding quantity supplied.
This implies that all the variation in Y is due solely to changes in X, and that there are no
other factors affecting the dependent variable.
If plotted on a two-dimensional plane, the observations would fall on a straight line. However, if we gather observations on the quantity actually supplied in the market at various prices and plot them on a diagram, we see that they do not fall on a straight line.
The deviation of the observations from the line may be attributed to the following factors:
i) Omission of important variables from the function
ii) Random behavior of human beings
iii) Imperfect specification of the model
iv) Error of aggregation
v) Error of measurement
To take into account the above sources of error, we introduce into the econometric function a random variable (error term, random disturbance, or stochastic term), denoted by the letter ‘u’ or ‘ε’. It is so called because u is supposed to ‘disturb’ the exact linear relationship which exists between X and Y.
By introducing this random variable in the function the model is rendered stochastic of
the form:
Yi = α + βXi + ui ……………………………………………………….(2.2)
Thus a stochastic model is a model in which the dependent variable is not only
determined by the explanatory variable(s) included in the model but also by others which
are not included in the model.
2.2. Simple Linear Regression model.
The above stochastic relationship (2.2) with one explanatory variable is called simple linear
regression model.
The true relationship which connects the variables involved is split into two parts:
- a part represented by a line and
- a part represented by the random term ‘u’.
The scatter of observations represents the true relationship between Y and X. The line
represents the exact part of the relationship and the deviation of the observation from the
line represents the random component of the relationship.
- Were it not for the errors in the model, we would observe all the points on the line.
However because of the random disturbance, we observe points deviating from the line.
These points diverge from the regression line by u1, u2, …, un.
Yi = (α + βXi) + ui
(dependent variable) = (regression line) + (random variable)
- The first component in the bracket is the part of Y explained by the changes in X and the
second is the part of Y not explained by X, that is to say the change in Y is due to the
random influence of u i .
2.2.1 Assumptions of the Classical Linear Stochastic Regression Model
The classical econometricians made important assumptions in their analysis of regression. The most important of these assumptions are discussed below.
1. The model is linear in parameters.
- They assumed that the model should be linear in the parameters regardless of whether the
explanatory and the dependent variables are linear or not.
- because if the model is non-linear in the parameters, they are difficult to estimate, since their values are not known and only data on the dependent and independent variables are given.
- Ui is a random real variable: the value which u may assume in any one period depends on chance; it may be positive, negative or zero.
2. The mean value of the random variable(U) is zero
- for each value of X, the random variable u may assume various values, some greater than zero and some smaller than zero;
- if we consider all the positive and negative values of u for any given value of X, they have an average value equal to zero. In other words, the positive and negative values of u cancel each other.
Mathematically, E(Ui) = 0 ………………………………..….(2.3)
3. The variance of the random variable(U) is constant in each period
(The assumption of homoscedasticity)
Mathematically:
Var(Ui) = E[Ui − E(Ui)]² = E(Ui²) = σu²  (since E(Ui) = 0).
This constant variance is called homoscedasticity assumption and the constant variance
itself is called homoscedastic variance.
4. The random variable (U) has a normal distribution
- the values of u (for each x) have a bell-shaped, symmetrical distribution about their zero mean and constant variance σ², i.e.
Ui ~ N(0, σ²) ………………………………………..……(2.4)
5. The random terms of different observations U i , U j are independent.
(The assumption of no autocorrelation)
- This means the value which the random term assumed in one period does not depend on
the value which it assumed in any other period.
Algebraically,
Cov(ui, uj) = E[(ui − E(ui))(uj − E(uj))]
           = E(ui uj) = 0 …………………………..….(2.5)
6. The X i are a set of fixed values in the hypothetical process of repeated sampling which
underlies the linear regression model.
- This means that, in taking large number of samples on Y and X, the X i values are the
same in all samples, but the u i values do differ from sample to sample, and so of course do
the values of y i .
7. The random variable (U) is independent of the explanatory variables.
- This means there is no correlation between the random variable and the explanatory
variable.
- If two variables are unrelated their covariance is zero.
Hence, Cov(Xi, Ui) = 0 ………………………………………..….(2.6)
8. The explanatory variables are measured without error
- Random variable (u) absorbs the influence of omitted variables and possibly errors of
measurement in the y’s. i.e., we will assume that the regressors are error free, while y
values may or may not include errors of measurement.
Dear students! We can now use the above assumptions to derive the following basic concepts.
A. The dependent variable Yi is normally distributed,
i.e. Yi ~ N(α + βXi, σ²) ………………………………(2.7)
The next step is the estimation of the numerical values of the parameters of economic
relationships.
The parameters of the regression model can be estimated by various methods.
Three of the most commonly used methods are:
1. Ordinary least square method (OLS)
2. Maximum likelihood method (MLM)
3. Method of moments (MM)
2.2.2.1 The ordinary least square (OLS) method
The model Yi = α + βXi + Ui is called the true relationship between Y and X, because Y and X represent their respective population values, and α and β are called the true parameters since they are computed from the population values of Y and X.
But it is difficult to obtain the population value of Y and X because of technical or
economic reasons.
- So we take sample values of Y and X. The parameters estimated from the sample values of Y and X are called the estimators of the true parameters α and β and are symbolized as α̂ and β̂.
- The model Yi = α̂ + β̂Xi + ei is called the estimated relationship between Y and X, since α̂ and β̂ are estimated from a sample of Y and X, and
- ei represents the sample counterpart of the population random disturbance Ui.
Estimation of α and β by the least squares method (OLS) involves finding values for the estimators α̂ and β̂ which minimize the sum of squared residuals (Σei²).
From the estimated relationship Yi = α̂ + β̂Xi + ei, we obtain:
ei = Yi − (α̂ + β̂Xi) ……………………………(2.6)
Σei² = Σ(Yi − α̂ − β̂Xi)² ……………………….(2.7)
To find the values of α̂ and β̂ that minimize this sum, we partially differentiate Σei² with respect to α̂ and β̂ and set the derivatives equal to zero:
1. ∂(Σei²)/∂α̂ = −2Σ(Yi − α̂ − β̂Xi) = 0 .......................................................(2.8)
Rearranging this expression we get: ΣYi = nα̂ + β̂ΣXi ……….(2.9), so that α̂ = Ȳ − β̂X̄ ……….(2.10)
2. ∂(Σei²)/∂β̂ = −2ΣXi(Yi − α̂ − β̂Xi) = 0 .....................................................(2.11)
Note: the term in parentheses in equations (2.8) and (2.11) is the residual, ei = Yi − α̂ − β̂Xi. Hence it is possible to rewrite (2.8) and (2.11) as −2Σei = 0 and −2ΣXiei = 0.
It follows that:
Σei = 0 and ΣXiei = 0 ............................................(2.12)
If we rearrange equation (2.11) we obtain:
ΣYiXi = α̂ΣXi + β̂ΣXi² ……………………………………….(2.13)
Equations (2.9) and (2.13) are called the Normal Equations. Substituting the value of α̂ from (2.10) into (2.13), we get:
ΣYiXi = (Ȳ − β̂X̄)ΣXi + β̂ΣXi²
      = ȲΣXi − β̂X̄ΣXi + β̂ΣXi²
ΣYiXi − ȲΣXi = β̂(ΣXi² − X̄ΣXi)
ΣXiYi − nX̄Ȳ = β̂(ΣXi² − nX̄²)
β̂ = (ΣXiYi − nX̄Ȳ)/(ΣXi² − nX̄²) ………………….(2.14)
Equation (2.14) can be rewritten in a somewhat different way as follows:
Σ(X − X̄)(Y − Ȳ) = Σ(XY − X̄Y − XȲ + X̄Ȳ)
                = ΣXY − X̄ΣY − ȲΣX + nX̄Ȳ
                = ΣXY − nX̄Ȳ − nX̄Ȳ + nX̄Ȳ
Σ(X − X̄)(Y − Ȳ) = ΣXY − nX̄Ȳ ……………………(2.15)
Σ(X − X̄)² = ΣX² − nX̄² ……………………(2.16)
Substituting (2.15) and (2.16) in (2.14), we get:
β̂ = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)²
Now, denoting (Xi − X̄) as xi and (Yi − Ȳ) as yi, we get:
β̂ = Σxiyi / Σxi² ……………………………………… (2.17)
The expression in (2.17) is termed the deviation form.
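The estimators α̂ = Ȳ − β̂X̄ and β̂ = Σxiyi/Σxi² translate directly into code. The sketch below applies them to a small made-up data set (the X and Y values are hypothetical, chosen only to exercise the formulas) and checks the residual property Σei = 0 from (2.12):

# OLS by the deviation-form formulas: beta-hat = sum(x_i*y_i)/sum(x_i^2),
# alpha-hat = Y-bar - beta-hat * X-bar
X = [1.0, 2.0, 3.0, 4.0, 5.0]          # hypothetical regressor values
Y = [2.1, 3.9, 6.2, 8.1, 9.8]          # hypothetical observations on Y

n = len(X)
X_bar, Y_bar = sum(X) / n, sum(Y) / n
x = [Xi - X_bar for Xi in X]           # deviations x_i = X_i - X-bar
y = [Yi - Y_bar for Yi in Y]           # deviations y_i = Y_i - Y-bar

beta_hat = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi**2 for xi in x)
alpha_hat = Y_bar - beta_hat * X_bar

residuals = [Yi - alpha_hat - beta_hat * Xi for Xi, Yi in zip(X, Y)]
print(alpha_hat, beta_hat)
print(sum(residuals))                  # ~0, as required by (2.12)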
2.2.2.2 Estimation of a function with zero intercept
Suppose: Yi = α + βXi + Ui, subject to the restriction α = 0.
To estimate β̂, the problem is put in the form of a restricted minimization problem and the Lagrange method is applied.
We minimize: Σei² = Σ(Yi − α̂ − β̂Xi)²
subject to: α̂ = 0
The composite function then becomes:
Z = Σ(Yi − α̂ − β̂Xi)² − λα̂, where λ is a Lagrange multiplier.
We minimize the function with respect to α̂, β̂, and λ:
∂Z/∂α̂ = −2Σ(Yi − α̂ − β̂Xi) − λ = 0 …… (i)
∂Z/∂β̂ = −2Σ(Yi − α̂ − β̂Xi)(Xi) = 0 …… (ii)
∂Z/∂λ = −α̂ = 0 …… (iii)
Substituting (iii) in (ii) and rearranging we obtain:
ΣXi(Yi − β̂Xi) = 0
ΣYiXi − β̂ΣXi² = 0
β̂ = ΣXiYi / ΣXi² ……………………………………..(2.18)
This formula involves the actual observations, not the deviation forms.
But we can compute these variances if we take the unbiased estimator of σ², which is σ̂², computed from the sample values of the disturbance term ei via the expression:
σ̂u² = Σei²/(n − 2) …………………………………..(2.30)
To use σ̂² in the expressions for the variances of α̂ and β̂, we have to prove that σ̂² is an unbiased estimator of σ², i.e., that
E(σ̂²) = E(Σei²/(n − 2)) = σ²
It measures the proportion or percentage of the total variation in Y explained by the regression model.
If all the observations lie on the regression line we obtain a “perfect” fit; but this is rarely the case, so there will be some positive ei and some negative ei.
By fitting the line Ŷ = β̂0 + β̂1X we try to obtain the part of the variation of the dependent variable Y produced by the changes in the explanatory variable X:
Σyi² = Σŷi² + Σei² ………………………………... (2.47)
(Total variation = Explained variation + Unexplained variation)
TSS = ESS + RSS … this shows that the total variation in the observed Y values about their mean value can be partitioned into two parts:
i. ESS = the part attributable to the regression line, and
ii. RSS = the part due to random forces, because not all actual Y observations lie on the fitted line.
Now dividing the above equation by TSS on both sides, we obtain:
1 = ESS/TSS + RSS/TSS
1 = Σ(Ŷi − Ȳ)²/Σ(Yi − Ȳ)² + Σei²/Σ(Yi − Ȳ)²
We now define R² as:
R² = Σ(Ŷi − Ȳ)²/Σ(Yi − Ȳ)² = ESS/TSS, or R² = 1 − Σei²/Σ(Yi − Ȳ)² = 1 − RSS/TSS
Properties of R²
i. It is a non-negative quantity.
ii. Its limits are 0 ≤ R² ≤ 1.
iii. R² = 1 means a perfect fit, that is, Ŷi = Yi for each i.
iv. R² = 0 means there is no relationship between the regressand and the regressor.
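The decomposition TSS = ESS + RSS, and both forms of R², can be checked with a few lines of code. The sketch below reuses the hypothetical data and formulas from the earlier OLS sketch:

# Coefficient of determination via both definitions: ESS/TSS and 1 - RSS/TSS
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(X)
X_bar, Y_bar = sum(X) / n, sum(Y) / n
beta_hat = sum((Xi - X_bar) * (Yi - Y_bar) for Xi, Yi in zip(X, Y)) \
         / sum((Xi - X_bar) ** 2 for Xi in X)
alpha_hat = Y_bar - beta_hat * X_bar

Y_hat = [alpha_hat + beta_hat * Xi for Xi in X]
TSS = sum((Yi - Y_bar) ** 2 for Yi in Y)
ESS = sum((Yh - Y_bar) ** 2 for Yh in Y_hat)
RSS = sum((Yi - Yh) ** 2 for Yi, Yh in zip(Y, Y_hat))

print(TSS, ESS + RSS)             # equal up to rounding: TSS = ESS + RSS
print(ESS / TSS, 1 - RSS / TSS)   # both give the same R^2, between 0 and 1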
2.6 Confidence Intervals and Hypothesis Testing
To test the significance of the OLS estimators we need:
the variances of the parameter estimators,
an unbiased estimator of σ², and
the assumption of normal distribution of the error term.
The OLS estimators α̂ and β̂ are obtained from a sample of observations on Y and X. Since sampling errors are inevitable in all estimates, it is necessary to apply tests of significance in order to measure the size of the error and determine the degree of confidence in the validity of these estimates. This can be done by using various tests. The most common ones are:
i) Standard error test
ii) Student’s t-test
iii) Confidence interval
All of these testing procedures reach the same conclusion. Let us now see these testing methods one by one.
i) Standard error (s.e.) test
This test is used to decide whether the estimators α̂ and β̂ are significantly different from zero, i.e. whether the sample from which they have been estimated might have come from a population whose true parameters are zero (α = 0 and/or β = 0).
Formally we test the null hypothesis
H0: βi = 0 against the
alternative hypothesis H1: βi ≠ 0
The standard error test may be outlined as follows.
1st: Compute the standard errors of the parameters:
SE(β̂) = √var(β̂)
SE(α̂) = √var(α̂)
2nd: Compare the standard errors with the numerical values of α̂ and β̂.
Decision rule:
If SE(β̂i) > ½β̂i, accept H0 and reject H1.
We conclude that β̂i is statistically insignificant.
If SE(β̂i) < ½β̂i, reject H0 and accept H1.
We conclude that β̂i is statistically significant.
The acceptance or rejection of the null hypothesis has definite economic meaning.
Acceptance of H0: β = 0 (the slope parameter is zero) implies that:
- the explanatory variable to which this estimate relates does not influence the dependent variable Y;
- there is no evidence that changes in X explain the variation in Y;
- Y = α + (0)X, i.e. there is no relationship between X and Y.
Numerical example:
Suppose that from a sample of size n = 30, we estimate the following supply function:
Q = 120 + 0.6P + ei
SE: (1.7) (0.025)
Test the significance of the slope parameter at the 5% level of significance using the SE test.
SE(β̂) = 0.025, β̂ = 0.6, ½β̂ = 0.3
This gives SE(β̂i) < ½β̂i, so we reject H0 and accept H1.
The implication is that β̂ is statistically significant at the 5% level of significance.
Note: The SE test is an approximate test (approximated from the z-test and t-test) and implies a two-tail test conducted at the 5% level of significance.
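As a tiny illustration, the decision rule of the standard error test can be coded directly; the numbers below are those of the supply-function example above:

# Standard error test: beta-hat is significant when SE(beta-hat) < (1/2)*beta-hat
beta_hat = 0.6
se_beta = 0.025

if se_beta < 0.5 * abs(beta_hat):
    print("Reject H0: beta-hat is statistically significant")
else:
    print("Accept H0: beta-hat is statistically insignificant")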
sx = √(Σ(X − X̄)²/(n − 1)), where n = sample size.
ii) Student's t-test
Recall that under the normality assumption we can derive the t-values of the OLS estimates:
t(α̂) = α̂/SE(α̂)
t(β̂) = β̂/SE(β̂)
with n − k degrees of freedom, where SE = the standard error and k = the number of parameters estimated in the model; the computed values are compared with the t distribution with n − 2 df.
And since this test statistic follows the t distribution, the confidence-interval statements can be
constructed as the following:
Pr[−t(α/2) ≤ (β̂ − β*)/SE(β̂) ≤ t(α/2)] = 1 − α
where β* is the value of β under H0, and −t(α/2) and t(α/2) are the critical values of t obtained from the t table.
Rearranging the above equation gives:
Pr[β* − t(α/2)·SE(β̂) ≤ β̂ ≤ β* + t(α/2)·SE(β̂)] = 1 − α
If t* > tc, reject H0 and accept H1.
- Conclusion: β̂ is statistically significant.
If t* < tc, accept H0 and reject H1.
- Conclusion: β̂ is statistically insignificant.
A statistic is said to be statistically significant if the value of the test statistic lies in the
critical region. In this case the null hypothesis is rejected.
By the same token, a test is said to be statistically insignificant if the value of the test
statistic lies in the acceptance region.
TABLE 2.1: The t-test of significance: decision rules [table omitted]
Numerical Example:
Suppose that from a sample of size n = 20 we estimate the following consumption function:
C = 100 + 0.70Y + e
    (75.5) (0.21)
The values in the brackets are standard errors. We want to test the null hypothesis H0: βi = 0 against the alternative H1: βi ≠ 0, using the t-test at the 5% level of significance.
a. The t-value for the test statistic is:
t* = (β̂ − 0)/SE(β̂) = β̂/SE(β̂) = 0.70/0.21 ≈ 3.3
b. Since the alternative hypothesis (H1) is stated with an inequality sign (≠), it is a two-tail test; hence we divide α by two, 0.05/2 = 0.025, to obtain the critical value of t at α/2 = 0.025 and 18 degrees of freedom (df = n − 2 = 20 − 2). From the t table, tc at the 0.025 level of significance and 18 df is 2.10.
c. Since t* = 3.3 and tc = 2.10, t* > tc. It implies that β̂ is statistically significant.
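The same two-tail t-test can be reproduced in a few lines. The sketch below assumes SciPy is available for the critical value (scipy.stats.t.ppf); the estimate and standard error are those of the consumption-function example:

from scipy import stats

beta_hat, se_beta, n, k = 0.70, 0.21, 20, 2   # example values; k = number of parameters

t_star = beta_hat / se_beta                   # t* = beta-hat / SE(beta-hat) under H0: beta = 0
df = n - k                                    # 18 degrees of freedom
t_crit = stats.t.ppf(1 - 0.05 / 2, df)        # two-tail critical value at the 5% level (~2.10)

print(round(t_star, 2), round(t_crit, 2))
if abs(t_star) > t_crit:
    print("Reject H0: beta-hat is statistically significant")
else:
    print("Accept H0: beta-hat is statistically insignificant")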
iii) Confidence interval
a) Confidence Intervals for Regression Coefficients
Rejection of the null hypothesis doesn’t mean that our estimates α̂ and β̂ are the correct estimates of the true population parameters α and β. It simply means that our estimates come from a sample drawn from a population whose parameters are different from zero.
In order to define how close an estimate is to the true parameter, we must construct a confidence interval for the true parameter; in other words, we must establish limiting values around the estimate within which the true parameter is expected to lie with a certain “degree of confidence”. In this respect we say that with a given probability the population parameter will be within the defined confidence interval (confidence limits).
We choose a probability in advance and refer to it as the confidence level (confidence coefficient). It is customary in econometrics to choose the 95% confidence level.
sampling the confidence limits, computed from the sample, would include the true population
parameter in 95% of the cases. In the other 5% of the cases the population parameter will fall
outside the confidence interval.
In a two-tail test at the α level of significance, the probability of obtaining the specific t-value −tc or tc is α/2 in each tail at n − 2 degrees of freedom. The probability of obtaining any value of t equal to (β̂ − β)/SE(β̂) at n − 2 degrees of freedom is therefore 1 − (α/2 + α/2), i.e. 1 − α.
i.e. Pr(−tc < t* < tc) = 1 − α …………………………………………(2.57)
but t* = (β̂ − β)/SE(β̂) …………………………………………………….(2.58)
Substituting (2.58) in (2.57) we obtain the following expression:
Pr[−tc < (β̂ − β)/SE(β̂) < tc] = 1 − α ………………………………………..(2.59)
Pr[−SE(β̂)tc < β̂ − β < SE(β̂)tc] = 1 − α   (multiplying by SE(β̂))
Pr[−β̂ − SE(β̂)tc < −β < −β̂ + SE(β̂)tc] = 1 − α   (subtracting β̂)
Pr[β̂ − SE(β̂)tc < β < β̂ + SE(β̂)tc] = 1 − α   (multiplying by −1)
The limits within which the true β lies at the (1 − α) degree of confidence are:
[β̂ − SE(β̂)tc, β̂ + SE(β̂)tc]; where tc is the critical value of t at the α/2 level of significance and n − 2 degrees of freedom.
The test procedure is outlined as follows.
H0: β = 0
H1: β ≠ 0
Decision rule: If the hypothesized value of β in the null hypothesis is within the confidence interval, accept H0 and reject H1; the implication is that β̂ is statistically insignificant. If the hypothesized value of β in the null hypothesis is outside the limits, reject H0 and accept H1; this indicates that β̂ is statistically significant.
*The interval above is a 100(1 − α)% confidence interval for β.
Numerical Example:
Suppose we have estimated the following regression line from a sample of 20 observations:
Y = 128.5 + 2.88X + e
    (38.2) (0.85)
The values in the bracket are standard errors.
a. Construct 95% confidence interval for the slope of parameter
b. Test the significance of the slope parameter using constructed confidence interval.
Solution:
a. The limits within which the true β lies at the 95% confidence level are:
β̂ ± SE(β̂)·tc
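The arithmetic of the interval can be sketched as follows (a minimal illustration, assuming SciPy for the critical value; with n = 20 and two estimated parameters, df = 18 and tc ≈ 2.10, giving roughly 2.88 ± 2.10 × 0.85):

from scipy import stats

beta_hat, se_beta, n = 2.88, 0.85, 20        # estimates from the example above
t_c = stats.t.ppf(0.975, n - 2)              # critical t at alpha/2 = 0.025, 18 df

lower = beta_hat - t_c * se_beta
upper = beta_hat + t_c * se_beta
print(round(lower, 3), round(upper, 3))      # approx. (1.094, 4.666)

# Since the hypothesized value beta = 0 lies outside this interval,
# we reject H0: the slope is statistically significant at the 5% level.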
ΣXiYi = 1,296,836
ΣYi² = 539,512
i) Estimate the regression line of sales on price and interpret the results.
ii) What part of the variation in sales is not explained by the regression line?
iii) Estimate the price elasticity of sales.
5. The following table gives the GNP (X) and the demand for food (Y) for a country over a ten-year period.
year 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989
Y 6 7 8 10 8 9 10 9 11 10
X 50 52 55 59 57 58 62 65 68 70
a. Estimate the food function
b. Compute the coefficient of determination and find the explained and unexplained
variation in the food expenditure.
c. Compute the standard error of the regression coefficients and conduct test of
significance at the 5% level of significance.
6. A sample of 20 observations corresponding to the regression model Yi = α + βXi + Ui gave the following data:
ΣYi = 21.9    Σ(Yi − Ȳ)² = 86.9
ΣXi = 186.2   Σ(Xi − X̄)² = 215.4
Σ(Xi − X̄)(Yi − Ȳ) = 106.4
a. Estimate α and β
b. Calculate the variance of our estimates
c. Estimate the conditional mean of Y corresponding to a value of X fixed at X = 10.
7. Suppose that a researcher estimates a consumptions function and obtains the following
results:
C = 15 + 0.81Yd    n = 19
    (3.1) (18.7)   R² = 0.99
where C = consumption, Yd = disposable income, and the numbers in parentheses are the ‘t-ratios’.
a. Test the significance of Yd statistically using the t-ratios
b. Determine the estimated standard deviations of the parameter estimates
8. State and prove the Gauss-Markov theorem
9. Given the model:
Yi = β0 + β1Xi + Ui, with the usual OLS assumptions. Derive the expression for the error variance.
CHAPTER THREE
The Multiple Linear Regression Analysis
3.1 Introduction
A dependent variable Y can depend on many factors (explanatory variables) or regressors. For
instance, in demand studies we study the relationship between quantity demanded of a good and
price of the good, price of substitute goods and the consumer’s income. The model we assume
is:
Yi = β0 + β1P1 + β2P2 + β3Xi + ui -------------------- (3.1)
Where Yi = quantity demanded, P1 is the price of the good itself, P2 is the price of substitute goods, Xi is the consumer's income, the β's are unknown parameters and ui is the disturbance term.
Equation (3.1) is a multiple regression with three explanatory variables. In general, for k explanatory variables we can write the model as follows:
Yi = β0 + β1X1i + β2X2i + β3X3i + … + βkXki + ui ------- (3.2)
Where the Xki are the explanatory variables, Yi is the dependent variable, the βj (j = 0, 1, 2, …, k) are unknown parameters and ui is the disturbance term.
The disturbance term has similar nature to that in simple regression, reflecting:
- the basic random nature of human responses
- errors of aggregation
- errors of measurement
- errors in specification and any other factors, other than xi that might influence Y.
Let’s start our discussion with the assumptions of the multiple regression model; we will proceed with the case of two explanatory variables and then generalize the multiple regression model to the case of k explanatory variables using matrix algebra.
6. Every disturbance term ui is independent of the explanatory variables:
Cov(ui, X1i) = Cov(ui, X2i) = 0, i.e. E(uiX1i) = E(uiX2i) = 0.
7. No specification bias: The model must be correctly specified.
8. No perfect multicollinearity:-The explanatory variables are not perfectly linearly
correlated.
- Informally, no collinearity means none of the regressors can be written as exact
linear combinations of the remaining regressors in the model.
- Formally, no collinearity means that there exists no set of numbers, λ2 and λ3, not
both zero such that: λ2X2i +λ3X3i =0.
- If such an exact linear relationship exists, then X2 and X3 are said to be collinear
or linearly dependent. On the other hand, if λ2=λ3=0, then X2 and X3 are said to
be linearly independent.
3.3. A Model with Two Explanatory Variables
In order to understand the nature of multiple regression model easily, we start our analysis with
two explanatory variables, then extend this to the case of k-explanatory variables.
3.3.1 Estimation of parameters of two-explanatory variables model
The model, PRF: Y = β0 + β1X1 + β2X2 + Ui ……………………………………(3.3)
Let us suppose that sample data have been used to estimate the population regression equation by the sample regression equation, which we write as:
Ŷ = β̂0 + β̂1X1 + β̂2X2 ……………………………………………….(3.5)
∂(Σei²)/∂β̂1 = −2ΣX1i(Yi − β̂0 − β̂1X1i − β̂2X2i) = 0 ……………………. (3.9)
∂(Σei²)/∂β̂2 = −2ΣX2i(Yi − β̂0 − β̂1X1i − β̂2X2i) = 0 ………… ………..(3.10)
Summing from 1 to n, the multiple regression equation produces three Normal Equations:
ΣY = nβ̂0 + β̂1ΣX1i + β̂2ΣX2i …………………………………….(3.11)
ΣX1iYi = β̂0ΣX1i + β̂1ΣX1i² + β̂2ΣX1iX2i …………………………(3.12)
ΣX2iYi = β̂0ΣX2i + β̂1ΣX1iX2i + β̂2ΣX2i² ………………………...(3.13)
We know that:
Σxiyi = ΣXiYi − nX̄Ȳ
Σxi² = ΣXi² − nX̄²
Substituting the above expressions in equation (3.14), the normal equations (3.12) and (3.13) can be written in deviation form as follows:
Σx1y = β̂1Σx1² + β̂2Σx1x2 …………………………………………(3.16)
Σx2y = β̂1Σx1x2 + β̂2Σx2² ……………………………………….(3.19)
Solving these two equations simultaneously gives:
β̂1 = (Σx1y·Σx2² − Σx2y·Σx1x2) / (Σx1²·Σx2² − (Σx1x2)²)
β̂2 = (Σx2y·Σx1² − Σx1y·Σx1x2) / (Σx1²·Σx2² − (Σx1x2)²)
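These deviation-form formulas translate directly into code. The sketch below computes β̂1, β̂2 and the intercept β̂0 = Ȳ − β̂1X̄1 − β̂2X̄2 on hypothetical data; only the cross-product sums appearing in (3.16) and (3.19) are needed:

# Two-explanatory-variable OLS in deviation form (hypothetical data)
Y  = [10.0, 12.0, 15.0, 14.0, 18.0, 20.0]
X1 = [2.0, 3.0, 4.0, 4.0, 5.0, 6.0]
X2 = [1.0, 1.0, 2.0, 3.0, 3.0, 4.0]

n = len(Y)
Yb, X1b, X2b = sum(Y)/n, sum(X1)/n, sum(X2)/n
y  = [v - Yb  for v in Y]
x1 = [v - X1b for v in X1]
x2 = [v - X2b for v in X2]

Sx1y  = sum(a*b for a, b in zip(x1, y))
Sx2y  = sum(a*b for a, b in zip(x2, y))
Sx1x1 = sum(a*a for a in x1)
Sx2x2 = sum(a*a for a in x2)
Sx1x2 = sum(a*b for a, b in zip(x1, x2))

D = Sx1x1 * Sx2x2 - Sx1x2 ** 2          # common denominator
b1 = (Sx1y * Sx2x2 - Sx2y * Sx1x2) / D  # beta-1-hat
b2 = (Sx2y * Sx1x1 - Sx1y * Sx1x2) / D  # beta-2-hat
b0 = Yb - b1 * X1b - b2 * X2b           # intercept from the first normal equation
print(b0, b1, b2)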
In multiple linear regression models R2 measures the proportion of variation in the dependent
variable explained by all explanatory variables included in the model.
The coefficient of determination R² has been defined as:
R² = ESS/TSS = 1 − RSS/TSS = 1 − Σei²/Σyi²
Σei² = Σy² − β̂1Σx1iyi − β̂2Σx2iyi, so that
Σy² = (β̂1Σx1iyi + β̂2Σx2iyi) + Σei²
(Total sum of squares = Explained sum of squares + Residual sum of squares)
In the three-variable case:
R² = ESS/TSS = (β̂1Σx1iyi + β̂2Σx2iyi) / Σy²
R² measures the prediction ability of the model over the sample period, or how well the estimated regression line fits the data.
The value of R² is equal to the squared sample correlation coefficient between Ŷt and Yt.
Since the sample correlation coefficient measures the linear association between two
variables,
If R² is high, there is a close association between the values of Yt and the values predicted by the model, Ŷt. In this case, the model is said to “fit” the data well.
If R² is low, there is little association between the values of Yt and the values predicted by the model, Ŷt, and the model does not fit the data well.
Adjusted Coefficient of Multiple Determination (R̄²)
One difficulty with R 2 is that it can be made large by adding more and more variables,
even if the variables added have no economic justification. Algebraically, it is the fact
that as the variables are added the sum of squared errors (RSS) goes down (it can remain
unchanged, but this is rare) and thus R 2 goes up.
Adjusted R², symbolized as R̄², is an alternative way of measuring goodness of fit.
It is computed as:
R̄² = 1 − (Σei²/(n − k)) / (Σyi²/(n − 1)) = 1 − (1 − R²)·(n − 1)/(n − k)
This measure does not always go up when a variable is added, because of the degrees-of-freedom term n − k in the denominator.
As the number of variables increases, RSS goes down, but so does n − k. The effect on R̄² depends on the amount by which RSS falls relative to the loss of degrees of freedom.
While solving one problem, this corrected measure of goodness of fit unfortunately introduces another: it loses its interpretation, since R̄² is no longer the percent of variation explained.
Variances and Standard Errors of OLS Estimators
var(β̂0) = σ̂²·[1/n + (X̄1²Σx2² + X̄2²Σx1² − 2X̄1X̄2Σx1x2) / (Σx1²·Σx2² − (Σx1x2)²)]
s.e.(β̂0) = √var(β̂0)
var(β̂1) = σ̂²·Σx2² / (Σx1²·Σx2² − (Σx1x2)²) = σ̂² / (Σx1²(1 − r12²)), where r12 is the sample coefficient of correlation between X1 and X2.
s.e.(β̂1) = √var(β̂1)
var(β̂2) = σ̂²·Σx1² / (Σx1²·Σx2² − (Σx1x2)²) = σ̂² / (Σx2²(1 − r12²))
s.e.(β̂2) = √var(β̂2)
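These variance formulas can be checked numerically as well. The sketch below recomputes the hypothetical two-regressor example from the earlier estimation sketch and then forms σ̂² = Σei²/(n − 3) and the standard errors:

import math

Y  = [10.0, 12.0, 15.0, 14.0, 18.0, 20.0]   # same hypothetical data as the estimation sketch
X1 = [2.0, 3.0, 4.0, 4.0, 5.0, 6.0]
X2 = [1.0, 1.0, 2.0, 3.0, 3.0, 4.0]
n = len(Y)

Yb, X1b, X2b = sum(Y)/n, sum(X1)/n, sum(X2)/n
y, x1, x2 = ([v - m for v in s] for s, m in ((Y, Yb), (X1, X1b), (X2, X2b)))

S11 = sum(a*a for a in x1); S22 = sum(a*a for a in x2)
S12 = sum(a*b for a, b in zip(x1, x2))
S1y = sum(a*b for a, b in zip(x1, y)); S2y = sum(a*b for a, b in zip(x2, y))

D  = S11 * S22 - S12**2
b1 = (S1y * S22 - S2y * S12) / D
b2 = (S2y * S11 - S1y * S12) / D

e = [yi - b1*a - b2*b for yi, a, b in zip(y, x1, x2)]   # residuals (deviation form)
sigma2 = sum(ei**2 for ei in e) / (n - 3)               # unbiased error variance, k = 3 parameters

print(math.sqrt(sigma2 * S22 / D))   # s.e.(beta-1-hat)
print(math.sqrt(sigma2 * S11 / D))   # s.e.(beta-2-hat)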
3.3 Properties of OLS Estimators and Gauss-Markov Theorem
3.3.2 Statistical Properties of the Parameters (Matrix Approach)
(Gauss-Markov Theorem)
In multiple regression, the OLS estimators satisfy the small-sample properties of estimators (the BLUE property). We now examine the desired properties of the estimators in matrix notation:
1. Linearity
2. Unbiasedness.
3. Minimum variance
3.5. Hypothesis Testing in Multiple Regression Model
In multiple regression models we will undertake two tests of significance.
1. Significance of individual parameters of the model and
2. Overall significance of the model.
1. Tests of individual significance
Assuming that U i ~. N (0, 2 ) , we can use either the t-test or standard error test to test a
hypothesis about individual partial regression coefficient.
To illustrate consider the following example.
Let Y = β̂0 + β̂1X1 + β̂2X2 + ei ………………………………… (3.51)
i) H0: β1 = 0
   H1: β1 ≠ 0
ii) H0: β2 = 0
    H1: β2 ≠ 0
The null hypothesis in (i) states that, holding X2 constant X1 has no influence on Y.
Similarly, hypothesis (ii) states that, holding X1 constant, X2 has no influence on the dependent variable Yi.
To test these null hypothesis we will use the following tests:
i. Standard error test: Under this testing method, let's test only β̂1; the test for β̂2 is done in the same way.
SE(β̂1) = √var(β̂1) = √[σ̂²·Σx2i² / (Σx1i²·Σx2i² − (Σx1x2)²)]; where σ̂² = Σei²/(n − 3)
If SE(β̂1) > ½β̂1, we accept the null hypothesis; that is, we conclude that the estimate β̂1 is not statistically significant.
If SE(β̂1) < ½β̂1, we reject the null hypothesis; that is, we conclude that the estimate β̂1 is statistically significant.
Note: The smaller the standard errors, the stronger the evidence that the estimates are statistically
reliable.
ii. The student's t-test: We compute the t-ratio for each β̂i:
t* = β̂i/SE(β̂i) ~ t(n−k), where n is the number of observations and k is the number of parameters. If we have 3 parameters, the degrees of freedom will be (n − 3). So:
t* = β̂2/SE(β̂2), with (n − 3) degrees of freedom.
Under the null hypothesis β2 = 0, t* becomes:
t* = β̂2/SE(β̂2)
If t* (calculated) < tc (tabulated),
- we accept the null hypothesis (H0); we conclude that β̂2 is not significant.
  This implies the regressor does not contribute to the explanation of the variations in the dependent variable (Y).
If t* (calculated) > tc (tabulated),
- we reject the null hypothesis (H0) and accept the alternative (H1); β̂2 is statistically significant.
- This implies the regressor does contribute to the explanation of the variations in the dependent variable (Y).
- Thus, the greater the value of t*, the stronger the evidence that βi is statistically significant.
2 Test of Overall Significance of the Model
In this section we extend significance test of the individual estimated partial regression
coefficient to joint test of the relevance of all the included explanatory variables. Now consider
the following:
Y = β0 + β1X1 + β2X2 + … + βkXk + Ui
H0: β1 = β2 = β3 = … = βk = 0
H1: at least one of the βj is non-zero
This null hypothesis is a joint hypothesis that β1, β2, …, βk are jointly or simultaneously equal to zero. A test of such a hypothesis is called a test of the overall significance of the observed or estimated regression line, that is, of whether Y is linearly related to X1, X2, …, Xk.
The test procedure for any set of hypothesis can be based on a comparison of the sum of squared
errors from the original, the unrestricted multiple regression model to the sum of squared errors
from a regression model in which the null hypothesis is assumed to be true. When a null
hypothesis is assumed to be true, we in effect place conditions or constraints, on the values that
the parameters can take, and the sum of squared errors increases. The idea of the test is that if
these sum of squared errors are substantially different, then the assumption that the joint null
hypothesis is true has significantly reduced the ability of the model to fit the data, and the data do
not support the null hypothesis.
If the null hypothesis is true, we expect that the data are compatible with the conditions placed
on the parameters. Thus, there would be little change in the sum of squared errors when the null
hypothesis is assumed to be true.
Let the restricted residual sum of squares (RRSS) be the sum of squared errors in the model obtained by assuming that the null hypothesis is true, and let URSS be the sum of squared errors of the original unrestricted model, i.e. the unrestricted residual sum of squares. It is always true that RRSS − URSS ≥ 0.
Consider Ŷ = β̂0 + β̂1X1 + β̂2X2 + … + β̂kXk + ei.
URSS = Σei² = Σy² − β̂1Σyx1 − β̂2Σyx2 − … − β̂kΣyxk = RSS
F = [(TSS − RSS)/(k − 1)] / [RSS/(n − k)]
F = [ESS/(k − 1)] / [RSS/(n − k)] ………………………………………………. (3.54)
If we divide the numerator and denominator above by Σy² = TSS, then:
F = [(ESS/TSS)/(k − 1)] / [(RSS/TSS)/(n − k)]
F = [R²/(k − 1)] / [(1 − R²)/(n − k)] …………………………………………..(3.55)
This implies the computed value of F can be calculated either from ESS and RSS or from R² and 1 − R². If the null hypothesis is not true, then the difference between RRSS and URSS (TSS and RSS) becomes large, implying that the constraints placed on the model by the null hypothesis have a large effect on the ability of the model to fit the data, and the value of F tends to be large. Thus, we reject the null hypothesis if the F test statistic becomes too large. This value is compared with the critical value of F, which leaves a probability of α in the upper tail of the F-distribution with k − 1 and n − k degrees of freedom.
If the computed value of F is greater than the critical value of F (k-1, n-k), then the parameters
of the model are jointly significant or the dependent variable Y is linearly related to the
independent variables included in the model.
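Formula (3.55) is easy to apply once R², k and n are known. A minimal sketch follows (SciPy is assumed for the critical value; the R², n and k values are hypothetical):

from scipy import stats

R2, n, k = 0.85, 30, 3                     # hypothetical fit, sample size, no. of parameters

F = (R2 / (k - 1)) / ((1 - R2) / (n - k))  # equation (3.55)
F_crit = stats.f.ppf(0.95, k - 1, n - k)   # upper 5% critical value, (k-1, n-k) df

print(round(F, 2), round(F_crit, 2))
if F > F_crit:
    print("Reject H0: the regressors are jointly significant")
else:
    print("Do not reject H0")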
Application of Multiple Linear Regression Models
Example 1: Consider the data given below and fit the linear function:
Y = α + β1X1 + β2X2 + β3X3 + U
Σyi² = 619; Σx1x2 = 240; Σx2x3 = −420; Σx1x3 = −330; Σx1² = 270; Σx3yi = −625;
iv. Report the regression result.
v. Test the significance of each partial slope coefficient at the 5% significance level [use t0.025(6) = 2.447]
vi. Test the overall significance of the model [use F0.05(3,6) = 4.76]
Chapter Four
Violations of basic Classical Assumptions
4. Introduction
The classicalists set important assumptions about the distribution of Yt and the random error term ut in regression models:
- the error term ut follows the normal distribution with mean zero and constant variance, var(ut) = σ²;
- the errors corresponding to different observations are uncorrelated, cov(ut, us) = 0 (for t ≠ s); and
- in multiple regression, there is no perfect correlation between the independent variables.
Now, we address the following ‘what if’ questions in this chapter.
What if the error variance is not constant over all observations?
What if the different errors are correlated?
What if the explanatory variables are correlated?
4.1 Heteroscedasticity
4.1.1 The nature of Heteroscedasticity
In the linear regression model, the distribution of the disturbance term is assumed to remain the same over all observations of X; i.e., the variance of each ui is the same for all values of the explanatory variable. Symbolically:
var(ui) = E[ui − E(ui)]² = E(ui²) = σu²; a constant value.
iii) greater for low values of X.
The pattern of the variance depends on the signs and values of the coefficients of the relationship, taking forms such as:
i. σui² = K²Xi²
ii. σui² = K²Xi
iii. σui² = K²/Xi, etc.
4.1.4 Examples of heteroscedastic functions
a) Consumption Function:
Suppose: Ci = α + βYi + Ui;
where Ci = consumption expenditure of the ith household and Yi = disposable income of the ith household.
- At low levels of income, average consumption is low and its variation is small.
- At high incomes the u's will be large, while at low incomes the u's will be small. Therefore, the assumption of constant variance of the u's does not hold when estimating the consumption function from a cross-section of family budgets.
b) Production Function:
Suppose the production function is X = f(K, L).
- The disturbance term stands for many factors, such as entrepreneurship, technological differences, selling and purchasing procedures and differences in organization, other than the inputs labor (L) and capital (K) considered in the production function. These factors show considerably greater variance in large firms than in small ones. This leads to a breakdown of our assumption of homogeneous variance terms.
4.1.5 Reasons for heteroscedasticity
There are several reasons why the variances of ui may be variable. Some of these are:
a) Error-learning models: as people learn, their errors of behavior become smaller over time. In this case σi² is expected to decrease.
Example: as the number of hours of typing practice increases, the average number of typing errors, as well as their variance, decreases.
b) As data collection techniques improve, σui² is likely to decrease.
c) Heteroscedasticity can also arise as a result of the presence of outliers:
- an observation that is much different (either very small or very large) from the other observations in the sample.
d) Heteroscedasticity can arise from violating the assumption that linear regression model is
correctly specified.
e) Heteroscedasticity can arise due to skewness in one or more regressors in the model
- uneven distribution of data of the regressors
- Examples: distribution of income and wealth in most societies is uneven, being
owned by a few at the top.
f) Heteroscedasticity can also arise because of: incorrect data transformation and incorrect
functional form (e.g., linear versus log–linear models).
4.1.6 Consequences of Heteroscedasticity for the Least Squares estimators
Heteroscedasticity has the following consequences:
i) The OLS estimators will have no bias (unbiased)
ii) Variance of OLS estimators will be incorrect
iii) The OLS estimators will be inefficient:
- the OLS estimators do not have the smallest variance in the class of unbiased estimators, in either small or large samples;
- i.e., the OLS estimator is linear, unbiased and consistent, but it is inefficient.
- Under the heteroscedastic assumption:
- the true s.e.(β̂) will be underestimated;
- the t-test and F-test associated with it will be overestimated, which might lead to the conclusion that in a specific case β̂ is statistically significant (which in fact may not be true).
- Our inference and prediction about the population coefficients would be incorrect.
- In the presence of heteroscedasticity, the BLUE estimators are provided by the method of
weighted least squares(WLS).
4.1.7 Detecting Heteroscedasticity
- There are two broad methods of testing for heteroscedasticity:
i. informal methods
ii. formal methods
i. Informal method
- This is a test based on the nature of the graph of the residuals.
- When there exists a systematic relation between the squared residuals ei² and the mean value of Y (Ŷi), or with Xi, heteroscedasticity is likely present in the data.
- When there is no systematic pattern between the two variables, this suggests that no heteroscedasticity is present in the data.
H1: β ≠ 0
- If β is statistically significant, this suggests heteroscedasticity in the data.
- If β is insignificant, we may accept the assumption of homoscedasticity.
- The Park test is thus a two-stage test procedure:
i) run the OLS regression disregarding the heteroscedasticity to obtain ûi², and
ii) then run the regression in equation (3.15) above.
Example: Suppose that from a sample of size n = 100 we estimate the relation between compensation (Y) and productivity (X):
Y = 1992.342 + 0.2329Xi + ui ……(3.16)
SE  (936.479)  (0.0998)
t   (2.1275)   (2.333)     R² = 0.4375
The residuals obtained from (3.16) were regressed on Xi, giving the following result:
ln ûi² = 35.817 − 2.8099 ln Xi + vi ……(3.17)
SE  (38.319)  (4.216)
t   (0.934)   (−0.667)     R² = 0.0595
The above result reveals that:
- the slope coefficient is statistically insignificant, implying there is no statistically significant relationship between the two variables;
- we may therefore conclude that there is no heteroscedasticity in the error variance.
Goldfeld and Quandt have criticized the Park test, arguing that the error term vi entering the equation ln ûi² = α + β ln Xi + vi may itself be heteroscedastic.
b. Glejser test:
Glejser suggested regressing the absolute value of the residuals, |ûi|, on the Xi variable that is thought to be closely associated with σi².
In his experiments, Glejser used the following functional forms:
|ûi| = β1 + β2Xi + vi,        |ûi| = β1 + β2√Xi + vi,
|ûi| = β1 + β2(1/Xi) + vi,    |ûi| = β1 + β2(1/√Xi) + vi,
|ûi| = √(β1 + β2Xi) + vi,     |ûi| = √(β1 + β2Xi²) + vi;  where vi is an error term.
Goldfeld and Quandt point out problems with the Glejser test:
- the expected value of the error term vi is non-zero, and vi is serially correlated and heteroscedastic;
- models such as |ûi| = √(β1 + β2Xi) + vi and |ûi| = √(β1 + β2Xi²) + vi are non-linear in the parameters and therefore cannot be estimated with the usual OLS procedure.
Glejser found that for large samples the first four of the preceding models give generally satisfactory results in detecting heteroscedasticity.
c. Goldfeld-Quandt test
This method is applicable if the heteroscedastic variance i2 is positively related to one of the
explanatory variables in the regression model.
For simplicity, consider the usual two-variable model:
Yi = α + βXi + Ui
Suppose σi² is positively related to Xi as:
σi² = σ²Xi² ……(3.18); where σ² is a constant.
- In the above equation σi² would be larger for larger values of Xi. If that turns out to be the case, heteroscedasticity is most likely to be present in the model.
To test this explicitly, Goldfeld and Quandt suggest the following steps:
Step 1: Order the observations according to the values of X i beginning with the lowest value.
Step 2: Omit c central observations, where c is specified a priori, and divide the remaining (n − c) observations into two groups, each of (n − c)/2 observations.
Step 3: Fit separate OLS regressions to the first (n − c)/2 observations and the last (n − c)/2 observations, and obtain the respective residual sums of squares RSS1 and RSS2, RSS1 representing the RSS from the regression corresponding to the smaller Xi values (the small-variance group) and RSS2 that from the larger Xi values (the large-variance group).
These RSS each have (n − c)/2 − K, or (n − c − 2K)/2, df, where K is the number of parameters to be estimated, including the intercept term, and df is the degrees of freedom.
Step 4: compute λ = (RSS2/df)/(RSS1/df),
which follows the F distribution with numerator and denominator df each equal to (n − c − 2K)/2:
λ = [RSS2/((n − c − 2K)/2)] / [RSS1/((n − c − 2K)/2)] ~ F((n − c)/2 − K, (n − c)/2 − K)
Regression based on the first 13 observations:
Yi = 3.4094 + 0.6968Xi + ei
     (8.7049) (0.0744)    R² = 0.8887, RSS1 = 377.17, df = 11
Regression based on the last 13 observations:
Yi = −28.0272 + 0.7941Xi + ei
     (30.6421) (0.1319)   R² = 0.7681, RSS2 = 1536.8, df = 11
From these results we obtain:
λ = (RSS2/df)/(RSS1/df) = (1536.8/11)/(377.17/11) = 4.07
The critical F value for (11, 11) df at the 5% level is 2.82. Since λ (= 4.07) > F0.05(11, 11) = 2.82, we may conclude that there is heteroscedasticity in the error variance.
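The final step of the Goldfeld-Quandt procedure reduces to one ratio and one table lookup; a minimal sketch with the example's own numbers (SciPy assumed for the critical value):

from scipy import stats

RSS1, RSS2, df = 377.17, 1536.8, 11        # from the two sub-sample regressions above

lam = (RSS2 / df) / (RSS1 / df)            # Goldfeld-Quandt statistic, ~4.07
F_crit = stats.f.ppf(0.95, df, df)         # F(11, 11) at the 5% level, ~2.82

print(round(lam, 2), round(F_crit, 2))
if lam > F_crit:
    print("Reject homoscedasticity: heteroscedasticity is present")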
Spearman’s rank correlation test, the Breusch-Pagan-Godfrey test and White’s general heteroscedasticity test are other methods of testing for heteroscedasticity.
If these tests indicate that there is no evidence of heteroscedasticity, we may then assume that the transformed error term is homoscedastic.
Example: Suppose the heteroscedasticity is of the form
E(ui²) = σi² = K²Xi²; the transforming variable is then √(Xi²) = Xi.
Given Yi = α + βXi + Ui, where var(ui) = σi² = K²Xi²,
the transformed model is:
Yi/Xi = α/Xi + βXi/Xi + Ui/Xi = α(1/Xi) + β + Ui/Xi
2) If the true error variance is proportional to one of the regressors, we can use the so-called square-root transformation; that is, we divide both sides of the equation by the square root of the chosen regressor. We then estimate the regression thus transformed and subject that regression to heteroscedasticity tests. If these tests are satisfactory, we may rely on this regression.
Example: Suppose the heteroscedasticity is of the form E(ui²) = σi² = K²Xi.
The transforming variable is √Xi.
The transformed model is:
Yi/√Xi = α/√Xi + βXi/√Xi + Ui/√Xi = α(1/√Xi) + β√Xi + Ui/√Xi
3) The logarithmic transformation: sometimes, instead of estimating regression, we can
regress the logarithm of the dependent variable on the regressors, which may be linear or
in log form. The reason for this is that the log transformation compresses the scales in
which the variables are measured, thereby reducing a tenfold difference between two
values to a twofold difference.
4.2 Autocorrelation
4.2.1 The nature of Autocorrelation
The assumption of the classicalists is that cov(ui, uj) = E(ui uj) = 0, which implies that successive values of the disturbance term U are temporally independent, i.e. a disturbance occurring at one point of observation is not related to any other disturbance. This means that when observations are made over time, the effect of a disturbance occurring at one period does not carry over into another period.
If the above assumption is not satisfied, that is, if the value of U in any particular period is
correlated with its own preceding value(s), we say there is autocorrelation of the random
variables. Hence, autocorrelation is defined as a ‘correlation’ between members of a series of observations ordered in time or space, while correlation may also refer to the relationship between two or more different variables.
Autocorrelation is also sometimes called serial correlation, but some economists distinguish between these two terms. According to G. Tintner, autocorrelation is the lag correlation of a given series with itself, lagged by a number of time units, whereas the term serial correlation is defined by him as “lag correlation between two different series.” Thus, correlation between two series such as u1, u2, …, u10 and u2, u3, …, u11, where the former is the same series lagged by one time period, is autocorrelation; whereas correlation between time series such as u1, u2, …, u10 and v2, v3, …, v11, where U and V are two different time series, is called serial correlation.
a. Cyclical fluctuations
Time series data such as GNP, price indices, production, employment and unemployment exhibit business cycles. When an economic recovery starts, most of these series move upward, and the value of a series at one point in time is greater than its previous value.
Thus there is a ‘momentum’ built into them, and it continues until something happens (e.g. an increase in interest rates or taxes, or both) to slow them down. Therefore, in regressions involving time series data, successive observations are likely to be interdependent.
b. Specification bias
This arises because of the following.
i. Exclusion of variables from the regression model
ii. Incorrect functional form of the model
iii. Neglecting lagged terms from the regression model
Let’s see one by one how the above specification biases cause autocorrelation.
i. Exclusion of variables: The error term will show a systematic change when important
variable is excluded from the model.
For example, suppose the correct demand model is given by:
yt = β1x1t + β2x2t + β3x3t + Ut ……(3.21)
where y = quantity of beef demanded, x1 = price of beef, x2 = consumer income, x3 = price of pork and t = time. Now, suppose we run the following regression:
yt = β1x1t + β2x2t + Vt ------(3.22)
Running equation (3.22) when equation (3.21) is the ‘correct’ model or true relation amounts to letting Vt = β3x3t + Ut. The error or disturbance term V will therefore reflect a systematic pattern, thus creating autocorrelation.
ii. Incorrect functional form of the model:
It is also one source of the autocorrelation of error term.
Suppose the correct model in a cost-output study is:

Marginal cost_i = β_0 + β_1 output_i + β_2 (output_i)² + U_i ..................(3.23)

However, we incorrectly fit the following model:

Marginal cost_i = α_1 + α_2 output_i + V_i ..................(3.24)
The MC curve corresponding to the ‘true’ model is shown in the figure below along with the
‘incorrect’ linear cost curve.
Between points A and B the linear marginal cost curve consistently overestimates the true
marginal cost, whereas outside these points it consistently underestimates it. This result is
to be expected, because the disturbance term V_i is, in fact, equal to β_2(output_i)² + U_i, and hence
will catch the systematic effect of the (output)² term on marginal cost. In this case, V_i
will reflect autocorrelation because of the use of an incorrect functional form.
iii. Neglecting lagged terms from the model:
If the dependent variable is affected by its own lagged value (or by a lagged explanatory
variable) and this lag is not included in the model, the error term of the misspecified model will
reflect a systematic pattern, which indicates autocorrelation in the model.
Suppose the correct model for consumption expenditure is:
C_t = β_1 y_t + β_2 C_{t-1} + U_t ..................(3.25)
this is known as autoregression because one of the explanatory variables is the lagged value
of the dependent variable.
The rationale for such a model is that consumers do not change their consumption habits readily,
for psychological, technological, or institutional reasons.
If we neglect the lagged term from the equation, the resulting error term will reflect a systematic
pattern due to the influence of lagged consumption on current consumption.
4.2.5 The coefficient of autocorrelation
Autocorrelation is a kind of lag correlation between successive values of the same variable. Thus,
we treat autocorrelation in the same way as correlation in general.
If the value of U in any particular period depends on its own value in the preceding period alone,
we say that U’s follow a first order autoregressive scheme AR(1) i.e.
u_t = f(u_{t-1}) ..................(3.28)

If u_t depends on the values of the two previous periods, then:

u_t = f(u_{t-1}, u_{t-2}) ..................(3.29)

This form of autocorrelation is called a second-order autoregressive scheme, and so on.
Generally, when autocorrelation is present, we assume the simplest form, a first-order scheme:

u_t = ρu_{t-1} + v_t ..................(3.30)

where ρ is the coefficient of autocorrelation and v is a random variable satisfying all the basic
assumptions of OLS: E(v) = 0, E(v²) = σ_v², and E(v_i v_j) = 0 for i ≠ j.
The above relationship states the simplest possible form of autocorrelation; if we apply OLS on
the model given in ( 3.30) we obtain:
ρ̂ = Σu_t u_{t-1} / Σu_{t-1}²  (sums over t = 2, ..., n) ..................(3.31)

Given that for large samples Σu_t² ≈ Σu_{t-1}², we observe that the coefficient of autocorrelation
represents a simple correlation coefficient r:

ρ̂ = Σu_t u_{t-1} / Σu_{t-1}² ≈ Σu_t u_{t-1} / √(Σu_t² Σu_{t-1}²) = r_{u_t u_{t-1}} (Why?) ..................(3.32)

-1 ≤ ρ̂ ≤ 1, since -1 ≤ r ≤ 1 ..................(3.33)
This proves the statement “we can treat autocorrelation in the same way as correlation in
general”. From statistics
if r= 1 we call it perfect positive correlation,
if r = -1 , perfect negative correlation and
if r = 0 , there is no correlation.
By the same analogy:
- if ρ̂ = 1, perfect positive autocorrelation,
- if ρ̂ = -1, perfect negative autocorrelation, and
- if ρ̂ = 0, no autocorrelation.
4.2.7 Effect of Autocorrelation on OLS Estimators.
If the error terms are correlated, the following consequences follow:
1. The OLS estimators are still unbiased and consistent.
2. They are still normally distributed in large samples.
3. But they are no longer efficient. That is, they are no longer BLUE (best linear unbiased
estimator). In most cases OLS standard errors are underestimated, which means the
estimated t-values are inflated, giving the appearance that a coefficient is more
significant than it actually may be.
4. As a result, as in the case of heteroscedasticity, the hypothesis-testing procedure becomes
suspect, since the estimated standard errors may not be reliable, even asymptotically (i.e.
in large samples). In consequence, the usual t- test and F- tests may not be valid.
That is, if var(β̂) is underestimated, SE(β̂) is also underestimated, which makes the t-ratio
large. This inflated t-ratio may make β̂ appear statistically significant when it is not, leading to
wrong predictions and inferences about the characteristics of the population.
4.2.8 Detection (Testing) of Autocorrelation
There are two methods that are commonly used to detect the existence or absence of
autocorrelation in the disturbance terms. These are:
1. Graphic method
Detection of autocorrelation using graphs will be based on two ways.
a. Apply OLS to the given data (whether autocorrelated or not) and obtain the residuals.
Plot e_t horizontally and e_{t+1} vertically, i.e. plot the observations
(e_1, e_2), (e_2, e_3), (e_3, e_4), ..., (e_{n-1}, e_n).
If it is found that most of the points fall in quadrant I and III, as shown in fig (a) below,
we say that the given data is autocorrelated and the type of autocorrelation is positive
autocorrelation.
If most of the points fall in quadrant II and IV, as shown in fig (b) below the
autocorrelation is said to be negative. But if the points are scattered equally in all the
quadrants as shown in fig (c) below, then we say there is no autocorrelation in the given
data.
2. Formal testing method
Formal tests are based on test statistics such as the t, F, or χ² statistics; if a test applies any of
these, it is called a formal testing method. The testing methods most frequently and widely used
by researchers are the following:
A. Run test:
In the run test we examine the sequence of positive and negative signs of the residuals,
arranged according to the values of the explanatory variable, e.g. "++++++++-------------
++++++++------------++++++". A run is an uninterrupted sequence of signs of the same kind.
If there are too many runs, the û's change sign frequently, which indicates negative serial
correlation; similarly, if there are too few runs, this suggests positive autocorrelation.
Now let: n = total number of observations = n1 + n2; n1 = number of + symbols;
n2 = number of - symbols; and k = number of runs.

Under the null hypothesis that successive outcomes (here, residuals) are independent, and
assuming that n1 > 10 and n2 > 10, the number of runs is (asymptotically) normally distributed
with:

Mean: E(k) = 2n1n2/(n1 + n2) + 1

Variance: σ_k² = 2n1n2(2n1n2 - n1 - n2) / [(n1 + n2)²(n1 + n2 - 1)]
Decision rule:
- Do not reject the null hypothesis of randomness or independence with 95% confidence if
E(k) - 1.96σ_k ≤ k ≤ E(k) + 1.96σ_k.
- Reject the null hypothesis if the estimated k lies outside these limits.
In a hypothetical example with n1 = 14, n2 = 18 and k = 5 we obtain:

E(k) = 16.75, σ_k² = 7.49395, σ_k = 2.7375

Hence the 95% confidence interval is 16.75 ± 1.96(2.7375) = (11.3845, 22.1155).
Since k = 5 clearly falls outside this interval, we can reject the hypothesis that the
observed sequence of residuals is random (independent) with 95% confidence.
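The run test is easy to compute. The following is a minimal sketch (Python with numpy and scipy; the residual series is whatever the user supplies) that implements the mean, variance and normal approximation given above:

import numpy as np
from scipy import stats

def runs_test(e):
    signs = np.sign(e)
    signs = signs[signs != 0]                     # drop exact zeros
    n1 = int(np.sum(signs > 0))                   # number of + symbols
    n2 = int(np.sum(signs < 0))                   # number of - symbols
    k = 1 + int(np.sum(signs[1:] != signs[:-1]))  # number of runs
    mean_k = 2.0 * n1 * n2 / (n1 + n2) + 1
    var_k = (2.0 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
             / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    z = (k - mean_k) / np.sqrt(var_k)
    return k, mean_k, var_k, 2 * (1 - stats.norm.cdf(abs(z)))  # two-sided p-value

With n1 = 14, n2 = 18 and k = 5 (the text's example), the interval E(k) ± 1.96σ_k is (11.38, 22.12), so k = 5 rejects randomness, matching the computation above.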
B. The Durbin-Watson d test:
The most celebrated test for detecting serial correlation is the one developed by the statisticians
Durbin and Watson. It is popularly known as the Durbin-Watson d statistic, defined as:

d = Σ(e_t - e_{t-1})² / Σe_t² ..................(3.47)

where the numerator is summed over t = 2, ..., n and the denominator over t = 1, ..., n.
Note that in the numerator of the d statistic the number of observations is n - 1, because one
observation is lost in taking successive differences.
It is important to note the assumptions underlying the d-statistic:
1. The regression model includes an intercept term. If such a term is not present, as in the case
of regression through the origin, it is essential to rerun the regression including the
intercept term to obtain the RSS.
2. The explanatory variables, the X's, are non-stochastic, or fixed in repeated sampling.
3. The disturbances U_t are generated by the first-order autoregressive scheme:
u_t = ρu_{t-1} + v_t.
4. The regression model does not include a lagged value of the dependent variable as one of
the explanatory variables. Thus, the test is inapplicable to models of the following type:

y_t = β_1 + β_2 X_{2t} + β_3 X_{3t} + ... + β_k X_{kt} + γy_{t-1} + U_t
where y_{t-1} is the one-period lagged value of y. Such models are known as autoregressive
models. If the d test is mistakenly applied to them, the value of d will often be around 2,
which is the value of d expected in the absence of first-order autocorrelation. Durbin developed
the so-called h-statistic to test for serial correlation in such autoregressive models.
5. There are no missing observations in the data.
In using the Durbin-Watson test it is therefore important to note that it cannot be applied if any
of the above five assumptions is violated.
Expanding the numerator of equation (3.47):

d = [Σe_t² + Σe_{t-1}² - 2Σe_t e_{t-1}] / Σe_t² ≈ 2[1 - Σe_t e_{t-1}/Σe_t²]

since for large samples Σe_t² ≈ Σe_{t-1}². Thus,

d ≈ 2(1 - ρ̂)

From this relation, therefore:

if ρ̂ = 0, d = 2
if ρ̂ = 1, d = 0
if ρ̂ = -1, d = 4
Thus we obtain two important conclusions:
i. The values of d lie between 0 and 4.
ii. If there is no autocorrelation (ρ̂ = 0), then d = 2.

Whenever, therefore, the calculated value of d turns out to be sufficiently close to 2, we accept
the null hypothesis; if it is close to zero or to four, we reject the null hypothesis that there is no
autocorrelation.
However, because the exact critical value of d is not known, there exist ranges of values within
which we can either accept or reject the null hypothesis; there is no unique critical value of the
d-statistic. Instead, we have a lower bound d_L and an upper bound d_U as critical values of d for
accepting or rejecting the null hypothesis.
For the two-tailed Durbin-Watson test, five regions for the values of d are set out, as depicted in
the figure below.
The mechanics of the D-W test are as follows, assuming that the assumptions underlying the
test are fulfilled:
- Run the OLS regression and obtain the residuals.
- Compute the value of d using the formula given in equation (3.47).
- For the given sample size and given number of explanatory variables, find the critical d_L
and d_U values.
- Follow the decision rules given below.
1. If d is less than d_L or greater than (4 - d_L), we reject the null hypothesis of no
autocorrelation in favor of the alternative, which implies the existence of autocorrelation.
2. If d lies between d_U and (4 - d_U), accept the null hypothesis of no autocorrelation.
3. If, however, the value of d lies between d_L and d_U or between (4 - d_U) and (4 - d_L), the
D-W test is inconclusive.
Example 1. Suppose for a hypothetical model Y = α + βX + U_i we found:

d = 0.1380; d_L = 1.37; d_U = 1.50

Based on these values, test for autocorrelation.
Solution: First compute (4 - d_L) and (4 - d_U) and compare the computed value
of d with d_L, d_U, (4 - d_L) and (4 - d_U):

(4 - d_L) = 4 - 1.37 = 2.63
(4 - d_U) = 4 - 1.50 = 2.50

Since d = 0.1380 is less than d_L, we reject the null hypothesis of no autocorrelation.
Example 2. Consider the model Y_t = α + βX_t + U_t with the following observations on X and Y:
X 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Y 2 2 2 1 3 5 6 6 10 10 10 12 15 10 11
Test for autocorrelation using the Durbin-Watson method.
Solution:
1. Regress Y on X, i.e. Y_t = α + βX_t + U_t.
From the table we can compute the following values:
Σxy = 255, Ȳ = 7, Σ(e_t - e_{t-1})² = 60.213
Σx² = 280, X̄ = 8, Σe_t² = 41.767
Σy² = 274

β̂ = Σxy/Σx² = 255/280 = 0.91
α̂ = Ȳ - β̂X̄ = 7 - 0.91(8) = -0.28

Ŷ = -0.28 + 0.91X, R² = 0.85

d = Σ(e_t - e_{t-1})²/Σe_t² = 60.213/41.767 = 1.442
The values of d_L and d_U at the 5% level of significance, with n = 15 and one explanatory
variable, are d_L = 1.08 and d_U = 1.36, so (4 - d_U) = 2.64.
Since d = 1.442 satisfies d_U < d < 4 - d_U (1.36 < 1.442 < 2.64), we accept H_0. This implies
the data are not autocorrelated.
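The arithmetic of Example 2 can be checked in a few lines of Python (numpy only; the data are exactly those in the table above):

import numpy as np

X = np.arange(1, 16)
Y = np.array([2, 2, 2, 1, 3, 5, 6, 6, 10, 10, 10, 12, 15, 10, 11])

x, y = X - X.mean(), Y - Y.mean()          # deviations from the means
beta = (x * y).sum() / (x ** 2).sum()      # 255/280 = 0.91
alpha = Y.mean() - beta * X.mean()         # 7 - 0.91(8) = -0.28
e = Y - (alpha + beta * X)                 # OLS residuals

d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
print(round(d, 3))                         # approx. 1.44, inside the acceptance region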
Although the D-W test is extremely popular, it has one great drawback: if d falls in the
inconclusive zone or region, one cannot conclude whether autocorrelation does or does not
exist. Several authors have proposed modifications of the D-W test.
In many situations, however, it has been found that the upper limit d_U is approximately the true
significance limit. The modified D-W test is therefore based on d_U: in case the estimated d value lies
in the inconclusive zone, one can use the following modified d-test procedure. Given the level of
significance α:
1. H_0: ρ = 0 versus H_1: ρ > 0: if the estimated d < d_U, reject H_0 at level α; that is, there is
statistically significant positive autocorrelation.
2. H_0: ρ = 0 versus H_1: ρ ≠ 0: if the estimated d < d_U or (4 - d) < d_U, reject H_0 at level
2α; statistically, there is significant evidence of autocorrelation, positive or negative.
4.2.9 Remedial Measures for the problems of Autocorrelation
Since in the presence of serial correlation the OLS estimators are inefficient, it is essential to
seek remedial measures. The remedy however depends on what knowledge one has about the
nature of interdependence among the disturbances. This means the remedy depends on whether
the coefficient of autocorrelation is known or not known.
A. When ρ is known: When the structure of the autocorrelation is known, i.e. ρ is known, the
appropriate corrective procedure is to transform the original model or data so that the error term of
the transformed model is non-autocorrelated. The transformation wipes out the effect of ρ.
Suppose that our model is:

Y_t = α + βX_t + U_t ..................(3.49)
U_t = ρU_{t-1} + V_t, |ρ| < 1 ..................(3.50)

Equation (3.50) indicates the existence of autocorrelation. If ρ is known, we can transform
equation (3.49) into one that is free of autocorrelation, as follows.
Take the lagged form of equation (3.49) and multiply it through by ρ:

ρY_{t-1} = ρα + ρβX_{t-1} + ρU_{t-1} ..................(3.51)

Subtracting (3.51) from (3.49), we have:

Y_t - ρY_{t-1} = α(1 - ρ) + β(X_t - ρX_{t-1}) + (U_t - ρU_{t-1}) ..................(3.52)

By rearranging the terms in (3.50) we have V_t = U_t - ρU_{t-1}, which, on substituting for the
last term of (3.52), gives:

Y_t - ρY_{t-1} = α(1 - ρ) + β(X_t - ρX_{t-1}) + v_t ..................(3.53)
Let: Y_t* = Y_t - ρY_{t-1}, a = α(1 - ρ), X_t* = X_t - ρX_{t-1}

Equation (3.53) may then be written as:

Y_t* = a + βX_t* + v_t ..................(3.54)
It may be noted that in transforming equation (3.49) into (3.54), one observation is lost
because of the lagging and subtracting in (3.52). We can apply OLS to the transformed relation
(3.54) to obtain â and β̂, estimates for our two parameters. Then:

α̂ = â/(1 - ρ)

and it can be shown that

var(α̂) = [1/(1 - ρ)]² var(â)
because α̂ is perfectly and linearly related to â. Again, since v_t satisfies all the standard
assumptions, the variances of α̂ and β̂ are given by our standard OLS formulae:

var(â) = σ_v² ΣX_t*² / [n Σ(X_t* - X̄*)²],  var(β̂) = σ_v² / Σ(X_t* - X̄*)²

The estimators obtained from (3.54) are efficient only if our sample size is large, so that the loss
of one observation becomes negligible.
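A minimal sketch of this ρ-transformation (quasi-differencing) in Python follows (statsmodels; the data are simulated, and the parameter values are illustrative assumptions):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, rho = 200, 0.7                          # rho is assumed known here
X = rng.uniform(0, 10, n)
u = np.zeros(n)
for t in range(1, n):                      # AR(1) errors: u_t = rho*u_{t-1} + v_t
    u[t] = rho * u[t - 1] + rng.normal()
Y = 1.0 + 2.0 * X + u

# Quasi-differenced (transformed) variables; one observation is lost.
Y_star = Y[1:] - rho * Y[:-1]
X_star = X[1:] - rho * X[:-1]
fit = sm.OLS(Y_star, sm.add_constant(X_star)).fit()
a_hat, beta_hat = fit.params
alpha_hat = a_hat / (1 - rho)              # recover alpha from a = alpha*(1 - rho)
print(alpha_hat, beta_hat)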
B. When ρ is not known
When ρ is not known, we describe below the methods through which the coefficient of
autocorrelation can be estimated.
Method I: A priori information on ρ
Many times an investigator makes some reasonable guess about the value of the autoregressive
coefficient by using his knowledge or intuition about the relationship under study. Many
researchers usually assume that ρ = 1 or -1.
Under this method, the process of transformation is the same as when is known.
When ρ = 1, the transformed model becomes:

(Y_t - Y_{t-1}) = β(X_t - X_{t-1}) + V_t, where V_t = U_t - U_{t-1}

Note that the constant term is suppressed in this equation. β̂ is obtained merely by taking the first
differences of the variables and fitting a line that passes through the origin. Suppose instead that
one assumes ρ = -1, i.e. the case of perfect negative autocorrelation. In such a case the
transformed model becomes:

Y_t + Y_{t-1} = 2α + β(X_t + X_{t-1}) + v_t

or

(Y_t + Y_{t-1})/2 = α + β(X_t + X_{t-1})/2 + v_t/2

This is called a two-period moving average regression model, because we are actually regressing
one moving average, (Y_t + Y_{t-1})/2, on another, (X_t + X_{t-1})/2.
This method of first differencing is quite popular in applied research for its simplicity. But the
method rests on the assumption that there is either perfect positive or perfect negative
autocorrelation in the data.
Method II: Estimation of ρ from the d-statistic:
From equation (3.47) we obtained d ≈ 2(1 - ρ̂). Given a calculated value of the d-statistic for a
particular data set, we can therefore estimate ρ from it:

d = 2(1 - ρ̂), so ρ̂ = 1 - d/2

As already pointed out, ρ̂ will not be accurate if the sample size is small; the above relationship
is true only for large samples. For small samples, Theil and Nagar have suggested the following
relation:

ρ̂ = [n²(1 - d/2) + k²] / (n² - k²) ..................(3.55)
where n = total number of observations; d = Durbin-Watson statistic; k = number of coefficients
(including the intercept term). Using this value of ρ̂ we can perform the above transformation to
remove autocorrelation from the model.
Method III: The Cochrane-Orcutt iterative procedure: In this method we remove
autocorrelation gradually, starting from the simplest form of a first-order scheme. First we obtain
the residuals and apply OLS to them:

e_t = ρ̂e_{t-1} + v_t ..................(3.56)

We estimate ρ̂ from this relation. With the estimated ρ̂ we transform the original data
and then apply OLS to the model:

(Y_t - ρ̂Y_{t-1}) = α(1 - ρ̂) + β(X_t - ρ̂X_{t-1}) + (u_t - ρ̂u_{t-1}) ..................(3.57)
We once again apply OLS to the newly obtained residuals:

e_t* = ρ̂' e_{t-1}* + w_t ..................(3.58)

We use this second estimate, ρ̂', to transform the original observations, and so on; we keep
proceeding until the estimate of ρ converges. It can be shown that the procedure is convergent.
When the data are transformed using only this second-stage estimate of ρ, the procedure is
called the two-stage Cochrane-Orcutt method. Alternatively, one can, at each step of the
iteration, apply the Durbin-Watson d-statistic to the residuals to test for autocorrelation, and stop
when successive estimates of ρ do not differ substantially from one another.
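A sketch of the iterative procedure in Python follows (statsmodels; the convergence tolerance is an illustrative choice, and the function assumes |ρ| stays away from 1). For practical work, statsmodels' GLSAR class implements a similar iterative scheme.

import numpy as np
import statsmodels.api as sm

def cochrane_orcutt(Y, X, tol=1e-4, max_iter=50):
    rho = 0.0
    for _ in range(max_iter):
        Ys = Y[1:] - rho * Y[:-1]                     # quasi-differenced data
        Xs = X[1:] - rho * X[:-1]
        fit = sm.OLS(Ys, sm.add_constant(Xs)).fit()
        a, b = fit.params
        e = Y - (a / (1 - rho) + b * X)               # residuals on the original scale
        rho_new = np.sum(e[1:] * e[:-1]) / np.sum(e[:-1] ** 2)
        if abs(rho_new - rho) < tol:                  # stop when rho-hat converges
            return rho_new, fit
        rho = rho_new
    return rho, fit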
Method IV: Durbin's two-stage method: Assuming the first-order autoregressive scheme,
Durbin suggests a two-stage procedure for resolving the serial correlation problem. Given:

Y_t = α + βX_t + u_t ..................(3.59)
u_t = ρu_{t-1} + v_t

the steps under this method are:

1. Take the lagged form of (3.59) and multiply it through by ρ:

ρY_{t-1} = ρα + ρβX_{t-1} + ρu_{t-1} ..................(3.60)

2. Subtract (3.60) from (3.59):

Y_t - ρY_{t-1} = α(1 - ρ) + β(X_t - ρX_{t-1}) + (u_t - ρu_{t-1}) ..................(3.61)

3. Rewrite (3.61) in the following form:

Y_t = α(1 - ρ) + ρY_{t-1} + βX_t - ρβX_{t-1} + v_t

This equation is now treated as a regression equation with three explanatory variables:
X_t, X_{t-1} and Y_{t-1}. It provides an estimate of ρ (the coefficient on Y_{t-1}), which is used
to construct the new variables (Y_t - ρ̂Y_{t-1}) and (X_t - ρ̂X_{t-1}). In the second stage,
estimators of α and β are obtained from the regression:

(Y_t - ρ̂Y_{t-1}) = α* + β(X_t - ρ̂X_{t-1}) + u_t*, where α* = α(1 - ρ)
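Durbin's two-stage method can likewise be sketched in a few lines (Python with statsmodels; variable names are illustrative):

import numpy as np
import statsmodels.api as sm

def durbin_two_stage(Y, X):
    # Stage 1: regress Y_t on Y_{t-1}, X_t and X_{t-1}; the coefficient
    # on Y_{t-1} is the estimate of rho.
    Z = np.column_stack([Y[:-1], X[1:], X[:-1]])
    rho_hat = sm.OLS(Y[1:], sm.add_constant(Z)).fit().params[1]
    # Stage 2: OLS on the rho-transformed variables.
    Ys = Y[1:] - rho_hat * Y[:-1]
    Xs = X[1:] - rho_hat * X[:-1]
    return rho_hat, sm.OLS(Ys, sm.add_constant(Xs)).fit()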
4.3 Multicollinearity
4.3.1 The nature of Multicollinearity
Multicollinearity means the existence of a 'perfect' or 'exact' linear relationship among some or
all of the explanatory variables of a regression model. For a regression model involving k
explanatory variables x1, x2, ..., xk, an exact linear relationship is said to exist if the following
condition is satisfied:

λ1x1 + λ2x2 + ... + λkxk = 0 ..................(1)

where λ1, λ2, ..., λk are constants such that not all of them are simultaneously zero. (If the
relation holds only approximately, i.e. λ1x1 + ... + λkxk + v_i = 0 with v_i a stochastic error term,
we speak of near or less-than-perfect multicollinearity.)
Note that:- multicollinearity refers only to linear relationships among the explanatory variables.
It does not rule out non-linear relationships among the x-variables.
For example: Y = β0 + β1x_i + β2x_i² + β3x_i³ + v_i ..................(3.31)

where Y is total cost and X is output.
The variables x_i² and x_i³ are obviously functionally related to x_i, but the relationship is non-linear.
Strictly, therefore, models such as (3.31) do not violate the assumption of no multicollinearity.
However, in concrete applications, the conventionally measured correlation coefficients will show
x_i, x_i² and x_i³ to be highly correlated, which, as we shall show, makes it difficult to estimate
the parameters with precision (i.e. with small standard errors).
4.3.2 Reasons for Multicollinearity
1. The data collection method employed: for example, if we regress on a small sample of
values from the population, there may be multicollinearity, but if we take all the possible
values, the data may not show multicollinearity.
2. A constraint on the model or in the population being sampled:
for example, in the regression of electricity consumption on income (x1) and house size
(x2), there is a physical constraint in the population, in that families with higher incomes
generally have larger homes than families with lower incomes.
3. Overdetermined model: This happens when the model has more explanatory variables
than the number of observations. This could happen in medical research where there may
be a small number of patients about whom information is collected on a large number of
variables.
4.3.3 Consequences of Multicollinearity
Why does the classical linear regression model put the assumption of no multicollinearity among
the X’s? It is because of the following consequences of multicollinearity on OLS estimators.
1. If multicollinearity is perfect, the regression coefficients are indeterminate and their
standard errors are infinite.
Proof: Consider a multiple regression model as follows:

y_i = β̂1 x_{1i} + β̂2 x_{2i} + e_i
Recall the formulas for β̂1 and β̂2 from our discussion of multiple regression:

β̂1 = [Σx1y Σx2² - Σx2y Σx1x2] / [Σx1²Σx2² - (Σx1x2)²]

β̂2 = [Σx2y Σx1² - Σx1y Σx1x2] / [Σx1²Σx2² - (Σx1x2)²]

Assume x2 = λx1 ..................(3.32)

where λ is a non-zero constant. Substituting (3.32) into the formula for β̂1:

β̂1 = [Σx1y · λ²Σx1² - λΣx1y · λΣx1²] / [Σx1² · λ²Σx1² - λ²(Σx1²)²] = 0/0,

which is indeterminate.
Applying the same procedure, we obtain a similar result (an indeterminate value) for β̂2.
Likewise, from our discussion of the multiple regression model, the variance of β̂1 is given by:

var(β̂1) = σ²Σx2² / [Σx1²Σx2² - (Σx1x2)²]

Substituting x2 = λx1 makes the denominator zero:

var(β̂1) = σ²λ²Σx1² / [λ²(Σx1²)² - λ²(Σx1²)²] → ∞

so the standard error of β̂1 is infinite.
2. If multicollinearity is less than perfect, say x2 = λx1 + v_i, where v_i is a stochastic term
with Σx1v_i = 0, then substituting into the formula for β̂1 gives:

β̂1 = [Σx1y(λ²Σx1² + Σv_i²) - (λΣx1y + Σyv_i)λΣx1²] / [Σx1²(λ²Σx1² + Σv_i²) - (λΣx1²)²] ≠ 0/0

which is determinate. This proves that if we have less than perfect multicollinearity the OLS
coefficients are determinate.
The implication of the indeterminacy of the regression coefficients in the case of perfect
multicollinearity is that it is not possible to observe the separate influences of x1 and x2. But
such an extreme case is not very frequent in practical applications; most data exhibit less than
perfect multicollinearity.
3. If multicollinearity is less than perfect (i.e. near or high multicollinearity), the OLS estimators
retain the BLUE property.
- Hence, as long as the basic assumptions needed to prove the BLUE property are not
violated, the OLS estimators are BLUE whether multicollinearity exists or not.
4. Although BLUE, the OLS estimators have large variances and covariances:

var(β̂1) = σ²Σx2² / [Σx1²Σx2² - (Σx1x2)²]

Multiplying the numerator and the denominator by 1/Σx2²:

var(β̂1) = σ² / [Σx1² - (Σx1x2)²/Σx2²] = σ² / [Σx1²(1 - r12²)]

since r12² = (Σx1x2)²/(Σx1²Σx2²). An analogous expression holds for var(β̂2), with Σx2² in
place of Σx1². As r12 tends to 1, i.e. as collinearity increases, the variance of the estimator
increases, and in the limit, when r12 = 1, the variance of β̂1 becomes infinite.
Similarly, cov(β̂1, β̂2) = -r12σ² / [(1 - r12²)√(Σx1²)√(Σx2²)]. (Why?)
As r12 increases toward one, the covariance of the two estimators increases in absolute
value. The speed with which the variances and covariances increase can be seen with the
variance-inflating factor (VIF), which is defined as:

VIF = 1/(1 - r12²)
The VIF shows how the variance of an estimator is inflated by the presence of
multicollinearity. As r12² approaches 1, the VIF approaches infinity: as the extent of
collinearity increases, the variance of an estimator increases, and in the limit the variance
becomes infinite. If there is no multicollinearity between x1 and x2, the VIF will be 1.
Using this definition, we can express var(β̂1) and var(β̂2) in terms of the VIF:

var(β̂1) = (σ²/Σx1²) VIF  and  var(β̂2) = (σ²/Σx2²) VIF

which shows that the variances of β̂1 and β̂2 are directly proportional to the VIF.
5. Because of the large variances of the estimators, which mean large standard errors, the
confidence intervals tend to be much wider, leading to the acceptance of the "zero null
hypothesis" (i.e. that the true population coefficient is zero) more readily.
6. Because of the large standard errors of the estimators, the computed t-ratios will be very
small, leading one or more of the coefficients to appear statistically insignificant when tested
individually.
7. Although the t-ratios of one or more coefficients are very small (which makes the
coefficients individually statistically insignificant), R², the overall measure of goodness of fit,
can be very high.
For example, if y = β1x1 + β2x2 + ... + βkxk + v_i, then in cases of high collinearity it is
possible to find that one or more of the partial slope coefficients are individually statistically
insignificant on the basis of the t-test, while the R² may be so high that on the basis of the
F-test one can convincingly reject the hypothesis that β1 = β2 = ... = βk = 0. Indeed, this is
one of the signals of multicollinearity: insignificant t-values but a high overall R² (i.e. a
significant F-value).
8. The OLS estimators and their standard errors can be sensitive to small changes in the data.
4.3.4 Detection of Multicollinearity
A recognizable set of symptoms for the existence of multicollinearity is:
a. A high coefficient of determination (R²)
b. High correlation coefficients among the explanatory variables (the r_{x_i x_j}'s)
c. Large standard errors and small t-ratios of the regression parameters
Note: none of these symptoms by itself is a satisfactory indicator of multicollinearity,
because:
i) Large standard errors may arise for various reasons, not only because of the presence
of linear relationships among the explanatory variables.
ii) A high r_{x_i x_j} is only a sufficient, not a necessary, condition for the existence of
multicollinearity, because multicollinearity can exist even if the correlation
coefficients are low.
However, the combination of all these criteria should help the detection of multicollinearity.
4.3.4.1 Test Based on Auxiliary Regressions:
Since multicollinearity arises because one or more of the regressors are exact or approximate
linear combinations of the other regressors, one way of finding out which X variable is related to
the other X variables is to regress each X_i on the remaining X variables and compute the
corresponding R², which we designate R_i²; each of these regressions is called an auxiliary
regression, auxiliary to the main regression of Y on the X's. Then, following the relationship
between F and R² established in chapter three under overall significance, the variable

F_i = [R_i²/(k - 2)] / [(1 - R_i²)/(n - k + 1)] ~ F(k - 2, n - k + 1)

where: - n is the number of observations
- k is the number of parameters including the intercept

If the computed F exceeds the critical F at the chosen level of significance, it is taken to mean
that the particular X_i is collinear with the other X's; if it does not exceed the critical F, we say that
it is not collinear with the other X's, in which case we may retain the variable in the model.
If F_i is statistically significant, we will have to decide whether the particular X_i should be
dropped from the model.
Note also Klein's rule of thumb, which suggests that multicollinearity may be a
troublesome problem only if the R² obtained from an auxiliary regression is greater than the overall
R², that is, the R² obtained from the regression of Y on all the regressors.
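The auxiliary-regression F test is easy to automate. Below is a sketch in Python (statsmodels and scipy, with simulated data in which X2 is deliberately built to be nearly collinear with X1):

import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(2)
n = 100
X1 = rng.normal(size=n)
X2 = 0.9 * X1 + 0.1 * rng.normal(size=n)   # X2 nearly collinear with X1
X3 = rng.normal(size=n)
Xs = np.column_stack([X1, X2, X3])
k = Xs.shape[1] + 1                        # number of parameters incl. intercept

for i in range(Xs.shape[1]):
    others = np.delete(Xs, i, axis=1)      # regress X_i on the remaining X's
    R2 = sm.OLS(Xs[:, i], sm.add_constant(others)).fit().rsquared
    F = (R2 / (k - 2)) / ((1 - R2) / (n - k + 1))
    p = stats.f.sf(F, k - 2, n - k + 1)    # upper-tail p-value
    print(i + 1, round(R2, 3), round(F, 1), round(p, 4))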
4.3.4.2 The Farrar-Glauber test - This test uses three statistics for testing multicollinearity:
chi-square, the F-ratio and the t-ratio. It may be outlined in three steps.
A. Computation of χ² to test orthogonality: two variables are called orthogonal if r_{x_i x_j} = 0, i.e.
if there is no collinearity between them. In our discussion of multiple regression models,
we saw the matrix representation of a three-explanatory-variable model, which is given by:
x'x = | Σx1²   Σx1x2  Σx1x3 |
      | Σx2x1  Σx2²   Σx2x3 |
      | Σx3x1  Σx3x2  Σx3²  |

Dividing each element Σx_i x_j by √(Σx_i² Σx_j²) and computing the determinant gives the
determinant of the correlation matrix:

| 1    r12  r13 |
| r12  1    r23 |
| r13  r23  1   |
The value of this determinant is equal to zero in the case of perfect multicollinearity (since then
r_ij = 1), while in the case of orthogonality of the x's, r_ij = 0 and the value of the
determinant is unity. It follows, therefore, that if the value of this determinant lies between zero
and unity, there exists some degree of multicollinearity. For detecting the degree of
multicollinearity over the whole set of explanatory variables, Farrar and Glauber suggest a χ²
test, in the following way:
H_0: the x's are orthogonal (i.e. r_{x_i x_j} = 0)
H_1: the x's are not orthogonal (i.e. r_{x_i x_j} ≠ 0)

Farrar and Glauber have found that the quantity

χ² = -[n - 1 - (1/6)(2k + 5)] · log_e{value of the standardized determinant}

has a χ² distribution with k(k - 1)/2 degrees of freedom. If the computed χ² is greater than the
critical value of χ², reject H_0 in favour of multicollinearity; if it is less, accept H_0.
The procedure concludes with a t test on the partial correlation coefficients: if t* < t (tabulated),
H_0 is accepted, and we accept that X_i and X_j are not the cause of the multicollinearity
(since r_{x_i x_j} is not significant).
4.3.4.3 Test of multicollinearity using eigenvalues and the condition index:
From the eigenvalues of the X'X matrix we can derive a number called the condition number k,
as follows:

k = maximum eigenvalue / minimum eigenvalue

In addition, using these values we can derive the condition index (CI), defined as:

CI = √(maximum eigenvalue / minimum eigenvalue) = √k

Decision rule: if k is between 100 and 1,000 there is moderate to strong multicollinearity, and if
it exceeds 1,000 there is severe multicollinearity. Alternatively, if CI (= √k) is between 10 and 30,
there is moderate to strong multicollinearity, and if it exceeds 30 there is severe multicollinearity.

Example: if k = 123,864 and CI = 352, this suggests the existence of severe multicollinearity.
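A sketch of the computation in Python (numpy; simulated data). Scaling the columns of X to unit length before forming X'X is a common convention, adopted here as an assumption:

import numpy as np

rng = np.random.default_rng(3)
x1 = rng.normal(size=100)
X = np.column_stack([x1,
                     0.95 * x1 + 0.05 * rng.normal(size=100),
                     rng.normal(size=100)])
Z = X / np.linalg.norm(X, axis=0)          # scale each column to unit length
eig = np.linalg.eigvalsh(Z.T @ Z)          # eigenvalues of the scaled moment matrix
k = eig.max() / eig.min()                  # condition number
CI = np.sqrt(k)                            # condition index
print(k, CI)                               # CI > 30 would signal severe collinearity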
4.3.4.4 Test of multicollinearity using tolerance and the variance inflation factor

var(β̂_j) = (σ²/Σx_j²)·[1/(1 - R_j²)] = (σ²/Σx_j²)·VIF_j

where R_j² is the R² of the auxiliary regression of X_j on the remaining (k - 2) regressors, and
VIF is the variance inflation factor.
Some authors therefore use the VIF as an indicator of multicollinearity: the larger the value
of VIF_j, the more "troublesome" or collinear the variable X_j. But how high should the VIF
be before a regressor becomes troublesome? As a rule of thumb, if the VIF of a variable exceeds 10
(which happens if R_j² exceeds 0.90), the variable is said to be highly collinear.
Other authors use the measure of tolerance to detect multicollinearity. It is defined as:

TOL_j = 1/VIF_j = (1 - R_j²)

Clearly, TOL_j = 1 if X_j is not correlated with the other regressors, whereas it is zero if X_j is
perfectly related to the other regressors.
The VIF (or tolerance) as a measure of collinearity is not free of criticism. As we have seen,
var(β̂_j) = (σ²/Σx_j²)·VIF_j depends on three factors: σ², Σx_j² and VIF_j. A high VIF can be
counterbalanced by a low σ² or a high Σx_j². To put it differently, a high VIF is neither necessary
nor sufficient for high variances and high standard errors. Therefore, high multicollinearity, as
measured by a high VIF, need not cause high standard errors.
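For reference, statsmodels ships a variance_inflation_factor utility; the sketch below (simulated data) computes VIF_j and TOL_j for each regressor:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(4)
x1 = rng.normal(size=100)
X = sm.add_constant(np.column_stack([x1,
                                     0.9 * x1 + 0.1 * rng.normal(size=100),
                                     rng.normal(size=100)]))
for j in range(1, X.shape[1]):             # skip the constant column
    vif = variance_inflation_factor(X, j)
    print(j, round(vif, 2), round(1.0 / vif, 3))   # VIF_j and TOL_j = 1/VIF_j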
The following corrective procedures have been suggested if the problem of multicollinearity is
found to be serious.
1. Increase the size of the sample:
- Multicollinearity may be avoided or reduced if the size of the sample is increased.
- However, if the variables are collinear in the population, increasing the size of the
sample will not help to reduce multicollinearity.
2. Introduce additional equation in the model:
- The addition of new equation transforms our single equation (original) model to
simultaneous equation model.
- The reduced form method (which is usually applied for estimating simultaneous
equation models) can then be applied to avoid multicollinearity.
3. Use extraneous information:
- Extraneous information is information obtained from a source outside the sample which
is being used for the estimation.
- Extraneous information may be available from economic theory or from empirical
studies already conducted.
- There are three methods through which extraneous information can be utilized to deal
with the problem of multicollinearity.
4. Methods of transforming variables: this method is used when the relationship between
certain parameters is known a priori.
PART TWO
ECONOMETRICS TWO
CHAPTER ONE
1. Regression Analysis with Qualitative Information: Binary (or Dummy Variables)
1.1. Describing Qualitative Information
Qualitative factors often come in the form of binary information: a person is female or
male; a person does or does not own a personal computer; a firm offers a certain kind of
employee pension plan or it does not; a state administers capital punishment or it does not. In
all of these examples, the relevant information can be captured by defining a binary variable
or a zero- one variable. In econometrics, binary variables are most commonly called dummy
variables, although this name is not especially descriptive.
In defining a dummy variable, we must decide which event is assigned the value one and
which is assigned the value zero. For example, in a study of individual wage determination,
we might define female to be a binary variable taking on the value one for females and the
value zero for males.
The name in this case indicates the event with the value one. The same information is captured
by defining male to be one if the person is male and zero if the person is female. Either of
these is better than using gender because this name does not make it clear when the
dummy variable is one: does gender =1 correspond to male or female? What we call our
variables is unimportant for getting regression results, but it always helps to choose names
that clarify equations and expositions. Variables that assume such 0 and 1 values are called
dummy variables. Such variables are thus essentially a device to classify data into mutually
exclusive categories such as male or female. Dummy variables usually indicate a dichotomy:
"presence" or "absence", "yes" or "no", etc. They indicate a "quality" or an attribute, such as
"male" or "female", "black" or "white", "urban" or "non-urban", "before" or "after", "North" or
"South", "East" or "West", marital status, job category, region, season, etc.
We quantify such variables by artificially assigning values to them (for
example, assigning 0 and 1 to sex, where 0 indicates male and 1 indicates female), and use
them in the regression equation together with the other independent variables. Such variables
are called dummy variables. Alternative names are indicator variables, binary variables,
categorical variables, and dichotomous variables.
Dummy variables can be incorporated in regression models just as easily as quantitative
variables. As a matter of fact, a regression model may contain regressors that are all
exclusively dummy, or qualitative, in nature. Such models are called Analysis of Variance
(ANOVA) models. Regression models in most economic research involve quantitative
explanatory variables in addition to dummy variables. Such models are known as Analysis
of Covariance (ANCOVA) models.
Mean salary of female college professors: E(Y_i | D_i = 1, X_i) = (β_0 + α_0) + β_1 X_i ..................(1.2)
Mean salary of male college professors: E(Y_i | D_i = 0, X_i) = β_0 + β_1 X_i ..................(1.3)
If the assumption of common slopes is valid, a test of the hypothesis that the two regressions
(1.2) and (1.3) have the same intercept (i.e., that there is no sex discrimination) can be made
easily by running the regression and noting the statistical significance of the estimated
α_0 on the basis of the traditional t test. If the t test shows that it is statistically
significant, we reject the null hypothesis that the male and female college professors' levels
of mean annual salary are the same.
Before proceeding further, note the following features of the dummy variable regression
model considered previously.
1. To distinguish the two categories, male and female, we have introduced only one dummy
variable D. For example, if D_i = 1 always denotes a male, then when D_i = 0 we know
that it is a female, since there are only two possible outcomes. Hence, one dummy
variable suffices to distinguish two categories. The general rule is this: if a qualitative
variable has 'm' categories, introduce only 'm - 1' dummy variables. In our
example, sex has two categories, and hence we introduced only a single dummy
variable. If this rule is not followed, we shall fall into what might be called the
dummy variable trap, that is, the situation of perfect multicollinearity.
2. The assignment of 1 and 0 values to two categories, such as male and female, is
arbitrary in the sense that in our example we could have assigned D=1 for female and
D=0 for male.
3. The group, category, or classification that is assigned the value of 0 is often referred
to as the base, benchmark, control, comparison, reference, or omitted category . It is
the base in the sense that comparisons are made with that category.
4. The coefficient attached to the dummy variable D can be called the differential
intercept coefficient because it tells by how much the value of the intercept term of the
category that receives the value of 1 differs from the intercept coefficient of the base
category.
Example 1.1 (Effects of Computer Ownership on College GPA)
In order to determine the effects of computer ownership on college grade point average, we
estimate the model: colGPA = β_0 + δ_0 D + β_1 hsGPA + β_2 ACT + u, where the dummy
variable D equals one if a student owns a personal computer and zero otherwise. There are
various reasons PC ownership might have an effect on colGPA. A student’s work might be of
higher quality if it is done on a computer, and time can be saved by not having to wait at a
computer lab. Of course, a student might be more inclined to play computer games or surf the
Internet if he or she owns a PC, so it is not obvious that 0 is positive. The variables hsGPA
(high school GPA) and ACT (achievement test score) are used as controls: it could be that
stronger students, as measured by high school GPA and ACT scores, are more likely to own
computers. We control for these factors because we would like to know the average effect on
colGPA if a student is picked at random and given a personal computer.
If the estimated model is:

colGPA-hat = 1.26 + 0.157 D + 0.447 hsGPA + 0.0087 ACT,  R² = 0.219
s.e. = (0.33)  (0.057)  (0.049)  (0.0105)

this equation implies that a student who owns a PC has a predicted GPA about 0.16 points higher
than a comparable student without a PC (remember, both colGPA and hsGPA are on a four-point
scale). The effect is also very statistically significant, with t_D = 0.157/0.057 ≈ 2.75.
Figure 1.2. Expenditure on health care in relation to income for three levels of education
1.2.3. Regression on One Quantitative Variable and Two Qualitative Variables
The technique of dummy variable can be easily extended to handle more than one qualitative
variable. Let us revert to the college professors’ salary regression (1.1), but now assume that in
addition to years of teaching experience and sex, the skin color of the teacher is also an
important determinant of salary. For simplicity, assume that color has two categories: black
and white. We can now write (1.1) as:
Y_i = α_1 + α_2 D_{2i} + α_3 D_{3i} + βX_i + u_i ..................(1.5)

where Y_i = annual salary
X_i = years of teaching experience
D_2 = 1 if male, 0 otherwise
D_3 = 1 if white, 0 otherwise
Notice that each of the two qualitative variables, sex and color, has two categories and hence
needs one dummy variable for each. Note also that the omitted, or base, category now is “black
female professor.”
Assuming E(u) = 0, we can obtain the following regressions from (1.5):

Mean salary for black female professors: E(Y_i | D_2 = 0, D_3 = 0, X_i) = α_1 + βX_i
Mean salary for black male professors: E(Y_i | D_2 = 1, D_3 = 0, X_i) = (α_1 + α_2) + βX_i
Mean salary for white female professors: E(Y_i | D_2 = 0, D_3 = 1, X_i) = (α_1 + α_3) + βX_i
Mean salary for white male professors: E(Y_i | D_2 = 1, D_3 = 1, X_i) = (α_1 + α_2 + α_3) + βX_i
Once again, it is assumed that the preceding regressions differ only in the intercept
coefficient but not in the slope coefficient β. An OLS estimation of (1.5) will enable us to test a
variety of hypotheses. Thus, if α_3 is statistically significant, it will mean that color does affect
a professor's salary. Similarly, if α_2 is statistically significant, it will mean that sex also affects
a professor's salary. If both these differential intercepts are statistically significant, it would
mean that sex as well as color is an important determinant of professors' salaries.
From the preceding discussion it follows that we can extend our model to include more than
one quantitative variable and more than two qualitative variables. The only precaution to be
taken is that the number of dummies for each qualitative variable should be one less than
the number of categories of that variable.
Example 1.2
(Log Hourly Wage)
Let us estimate a model that allows for wage differences among four groups: married
men, married women, single men, and single women. To do this, we must select a base
group; we choose single men. Then, we must define dummy variables for each of the
remaining groups: call these D_1 (married male), D_2 (married female), and D_3 (single female).
Putting these three variables into (1.1), suppose the estimated equation gives the following result:

log(wage) = 0.321 + 0.213 D_1 - 0.198 D_2 - 0.110 D_3 + 0.079 educ + 0.027 exper
s.e. = (0.100)  (0.055)  (0.058)  (0.056)  (0.007)  (0.005)
To interpret the coefficients on the dummy variables, we must remember that the base group is
single males. Thus, the estimates on the three dummy variables measure the proportionate
difference in wage relative to single males. For example, married men are estimated to
earn about 21.3% more than single men, holding levels of education and experience fixed. A
married woman, on the other hand, earns a predicted 19.8% less than a single man with the
same levels of the other variables.
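A sketch of such a regression in Python follows (pandas and the statsmodels formula API; the data are simulated and the coefficient values used to generate them are arbitrary, so only the mechanics mirror the text's example):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 500
df = pd.DataFrame({"educ": rng.integers(8, 18, n),
                   "exper": rng.integers(0, 30, n),
                   "female": rng.integers(0, 2, n),
                   "married": rng.integers(0, 2, n)})
df["lwage"] = (0.3 + 0.08 * df.educ + 0.03 * df.exper
               - 0.2 * df.female + 0.2 * df.married + rng.normal(0, 0.3, n))

# Three group dummies, with single males as the omitted (base) category.
df["marrmale"] = df.married * (1 - df.female)      # D1
df["marrfem"] = df.married * df.female             # D2
df["singfem"] = (1 - df.married) * df.female       # D3
fit = smf.ols("lwage ~ marrmale + marrfem + singfem + educ + exper", data=df).fit()
print(fit.params)   # dummy coefficients are read relative to single males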
1.2.4. Interactions Among Dummy Variables
Consider the following model:
Y_i = α_1 + α_2 D_{2i} + α_3 D_{3i} + βX_i + u_i ..................(1.6)

where Y_i = annual expenditure on clothing
X_i = annual income
D_2 = 1 if female, 0 otherwise
D_3 = 1 if college graduate, 0 otherwise
Implicit in this model is the assumption that the differential effect of the sex dummy D2 is
constant across the two levels of education and the differential effect of the education
dummy D3 is also constant across the two sexes. That is, if, say, the mean expenditure on
clothing is higher for females than males this is so whether they are college graduates or
not. Likewise, if, say, college graduates on the average spend more on clothing than non-
college graduates, this is so whether they are female or males.
In many applications such an assumption may be untenable. A female college graduate may
spend more on clothing than a male college graduate. In other words, there may be interaction
between the two qualitative variables D2 and D3 . Therefore their effect on mean Y may
not be simply additive as in (1.6) but multiplicative as well, as in the following model:
Y_i = α_1 + α_2 D_{2i} + α_3 D_{3i} + α_4 (D_{2i} D_{3i}) + βX_i + u_i ..................(1.7)

From (1.7) we obtain:

E(Y_i | D_2 = 1, D_3 = 1, X_i) = (α_1 + α_2 + α_3 + α_4) + βX_i ..................(1.8)
which is the mean clothing expenditure of graduate females. Notice that:
α_2 = differential effect of being a female
α_3 = differential effect of being a college graduate
α_4 = differential effect of being a female graduate

which shows that the mean clothing expenditure of graduate females differs (by α_4) from the
mean clothing expenditure of females or of college graduates. If α_2, α_3 and α_4 are all positive,
the average clothing expenditure of females is higher than that of the base category (which here is
male non-graduate), but it is much more so if the females also happen to be graduates. Similarly,
the average expenditure on clothing by a college graduate tends to be higher than that of the base
category, but much more so if the graduate happens to be a female. This shows how the interaction
dummy modifies the effect of the two attributes considered individually.
Whether the coefficient of the interaction dummy is statistically significant can be tested by the
usual t- test. If it turns out to be significant, the simultaneous presence of the two attributes will
attenuate or reinforce the individual effects of these attributes. Needless to say, omitting a
significant interaction term incorrectly will lead to a specification bias.
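In estimation, the interaction is just the product dummy D_2 D_3. A sketch with the statsmodels formula API (simulated clothing-expenditure data; names and magnitudes are illustrative assumptions):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n = 400
df = pd.DataFrame({"income": rng.uniform(10, 50, n),
                   "female": rng.integers(0, 2, n),
                   "grad": rng.integers(0, 2, n)})
df["clothing"] = (5 + 0.2 * df.income + 2 * df.female + 3 * df.grad
                  + 4 * df.female * df.grad + rng.normal(0, 1, n))

# female:grad is the interaction dummy D2*D3; its coefficient plays the role
# of alpha_4, the differential effect of being a female graduate.
fit = smf.ols("clothing ~ female + grad + female:grad + income", data=df).fit()
print(fit.params)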
other factors fixed. This is the usual linear regression model. This makes linear probability
models easy to estimate and interpret, but it also highlights some shortcomings of the LPM.
The drawbacks of this model are:
1. The right hand side of equation (1.9) is a combination of discrete and continuous
variables while the left hand side variable is discrete.
2. Usually we arbitrarily (or for convenience) use 0 and 1 for Y. If we use other values for
Y, say 3 and 4, β will also change even if the vector of factors X remains unchanged.
3. The disturbance u assumes only two values: if Y = 1 then u = 1 - Xβ, and if Y = 0 then
u = -Xβ. Consequently, u is not normally distributed but rather has a discrete (binary)
probability distribution.
4. It is easy to see that, if we plug in certain combinations of values for the
independent variables into (1.9), we can get predictions either less than zero or greater
than one. Since these are predicted probabilities, and probabilities must be between zero
and one, this can be a little embarrassing.
5. Due to problem 3, the variance of u is heteroscedastic.
1.3.2. The Logit and Probit Models
The linear probability model is simple to estimate and use, but it has some drawbacks
that we discussed in Section 1.3.1. The two most important disadvantages are that the fitted
probabilities can be less than zero or greater than one and the partial effect of any
explanatory variable (appearing in level form) is constant. These limitations of the LPM can
be overcome by using more sophisticated binary response models.
In a binary response model, interest lies primarily in the response probability
P(Y = 1 | X) = P(Y = 1 | X_1, X_2, ..., X_k) ..................(1.11)
where we use X to denote the full set of explanatory variables. For example, when Y is
an employment indicator, X might contain various individual characteristics such as
education, age, marital status, and other factors that affect employment status, including a
binary indicator variable for participation in a recent job training program.
Specifying Logit and Probit Models
In the LPM, we assume that the response probability is linear in a set of parameters, � . To
avoid the LPM limitations, consider a class of binary response models of the form
P(Y = 1 | X) = G(β_0 + β_1X_1 + β_2X_2 + ... + β_kX_k) = G(z) ..................(1.12)

where G is a function taking on values strictly between zero and one: 0 < G(z) < 1 for all
real numbers z. This ensures that the estimated response probabilities are strictly between zero
and one.
As in Econometrics I, we write z = β_0 + β_1X_1 + β_2X_2 + ... + β_kX_k.
Various nonlinear functions have been suggested for the function G in order to make sure
that the probabilities are between zero and one. The two we will cover here are used in the vast
majority of applications (along with the LPM). In the logit model, G is the logistic function:
G(z) = exp(z)/[1 + exp(z)] = e^z/(1 + e^z) ..................(1.13)
which is between zero and one for all real numbers z. This is the cumulative distribution
function (cdf) for a standard logistic random variable.
Here the response probability P(Y = 1 | X) is evaluated as:

P = P(Y = 1 | X) = e^z/(1 + e^z)

Similarly, the non-response probability is evaluated as:

1 - P = P(Y = 0 | X) = 1 - e^z/(1 + e^z) = 1/(1 + e^z)
Note that the response and non- response probabilities both lie in the interval [0 , 1] , and
hence, are interpretable.
For the logit model, the ratio

P/(1 - P) = P(Y = 1 | X)/P(Y = 0 | X) = [e^z/(1 + e^z)]/[1/(1 + e^z)] = e^z = e^(β_0 + β_1X_1 + β_2X_2 + ... + β_kX_k)

is the ratio of the odds of Y = 1 against Y = 0. The natural logarithm of the odds (the log-odds) is:

ln[P/(1 - P)] = β_0 + β_1X_1 + β_2X_2 + ... + β_kX_k
Thus, the log- odds is a linear function of the explanatory variables.
In the probit model, G is the standard normal cumulative distribution function (cdf),
which is expressed as an integral:

G(z) = Φ(z) = ∫ from -∞ to z of φ(v) dv ..................(1.14)

where φ(v) is the standard normal density:

φ(v) = (1/√(2π)) e^(-v²/2)

(This is the general normal density f(X) = (1/√(2πσ²)) e^(-(X-μ)²/2σ²) with μ = 0 and σ² = 1.)
The standard normal cdf has a shape very similar to that of the logistic cdf.
The estimating model that emerges from the normal CDF is popularly known as the probit
model, although sometimes it is also known as the normit model.
Note that both the probit and the logit models are estimated by Maximum Likelihood Estimation.
1.3.3. Interpreting the Probit and Logit Model Estimates
Given modern computers, from a practical perspective, the most difficult aspect of logit or
probit models is presenting and interpreting the results. The coefficient estimates, their
standard errors, and the value of the log- likelihood function are reported by all software
packages that do logit and probit, and these should be reported in any application.
The coefficients give the signs of the partial effects of each Xj on the response probability,
and the statistical significance of Xj is determined by whether we can reject H0: j = 0 at a
sufficiently small significance level. However, the magnitude of the estimated parameters
(dZ/dX) has no particular interpretation. We care about the magnitude of dProb(Y)/dX. From
the computer output for a probit or logit estimation, you can interpret the statistical significance
and sign of each coefficient directly. Assessing magnitude is trickier.
Goodness of Fit Statistics
The conventional measure of goodness of fit, R2, is not particularly meaningful in binary
regressand models. Measures similar to R2, called pseudo R2, are available, and there are a
variety of them.
1. Measures based on likelihood ratios
Let L_UR be the maximum of the likelihood function when maximized with respect to all the
parameters, and L_R the maximum when maximized with the restrictions β_j = 0:

R² = 1 - (L_R/L_UR)^(2/n)
2. Cragg and Uhler (1970) suggested a pseudo-R² that lies between 0 and 1:

R² = [L_UR^(2/n) - L_R^(2/n)] / [(1 - L_R^(2/n)) L_UR^(2/n)]
3. McFadden (1974) defined R² as:

R² = 1 - (ln L_UR / ln L_R)
4. Another goodness-of-fit measure that is usually reported is the so-called percent
correctly predicted, which is computed as follows. For each i, we compute the
estimated probability that Y_i takes on the value one, P̂_i. If P̂_i ≥ 0.5, the prediction of Y_i
is unity, and if P̂_i < 0.5, Y_i is predicted to be zero. The percentage of times the
prediction matches the actual Y_i (which we know to be zero or one) is the percent
correctly predicted:

count R² = (number of correct predictions) / (total number of observations)
Numerical Example:
This session shows an example of probit and logit regression analysis with Stata. The data
in this example were gathered on undergraduates applying to graduate school and includes
undergraduate GPAs, the reputation of the school of the undergraduate (a topnotch indicator),
the students' GRE score, and whether or not the student was admitted to graduate school.
Using this dataset, we can predict admission to graduate school using undergraduate GPA,
GRE scores, and the reputation of the school of the undergraduate. Our outcome variable is
binary, and we will use either a probit or a logit model. Thus, our model will calculate a
predicted probability of admission based on our predictors.
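The same analysis can be reproduced outside Stata. The sketch below uses Python's statsmodels on simulated admissions-style data (the real dataset is not reproduced here; gre, gpa and topnotch are the assumed predictor names), and also computes the odds ratios and McFadden's pseudo R² discussed below:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 400
gre = rng.normal(580, 110, n)
gpa = rng.uniform(2.0, 4.0, n)
topnotch = rng.integers(0, 2, n)
z = -4 + 0.0025 * gre + 0.67 * gpa + 0.4 * topnotch
admit = (rng.uniform(size=n) < 1 / (1 + np.exp(-z))).astype(int)

X = sm.add_constant(np.column_stack([gre, topnotch, gpa]))
logit = sm.Logit(admit, X).fit()      # prints the iteration log described next
probit = sm.Probit(admit, X).fit()

print(np.exp(logit.params))           # odds ratios: exponentiated coefficients
print(1 - logit.llf / logit.llnull)   # McFadden's pseudo R-squared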
Iteration History - This is a listing of the log likelihoods at each iteration for the
probit/logit model. Remember that probit/logit regression uses maximum likelihood estimation,
which is an iterative procedure. The first iteration (called Iteration 0) is the log likelihood
of the "null" or "empty" model; that is, a model with no predictors. At the next iteration
(called Iteration 1), the specified predictors are included in the model. In this example, the
predictors are GRE, topnotch and GPA. At each iteration, the log likelihood increases
because the goal is to maximize the log likelihood. When the difference between successive
iterations is very small, the model is said to have "converged" and the iterating stops.
Log likelihood - This is the log likelihood of the fitted model. It is used in the Likelihood
Ratio Chi - Square test of whether all predictors' regression coefficients in the model are
simultaneously zero.
LR chi2(3) - This is the Likelihood Ratio (LR) Chi - Square test that at least one of the
predictors' regression coefficient is not equal to zero. The number in the parentheses indicates
the degrees of freedom of the Chi - Square distribution used to test the LR Chi - Square
statistic and is defined by the number of predictors in the model (3).
Prob > chi2 - This is the probability of getting a LR test statistic as extreme as, or more so,
than the observed statistic under the null hypothesis; the null hypothesis is that all of the
regression coefficients are simultaneously equal to zero. In other words, this is the
probability of obtaining this chi - square statistic or one more extreme if there is in fact no
effect of the predictor variables. This p- value is compared to a specified alpha level, our
willingness to accept a type I error, which is typically set at 0.05 or 0.01. The small p- value
from the LR test, 0.0001, would lead us to conclude that at least one of the regression
coefficients in the model is not equal to zero. The parameter of the chi - square distribution
used to test the null hypothesis is defined by the degrees of freedom in the prior line, chi2(3).
Pseudo R2 - This is McFadden's pseudo R-squared. Because this statistic does not mean
what R-squared means in OLS regression (the proportion of variance of the response variable
explained by the predictors), it should be interpreted with great caution. The interpretation
of the coefficients can be awkward. For example, for a one-unit increase in GPA, the log
odds of being admitted to graduate school (vs. not being admitted) increases by 0.667. For this
reason, many researchers prefer to exponentiate the coefficients and interpret them as
odds ratios. Look at the following result.
Now we can say that for a one unit increase in GPA, the odds of being admitted to graduate
school (vs. not being admitted) increased by a factor of 1.94. Since GRE scores increase
only in units of 10, we can take the odds ratio and raise it to the 10th power, e.g. 1.00248 ^ 10
= 1.0250786, and say for a 10 unit increase in GRE score, the odds of admission to graduate
school increased by a factor of 1.025.
CHAPTER TWO
2. Introduction to Basic Regression Analysis with Time Series Data
2.1.The Nature of Time Series Data
Time series data are data on a single entity (a person, firm, or country) observed at multiple
time periods. Examples:
Aggregate consumption and GDP for a country (for example, 20 years of
quarterly observations= 80 observations)
Birr/$, pound/$ and Euro/$ exchange rates (daily data for 2 years= 730 observations)
Inflation rate for Ethiopia (quarterly data for 30 years = 120 observations )
Gross domestic investment for Ethiopia (annual data for 40 years= 40 observations )
An obvious characteristic of time series data that distinguishes them from cross-sectional
data is that a time series data set comes with a temporal ordering. For example, in the
data sets mentioned in the examples above, we must know that the data for 1970 immediately
precede the data for 1971. For analyzing time series data in the social sciences, we must
recognize that the past can affect the future, but not vice versa. To emphasize the proper
ordering of time series data, Table 2.1 gives a partial listing of the data on Ethiopian gross
capital formation (GCF) and gross domestic savings (GDS), both in millions of ETB, for
1969-2010.
Another difference between cross-sectional and time series data is more subtle. In Chapters 2
and 3 of Econometrics I, we studied statistical properties of the OLS estimators based on the
notion that samples were randomly drawn from the appropriate population. Understanding
why cross sectional data should be viewed as random outcomes is fairly straightforward: a
different sample drawn from the population will generally yield different values of the
independent and dependent variables (such as education, experience, wage, and so on).
Therefore, the OLS estimates computed from different random samples will generally differ,
and this is why we consider the OLS estimators to be random variables.
How should we think about randomness in time series data? Certainly, economic time series
satisfy the intuitive requirements for being outcomes of random variables. For example,
today we do not know what the stock price will be at its close at the end of the next trading
day. We do not know what the annual growth in output will be in Ethiopia during the coming
year. Since the outcomes of these variables are not foreknown, they should clearly be viewed as
random variables. Formally, a sequence of random variables indexed by time is called a
stochastic process or a time series process. (“Stochastic” is a synonym for random.) When we
collect a time series data set, we obtain one possible outcome, or realization, of the
stochastic process. We can only see a single realization, because we cannot go back in time and
start the process over again. (This is analogous to cross-sectional analysis where we can
collect only one random sample.) However, if certain conditions in history had been
different, we would generally obtain a different realization for the stochastic process, and
this is why we think of time series data as the outcome of random variables.
The set of all possible realizations of a time series process plays the role of the population in
cross sectional analysis.
2.2. Stationary and non-stationary Stochastic Processes
2.2.1. Stochastic Processes
A random or stochastic process is a collection of random variables ordered in time. If we let
Y denote a random variable, and if it is continuous, we denote it as Y(t), but if it is discrete, we denote it as Yt. An example of the former is an electrocardiogram, and an
example of the latter is GDP, PDI, GDI, GDS, etc. Since most economic data are collected at
discrete points in time, for our purpose we will use the notation Yt rather than Y(t). If we
let Y represent GDS, for our data we have Y1, Y2, Y3, …, Y39, Y40, Y41, where the subscript 1 denotes the first observation (i.e., GDS of 1969) and the subscript 41 denotes the last observation (i.e., GDS of 2010). Keep in mind that each of these Y's is a random variable.
Stationary Stochastic Processes
A type of stochastic process that has received a great deal of attention and analysis by time series
analysts is the so-called stationary stochastic process. Broadly speaking, a stochastic process
is said to be stationary if its mean and variance are constant over time and the value of
the covariance between the two time periods depends only on the distance or gap or lag between
the two time periods and not the actual time at which the covariance is computed. In the time
series literature, such a stochastic process is known as a weakly stationary, covariance stationary, second-order stationary, or wide-sense stochastic process. For the purposes of this chapter, and in most practical situations, this type of stationarity often suffices.
To explain weak stationarity, let Yt be a stochastic time series with these properties:
Mean: E(Yt) = μ ………………………………………………………………(2.1)
Variance: var(Yt) = E(Yt − μ)² = σ² …………………………………………(2.2)
Covariance: γk = E[(Yt − μ)(Yt+k − μ)] ……………………………………(2.3)
where γk, the covariance (or autocovariance) at lag k, is the covariance between the values of Yt and Yt+k, that is, between two Y values k periods apart. If k = 0, we obtain γ0, which is simply the variance of Y (= σ²); if k = 1, γ1 is the covariance between two adjacent values of Y.
Suppose we shift the origin of Y from Yt to Yt+m (say, from 1969 to 1974 for our GDS data). Now if Yt is to be stationary, the mean, variance, and autocovariances of Yt+m must be the same as those of Yt. In short, if a time series is stationary, its mean, variance, and autocovariance (at various lags) remain the same no matter at what point we measure them; that is, they are time invariant. Such a time series will tend to return to its mean (called mean reversion), and fluctuations around this mean (measured by its variance) will have a broadly constant amplitude. If a time series is not stationary in the sense just defined, it is called a nonstationary time series (keep in mind we are talking only about weak stationarity). In other words, a nonstationary time series will have a time-varying mean or a time-varying variance or both.
Why are stationary time series so important? Because if a time series is nonstationary, we
can study its behavior only for the time period under consideration. Each set of time series
data will therefore be for a particular episode. As a consequence, it is not possible to generalize
it to other time periods. Therefore, for the purpose of forecasting, such (nonstationary) time
series may be of little practical value.
How do we know that a particular time series is stationary? In particular, is the time series shown in Figure 2.1 stationary? We will take this important topic up in Section 2.5, where we consider several tests of stationarity. But if we depend on common sense, it would seem that the time series depicted in Figure 2.1 is nonstationary, at least in its mean values.
Before we move on, we mention a special type of stochastic process (or time series),
namely, a purely random, or white noise, process. We call a stochastic process purely random if
it has zero mean, constant variance σ2, and is serially uncorrelated. You may recall that the
error term ut entering the classical normal linear regression model that we discussed in Econometrics I was assumed to be a white noise process, which we denoted as ut ∼ IID N(0, σ²); that is, ut is independently and identically distributed as a normal distribution
with zero mean and constant variance.
Nonstationary Stochastic Processes
Although our interest is in stationary time series, one often encounters nonstationary time
series, the classic example being the random walk model (RWM). It is often said that asset
prices, such as stock prices or exchange rates, follow a random walk; that is, they are
nonstationary. We distinguish two types of random walks: (1) random walk without drift (i.e.,
no constant or intercept term) and (2) random walk with drift (i.e., a constant term is present).
Random Walk without Drift: Suppose ut is a white noise error term with mean 0 and variance σ². Then the series Yt is said to be a random walk if:
Yt = Yt−1 + ut ………………………………………………………………(2.4)
In the random walk model, as (2.4) shows, the value of Y at time t is equal to its value at time
(t−1) plus a random shock; thus it is an AR (1) model in the language of Chapter 4 of
Econometrics I. We can think of (2.4) as a regression of Y at time t on its value lagged one
period. Believers in the efficient capital market hypothesis argue that stock prices are
essentially random and therefore there is no scope for profitable speculation in the stock
market: If one could predict tomorrow’s price on the basis of today’s price, we would all be
millionaires.
Now from (2.4) we can write
Y1 = Y0 + u1
Y2 = Y1 + u2 = Y0 + u1 + u2
Y3 = Y2 + u3 = Y0 + u1 + u2 + u3
In general, if the process started at some time 0 with a value of Y0, we have
Yt = Y0 + Σut ………………………………………………………(2.5)
(where the sum runs over the shocks u1, …, ut). Therefore,
E(Yt) = E(Y0 + Σut) = Y0 ………………………………………………(2.6)
In like fashion, it can be shown that
var(Yt) = tσ² ………………………………………………………………(2.7)
As the preceding expression shows, the mean of Y is equal to its initial, or starting, value,
which is constant, but as t increases, its variance increases indefinitely, thus violating a
condition of stationarity. In short, the RWM without drift is a nonstationary stochastic process.
In practice Y0 is often set at zero, in which case E (Yt ) =0.
An interesting feature of RWM is the persistence of random shocks (i.e., random errors),
which is clear from (2.5): Yt is the sum of initial Y0 plus the sum of random shocks. As a
result, the impact of a particular shock does not die away. For example, if u2= 2 rather than u2=0,
then all Yt's from Y2 onward will be 2 units higher and the effect of this shock never dies out. That is why the random walk is said to have infinite memory: it remembers shocks forever.
Interestingly, if you write (2.4) as
Yt − Yt−1 = ΔYt = ut ………………………………………………………(2.8)
where Δ is the first-difference operator, it is easy to show that, while Yt is nonstationary, its first difference is stationary. In other words, the first differences of a random walk time series are stationary.
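A minimal simulation sketch in Stata (entirely hypothetical data) that generates a driftless random walk and its first difference; the level wanders while the difference is just the white-noise shock:

  clear
  set obs 200
  set seed 12345
  gen t = _n
  tsset t
  gen u = rnormal(0,1)    // white noise shocks
  gen y = sum(u)          // random walk: y_t = y_{t-1} + u_t, with y_0 = 0
  gen dy = D.y            // first difference, which equals u_t
  summarize y dy

Plotting y against t (tsline y) typically shows the slow wandering that is characteristic of a unit root process.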
Random Walk with Drift: Let us modify (2.4) as follows:
Yt = δ + Yt−1 + ut ………………………………………………………(2.9)
where δ is known as the drift parameter. The name drift comes from the fact that if we write the preceding equation as
Yt − Yt−1 = ΔYt = δ + ut ………………………………………………(2.10)
it shows that Yt drifts upward or downward, depending on whether δ is positive or negative.
Note that model (2.9) is also an AR(1) model. Following the procedure discussed for random
walk without drift, it can be shown that for the random walk with drift model (2.9),
E(Yt) = Y0 + tδ ………………………………………………………(2.11)
var(Yt) = tσ² ……………………………………………………………(2.12)
As you can see, for the RWM with drift the mean as well as the variance increases over time, again violating the conditions of (weak) stationarity. In short, the RWM, with or without drift, is a nonstationary stochastic process. The random walk model is an example of what is known in the literature as a unit root process.
Unit Root Stochastic Process
Let us write the RWM (2.4) as:
Yt = ρYt−1 + ut;  −1 ≤ ρ ≤ 1 ……………………………………………(2.13)
This model resembles the Markov first-order autoregressive model that we discussed under autocorrelation. If ρ = 1, (2.13) becomes a RWM (without drift). If ρ is in fact 1, we face what is known as the unit root problem, that is, a situation of nonstationarity; we already know that in this case the variance of Yt is not stationary. The name unit root is due to the fact that ρ = 1. Thus the terms nonstationarity, random walk, and unit root can be treated as synonymous. If, however, |ρ| < 1, that is, if the absolute value of ρ is less than one, then it can be shown that the time series Yt is stationary in the sense we have defined it.
2.3. Trend Stationary and Difference Stationary Stochastic Processes
If the trend in a time series is completely predictable and not variable, we call it a deterministic
trend, whereas if it is not predictable, we call it a stochastic trend. To make the definition more
formal, consider the following model of the time series Yt.
Yt = β1 + β2t + β3Yt−1 + ut ………………………………………………(2.14)
where ut is a white noise error term and t is time measured chronologically. Now we have
the following possibilities:
Pure random walk: If in (2.14) β1 = 0, β2 = 0, β3 = 1, we get
Yt = Yt−1 + ut ………………………………………………………………(2.15)
which is nothing but a RWM without drift and is therefore nonstationary. But note that, if we write (2.15) as
ΔYt = Yt − Yt−1 = ut …………………………………………………………(2.16)
it becomes stationary, as noted before. Hence, a RWM without drift is a difference stationary process (DSP).
Random walk with drift: If in (2.14) β1 ≠ 0, β2 = 0, β3 = 1, we get
Yt = β1 + Yt−1 + ut …………………………………………………………(2.17)
which is a random walk with drift and is therefore nonstationary. If we write it as
Yt − Yt−1 = ΔYt = β1 + ut …………………………………………………(2.17a)
Yt will exhibit a positive (β1 > 0) or negative (β1 < 0) trend. Such a trend is called a stochastic trend. Equation (2.17a) is a DSP process because the nonstationarity in Yt can be eliminated by taking first differences of the time series.
Deterministic trend: If in (2.14) β1 ≠ 0, β2 ≠ 0, β3 = 0, we get
Yt = β1 + β2t + ut …………………………………………………………(2.17b)
which is called a trend stationary process (TSP). Although the mean of Yt is β1 + β2t, which is not constant, its variance (= σ²) is. Once the values of β1 and β2 are known, the mean can be forecast perfectly. Therefore, if we subtract the mean of Yt from Yt, the resulting series will be stationary; hence the name trend stationary. This procedure of removing the (deterministic) trend is called detrending.
Random walk with drift and deterministic trend: If in (2.14) β1 ≠ 0, β2 ≠ 0, β3 = 1, we obtain:
Yt = β1 + β2t + Yt−1 + ut …………………………………………………(2.18)
a random walk with drift and a deterministic trend, which can be seen if we write this equation as:
Yt − Yt−1 = ΔYt = β1 + β2t + ut …………………………………………(2.18a)
which means that Yt is nonstationary.
Deterministic trend with stationary AR(1) component: If in (2.14) β1 ≠ 0, β2 ≠ 0, β3 < 1, we obtain:
Yt = β1 + β2t + β3Yt−1 + ut ………………………………………………(2.19)
which is stationary around the deterministic trend.
2.4. Integrated Stochastic Process
The random walk model is but a specific case of a more general class of stochastic
processes known as integrated processes. Recall that the RWM without drift is
nonstationary, but its first difference, as shown in (2.8), is stationary. Therefore, we call the
RWM without drift integrated of order 1, denoted as I(1). Similarly, if a time series has to be
differenced twice (i.e., take the first difference of the first differences) to make it stationary,
we call such a time series integrated of order 2. In general, if a (nonstationary) time series
has to be differenced d times to make it stationary, that time series is said to be integrated
of order d. A time series Yt integrated of order d is denoted as Yt ∼ I(d). If a time series Yt is stationary to begin with (i.e., it does not require any differencing), it is said to be integrated of order zero, denoted by Yt ∼ I(0). Thus, we will use the terms "stationary time series" and "time series integrated of order zero" to mean the same thing.
Most economic time series are generally I(1); that is, they generally become stationary only
after taking their first differences.
Properties of Integrated Series
The following properties of integrated time series may be noted. Let Xt, Yt, and Zt be three time series:
i. If Xt ∼ I(0) and Yt ∼ I(1), then Zt = (Xt + Yt) ∼ I(1); that is, a linear combination or sum of stationary and nonstationary time series is nonstationary.
ii. If Xt ∼ I(d), then Zt = (a + bXt) ∼ I(d), where a and b are constants. That is, a linear combination of an I(d) series is also I(d). Thus, if Xt ∼ I(0), then Zt = (a + bXt) ∼ I(0).
iii. If Xt ∼ I(d1) and Yt ∼ I(d2), then Zt = (aXt + bYt) ∼ I(d2), where d1 < d2.
iv. If Xt ∼ I(d) and Yt ∼ I(d), then Zt = (aXt + bYt) ∼ I(d*), where d* is generally equal to d but in some cases d* < d.
2.5. Tests of Stationarity: The Unit Root Test
A test of stationarity (or nonstationarity) that has become widely popular over the past several years is the unit root test. The starting point is the unit root (stochastic) process that we discussed in Section 2.2. We start with (2.13), rewritten here as (2.20):
Yt = ρYt−1 + ut;  −1 ≤ ρ ≤ 1 ………………………………………………(2.20)
where ut is a white noise error term.
We know that if ρ =1, that is, in the case of the unit root, (2.20) becomes a random walk
model without drift, which we know is a nonstationary stochastic process. Therefore, why
not simply regress Yt on its (one period) lagged value Yt−1 and find out if the estimated ρ is
statistically equal to 1? If it is, then Yt is nonstationary. This is the general idea behind the unit
root test of stationarity.
For theoretical reasons, we manipulate (2.20) as follows: subtract Yt−1 from both sides of (2.20) to obtain:
Yt − Yt−1 = ρYt−1 − Yt−1 + ut = (ρ − 1)Yt−1 + ut ……………………(2.21)
which can be alternatively written as:
ΔYt = δYt−1 + ut ……………………………………………………………(2.22)
where δ = (ρ − 1) and Δ is the first-difference operator.
In practice, therefore, instead of estimating (2.20), we estimate (2.22) and test the (null) hypothesis that δ = 0. If δ = 0, then ρ = 1; that is, we have a unit root, meaning the time series under consideration is nonstationary. Unfortunately, under the null hypothesis that δ = 0 (i.e., ρ = 1), the t value of the estimated coefficient of Yt−1 does not follow the t distribution even in large samples; that is, it does not have an asymptotic normal distribution. Dickey and Fuller have shown that under the null hypothesis that δ = 0, the estimated t value of the coefficient of Yt−1 in (2.22) follows the τ (tau) statistic. In the literature the tau statistic or test is known as the Dickey–Fuller (DF) test, in honor of its discoverers.
To allow for the various possibilities, the DF test is estimated in three different forms, that is,
under three different null hypotheses.
Yt is a random walk: ΔYt = δYt−1 + ut ……………………………………(2.23a)
Yt is a random walk with drift: ΔYt = β1 + δYt−1 + ut …………………(2.23b)
Yt is a random walk with drift around a deterministic trend: ΔYt = β1 + β2t + δYt−1 + ut ……(2.23c)
where t is the time or trend variable. In each case, the null hypothesis is that δ = 0; that is, there is a unit root and the time series is nonstationary. The alternative hypothesis is that δ is less than zero; that is, the time series is stationary. If the null hypothesis is rejected, it means that Yt is a stationary time series.
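A sketch of how the three forms map onto Stata's dfuller command (the series name lngdp mirrors the document's lnGDP example but is an assumption):

  dfuller lngdp, noconstant    // pure random walk, as in (2.23a)
  dfuller lngdp                // random walk with drift, as in (2.23b)
  dfuller lngdp, trend         // drift plus deterministic trend, as in (2.23c)

In each case the reported test statistic is compared with the τ critical values rather than the usual t table.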
The augmented Dickey–Fuller (ADF) test extends (2.23c) by adding lagged difference terms of the regressand:
ΔYt = β1 + β2t + δYt−1 + Σ αiΔYt−i + εt ……………………………(2.24)
where εt is a pure white noise error term and ΔYt−1 = (Yt−1 − Yt−2), ΔYt−2 = (Yt−2 − Yt−3), etc. In the ADF test we still test whether δ = 0, and the ADF test follows the same asymptotic distribution as the DF statistic, so the same critical values can be used.
To give a glimpse of this procedure, we estimated (2.24) for the GDP series using one lagged difference of the natural log of GDP of Ethiopia; the results were as follows:

ΔlnGDPt = −0.2095 + 0.0016t + 0.0197 lnGDPt−1 + 0.0269 ΔlnGDPt−1
t (= τ):     (−0.28)    (0.67)      (0.27)              (0.15)
1% critical τ = −4.242;  5% critical τ = −3.540

The t (= τ) value of the lnGDPt−1 coefficient (= δ) is 0.27, but in absolute terms this is much less than even the 5% critical τ value of −3.540, let alone the 1% value of −4.242, again suggesting that even after taking care of possible autocorrelation in the error term, the GDP series is nonstationary.
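A matching Stata sketch (again assuming a variable lngdp in a tsset dataset):

  dfuller lngdp, trend lags(1) regress   // ADF with one lagged difference; regress shows the fitted equation

The lags(1) option adds the single ΔlnGDPt−1 term used in the estimate above.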
The Phillips–Perron (PP) Unit Root Tests
An important assumption of the DF test is that the error terms ut are independently and identically distributed. The ADF test adjusts the DF test for possible serial correlation in the error terms by adding the lagged difference terms of the regressand. Phillips and Perron use nonparametric statistical methods to take care of the serial correlation in the error terms without adding lagged difference terms.
The Phillips-Perron test involves fitting the following regression:
ΔYt = β1 + β2t + πYt−1 + ut
Under the null hypothesis that π = 0, the PP Z(t) and Z(π) statistics have the same asymptotic distributions as the ADF t-statistic and normalized bias statistics. One advantage of the PP tests over the ADF tests is that the PP tests are robust to general forms of heteroscedasticity in the error term ut. Another advantage is that the user does not have to specify a lag length for the test regression. Now let us test whether lnGDP is stationary using the PP test.
(The Stata output reports the PP test statistic alongside interpolated Dickey-Fuller 1%, 5%, and 10% critical values.)
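A sketch of the corresponding command (series name assumed, as before):

  pperron lngdp, trend    // PP test with a trend, mirroring the regression above

As with dfuller, failure to reject the null of a unit root indicates that lnGDP is nonstationary in levels.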
Consider the following two random walk models:
Yt = Yt−1 + ut
Xt = Xt−1 + vt
where ut ∼ N(0, 1) and vt ∼ N(0, 1). We also assume that ut and vt are serially uncorrelated as well as mutually uncorrelated. As you know by now, both these time series are nonstationary; that is, they are I(1) or exhibit stochastic trends.
Suppose we regress Yt on Xt. Since Yt and Xt are uncorrelated I(1) processes, the R² from the regression of Y on X should tend to zero; that is, there should not be any relationship between the two variables. The regression results, however, were as follows:
Ŷt = 13.26 + 0.337Xt    R² = 0.104,  d = 0.0121
se:  (0.62)   (0.044)
The coefficient of X is highly statistically significant, and, although the R2 value is low, it is
statistically significantly different from zero. From these results, you may be tempted to
conclude that there is a significant statistical relationship between Y and X, whereas a priori
there should be none. This is in a nutshell the phenomenon of spurious or nonsense regression,
first discovered by Yule. That there is something wrong in the preceding regression is suggested by the extremely low Durbin–Watson d value, which indicates very strong first-order autocorrelation. According to Granger and Newbold, if R² > d, the estimated regression is suspected to be spurious. The R² and the t statistic from such a spurious regression are misleading, and the t statistics are not distributed as the (Student's) t distribution and, therefore, cannot be used for testing hypotheses about the parameters.
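A simulation sketch in Stata (hypothetical data) that typically reproduces the spurious regression phenomenon; with independent random walks the slope is often "significant" and R² exceeds the Durbin-Watson d:

  clear
  set obs 500
  set seed 1
  gen t = _n
  tsset t
  gen y = sum(rnormal())   // random walk 1
  gen x = sum(rnormal())   // independent random walk 2
  regress y x
  estat dwatson            // Durbin-Watson d after regress

Regressing D.y on D.x instead usually removes the spurious "relationship", consistent with the differencing logic above.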
If the variables Yt and Xt are cointegrated, there is a long-term, or equilibrium, relationship between the two. Of course, in the short run there may be disequilibrium. Therefore, we can treat the error term in the following equation as the "equilibrium error," and we can use this error term to tie the short-run behavior of Yt to its long-run value:
ut = Yt − β1 − β2Xt − β3t ……………………………………………………(2.25)
If two variables Y and X are cointegrated, the relationship between the two can be expressed as an error correction model (ECM).
Now consider the following model:
ΔYt = α0 + α1ΔXt + α2ut−1 + εt ……………………………………………(2.26)
ECM equation (2.26) states that ΔYt depends on ΔXt and also on the equilibrium error term. If the equilibrium error term is nonzero, then the model is out of equilibrium. Suppose ΔXt is zero and ut−1 is positive. Then Yt−1 is too high to be in equilibrium; that is, Yt−1 is above its long-run equilibrium value. Since α2 is expected to be negative, the term α2ut−1 is negative and, therefore, ΔYt will be negative to restore the equilibrium. That is, if Yt is above its equilibrium value, it will start falling in the next period to correct the equilibrium error; hence the name ECM. By the same token, if ut−1 is negative (i.e., Yt is below its equilibrium value), α2ut−1 will be positive, which will cause ΔYt to be positive, leading Yt to rise in period t. Thus, the absolute value of α2 decides how quickly the equilibrium is restored.
In practice, we estimate ut−1 by the lagged residual from the cointegrating regression (2.25):
ût−1 = Yt−1 − β̂1 − β̂2Xt−1 − β̂3(t − 1)
Note that the error correction coefficient α2 is expected to be negative.
Example: ΔŶt = 0.061 + 0.29ΔXt − 0.122ût−1    R² = 0.1658,  d = 2.15
t:           (9.6753)  (6.2282)  (−3.8461)
Statistically, the ECM term is significant, suggesting that Yt adjusts to Xt with a lag; only about 12 percent of the discrepancy between the long-run and short-run values of Yt is corrected in each period.
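A two-step Engle-Granger sketch in Stata (variable names y and x are assumptions): estimate the long-run relation, check that its residual is stationary, then fit the ECM with the lagged residual.

  regress y x t                 // cointegrating regression with a trend, as in (2.25)
  predict uhat, residuals
  dfuller uhat, noconstant      // unit root test on the residuals (strictly, Engle-Granger critical values apply)
  regress D.y D.x L.uhat        // ECM (2.26); the coefficient on L.uhat should be negative

The coefficient on L.uhat estimates α2, the speed-of-adjustment parameter discussed above.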
CHAPTER THREE
3. INTRODUCTION TO SIMULTANEOUS EQUATION MODELS
3.1. The Nature of Simultaneous Equation Models
In many situations, a one-way or unidirectional cause-and-effect relationship, in which Y is determined by the X's but not vice versa, is not meaningful. This occurs if Y is determined by the X's, and some of the X's are, in turn, determined by Y. In short, there is a two-way, or simultaneous, relationship between Y and (some of) the X's, which makes the distinction between dependent and explanatory variables of
dubious value. It is better to lump together a set of variables that can be determined simultaneously by the remaining set of variables; this is precisely what is done in simultaneous-equation models. In such models there is more than one equation, one for each of the mutually, or jointly, dependent or endogenous variables. And unlike single-equation models, in simultaneous-equation models one may not estimate the parameters of a single equation without taking into account information provided by the other equations in the system.
Example:
The classic example of simultaneous causality in economics is supply and demand. Both prices
and quantities adjust until supply and demand are in equilibrium. A shock to demand or supply
causes both prices and quantities to move. As is well known, the price P of a commodity and
the quantity Q sold are determined by the intersection of the demand-and-supply curves
for that commodity.
Figure 3.1. Interdependence of Price and Quantity
Thus, assuming for simplicity that the demand-and-supply curves are linear, and adding the stochastic disturbance terms u1t and u2t, we may write the empirical demand-and-supply functions as:
Demand function: Qdt = α0 + α1Pt + α2Yt + u1t,  α1 < 0 ……………(3.1)
Supply function: Qst = β0 + β1Pt + u2t,  β1 > 0 ………………………(3.2)
Equilibrium condition: Qdt = Qst
where Qdt = quantity demanded, Qst = quantity supplied, t = time, and the α's and β's are the parameters.
Now it is not too difficult to see that P and Q are jointly dependent variables. If, for example, u1t in (3.1) changes because of changes in other variables affecting Qdt (such as wealth and tastes), the demand curve will shift upward if u1t is positive and downward if u1t is negative. These shifts are shown in Figure 3.1. As the figure shows, a shift in the demand curve changes both P and Q. Similarly, a change in u2t (because of strikes, weather, import or export restrictions, etc.) will shift the supply curve, again affecting both P and Q.
Because of this simultaneous dependence between Q and P, u1t and Pt in (3.1) and u2t and Pt in (3.2) cannot be independent.
would violate an important assumption of the classical linear regression model, namely,
the assumption of no correlation between the explanatory variable(s) and the disturbance
term.
Definitions of Some Concepts
- The variables P and Q are called endogenous variables because their values are determined within the system we have created.
- The income variable Y has a value that is given to us, determined outside this system; it is called an exogenous variable.
- Predetermined variables are exogenous variables, lagged exogenous variables and lagged endogenous variables. Predetermined variables are non-stochastic and hence independent of the disturbance terms.
- Structural model: a structural model describes the complete structure of the relationships among the economic variables. Structural equations of the model may be expressed in terms of endogenous variables, exogenous variables and disturbances (random variables).
- Reduced form of the model: the reduced form of a structural model is the model in which the endogenous variables are expressed as a function of the predetermined variables and the error term only.
Example: The following simple Keynesian model of income determination can be considered as
a structural model.
C = α + βY + u,  α > 0,  0 < β < 1 ………………………………………(3.3)
Y = C + Z …………………………………………………………………(3.4)
where: C = Consumption expenditure
Z = non-consumption expenditure
Y = national income
C and Y are endogenous variables while Z is an exogenous variable.
Reduced form of the model:
Illustration: Find the reduced form of the above structural model.
Since C and Y are endogenous variables and only Z is exogenous, we have to express C and Y in terms of Z. To do this, substitute equation (3.4), Y = C + Z, into equation (3.3):
C = α + β(C + Z) + u
C = α + βC + βZ + u
C − βC = α + βZ + u
C(1 − β) = α + βZ + u
C = α/(1 − β) + [β/(1 − β)]Z + [1/(1 − β)]u ……………………………(3.5)
Substituting (3.5) into Y = C + Z then gives:
Y = α/(1 − β) + [1/(1 − β)]Z + [1/(1 − β)]u ……………………………(3.6)
Equations (3.5) and (3.6) are called the reduced form of the above structural model. We can write this more formally as:

Structural form of equation        Reduced form of equation
C = α + βY + u                     C = α/(1 − β) + [β/(1 − β)]Z + [1/(1 − β)]u,  i.e.  C = π01 + π11Z + v1
Y = C + Z                          Y = α/(1 − β) + [1/(1 − β)]Z + [1/(1 − β)]u,  i.e.  Y = π02 + π12Z + v2

Parameters of the reduced form measure the total effect (direct and indirect) of a change in the exogenous variables on the endogenous variables.
Applying OLS to the first equation of the above structural model will result in a biased estimator because cov(X, u) = E(Xu) ≠ 0. Let us prove this. Suppose the model in question is
Y = β0 + β1X + u ……………………………………………………………(3.7)
X = α0 + α1Y + α2Z + v
Substituting (3.7) into the second equation and solving for X gives the reduced form:
X = (α0 + α1β0)/(1 − α1β1) + [α2/(1 − α1β1)]Z + (α1u + v)/(1 − α1β1) ……(3.8)
By definition,
cov(X, u) = E{[X − E(X)][u − E(u)]} ……………………………………(3.9)
Substituting the value of X from equation (3.8) into equation (3.9), and using E(u) = E(v) = 0 with Z nonstochastic, we have
X − E(X) = (α1u + v)/(1 − α1β1)
so that
cov(X, u) = E[(α1u + v)u]/(1 − α1β1) = α1E(u²)/(1 − α1β1) = α1σu²/(1 − α1β1) ≠ 0, since E(uv) = 0.
That is, the covariance between X and u is not zero. As a consequence, if OLS is applied to each equation of the model separately, the coefficients will turn out to be biased. Now let us examine how the nonzero covariance between the error term and the explanatory variable leads to bias in the OLS estimates of the parameters.
If we apply OLS to the first equation of the above structural model, Y = β0 + β1X + u, we obtain
β̂1 = Σxy/Σx² = Σx(Y − Ȳ)/Σx² = ΣxY/Σx²  (where x and y are in deviation form)
Substituting Y = β0 + β1X + u gives
β̂1 = Σx(β0 + β1X + u)/Σx² = β0Σx/Σx² + β1ΣxX/Σx² + Σxu/Σx²
But we know that Σx = 0 and ΣxX/Σx² = 1; hence,
β̂1 = β1 + Σxu/Σx² ……………………………………………………………(3.11)
Taking expected values on both sides,
E(β̂1) = β1 + E(Σxu/Σx²)
Since we have already proved that E(Xu) ≠ 0, the last term does not vanish; therefore E(β̂1) ≠ β1, and β̂1 is a biased estimator of β1.
In simultaneous equation models, the problem of identification is a problem of model formulation; it does not concern the estimation of the model. The estimation of the model depends upon the empirical data and the form of the model. If the model is not in the proper statistical form, it may turn out that the parameters cannot be uniquely estimated even though adequate and relevant data are available. In the language of econometrics, a model is said to be identified only when it is in a unique statistical form that enables us to obtain unique estimates of its parameters from the sample data.
By the identification problem we mean whether numerical estimates of the parameters of a
structural equation can be obtained from the estimated reduced-form coefficients. If this can be
done, we say that the particular equation is identified. If this cannot be done, then we say that the
equation under consideration is unidentified, or underidentified. An identified equation may be either exactly (or fully or just) identified or overidentified.
It is said to be exactly identified if unique numerical values of the structural parameters can be obtained. It is said to be overidentified if more than one numerical value can be obtained for
some of the parameters of the structural equations. The circumstances under which each of these
cases occurs will be shown in the following discussion. The identification problem arises
because different sets of structural coefficients may be compatible with the same set of data. To
put the matter differently, a given reduced-form equation may be compatible with different
structural equations or different hypotheses (models), and it may be difficult to tell which
particular hypothesis (model) we are investigating. In the remainder of this section we consider
several examples to show the nature of the identification problem.
Equations (3.14) and (3.15) are reduced-form equations. Now our demand and supply model contains four structural coefficients, α0, α1, β0 and β1. The reduced-form coefficients contain all four structural parameters, but there is no way in which the four structural unknowns can be estimated from only two reduced-form coefficients.
There is an alternative way of looking at the identification problem. Suppose we multiply Eq. (3.12) by λ (0 ≤ λ ≤ 1) and Eq. (3.13) by (1 − λ) to obtain the following equations:
λQt = λα0 + λα1Pt + λu1t
(1 − λ)Qt = (1 − λ)β0 + (1 − λ)β1Pt + (1 − λ)u2t
Adding these two equations gives the following linear combination of the original demand and supply equations:
Qt = γ0 + γ1Pt + wt ……………………………………………………………(3.16)
where γ0 = λα0 + (1 − λ)β0,  γ1 = λα1 + (1 − λ)β1,  and wt = λu1t + (1 − λ)u2t.
The "mongrel" equation (3.16) is observationally indistinguishable from either the supply or the demand equation, because all of them involve the regression of Q on P. For an equation to be identified, that is, for its parameters to be estimated, it must be shown that the given set of data will not produce a structural equation that looks similar in appearance to the one in which we are interested. If we set out to estimate the demand function, we must show that the given data are not consistent with the supply function or some mongrel equation.
A function (an equation) belonging to a system of simultaneous equations is identified if it has a unique statistical form, i.e., if there is no other equation in the system, or formed by algebraic manipulation of the other equations of the system, that contains the same variables as the function (equation) in question.
Identification problems do not arise only in two-equation models. Using the above procedure, we can check for identification easily if we have two or three equations in a given simultaneous equation model. For a model of n equations, however, such a procedure is very cumbersome. In general, for any number of equations in a given simultaneous equation model, there are two conditions that need to be satisfied to determine whether the model is identified. In the following section we see these formal conditions for identification.
i. The order condition for identification
For an equation to be identified, the number of variables (endogenous and predetermined) excluded from it must be at least equal to the number of equations in the model less one:
(K − M) ≥ (G − 1)
where G = the number of equations in the model (equal to the number of endogenous variables), K = the total number of variables in the model (endogenous and predetermined), and M = the number of variables included in the particular equation.
For example, if a system contains 10 equations with 15 variables, ten endogenous and five
exogenous, an equation containing 11 variables is not identified, while another containing 5
variables is identified.
a. For the first equation we have
G = 10, K = 15, M = 11
Order condition: (K − M) ≥ (G − 1)?  Here (15 − 11) = 4 < (10 − 1) = 9; that is, the order condition is not satisfied.
The order condition for identification is necessary for a relation to be identified, but it is not sufficient; that is, it may be fulfilled in a particular equation and yet the relation may not be identified.
ii. The rank condition for identification
The rank condition states that: in a system of G equations any particular equation is identified if
and only if it is possible to construct at least one non-zero determinant of order (G-1) from the
coefficients of the variables excluded from that particular equation but contained in the other
equations of the model. The practical steps for tracing the identifiability of an equation of a structural model may be outlined as follows.
Firstly. Write the parameters of all the equations of the model in a separate table, noting that the
parameter of a variable excluded from an equation is equal to zero.
For example, let a structural model be:
y1 = 3y2 − 2x1 + x2 + u1
y2 = y3 + x3 + u2
y3 = y1 − y2 + 2x3 + u3
where the y's are the endogenous variables and the x's are the predetermined variables. This model may be rewritten in the form:
y1 − 3y2 + 0y3 + 2x1 − x2 + 0x3 = u1
0y1 + y2 − y3 + 0x1 + 0x2 − x3 = u2
−y1 + y2 + y3 + 0x1 + 0x2 − 2x3 = u3
Ignoring the random disturbances, the table of the parameters of the model is as follows:

            y1    y2    y3    x1    x2    x3
1st eq.      1    −3     0     2    −1     0
2nd eq.      0     1    −1     0     0    −1
3rd eq.     −1     1     1     0     0    −2
Secondly. Strike out the row of coefficients of the equation which is being examined for
identification. For example, if we want to examine the identifiability of the second equation of
the model we strike out the second row of the table of coefficients.
Thirdly. Strike out the columns in which a non-zero coefficient of the equation being examined appears. By deleting the relevant row and columns we are left with the coefficients of variables not included in the particular equation but contained in the other equations of the model. For example, if we are examining the second equation of the system for identification, we strike out the second, third and sixth columns of the above table (those of y2, y3 and x3), thus obtaining the following table:

            y1    x1    x2
1st eq.      1     2    −1
3rd eq.     −1     0     0
Fourthly. Form the determinant(s) of order (G -1) and examine their value. If at least one of
these determinants is non-zero, the equation is identified. If all the determinants of order (G-1)
are zero, the equation is underidentified .
In the above example of the exploration of the identifiability of the second structural equation, we have three determinants of order (G − 1) = 3 − 1 = 2 (the symbol Δ stands for 'determinant'):
Δ1 = det[1 2; −1 0] = 2,  Δ2 = det[1 −1; −1 0] = −1,  Δ3 = det[2 −1; 0 0] = 0
We see that we can form two non-zero determinants of order G − 1 = 3 − 1 = 2; hence the second equation of our system is identified.
Fifthly. To see whether the equation is exactly identified or overidentified, we use the order condition (K − M) ≥ (G − 1). With this criterion, if the equality sign holds, that is, if (K − M) = (G − 1), the equation is exactly identified. If the inequality sign holds, that is, if (K − M) > (G − 1), the equation is overidentified.
In the case of the second equation we have:
G = 3, K = 6, M = 3
and the counting rule (K − M) versus (G − 1) gives (6 − 3) > (3 − 1). Therefore the second equation of the model is overidentified.
3.5 Estimation of Simultaneous Equations Models
1. Indirect Least Squares (ILS) Method
In this method, we first obtain the estimates of the reduced form parameters by applying
OLS to the reduced form equations and then indirectly get the estimates of the parameters of
the structural model. This method is applied to exactly identified equations.
Steps:
a. Obtain the reduced form equations (that is, express the endogenous variables in terms of
predetermined variables).
b. Apply OLS to the reduced form equations individually. OLS will yield consistent estimates of the reduced form parameters (since each equation involves only nonstochastic (predetermined) variables as 'independent' variables).
c. Obtain (or recover back) the estimates of the original structural coefficients using the
estimates in step (b).
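A minimal ILS sketch in Stata for the Keynesian model above, assuming variables c (consumption) and z (non-consumption expenditure); the names are assumptions. Since the reduced-form slope is π1 = β/(1 − β), the structural β is recovered as π1/(1 + π1):

  regress c z                              // reduced form (3.5)
  nlcom (beta:  _b[z]/(1 + _b[z]))         // structural MPC, from pi1 = beta/(1 - beta)
  nlcom (alpha: _b[_cons]/(1 + _b[z]))     // structural intercept, alpha = pi0/(1 + pi1)

nlcom also delivers delta-method standard errors for the recovered structural parameters.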
2. Two-Stage Least Squares (2SLS) Method
The 2SLS procedure is generally applicable for the estimation of over-identified equations, as it provides unique estimators.
Steps:
a) Estimate the reduced form equations by OLS and obtain the predicted Ŷi.
b) Replace the right-hand-side endogenous variables in the structural equations by the corresponding Ŷi and estimate the equations by OLS.
Consider the following simultaneous equations model:
Y1 = α1 + β1Y2 + γ1X1 + γ2X2 + u1 ……………………………………(a)
Y2 = α2 + β2Y1 + γ3X3 + u2 ………………………………………………(b)
where Y1 and Y2 are endogenous while X1, X2 and X3 are predetermined.
The 2SLS procedure for estimating equation (b) (which is over-identified) is: first estimate the reduced form equation by OLS, that is, regress Y1 on X1, X2 and X3 and obtain Ŷ1; then replace Y1 by Ŷ1 and estimate equation (b) by OLS, that is, apply OLS to:
Y2 = α2 + β2Ŷ1 + γ3X3 + u2
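A sketch of both routes in Stata (variable names y1, y2, x1, x2, x3 are assumptions):

  regress y1 x1 x2 x3                  // stage 1: reduced form for Y1
  predict y1hat, xb
  regress y2 y1hat x3                  // stage 2 (standard errors not corrected)
  ivregress 2sls y2 x3 (y1 = x1 x2)    // one-step 2SLS with correct standard errors

In practice ivregress is preferred, since the manual second stage reports incorrect standard errors.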
CHAPTER FOUR
4. INTRODUCTION TO PANEL DATA REGRESSION MODELS
4.1. Introduction
In panel data the same cross-sectional unit (say a family or a firm or a state) is surveyed over
time. In short, panel data have space as well as time dimensions.
Hypothetical examples:
- Data on 200 Ethiopian Somali regional state schools in 2004 and again in 2005, for 400 observations in total.
- Data on 9 regional states of Ethiopia, each state observed in 5 years, for a total of 45 observations.
- Data on 1000 individuals, in four different months, for 4000 observations in total.
There are other names for panel data, such as pooled data (pooling of time series and cross-sectional observations), combination of time series and cross-section data (cross-sectional time-series data), micro panel data, and longitudinal data (a study over time of a variable or group of subjects).
Why Should We Use Panel Data? Their Benefits and Limitations
Baltagi (2005) lists several benefits from using panel data, including the following.
1. Controlling for individual heterogeneity. Panel data allow you to control for variables you cannot observe or measure, like cultural factors or differences in business practices across companies, or variables that change over time but not across entities (i.e., national policies, federal regulations, international agreements, etc.). That is, panel data account for individual heterogeneity. Time-series and cross-section studies that do not control for this heterogeneity run the risk of obtaining biased results.
2. Panel data give more informative data: more variability, less collinearity among the variables, more degrees of freedom and more efficiency. Time-series studies, by contrast, are plagued with multicollinearity.
3. Panel data are better able to study the dynamics of adjustment. Cross- sectional
distributions that look relatively stable hide a multitude of changes.
4. Panel data are better able to identify and measure effects that are simply not detectable in
pure cross- section or pure time- series data.
5. Panel data models allow us to construct and test more complicated behavioral models
than purely cross- section or time- series data. For example, technical efficiency is
better studied and modeled with panels.
6. Micro panel data gathered on individuals, firms and households may be more
accurately measured than similar variables measured at the macro level. Biases
resulting from aggregation over firms or individuals may be reduced or eliminated.
Limitations of panel data include:
1. Design and data collection problems. These include problems of coverage
(incomplete account of the population of interest), non response (due to lack of
cooperation of the respondent or because of interviewer error), recall (respondent not
remembering correctly), frequency of interviewing, interview spacing, reference
period, the use of bounding, and time-in-sample bias.
2. Distortions of measurement errors. Measurement errors may arise because of
faulty responses due to unclear questions, memory errors, deliberate distortion of
responses (e.g. prestige bias), inappropriate informants, misrecording of responses
and interviewer effects .
3. Selectivity problems. These include:
(a) Self-selectivity. People choose not to work because the reservation wage is higher than the offered wage. In this case we observe the characteristics of these individuals but not their wage. Since only their wage is missing, the sample is censored. However, if we do not observe all data on these people, this would be a truncated sample.
(b) Non response. This can occur at the initial wave of the panel due to refusal to
participate, nobody at home, untraced sample unit, and other reasons. Item (or partial)
non response occurs when one or more questions are left unanswered or are found
not to provide a useful response.
(c) Attrition. While non response occurs also in cross- section studies, it is a more
serious problem in panels because subsequent waves of the panel are still
subject to nonresponse. Respondents may die, or move, or find that the cost of
responding is high.
4. Short time- series dimension. Typical micro panels involve annual data covering a
short time span for each individual. This means that asymptotic arguments rely
crucially on the number of individuals tending to infinity. Increasing the time span
of the panel is not without cost either. In fact, this increases the chances of
attrition and increases the computational difficulty for limited dependent variable
panel data models.
5. Cross- section dependence. Macro panels on countries or regions with long time
series that do not account for cross- country dependence may lead to misleading
inference.
Notation for panel data
A double subscript is used to distinguish entities (states, family, country, individuals,
etc.) and time periods.
Consider the following simple panel data regression model:
Yit = β0 + β1X1it + β2X2it + uit ……………………………………………4.1
where i = entity (state) and n = number of entities, so i = 1, …, n; t = time period (year, month, quarter, etc.) and T = number of time periods, so t = 1, …, T.
Panel data with k regressors:
Yit = β0 + β1X1it + β2X2it + … + βkXkit + uit ……………………………4.2
the individual may impact or bias the predictor or outcome variables, and we need to control for this. This is the rationale behind the assumption of correlation between the entity's error term and the predictor variables. FE removes the effect of those time-invariant characteristics from the predictor variables, so we can assess the predictors' net effect.
Another important assumption of the FE model is that those time-invariant characteristics are unique to the individual and should not be correlated with other individual characteristics. Each entity is different; therefore the entity's error term and the constant (which captures individual characteristics) should not be correlated with the others. If the error terms are correlated, then FE is not suitable, since inferences may not be correct, and you need to model that relationship (probably using random effects).
Entity-demeaned OLS Regression
Think of the following two-variable panel regression model in fixed effects form:
Yit = αi + β1X1it + uit ……………………………………………………4.3
αi is called an "entity fixed effect" or "entity effect": it is the constant (fixed) effect of being in entity i.
The entity averages satisfy:
(1/T)Σt Yit = αi + β1(1/T)Σt X1it + (1/T)Σt uit
Deviation from entity averages:
Yit − Ȳi = β1(X1it − X̄1i) + (uit − ūi)
that is,
Ỹit = β1X̃it + ũit,  where Ỹit = Yit − Ȳi, X̃it = X1it − X̄1i, and ũit = uit − ūi.
Then we apply OLS to Ỹit = β1X̃it + ũit to estimate β1.
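A fixed-effects sketch in Stata, using the fatality/beertax example that appears later in this chapter (the panel identifiers state and year are assumptions):

  xtset state year
  xtreg fatality beertax, fe             // within (entity-demeaned) estimator
  areg fatality beertax, absorb(state)   // equivalent LSDV-style estimate

Both commands give the same slope on beertax; xtreg, fe is the standard panel route.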
3.4.2. The Random Effects (RE) Approach
If you believe that some omitted variables may be constant over time but vary among panels, while others may be fixed among panels but vary over time, then you can apply a random effects regression model. Random effects assumes that the entity's error term is not correlated with the predictors, which allows time-invariant variables to play a role as explanatory variables. In random effects you need to specify those individual characteristics that may or may not influence the predictor variables.
The basic idea of the random effects model is to start with (4.3):
Yit = αi + β1Xit + uit ……………………………………………………4.3a
Instead of treating αi as fixed, we assume that it is a random variable with a mean value of α (no subscript i here). The intercept value for an individual entity can then be expressed as:
αi = α + εi,  i = 1, …, n ……………………………………………………4.4
where εi is a random error term with a mean value of zero and variance of σε².
What we are essentially saying is that the entities included in our sample are a drawing from a much larger universe of such entities, that they have a common mean value for the intercept (= α), and that the individual differences in the intercept values of each entity are reflected in the error term εi.
Substituting (4.4) into (4.3a), we get:
Yit = α + β1Xit + εi + uit ……………………………………………………4.5
    = α + β1Xit + wit,  where wit = εi + uit
In the random effects model (REM), or error components model (ECM), it is assumed that the intercept of an individual unit is a random drawing from a much larger population with a constant mean value. The individual intercept is then expressed as a deviation from this constant mean value. One advantage of ECM over FEM is that it is economical in degrees of freedom, as we do not have to estimate N cross-sectional intercepts. We need only estimate the mean value of the intercept and its variance. ECM is appropriate in situations where the (random) intercept of each cross-sectional unit is uncorrelated with the regressors.
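The corresponding Stata sketch (same assumed variables as before):

  xtreg fatality beertax, re    // GLS random effects estimator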
3.5. Choosing Between Fixed and Random Effects
If you aren't sure which model to use, fixed effects or random effects, you can run a Hausman test. To run a Hausman test in Stata, you need to save the coefficients from each of the models and use the stored results in the test. To store the coefficients, you can use the "estimates store" command.
The Hausman test tests the null hypothesis that the coefficients estimated by the efficient random effects estimator are the same as the ones estimated by the consistent fixed effects estimator. If they are, then it is safe to use random effects. If you get a statistically significant p-value, however, you should use fixed effects. In this example, the p-value is statistically significant; therefore, fixed effects would be more appropriate in this case.
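A sketch of the full sequence in Stata (assumed variables as above):

  xtreg fatality beertax, fe
  estimates store fe
  xtreg fatality beertax, re
  estimates store re
  hausman fe re       // null: RE coefficients = FE coefficients

A small p-value rejects the null and points to the fixed effects model.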
                    Fixed Effect Model                    Random Effect Model
Functional form     Yit = (α + μi) + β1Xit + uit          Yit = α + β1Xit + (μi + uit)
Intercepts          Varying across groups and/or times    Constant
Error variances     Constant                              Varying across groups and/or times
Slopes              Constant                              Constant
Estimation          LSDV, within-effect method            GLS, FGLS
Hypothesis test     Incremental F test                    Breusch-Pagan LM test
Other Tests/Diagnostics
Testing for Time -fixed Effects
It is a joint test to see if the dummies for all years are equal to 0; if they are, then no time fixed effects are needed.
Here we fail to reject the null that all years' coefficients are jointly equal to zero; therefore no time fixed effects are needed.
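A sketch of the joint test in Stata (assumed variables as above):

  xtreg fatality beertax i.year, fe
  testparm i.year     // null: all year dummies are jointly zero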
Testing for Random Effects:
Breusch-Pagan Lagrange Multiplier (LM)
The LM test helps you decide between a random effects regression and a simple OLS regression. The null hypothesis in the LM test is that variances across entities are zero, that is, that there is no significant difference across units (i.e., no panel effect).
Here we reject the null and conclude that random effects is appropriate. That is, there is evidence of significant differences across countries; therefore you should run a random effects regression.
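In Stata the LM test is run right after the random effects estimation:

  xtreg fatality beertax, re
  xttest0             // Breusch-Pagan LM test; null: var(entity effect) = 0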
Testing for Cross-Sectional Dependence/Contemporaneous Correlation:
Using Breusch-Pagan LM Test of Independence
According to Baltagi, cross-sectional dependence is a problem in macro panels with long time series (over 20-30 years). This is not much of a problem in micro panels (few years and a large number of cases). The null hypothesis in the B-P/LM test of independence is that residuals across entities are not correlated.
The command to run this test is xttest2 (run it after xtreg, fe):
xtreg fatality beertax, fe
xttest2
Testing for Cross-Sectional Dependence/Contemporaneous Correlation:
Using Pesaran CD Test
The Pesaran CD (cross-sectional dependence) test is used to test whether the residuals are correlated across entities. Cross-sectional dependence can lead to bias in test results (also called contemporaneous correlation). The null hypothesis is that residuals are not correlated.
The command for the test is xtcsd; you have to install it by typing ssc install xtcsd:
xtreg fatality beertax, fe
xtcsd, pesaran    (alternatives: xtcsd, frees and xtcsd, friedman)
Since there is cross-sectional dependence in our model, it is suggested to use Driscoll and Kraay standard errors.
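A sketch using the user-written xtscc command (install with ssc install xtscc):

  xtscc fatality beertax, fe    // FE estimates with Driscoll-Kraay standard errors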
A test for heteroscedasticity is available for the fixed-effects model using the command xttest3. The null is homoscedasticity (constant variance). Above we reject the null and conclude heteroscedasticity. Serial correlation tests apply to macro panels with long time series (over 20-30 years); serial correlation is not a problem in micro panels (with very few years). Serial correlation causes the standard errors of the coefficients to be smaller than they actually are, and the R-squared to be higher. A Lagrange multiplier test for serial correlation is available using the command xtserial. The null is no serial correlation. Above we reject the null and conclude that the data have first-order autocorrelation.
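A sketch of both diagnostics (xttest3 and xtserial are user-written: ssc install xttest3, ssc install xtserial):

  xtreg fatality beertax, fe
  xttest3                       // modified Wald test; null: homoscedastic errors
  xtserial fatality beertax     // Wooldridge test; null: no first-order serial correlation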
Testing for Unit Roots/Stationarity
The Levin-Lin-Chu (2002), Harris-Tzavalis (1999), Breitung (2000; Breitung and Das 2005), Im-Pesaran-Shin (2003), and Fisher-type (Choi 2001) tests have as the null hypothesis that all the panels contain a unit root. The Hadri (2000) Lagrange multiplier (LM) test has as the null hypothesis that all the panels are (trend) stationary. The top of the output for each test makes explicit the null and alternative hypotheses. Options allow you to include panel-specific means (fixed effects) and time trends in the model of the data-generating process.
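These tests are all available through Stata's xtunitroot command; a sketch with an assumed panel variable lngdp:

  xtunitroot llc lngdp      // Levin-Lin-Chu; null: all panels contain unit roots
  xtunitroot ips lngdp      // Im-Pesaran-Shin; null: all panels contain unit roots
  xtunitroot hadri lngdp    // Hadri LM; null: all panels are stationary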