
MODULE FOR QUANTITATIVE METHODS IN

ECONOMICS WITH APPLICATION


COMPILED BY:
1. GETACHEW PETROS (ASSISTANT PROFESSOR)
2. LAMBAMO AREGA (ASSISTANT PROFESSOR)

WOLAITA SODO, ETHIOPIA

Contents
1.1. Nature and Importance of Mathematical Economics
1.3. Advantages of the Mathematical Approaches
1.4. Review of Economic Model and Analysis
1.5. Static (Equilibrium) Analysis in Economics
1.5.1. Partial Market Equilibrium - A Linear Model
1.5.2. Partial Market Equilibrium - A Nonlinear Model
1.5.3. General Market Equilibrium
1.5.4. Equilibrium in National-Income Analysis
CHAPTER TWO
REVISION ON CALCULUS AND LINEAR ALGEBRA
2.1. Differential Calculus: Fundamental Techniques
2.1.1. The Concept of the Derivative
2.1.2. The Rules or Theorems of Derivatives
a) The constant rule
b) The simple power rule
c) The coefficient rule
d) The sum or difference rule
e) The product rule
g) Differentiation of a composite function
i) Derivatives of Logarithmic and Exponential Functions
2.2. Integral Calculus: Techniques and Applications of Integral Calculus
2.2.1. The Indefinite Integral
2.2.2. Techniques of Integration & the General Power Rule
2.2.3. Definite Integrals
2.2.4. Continuous Money Streams
2.2.4.1. Continuous Money Streams within a Period
2.2.5. Improper Integrals
2.3.1. Matrix Operations
2.3.2. Determinants and Inverse of a Matrix
2.3.2.1. The Concept of Determinants, Minor and Cofactor
2.3.2.2. The Inverse of a Matrix
2.3.3. Matrix Representation of Linear Equations
2.3.4. Input-Output Analysis (Leontief Model)
2.3.5. Linear Programming
3.2.2. Linear Approximation
3.2.6. The Intermediate Value Theorem
3.3.2. The Multivariate Chain Rule
CHAPTER FOUR
UNCONSTRAINED OPTIMIZATION
4.1. Functions with One Variable
4.1.1. The Concept of Optimum (Extreme) Value
4.3. Implicit Functions and Unconstrained Envelope Theory
CHAPTER FIVE
5. CONSTRAINED OPTIMIZATION
CHAPTER SIX
COMPARATIVE STATIC ANALYSIS
CHAPTER SEVEN
DYNAMIC OPTIMIZATION
7.1.1. First-Order Linear Differential Equations
7.2. The Dynamic Stability of Equilibrium
ECONOMETRICS ONE
1.1. Definition and Scope of Econometrics
1.5. Methodology of Econometrics
Chapter Two
2.1. THE CLASSICAL REGRESSION ANALYSIS
2.1. Stochastic and Non-stochastic Relationships
2.2. Simple Linear Regression Model
2.2.1. Assumptions of the Classical Linear Stochastic Regression Model
2.2.2. Methods of Estimation
2.2.2.1. The Ordinary Least Squares (OLS) Method
2.2.2.2. Estimation of a Function with Zero Intercept
2.2.2.3. Statistical Properties of Least Squares Estimators
2.2.2.4. Statistical Tests of Significance of the OLS Estimators (First-Order Tests)
2.6. Confidence Intervals and Hypothesis Testing
CHAPTER THREE
The Multiple Linear Regression Analysis
3.1. Introduction
3.2. Assumptions of Multiple Regression Models
3.3. A Model with Two Explanatory Variables
3.3.1. Estimation of Parameters of the Two-Explanatory-Variables Model
3.3.2. Coefficient of Multiple Determination (R2)
Chapter Four
Violations of Basic Classical Assumptions
4.3.4.1. Tests Based on Auxiliary Regressions
4.3.4.3. Test of Multicollinearity Using Eigenvalues and the Condition Index
4.3.4.4. Test of Multicollinearity Using Tolerance and the Variance Inflation Factor
PART TWO
ECONOMETRICS TWO
CHAPTER ONE
1.2.3. Regression on One Quantitative Variable and Two Qualitative Variables
1.2.4. Interactions Among Dummy Variables
1.2.5. Testing for Structural Stability of Regression Models
1.3. Dummy as Dependent Variable
1.3.2. The Logit and Probit Models
CHAPTER TWO
2.6. The Phenomenon of Spurious Regression
CHAPTER THREE
CHAPTER FOUR
CHAPTER ONE
INTRODUCTION TO MATHEMATICAL ECONOMICS

1.1. Nature and Importance of Mathematical Economics


Mathematics evolved through the use of abstraction and logical reasoning, from counting,
calculation, measurement, and the systematic study of positions, shapes and motions of physical
objects. Mathematicians explore such concepts, aiming to formulate new conjectures and
establish their truth by rigorous deduction from appropriately chosen axioms and definitions.

1.2. Mathematical Versus Non-Mathematical Economics


Since mathematical economics is merely an approach to economic analysis, it should not and
does not differ from the nonmathematical approach to economic analysis in any fundamental
way. The purpose of any theoretical analysis, regardless of the approach, is always to derive a set
of conclusions or theorems from a given set of assumptions or postulates via a process of
reasoning.

The major difference between these two approaches is that in the former, the assumptions and
conclusions are stated in mathematical symbols rather than words and in equations rather than
sentences. Symbols are more convenient to use in deductive reasoning, and are certainly more
conducive to conciseness and precision of statement. The choice between literary logic and
mathematical logic is a matter of little importance, but mathematics has the advantage of forcing
analysts to make their assumptions explicit at every stage of reasoning.

But an economist equipped with the tools of mathematics is like a traveler with a motorboat or a
ship at his disposal: he can choose the vehicle that best suits the journey. As a result, most
economic researchers apply the tools of mathematics extensively in economic reasoning.

The term mathematical economics is also different from econometrics in that the former is
concerned with the application of mathematics to the purely theoretical aspects of economics
analysis, with little or no concern about statistical problems such as errors of measurement of
the variables under study, while the latter focuses on the measurement and analysis of economic
data; hence it deals with the study of empirical observations using statistical methods of
estimation and hypothesis testing.

1.3. Advantages of the Mathematical Approaches


Mathematical economics is of paramount importance for economic analyses.
Specifically, the approach has the following advantages:
a) The language used is more precise and concise;
b) There is a wealth of mathematical theorems at our service that make things simple and
life less costly; hence the analysis is more rigorous;
c) By forcing us to state explicitly all our assumptions as a prerequisite to the use of
mathematical theorems, it keeps us from the pitfall of unintentionally adopting unwanted
implicit assumptions;
d) It helps us to understand relationships among two or more economic variables simply and
neatly, where the geometric and literary approaches run a high risk of committing
mistakes;
e) It allows us to treat the general n-variable case.

1.4. Review of Economic Model and Analysis


As you have covered in the course introduction to economics, an economic theory is necessarily
generalization or abstraction from the real world. The complexity of the real economy does not
make it possible for us to understand all the interrelationships at once; nor, for that matter, are all
these interrelationships of equal importance for the understanding of the particular economic
phenomenon under study. The wise procedure is, hence, to pick out what appeals to our reason to
be the primary factors and relationships relevant to our problem and to focus our attention on
these alone. The main aim behind building economic models is to describe how the economy works
and to obtain somehow valid predictions about economic variables. Different models are built for
different purposes. Some models are designed to investigate the equilibrium value of a variable
as an ultimate end, while others investigate the movement of a variable (or variables) over
time. Thus, in economics, we have three types of analyses:
i. Static (equilibrium) analysis
ii. Comparative static analysis
iii. Dynamic analysis
1.5. Static (Equilibrium) Analysis in Economics
In essence, equilibrium for a specific model is a situation that is characterized by a lack of
tendency to change. It is for this reason that the analysis of equilibrium is referred to as static.
The fact that equilibrium implies no tendency to change may tempt one to conclude that
equilibrium necessarily constitutes a desirable or ideal state of affairs; this, however, need
not be the case.

For the present analysis we will focus on two types of the equilibrium. One is the equilibrium
attained by a market under given demand and supply conditions. The other is the equilibrium of
national income under given conditions of consumption and investment patterns.

1.5.1. Partial Market Equilibrium - A Linear Model


In a static-equilibrium model, the standard problem is that of finding the set of values of the
endogenous variables which will satisfy the equilibrium conditions of the model.

Partial-Equilibrium Market Model-- a model of price determination in an isolated market.

Three variables

Qd = the quantity demanded of the commodity;

Qs = the quantity supplied of the commodity;

P = the price of the commodity.

The Equilibrium Condition: Qd = Qs

The model is

Qd = a - bp (a, b > 0)

Qs = -c + dp (c, d > 0)

The slope of Qd = -b, the vertical intercept = a.

The slope of Qs = d, the vertical intercept = -c.

Note that, contrary to the usual practice, quantity rather than price is plotted vertically

when these functions are graphed with these slopes and intercepts.

One way of finding the equilibrium is by successive elimination of variables and

equations through substitution.

From Qs = Qd, we have

a - bp = -c + dp

and thus

(b + d) p = a + c.

Since b + d > 0, the equilibrium price is

P* = (a + c) / (b + d)

The equilibrium quantity can be obtained by substituting P* into either Qs or Qd:

Q* = (ad - bc) / (b + d)

Since the denominator (b + d) is positive, the positivity of Q* requires that the numerator
(ad - bc) > 0. Thus, to be economically meaningful, the model should contain the additional
restriction that ad > bc.
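
For readers who want to experiment, here is a minimal Python sketch of this solution; the
parameter values a, b, c, d below are illustrative assumptions, not taken from the module.

    # Equilibrium of the linear market model (illustrative parameters).
    a, b = 10.0, 2.0   # demand: Qd = a - b*p
    c, d = 2.0, 3.0    # supply: Qs = -c + d*p
    p_star = (a + c) / (b + d)        # equilibrium price
    q_star = (a*d - b*c) / (b + d)    # equilibrium quantity
    print(p_star, q_star)             # 2.4 5.2

Note that ad = 30 > bc = 4 here, so the computed quantity is positive, as the restriction
requires.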

1.5.2. Partial Market Equilibrium - A Nonlinear Model


The partial market model can be nonlinear. Suppose a model is given by

Qd = 4 - p^2

Qs = 4p - 1

As previously stated, this system of three equations can be reduced to a single equation by

substitution: 4 - p^2 = 4p - 1, or p^2 + 4p - 5 = 0, which is a quadratic equation.

In general, given a quadratic equation in the form ax^2 + bx + c = 0 (a ≠ 0), its two roots can
be obtained from the quadratic formula:

x = (-b ± √(b^2 - 4ac)) / (2a)

where the "+" part of the "±" sign yields x1 and the "-" part yields x2.

Thus, by applying the quadratic formula to p^2 + 4p - 5 = 0, we have p1 = 1 and p2 = -5, but only
the first is economically admissible, as negative prices are ruled out.
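
As a quick numerical check, the quadratic can also be solved with numpy; the root-selection step
in this sketch simply discards the negative price.

    import numpy as np

    roots = np.roots([1, 4, -5])     # coefficients of p^2 + 4p - 5
    p_star = roots[roots > 0][0]     # keep the economically admissible root
    print(p_star)                    # 1.0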

1.5.3. General Market Equilibrium


In the above, we have discussed the equilibrium of an isolated market, wherein the Qd and Qs

of a commodity are functions of the price of that commodity alone. In the real world, there

would normally exist many substitutes and complementary goods. Thus, a more realistic model

for the demand and supply functions of a commodity should take into account the effects not

only of the price of the commodity itself but also of the prices of other commodities.

Consequently, the equilibrium condition of an n-commodity market model will involve n

equations, one for each commodity, of the form

    Ei = Qdi - Qsi = 0      (i = 1, 2, ..., n)

where Qdi and Qsi are the demand and supply functions of commodity i. Solving the n equations
for the prices Pi, we obtain the n equilibrium prices Pi*, if a solution does indeed exist. The
equilibrium quantities Qi* may then be derived from the demand or supply functions.

1.5.4. Equilibrium in National-Income Analysis


The equilibrium analysis can be also applied to other areas of Economics. As a simple

example, we may cite the familiar Keynesian national-income model,

Y = C + I0 + G0     (equilibrium condition)

C = Ca + bY         (the consumption function)

where Y and C stand for the endogenous variables national income and consumption

expenditure, respectively, Ca stands for autonomous consumption, and I0 and G0 represent the
exogenously determined investment and government expenditures.

Solving these two linear equations, we obtain the equilibrium national income and

consumption expenditure. From Y = Ca + bY + I0 + G0, collecting like terms gives:

Y* = (Ca + I0 + G0) / (1 - b)

C* = Ca + bY* = Ca + b(Ca + I0 + G0) / (1 - b)
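
A small Python sketch of this solution follows; the numbers chosen for Ca, b, I0 and G0 are
illustrative assumptions.

    Ca, b = 100.0, 0.8      # autonomous consumption, marginal propensity to consume
    I0, G0 = 50.0, 30.0     # exogenous investment and government expenditure
    Y_star = (Ca + I0 + G0) / (1 - b)   # equilibrium national income
    C_star = Ca + b * Y_star            # equilibrium consumption
    print(Y_star, C_star)               # 900.0 820.0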

CHAPTER TWO
REVISION ON CALCULUS AND LINEAR ALGEBRA
2.1. Differential Calculus: Fundamental Techniques

2.1.1. The Concept of the Derivative.


A derivative is a function derived from an original function, say y = f(x), that shows the
rate of change in the dependent variable (y) due to a very small change in the independent
variable (x). If f is a function defined by y = f(x), then the derivative of f(x) for any value
of x, denoted by dy/dx, y' or f'(x), is the instantaneous rate of change of f(x), given by:

    f'(x) = lim (h -> 0) [f(x + h) - f(x)] / h

As you might remember from the previous discussion, the derivative is also the slope of the
tangent line to f (x) at a point. Before we directly go to the discussion of the rules of derivatives,
let’s see some examples.
Example: For f(x) = x^2, find f'(x).
Solution
    f'(x) = lim (h -> 0) [(x + h)^2 - x^2] / h = lim (h -> 0) (2x + h) = 2x

Note that,

i. The derivative, which is one of the most fundamental concepts in calculus, is the same
as the following two concepts.
a. Slope of a line tangent to a curve at x
b. Instantaneous rate of change of f(x) at x
ii. The process of obtaining f'(x) from f(x) is known as differentiation, and if the derivative
exists, then the function is differentiable at a point or over an interval. In the following
section, we will present the rules of derivatives, which will largely simplify our
calculation of the derivative.
2.1.2. The Rules or Theorems of Derivatives
In this section, we formally state the different rules of derivatives that are important in several
problems of getting the derivative. These are the constant rule, the simple power rule, the
coefficient rule, sum/difference rule, product rule, quotient rule, chain rule, and others.
a) The constant rule
If c is constant and if f(x)=c, then f’(x)=0
Examples. If f (x)=2, then f ‘(x)=0
b) The simple power rule
If f(x) = x^n, where n is any real number, then f'(x) = n·x^(n-1).
Example: If f(x) = x^3, then f'(x) = 3x^2.
c) The coefficient rule
For any constant c and function m(x), the derivative of f(x) = c·m(x) is f'(x) = c·m'(x).

Example: If f(x) = 5x^2, then f'(x) = 5(2x) = 10x.

d) The sum or difference rule
If f(x) = u(x) ± v(x), then f'(x) = u'(x) ± v'(x).

Example: If f(x) = x^3 + 2x, then the derivative is f'(x) = 3x^2 + 2.

e) The product rule
If f(x) = u(x)·v(x), then f'(x) = u'(x)·v(x) + u(x)·v'(x).

Although most of the time a function is given as a sum or difference of terms, there are times
when functions are presented as products.

Example
1. If f(x) = (x^2)(x^3 + 1), then f'(x) = 2x(x^3 + 1) + x^2(3x^2) = 5x^4 + 2x.
f) The quotient rule
If u(x) and v(x) are two functions and f(x) = u(x)/v(x) is their quotient, then

    f'(x) = [u'(x)·v(x) - u(x)·v'(x)] / [v(x)]^2

Example: If Y = x^2 / (x + 1), then
Solution
    Y' = [2x(x + 1) - x^2] / (x + 1)^2 = (x^2 + 2x) / (x + 1)^2

g) Differentiation of a composite function

If y = f(u) and u = g(x), then dy/dx = (dy/du)·(du/dx) = f'(g(x))·g'(x).

Example: Y = (x^2 + 1)^5
Solution: Let u = x^2 + 1 so that Y = u^5. Then
    dY/dx = (dY/du)·(du/dx) = 5u^4 · 2x = 10x(x^2 + 1)^4

h) Derivatives of Implicit functions


Implicit functions are those functions whose dependent and independent variables are not
explicitly stated. That is, we don’t know whether a given variable is a cause or an effect variable
without any a priori information when we encounter implicit functions.
When the dependent and independent variables are not explicitly stated, one may not easily come
up with dy/dx. But using implicit differentiation one can calculate dy/dx.
Suppose we have an implicit function given by F(x, y) = 0.
Then, totally differentiating the equation, we get

    Fx dx + Fy dy = 0

Rearranging terms gives us:

    dy/dx = -Fx / Fy     (provided Fy ≠ 0)

Example: Given a function F(x, y) = 0, find dy/dx using the implicit

differentiation rule.
Solution: First, find the first partial derivatives Fx and Fy from the given function.

Then, using the implicit differentiation rule, dy/dx = -Fx / Fy.
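
The rule is easy to verify with sympy; the function F(x, y) = x^2 + y^2 - 1 below is a
hypothetical illustration, not the module's original example.

    import sympy as sp

    x, y = sp.symbols('x y')
    F = x**2 + y**2 - 1                      # hypothetical implicit function F(x, y) = 0
    dydx = -sp.diff(F, x) / sp.diff(F, y)    # dy/dx = -Fx/Fy
    print(sp.simplify(dydx))                 # -x/y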

i) Derivatives of Logarithmic and Exponential Functions


The concept and rules governing logarithmic and exponential functions were described
previously in the first chapter. In this section, we will point out how to find the
derivatives of logarithmic and exponential functions.

Derivatives of Logarithmic Functions

If Y

.
See expansion from the previous discussion

j. Derivatives of exponential functions

If Y = a^x, its derivative can easily be found by changing it to the logarithmic form, giving

    dY/dx = a^x · ln a

If Y = a^u(x), using the chain rule, dY/dx = a^u(x) · ln a · u'(x).

The exponential function e^x is a unique function with special behavior. That is,

    d(e^x)/dx = e^x

However, in case we have e^f(x), we can use the chain rule to evaluate its derivative. That is,
let f(x) = u and we will have e^u.

If Y = e^u and u = f(x), then

    dY/dx = e^u · u'(x) = e^f(x) · f'(x)

Example: Find dY/dx for Y = e^(3x).

Solution

Directly inserting in the formula, we get dY/dx = 3e^(3x).

2.2. Integral calculus: Techniques and applications of integral calculus

Up to this time, we have been concerned with finding the derivative of a function. However,
there are some problems in economics that require us to come up with the original function given
its rate of change. These problems are common in the area of social welfare, distribution of
income of a country, etc. The technique of finding the original function from its
derivative is known as integration or antidifferentiation. More specifically, you will see that
the concept of integration is the exact opposite of differentiation.
2.2.1. The Indefinite Integral

Given the primitive function F (x) which yields the derivative f (x) , you have the following
definitions.
You have the following notation that denotes the integration of f (x) with respect to x ,

    ∫ f(x) dx
The symbol ∫ is called the integral sign. The f(x) part is known as the integrand (that is, the
function to be integrated). The dx part tells us that the operation is to be carried out with
respect to the variable x.
One important point that you should note is that while the primitive function F (x) invariably
yields a unique derivative f (x) , the derived function can have an infinite number of possible
primitive functions through integration. Why? Because if F(x) is an integral of f(x), then so
is F(x) plus any constant, for the derivative of a constant is zero.

2.2.2. Techniques of Integration & the General Power Rule


In this section, we briefly discuss the rules of integration, which we shall refer to
increasingly from time to time while dealing with the techniques of integration.

a. The constant rule: Given constants K and C,

    ∫ K dx = Kx + C

Example:

1. ∫ 2 dx = 2x + C

b) Constant multiple rule: For a constant k,

    ∫ k·f(x) dx = k ∫ f(x) dx

Example:

1. ∫ 5x dx = 5 ∫ x dx = (5/2)x^2 + C
c) Sum or difference rule: For functions f(x) and g(x),

    ∫ [f(x) ± g(x)] dx = ∫ f(x) dx ± ∫ g(x) dx

Example:

1. ∫ (x^2 + 3) dx = x^3/3 + 3x + C

d) Simple power rule: For n ≠ -1,

    ∫ x^n dx = x^(n+1) / (n+1) + C

e) Integration of logarithmic and exponential functions
As with the derivatives of non-algebraic functions, here we present the integration of
logarithmic functions first, followed by the integration of exponential functions. Concerning
the integration of logarithmic functions, let's begin with the general power rule, which
applies only when n ≠ -1; i.e.

    ∫ u^n du = u^(n+1) / (n+1) + C     iff n ≠ -1

But the following formula applies when n = -1. That is,

    ∫ u^(-1) du = ∫ du/u = ln |u| + C

We use the absolute value to protect the argument from being negative and the whole logarithm
from being undefined. This formula is a direct result of the fact that

    d(ln |u|)/du = 1/u

Thus, ∫ dx/x = ln |x| + C.

Similarly, when u is a function of x,

    ∫ [u'(x) / u(x)] dx = ln |u(x)| + C
Example: Evaluate the indefinite integral of the following.

1) ∫ dx / (4x + 1)
Solution
1) Let 4x + 1 = u, so that du = 4 dx and dx = du/4.
Thus, ∫ dx / (4x + 1) = (1/4) ∫ du/u = (1/4) ln |4x + 1| + C
The second type of non-algebraic function we saw was the exponential function. In
relation to exponential functions, we have seen that d(e^x)/dx = e^x.

Thus, ∫ e^x dx = e^x + C.
Moreover, when u is a function of x, the same reasoning gives

    ∫ e^u(x) · u'(x) dx = e^u(x) + C
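
As a minimal sketch, sympy can confirm these integration rules (note that it omits the constant
of integration C):

    import sympy as sp

    x = sp.Symbol('x')
    print(sp.integrate(x**3, x))        # x**4/4   (power rule)
    print(sp.integrate(1/x, x))         # log(x)   (the n = -1 case)
    print(sp.integrate(sp.exp(x), x))   # exp(x)   (exponential rule)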

2.2.3. Definite Integrals

Let f(x) be a function and a and b be real numbers. The definite integral of f(x) over the
interval from x = a to x = b, denoted by ∫[a,b] f(x) dx, is the net change of an antiderivative
of f(x) over the interval. Thus, if F(x) is an antiderivative of f(x), then we have

    ∫[a,b] f(x) dx = F(b) - F(a)

Note that this formula is used to find the definite integral of a function, not its
antiderivative.

Now it is time to look at some examples so that the idea of definite integral and fundamental
theorem of calculus will be well planted in your mind.

Example: Evaluate the definite integral ∫[0,2] x dx.

Solution
    ∫[0,2] x dx = [x^2/2] from 0 to 2 = (2^2)/2 - 0 = 2
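
The same definite integral can be checked numerically; this sketch uses scipy's quad routine on
the example above.

    from scipy.integrate import quad

    value, abs_err = quad(lambda t: t, 0, 2)   # integral of x from 0 to 2
    print(value)                               # 2.0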

2.2.4. Continuous Money Streams
You will formally be introduced to the notion of continuous compounding of interest in
monetary economics. If P dollars is invested at annual rate r compounded continuously, then its
value F at the end of t years is given by the equation F = P·e^(rt). Solving this equation for
P in terms of F gives P = F·e^(-rt).

Written in this form, the equation tells us the present value P of F dollars that will be
received t years from now, assuming that interest is compounded continuously at the annual
rate r.
Example
Bethel knows that she will need to replace her car in 3 years. How much would she have to put
in the bank today at 8% interest compounded continuously in order to have the $12,000 she
expects to need 3 years from now?
Solution
We apply the present-value formula, remembering that the interest rate of 8% must be written
in its decimal form:

    P = 12,000·e^(-0.08 × 3) = 12,000·e^(-0.24) ≈ $9,439.53
2.2.4.1. Continuous Money Streams within a Period

There are many situations in business and industry where it is useful to think of money as
flowing continuously into an account. For example, the ABC Company plans to buy an
expensive machine which it estimates will increase the company’s net income by $10,000 per
year. But this income won’t come at the end of each year; it will come in dribbles throughout the
year. As a model for what will happen, it is convenient to think of the machine as if it will
produce income continuously. Our next example raises an important question for the company
to answer.
Suppose that the annual rate of income of an income stream is R(t).
Suppose further that this income stream will last over the period 0 ≤ t ≤ T. If interest is at
rate r compounded continuously, then the present value PV of this income stream is given by

    PV = ∫[0,T] R(t)·e^(-rt) dt
Example
Suppose the ABC Company takes a more optimistic view and estimates that the new machine
will produce income at the rate R(t) = 10,000 + 200t. What is the present value of this income
stream, again assuming interest at 9% compounded continuously and a lifetime of 8 years?
Solution

    PV = ∫[0,8] (10,000 + 200t)·e^(-0.09t) dt ≈ $61,047

(The integral can be evaluated by integration by parts, or numerically as in the sketch below.)
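
As a minimal scipy sketch (the value printed is approximate):

    import math
    from scipy.integrate import quad

    R = lambda t: 10000 + 200 * t                             # income rate
    pv, _ = quad(lambda t: R(t) * math.exp(-0.09 * t), 0, 8)  # discounted stream
    print(round(pv, 2))                                       # about 61047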

Let us modify the problem in the above example in another way by assuming that the machine
will last indefinitely. More generally, let us ask for the present value PV of a perpetual
(meaning infinitely long) income stream which produces income at an annual rate R(t), assuming
that interest is at an annual rate r compounded continuously. The result in this case is

    PV = ∫[0,∞) R(t)·e^(-rt) dt

Here, we face a new kind of integral, one with an infinite limit. Such integrals are called
improper integrals and must be given a clear definition. We do this by defining

    ∫[0,∞) f(t) dt = lim (b -> ∞) ∫[0,b] f(t) dt
This point leads us to the concept known as improper integrals which we will handle right now!
2.2.5. Improper Integrals
With the counter-revolution of new classical macroeconomics, there has been a shift by
economists toward considering an infinitely lived agent. This agent is widely used in the
perfect foresight model, the rational expectations model, the Ricardian equivalence theorem,
and endogenous economic growth models, among others. Hence, the saving and investment
decisions of households are governed by the utility functional that agents maximize, which
(in its standard constant-elasticity form) is

    U = ∫[0,∞) [C^(1-δ) / (1-δ)] · L · e^(-Bt) dt

where
U(c) is the utility functional the social planner maximizes,
B is the discount rate agents assign to future consumption,
C is per capita consumption,
L is the size of the population,
and δ is the inverse of the intertemporal elasticity of substitution.

Now we face a somewhat new, but still definite integral, problem in this function. Such a type
of integral is known as an improper integral. An improper integral is an integral whose
interval of integration is unbounded: either [a, ∞), (-∞, b], or (-∞, ∞). To find the value of
such an integral, use the following formulas:

a) ∫[a,∞) f(x) dx = lim (b -> ∞) ∫[a,b] f(x) dx

This holds if f is continuous over the interval [a, ∞).

b) If f is continuous on the interval (-∞, b], then

    ∫(-∞,b] f(x) dx = lim (a -> -∞) ∫[a,b] f(x) dx

c) If f is continuous on the interval (-∞, ∞), then

    ∫(-∞,∞) f(x) dx = ∫(-∞,c] f(x) dx + ∫[c,∞) f(x) dx, where c is a constant.
Note: In each of the three cases above, if the limit exists, then the improper integral is said
to converge; otherwise the improper integral diverges.

Examples

1. Find the value of ∫[1,∞) (1/x) dx and determine its status of convergence.

Solution

    ∫[1,∞) (1/x) dx = lim (b -> ∞) [ln x] from 1 to b = lim (b -> ∞) ln b

Since the limit does not exist, the integral diverges.

2. Find ∫[1,∞) (1/x^2) dx.

Solution

    ∫[1,∞) (1/x^2) dx = lim (b -> ∞) [-1/x] from 1 to b = lim (b -> ∞) (1 - 1/b) = 1

Since the limit does exist, the integral converges.
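
sympy handles the infinite limit directly, which makes a convenient check of both results
above:

    import sympy as sp

    x = sp.Symbol('x')
    print(sp.integrate(1/x, (x, 1, sp.oo)))      # oo -> the integral diverges
    print(sp.integrate(1/x**2, (x, 1, sp.oo)))   # 1  -> the integral converges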

3. Now it is time for you to check whether you have grasped the concept. To this end, why

don't you work a similar problem on your own? Use your paper and pen right now!

Definitions
• A matrix is defined as a rectangular array of numbers, parameters, or variables. It is
usually enclosed in brackets and sometimes in parentheses or double vertical lines.
• The members of the array are referred to as the elements of the matrix.
• The number of rows and the number of columns in a matrix together define the dimension of
the matrix. If a matrix has m rows and n columns, then that matrix is said to be of
dimension (m × n).

Consider, for example, a matrix A consisting of three rows and three columns; it is said to be
of dimension (3 × 3). Column vectors, say x and d, each with a single column and three rows,
are of dimension (3 × 1). You will have a discussion of such matrices in the next subsection.
You should remember that, in writing the dimension of a matrix, the row number always precedes
the column number.
In general, if

    A = | a11  a12  ...  a1n |
        | a21  a22  ...  a2n |
        | ...  ...  ...  ... |
        | am1  am2  ...  amn |

then the matrix is said to have m rows and n columns and is of dimension (m × n). The
subscripts under each element of the matrix show the row and the column where that element
belongs.
2.3.1. Matrix Operations
a) Equality of Matrices
Before going into matrix operations (addition, subtraction and multiplication), let us first study
what is meant by equality of two matrices. Suppose you have two matrices. The two matrices are
said to be equal if and only if they have the same dimension and identical elements in the
corresponding locations in the arrays. Consider the following matrices.
1 3 5  2 1 1  1 3 5 

A   4 1 2  
B   4 1 2  C  4 1 2
2 3 0 2 3 0 2 3 0
It can be conclude that A  C because the two matrices have the same dimension and identical
elements. Though, matrices A and C have the same dimension, they are not equal since the

20 | P a g e
elements in their first row are different. Thus, we conclude A  B .
b) Addition and Subtraction of Matrices
Two matrices can be added if and only if they have the same dimension (i.e. they have the same
number of rows and columns). If this condition is satisfied, then the two matrices are said to be
conformable for addition. The sum of the matrices is obtained by adding the corresponding
elements of the matrices.
Similarly, subtraction of a matrix from another matrix is possible if and only if both matrices
are of the same dimension (i.e. they have the same number of rows and columns). The difference
between two matrices is obtained by subtracting each corresponding element.

Generally, if

        | a11  a12  a13 |            | b11  b12  b13 |
    A = | a21  a22  a23 |    and B = | b21  b22  b23 |
        | a31  a32  a33 |            | b31  b32  b33 |

then

            | a11+b11  a12+b12  a13+b13 |
    A + B = | a21+b21  a22+b22  a23+b23 |
            | a31+b31  a32+b32  a33+b33 |

            | a11-b11  a12-b12  a13-b13 |
    A - B = | a21-b21  a22-b22  a23-b23 |
            | a31-b31  a32-b32  a33-b33 |
Examples
a) Compute A + B and A - B given

    A = | 2  4  3 |    and B = | 1  4  5 |
        | 1  3  1 |            | 1  2  2 |

Solution
Since both matrices have the same dimension, they are conformable for addition and
subtraction. Thus, we can compute A + B and A - B.

Hence,

    A + B = | 2+1  4+4  3+5 | = | 3  8  8 |
            | 1+1  3+2  1+2 |   | 2  5  3 |

    A - B = | 2-1  4-4  3-5 | = | 1  0  -2 |
            | 1-1  3-2  1-2 |   | 0  1  -1 |

b) Find A + B if

    A = | 2  1 |    and B = | 1  2 |
        | 2  1 |            | 1  2 |

Solution
Since the matrices have identical dimensions, you can compute A + B. Hence,

    A + B = | 2+1  1+2 | = | 3  3 |
            | 2+1  1+2 |   | 3  3 |
c) Matrix Multiplication
i) Multiplication by a Scalar
If k is a real number and A is a matrix, then multiplying the matrix A by the number k is
referred to as multiplication of matrix A by a scalar. In this case, every element of the
matrix is multiplied by the constant k.

Examples
a) If k = 3 and

    A = | 4  2  2  10 |
        | 2  0  7   2 |
        | 6  5  4   7 |
        | 1  3  1   0 |

then find k·A.
Solution

    k·A = | 3(4)  3(2)  3(2)  3(10) | = | 12   6   6  30 |
          | 3(2)  3(0)  3(7)  3(2)  |   |  6   0  21   6 |
          | 3(6)  3(5)  3(4)  3(7)  |   | 18  15  12  21 |
          | 3(1)  3(3)  3(1)  3(0)  |   |  3   9   3   0 |

b) If k = 1/2 and

    A = | 2  3  9 |
        | 6  4  5 |

find k·A.
Solution

    k·A = | 2/2  3/2  9/2 | = | 1  3/2  9/2 |
          | 6/2  4/2  5/2 |   | 3   2   5/2 |

ii) Multiplication of Matrices


Whereas a scalar can be used to multiply a matrix of any dimension, the multiplication of two
matrices depends on the satisfaction of a requirement. Suppose that you have two matrices,
A and B, and you want to find the product AB. The product of the two matrices is defined
if and only if the number of columns of matrix A is the same as the number of rows of matrix B.

In other words, in order to multiply two matrices, the first matrix (the "lead" matrix) must
have as many columns as the second matrix (the "lag" matrix) has rows. Matrices that fulfill
this condition are said to be conformable for multiplication.
If matrix A is of dimension (m × n) and matrix B is of dimension (n × p), then multiplication
of matrix A by matrix B is possible because the number of columns of matrix A (that is, n)
equals the number of rows of matrix B.

What is the dimension of the resulting matrix? The dimension of the resulting matrix is
(m × p). This implies that it has the same number of rows as the first matrix (that is, A),
and the same number of columns as the second matrix (that is, B).
Symbolically, A(m×n) × B(n×p) = AB(m×p), where the subscripts denote dimensions.

How can you multiply two matrices? So far, you have seen the conditions that must be fulfilled
for matrix multiplication. Now, you will discuss the procedure of multiplication of matrices. In
multiplying two matrices, you add the products of elements of the rows of the 1st matrix and
elements of the columns of the 2nd matrix. The sum of the products of elements of the 1st row of
the 1st matrix and elements of the 1st column of the 2nd matrix yields the 1st row-1st column
element for the product matrix. Similarly, the sum of products of elements of 1st row of the 1st
matrix and elements of the 2nd column of the 2nd matrix forms the 1st row-2nd column element for
the product matrix and so on. In other words, matrix multiplication is row by column.

For instance, you have two matrices

    A = | a11  a12  a13 |            and B = | b11  b12 |
        | a21  a22  a23 | (2×3)              | b21  b22 |
                                             | b31  b32 | (3×2)

Since the column number of matrix A equals the row number of matrix B, they are conformable
for multiplication. Hence,

    A × B = | a11  a12  a13 |   | b11  b12 |   | c11  c12 |
            | a21  a22  a23 | × | b21  b22 | = | c21  c22 | (2×2)
                                | b31  b32 |

The product AB yields another matrix with dimension (2 × 2). The elements of the product
matrix are obtained as follows.
c11 = a11(b11) + a12(b21) + a13(b31) (that is, multiplying the first-row elements of matrix A
with the corresponding first-column elements of matrix B and then adding the products)
c12 = a11(b12) + a12(b22) + a13(b32) (that is, multiplying the first-row elements of matrix A
with the corresponding second-column elements of matrix B and then adding the products)
c21 = a21(b11) + a22(b21) + a23(b31) (that is, multiplying the second-row elements of matrix A
with the corresponding first-column elements of matrix B and then adding the products)
c22 = a21(b12) + a22(b22) + a23(b32) (that is, multiplying the second-row elements of matrix A
with the corresponding second-column elements of matrix B and then adding the products)

Examples
a) Given

    A = | 2  1 |        and B = | 3  1 |
        | 1  3 | (2×2)          | 2  0 | (2×2)

find AB.
Solution
AB is defined since the condition for multiplication is satisfied. That is, the column number
of the first matrix (A) equals the row number of the second matrix (B). The product matrix
will have dimension (2 × 2). Therefore,

    A × B = | 2(3)+1(2)  2(1)+1(0) | = | 8  2 |
            | 1(3)+3(2)  1(1)+3(0) |   | 9  1 |

b) Given

    A = | 1  2 |            and B = | 2  2  0 |
        | 3  0 |                    | 3  1  4 | (2×3)
        | 2  1 | (3×2)

find AB.
Solution
The matrices are conformable for multiplication and the resulting matrix has dimension
(3 × 3). Thus,

    A × B = | 1(2)+2(3)  1(2)+2(1)  1(0)+2(4) |   | 8  4  8 |
            | 3(2)+0(3)  3(2)+0(1)  3(0)+0(4) | = | 6  6  0 |
            | 2(2)+1(3)  2(2)+1(1)  2(0)+1(4) |   | 7  5  4 |
c) Given

    A = | 1  1  0 |            and B = | 0   1   0 |
        | 0  2  1 |                    | 0   0   1 |
        | 1  0  0 | (3×3)              | 1  1/2  0 | (3×3)

find AB.
Solution
The conformability condition for multiplication is satisfied. Hence,

    A × B = | 1(0)+1(0)+0(1)  1(1)+1(0)+0(1/2)  1(0)+1(1)+0(0) |   | 0   1   1 |
            | 0(0)+2(0)+1(1)  0(1)+2(0)+1(1/2)  0(0)+2(1)+1(0) | = | 1  1/2  2 |
            | 1(0)+0(0)+0(1)  1(1)+0(0)+0(1/2)  1(0)+0(1)+0(0) |   | 0   1   0 |
d) Given

    A = | 2  1 |        and B = | 3 |
        | 2  1 | (2×2)          | 2 | (2×1)

find AB.
Solution
The conformability condition is satisfied. Hence,

    A × B = | 2(3)+1(2) | = | 8 |
            | 2(3)+1(2) |   | 8 |
iii) Multiplication of Vectors
If A is an (m × 1) column vector and B is a (1 × n) row vector, then their product yields a
matrix of dimension (m × n).

Example
a) Given

    A = | 1 |            and B = | 4  5 | (1×2)
        | 2 |
        | 3 | (3×1)

find AB.
Solution

    A × B = | 1(4)  1(5) |   |  4   5 |
            | 2(4)  2(5) | = |  8  10 |
            | 3(4)  3(5) |   | 12  15 |

Therefore, the product of vectors A and B yields another matrix with dimension (3 × 2).
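
All of the operations in this section can be reproduced with numpy; the sketch below reuses the
matrices from example a) of matrix multiplication.

    import numpy as np

    A = np.array([[2, 1], [1, 3]])
    B = np.array([[3, 1], [2, 0]])
    print(A + B)    # element-wise addition
    print(3 * A)    # multiplication by a scalar
    print(A @ B)    # matrix multiplication: [[8 2] [9 1]]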
2.3.2. Determinants and Inverse of a Matrix
In the previous section, you have been introduced to the basic concepts of matrix algebra. In this
section, you will study the concept of a determinant and its use in determining the inverse of a
matrix.
2.3.2.1. The Concept of Determinants, Minor and Cofactor
a) Determinants
The determinant of a square matrix is a uniquely defined scalar (number) associated with that
matrix. The determinant of a matrix A is denoted by |A|. Determinants are defined only for
square matrices. The determinant of an (n × n) matrix is known as a determinant of order n.
How can we find the determinant? Here are the answers.
i) Determinants of Order One
Let matrix A = [a11]; that is, a matrix with only one element. Then the determinant of
matrix A is |A| = a11.

ii) Determinants of Order Two


Suppose matrix A is a (2 × 2) square matrix given as

    A = | a11  a12 |
        | a21  a22 |

The determinant of this matrix is defined as the difference of two terms:

    |A| = a11·a22 - a21·a12

That is, it is obtained by multiplying the two elements on the principal diagonal of matrix A
and then subtracting the product of the two remaining elements. Since matrix A is a (2 × 2)
matrix, the determinant is called a second-order determinant (determinant of order two).

Examples
a) Find the determinant of the matrix

    A = | 10  4 |
        |  8  5 |

Solution
Using the formula given above, the determinant of matrix A is

    |A| = 10(5) - 8(4) = 18

iii) Determinants of Order Three

A determinant of order three is associated with a (3 × 3) matrix. Given

    A = | a11  a12  a13 |
        | a21  a22  a23 |
        | a31  a32  a33 |

the determinant is defined as

    |A| = a11 · | a22  a23 |  -  a12 · | a21  a23 |  +  a13 · | a21  a22 |
                | a32  a33 |           | a31  a33 |           | a31  a32 |

Here, you have a sum of three terms. The vertical bracket in the first term is the determinant
of the matrix obtained after removing the 1st row and the 1st column. The second bracket is the
determinant of the matrix obtained by removing the 1st row and the 2nd column. The vertical
bracket in the third term is the determinant of the matrix obtained by removing the 1st row and
the 3rd column. The elements in each term are the 1st, 2nd and 3rd elements of the first row of
matrix A. In other words, each term is a product of a first-row element and a particular
second-order determinant. Therefore, you have

    |A| = a11(a22·a33 - a32·a23) - a12(a21·a33 - a31·a23) + a13(a21·a32 - a31·a22)

After rearranging, you get

    |A| = a11·a22·a33 + a12·a23·a31 + a13·a21·a32 - a31·a22·a13 - a32·a23·a11 - a33·a21·a12

Laplace Expansion
The determinant of a (3 × 3) matrix can be obtained alternatively by an expansion device. In
this approach, the first two columns of the matrix are written again to form an expanded array.
The illustration that follows shows how the determinant can be found using this approach.
Consider the (3 × 3) matrix given below.

    A = | a11  a12  a13 |
        | a21  a22  a23 |
        | a31  a32  a33 |

Writing down the first two columns again, you get

    a11  a12  a13 | a11  a12
    a21  a22  a23 | a21  a22
    a31  a32  a33 | a31  a32

Then, you multiply the elements along the three diagonals falling to the right and give all
these products a plus sign. Thus, you have

    a11·a22·a33 + a12·a23·a31 + a13·a21·a32

Similarly, you multiply the elements along the three diagonals rising to the right and give all
these products a negative sign. As a result, you get

    - a31·a22·a13 - a32·a23·a11 - a33·a21·a12

The sum of the above two expressions is exactly the expression for the determinant of matrix A
shown above. That is,

    |A| = a11·a22·a33 + a12·a23·a31 + a13·a21·a32 - a31·a22·a13 - a32·a23·a11 - a33·a21·a12
Examples
a) Given

    A = | 2  1  3 |
        | 4  5  6 |
        | 7  8  9 |

compute the determinant.
Solution

    |A| = 2 · | 5  6 |  -  1 · | 4  6 |  +  3 · | 4  5 |
              | 8  9 |         | 7  9 |         | 7  8 |

    |A| = 2(5·9 - 8·6) - 1(4·9 - 7·6) + 3(4·8 - 7·5)
    |A| = 2(45 - 48) - 1(36 - 42) + 3(32 - 35)
    |A| = 2(-3) - 1(-6) + 3(-3)
    |A| = -9

b) Minor and Cofactor

Let us begin with the concept of the minor.
Suppose matrix A is given as

    A = | a11  a12  a13 |
        | a21  a22  a23 |
        | a31  a32  a33 |

Deleting the 1st row and the 1st column of the determinant of matrix A gives the
subdeterminant

    M11 = | a22  a23 | = a22·a33 - a32·a23
          | a32  a33 |

This subdeterminant is called the minor of element a11, and it is denoted by M11.
In general, if matrix A is a square matrix of dimension (3 × 3) or more, then canceling the
i-th row and the j-th column of the determinant of that matrix gives the minor of the element
aij, which is a subdeterminant denoted by Mij.
Having discussed what a minor is, it is time to move on to the concept of the cofactor. The
cofactor, denoted by Cij, is obtained by

    Cij = (-1)^(i+j) · Mij

where Mij is the minor of the element aij obtained by canceling the i-th row and the j-th
column of the determinant of a particular matrix.
Therefore, a cofactor is a minor with a prescribed algebraic sign attached to it. For instance,
considering matrix A given above, the cofactor C11 is

    C11 = (-1)^(1+1) · M11 = M11 = | a22  a23 | = a22·a33 - a32·a23
                                   | a32  a33 |

Similarly, the cofactor C23 is

    C23 = (-1)^(2+3) · M23 = -M23 = -(a11·a32 - a31·a12) = a31·a12 - a11·a32

Note that the minor M23 is obtained by canceling the 2nd row and the 3rd column of the
determinant of matrix A.
The determinant of a matrix can be found by using the concept of the cofactor. In this case,
the determinant is obtained by first taking either a row or a column of the matrix and
multiplying each element in that row or column by its cofactor. Adding these products then
yields the determinant of the matrix. For instance, you can calculate the determinant of
matrix A given above as follows. Suppose you take the first row. Thus, the determinant of
matrix A is

    |A| = a11·C11 + a12·C12 + a13·C13

Since

    C11 = (-1)^(1+1) · M11 = M11 = | a22  a23 |
                                   | a32  a33 |

    C12 = (-1)^(1+2) · M12 = -M12 = -| a21  a23 |
                                     | a31  a33 |

    C13 = (-1)^(1+3) · M13 = M13 = | a21  a22 |
                                   | a31  a32 |

we therefore have

    |A| = a11(a22·a33 - a32·a23) - a12(a21·a33 - a31·a23) + a13(a21·a32 - a31·a22)

After rearranging, you get

    |A| = a11·a22·a33 + a12·a23·a31 + a13·a21·a32 - a31·a22·a13 - a32·a23·a11 - a33·a21·a12

This is exactly what you had when we defined the determinant of a (3 × 3) matrix.

Examples
Compute the cofactor of the element at the 2nd row, 2nd column of the matrix

    A = | 1  2  1 |
        | 3  4  0 |
        | 2  1  1 |

Solution
First you need to determine the minor of the element at the 2nd row, 2nd column. The minor in
this case is obtained by canceling the 2nd row and the 2nd column. Therefore,

    M22 = | 1  1 | = (1)(1) - (2)(1) = -1
          | 2  1 |

Hence, the cofactor will be

    C22 = (-1)^(2+2) · M22 = M22 = -1
2.3.2.2. The Inverse of a Matrix
If matrix A is a square matrix, then the inverse of that matrix, denoted by A^(-1), exists if
it satisfies the property that A·A^(-1) = A^(-1)·A = I. If the inverse exists, then matrix A is
said to be invertible. Matrix A is also an inverse of A^(-1), and hence both matrices are
inverses of each other. Note that it is only square matrices that can have inverses.
How do you find the inverse of a matrix? The answer follows. Suppose you have the following
matrix:

    A = | a11  a12  a13 |
        | a21  a22  a23 |
        | a31  a32  a33 |

In finding the inverse of matrix A, the first step is to obtain the determinant, that is, |A|.
Secondly, you need to find the cofactors of all elements of matrix A. Arranging them as a
matrix gives you what is known as the cofactor matrix, denoted by C. That is,

    C = | C11  C12  C13 |
        | C21  C22  C23 |
        | C31  C32  C33 |

where C11, ..., C33 denote the cofactors of the elements of matrix A.
Thirdly, you take the transpose of the cofactor matrix C. This transposed matrix is known as
the adjoint of matrix A, denoted by adj(A). Hence, we have

    adj(A) = C' = | C11  C21  C31 |
                  | C12  C22  C32 |
                  | C13  C23  C33 |

The last step in finding the inverse of matrix A is dividing the adjoint of matrix A by the
determinant. Therefore, it follows that

    A^(-1) = (1/|A|) · | C11  C21  C31 |
                       | C12  C22  C32 |
                       | C13  C23  C33 |

The inverse of a matrix A exists if and only if the determinant of that matrix is different
from zero, i.e., |A| ≠ 0. Otherwise, with |A| = 0, the inverse is undefined and there is no
need to proceed with the subsequent steps of matrix inversion.
Here is a summary of the steps involved in matrix inversion. Given matrix A,
1) find the determinant of A and proceed to the next steps if |A| ≠ 0;
2) find the cofactor matrix C for A;
3) obtain the adjoint of matrix A, the transpose of C;
4) multiply the adjoint of matrix A by 1/|A|, and this gives you the inverse of matrix A.
Examples
a) Given

    A = | 3  2 |
        | 1  0 |

find the inverse.
Solution
First, find the determinant. Thus, you have

    |A| = (3)(0) - (1)(2) = -2

(Since |A| ≠ 0, the inverse of matrix A is defined and hence you can proceed to the next
steps.)
Then, determining the cofactor matrix, you get

    C = | C11  C12 | = |  0  -1 |
        | C21  C22 |   | -2   3 |

Note that since the matrix is (2 × 2), the minor of each element in this case is a (1 × 1)
determinant.
Transposing the cofactor matrix, we get the adjoint matrix

    adj(A) = C' = |  0  -2 |
                  | -1   3 |

Lastly, dividing the adjoint matrix by the determinant gives us the inverse matrix. That is,

    A^(-1) = (1/|A|)·adj(A) = (1/-2)·|  0  -2 | = |  0    1   |
                                     | -1   3 |   | 1/2  -3/2 |
2.3.3. Matrix Representation of Linear Equations

Having studied the basics of matrix algebra and matrix inversion, it is now time to revise the
application of matrix algebra in solving systems of linear equations. You are expected to
revise the inverse method, Cramer's rule, and the Gauss-Jordan elimination technique as your
reading assignment from the linear algebra course.
Matrix representation
Suppose you have the following system of linear equations.

    a11·x1 + a12·x2 + a13·x3 = b1
    a21·x1 + a22·x2 + a23·x3 = b2
    a31·x1 + a32·x2 + a33·x3 = b3

The system can be written in matrix form as:

    | a11  a12  a13 |   | x1 |   | b1 |
    | a21  a22  a23 | × | x2 | = | b2 |
    | a31  a32  a33 |   | x3 |   | b3 |

Now, you can calculate the unknown values of the following system using any of the above three
methods:

    2x1 + x2 = 3
    2x1 + 2x2 = 4
2.3.4. Input-Output Analysis (Leontief Model)
Input-output analysis attempts to establish the equilibrium conditions under which the
industries in an economy have just enough output to satisfy each other's demands in addition
to final (outside) demand. Commonly, the production of one good requires the input of many
other goods as intermediate inputs in the production process. Hence, the total demand for a
product is the summation of all the intermediate demands for the product plus the final demand
for the product arising from consumers, investors, government, exporters, etc., as ultimate
users.
Assume that in a hypothetical economy there are n sectors that produce their own homogeneous
products x1, x2, ..., xn.
Therefore, total demand (xi) = demand from intermediate users in the economy + ultimate users'
demand for final consumption.
i.e.  x1 = x11 + x12 + ... + x1n + f1
      x2 = x21 + x22 + ... + x2n + f2
      ...
      xn = xn1 + xn2 + ... + xnn + fn
where xi refers to the total output produced by sector i;
      xij refers to the part of the output produced by sector i and used by sector j, i.e.
      the delivery of sector i's output to sector j as input;
      fi refers to the final demand for sector i's output.

Let's introduce a new variable into the model:

    aij = xij / xj

where aij refers to an element of the coefficient matrix, that is, the input requirement from
sector i by sector j in order to produce one unit of xj.
Therefore, from the above relationship,

    | x1 |   | a11  a12  ...  a1n |   | x1 |   | f1 |
    | x2 |   | a21  a22  ...  a2n |   | x2 |   | f2 |
    | .  | = | ...  ...  ...  ... | × | .  | + | .  |
    | .  |   | ...  ...  ...  ... |   | .  |   | .  |
    | xn |   | an1  an2  ...  ann |   | xn |   | fn |

When we condense the above equation, we get

    X = AX + F
    X - AX = F
    (I - A)X = F
    X = (I - A)^(-1)·F

So, by using the above formula, we can calculate the total output produced by each sector that
meets the needs of interindustry (intermediate) and final users.
Example
Given the technical coefficient matrix and the vector of final demand

    A = | 0.3  0.2 |        F = | 15 |
        | 0.5  0.6 |            | 20 |

find the equilibrium output levels of the sectors.
Solution

    | x1 |
    | x2 | = (I - A)^(-1) × F

    I - A = | 1  0 | - | 0.3  0.2 | = |  0.7  -0.2 |
            | 0  1 |   | 0.5  0.6 |   | -0.5   0.4 |

    (I - A)^(-1) = adj(I - A) / |I - A|

So the determinant is |I - A| = (0.7)(0.4) - (-0.5)(-0.2) = 0.18.

    C = | 0.4  0.5 |
        | 0.2  0.7 |

    C' = adj(I - A) = | 0.4  0.2 |
                      | 0.5  0.7 |

    (I - A)^(-1) = (1/0.18) · | 0.4  0.2 |
                              | 0.5  0.7 |

Thus, applying X = (I - A)^(-1)·F:

    X1 = (0.4/0.18)(15) + (0.2/0.18)(20) ≈ 55.56
    X2 = (0.5/0.18)(15) + (0.7/0.18)(20) ≈ 119.44
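
The same answer drops out of a short numpy sketch of X = (I - A)^(-1)·F:

    import numpy as np

    A = np.array([[0.3, 0.2], [0.5, 0.6]])   # technical coefficients
    F = np.array([15.0, 20.0])               # final demand
    X = np.linalg.solve(np.eye(2) - A, F)    # solves (I - A) X = F
    print(X)                                 # approximately [55.56 119.44]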
2.3.5. Linear Programming
Up until now we have been dealing with the classical approach to solving optimization
problems, using concepts from calculus. Here we introduce a non-classical approach to
optimization called mathematical programming. Linear programming deals with mathematical
programming problems in which both the objective function and the constraints are all linear.
The objective of such programming is to determine the optimal allocation of scarce resources
among competing activities in the economy. It deals commonly with the optimization of a linear
function subject to one or more linear inequality constraints.
In any linear programming problem, there are three essential ingredients:
A. The objective function;
B. A set of technical constraints; and
C. A set of non-negativity restrictions.
In solving linear programming problems, we have three commonly used methods:
1. The graphic approach
2. The simplex method
3. The duality approach
The graphic approach is applicable only in the case of two unknowns, while the simplex method
is an algebraic method of finding the extreme points of non-graphable constraint sets, which
are supposed to yield an optimal value of the objective function.
The duality approach is a frequently used method for solving minimization problems. Every
maximization (minimization) problem has a corresponding minimization (maximization) problem.
The original problem is called the primal and the corresponding problem is called the dual.
Every dual problem can be transformed into its primal counterpart and vice versa.
Thus, for a given linear programming problem, we can use any of the approaches to calculate
the optimal values.
Dear distance learners! In order to refresh your memory, solve the following linear
programming problem.
Example:
Given the primal problem

    Min C = 20Z1 + 30Z2 + 16Z3
    Subject to: 2.5Z1 + 3Z2 + Z3 >= 3
                Z1 + 3Z2 + 2Z3 >= 4
                Z1, Z2, Z3 >= 0

form the dual counterpart of the problem.
Solution
The dual problem is: Max C* = 3X1 + 4X2

    Subject to: 2.5X1 + X2 <= 20
                3X1 + 3X2 <= 30
                X1 + 2X2 <= 16
                X1, X2 >= 0
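
For reference, here is a scipy sketch of the dual problem; linprog minimizes by default, so the
objective is negated to perform the maximization.

    from scipy.optimize import linprog

    res = linprog(c=[-3, -4],                         # maximize 3X1 + 4X2
                  A_ub=[[2.5, 1], [3, 3], [1, 2]],    # constraint coefficients
                  b_ub=[20, 30, 16],                  # right-hand sides
                  bounds=[(0, None), (0, None)])      # X1, X2 >= 0
    print(res.x, -res.fun)   # optimum at X1 = 4, X2 = 6 with value 36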
CHAPTER THREE
DERIVATIVE IN USE.

3.1. Higher Order Derivatives:


3.2.1. Concavity and Convexity of a Function
Since the derivative of a function is itself a function, we can write dy/dx = f'(x) and take
its derivative, d[f'(x)]/dx. We call f' the first derivative function of the function f, and
the derivative of the first derivative,

    (d/dx)(dy/dx),  written d[f'(x)]/dx, d^2y/dx^2, or f''(x),

the second derivative function. Since the second derivative is also a function, we can also
find its derivative.
We call d[f''(x)]/dx the third derivative function and write it as f'''(x) or f^(3)(x)
to indicate that this function is found by three successive operations of differentiation,
starting with the function f. Of course, this process may continue indefinitely, and so we use
the general notation f^(n)(x) to indicate the n-th derivative of the function f.
Example: Find the first four derivatives of the function f(x) = x^4.
Solution: f'(x) = 4x^3, f''(x) = 12x^2, f'''(x) = 24x, f^(4)(x) = 24.
if the first two derivatives of a function exist, we say the function is twice differentiable.
Concentrating at first and second derivative leads us to the concept of concavity and convexity.
Consider the function f ( x )  x 2 for the domain x  0 . On x  0 , this function is upward slopping
and we can see that its slope increases as x increases. This means that the first derivative function
is increasing in x, and so the derivative of this, the function’s second derivative, must be positive

36 | P a g e
valued for all x  0 . Up on finding the derivatives, we get f ' ( x )  2 x and f ' ' ( x)  2 . Thus the
second derivative is indeed positive for any value of x.
Figure 3.4: The function f(x) = x² (x ≥ 0) and its derivatives f'(x) = 2x and f''(x) = 2.
Now consider the graph of the same function defined on x < 0. On this set of values the function is negatively sloped: f'(x) = 2x < 0 on x < 0. The closer x is to zero, the less steep the curve is in absolute value.
Figure 3.5: The function f(x) = x² for x < 0 and its derivatives f'(x) = 2x and f''(x) = 2.

Defining this function on the whole domain R, we see that the second derivative is positive throughout. A function with this shape, as determined by the second derivative being positive, is convex.
Definition: A twice differentiable function f(x) is convex if f''(x) ≥ 0 at all points of its domain. According to this definition, a linear function is convex. To exclude linear functions, we come to the concept of strict convexity. This is done by replacing the weak inequality (≥) with the strict inequality (>).
Definition: A twice differentiable function f(x) is strictly convex if f''(x) > 0 except possibly at a single point.
Notice that the function f(x) = x⁴ has the second derivative f''(x) = 12x², which is positive for all x except x = 0, where the second derivative becomes zero. This function is, however, strictly convex; hence the qualification in the above definition that f''(x) > 0 except possibly at one point.
Strictly convex functions may be monotonically increasing, monotonically decreasing, or non-monotonic.

Figure 3.6: Strictly convex functions that are monotonically decreasing (f'(x) < 0), monotonically increasing (f'(x) > 0), and non-monotonic (f'(x) changes sign).

Definition: A twice differentiable function f(x) is concave if f''(x) ≤ 0 at all points of its domain.
As in the case of convex functions, a linear function satisfies the definition of concavity (f''(x) = 0 for all x). To exclude it, we define strict concavity.
Definition: A twice differentiable function f(x) is strictly concave if f''(x) < 0 at all points of its domain except possibly at a single point.
Alternatively, since multiplying a function by (−1) reverses the inequality, we could say that f(x) is (strictly) concave if −f(x) is (strictly) convex.
A function whose second derivative is sometimes positive and sometimes negative is neither convex nor concave everywhere. However, we can sometimes find intervals over which the function is convex or concave.
Example: f(x) = −(1/3)x³ + 3x² − 5x + 10 on x ≥ 0
Solution
f'(x) = −x² + 6x − 5 and
f''(x) = −2x + 6
Thus, since f''(x) > 0 for x < 3 and f''(x) < 0 for x > 3, the function is convex on [0, 3] and concave on [3, +∞).
Activity: Show that the production function Q = f(L) = −(2/3)L³ + 10L² + 5L has both a concave and a convex section.

Note: That production functions are concave means additional labor units yield less and less additional product; concavity is thus a reflection of the law of diminishing marginal product. If a production function were convex, it would be difficult to explain it on economic grounds.
Example: A single input, x, is used to produce output y. Show that if the production function is y = x^(1/3), x ≥ 0, then the cost function c(y) is convex while the production function is concave.
Solution
Production function: f(x) = x^(1/3)
First derivative: f'(x) = (1/3)x^(−2/3)
Second derivative: f''(x) = −(2/9)x^(−5/3) = −2/(9x^(5/3)) < 0.
Thus, for x > 0 the second derivative is negative, so the production function is concave.
Take the short-run cost function
C(x) = c0 + rx, in terms of fixed cost c0 and the variable input x needed to produce output y. But since y = x^(1/3), x = y³, and we can write the cost function in terms of y:
C(y) = c0 + ry³
C'(y) = 3ry²
C''(y) = 6ry > 0 for y > 0.
And so the cost function is convex.
Example: Given a total cost function whose marginal cost is C'(x) = 3x² − 18x (for instance, C(x) = x³ − 9x² plus lower-order terms), determine the slope of marginal cost at x = 7.
Solution:
C'(x) = 3x² − 18x measures the slope of the total cost curve. To find the slope of the marginal cost curve, take the derivative of MC, which is the second-order derivative of the total cost function:
C''(x) = 6x − 18
C''(7) = 6(7) − 18 = 24.

3.2.2. Linear Approximation

To avoid working with a complicated function, we sometimes try to find a simpler function that approximates the original one. Since linear functions are easy to work with, we first try to find a linear approximation to a function.
Suppose the function y = f(x) is differentiable at x = a. What is the equation of the tangent to the graph of the function at the point (a, f(a))?
Figure 3.7: The tangent line to y = f(x) at the point (a, f(a)).
The equation of a straight line passing through (x1, y1) with slope b is
y − y1 = b(x − x1), where b = (y − y1)/(x − x1).
Therefore, the equation of the tangent to the graph of y = f(x) at the point (a, f(a)) is
y − f(a) = f'(a)(x − a), i.e.,
y = f(a) + f'(a)(x − a).
If we approximate the graph of f(x) by its tangent line at x = a, the resulting approximation is called the linear approximation.
Examples
1) Find the linear approximation of the function f(x) = ∛x about x = 1.
Solution
f'(x) = (1/3)x^(−2/3)
f'(1) = (1/3)(1)^(−2/3) = 1/3
f(1) = 1
y = f(1) + f'(1)(x − 1)
f(x) ≈ 1 + (1/3)(x − 1) = (1/3)x + 2/3
If x = 1.03,
∛x ≈ 1 + (1/3)(1.03 − 1) = 1.01
But the actual value of the original function at x = 1.03 is
f(1.03) = ∛1.03 ≈ 1.0099 ≈ 1.01.
2) Find the linear approximation of the function
f(x) = (1 + (3/2)x + (1/2)x²)^(1/2) at x = 0.
Answer: f(x) ≈ 1 + (3/4)x.
3) Find the linear approximation value of (1.001)⁵⁰.
It can be written as a function of x as f(x) = x⁵⁰, approximated about a = 1.
To linearly approximate it we use the formula f(x) ≈ f(a) + f'(a)(x − a):
f(x) = x⁵⁰ ⟹ f(1) = 1⁵⁰ = 1
f'(x) = 50x⁴⁹ ⟹ f'(1) = 50(1)⁴⁹ = 50
f(x) ≈ 1 + 50(x − 1) = 50x − 49
Evaluating this result near a = 1, at x = 1.001, we get the linear estimate
f(1.001) ≈ 50(1.001) − 49 = 1.05.
But the actual value of f(x) = x⁵⁰ at x = 1.001 is approximately 1.0512.
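A short numerical sketch of this tangent-line estimate, assuming Python (the helper name linear_approx is ours, introduced only for illustration):

    def linear_approx(f, df, a, x):
        # Tangent-line approximation: f(a) + f'(a)(x - a)
        return f(a) + df(a) * (x - a)

    f  = lambda x: x ** 50
    df = lambda x: 50 * x ** 49

    print(linear_approx(f, df, a=1.0, x=1.001))   # 1.05
    print(1.001 ** 50)                            # about 1.0512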
3.2.3. The Differential Function
Given y = f(x), we know that dy/dx = f'(x). It follows that dy = f'(x)dx; this is the differential of the function f(x). It measures the change in the value of y resulting from a small change in the value of x.
Let's compare dy and Δy. dy and dx are proportional to each other, since dy = f'(x)dx.
Let x change from x to x + dx; then Δy = f(x + dx) − f(x).
Given the linear approximation formula f(x) ≅ f(a) + f'(a)(x − a) for x close to a, substituting a by x and x by x + dx, we have
f(x + dx) ≅ f(x) + f'(x)dx
f(x + dx) − f(x) ≅ f'(x)dx
Δy ≅ dy.
Therefore, in linear approximation, we approximate (estimate) Δy by dy.
[Figure: the curve y = f(x) with points P and Q on the curve and R on the tangent at P, showing the actual change Δy and the differential dy when x changes by dx.]

When x changes by dx, the actual change in y is given by Δy (movement along the curve from P to Q). But when we consider the tangent, i.e., if we approximate the function by its tangent at point P, the change in the value of y becomes dy (movement along the tangent line from P to R), and dy ≅ Δy.
The line segment RQ measures the difference between Δy and dy (the error term). The error term tends to zero as dx → 0.

Rules of Differentials
All the rules of differentiation can be applied in the case of differentials. Given two differentiable functions u and v, the following rules hold true:
1) d(u ± v) = du ± dv
2) d(au + bv) = a du + b dv
3) d(uv) = u dv + v du
4) d(u/v) = (v du − u dv)/v², for v ≠ 0.

All other rules of differentiation, discussed in the Calculus for Economists course you have taken earlier, are applicable in this context too.
Example:
Given y = (2x − 5)², then
dy = f'(x)dx
f'(x) = 2(2x − 5)·2 = 4(2x − 5)
dy = 4(2x − 5)dx.
3.2.4. Polynomial Approximation
Approximation by a linear function may be insufficiently accurate, so it is natural to try a quadratic approximation, or approximation by a polynomial of higher order.
1. Quadratic Approximation
We use the general quadratic formula
p(x) = A + B(x − a) + C(x − a)²   (for x close to a)
In the above equation we have three unknown coefficients A, B and C, so we impose three conditions on the polynomial to determine their values. Assume at x = a:
p(a) = f(a)
p'(a) = f'(a)
p''(a) = f''(a)
Given p(x) = A + B(x − a) + C(x − a)²,
p'(x) = B + 2C(x − a)
p''(x) = 2C
When x = a,
p(a) = A
p'(a) = B
p''(a) = 2C ⟹ C = (1/2)f''(a)
Hence, the quadratic approximation of f(x) is given by
f(x) ≅ p(x) = A + B(x − a) + C(x − a)²
= f(a) + f'(a)(x − a) + (1/2)f''(a)(x − a)²
This is just the linear approximation plus one additional term,
(1/2)f''(a)(x − a)².
Examples
Find the quadratic approximation to f(x) = ∛x about x = 1.
Solution
f(x) = x^(1/3)
f(1) = 1
f'(1) = (1/3)x^(−2/3) evaluated at 1 = 1/3
f''(1) = (−2/9)x^(−5/3) evaluated at 1 = −2/9
Then the quadratic approximation will be
f(x) ≈ 1 + (1/3)(x − 1) − (1/9)(x − 1)²
If x = 1.03, then ∛1.03 is quadratically approximated as
∛1.03 ≈ 1 + (1/3)(1.03 − 1) − (1/9)(1.03 − 1)² = 1.0099,
which is nearer to the exact value than the previous linear approximation.
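The improvement from the quadratic term can be checked numerically; a minimal sketch, assuming Python (the helper name quad_approx is ours):

    def quad_approx(f0, f1, f2, a, x):
        # f(a) + f'(a)(x - a) + (1/2) f''(a)(x - a)^2
        return f0 + f1 * (x - a) + 0.5 * f2 * (x - a) ** 2

    # f(x) = x**(1/3) at a = 1: f(1) = 1, f'(1) = 1/3, f''(1) = -2/9
    print(quad_approx(1.0, 1/3, -2/9, a=1.0, x=1.03))   # 1.0099
    print(1.03 ** (1/3))                                # about 1.009902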
Higher Order Approximation
Functions with higher order derivatives can be better approximated near a point by polynomials of higher degree.
Suppose we want to approximate a function f(x) around x = a by a polynomial of the form
p(x) = A0 + A1(x − a) + A2(x − a)² + A3(x − a)³ + … + An(x − a)ⁿ
It has n + 1 coefficients, so we impose n + 1 conditions on the polynomial to determine the values of the n + 1 unknown coefficients:
p(a) = f(a)
p'(a) = f'(a)
p''(a) = f''(a)
⋮
p⁽ⁿ⁾(a) = f⁽ⁿ⁾(a)
The implication is that p(x) and its first n derivatives are equal to f(x) and its first n derivatives at x = a. Differentiating successively,
p(x) = A0 + A1(x − a) + A2(x − a)² + A3(x − a)³ + … + An(x − a)ⁿ
p'(x) = A1 + 2A2(x − a) + 3A3(x − a)² + … + n·An(x − a)ⁿ⁻¹
p''(x) = 2A2 + 6A3(x − a) + … + n(n − 1)An(x − a)ⁿ⁻²
p'''(x) = 6A3 + … + n(n − 1)(n − 2)An(x − a)ⁿ⁻³
⋮
p⁽ⁿ⁾(x) = n(n − 1)(n − 2)⋯2·1·An
When x = a,
p(a) = A0 = 0!A0 ⟹ A0 = f(a)/0!
p'(a) = A1 = 1!A1 ⟹ A1 = f'(a)/1!
p''(a) = 2A2 = 2!A2 ⟹ A2 = f''(a)/2!
p'''(a) = 6A3 = 3!A3 ⟹ A3 = f'''(a)/3!
⋮
p⁽ⁿ⁾(a) = n!An ⟹ An = f⁽ⁿ⁾(a)/n!
Substituting these values of the coefficients, we get the following approximation to f(x) at x = a by an nth degree polynomial:

f(x) ≅ p(x) = f(a) + f'(a)(x − a)/1! + f''(a)(x − a)²/2! + f'''(a)(x − a)³/3! + … + f⁽ⁿ⁾(a)(x − a)ⁿ/n!

Example: Find the third degree polynomial approximation of f(x) = √(1 + x) about x = 0.
Solution:
f(0) = 1
f'(0) = 1/2
f''(0) = −1/4
f'''(0) = 3/8
p(x) = f(0) + f'(0)x/1! + f''(0)x²/2! + f'''(0)x³/3!
= 1 + (1/2)x + (−1/4)x²/2 + (3/8)x³/6
Therefore, p(x) = 1 + (1/2)x − (1/8)x² + (1/16)x³.
3.2.5. Estimation of Functions (Maclaurin and Taylor Series)
This is the expansion of a function f(x) by the Maclaurin and Taylor series.

(a) Maclaurin Series

This is the expansion of the function f(x) around x = 0. Let's consider the expansion of an nth degree polynomial by the Maclaurin series:
f(x) = a0 + a1x + a2x² + a3x³ + … + anxⁿ
Successive differentiation of the function yields:
f'(x) = a1 + 2a2x + 3a3x² + … + n·an·xⁿ⁻¹
f''(x) = 2a2 + 6a3x + … + n(n − 1)an·xⁿ⁻²
f'''(x) = 6a3 + … + n(n − 1)(n − 2)an·xⁿ⁻³
⋮
f⁽ⁿ⁾(x) = n(n − 1)(n − 2)⋯3·2·1·an
Evaluating these expressions at x = 0 yields:
f(0) = a0 = 0!a0 ⟹ a0 = f(0)/0!
f'(0) = a1 = 1!a1 ⟹ a1 = f'(0)/1!
f''(0) = 2a2 = 2!a2 ⟹ a2 = f''(0)/2!
f'''(0) = 6a3 = 3!a3 ⟹ a3 = f'''(0)/3!
⋮
f⁽ⁿ⁾(0) = n!an ⟹ an = f⁽ⁿ⁾(0)/n!
Therefore, the above polynomial function can be rewritten as
f(x) = f(0)/0! + f'(0)x/1! + f''(0)x²/2! + f'''(0)x³/3! + … + f⁽ⁿ⁾(0)xⁿ/n!
This power series representation is called the Maclaurin series.
Note: For a polynomial, expansion using the Maclaurin series yields exactly the same function as the original one.
Example 1: Find the Maclaurin series for the function
f(x) = 8 + 36x + 12x² + x³
Solution
We evaluate the function and its derivatives around x = 0; we expand up to the third derivative, because the highest power is 3.
f(0) = 8
f'(x) = 36 + 24x + 3x² ⟹ f'(0) = 36
f''(x) = 24 + 6x ⟹ f''(0) = 24
f'''(0) = 6
In the Maclaurin series,
f(x) = f(0)/0! + f'(0)x/1! + f''(0)x²/2! + f'''(0)x³/3!
= 8/0! + 36x/1! + 24x²/2! + 6x³/3!
= 8 + 36x + 12x² + x³.

Taylor's Series
This is the expansion of a function around x = x0. Suppose now we want to expand an nth degree polynomial at some arbitrary point x = x0:
f(x) = a0 + a1x + a2x² + a3x³ + … + anxⁿ
Rewriting the function in terms of the variable (x − x0), we get
f(x) = b0 + b1(x − x0) + b2(x − x0)² + b3(x − x0)³ + … + bn(x − x0)ⁿ
Let's find the successive derivatives:
f'(x) = b1 + 2b2(x − x0) + 3b3(x − x0)² + … + n·bn(x − x0)ⁿ⁻¹
f''(x) = 2b2 + 6b3(x − x0) + … + n(n − 1)bn(x − x0)ⁿ⁻²
f'''(x) = 6b3 + … + n(n − 1)(n − 2)bn(x − x0)ⁿ⁻³
⋮
f⁽ⁿ⁾(x) = n(n − 1)(n − 2)⋯3·2·1·bn
We then evaluate the derivatives at x = x0:
f(x0) = b0 = 0!b0 ⟹ b0 = f(x0)/0!
f'(x0) = b1 = 1!b1 ⟹ b1 = f'(x0)/1!
f''(x0) = 2b2 = 2!b2 ⟹ b2 = f''(x0)/2!
f'''(x0) = 6b3 = 3!b3 ⟹ b3 = f'''(x0)/3!
⋮
f⁽ⁿ⁾(x0) = n!bn ⟹ bn = f⁽ⁿ⁾(x0)/n!
Substituting the values of the coefficients into the rewritten equation:
f(x) = f(x0)/0! + f'(x0)(x − x0)/1! + f''(x0)(x − x0)²/2! + f'''(x0)(x − x0)³/3! + … + f⁽ⁿ⁾(x0)(x − x0)ⁿ/n!
This power series representation is known as the Taylor series representation.
Example: Find the Taylor series expansion for f(x) = eˣ around the point x0 = 0.
Solution
Given f(x) = eˣ, evaluating the function at x0 = 0 we get
f(0) = e⁰ = 1.
The derivative of eˣ is eˣ and never changes under successive differentiation, so that
f'(0) = e⁰ = 1, f''(0) = 1, and so on.
The Taylor series expansion will be
f(x) = f(0)/0! + f'(0)x/1! + f''(0)x²/2! + … + f⁽ⁿ⁾(0)xⁿ/n!
= 1 + x + x²/2 + x³/6 + … + xⁿ/n!
Expansion of an Arbitrary Function (Taylor's Theorem)
According to Taylor's theorem, given an arbitrary function φ(x) which is continuous and has derivatives up to a finite order n, its expansion around the point x0 yields
φ(x) = φ(x0)/0! + φ'(x0)(x − x0)/1! + φ''(x0)(x − x0)²/2! + … + φ⁽ⁿ⁾(x0)(x − x0)ⁿ/n! + Rn
φ(x) = Pn + Rn, where Pn is the polynomial approximation and Rn is the remainder.
If the arbitrary function φ(x) is an nth degree polynomial and the expansion is into another nth degree polynomial, the expansion is exact; if it is to be expanded into a polynomial of lesser degree, the latter can only be considered an approximation, and the remainder is bound to appear.
Activity: Expand the function f(x) = 2 + 3x + 4x² + 3x³ by using the Taylor series to a third degree polynomial around the point x = 1.
3.2.6. The Intermediate Value Theorem
Let f(x) be a function that is continuous for all x in the interval [a, b], and assume that f(a) ≠ f(b). Then, as x varies between a and b, f(x) takes on every value between f(a) and f(b).
[Figure: a continuous curve from (a, f(a)) to (b, f(b)) crossing the horizontal line y = d at a point (c, d).]
The theorem implies that the graph of the continuous function f(x) must intersect the line y = d = f(c) at least at one point (c, d).
An important consequence of the theorem: let f(x) be continuous on [a, b] and assume that f(a) and f(b) have different signs; then there is at least one c ∈ (a, b) such that f(c) = 0.
Newton's Method
The intermediate value theorem shows that a given equation f(x) = 0 has a solution in a given interval; however, it doesn't provide additional information about the location of that solution. Newton's method provides an iterative approximation to it.

48 | P a g e
�(�)

�(�0) (�0 , � �0 )

�0 �1 �

By constructing a tangent at (x0, f(x0)), we obtain x1. Repeat the procedure by constructing another tangent line at (x1, f(x1)), and so on; the iteration derived this way is Newton's method.
What is the equation of the tangent at (x0, f(x0))? The line through (x0, f(x0)) with slope b satisfies
y − f(x0) = b(x − x0), and since the slope of the tangent is b = f'(x0), we can rewrite it as
y − f(x0) = f'(x0)(x − x0).
At x = x1 the tangent line intersects the x-axis, and hence the value of y is zero, so that
0 − f(x0) = f'(x0)(x1 − x0)
−f(x0) = x1·f'(x0) − x0·f'(x0)
x1·f'(x0) = x0·f'(x0) − f(x0)
x1 = x0 − f(x0)/f'(x0)
If we compute x2, x3 and continue in the same way, we get
x2 = x1 − f(x1)/f'(x1)
x3 = x2 − f(x2)/f'(x2)
In general, the points generated by Newton's method are given by
xₙ₊₁ = xₙ − f(xₙ)/f'(xₙ)
Example: Find an approximate solution of the equation x¹⁵ = 2 by applying Newton's method twice.
Solution
First of all, we reformulate the equation in the form f(x) = 0:
f(x) = x¹⁵ − 2 = 0
Next, we look for the integer nearest to the root, i.e., the integer that makes the function approximately equal to zero; that integer is 1, so x0 = 1.
Evaluating the function and its first derivative at the initial value x0 = 1, we get:
f(1) = 1¹⁵ − 2 = −1
f'(x) = 15x¹⁴ ⟹ f'(1) = 15(1)¹⁴ = 15
Applying Newton's formula once:
x1 = x0 − f(x0)/f'(x0) = 1 − (−1)/15 = 16/15
Evaluating the function and its first derivative at x1 = 16/15, we get:
f(16/15) = (16/15)¹⁵ − 2 ≈ 0.633
f'(16/15) = 15(16/15)¹⁴ ≈ 37.025
Applying the formula a second time:
x2 = x1 − f(x1)/f'(x1) = 16/15 − 0.633/37.025 ≈ 1.05
Therefore, the approximate solution of x¹⁵ = 2 after applying Newton's method twice is found to be 1.05 to the hundredth digit.
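The iteration is easy to automate; a minimal sketch, assuming Python (the function name newton is ours):

    def newton(f, df, x0, steps):
        # Newton's method: x_{n+1} = x_n - f(x_n) / f'(x_n)
        x = x0
        for _ in range(steps):
            x = x - f(x) / df(x)
        return x

    f  = lambda x: x ** 15 - 2
    df = lambda x: 15 * x ** 14

    print(newton(f, df, x0=1.0, steps=2))   # about 1.0496
    print(2 ** (1 / 15))                    # true root, about 1.0473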
3.3. Multivariate Calculus
Though functions of a single variable have great simplicity, they are not very common in economics, because the functions the real economic world consists of are multivariate. For example, a rational consumer will not consume one commodity only; rather, s/he consumes different combinations of goods to maximize her/his utility. The representative production function of a firm is a function not only of labor but also of capital. The overall income of nations is not a function of physical capital stocks only; it is also a function of human capital, factor productivity, institutions, and government policies. Hence, we have to extend the concepts of univariate calculus to multivariate ones. For the time being, let's begin with a function of two independent variables, so that we can simply extend it to the n-variable case.
Let's take the Cobb-Douglas production function of a hypothetical firm:
Y = A·L^α·K^β
Where:
Y is the level of output;
A is the state of technology, which reflects the productivity of factors;
L and K represent the units of labor and capital;
α and β are the elasticities of Y with respect to labor and capital, respectively.
Example
Suppose for a firm Y = 60·L^(3/4)·K^(1/4). Then:
a. How many units of output will be produced if the firm uses 81 units of labor and 16 units of capital?
b. Show that the production function exhibits constant returns to scale.
Solution
a. Y = f(L, K) = 60(81)^(3/4)(16)^(1/4) = 60(27)(2) = 3240.
b. To check for returns to scale, add the exponents of labor and capital. If the sum is greater than one, there are increasing returns to scale; if less than one, decreasing returns to scale; and if equal to one, constant returns to scale. Since 3/4 + 1/4 = 1, the function exhibits constant returns to scale. For the proof, you can see the following clarification.

Proof: For a general production function, when we change all the inputs by a given proportion, if output changes by an equal proportion, then the production function illustrates constant returns to scale. If it changes by a greater/smaller proportion, the production function illustrates increasing/decreasing returns to scale, respectively.

For Y = A·L^α·K^β, let Y0 = A·L^α·K^β.

Assume labor and capital are both multiplied by λ; the new labor and capital will be λL and λK, respectively.
So the new production level will be
Y1 = A(λL)^α(λK)^β
= A·λ^α·L^α·λ^β·K^β
= λ^(α+β)·A·L^α·K^β
= λ^(α+β)·Y0

If α + β > 1, this shows increasing returns to scale; if α + β < 1, decreasing returns to scale; and if α + β = 1, constant returns to scale.

3.3.1. Partial Differentiation
The format y = f(x) represents a function of one variable. Now consider z = f(x, y), a function of two variables.
The first order partial derivatives are given as:
fx = ∂z/∂x = fx(x, y)
fy = ∂z/∂y = fy(x, y)
Second order partial derivatives
We have two types of second order partial derivatives.
1. Direct second order partial derivatives:
fxx = ∂fx/∂x = fxx(x, y)
fyy = ∂fy/∂y = fyy(x, y)
2. Mixed (cross) second order partial derivatives:
fxy = ∂fx/∂y = fxy(x, y)
fyx = ∂fy/∂x = fyx(x, y)
Young's Theorem
The mixed (cross) partials of a given function will always be equal if both cross-partials exist and are continuous.
Given a function f(x1, x2), Young's theorem asserts that f12 = f21, alternatively written fx1x2 = fx2x1.
Example: Given the function z = f(x1, x2) = x1·ln x2, show that Young's theorem is true.
Solution
The first task is to find the first and second derivatives of the function one by one, and then to check the equality of the mixed second partials.
f1 = ln x2          f2 = x1/x2
f11 = 0             f22 = −x1/x2²
f12 = 1/x2          f21 = 1/x2
So, as Young's theorem predicts, f12 = f21, and our result confirms that this is the case.
Activity 1: Given the production function Q = f(L, K) = A·L^α·K^β, prove that Young's theorem is true.
Activity 2: For the function z = f(x1, x2) = x1·e^(x1+x2), show that f12 = f21.
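Cross-partials like these can also be verified symbolically; a minimal sketch, assuming Python with SymPy is available (our illustration, not part of the module):

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2', positive=True)
    f = x1 * sp.log(x2)

    f12 = sp.diff(f, x1, x2)   # differentiate with respect to x1, then x2
    f21 = sp.diff(f, x2, x1)   # reverse order
    print(f12, f21)            # both 1/x2, as Young's theorem predicts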

3.3.2. The Multivariate Chain Rule
Many economic models involve composite functions, and differentiating composite functions requires the application of the chain rule.
(i) For functions of one variable, z = f(y) and y = g(x), dz/dx is given by
dz/dx = (dz/dy)·(dy/dx)

(ii) Multivariate functions (a simple case)

Suppose z = f(x, y), x = f(t) and y = g(t); then the multivariate chain rule is given as
dz/dt = (∂z/∂x)·(dx/dt) + (∂z/∂y)·(dy/dt)
Example: Suppose z = ln(x1 + x2), where x1 = 2t and x2 = t². Find dz/dt.
Solution
dz/dt = (∂z/∂x1)·(dx1/dt) + (∂z/∂x2)·(dx2/dt)
= [1/(x1 + x2)]·2 + [1/(x1 + x2)]·2t
= (2 + 2t)/(x1 + x2)
To write the result for dz/dt in terms of t only, substitute for the x's:
dz/dt = (2 + 2t)/(2t + t²)
Another way to obtain this result is by direct substitution, that is,
z = ln(2t + t²),
and direct differentiation then gives
dz/dt = (2 + 2t)/(2t + t²).
Activity: Suppose z = f(x, y) = x²·e^y with x = ln t and y = t². Find dz/dt using the chain rule.
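The chain-rule computation in the worked example can be confirmed symbolically; a minimal sketch, assuming SymPy:

    import sympy as sp

    t = sp.symbols('t', positive=True)
    x1, x2 = 2 * t, t ** 2

    z = sp.log(x1 + x2)     # direct substitution: z = ln(2t + t^2)
    print(sp.diff(z, t))    # (2*t + 2)/(t**2 + 2*t), i.e. (2 + 2t)/(2t + t^2)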
3.3.3. Homogeneous Functions and Euler's Theorem
Definition: A function y = f(x1, x2, …, xn) is said to be homogeneous of degree r if and only if
f(tx1, tx2, …, txn) = t^r·f(x1, x2, …, xn) = t^r·y.
Multiplying all the independent variables by a factor t multiplies the value of the function by the factor t^r.
Example 1: Given f(x, y, w) = x/y + 2w/(3x), check whether the function is homogeneous or not.
Solution
First, let's multiply each argument of the function by t:
f(tx, ty, tw) = tx/(ty) + 2tw/(3tx)
= x/y + 2w/(3x) = t⁰·f(x, y, w)
Therefore, the function is homogeneous of degree zero.
Example 2: Consider the Cobb-Douglas production function Y = f(L, K) = L^α·K^(1−α).
f(tL, tK) = (tL)^α·(tK)^(1−α)
= t^α·L^α·t^(1−α)·K^(1−α)
= t^(α+1−α)·L^α·K^(1−α)
= t¹·L^α·K^(1−α)
= t¹·f(L, K)
Therefore, this function is homogeneous of degree 1. Such functions are also called linearly homogeneous functions.
Activity: Check the homogeneity of f(x, y) = 3x²y − y³.
In economics, the degree of homogeneity of a function has one important implication. Given f(tx1, tx2, …, txn) = t^r·f(x1, x2, …, xn):
If r = 1 ⇒ constant returns to scale
If r > 1 ⇒ increasing returns to scale
If r < 1 ⇒ decreasing returns to scale
Euler's Theorem
Euler's theorem states that if a function is homogeneous of degree r, then
(∂f/∂x1)·x1 + (∂f/∂x2)·x2 + … + (∂f/∂xn)·xn = r·f(x1, x2, …, xn),
where r measures the degree of homogeneity.
Example:
Suppose f(x, y) = x⁴ + x²y²; check Euler's theorem.
Solution:
According to Euler's theorem, fx·x + fy·y is equal to some value r times the original function, and the question is what this value is.
∂f/∂x = fx = 4x³ + 2xy²
∂f/∂y = fy = 2x²y
Then, fx·x + fy·y = (4x³ + 2xy²)x + (2x²y)y
= 4x⁴ + 2x²y² + 2x²y²
= 4x⁴ + 4x²y²
= 4(x⁴ + x²y²)
= 4·f(x, y)
Hence, the function is homogeneous of degree four (r = 4), which confirms Euler's theorem.
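Euler's theorem for this example can be checked symbolically; a minimal sketch, assuming SymPy:

    import sympy as sp

    x, y = sp.symbols('x y')
    f = x**4 + x**2 * y**2

    euler_sum = sp.diff(f, x) * x + sp.diff(f, y) * y
    print(sp.simplify(euler_sum - 4 * f))   # 0, confirming fx*x + fy*y = 4*f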
CHAPTER FOUR
UNCONSTRAINED OPTIMIZATION
4.1. Functions with One Variable
Given any function y = f(x), the derivative dy/dx, also written y' or f'(x), is the instantaneous rate of change of f(x), which is given by
f'(x) = lim (h→0) [f(x + h) − f(x)]/h.
Setting this derivative equal to zero gives the First Order Condition (FOC) for an optimum, as described earlier.

4.1.1. The Concept of Optimum (Extreme) Values

An extreme value of a function is a maximum or minimum value of that function. We can have two types of extreme values:

a. Relative extreme values (local extrema)

b. Absolute extreme values (global extrema)

Relative Extreme Values

A relative extreme value is an extreme value when compared with the values near it. It can be a relative maximum (when it is higher than any other point near it) or a relative minimum (when it is lower than any value near it).
In general, a function f has a relative maximum at x0 if there exists an interval around x0 on which f(x0) ≥ f(x) for all x in the interval.
A function f has a relative minimum at x0 if there exists an interval around x0 on which f(x0) ≤ f(x) for all x in the interval.
It is easy to find relative extreme values using the derivative. Previously we have seen that a function changes from increasing to decreasing, and vice versa, where its first derivative is zero or undefined. Since a function changes from decreasing to increasing at a relative minimum (and from increasing to decreasing at a relative maximum), f'(x) = 0 or f'(x) is undefined at relative minima or maxima.

Hence, we can find the relative maxima and minima of a function by finding the values of x for which f'(x) = 0 or f'(x) is undefined. Once we get a critical value, we can use the behavior of f'(x) near the critical point to know whether it is a relative minimum or a relative maximum.

To find relative minima or maxima:

1st: Find f'(x).
2nd: Solve for the value(s) of x which make f'(x) = 0.
3rd: Use the above value(s) to get the extreme value of the function (i.e., insert the value you got in the 2nd step into f(x) to get the extreme value).
4th: Evaluate the first derivative function at some values of x to the left and the right of each critical point:

a. If f'(x) > 0 to the left and f'(x) < 0 to the right of the critical value, the critical point is a relative maximum.
b. If f'(x) < 0 to the left and f'(x) > 0 to the right of the critical value, the critical point is a relative minimum.
Example:

Find the relative extrema of the following functions:
a. f(x) = (1/3)x³ − x² − 3x + 2
b. f(x) = x³ − 3x + 6

Solution

a. f'(x) = x² − 2x − 3 = (x + 1)(x − 3) = 0 at x = −1 and x = 3, with f(−1) = 11/3 and f(3) = −7.
Thus, the critical points are (−1, 11/3) and (3, −7). To find out which one is a relative minimum or maximum, use the number line: f'(x) > 0 for x < −1, f'(x) < 0 for −1 < x < 3, and f'(x) > 0 for x > 3.
Thus, (3, −7) is a relative minimum,
and (−1, 11/3) is a relative maximum.

b. f'(x) = 3x² − 3 = 3(x + 1)(x − 1) = 0 at x = −1 and x = 1, so the critical points are (1, 4) and (−1, 8). The sign pattern of f'(x) along the number line is

+ 0 −   − 0 +
  −1      1

Relative maximum at (−1, 8); relative minimum at (1, 4).


Note that instead of using the number line to check for a maximum or minimum, we can use the 2nd derivative. If x̄ is a critical value which makes f'(x̄) = 0, then the point (x̄, f(x̄)) is a relative maximum if f''(x̄) < 0 and a relative minimum if f''(x̄) > 0. But when f''(x̄) = 0 we can't use the 2nd derivative test, and we have to use the behavior of the first derivative to check for relative optima.

Example: Use the 2nd derivative test on the previous functions to determine whether their critical points are relative maxima or relative minima.

a. f(x) = (1/3)x³ − x² − 3x + 2, with critical points (−1, 11/3) and (3, −7): f''(x) = 2x − 2.
f''(−1) = −4 < 0, so (−1, 11/3) is a relative maximum.
f''(3) = 4 > 0, so (3, −7) is a relative minimum.

b. A function whose only critical value is x = 1, with critical point (1, 3) and f''(1) = 0 (for instance, f(x) = x³ − 3x² + 3x + 2): the second derivative test fails, so use the number line in this case.

c. f(x) = x³ − 3x + 6, with critical points (1, 4) and (−1, 8): f''(x) = 6x, so f''(1) = 6 > 0 gives a relative minimum at (1, 4), and f''(−1) = −6 < 0 gives a relative maximum at (−1, 8).

Absolute Extrema
An absolute extreme value is an extreme value that the function attains throughout its domain. The largest value that the function attains on its domain is called the absolute maximum of the function. Similarly, the smallest value that the function attains throughout its domain is known as the absolute minimum of the function. Note that a function may (or may not) have absolute extreme values, just as relative extreme values may or may not exist.

For example, consider the function y = 2x + 1. As we increase the value of x, the value of y increases continuously; and when we decrease the value of x, the value of y also decreases continuously, so no absolute extrema exist. However, if the domain is restricted to a closed interval [a, b] (which means a ≤ x ≤ b) and the function is continuous everywhere within the interval, the function will have an absolute minimum and an absolute maximum within the interval. They may be located either:

a. At the end points, i.e., a and b, or

b. At interior points.
To find absolute extrema within an interval:

a. Find the critical values and critical points, and check whether they are within the interval.
b. Calculate f(a) and f(b).
c. The largest and smallest values obtained from the above steps represent the absolute maximum and the absolute minimum of the function over the interval, respectively.
Example
1. Find the absolute maximum and minimum of the given function over the interval [0, 5].
Solution
Setting f'(x) = 0 gives the critical values; comparing the values of f at the critical points inside the interval with f(0) and f(5), there is an absolute maximum at x = 0 and an absolute minimum at x = 3 within [0, 5].
2. Find the absolute extrema of the given function over the interval [−4, 2].
Solution
Comparing the endpoint and critical-point values: absolute maximum at x = 2, i.e., (2, 24), and absolute minimum at x = −4, i.e., (−4, −12).
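The endpoint-and-critical-point procedure is easy to mechanize. A minimal sketch in Python, illustrated with the cubic f(x) = x³ − 3x + 6 from the relative-extrema example above, evaluated on an arbitrary interval [−4, 2] (the interval choice is ours, for illustration only):

    # f(x) = x**3 - 3x + 6 on [-4, 2]; critical values are x = -1 and x = 1
    f = lambda x: x ** 3 - 3 * x + 6
    a, b = -4.0, 2.0
    critical = [-1.0, 1.0]                  # roots of f'(x) = 3x**2 - 3

    candidates = [a, b] + [x for x in critical if a <= x <= b]
    values = [(f(x), x) for x in candidates]
    # absolute max 8 (at x = 2, also attained at x = -1); absolute min -46 at x = -4
    print(max(values), min(values))         # (8.0, 2.0) (-46.0, -4.0)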

4.2. Optimization of a Function with More Than One Variable

Dear student! You have studied optimization of a function in section 4.1, where the discussion was restricted to a function with a single explanatory variable. But economic phenomena in reality involve the study and analysis of a large number of variables simultaneously. Therefore, in this section we extend the discussion of optimization to functions with more than a single explanatory variable.

4.2.1. First and Second Order Conditions for a Function with Two Explanatory Variables

This section is a simple extension of your study of the optimization of a function with a single variable. You will begin your study with the first order necessary condition for optimization of a function of two explanatory variables and then discuss the second order sufficient condition. Before going into the discussion, you should note that a maximum and a minimum in this section refer to a relative maximum and a relative minimum.

a) First Order Condition for a Function with Two Explanatory Variables


What do you think about the first and second order conditions for a function with two
explanatory variables? You can use the space below to write your answer.

Consider the function z = f(x, y). In this case, the first order necessary condition for an extremum (either a maximum or a minimum) requires the first partial derivatives of the function z = f(x, y) to equal zero simultaneously.

This involves finding the values of x and y such that the first partial derivatives are zero at the same time. That is, fx = fy = 0, or ∂z/∂x = ∂z/∂y = 0.

Can you make a conclusion based upon this first order condition for an extremum? Or, is it possible for you to conclude whether a particular point is a maximum or a minimum on the basis of the first order condition? The answer is no, because of the presence of points which satisfy the first order condition but are neither a maximum nor a minimum. One good example is the case of an inflection point. Therefore, similar to what you have studied in the case of single-variable optimization, the first order condition is only a necessary but not a sufficient condition. As a result, you need to make use of additional criteria. With this comes the second order test.

b) Second Order Condition.


The second order condition for the function z = f(x, y) involves restrictions on the signs of the second partial derivatives fxx, fyy and fxy. The second order condition, which is also known as the sufficient condition, follows below.

Let (x0, y0) be the critical values of the function (i.e., where fx = fy = 0), and let the second partial derivatives be evaluated at these points.

a) f(x0, y0) is a relative maximum if and only if fxx, fyy < 0 and fxx·fyy > (fxy)².
b) f(x0, y0) is a relative minimum if and only if fxx, fyy > 0 and fxx·fyy > (fxy)².
c) f(x0, y0) is a saddle point if fxx·fyy < (fxy)², and fxx and fyy have different signs.
d) f(x0, y0) is an inflection point if fxx·fyy < (fxy)², and fxx and fyy have the same sign.
e) f(x0, y0) could be a relative maximum, a relative minimum, or a saddle point if fxx·fyy = (fxy)²; that is, the second order test fails to tell the behavior of the function.

The table below summarizes both the first and the second order conditions for a relative extremum of the function z = f(x, y).
Table 4.1 Conditions for Relative Extremum

Condition                           Maximum                              Minimum
First order necessary condition     fx = fy = 0                          fx = fy = 0
Second order sufficient condition   fxx, fyy < 0 and fxx·fyy > (fxy)²    fxx, fyy > 0 and fxx·fyy > (fxy)²

Generally, in finding the relative maximum/minimum of a function with two explanatory variables, you first need to find the critical values of that function using the first order necessary condition, and then carry out the second order test to determine whether the critical values are maximum, minimum, saddle, or inflection points.

Examples

a) Find the critical values of the function z = x² + 2y² and determine whether they are maxima or minima.

Solution

Using the first order condition, you can find the critical values. Thus, you need to find the first partial derivatives fx and fy:

fx = ∂z/∂x = 2x

fy = ∂z/∂y = 4y

Then, find the values of x and y (the critical or optimal values) which satisfy the condition fx = fy = 0. Hence,

2x = 0 and 4y = 0 for x = 0 and y = 0; i.e., the critical point in this case is (0, 0).

Now, you go to the second order test. Here, you need to find the second order partial derivatives:

fxx = ∂²z/∂x² = 2

fyy = ∂²z/∂y² = 4

fxy = ∂²z/∂x∂y = 0

Evaluating the second partial derivatives at the critical value, you have

fxx(0,0) = 2

fyy(0,0) = 4

fxy(0,0) = 0

Therefore, fxx > 0 and fyy > 0, and fxx·fyy > (fxy)². Hence, you conclude that the critical value (0, 0) is a relative minimum.

b) Given the function z = 2x² − xy + 3y², find the stationary values and examine whether they are maxima or minima.

Solution

First order condition

The first partial derivatives are

fx = 4x − y

fy = 6y − x

The first order condition requires fx = fy = 0. Thus, you have

4x − y = 0

6y − x = 0

Solving the above equations simultaneously, you get x = 0 and y = 0; hence, the critical value is (0, 0).

Second order condition

The second partial derivatives are

fxx = 4

fyy = 6

fxy = −1

Since fxx > 0 and fyy > 0, and fxx·fyy > (fxy)² (because fxx·fyy = 24 and (fxy)² = 1), the critical value (0, 0) is a relative minimum.
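Both examples can be verified symbolically; a minimal sketch, assuming SymPy:

    import sympy as sp

    x, y = sp.symbols('x y')
    z = 2 * x**2 - x * y + 3 * y**2

    crit = sp.solve([sp.diff(z, x), sp.diff(z, y)], [x, y])    # {x: 0, y: 0}
    fxx, fyy, fxy = sp.diff(z, x, 2), sp.diff(z, y, 2), sp.diff(z, x, y)
    # fxx = 4 > 0 and fxx*fyy - fxy**2 = 24 - 1 = 23 > 0: a minimum
    print(crit, fxx, fyy, fxx * fyy - fxy**2)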

4.2.2. Determinant Test for Sign Definiteness

Second Order Condition: the Determinant Test

A quadratic form is a polynomial expression in which each component term has a uniform degree, i.e., the sum of the exponents in each term is the same. For example:

w = f(x, y, z) = ax − by + z is a linear form;

w = f(x, y, z) = ax² + bxy + cz² is a quadratic form.

The second order total differential
d²z = fxx·dx² + 2fxy·dx·dy + fyy·dy²
is a quadratic form in dx and dy. Let g = d²z, u = dx, v = dy, a = fxx, b = fyy, h = fxy = fyx, so that

g = au² + 2huv + bv².

Different cases:

1. If g > 0 (positive definite): sufficient condition for a minimum.
2. If g ≥ 0 (positive semidefinite): necessary condition for a minimum.
3. If g < 0 (negative definite): sufficient condition for a maximum.
4. If g ≤ 0 (negative semidefinite): necessary condition for a maximum.
If g changes sign when the variables assume different values, g is said to be indefinite.

We can check whether the quadratic form g is definite or indefinite using a determinant test, the determinant test for sign definiteness:

g = au² + 2huv + bv² = u(au + hv) + v(hu + bv)

In matrix form (note that for two matrices to be multiplied, the conformability condition must hold true):

g = [u v] | a  h | | u |
          | h  b | | v |

= [ua + vh  uh + vb] | u |
                     | v |

= au² + 2huv + bv²

Therefore, the determinant

| a  h |
| h  b |

with a = fxx, b = fyy and h = fxy, is called the Hessian determinant.

The first principal minor of the Hessian, |H1|, contains the first principal diagonal element; the second principal minor, |H2|, contains the first two principal diagonal elements; etc.

| a  h |   | fxx  fxy |
| h  b | = | fyx  fyy |

|H1| = |fxx| = fxx

|H2| = | fxx  fxy | = fxx·fyy − fxy·fyx
       | fyx  fyy |

Notes

1. The second order differential d²z > 0 (positive definite) iff |H1| > 0 and |H2| > 0; this indicates a minimum point.
2. d²z < 0 (negative definite) iff |H1| < 0 but |H2| > 0; this indicates a maximum point.
3. d²z is indefinite iff |H2| < 0.
Example

Optimize the following function by using the Hessian determinant and state its sign definiteness. Find the maximum or minimum value.

z = f(x, y) = 6x² − 9x − 3xy − 7y + 5y²

Solution

FOC: zx = 12x − 9 − 3y = 0

zy = −3x − 7 + 10y = 0

By solving simultaneously, the critical values will be:

x = 1, y = 1

SOC: zxx = 12

zxy = zyx = −3

zyy = 10

|H| = | zxx  zxy | = | 12  −3 |
      | zyx  zyy |   | −3  10 |

|H1| = zxx = 12 > 0

|H2| = zxx·zyy − zxy·zyx = (12)(10) − (−3)(−3) = 120 − 9 = 111 > 0

Since both |H1| and |H2| are greater than zero, the function is at a minimum point.

The sign definiteness is positive, because |H1| > 0 and |H2| > 0.

The minimum value = 6(1)² − 9(1) − 3(1×1) − 7(1) + 5(1)² = −8.

The Case of a Three-Variable Quadratic Form

This approach can also be applied to three or more choice variables. Given z = f(x1, x2, x3), the Hessian determinant takes the form

      | f11  f12  f13 |
|H| = | f21  f22  f23 |
      | f31  f32  f33 |

with principal minors
|H1| = f11
|H2| = f11·f22 − f21·f12
|H3| = |H|

d²z is positive definite iff |H1| > 0, |H2| > 0, |H3| > 0:

this shows a minimum point.

d²z is negative definite iff |H1| < 0, |H2| > 0, |H3| < 0:

this shows a maximum point.

For the case of n variables, z = f(x1, x2, x3, …, xn):

      | f11  f12  f13  …  f1n |
|H| = | f21  f22  f23  …  f2n |
      | f31  f32  f33  …  f3n |
      | fn1  fn2  fn3  …  fnn |

d²z is positive definite iff |H1| > 0, |H2| > 0, |H3| > 0, …, |Hn| = |H| > 0:

the function is at its minimum point.

d²z is negative definite iff |H1| < 0, |H2| > 0, |H3| < 0, …, with the signs alternating up to |Hn| = |H|:

here the function is at a maximum.

Example

A firm produces three products Q1, Q2 and Q3, and its profit function is given by

π = F(Q1, Q2, Q3) = 180Q1 + 200Q2 + 150Q3 − 3Q1Q2 − 2Q2Q3 − 2Q1Q3 − 4Q1² − 5Q2² − 4Q3².

Optimize the function: find the critical values that put the function at a maximum or minimum point, find the maximum or minimum value, and state the type of definiteness.

Solution

1st: Find the FOC.

π1 = 180 − 3Q2 − 2Q3 − 8Q1 = 0 ⟹ −8Q1 − 3Q2 − 2Q3 = −180

π2 = 200 − 3Q1 − 2Q3 − 10Q2 = 0 ⟹ −3Q1 − 10Q2 − 2Q3 = −200

π3 = 150 − 2Q2 − 2Q1 − 8Q3 = 0 ⟹ −2Q1 − 2Q2 − 8Q3 = −150

2nd: Form the coefficient matrix A:

| −8   −3  −2 | | Q1 |   | −180 |
| −3  −10  −2 | | Q2 | = | −200 |
| −2   −2  −8 | | Q3 |   | −150 |

Here it is difficult to use the simultaneous (substitution) method to solve for the critical values. Thus, we use Cramer's rule instead.
3rd: Find the determinant of the coefficient matrix:

|A| = −8(80 − 4) + 3(24 − 4) − 2(6 − 20)
= −8(76) + 60 + 28
= −608 + 88
= −520 ≠ 0

Q1 = |A1|/|A|, where

       | −180   −3  −2 |
|A1| = | −200  −10  −2 |
       | −150   −2  −8 |

|A1| = −180(80 − 4) + 3(1600 − 300) − 2(400 − 1500)
= −180(76) + 3900 + 2200
= −13,680 + 6,100
= −7,580

So Q1 = |A1|/|A| = −7580/−520 ≈ 14.58.

Similarly, Q2 = |A2|/|A|, where

       | −8  −180  −2 |
|A2| = | −3  −200  −2 |
       | −2  −150  −8 |

|A2| = −6,900, so Q2 = −6900/−520 ≈ 13.27.

And Q3 = |A3|/|A|, where

       | −8   −3  −180 |
|A3| = | −3  −10  −200 |
       | −2   −2  −150 |

|A3| = −6,130, so Q3 = −6130/−520 ≈ 11.79.

Therefore, the optimal values that optimize the profit function are

Q1 = 14.58, Q2 = 13.27 and Q3 = 11.79.

4th: SOC, to check whether profit is really maximized or not:

π11 = −8, π12 = −3, π13 = −2
π21 = −3, π22 = −10, π23 = −2
π31 = −2, π32 = −2, π33 = −8

5th: Form the Hessian matrix:

      | π11  π12  π13 |   | −8   −3  −2 |
|H| = | π21  π22  π23 | = | −3  −10  −2 |
      | π31  π32  π33 |   | −2   −2  −8 |

|H1| = −8 < 0

|H2| = | −8   −3 | = 80 − 9 = 71 > 0
       | −3  −10 |

|H3| = |H| = |A| = −520 < 0

Since |H1| < 0, |H2| > 0 and |H3| < 0, we can conclude that profit is at a maximum when evaluated at the optimal values Q1 = 14.58, Q2 = 13.27 and Q3 = 11.79, and this implies negative definiteness.

The maximum profit is obtained by substituting the optimal values into the original profit function (π):

π = 180(14.58) + 200(13.27) + 150(11.79) − 3(14.58 × 13.27) − 2(13.27 × 11.79) − 2(14.58 × 11.79) − 4(14.58)² − 5(13.27)² − 4(11.79)²

= 2624.4 + 2654 + 1768.5 − 580.43 − 312.91 − 343.80 − 850.31 − 880.46 − 556.02

= 7046.90 − 3523.93

= 3522.97.
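The Cramer's-rule arithmetic above can be cross-checked by solving the FOC as a linear system; a minimal sketch, assuming NumPy (our illustration):

    import numpy as np

    # FOC coefficient matrix and right-hand side (steps 1 and 2): A q = b
    A = np.array([[-8.0,  -3.0, -2.0],
                  [-3.0, -10.0, -2.0],
                  [-2.0,  -2.0, -8.0]])
    b = np.array([-180.0, -200.0, -150.0])

    Q = np.linalg.solve(A, b)
    print(np.round(Q, 2))                    # [14.58 13.27 11.79]

    # A is also the Hessian; its leading principal minors alternate in sign
    print(A[0, 0],
          round(np.linalg.det(A[:2, :2])),   # 71
          round(np.linalg.det(A)))           # -520  =>  negative definite, a maximum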

4.3. Implicit Functions and the Unconstrained Envelope Theorem

4.3.1. Implicit Functions
Consider the two-choice-variable optimization model y = f(x1, x2), which explicitly shows that the optimization of y depends only on x1 and x2. However, in reality the optimization of y is affected by some implicit parameters, and it is possible to optimize y by assuming that these parameters are held constant.

Optimization problems in economics usually involve functions that depend on a number of parameters, like prices, tax rates, etc. Although these parameters are held constant during optimization, they do vary according to the economic situation.

Let's consider the effect of a change in one parameter on the optimal solution of a given problem y = f(x1, x2, α), where x1 and x2 are choice variables and α is the parameter.

The FOC (f1 = 0, f2 = 0) yields critical values for x1 and x2 which result in the stationary value of y. But this is possible if and only if α is fixed.

Thus, the critical values of the choice variables are functions of α: x1* = x1(α) and x2* = x2(α); i.e., if we change the value of α we will find different optimal values for x1 and x2. So the optimized objective function is also a function of the parameter.

Thus, π(α) = f(x1*(α), x2*(α), α) is the indirect objective function. This function shows the effect of α on the objective function y.

For instance, if α takes the value α⁰, we have the optimal values x1 = x1⁰ and x2 = x2⁰, and we maximize f (the direct objective function) using x1⁰, x2⁰ and α⁰.

Similarly, if α changes from α⁰ to α¹, we will have new equilibrium (optimal) values x1 = x1¹ and x2 = x2¹. The function π(α) is tangent to f(x1, x2, α) whenever x1 and x2 are used optimally for a given α. For a given α, if f is not optimized, the tangency does not occur. Thus, for each α there is a single point where f is equal to π, and hence the inequality f(x) ≤ π(α). This implies that π(α) is an envelope of the function f(x) at different values of α.

4.3.2. The Envelope Theorem

This theorem is used to determine the change in the optimal value of the objective function due to a change in the value of a parameter.

- Let z = f(x, y, a) be an objective function with a parameter a. The FOC for a maximum/minimum is fx = fy = 0.
- Then, the stationary values are x* = x(a) and y* = y(a), and the optimal value of the objective function is z* = f(x(a), y(a), a).
- Taking the total derivative of z* with respect to a, we get
dz*/da = fx·(dx/da) + fy·(dy/da) + fa; but fx = fy = 0 at the optimum, so dz*/da = fa.
- Therefore, the rate of change of the optimal value of z* as a changes is equal to the partial derivative of the objective function with respect to a. This result is known as the envelope theorem.
Example 1: Determine the effect of an increase in the value of a on the optimal value of the function y = 2x² − ax + a².

FOC:

dy/dx = 4x − a = 0, x* = a/4

d²y/dx² = 4 > 0 ……………… y is at a minimum when x = a/4

y* = 2(a/4)² − a(a/4) + a²
= a²/8 − a²/4 + a²
= 7a²/8

dy*/da = 7a/4

Or, by the envelope theorem, dy*/da = ∂y/∂a evaluated at x = x(a) = a/4:
∂y/∂a = −x + 2a = −a/4 + 2a = (7/4)a.

y* increases by 7a/4 units as a increases.
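The envelope result dy*/da = 7a/4 can be checked symbolically; a minimal sketch, assuming SymPy:

    import sympy as sp

    x, a = sp.symbols('x a')
    y = 2 * x**2 - a * x + a**2

    x_star = sp.solve(sp.diff(y, x), x)[0]    # x* = a/4
    y_star = sp.simplify(y.subs(x, x_star))   # indirect objective: 7a^2/8
    print(sp.diff(y_star, a))                 # 7a/4
    print(sp.diff(y, a).subs(x, x_star))      # envelope theorem: the same 7a/4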

Example 2: A firm producing a certain product Q wants to maximize its profit. Suppose a tax of t birr per unit is imposed on the firm's output.

A. Optimize the profit function.

B. Find the effect of a change in t on π.

Solution

A. We know that π = TR − TC, where TR = R(Q) and total cost, including the tax, is C(Q) + tQ:

π = R(Q) − C(Q) − tQ

FOC

π'(Q) = R'(Q) − C'(Q) − t = 0

MR − MC − t = 0

t = MR − MC

SOC

π''(Q) = R''(Q) − C''(Q) < 0

R''(Q) < C''(Q)

i.e., the slope of MR must be less than the slope of MC.

B. π* = R(Q*) − C(Q*) − tQ*

dπ*/dt = R'(Q*)·(dQ*/dt) − C'(Q*)·(dQ*/dt) − Q* − t·(dQ*/dt)
= [R'(Q*) − C'(Q*) − t]·(dQ*/dt) − Q*
= −Q*, since R'(Q*) − C'(Q*) − t = 0 by the FOC.

Thus, in line with the envelope theorem, a one-birr increase in the tax rate reduces the maximum profit by Q* birr.

CHAPTER FIVE

5. CONSTRAINED OPTIMIZATION
5.1. Constrained Optimization with Equality Constraints: Functions of Two Variables
5.1.1. Techniques of Constrained Optimization
The techniques of constrained optimization presented in this chapter are built on the method for identifying the stationary point in unconstrained optimization.
a) Substitution Method
One method of constrained optimization is to substitute equality constraints into the objective function. This converts a constrained optimization problem into an unconstrained one by internalizing the constraint(s) directly into the objective function.
The constraint is internalized when we solve it for one of the arguments of the objective function and then substitute for that argument using the constraint. We can then solve the internalized objective function by using unconstrained optimization techniques. Such a technique for solving constrained optimization problems is called the substitution method.
In order to develop a procedure for the determination of the constrained extrema of a function, we consider the maximization of the utility function U = xy of a consumer subject to the budget constraint 400 = 20x + 25y.
Given the budget constraint 400 = 20x + 25y, we note that now x and y cannot take values independently of each other. For example, if x = 5, then the value of y must be 12, etc.
One way of tackling this problem is to eliminate one of the independent variables, x or y, from the utility function by the use of the given constraint. Then, U can be maximized or minimized in the usual unconstrained manner.
To eliminate y (say) from the utility function, we first solve the constraint for y. On adjusting the terms, the constraint can be written as y = 16 − (4/5)x. Substituting this value of y into the utility function U = xy, we have
U = x(16 − (4/5)x) = 16x − (4/5)x²
Thus, dU/dx = 16 − (8/5)x = 0 (for a maximum) ⟹ x = 10.

Substituting x = 10 into the constraint, the value of y is 8. Thus U = xy is maximized when the consumer purchases 10 units of x and 8 units of y, under the condition that his income is $400 and the price of x = $20 and the price of y = $25. Since d²U/dx² = −8/5 < 0, the second order condition is also satisfied. This technique of constrained optimization is dubbed the substitution method. Below, an attempt will be made to introduce you to the various techniques of constrained optimization, including the substitution method.
Example 1
Let us consider a consumer with a simple utility function
U = x1·x2 + 2x1
subject to the budget constraint
4x1 + 2x2 = 60.
The constraint can be rewritten as
x2 = 30 − 2x1 ……………………………………………… (1)
Substituting (1) into the objective function:
U(x1, x2) = x1(30 − 2x1) + 2x1
= 32x1 − 2x1² ……………… internalized objective function
Optimize the internalized objective function.
Solution
First Order Condition (FOC):
dU/dx1 = 32 − 4x1 = 0
x1* = 8

Substituting x1* = 8 into the constraint function, we will have

4x1 + 2x2 = 60 ⟹ 4(8) + 2x2 = 60
x2* = 14
Second Order Condition:
d²U/dx1² = −4 < 0
Thus, U(8, 14) = 128 is a maximum when the consumer purchases 8 units of x1 and 14 units of x2, under the condition that his income is $60 and the prices of x1 and x2 are $4 and $2, respectively.
Example 2
Maximize the function f(x1, x2) = 2√x1 + (1/2)√x2 subject to the linear constraint
4x1 + x2 = 20.
Solution:
The constraint can be rewritten as x2 = 20 − 4x1.
Internalizing the above function,
f(x1) = 2√x1 + (1/2)√(20 − 4x1) …… internalized objective function

Solve the internalized objective function, proceeding through the first and second order conditions.
First order condition:
f'(x1) = 1/√x1 − 1/√(20 − 4x1) = 0. Solving for x1 yields x1* = 4.

Substituting x1* = 4 into the constraint function:

4(4) + x2* = 20
x2* = 4
Second order condition:
f''(x1) = −(1/2)x1^(−3/2) − 2(20 − 4x1)^(−3/2)
Evaluating f''(x1) at x1 = 4, we will have
f''(4) = −5/16 < 0.

Thus, the maximum value of the function f(x1, x2) subject to the constraint
4x1 + x2 = 20 is 2√4 + (1/2)√4 = 5.
b) The Lagrange Multiplier Method
When the constraint is a complicated function, or when there are several constraints under consideration, the substitution method can become very cumbersome. This has led to the development of another, simpler method of finding the extrema of a function: the Lagrangean method. This method involves forming a Lagrangean function that includes the objective function, the constraint function and a variable, λ, called the Lagrange multiplier. The essence of this method is to convert a constrained extremum problem into a form to which the first order conditions of the unconstrained optimization problem can still be applied.
We may note here that the necessary condition obtained above under the substitution method can also be obtained from an auxiliary function, to be termed the Lagrange function. This function is formed by the use of the objective function and the constraint. The Lagrange function corresponding to the objective function z = f(x, y) and the constraint g(x, y) = 0 can be written as
L = f(x, y) + λ·g(x, y), where λ is an undetermined multiplier known as the Lagrange multiplier.
We note that, since g(x, y) = 0, the value of L at a point is the same as the value of z = f(x, y) at that point. Thus, the extrema of z = f(x, y) and of L occur at the same point. The necessary conditions (first order conditions) for the extrema of L are:

∂L/∂x = fx + λgx = 0 ………………………………… (1)
∂L/∂y = fy + λgy = 0 ………………………………… (2)
∂L/∂λ = g(x, y) = 0 …………………………………… (3)
The simultaneous solution of (1), (2) and (3) gives the stationary (or critical or equilibrium) point.
Note that the stationary point obtained above will satisfy equation (3), i.e., the constraint. Thus, the unconstrained extrema of L are equivalent to the extrema of z = f(x, y) subject to the constraint g(x, y) = 0.
On eliminating λ from (1) and (2), we get
fx/gx = fy/gy ……………………………………………… (4)
This is the same condition as obtained earlier. Equations (3) and (4) can be simultaneously solved to get the coordinates of the stationary point.
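Applied to the earlier utility problem (U = xy subject to 400 − 20x − 25y = 0), these conditions can be solved symbolically; a minimal sketch, assuming SymPy (variable names are ours):

    import sympy as sp

    x, y, lam = sp.symbols('x y lam', positive=True)
    L = x * y + lam * (400 - 20 * x - 25 * y)   # Lagrange function for U = xy

    foc = [sp.diff(L, v) for v in (x, y, lam)]
    print(sp.solve(foc, [x, y, lam], dict=True))   # [{lam: 2/5, x: 10, y: 8}]

The multiplier λ = 2/5 here is the marginal utility of income: relaxing the budget by one dollar raises maximum utility by about 0.4, which anticipates the interpretation of the Lagrange multiplier discussed below.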
c) Total Differential Approach
When we discussed the unconstrained optimization of z = f(x, y), the FOC could be stated in terms of the total differential:
dz = fx·dx + fy·dy = 0
This statement remains valid in the presence of a constraint g(x, y) = c. Since c is a constant,
dc = dg(x, y) = 0
dg(x, y) = gx·dx + gy·dy = 0
⟹ dy = −(gx/gy)·dx
Substituting dy = −(gx/gy)·dx into the differential of the objective function:
dz = fx·dx − (gx/gy)·fy·dx = 0
fx·dx = (gx/gy)·fy·dx
fx/gx = fy/gy, or fx/fy = gx/gy, is the first order condition (with gx, gy ≠ 0).

Note: This total differential approach yields the same result (first order condition) as the Lagrangean method:
L(x, y, λ) = f(x, y) + λ[c − g(x, y)]
∂L/∂x = fx − λgx = 0 ⟹ λ = fx/gx
∂L/∂y = fy − λgy = 0 ⟹ λ = fy/gy
⟹ λ = fx/gx = fy/gy

What is the difference?

The total differential approach yields only the values of x and y, while the Lagrange method also gives λ. As λ provides a measure of the sensitivity of z to a shift of the constraint, the Lagrange method offers the advantage of certain built-in comparative-static information about the solution.
Interpretation of the Lagrange Multiplier
The Lagrange multiplier is a measure of the effect of a change in the constraint, via the parameter c, on the optimal value of the objective function.
We can demonstrate this by using the chain rule.
Suppose the optimal values of the arguments of the function f(x, y) are
x*(c) and y*(c).
The "c" in brackets reflects the fact that the optimal values are functions of the constraint constant. At the optimum the constraint holds:
g(x*(c), y*(c)) = c
Taking the differential of each side with respect to c,

gx·(dx*(c)/dc) + gy·(dy*(c)/dc) = 1

Also, totally differentiate the objective function:
df(x*(c), y*(c)) = fx·dx*(c) + fy·dy*(c), so that

df(x*(c), y*(c))/dc = fx·(dx*(c)/dc) + fy·(dy*(c)/dc) ……… (1)

The FOC require that

fx = λgx and fy = λgy

Substituting the FOC into equation (1),
df(x*(c), y*(c))/dc = λ[gx·(dx*(c)/dc) + gy·(dy*(c)/dc)]
But the term in brackets is equal to 1. Thus,
df(x*(c), y*(c))/dc = λ;

i.e., the Lagrange multiplier represents the change in the optimal value of the objective function brought about by a small change in the constraint constant.

Generalization: the Lagrange Function with Multiple Constraints
We can obtain the solution to a constrained optimization problem with more than one constraint by using the Lagrangean function with a Lagrange multiplier corresponding to each constraint.
Let an n-variable function be subject, simultaneously, to the two constraints
g(x1, x2, …, xn) = c and h(x1, x2, …, xn) = d.
Then, adopting λ and μ as the two undetermined multipliers, we may construct the Lagrange function as
L(x1, x2, …, xn, λ, μ) = f(x1, …, xn) + λ[c − g(x1, …, xn)] + μ[d − h(x1, …, xn)]
FOC:
∂L/∂λ = c − g(x1, …, xn) = 0
∂L/∂xi = fi − λgi − μhi = 0   (i = 1, 2, …, n)
∂L/∂μ = d − h(x1, x2, …, xn) = 0
The extension of this model to the case of more than two arguments in the objective function is straightforward. In this case the Lagrangean function takes the form
L(x1, x2, …, xn, λ) = f(x1, x2, …, xn) + λ[c − g(x1, x2, …, xn)]
∂L/∂xi = 0 for i = 1, 2, …, n
∂L/∂λ = c − g(x1, x2, …, xn) = 0, i.e., g(x1, …, xn) = c.

The Lagrangean function for a constrained optimization problem with an objective function of n variables subject to m equality constraints is written as

L(x1, …, xn, λ1, …, λm) = f(x1, x2, …, xn) + Σ(i = 1 to m) λi[ci − gⁱ(x1, …, xn)],

where gⁱ(x1, …, xn) = ci represents the ith constraint.


5.1.2. Sufficient Conditions for Optimization
a) Substitution method
Example: Consumption Problem
Suppose you have $6.00 to spend on a lunch of soup and salad. A restaurant at which you dine sells both soup and salad by weight. An ounce of soup costs $0.25 and an ounce of salad costs $0.50. How many ounces of each will you purchase to maximize your satisfaction if the utility function is
U(x1, x2) = (1/4) ln x1 + (1/2) ln x2 ?
Solution:
The constraint function is given by
(1/4)x1 + (1/2)x2 = 6 …………………… budget constraint.
It shows the different combinations of soup and salad (by weight) that cost you just $6. In order to solve the optimization problem through the substitution method, let us rewrite the constraint function as
x1 = 24 − 2x2.
Internalizing this into the objective function yields
U(x2) = (1/4) ln(24 − 2x2) + (1/2) ln x2.
Solving the internalized objective function, we have
U′(x2) = −2/[4(24 − 2x2)] + 1/(2x2) = 0
⇒ x2* = 8
Substituting x2* = 8 into the constraint function and solving for x1, we have
(1/4)x1 + (1/2)x2 = 6
(1/4)x1 + (1/2)(8) = 6
x1* = 8
Second Order Condition:
U″(x2) = −1/(24 − 2x2)² − 1/(2x2²)
Substituting x2* = 8 for x2 in the second-order derivative, we get
U″(8) = −1/64 − 1/128 < 0.
Therefore, the optimal amounts of soup (x1) and salad (x2) to be purchased are 8 and 8 ounces, and the maximum satisfaction from consuming these is
U(8, 8) = (1/4) ln 8 + (1/2) ln 8 ≈ 1.56
Graphically:
[Figure: the budget line with intercepts x1 = 24 and x2 = 12, and three indifference curves U = 1.04, U = 1.56 and U = 2.29; the budget line is tangent to the U = 1.56 curve at point T = (8, 8).]
There are 3 indifference curves consistent with the utility function:
i. A: shows all combinations of x1 and x2 that provide a level of utility equal to 1.04.
 All such combinations are attainable but sub-optimal; at them the consumer does not maximize his utility.
ii. C: shows all combinations of x1 and x2 that provide a level of utility equal to 2.29.
 Given the budget constraint, none of these combinations is affordable.
iii. B: every combination except "T" is unaffordable under the budget constraint.
 Point "T" is not only attainable but also the optimal combination.
Proof
We know that the consumer maximizes his utility at the point where the budget line is tangent to the highest attainable indifference curve, that is, where
−(P1/P2) = Δx1/Δx2  ⇔  P2/P1 = MUx2/MUx1

Here P1/P2 = 0.25/0.50 = 1/2, MUx1 = 1/(4x1) and MUx2 = 1/(2x2).

MUx2/MUx1 = [1/(2x2)]·(4x1) = 2x1/x2

Then 2x1/x2 = P2/P1 = 2 at point T (x1 = 8, x2 = 8):
2(8)/8 = 2 ✓

At the optimal point, any decrease in utility due to a small reduction in the consumption of one good is just matched by the increase in utility due to the increased consumption of the other good.
If salad consumption falls by dx2, then utility decreases by
MUx2·dx2 = [1/(2x2)]·dx2 = (1/16)dx2  at x2* = 8.
Evaluated at the optimal point, where the MRS (marginal rate of substitution) is −2, the reduction in salad allows the consumption of soup to increase by 2dx2, because
dx1/dx2 = −2.
Utility then rises by
MUx1·dx1 = MUx1·2dx2 = [1/(4x1)]·2dx2 = (1/16)dx2  at x1* = 8.
From the first-order conditions, after some rearrangement, we thus have
−MUx2·Δx2 = MUx1·Δx1 , i.e., dU = 0.
We may note here that when the constraint is a complicated function, or when there are several
equality constraints to be considered, the substitution method might be difficult or simply
impossible to carry out in practice.

In that case, you need to use other techniques. This has led to the development of alternative
methods of finding extrema of a function under constraints.
One such method is the method of the Lagrange Multiplier which will be discussed in the
following sub section. For now, study the following examples that demonstrate the use of the
substitution technique to solve constrained optimization.
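As a quick numerical cross-check of the soup–salad example above, the following short Python sketch (not part of the original module; it assumes the SciPy library is available, and the variable names are our own) maximizes the same utility subject to the same budget:

import numpy as np
from scipy.optimize import minimize

budget = 6.0
prices = np.array([0.25, 0.50])   # price of soup (x1) and salad (x2)

def neg_utility(x):
    # U(x1, x2) = (1/4) ln x1 + (1/2) ln x2, negated because we minimize
    return -(0.25 * np.log(x[0]) + 0.5 * np.log(x[1]))

budget_constraint = {"type": "eq", "fun": lambda x: budget - prices @ x}
res = minimize(neg_utility, x0=[1.0, 1.0],
               bounds=[(1e-6, None), (1e-6, None)],
               constraints=[budget_constraint])

print(res.x)      # approximately [8. 8.]
print(-res.fun)   # approximately 1.56

The solver reproduces the optimum (x1*, x2*) = (8, 8) and the maximum utility of about 1.56 found analytically.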

b) Second-Order Total Differential

Taking the differential of Z = f(x, y), we have
dZ = fx dx + fy dy.
Here the constraint is g(x, y) = c, so
dg = gx dx + gy dy = 0  ⇒  dy = −(gx/gy) dx …………. (1)

In this case dx and dy are no longer both arbitrary. We may take dx as an arbitrary change, but dy must be regarded as dependent on dx (equation (1)). Once dx is specified, dy depends on gx and gy, and these derivatives in turn depend on x and y; hence dy also depends on x and y.

To find an appropriate expression for d²z, differentiate dz again, remembering that dy is itself a variable:
d²z = ∂(dz)/∂x · dx + ∂(dz)/∂y · dy
    = ∂(fx dx + fy dy)/∂x · dx + ∂(fx dx + fy dy)/∂y · dy
    = [fxx dx + fxy dy + fy·∂(dy)/∂x]·dx + [fyx dx + fyy dy + fy·∂(dy)/∂y]·dy
    = fxx dx² + 2fxy dx dy + fyy dy² + fy[∂(dy)/∂x · dx + ∂(dy)/∂y · dy]
But ∂(dy)/∂x · dx + ∂(dy)/∂y · dy = d(dy) = d²y. Then
d²z = fxx dx² + 2fxy dx dy + fyy dy² + fy d²y ………… (2)
Here d²y ≠ dy²; the presence of d²y in equation (2) disqualifies d²z as a quadratic form.

However, d²z can be transformed into a quadratic form by virtue of the constraint g(x, y) = c. Taking the second-order differential of the constraint in the same way:
d²g = gxx dx² + 2gxy dx dy + gyy dy² + gy d²y = 0
⇒ d²y = −[gxx dx² + 2gxy dx dy + gyy dy²]/gy ………… (3)

Substituting equation (3) into equation (2):
d²z = [fxx − (fy/gy)gxx] dx² + 2[fxy − (fy/gy)gxy] dx dy + [fyy − (fy/gy)gyy] dy²
But λ = fy/gy, and writing
Zxx = fxx − λgxx ,  Zxy = fxy − λgxy ,  Zyy = fyy − λgyy ,
we obtain
d²z = Zxx dx² + Zxy dx dy + Zyx dy dx + Zyy dy² ………… (4)
(Here we have taken dx as an arbitrary increment and y as a function of x because of the constraint. Equivalently, writing the Lagrangian as L = f + λg with the constraint in the implicit form g(x, y) = 0, so that λ = −fy/gy, we have d²z = Lxx dx² + 2Lxy dx dy + Lyy dy².)
C) The Second – Order Condition for Lagrangian function
The discussion of the Lagrange multiplier method has focused on necessary conditions for
identifying an extreme point. But, the necessary conditions do not distinguish between maxima
and minima.
In this section, we present sufficient conditions for determining whether an optimum set of
values represent a maximum / minimum of the objective function in the context of the Lagrange
multiplier method.
The sufficient condition for classifying the stationary point of a multivariate function in the free-extremum case is based on whether the Hessian matrix of the function is positive definite or negative definite when evaluated at the stationary point.
Similarly, the sufficient condition for a constrained optimization problem depends upon whether the Lagrangian function is positive definite or negative definite when evaluated at the stationary point.
But we must be careful not to apply the SOC developed for the unconstrained problem. As we shall see, the new conditions must be stated in terms of the second-order total differential, d²z.
For a constrained extremum of Z = f(x, y) subject to g(x, y) = c, the second-order necessary and sufficient conditions still revolve around the sign of d²z, evaluated at the stationary point. But in the present context we are concerned with the sign definiteness/semi-definiteness of d²z not for all possible values of dx and dy, but only for those dx and dy values (not both zero) satisfying the linear constraint:
gx dx + gy dy = 0
The second-order necessary conditions are:
For a maximum of z: d²z is negative semidefinite (d²z ≤ 0), subject to dg = 0.
For a minimum of z: d²z is positive semidefinite (d²z ≥ 0), subject to dg = 0.
The second-order sufficient conditions are:
For a maximum: d²z negative definite (d²z < 0), subject to dg = 0.
For a minimum: d²z positive definite (d²z > 0), subject to dg = 0.
5.1.3 The concepts of Border Hessian and Jacobian Determinants
a) The Border Hessian(H)
To determine the conditions under which d²z < 0 (or d²z > 0), we consider the quadratic form
d²z = fxx dx² + 2fxy dx dy + fyy dy²  subject to  gx dx + gy dy = 0.

Let a = fxx, b = fyy, h = fxy, u = dx, v = dy, α = gx, β = gy. Then
q = au² + 2huv + bv²  subject to  αu + βv = 0 , i.e.  v = −(α/β)u.
Substituting for v:
q = au² − 2h(α/β)u² + b(α²/β²)u² = (aβ² − 2hαβ + bα²)·u²/β²
Since u²/β² > 0, the sign of q is determined by the sign of aβ² − 2hαβ + bα², which can be written in determinant form:

| 0  α  β |
| α  a  h |  = 2hαβ − aβ² − bα²
| β  h  b |

N.B.:  aβ² − 2hαβ + bα² = −(2hαβ − aβ² − bα²).
Thus:
q is positive definite iff this determinant is negative ( < 0 ), and negative definite iff it is positive ( > 0 ); in our notation the determinant is

| 0   gx   gy  |
| gx  fxx  fxy |
| gy  fxy  fyy |

Since this determinant can be obtained by bordering the determinant of the SOC for unconstrained extrema (the Hessian determinant) with a row and a column, it is called the bordered Hessian and denoted |H̄₂|.
Given the Lagrange function L = f(x, y) + λg(x, y) (the constraint being written in implicit form g(x, y) = 0), we can write
Lx = fx + λgx ,  Lxx = fxx + λgxx ,  Lxy = fxy + λgxy
Ly = fy + λgy ,  Lyy = fyy + λgyy
Thus, d²z = Lxx dx² + 2Lxy dx dy + Lyy dy².

Note: the Lagrangian can be symbolized by L, V or Z.


We recall that if d²Z < 0 the stationary point is a maximum, while if d²Z > 0 it is a minimum. To determine the conditions under which d²Z < 0 or d²Z > 0, we consider the quadratic form Q = au² + 2huv + bv² subject to αu + βv = 0. Substituting the value v = −(α/β)u from the constraint into Q, we get

Q = au² − 2h(α/β)u² + b(α²/β²)u² = (aβ² − 2hαβ + bα²)·u²/β²

Thus the sign of Q will be the same as the sign of aβ² − 2hαβ + bα².
We note that

| 0  α  β |
| α  a  h |  = 2hαβ − aβ² − bα²
| β  h  b |

Thus Q is positive if this determinant is negative, and Q is negative if it is positive.
Taking Vxx = a, Vxy = h, Vyy = b, α = gx, β = gy, u = dx and v = dy, we conclude that the stationary point corresponds to a maximum (or minimum) if the sign of the determinant

| 0   gx   gy  |
| gx  Vxx  Vxy |
| gy  Vxy  Vyy |

is positive (or negative).
Since this determinant can be obtained by bordering the determinant of the second-order conditions for the unconstrained extrema by a row and a column, it is also called a Bordered Hessian and is denoted |H̄₂|.
The Multivariable Case
When the objective function takes the form
Z = f(x1, x2, ..., xn)  subject to  g(x1, ..., xn) = c,
the SOC still hinges on the sign of d²z.

The positive or negative definiteness of d²z again involves a bordered Hessian, but this time the conditions must be expressed in terms of the bordered principal minors of the Hessian.
The bordered Hessian is given by:

        | 0   g1   g2  ...  gn  |
        | g1  z11  z12 ...  z1n |
|H̄|  = | g2  z21  z22 ...  z2n |
        | .    .    .  ...   .  |
        | gn  zn1  zn2 ...  znn |

Its successive bordered principal minors can be defined as:

         | 0   g1   g2  |            | 0   g1   g2   g3  |
|H̄₂| =  | g1  z11  z12 |    |H̄₃| = | g1  z11  z12  z13 |    etc.,
         | g2  z21  z22 |            | g2  z21  z22  z23 |
                                     | g3  z31  z32  z33 |

with |H̄n| = |H̄|.
|H̄₂| denotes the second bordered principal minor of the Hessian.
Remarks: Given the objective function z = f(x1, x2, ..., xn) and the constraint g(x1, x2, ..., xn) = 0, the second-order conditions are:

i) If |H̄₂|, |H̄₃|, ..., |H̄n| < 0, then the stationary point is a minimum.
ii) If |H̄₂| > 0; |H̄₃| < 0; |H̄₄| > 0; ...; (−1)ⁿ|H̄n| > 0, the stationary point is a maximum, where

         | 0   g1   g2  |            | 0   g1   g2   g3  |
|H̄₂| =  | g1  V11  V12 |    |H̄₃| = | g1  V11  V12  V13 |    and so on,
         | g2  V21  V22 |            | g2  V21  V22  V23 |
                                     | g3  V31  V32  V33 |

        | 0   g1   g2  ...  gn  |
        | g1  V11  V12 ...  V1n |
|H̄n| = | g2  V21  V22 ...  V2n |
        | .    .    .  ...   .  |
        | gn  Vn1  Vn2 ...  Vnn |

Condition | Maximum                                            | Minimum
FOC       | Z1 = Z2 = ... = Zn = Zλ = 0                        | Z1 = Z2 = ... = Zn = Zλ = 0
SOC       | |H̄₂| > 0; |H̄₃| < 0; |H̄₄| > 0; ...; (−1)ⁿ|H̄n| > 0 | |H̄₂| = |H̄₃| = ... = |H̄n| < 0

Sufficient condition for a maximum with one constraint:
If the determinant of the bordered Hessian of the Lagrangian function evaluated at the stationary point has the same sign as (−1)ⁿ, and the largest n − 1 leading (bordered) principal minors alternate in sign, then d²z is negative definite on the constraint and the stationary point represents a maximum.
Sufficient condition for a minimum with one constraint:
If all of the largest n − 1 leading principal minors of the bordered Hessian evaluated at the stationary point are negative, including the determinant of the bordered Hessian itself, then d²z is positive definite on the constraint and the stationary point represents a minimum.
Example 1:
Find the maximum or minimum of the function z = 5x² + 6y² − xy subject to 24 = x + 2y.
Solution
While forming the Lagrange function, it should be kept in mind that the constraint should first be expressed in implicit form, i.e. the g(x, y) = 0 form. We can write
V = 5x² + 6y² − xy + λ(24 − x − 2y)
First-order conditions:
∂V/∂x = 10x − y − λ = 0 …………………………………….. (1)
∂V/∂y = 12y − x − 2λ = 0 …………………………………… (2)
∂V/∂λ = 24 − x − 2y = 0 …………………………………..… (3)
Eliminating λ from (1) and (2):
12y − x = 20x − 2y , or 14y = 21x , or 2y = 3x.
Substituting this value of 2y into equation (3), we get 24 − x − 3x = 0, or x = 6.
Also y = (3/2)x = (3/2)(6) = 9. Thus the stationary point is (x, y) = (6, 9).
Second-order condition:
Vxx = ∂²V/∂x² = 10 ,  Vxy = ∂²V/∂x∂y = −1 ,  Vyy = ∂²V/∂y² = 12
Taking the differential of the constraint, we can write 0 = dx + 2dy,
so gx = 1 and gy = 2.
Thus,

         | 0   1    2 |
|H̄₂| =  | 1  10   −1 |  = −56 < 0.
         | 2  −1   12 |

Thus, the function has a minimum at (6, 9).
The minimum value of the function [calculated at (6, 9)] is 612.
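To verify Example 1 mechanically, one can solve the first-order conditions and evaluate the bordered Hessian with SymPy. The following is a minimal sketch under the assumption that SymPy is installed; the symbol names are our own:

import sympy as sp

x, y, lam = sp.symbols("x y lam")
V = 5*x**2 + 6*y**2 - x*y + lam*(24 - x - 2*y)

# First-order conditions and their solution
foc = [sp.diff(V, v) for v in (x, y, lam)]
print(sp.solve(foc, [x, y, lam], dict=True)[0])   # {x: 6, y: 9, lam: 51}

# Bordered Hessian evaluated at the stationary point (6, 9)
H_bar = sp.Matrix([[0, 1, 2],
                   [1, 10, -1],
                   [2, -1, 12]])
print(H_bar.det())   # -56 < 0, confirming a constrained minimum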
Example 2:
A firm produces radio sets at two different locations. If x1 and x2 are the numbers of radio sets produced at locations I and II respectively, the joint cost function of the firm is given by
C = 0.1x1² + 0.2x2² + 0.2x1x2 + 180x1 + 60x2 + 25,000.
If the firm has to supply an order of 1,000 radio sets, how many sets should it produce at each location to minimize cost? Find the minimum cost and the marginal cost.
Solution:
The constraint: 1000 = x1 + x2
L = 0.1x1² + 0.2x2² + 0.2x1x2 + 180x1 + 60x2 + 25,000 + λ(1000 − x1 − x2)
∂L/∂x1 = 0.2x1 + 0.2x2 + 180 − λ = 0
⇒ λ = 0.2x1 + 0.2x2 + 180 ……………………………. (1)
∂L/∂x2 = 0.4x2 + 0.2x1 + 60 − λ = 0 ⇒ λ = 0.4x2 + 0.2x1 + 60 ……. (2)
Equating (1) and (2):
0.2x1 + 0.2x2 + 180 = 0.4x2 + 0.2x1 + 60
120 = 0.2x2 ,  x2* = 120/0.2 = 600
x1 + x2 = 1000 ,  x1* = 400
SOC:
Cx1 = 0.2x1 + 0.2x2 + 180 ,  Cx2 = 0.4x2 + 0.2x1 + 60
Cx1x1 = 0.2 ,  Cx2x2 = 0.4
Cx2x1 = Cx1x2 = 0.2 (Young's theorem)
gx1 = 1 = gx2

         | 0    1     1   |
|H̄₂| =  | 1   0.2   0.2  |  = −0.2 < 0
         | 1   0.2   0.4  |

⇒ The stationary point is a minimum.
The minimum cost is C(400, 600) = 0.1(400)² + 0.2(600)² + 0.2(400)(600) + 180(400) + 60(600) + 25,000 = 269,000, and the marginal cost of the order is λ = 0.2(400) + 0.2(600) + 180 = 380.
b) Jacobian Determinant (J)
A Jacobian determinant permits testing for functional dependence among a set of functions, for both the linear and the non-linear case. It is composed of all the first-order partial derivatives of a system of equations, arranged in ordered sequence.
If the equations are linear, the Jacobian determinant is the same as the Hessian determinant; for non-linear equations, however, the ordinary Hessian determinant is not applicable.
Let Y1 = f¹(X1, X2, X3), Y2 = f²(X1, X2, X3), Y3 = f³(X1, X2, X3). The Jacobian determinant of first-order partial derivatives is

       | ∂Y1/∂X1  ∂Y1/∂X2  ∂Y1/∂X3 |
|J| =  | ∂Y2/∂X1  ∂Y2/∂X2  ∂Y2/∂X3 |
       | ∂Y3/∂X1  ∂Y3/∂X2  ∂Y3/∂X3 |

If |J| = 0 identically, the functions are functionally dependent; if |J| ≠ 0, they are functionally independent.

Example 1: Linear case
Y1 = 6X1 + 4X2
Y2 = 4X1 + 5X2
Solution
The Jacobian determinant is

       | 6  4 |
|J| =  | 4  5 |

Thus |J| = |H| = 30 − 16 = 14 ≠ 0, which implies there is a unique solution for X1 and X2, so you can find the solution by applying Cramer's rule.

Example 2: Non-linear case
Y1 = 4X1 − X2
Y2 = 16X1² + 8X1X2 + X2²

       | ∂Y1/∂X1  ∂Y1/∂X2 |     |     4            −1      |
|J| =  | ∂Y2/∂X1  ∂Y2/∂X2 |  =  | 32X1 + 8X2   8X1 + 2X2  |  = 64X1 + 16X2 ,

which is not identically zero, so |J| ≠ 0 and the two functions are functionally independent.
5.1.4. Constrained Optimization and the Envelope Theorem
Let Z = f(x, y, a) be the objective function and g(x, y, a) = 0 be the constraint, where a is a parameter. The Lagrange function is
L = f(x, y, a) + λg(x, y, a)
FOC:
∂L/∂x = fx + λgx = 0
∂L/∂y = fy + λgy = 0
∂L/∂λ = g(x, y, a) = 0

Solving the equations, we get the stationary values
x* = x(a) , y* = y(a) and λ* = λ(a), and the optimum value
z* = f(x(a), y(a), a).

Differentiating with respect to a:
dz*/da = fx·(dx/da) + fy·(dy/da) + fa ………………………………… (1)

At the optimum, g(x(a), y(a), a) = 0; differentiating with respect to a:
gx·(dx/da) + gy·(dy/da) + ga = 0 ………………………………………… (2)

Multiplying equation (2) by λ and adding it to equation (1):
dz*/da = (fx + λgx)·(dx/da) + (fy + λgy)·(dy/da) + fa + λga
By the FOC, the terms (fx + λgx) and (fy + λgy) are each zero, so
dz*/da = fa + λga = ∂L/∂a , evaluated at
x = x(a)
y = y(a)
λ = λ(a)
Example:
Let U = f(x1, x2) be the utility function of a consumer and M = x1P1 + x2P2 be his budget constraint, where M, P1 and P2 are parameters. Determine the rate of change of optimal utility with respect to each of the three parameters.
Solution:
L = f(x1, x2) + λ(M − x1P1 − x2P2)
dU*/dM = ∂L/∂M = λ ⇒ U* increases by λ units per unit increase in M.
dU*/dP1 = ∂L/∂P1 = −λx1 ⇒ U* decreases by λx1 units per unit increase in P1.
dU*/dP2 = ∂L/∂P2 = −λx2 ⇒ U* decreases by λx2 units per unit increase in P2.
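These envelope results can be verified symbolically. Below is an illustrative SymPy sketch (assumed available) that reuses the log-utility of the earlier soup–salad example, now with symbolic income M and prices P1, P2:

import sympy as sp

x1, x2, lam = sp.symbols("x1 x2 lam", positive=True)
M, P1, P2 = sp.symbols("M P1 P2", positive=True)

U = sp.log(x1)/4 + sp.log(x2)/2
L = U + lam*(M - P1*x1 - P2*x2)

sol = sp.solve([sp.diff(L, v) for v in (x1, x2, lam)],
               [x1, x2, lam], dict=True)[0]
U_star = U.subs(sol)   # optimal utility as a function of M, P1, P2

print(sp.simplify(sp.diff(U_star, M) - sol[lam]))            # 0: dU*/dM = lambda
print(sp.simplify(sp.diff(U_star, P1) + sol[lam]*sol[x1]))   # 0: dU*/dP1 = -lambda*x1

Both residuals print as 0, confirming dU*/dM = λ and dU*/dP1 = −λx1 for this specification.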
5.2. Constrained Optimization with Inequality Constraints
Sometimes optimization problems have constraints that take the form of inequalities rather than equalities; indeed, many constraints are more naturally expressed as inequalities. For example, consumption of a good is non-negative.
Maximization problems
Let us maximize π = f(x1, x2, ..., xn) subject to the constraints gi(x1, x2, ..., xn) ≤ ci, where the ci are constants. This can be written out fully as below.
Maximize π = f(x1, x2, ..., xn)
Subject to g1(x1, x2, ..., xn) ≤ c1
g2(x1, x2, ..., xn) ≤ c2
.
.
gm(x1, ..., xn) ≤ cm
[a case with m constraints and n choice variables] and the non-negativity restrictions x1, x2, ..., xn ≥ 0. Thus we have 3 ingredients: the objective function, the constraint functions and the non-negativity restrictions.
Dear students! What if you have a constraint with a ≥ sign?
If we encounter an optimization problem with a ≥ constraint, we can simply multiply both sides of it by (−1) and change it into a ≤ inequality. We then proceed as usual.
5.2.1. The Kuhn–Tucker Conditions
The Kuhn–Tucker conditions, unlike those of classical optimization, may not be necessary conditions; but if we make certain adjustments, they can serve as both necessary and sufficient conditions.
The three conditions for solving inequality-constrained problems using the Kuhn–Tucker approach are:
1. the non-negativity restrictions;
2. the inequality constraints;
3. the complementary slackness conditions.
Each step will be discussed below.
1. Effect of the Non-negativity Restriction
Assume an objective function with only one variable:
Max π = f(x1) subject to the non-negativity restriction x1 ≥ 0.
There are 3 possibilities for a maximum: an interior maximum with f′(x1) = 0 and x1 > 0; a boundary maximum with f′(x1) = 0 and x1 = 0; or a boundary point with f′(x1) < 0 and x1 = 0. Thus, from the above discussion, the condition for a maximum is
f′(x1) ≤ 0 , x1 ≥ 0 and x1·f′(x1) = 0.
Generalization: fi ≤ 0 , xi ≥ 0 and xi·fi = 0 , where fi = ∂π/∂xi (i = 1, ..., n), for
π = f(x1, ..., xn) subject to x1, x2, ..., xn ≥ 0.

2. Effect of Inequality Constraints
The maximization problem with inequality constraints and non-negativity restrictions on the choice variables is set out below.
Let us optimize π = f(x1, x2, x3)
subject to the constraints g1(x1, x2, x3) ≤ c1
g2(x1, x2, x3) ≤ c2
and x1, x2, x3 ≥ 0.
Step 1: Change the inequality constraints into equality constraints. To do so, dummy (slack) variables are introduced into the constraint functions, because the value on the left-hand side is less than (or equal to) the value on the right-hand side.
The appropriate dummy variable for a maximization problem is a slack variable:
g1(x1, x2, x3) + s1 = c1
g2(x1, x2, x3) + s2 = c2
Example: Maximize π=5X1+3X2
s.t 6X1+2X2≤36
5X1+5X2≤ 40
2X1+4X2≤ 28
X1, X2≥0.
Change inequality to equality by adding slack variables
6X1+2X2+S1=36
5X1+5X2+S2=40
2X1+4X2+S3= 28
Step 2: Write the Lagrangian function:
L = f(x1, x2, x3) + λ1[c1 − g1(x1, x2, x3) − s1] + λ2[c2 − g2(x1, x2, x3) − s2]
Step 3: Set the partial first-order derivatives with respect to the choice variables ≤ 0, and those with respect to the Lagrange multipliers ≥ 0.
Note that for the minimization counterpart you deduct surplus variables from the constraints and apply all the steps above.
Example: Maximize Y = 10X1 − X1² + 180X2 − X2²
s.t. X1+X2≤ 80
X1, X2 ≥0
Solution
L = 10X1 − X1² + 180X2 − X2² + λ(80 − X1 − X2)
LX1 = 10 − 2X1 − λ = 0 , λ = 10 − 2X1 ………………………………………………….(1)
LX2 = 180 − 2X2 − λ = 0 , λ = 180 − 2X2 …………………………………………………… (2)
Lλ = 80 − X1 − X2 = 0 , 80 = X1 + X2 …………………………………………………………(3)
From equations (1) and (2), λ = λ:
10-2X1 = 180- 2X2
X1=X2-85…………………..……………………………………….(4)
Substitute Eq 4 in Eq 3
80=(-85+X2)+X2
80+85= 2X2
165=2X2
X2=82.5 ………………………………………………………………………….(5)
Substitute Eq 5 in Eq 4
X1 = 82.5 − 85
= −2.5, which is not economically meaningful; so, by complementary slackness, X1 = 0 ………………………….(6)
As a result the value of X2 also changed. Hence, 80=0+X2, X2=80 …………………….(7)
To find the value of λ:
Substitute Eq (7) into Eq (2): λ = 180 − 2(80)
λ = 20.
CHAPTER SIX
COMPARATIVE STATIC ANALYSIS
6.1 The Nature of Comparative Statics

Comparative statics is concerned with the comparison of different equilibrium states that are
associated with different sets of values of parameters or exogenous variables. We have two
classes of variables.

1. Choice variables, also called endogenous variables, are variables that are determined within the model. They are under the control of decision makers or firms.
2. Exogenous Variables or Parameters-are variables outside the model. They are out of the
control of the firms (decision makers). They are determined by factors outside the model.
In comparative statics, there is a need for a point of reference or starting point so as to make the comparison clear and visible. For example, initial equilibrium states are most commonly used as starting points.

Example: In the theory of demand and supply, price is exogenous: Qd = D(P) and Qs = S(P)

In both cases P is exogenous in the sense that it is not influenced by the actions of decision makers. It is rather determined by factors outside the control of both consumers and suppliers in the market.

The initial (pre-change) equilibrium may be represented by the equilibrium price, P̄, and the corresponding quantity, Q̄. If P changes, quantity demanded and quantity supplied can change, and hence the initial equilibrium will be upset (disturbed). If the new equilibrium state relevant to the new price can be defined and attained, the question to be posed in comparative statics is: how would one compare the new equilibrium with the old one?

In comparative statics, we disregard the process of adjustment of the variables; we simply compare the pre-change and post-change equilibrium states, and the way in which the change has come about is outside the field's concern. Hence, comparative statics is essentially concerned with finding the rate of change of the equilibrium value of the choice or endogenous variables with respect to the change in a particular parameter or exogenous variable.

For example: what happens to the quantity demanded of a good when the price of the good, the prices of related commodities, or income changes is a representative question in comparative statics.

A comparative static analysis can be either qualitative or quantitative in nature. If the interest is to know the direction of change of the endogenous variables as a result of the change in the parameters, the analysis is of the qualitative type. However, if the concern is with both the direction and the magnitude of the change, the analysis is quantitative.

In economics, theories are commonly tested on the basis of such changes in the variables, which may or may not result in a violation of the underlying assumptions.

6.2. Differentiations and its Application to Comparative Static Analysis

In comparative statics we make use of derivatives in assessing the rate of change of one
endogenous variable as a result of the change in one or more parameters or exogenous variables.

We can consider different models to illustrate this point.

1. Let R(x) = total revenue as a function of output, x;
C(x) = total cost as a function of output, x;
tx = total tax paid, where t is a per-unit tax, a parameter beyond the control of the firm.
Here, while x is an endogenous variable, t is exogenous.
Let's define the profit function:
π = R(x) − C(x) − tx

FOC: π′ = R′(x) − C′(x) − t = 0
If the firm is in a perfectly competitive market, R′(x) = P; and from the above FOC,
MR = MC + t ⇒ P = MC + t. This implies that the firm chooses the level of output such that MR = MC + t.

SOC: π″ = R″(x) − C″(x) < 0 ⇒ R″(x) < C″(x), i.e., the slope of the marginal revenue curve should be less than that of the marginal cost curve.

In any case, x is a function of t: a change in t changes the optimal level of output produced, so x = x*(t). If this is the case, we can insert this definition into the FOC, which then holds identically:
R′(x*(t)) − C′(x*(t)) − t ≡ 0
Differentiating with respect to t:
R″(x*)·(dx*/dt) − C″(x*)·(dx*/dt) − 1 = 0
(dx*/dt)·[R″(x*) − C″(x*)] = 1
dx*/dt = 1/[R″(x*) − C″(x*)]
But look at the SOC: for π to be maximized, R″(x*) − C″(x*) < 0.
⇒ dx*/dt = 1/[R″(x*) − C″(x*)] < 0
dx*/dt < 0 implies the optimal level of output and the tax rate are negatively related, i.e., the output of the firm decreases as the tax rate the firm faces increases, and vice versa.

Conclusion: A prediction about the size and direction of the change in the choice variable (output in the above case) can be made by looking at the change in the parameter facing the decision maker; this is the goal of comparative statics.
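The sign of dx*/dt can be illustrated with concrete (hypothetical) functional forms. The sketch below, which assumes SymPy is available, uses R(x) = 20x (a competitive price of 20) and C(x) = x², so that R″ − C″ = −2:

import sympy as sp

x, t = sp.symbols("x t", positive=True)
profit = 20*x - x**2 - t*x          # pi = R(x) - C(x) - t*x

x_star = sp.solve(sp.diff(profit, x), x)[0]   # FOC gives x* = 10 - t/2
print(sp.diff(x_star, t))                     # -1/2, i.e. 1/(R'' - C'') < 0

The derivative −1/2 agrees with the general formula dx*/dt = 1/[R″(x*) − C″(x*)].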

2. Maximize π(x) = R(x) − C(x) = Px − C(x)
Here the assumption is that P is an exogenous variable, i.e., P is taken as a parameter beyond the control of the decision maker.
FOC: π′(x) = P − C′(x) = 0 ….. (1)
SOC: π″(x) = −C″(x) < 0 ….. (2), i.e., C″(x) > 0: the MC curve must be upward sloping.
We know the optimal level of x is a function of P, i.e., x = x*(P), and from (1)
C′(x) = MC = P.
x = x*(P) is the supply function of the firm, starting from the point where MC = P; it tells us how much the firm offers to the market for every market-determined level of price. Inserting x = x*(P) into the FOC, P − C′(x*(P)) ≡ 0, and differentiating with respect to P:
1 − C″(x*)·(dx*/dP) = 0 ⇒ dx*/dP = 1/C″(x*) , and C″(x*) > 0
⇒ dx*/dP > 0
This shows there is a positive relationship between the parameter P and the endogenous variable output. In other words, the supply function of a competitive firm is upward sloping, i.e., x = x*(P) has a positively sloped graph.

3. Market model: let us consider a market model for a single commodity.
Qd = a − bP  (a, b > 0)
Qs = −c + dP  (c, d > 0)
We have two endogenous variables (P and Q) and four parameters (a, b, c and d). At equilibrium,
Qd = Qs ⇒ −c + dP = a − bP ⇒ P̄ = (a + c)/(b + d)
Assume P̄ and Q̄ are the initial equilibrium levels of P and Q; comparative statics helps us to compute the change in these initial equilibrium values:
∂P̄/∂a = 1/(b + d) > 0
∂P̄/∂b = −(a + c)/(b + d)² < 0
∂P̄/∂c = 1/(b + d) > 0
∂P̄/∂d = −(a + c)/(b + d)² < 0
NB: There is an inverse relationship between the equilibrium price of the commodity and the slope parameters of demand and supply.
i) An increase in 'a' indicates an upward shift of the demand curve; that is, as a ↑ ⇒ P̄ ↑.

[Figure: with Q on the vertical axis and P on the horizontal axis, the demand curve shifts upward along a fixed supply curve; the equilibrium price rises from P̄1 to P̄2.]
ii) Similarly, an increase in 'c' shifts the supply curve to the right. That is, as c ↑ ⇒ P̄ ↑.

[Figure: with Q on the vertical axis and P on the horizontal axis, the supply curve shifts to the right along a fixed demand curve; the equilibrium price rises from P̄1 to P̄2.]
iii) An increase in 'b' increases the (absolute) slope of the demand curve. That is, as b ↑ ⇒ P̄ ↓.

[Figure: the demand curve rotates and becomes steeper along a fixed supply curve; the equilibrium price falls from P̄1 to P̄2.]

iv) An increase in 'd' increases the slope of the supply curve. That is, as d ↑ ⇒ P̄ ↓.

[Figure: the supply curve rotates and becomes steeper along a fixed demand curve; the equilibrium price falls from P̄1 to P̄2.]

Activity: Express the relationship between the parameters and the equilibrium level of the
output both graphically and algebraically.

4. National Income Model: The discussion of comparative static analysis can also be made with a national income model. We have
Y = C + I0 + G0
C = a + b(Y − T)  (a > 0; 0 < b < 1)
T = d + tY  (d > 0; 0 < t < 1)
where
a — denotes the autonomous consumption level;
b — measures the marginal propensity to consume (MPC);
d — indicates the non-income (lump-sum) tax;
t — the income tax rate.
Here, Y − T is usually referred to as disposable income (that is, income after tax). Y and C stand for the endogenous variables national income and consumption expenditure respectively. I0 and G0 are exogenously determined investment and government expenditures. The first equation is an equilibrium condition (National Income = Total Expenditure), while the second and third are behavioral equations, that is, consumption and tax functions. Moreover, the equations show that the model is of the closed type, because the trade terms are not incorporated.
NB: The equations are neither functionally dependent on, nor inconsistent with, each other.
Thus, we can determine the equilibrium levels of the endogenous variables Y, C and T in terms of the exogenous variables I0 and G0 and the parameters a, b, d and t.
Substituting the third equation into the second, and then the second into the first, we have
C = a + b(Y − T) = a + b(Y − d − tY) = a − bd + b(1 − t)Y
Y = a + bY − bd − btY + I0 + G0
Y + btY − bY = a − bd + I0 + G0
Y(1 + bt − b) = a − bd + I0 + G0
Ȳ = (a − bd + I0 + G0)/(1 + bt − b)
C̄ = a − bd + b(1 − t)Ȳ
The interest in comparative statics is to see the effect of the change in one of the exogenous variables on the endogenous variables. To do so we take first-order derivatives.
For example:
∂Ȳ/∂G0 = 1/(1 + bt − b) > 0 indicates the government expenditure multiplier.
∂Ȳ/∂t = −(1 + bt − b)⁻²·(a − bd + I0 + G0)·b
      = [−b/(1 + bt − b)]·[(a − bd + I0 + G0)/(1 + bt − b)]
      = −bȲ/(1 + bt − b) < 0
This result indicates the income tax multiplier.

Activity: Find ∂Ȳ/∂I0 and ∂Ȳ/∂d, and determine their signs.
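These multipliers can also be generated symbolically. The following SymPy sketch (assumed available) solves the equilibrium condition and differentiates it:

import sympy as sp

Y, a, b, d, t, I0, G0 = sp.symbols("Y a b d t I0 G0", positive=True)

# Equilibrium condition: Y = a + b*(Y - d - t*Y) + I0 + G0
Y_bar = sp.solve(sp.Eq(Y, a + b*(Y - d - t*Y) + I0 + G0), Y)[0]

print(sp.simplify(sp.diff(Y_bar, G0)))   # 1/(1 + b*t - b): the expenditure multiplier
print(sp.simplify(sp.diff(Y_bar, t)))    # -b*(a - b*d + G0 + I0)/(1 + b*t - b)**2

Up to rearrangement, the two outputs match the multipliers derived above.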
6.3 Jacobians and Hessian Determinants

Both follow the same procedure in their computation: we take the first-order partial derivatives of the system's equations as elements of the Jacobian determinant, or, for the Hessian, we evaluate the second-order derivatives of the Lagrangian function with respect to the choice variables and take them as elements of the determinant.
The two differ in their purpose: while Jacobians are used to check whether a given system has a (unique) solution before an attempt is made to solve it, Hessians are used as sufficient conditions to determine whether a given point is a relative maximum or minimum.
Note: The detailed concepts and mathematical application of Jacobian and Hessian determinants were addressed in the previous chapter. The relevance of this subtopic here is to remind students to relate those concepts to comparative static applications.
6.4 Comparative Static and General Function Model
In all the above models the equilibrium values of the endogenous variables could be explicitly
expressed in terms of the exogenous variables. Accordingly the technique of simple partial
differentiation was all we needed to obtain the desired comparative static information.
However, when a model contains functions expressed in general form, explicit solutions are not available, i.e., the equilibrium values of the endogenous variables cannot be expressed explicitly in terms of the parameters and/or exogenous variables. In such cases a new technique must be adopted, one that makes use of concepts such as total differentials and the implicit function rule.
Let's try to illustrate the point with a market model. Consider a market model where Qd is a function of both price and an exogenously determined income Y0, and Qs is a function of price alone:
Qd = D(P, Y0) , where ∂D/∂P < 0 and ∂D/∂Y0 > 0;
Qs = S(P) , where dS/dP > 0.
And the equilibrium position of the market is defined by Qd = Qs. This implies
D(P, Y0) = S(P)
D(P, Y0) − S(P) = 0.
Even though the above equation cannot be solved explicitly for the equilibrium price P̄, we assume that there does exist a static equilibrium both before and after the change in the exogenous variable Y0.
Say we have obtained P̄; if the income of consumers changes, the whole equilibrium will be upset. This indicates that the optimal value of P which sets the market at equilibrium is a function of the exogenous variable Y0. That is, P̄ = P̄(Y0).
The change in income upsets the equilibrium by causing a shift in the demand function, implying that every value of Y0 yields a unique value of P̄.

Since D(P̄, Y0) − S(P̄) = 0 is an implicit function, we can take it as an identity: F(P̄, Y0) ≡ 0.


The comparative static analysis of such a model is concerned with how a change in Y0 will affect the equilibrium position of the model. Thus, we can raise two questions:
(1) What is the effect of a change in Y0 on P̄ (that is, ∂P̄/∂Y0)?
(2) What is the effect of a change in Y0 on Q̄ (that is, ∂Q̄/∂Y0)?
From D(P̄, Y0) − S(P̄) = 0 we can answer question number one.

Applying the implicit function rule:

∂P̄/∂Y0 = −(∂F/∂Y0)/(∂F/∂P̄) = −(∂D/∂Y0)/(∂D/∂P − dS/dP) = −(positive)/(negative − positive) > 0.

Hence, ∂P̄/∂Y0 > 0. This indicates that an increase in income results in an increase in the equilibrium price, and vice versa.
��
To answer the second question, that is �� :
0

At equilibrium, �� = �� = �. We know that �� = �� = � � but � = �(�0 ). Thus, substituting


we get:

� = �(�(�0 ))

Applying the chain rule, we have


�� �� �� �� ��
��0
=
��
. �� �ℎ���
��
> 0 ��� ��0
.
0

��
Therefore, �� > 0 implying that an increase in income increases the equilibrium level of output.
0

Generally, these comparative static results convey the proposition that an upward shift of the demand curve (due to a rise in Y0) results in a higher equilibrium price and equilibrium quantity.
The above derivation of the relationship between P̄ and Q̄ and the exogenous variable Y0 can also be carried out by a simultaneous-equation approach.

Qd = D(P, Y0) ;  Qs = S(P)

At equilibrium, Qd = Qs = Q ⇒ Qd − Q = 0 and Qs − Q = 0.

Writing these in general form:

D(P, Y0) − Q = 0
S(P) − Q = 0

Converting these equations into identity form:

F¹(P, Q, Y0) = D(P, Y0) − Q = 0
F²(P, Q, Y0) = S(P) − Q = 0

In the above two equations, we have two endogenous variables, P and Q, and one exogenous variable, Y0.

Applying partial differentiation to F¹ and F², we can form a Jacobian, one row for F¹ and the other for F²:

       | ∂F¹/∂P  ∂F¹/∂Q |     | ∂D/∂P  −1 |
|J| =  | ∂F²/∂P  ∂F²/∂Q |  =  | dS/dP  −1 |  = dS/dP − ∂D/∂P > 0 ,

because dS/dP > 0 and ∂D/∂P < 0.

Since |J| ≠ 0, we can obtain the equilibrium P and Q, that is, P̄ = P̄(Y0) and Q̄ = Q̄(Y0).

Thus, the equilibrium conditions can be written in the form of the identities
D(P̄, Y0) − Q̄ ≡ 0 and S(P̄) − Q̄ ≡ 0.

Taking the total differential of these identities with respect to Y0, we have

(∂D/∂P)·dP̄ + (∂D/∂Y0)·dY0 − dQ̄ = 0

(dS/dP)·dP̄ − dQ̄ = 0

Dividing both sides of both equations by dY0 yields:

(∂D/∂P)·(∂P̄/∂Y0) − (∂Q̄/∂Y0) = −∂D/∂Y0

(dS/dP)·(∂P̄/∂Y0) − (∂Q̄/∂Y0) = 0

Now, taking ∂P̄/∂Y0 and ∂Q̄/∂Y0 as the variables and writing the above equations in matrix form:

| ∂D/∂P  −1 | | ∂P̄/∂Y0 |   | −∂D/∂Y0 |
| dS/dP  −1 | | ∂Q̄/∂Y0 | = |    0    |

In order to solve for the variables we can apply Cramer's rule, but before that let us check that |J| ≠ 0.
       | ∂D/∂P  −1 |
|J| =  | dS/dP  −1 |  = dS/dP − ∂D/∂P > 0 ,

which is different from zero. Hence, it is possible to continue and find the solutions of the problem.

            | −∂D/∂Y0  −1 |
∂P̄/∂Y0 =  |     0     −1 | / |J|  =  (∂D/∂Y0)/|J| > 0

            | ∂D/∂P  −∂D/∂Y0 |
∂Q̄/∂Y0 =  | dS/dP      0    | / |J|  =  [(dS/dP)·(∂D/∂Y0)]/|J| > 0

Therefore, both ∂P̄/∂Y0 and ∂Q̄/∂Y0 are greater than zero, identical with the results obtained before.

6.5 Limitations of Comparative Static Analysis

Comparative statics is helpful in finding out how a disequilibrating change in a parameter will affect the equilibrium state of a model. However, by its very nature, it has the following limitations.

1. It ignores the process of adjustment from the old equilibrium to the new one.
2. It neglects the time element involved in adjustment process from one equilibrium to another.
3. It always assumes that a change in a parameter and/or exogenous variable results in a new equilibrium. It disregards the possibility that the new equilibrium may never be attained because of the inherent instability of the model.

CHAPTER SEVEN
DYNAMIC OPTIMIZATION.
7.1 Definitions and Concepts

In static analysis an economic variable is assumed to be a function of another variable in the same time period.
Example: 1) Consumption is a function of income in the same time period:
C = f(Y)
2) Supply is a function of price in the same time period:
S = f(P)
But dynamic analysis introduces the explicit consideration of time into the picture.
Example: 1) Current consumption may depend on past income:
Ct = f(Yt−1) or C(t) = f(Y(t − 1))
2) Quantity supplied in one period may depend on the price of the previous period. A good example is agricultural food supply. The time paths of variables can be studied in two ways:

I. Discrete-time case: where time is considered as a discrete variable, in which case the variable undergoes a change only once within each period of time. This utilizes the methods of difference equations.

Example :

Interest compounding per year, per month, etc. The curve is not smooth here.

II. Continuous-time case: where time is considered as a continuous variable, in which case something is happening to the variable at each point of time. Here, integral calculus and differential equations are used. Example: population growth.

7.1.1 First –Order Linear Differential Equations
Differential equations are equations involving derivatives (differentials). They express the rates
of change of continuous functions over time. The objective in working with differential
equations is to find a function, without a derivative (differential) which satisfies the differential
equation. Such a function is called the solution or integral of the equation.
Note: The order of a differential equation is the order of the highest derivative in the equation.
The degree of a differential equation is the highest power to which the derivative of the highest
order is raised.
Example: 1) dy/dt = 2t + 6 : first order, first degree
2) d²y/dt² + (dy/dt)³ + t² = 0 : second order, first degree
3) (d²y/dt²)⁷ + (d³y/dt³)⁵ = 75y : third order, fifth degree
General form: dy/dt + vy = z , where v and z are constants or functions of time.
When they are constants: e.g. dy/dt + 2y = 3
When they are functions of time: dy/dt + v(t)·y = z(t)
Note: For a first-order linear differential equation, dy/dt and y must appear in no higher than the first degree, and no product y·(dy/dt) may occur.

General solution: y(t) = Ae^(−vt) + z/v
where A ≡ an arbitrary constant;
Ae^(−vt) ≡ yc ⇒ the complementary function;
z/v ≡ yp ⇒ the particular integral.

Proof: Given the general linear differential equation:
dy/dt + vy = z ⇒ dy/dt = z − vy
Separating the variables: dy/(z − vy) = dt
Integrating both sides: ∫ dy/(z − vy) = ∫ dt
(−1/v) ln(z − vy) = t + c1
ln(z − vy) = −vt + (−vc1)
ln(z − vy) = −vt + c2 , where c2 = −vc1
z − vy = e^(−vt + c2) , since ln x = b ⇒ x = e^b
z − vy = e^(−vt)·e^(c2) = c·e^(−vt) , where c = e^(c2)
−vy = c·e^(−vt) − z
y = (−c/v)·e^(−vt) + z/v = Ae^(−vt) + z/v , where A = −c/v

Hence, for the above general linear differential equation with constant v and z, the general solution is given as y(t) = Ae^(−vt) + z/v. An analogous formula, with ∫v dt in place of vt, applies when v and z are functions of time.

Finding a Definite Solution for a Differential Equation

Given the general solution of a differential equation, if an initial condition (boundary value) is given, A can be specified, in which case a definite solution is possible. Hence, the arbitrary constant in the general solution (i.e., A) can be definitized by means of an initial condition.

Given y(t) = Ae^(−vt) + z/v:
At t = 0, y(0) = Ae^(−v·0) + z/v = A + z/v ⇒ A = y(0) − z/v.

Thus, the definite solution is formed by replacing A by y(0) − z/v in the general solution, as follows:
y(t) = [y(0) − z/v]·e^(−vt) + z/v
Note: the particular integral (yp) represents the intertemporal equilibrium level of y(t). The complementary function yc denotes the deviation of the time path from the equilibrium level. For y(t) to be dynamically stable, yc must approach zero as t approaches infinity (∞).
Example: Find the general solution of the differential equation dy/dt + 4y = 12.
Method 1: since v = 4 and z = 12, the general solution will be y(t) = Ae^(−4t) + 3.
Method 2: using the equation in separated form, an explicit solution can be found as follows:
dy/dt + 4y = 12 ⇒ dy/dt = 12 − 4y
Separating the variables: dy/(12 − 4y) = dt
Integrating both sides: ∫ dy/(12 − 4y) = ∫ dt
(−1/4) ln(12 − 4y) = t + c1
ln(12 − 4y) = −4t − 4c1 ⇒ 12 − 4y = e^(−4t − 4c1)
−4y = e^(−4t)·e^(−4c1) − 12 ⇒ y = [e^(−4c1)/(−4)]·e^(−4t) + 3
The general solution is given by y = Ae^(−4t) + 3 , where A = (−1/4)e^(−4c1).
7.1.2 First Order Linear Difference Equations

First-order linear difference equations are used to analyze changes with respect to time when what happens in one period depends upon what happened in the previous period. A difference equation expresses a relationship between a dependent variable and a lagged independent variable which changes at discrete intervals of time. Example: consumption in one period depends on the previous period's income; that is, Ct = f(Yt−1) in the discrete case, with C(t) = f(Y(t − 1)) as the continuous-time analogue.

Note: The order of a difference equation is determined by the greatest number of periods lagged.
A first order difference equation expresses a time lag of one period; second order difference
equation a two period time lag and so on.

Example: The change in y as t changes from t to t + 1 is called the first difference of y:
Δy/Δt = Δyt = yt+1 − yt   (first order)
Most of the time the delta (Δ) is omitted, and we write, for example:

yt = f(yt−1, yt−2)   (second order)

yt = a + byt−1   (first order)

yt+3 − 9yt+2 + 2yt+1 + 6yt = 8   (third order)

The solution of a difference equation defines y for every value of t and does not contain a
difference expression.

The general form of a linear difference equation is given by
yt+1 − byt = a , or in some cases yt = byt−1 + a , where a and b are constants.

Note: The dependent variable does not appear raised to a power higher than one or as a cross product.

Methods of Solving Difference Equations

1. Iterative method
The first-order difference equation describes the pattern of change of y between two consecutive periods only. Hence, once the difference equation is specified and an initial value y0 is given, it is possible to find y1 from the equation. Similarly, once y1 is found, y2 is immediately obtained, and so on, by repeated application (iteration) of the pattern of change specified in the difference equation. The results of iteration enable us to infer a time path for the variable under consideration.
Examples:
1. Find the solution of the difference equation Δyt = 2, assuming an initial value of y0 = 15.
Solution:
Since Δyt = yt+1 − yt and Δyt = 2, we have yt+1 = yt + 2. Then, by successive substitution of t = 0, 1, 2, 3, etc., we obtain
t = 0: y1 = y0 + 2
t = 1: y2 = y1 + 2 = y0 + 2 + 2 = y0 + 4 = y0 + 2(2)
t = 2: y3 = y2 + 2 = y0 + 2(2) + 2 = y0 + 6 = y0 + 3(2)
t = 3: y4 = y3 + 2 = y0 + 3(2) + 2 = y0 + 8 = y0 + 4(2)
For any period t: yt = y0 + t(2)
⇒ yt = 15 + 2t. This is the time path of the variable.


2 Find the solution of the difference equation ��+1 =
��� given that the initial value of � is �0.
Solution:
Given ��+1 = ���
At � = 0: �1 = ��0
At � = 1: �2 = ��1 = � ��0 = �2 �0
At � = 2: �3 = ��2 = � �2 �0 = �3 �0
At � = �: �� = �� �0
(The solution of the difference equation)
The solution of the differential equation:
� � = ��−�� (it is natural logarithm expreses constant rate of continuous growth)
The solution of the difference equation:
�� = ��� exponential equation expreses constant rates of discrete growth
2. General Method:
Given the first-order difference equation yt+1 − byt = a, its general solution will consist of the sum of a complementary function (yc) and a particular integral (yp).
Definition: yp equals the intertemporal equilibrium level of y, and yc represents the deviation of the time path from that equilibrium level. For the time path to be dynamically stable, yc must approach zero as t approaches infinity.
yc is the general solution of the reduced (homogeneous) equation:
yt+1 − byt = 0
yp is a particular solution of the complete (non-homogeneous) equation:
yt+1 − byt = a
a) First find yc for the homogeneous equation:
Given yt+1 − byt = 0 ⇒ yt+1 = byt,
the general solution will be of the form yt = Abᵗ (see example 2 above in the iterative method), where
Abᵗ ≠ 0, for otherwise yt becomes a horizontal straight line on the t-axis. The complementary function is therefore yc = Abᵗ.
b) Next, find the particular integral for the non-homogeneous equation:
Given yt+1 − byt = a, any particular solution will do for yp. Thus we take the simplest form of trial solution, yt = k (a constant). Then, since y maintains the same constant value over time, we must also have yt+1 = k. Substituting these values into the complete equation:

yt+1 − byt = a ⇒ k − bk = a ⇒ k(1 − b) = a ⇒ k = a/(1 − b)

Hence, the particular solution becomes yp = k = a/(1 − b) , b ≠ 1.
Note that since a/(1 − b) is a constant, a stationary equilibrium is indicated in this case.
Since yp is undefined at b = 1, we need to find some other solution for the non-homogeneous equation. So let's try a solution of the form yt = kt, which indicates a moving equilibrium;
yt = kt implies yt+1 = k(t + 1).
Substituting these values into the complete equation, we obtain
yt+1 − byt = a ⇒ k(t + 1) − bkt = a
⇒ k(t + 1) − kt = a , since b = 1
⇒ k(t + 1 − t) = a ⇒ k = a. Thus, yp = kt ⇒ yp = at, for b = 1.
c) Finally, adding yc and yp, we arrive at the general solution:

yt = yc + yp ⇒ yt = Abᵗ + a/(1 − b) ……. (1) General solution when b ≠ 1.
yt = Abᵗ + at ⇒ yt = A + at ………. (2) General solution when b = 1.
Eliminating the arbitrary constant, the definite solution can be written as follows.
From (1), at t = 0: y0 = Ab⁰ + a/(1 − b) ⇒ A = y0 − a/(1 − b)

yt = [y0 − a/(1 − b)]·bᵗ + a/(1 − b) ……. (1′) Definite solution, when b ≠ 1.
From (2), at t = 0: y0 = A + a(0) = A

yt = y0 + at ……. (2′) Definite solution, when b = 1.
Example: Solve the first-order difference equation yt+1 − 5yt = 1 , (y0 = 7/4).

Solution: Using the definite solution formula (b ≠ 1):

yt = [y0 − a/(1 − b)]·bᵗ + a/(1 − b)

Since b = 5 and a = 1: yt = [7/4 − 1/(1 − 5)]·(5)ᵗ + 1/(1 − 5)

yt = 2(5)ᵗ − 1/4.
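A plain-Python sketch can confirm this closed form by brute-force iteration (Fraction keeps the arithmetic exact):

from fractions import Fraction

y = Fraction(7, 4)                                 # y0 = 7/4
for t in range(5):
    closed = 2 * Fraction(5)**t - Fraction(1, 4)   # y_t = 2*5**t - 1/4
    print(t, y, y == closed)                       # prints True at every step
    y = 5*y + 1                                    # iterate y_{t+1} = 5*y_t + 1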

7.2. The Dynamic Stability of Equilibrium

In the continuous-time case, the dynamic stability of equilibrium was examined using a stability condition on the exponential term. For the difference equation, using the definite solution
yt = [y0 − a/(1 − b)]·bᵗ + a/(1 − b),
it can be expressed in the more general form
yt = Abᵗ + k , where A = y0 − a/(1 − b) and k = a/(1 − b).
The time path yt will be dynamically stable only if the complementary function Abᵗ → 0 as t → ∞, i.e.:

for a difference equation (a discrete function), only if |b| < 1;

for a continuous function y(t) = Ae^(rt) + k, only if r < 0, so that Ae^(rt) → 0 as t → ∞.

Assuming for the moment A = 1 and k = 0, the expression bᵗ generates seven different time paths depending on the value of b (which can range from −∞ to ∞).

1. If b > 1, bᵗ increases at an increasing rate as t increases. Thus the time path explodes (diverges) and moves farther and farther away from the horizontal (equilibrium) line.
Example: if b = 3, the value of bᵗ at different time periods will be

t    0    1    2    3    4
bᵗ   1    3    9    27   81

2. If b = 1, bᵗ = 1 for all values of t; thus the time path is represented by a horizontal line.
3. If 0 < b < 1 (i.e., b is a positive fraction), bᵗ decreases as t increases; thus the time path is damped and moves towards the equilibrium (horizontal) line.
Example: if b = 1/3,

t    0    1     2     3      4
bᵗ   1    1/3   1/9   1/27   1/81

4. If b = 0, bᵗ = 0 for all values of t ≥ 1.
5. If −1 < b < 0 (i.e., b is a negative fraction), bᵗ oscillates between positive and negative values, and the time path draws closer and closer to the horizontal (equilibrium) line.
Example: if b = −1/3,

t    0    1      2     3       4
bᵗ   1    −1/3   1/9   −1/27   1/81

6. If b = −1, bᵗ oscillates between +1 and −1.
Example:

t    0    1    2    3    4    5
bᵗ   1    −1   1    −1   1    −1

7. If b < −1, bᵗ oscillates and moves farther and farther away from the horizontal (equilibrium) line.
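The seven cases can be tabulated with a few lines of plain Python:

# Representative values of b for each qualitative time path
for b in (3, 1, 1/3, 0, -1/3, -1, -3):
    path = [round(b**t, 3) for t in range(5)]
    print(f"b = {b:>6}: {path}")
# (Note: Python evaluates 0**0 as 1; for t >= 1 the b = 0 path is 0.)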

7.3. Economic Applications of Dynamic Optimization


7.3.1. Uses of Differential Equations in Economics

Differential equations are used to determine the conditions for dynamic stability in micro-
economic models of market equilibria and to trace the time path of growth under various
conditions in macro-models.

Given the growth rate of a function, differential equations enable the economist to find the
function whose growth is described. Furthermore, from point elasticity they enable him to
estimate the demand function.

Examples:

1. Given the demand function Qd = a + bP and the supply function Qs = g + hP, determine the condition for price stability in the market (i.e., under what conditions P(t) will converge to P̄, the equilibrium price, as time → ∞).
Qs = g + hP , g < 0 and h > 0 ;  Qd = a + bP , a > 0, b < 0.
Either sign convention is acceptable; equivalently:
Qs = −g + hP , g, h > 0 ;  Qd = a − bP , a, b > 0.

Solution:
First find the equilibrium price P̄: Qd = Qs ⇒ a + bP = g + hP ⇒ (h − b)P = a − g ⇒ P̄ = (a − g)/(h − b)

[Figure: demand and supply curves against price P and quantity Q; at P̄, demand = supply (equilibrium); above P̄ there is excess supply, below P̄ excess demand.]

Assume that the rate of change of P in the market, dP/dt, is a positive linear function of excess demand (Qd − Qs):

dP/dt = j(Qd − Qs) , where j > 0
j ≡ the adjustment coefficient. Substituting the expressions for Qd and Qs, we get:

dP/dt = j(a + bP − g − hP) = j[(a − g) + (b − h)P]

Rearranging to fit the general format of a differential equation, dy/dt + vy = z:

dP/dt + j(h − b)P = j(a − g)
Then, since v = j(h − b) and z = j(a − g), the general solution
(y(t) = Ae^(−vt) + z/v) becomes: P(t) = Ae^(−j(h−b)t) + (a − g)/(h − b)

The definite solution will be:

P(t) = [P(0) − (a − g)/(h − b)]·e^(−j(h−b)t) + (a − g)/(h − b)

Using the relation P̄ = (a − g)/(h − b): P(t) = [P(0) − P̄]·e^(−j(h−b)t) + P̄

Let k = j(h − b) ⇒ P(t) = [P(0) − P̄]·e^(−kt) + P̄

Whether P(t) tends to P̄ as t → ∞ depends on whether the exponential expression e^(−kt) → 0 as t → ∞. Since P(0) and P̄ are constants and k = j(h − b) > 0, we have e^(−kt) = 1/e^(kt) → 0 as t → ∞.

Consequently, the time path will indeed lead the price towards the equilibrium position. In this case, the equilibrium is said to be dynamically stable.

But, depending on the relative magnitudes of P(0) and P̄, the above solution yields three possible time paths (starting above, at, or below the equilibrium price).

Cob Web Model

To illustrate the use of difference equations in economic analysis, we use a market model of a
single commodity (cobweb model).

For many products (such as agricultural commodities) which are planted a year before marketing, current supply depends on last year's price: Qs,t+1 = S(Pt), or Qs,t = S(Pt−1).

When such a supply function interacts with a demand function of the form Qd,t = D(Pt), interesting dynamic price patterns will result.

Using the linear versions of the lagged supply and un-lagged demand functions, we get the market model with the following equations:

Qdt = a + bPt  (a > 0, b < 0)

Qst = g + hPt−1  (g < 0, h > 0)

Qdt = Qst  (equilibrium condition)

Using the last equation, the model can be reduced to a single first-order difference equation:

Qdt = Qst

a + bPt = g + hPt−1 ⇒ bPt = (g − a) + hPt−1

Pt = (g − a)/b + (h/b)Pt−1 ⇒ Pt − (h/b)Pt−1 = (g − a)/b

Shifting the periods forward by one period: Pt+1 − (h/b)Pt = (g − a)/b

The definite solution is given as yt = [y0 − a/(1 − b)]·bᵗ + a/(1 − b), which here becomes:

Pt = [P0 − ((g − a)/b)/(1 − h/b)]·(h/b)ᵗ + ((g − a)/b)/(1 − h/b)

Pt = [P0 − (g − a)/(b − h)]·(h/b)ᵗ + (g − a)/(b − h) , with P̄ = (g − a)/(b − h) = (a − g)/(h − b).

Hence, Pt = (P0 − P̄)·(h/b)ᵗ + P̄

Two points may be observed in regard to this time path:

i. The significance of the expression (P0 − P̄) (the same role as A in Abᵗ):

a) its sign tells us whether the time path commences (starts) above or below the equilibrium (mirror effect);
b) its magnitude tells us how far above or below the time path starts (scale effect).

ii. Under "normal" demand and supply conditions b < 0 and h > 0, so that h/b < 0, indicating that the time path is oscillatory:

a. if |h| > |b| , |h/b| > 1 , and Pt explodes (diverges);

b. if |h| = |b| , h/b = −1 , and Pt oscillates uniformly;

c. if |h| < |b| , |h/b| < 1 , and Pt converges and approaches P̄.
In short, for stability the supply curve must be flatter than the demand curve (that is, h < b in magnitude). Assuming P0 > P̄, the time paths of price and quantity can be illustrated as follows.

Graphic Illustration
[Figure: three cobweb diagrams, each with quantity Q on the vertical axis and price P on the horizontal axis.
Panel (a): h > b (in magnitude) — the supply curve is steeper than the demand curve; the path spirals outward.
Panel (b): h = b — the supply curve is as steep as the demand curve; the path repeats itself.
Panel (c): h < b — the supply curve is flatter than the demand curve; the path spirals inward toward P̄.]

Fig (a): assume that the intersection of D and S yields the intertemporal equilibrium price P̄. Given an initial price P0 (where P0 > P̄), the quantity supplied in the next period (period 1) will be Q1. But at Q1, supply exceeds demand, so there is a tendency for price to fall; consequently, the market clears at P1.

Assuming P1 to prevail, period 2's supply will be limited to Q2 (Q2 < Q1).

At Q2, demand exceeds supply, so there is a tendency for P to rise; as a result the market clears at P2.

Again, assuming P2 to prevail, the supply in period 3 will be Q3 (where Q3 > Q2). At Q3, since supply > demand, the market clears at P3. Repeating this reasoning, we can trace out the prices and quantities in subsequent periods by following the arrowheads, thereby spinning a cobweb around the demand and supply curves.

The time path of price is oscillatory and explosive when S is steeper than D (h > b in magnitude).

Fig (b): Pt oscillates uniformly, so the equilibrium is unstable; Pt neither diverges nor converges.

Fig (c): Pt is oscillatory and convergent when S is flatter than D (h < b in magnitude), so that the equilibrium time path (Pt) will be dynamically stable.

Examples:

1. Given the demand function Qd = 20 − 2Pt and the supply function Qs = −5 + 3Pt−1:

a. Find Pt and Qt when P0 = 4.
b. Find P̄ and Q̄.
c. Find P1, Q1, P2, Q2, etc.
d. Using cobwebs, comment on the stability of the system.

Solution: P̄ = (g − a)/(b − h) = (−5 − 20)/(−2 − 3) = 5

Hence, Pt = (P0 − P̄)·(h/b)ᵗ + P̄ , with |h/b| = 3/2 > 1.

a. From Qdt = Qst ⇒ 20 − 2Pt = −5 + 3Pt−1 ⇒ Pt = 25/2 − (3/2)Pt−1
Shifting the time periods forward one period and rearranging:
Pt+1 + (3/2)Pt = 25/2
Definite solution:
Pt = [4 − (25/2)/(1 + 3/2)]·(−3/2)ᵗ + (25/2)/(1 + 3/2) = (4 − 5)·(−3/2)ᵗ + 5

Pt = −(−3/2)ᵗ + 5 , or Pt = 5 − (−3/2)ᵗ
Substituting into the demand function yields:
Qt = 20 − 2[5 − (−3/2)ᵗ] = 10 + 2(−3/2)ᵗ , or Qt = 2(−3/2)ᵗ + 10

b. If there is an equilibrium (i.e., if the price is constant in every period, meaning Pt = Pt−1 = Pt−2 = … = Pt−n), then it follows that
Pt = Pt−1 = P̄ and Qt = Q̄.
Then, from Qdt = Qst:

20 − 2P̄ = −5 + 3P̄ ⇒ P̄ = 5

From the demand function, Q̄ = 20 − 2(5) ⇒ Q̄ = 10

c. Using Pt = 5 − (−3/2)ᵗ and Qt = 2(−3/2)ᵗ + 10:

P0 = 5 − (−3/2)⁰ = 4        Q0 = 2(−3/2)⁰ + 10 = 12

P1 = 5 − (−3/2)¹ = 6.5      Q1 = 2(−3/2)¹ + 10 = 7

P2 = 5 − (−3/2)² = 2.75     Q2 = 2(−3/2)² + 10 = 14.5

P3 = 5 − (−3/2)³ = 8.38     Q3 = 2(−3/2)³ + 10 = 3.25

We use these points to plot the diagram.

d. Using the cobwebs:
[Figure: cobweb diagram with demand and supply lines; starting from P0 = 4, the path Q0 → P1 → Q1 → P2 → … spirals outward around the intersection at P̄ = 5.]
The equilibrium is unstable because the time path is oscillatory and divergent.
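The divergent cobweb can be simulated directly in plain Python:

P = 4.0
print(0, P, 20 - 2*P)        # period 0: P0 = 4, Q0 = 12
for t in range(1, 5):
    Q = -5 + 3*P             # supply in period t, set by last period's price
    P = (20 - Q) / 2         # price at which demand absorbs that supply
    print(t, round(P, 3), round(Q, 3))

The printed prices 4, 6.5, 2.75, 8.375, ... swing ever farther from P̄ = 5, confirming the divergence.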

PART I

ECONOMETRICS ONE
Chapter 1
Introduction
1.1 Definition and scope of econometrics
What Is Econometrics?
 Literally, econometrics means “economic measurement”. It is basically concerned with
measuring economic relationships.
 But the scope is much broader. Various econometricians used different ways to define
econometrics.
 “Econometrics is the science which integrates and applies economic theory, economic statistics, and mathematical economics to investigate the empirical support of the general laws established by economic theory.”
 It is a special type of economic analysis and research.
- Starting from economic theory, we express the relationships in mathematical terms so that they can be measured by econometric methods in order to obtain numerical estimates of the coefficients of the economic relationships.
1.2 Why a Separate Discipline?
Econometrics integrates and applies economic theory, mathematical economics, economic statistics and mathematical statistics to provide numerical values for the parameters of economic relationships and to verify economic theories.
 Economic theory makes statements or hypotheses that are mostly qualitative in nature. But:
 It does not provide a numerical measure of the relationship between economic variables.
 It does not tell by how much the quantity of one variable will go up or down as a result of a certain change in the other variable.
 Therefore, it is the job of the econometrician to provide such numerical estimates. Econometrics gives empirical content to most economic theory.
Mathematical economics - expresses economic theory in mathematical form (equations) without
regard to measurability or empirical verification of the theory.
There is no essential difference between mathematical economics and economic theory: economic theory uses verbal exposition while mathematical economics uses symbols.
Both express economic relationships in an exact or deterministic form. They (mathematical economics and economic theory) do not allow for random elements which might affect the relationship and make it stochastic.

Furthermore, they do not provide numerical values for the coefficients of economic
relationships.
 However, econometrics method does not assume exact or deterministic relationship. It
assumes random disturbance variable in relationships among economic variables which
relate deviations from exact behavioral patterns suggested by economic theory and
mathematical economics. Furthermore, econometric methods provide numerical values
of the coefficients of economic relationships.
Economic statistics is mainly concerned with collecting, processing, and presenting economic data in the form of charts and tables, and with describing the pattern of economic data over time. The data thus collected constitute the raw data for econometric work. But the economic statistician does not go any further to test economic theories; the econometrician, however, does.
Mathematical (or inferential) statistics- deals with the method of measurement which is
developed on the basis of controlled experiments. But statistical methods of measurement are
not appropriate for a number of economic relationships because for most economic relationships
controlled or carefully planned experiments cannot be designed due to the fact that the nature of
relationships among economic variables are stochastic or random. Yet the fundamental ideas of
inferential statistics are applicable in econometrics, but they must be adapted to the problem of
economic life. Econometric methods are adjusted so that they may become appropriate for the
measurement of economic relationships which are stochastic. The adjustment consists primarily
in specifying the stochastic (random) elements that are supposed to operate in the real world and
enter into the determination of the observed data.
1.4 Economic models vs econometric models
i) Economic models:
- The simplified analytical framework of economic theory is called an economic model. It is an organized set of relationships that describes the functioning of an economic entity under a set of simplifying assumptions. All economic reasoning is ultimately based on models.
Economic models consist of the following three basic structural elements.
1. A set of variables
2. A list of fundamental relationships and
3. A number of strategic coefficients
ii) Econometric models:
The most important characteristic of econometric relationships is that they contain a random
element which is ignored by mathematical economic models.
Example: Microeconomic theory postulates that the demand for a commodity depends on its
price, on the prices of other related commodities, on consumers’ income and on tastes. This is
an exact relationship which can be written mathematically as:
Q  b 0  b1 P  b 2 P0  b 3 Y  b 4 t → exact relationship.
However, many more factors may affect demand for the commodity. In econometrics the
influence of these ‘other’ factors is taken into account by the introduction of random variable
into the economic relationships.

In our example, the demand function studied with the tools of econometrics would be of the
stochastic form:
Q  b0  b1 P  b2 P0  b3Y  b4 t  u , where u stands for the random factors which affect
the quantity demanded.

1.5 Methodology of econometrics


Econometric research is concerned with the measurement of the parameters of economic relationships and with the prediction of the values of economic variables. Some variables are postulated as causes of the variation of other variables.
Based on the postulated theoretical relationships among economic variables, econometric
research or inquiry proceeds along the following stages:
1. Specification of the model
2. Estimation of the model
3. Evaluation of the estimates
4. Evaluation of the forecasting power of the estimated model
1. Specification of the model
 The relationships between economic variables must be expressed in mathematical form.
 This step involves the determination of:
i) the dependent and independent (explanatory) variables in the model.
ii) the a priori theoretical expectations about the size and sign of the parameters.
iii) the mathematical form of the model
 It should be based on economic theory and information related to the phenomena.
 Thus, it presupposes knowledge of economic theory and familiarity with the particular phenomenon being studied.
 This is the most important and the most difficult stage of any econometric research.
 In this stage there is a high likelihood of committing errors by incorrectly specifying the model.
 Some of the common reasons for incorrect specification of the econometric models are:
1. Imperfections, looseness of statements in economic theories.
2. Limitation of knowledge of the factors which are operative in any particular case.
3. Formidable obstacles presented by data requirements in the estimation of large models.
 The most common errors of specification are:
a. Omissions of some important variables from the function.
b. The omissions of some equations (for example, in simultaneous equations model).
c. Incorrect mathematical form of the functions.
2. Estimation of the model
This is purely a technical stage which requires knowledge of the various econometric methods,
their assumptions and the economic implications for the estimates of the parameters. This stage
includes the following activities.
i. Gathering of the data on the variables included in the model.
ii. Examination of the identification conditions of the function (especially for
simultaneous equations models).
iii. Examination of the aggregations problems involved in the variables of the function.

iv. Examination of the degree of correlation between the explanatory variables (i.e.
examination of the problem of multicollinearity).
v. Choice of appropriate econometric techniques for estimation, i.e. to decide a specific
econometric method to be applied in estimation; such as, OLS, MLM, Logit, and
Probit.
3. Evaluation of the estimates
 This stage consists of deciding whether the estimates of the parameters are theoretically meaningful, statistically satisfactory, and reliable.
For this purpose we use three groups of criteria which may be classified into:
i. Economic a priori criteria: - determined by economic theory and refer to the size and
sign of the parameters of economic relationships.
ii. Statistical criteria (first-order tests): - determined by statistical theory which focuses
on the evaluation of the statistical reliability of the estimates of the parameters of the
model. Correlation coefficient test, standard error test, t-test, F-test, and R2-test are
some of the most commonly used statistical tests.
iii. Econometric criteria (second-order tests):-
 Set by the theory of econometrics investigating whether the assumptions of
the econometric method employed are satisfied or not.
 They serve as second-order tests (of the statistical tests), i.e. they determine the reliability of the statistical criteria;
 Help us to establish whether the estimates have the desirable properties
(unbiasedness, efficiency and consistency).
 Econometric criteria aim at the detection of the violation or validity of the
assumptions of the various econometric techniques.
4) Evaluation of the forecasting power of the model:
- Forecasting is one of the aims of econometric research. It is possible that the model
may be economically meaningful and statistically and econometrically correct for the
sample period for which the model has been estimated; however, it may not be
suitable for forecasting due to various factors (reasons).
- Therefore, this stage involves the investigation of the stability of the estimates and
their sensitivity to changes in the size of the sample.
1.6) Desirable properties of an econometric model
An econometric model is a model whose parameters are estimated with some appropriate econometric technique.
The ‘goodness’ of the model is judged according to the following desirable properties.
1. Theoretical plausibility:- compatible with the postulates of economic theory.
- describe the economic phenomena to which it relates.
2. Explanatory ability:- should be able to explain the observations of the actual world.
- must be consistent with the observed behavior of the economic
variables whose relationship it determines.
3. Accuracy of the estimates:- coefficients should be accurate, approximate to the true
parameters of the structural model.
- should possess the desirable properties of unbiasedness,
consistency and efficiency.

4. Forecasting ability:- the model should be able to predict future values of the dependent
(endogenous) variables.
5. Simplicity:- The model should represent the economic relationships with as few equations and as simple a mathematical form as possible, without sacrificing the other desirable properties.
1.7) Goals of Econometrics
 Three main goals of Econometrics are:
i) Analysis - testing economic theory
ii) Policy making - obtaining numerical estimates of the coefficients of economic
relationships for policy simulations.
iii) Forecasting - using the numerical estimates of the coefficients in order to
forecast the future values of economic magnitudes.
1.8) The Sources, Types and Nature of Data
The success of any econometric analysis ultimately depends on the availability of the
appropriate data. It is therefore essential that we spend some time discussing the nature,
sources, and limitations of the data that one may encounter in empirical analysis.
Types of Data
Three types of data may be available for empirical analysis.
1) Time Series Data
- is a set of observations or values that a variable takes at different times.
- may be collected at regular time intervals (daily, weekly, monthly, quarterly,
annually).
- most empirical work based on time series data assumes stationarity (that the series' mean and variance do not vary systematically over time).
2) Cross-Section Data
- data on one or more variables collected at the same point in time, such as the census
of population conducted by the Census Bureau every 10 years.
- Just as time series data create their own special problems (stationarity issue), cross-
sectional data have their own problems (the problem of heterogeneity).
3) Pooled Data:-
- Pooled, or combined, data contain elements of both time series and cross-section data.
- Panel, Longitudinal, or Micro panel Data- is a special type of pooled data in which the same
cross-sectional unit (say, a family or a firm) is surveyed over time.

Chapter Two
2.1 THE CLASSICAL REGRESSION ANALYSIS
[The Simple Linear Regression Model]
 Economic theories are mainly concerned with the relationships among economic variables.
 When phrased in mathematical terms, they can predict the effect of one variable on another.

 The functional relationships of these variables define the dependence of one variable upon
the other variable (s) in the specific form.
The specific functional forms may be linear, quadratic, logarithmic, exponential,
hyperbolic, or any other form.
In this chapter we shall consider a simple linear regression model, i.e. a relationship between two
variables related in a linear form. We shall first discuss two important forms of relation:
stochastic and non-stochastic, among which we shall be using the former in econometric analysis.
2.1. Stochastic and Non-stochastic Relationships
A relationship between X and Y, as Y = f(X) is said to be:
 Deterministic - if for each value of the independent variable (X) there is one and only one
corresponding value of dependent variable (Y).
 Stochastic - if for a particular value of X there is a whole probabilistic distribution of values
of Y. In such a case, for any given value of X, the dependent variable Y assumes some
specific value only with some probability.
Let's illustrate the distinction between stochastic and non-stochastic relationships with the help of a supply function.
Assuming that the supply for a certain commodity depends on its price (other determinants taken to be constant) and the function being linear, the relationship can be put as:

Q = f(P) = α + βP ……………………………………………… (2.1)
 From the above relationship for a particular value of P, there is only one corresponding
value of Q. Therefore, it is a deterministic (non-stochastic) relationship since for each
price there is always only one corresponding quantity supplied.
 This implies that all the variation in Y is due solely to changes in X, and that there are no
other factors affecting the dependent variable.
 If plotted on a two-dimensional plane, the observations would all fall on a straight line. However, if we gather observations on the quantity actually supplied in the market at various prices and we plot them on a diagram, we see that they do not fall on a straight line.
 The deviation of the observations from the line may be attributed to the following factors:
i) Omission of important variables from the function
ii) Random behavior of human beings
iii) Imperfect specification of the model
iv) Error of aggregation
v) Error of measurement
 To take into account the above sources of error we introduce a random variable (error term, random disturbance, or stochastic term), denoted by the letter ‘u’ or ‘ε’, into the econometric function. It is so called because u is supposed to ‘disturb’ the exact linear relationship which exists between X and Y.
 By introducing this random variable into the function, the model is rendered stochastic, of the form:

Y_i = α + βX_i + u_i ……………………………………………………….(2.2)
 Thus a stochastic model is a model in which the dependent variable is not only
determined by the explanatory variable(s) included in the model but also by others which
are not included in the model.
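As a concrete illustration, the following is a minimal simulation sketch (assuming numpy is available; the parameter values and the error scale are purely hypothetical) showing that the observed points of a stochastic model scatter around the exact line by precisely the u's:

import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 10.0, 2.0                    # hypothetical true parameters

P = np.linspace(1.0, 10.0, 50)             # prices at which supply is observed
Q_exact = alpha + beta * P                 # deterministic part: the exact line
u = rng.normal(0.0, 1.5, size=P.size)      # random disturbance with mean zero
Q_obs = Q_exact + u                        # stochastic model: Q = alpha + beta*P + u

print(np.allclose(Q_obs - Q_exact, u))     # True: deviations from the line are the u's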
2.2. Simple Linear Regression model.
The above stochastic relationship (2.2) with one explanatory variable is called simple linear
regression model.
The true relationship which connects the variables involved is split into two parts:
- a part represented by a line and
- a part represented by the random term ‘u’.
 The scatter of observations represents the true relationship between Y and X. The line
represents the exact part of the relationship and the deviation of the observation from the
line represents the random component of the relationship.
- Were it not for the errors in the model, we would observe all the points on the line.
However because of the random disturbance, we observe points deviating from the line.
These points diverge from the regression line by u1 , u 2 ,...., u n .
Y_i = (α + βX_i) + u_i
where Y_i is the dependent variable, (α + βX_i) is the regression line, and u_i is the random variable.

- The first component in the bracket is the part of Y explained by the changes in X and the
second is the part of Y not explained by X, that is to say the change in Y is due to the
random influence of u i .
2.2.1 Assumptions of the Classical Linear Stochastic Regression Model
The classical econometricians made important assumptions in their analysis of regression. The most important of these assumptions are discussed below.
1. The model is linear in parameters.
- They assumed that the model should be linear in the parameters regardless of whether the
explanatory and the dependent variables are linear or not.
- because if the parameters are non-linear it is difficult to estimate them, since their values are not known and we are only given data on the dependent and independent variables.
2. U_i is a random real variable
- The value which u may assume in any one period depends on chance; it may be positive, negative, or zero.
3. The mean value of the random variable (U) is zero
- For each value of X, the random variable (u) may assume various values, some greater than zero and some smaller than zero.
- If we consider all the positive and negative values of u for any given value of X, their average value is equal to zero; in other words, the positive and negative values of u cancel each other.
Mathematically, E(U_i) = 0 ………………………………..….(2.3)
4. The variance of the random variable (U) is constant in each period (the assumption of homoscedasticity)
Mathematically:
Var(U_i) = E[U_i − E(U_i)]² = E(U_i²) = σ²   (since E(U_i) = 0).
This constant variance is called the homoscedasticity assumption, and the constant variance itself is called homoscedastic variance.
5. The random variable (U) has a normal distribution
- The values of u (for each X) have a bell-shaped symmetrical distribution about their zero mean and constant variance σ², i.e.
U_i ~ N(0, σ²) ………………………………………..……(2.4)
6. The random terms of different observations (U_i, U_j) are independent (the assumption of no autocorrelation)
- This means the value which the random term assumed in one period does not depend on the value which it assumed in any other period.
Algebraically,
Cov(u_i, u_j) = E[(u_i − E(u_i))(u_j − E(u_j))] = E(u_iu_j) = 0 …………………………..….(2.5)
7. The X_i are a set of fixed values in the hypothetical process of repeated sampling which
underlies the linear regression model.
- This means that, in taking a large number of samples on Y and X, the X_i values are the
same in all samples, but the u i values do differ from sample to sample, and so of course do
the values of y i .
8. The random variable (U) is independent of the explanatory variables.
- This means there is no correlation between the random variable and the explanatory
variable.
- If two variables are unrelated their covariance is zero.
Hence Cov(X_i, U_i) = 0 ………………………………………..….(2.6)
9. The explanatory variables are measured without error
- Random variable (u) absorbs the influence of omitted variables and possibly errors of
measurement in the y’s. i.e., we will assume that the regressors are error free, while y
values may or may not include errors of measurement.
Dear students! We can now use the above assumptions to derive the following basic concepts.
A. The dependent variable Y_i is normally distributed, i.e.

Y_i ~ N(α + βX_i, σ²) ………………………………(2.7)

 The shape of the distribution of Y_i is determined by the shape of the distribution of u_i, which is normal by assumption 5.
 Since α and β are constants, they do not affect the distribution of Y_i.
 Furthermore, the values of the explanatory variable, X_i, are a set of fixed values by assumption 7 and therefore do not affect the shape of the distribution of Y_i.
⇒ Y_i ~ N(α + βX_i, σ²)
B. Successive values of the dependent variable are independent, i.e. Cov(Y_i, Y_j) = 0
2.2.2 Methods of estimation
 Specifying the model based on the assumptions is the first stage of any econometric application.
 The next step is the estimation of the numerical values of the parameters of economic relationships.
 The parameters of the regression model can be estimated by various methods.
Three of the most commonly used methods are:
1. Ordinary least square method (OLS)
2. Maximum likelihood method (MLM)
3. Method of moments (MM)
2.2.2.1 The ordinary least square (OLS) method
 The model Y_i = α + βX_i + U_i is called the true relationship between Y and X because Y and X represent their respective population values, and
 α and β are the true parameters since they are estimated from the population values of Y and X.
 But it is difficult to obtain the population values of Y and X because of technical or economic reasons.
- So we take sample values of Y and X. The parameters estimated from the sample values of Y and X are called the estimators of the true parameters α and β, and are symbolized as α̂ and β̂.
- The model Y_i = α̂ + β̂X_i + e_i is called the estimated relationship between Y and X since α̂ and β̂ are estimated from the sample of Y and X, and
- e_i represents the sample counterpart of the population random disturbance U_i.
 Estimation of α and β by the least squares method (OLS) involves finding values for the estimators α̂ and β̂ which will minimize the sum of the squared residuals (Σe_i²).

From the estimated relationship Y_i = α̂ + β̂X_i + e_i, we obtain:

e_i = Y_i − (α̂ + β̂X_i) ……………………………(2.6)

Σe_i² = Σ(Y_i − α̂ − β̂X_i)² ……………………….(2.7)

To find the values of α̂ and β̂ that minimize this sum, we have to partially differentiate Σe_i² with respect to α̂ and β̂ and set the partial derivatives equal to zero.

1. ∂(Σe_i²)/∂α̂ = −2Σ(Y_i − α̂ − β̂X_i) = 0 .......................................................(2.8)

Rearranging this expression we get: ΣY_i = nα̂ + β̂ΣX_i ……….(2.9)

If we divide (2.9) by n and rearrange, we get:

α̂ = Ȳ − β̂X̄ ..........................................................................(2.10)

2. ∂(Σe_i²)/∂β̂ = −2ΣX_i(Y_i − α̂ − β̂X_i) = 0 ..................................................(2.11)


Note: the term in the parentheses in equations (2.8) and (2.11) is the residual, e_i = Y_i − α̂ − β̂X_i. Hence it is possible to rewrite (2.8) and (2.11) as −2Σe_i = 0 and −2ΣX_ie_i = 0.
It follows that:

Σe_i = 0 and ΣX_ie_i = 0 ............................................(2.12)

If we rearrange equation (2.11) we obtain:

ΣY_iX_i = α̂ΣX_i + β̂ΣX_i² ……………………………………….(2.13)

Equations (2.9) and (2.13) are called the Normal Equations. Substituting the value of α̂ from (2.10) into (2.13), we get:

ΣY_iX_i = (Ȳ − β̂X̄)ΣX_i + β̂ΣX_i²
        = ȲΣX_i − β̂X̄ΣX_i + β̂ΣX_i²

ΣY_iX_i − ȲΣX_i = β̂(ΣX_i² − X̄ΣX_i)
ΣXY − nX̄Ȳ = β̂(ΣX_i² − nX̄²)

β̂ = (ΣXY − nX̄Ȳ) / (ΣX_i² − nX̄²) ………………….(2.14)

Equation (2.14) can be rewritten in a somewhat different way as follows:

Σ(X − X̄)(Y − Ȳ) = Σ(XY − X̄Y − XȲ + X̄Ȳ)
                = ΣXY − ȲΣX − X̄ΣY + nX̄Ȳ
                = ΣXY − nȲX̄ − nX̄Ȳ + nX̄Ȳ

Σ(X − X̄)(Y − Ȳ) = ΣXY − nX̄Ȳ ……………………(2.15)
Σ(X − X̄)² = ΣX² − nX̄² ………………………………(2.16)

Substituting (2.15) and (2.16) in (2.14), we get:

β̂ = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)²

Now, denoting (X_i − X̄) as x_i and (Y_i − Ȳ) as y_i, we get:

β̂ = Σx_iy_i / Σx_i² ……………………………………… (2.17)

The expression in (2.17) is termed the deviation form.
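As an illustration, formulas (2.10) and (2.17) translate directly into code. A minimal sketch (assuming numpy; the data in the example call are hypothetical):

import numpy as np

def ols_simple(X, Y):
    """OLS estimates of alpha and beta for Y = alpha + beta*X + u."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    x = X - X.mean()                              # deviations x_i = X_i - Xbar
    y = Y - Y.mean()                              # deviations y_i = Y_i - Ybar
    beta_hat = (x * y).sum() / (x ** 2).sum()     # equation (2.17)
    alpha_hat = Y.mean() - beta_hat * X.mean()    # equation (2.10)
    return alpha_hat, beta_hat

print(ols_simple([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]))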
2.2.2.2 Estimation of a function with zero intercept

Suppose: Y_i = α + βX_i + U_i, subject to the restriction α = 0.

To estimate β̂, we put this in the form of a restricted minimization problem and then apply the Lagrange method.

We minimize: Σe_i² = Σ(Y_i − α̂ − β̂X_i)²

subject to: α̂ = 0

The composite function then becomes:

Z = Σ(Y_i − α̂ − β̂X_i)² − λα̂, where λ is a Lagrange multiplier.

We minimize the function with respect to α̂, β̂, and λ:

∂Z/∂α̂ = −2Σ(Y_i − α̂ − β̂X_i) − λ = 0 ………………(i)
∂Z/∂β̂ = −2Σ(Y_i − α̂ − β̂X_i)(X_i) = 0 ………………(ii)
∂Z/∂λ = −α̂ = 0 ………………………………………(iii)

Substituting (iii) in (ii) and rearranging we obtain:

ΣX_i(Y_i − β̂X_i) = 0
ΣY_iX_i − β̂ΣX_i² = 0

β̂ = ΣX_iY_i / ΣX_i² ……………………………………..(2.18)

This formula is in terms of actual observations, not deviations.
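The restricted estimator (2.18) is equally direct; a one-function sketch (again assuming numpy):

import numpy as np

def ols_through_origin(X, Y):
    """Restricted OLS slope for Y = beta*X + u (intercept forced to zero)."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    return (X * Y).sum() / (X ** 2).sum()    # equation (2.18), actual observations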

2.2.2.3. Statistical Properties of Least Square Estimators


An estimate must be close to the value of the true population parameters (small range of
variation around the true parameter). How are we to choose among the different econometric
methods, the one that gives ‘good’ estimates? We need some criteria for judging the ‘goodness’
of an estimator. ‘Closeness’ of the estimate to the population parameter is measured by the mean
and variance or standard deviation of the sampling distribution of the estimates of the different
econometric methods. We assume the usual process of repeated sampling i.e. we assume that we
get a very large number of samples, each of size ‘n’; we compute the estimates β̂ from each sample and for each econometric method, and we form their distributions.
mean (expected value) and the variances of these distributions and we choose among the
alternative estimates the one whose distribution is concentrated as close as possible around the
population parameter.
Properties of OLS Estimators
The optimum properties of the OLS estimators may be summarized by the well-known Gauss-Markov Theorem.
It states that “the least squares estimators are linear, unbiased and have minimum variance” (the BLUE property).
An estimator is called BLUE if:
a. Linear:- a linear function of the random variable, such as, the dependent variable Y.
b. Unbiased:- its average or expected value is equal to the true population parameter.
c. Minimum variance:- It has a minimum variance in the class of linear and unbiased
estimators. An unbiased estimator with the least variance is known as an efficient
estimator.
According to the Gauss-Markov theorem, the OLS estimators possess all the BLUE properties.
The detailed proof of these properties are presented below
a. Linearity (for β̂)
Proposition: α̂ and β̂ are linear in Y.
b. Unbiasedness:
Proposition: α̂ and β̂ are the unbiased estimators of the true parameters α and β.
Recall that if θ̂ is an estimator of θ, then E(θ̂) − θ = the amount of bias, and if θ̂ is an unbiased estimator of θ then bias = 0, i.e. E(θ̂) − θ = 0 ⇒ E(θ̂) = θ.
In our case, α̂ and β̂ are estimators of the true parameters α and β. To show that they are unbiased estimators of their respective parameters means showing that E(α̂) = α and E(β̂) = β.
c. Minimum variance of α̂ and β̂
Now, we have to establish that, out of the class of linear and unbiased estimators of α and β, α̂ and β̂ possess the smallest sampling variances. For this, we shall first obtain the variances of α̂ and β̂ and then establish that each has the minimum variance in comparison with the variances of other linear and unbiased estimators obtained by any econometric method other than OLS.
We have computed the variances of the OLS estimators.
Now, we check whether these variances of the OLS estimators possess the minimum variance property compared to the variances of other estimators of the true α and β, other than α̂ and β̂.
To establish that α̂ and β̂ possess minimum variance, we compare their variances with the variances of some other alternative linear and unbiased estimators of α and β, say α* and β*. We want to prove that any other linear and unbiased estimator of the true population parameter, obtained from any other econometric method, has a larger variance than the OLS estimators.
Let's first show the minimum variance of β̂ and then that of α̂.
The variance of the random variable (U_i)
 The variances of the OLS estimators involve σ², the population variance of the random disturbance term.
 But it is difficult to obtain population data on the disturbance term because of technical and economic reasons.
 Hence, it is difficult to compute σ²; this implies that the variances of the OLS estimates are also difficult to compute.
 But we can compute these variances if we take the unbiased estimator of σ², which is σ̂², computed from the sample values of the disturbance term e_i from the expression:

σ̂_u² = Σe_i² / (n − 2) …………………………………..(2.30)

To use σ̂² in the expressions for the variances of α̂ and β̂, we have to prove that σ̂² is an unbiased estimator of σ², i.e. that:

E(σ̂²) = E[Σe_i² / (n − 2)] = σ²
2.2.2.4. Statistical Tests of Significance of the OLS Estimators (First-Order Tests)
After estimation of the parameters, we need to know how ‘good’ is the fit of the regression line
to the sample observation of Y and X. That is, we need to measure the dispersion of observations
around the regression line.
The closer the observations to the line, the better the goodness of fit, i.e. the better the explanation of the variations of Y by the changes in the explanatory variables.
We divide the available criteria into three groups: the theoretical a priori criteria, the statistical
criteria, and the econometric criteria. Under this section, our focus is on statistical criteria (first
order tests). The two most commonly used first order tests in econometric analysis are:
1) The coefficient of determination (R2):- is used for judging the explanatory power of the
independent variable(s).
2) The standard error tests of the estimators: is used for judging the statistical reliability of
the estimates of the regression coefficients.
1. The Coefficient of Determination r2: A Measure of “Goodness of Fit”
 r2 is a measure that tells how well the sample regression line fits the data.

 It measures the proportion or percentage of the total variation in Y explained by the
regression model.
 If all the observations lie on the regression line, we obtain a “perfect” fit; but this is rarely the case, and in general there will be some positive e_i and some negative e_i.
 By fitting the line Ŷ = β̂_0 + β̂_1X we try to obtain the explanation of the variation of the dependent variable Y produced by the changes in the explanatory variable X.

Σy_i² = Σŷ_i² + Σe_i² ………………………………... (2.47)
(total variation) = (explained variation) + (unexplained variation)

OR,

Total sum of squares (TSS) = Explained sum of squares (ESS) + Residual sum of squares (RSS), i.e.

TSS = ESS + RSS … this shows that the total variation in the observed Y values about their mean value can be partitioned into two parts:
i. ESS = the part attributable to the regression line, and
ii. RSS = the part due to random forces, because not all actual Y observations lie on the fitted line.

Now dividing the above equation by TSS on both sides, we obtain:

1 = ESS/TSS + RSS/TSS

1 = Σ(Ŷ_i − Ȳ)²/Σ(Y_i − Ȳ)² + Σe_i²/Σ(Y_i − Ȳ)²

We now define r² as:

r² = Σ(Ŷ_i − Ȳ)²/Σ(Y_i − Ȳ)² = ESS/TSS,  or  r² = 1 − Σe_i²/Σ(Y_i − Ȳ)² = 1 − RSS/TSS

Properties of r²
i. It is a non-negative quantity.
ii. Its limits are 0 ≤ r² ≤ 1.
iii. r² = 1 means a perfect fit, that is, Ŷ_i = Y_i for each i.
iv. On the other hand, r² = 0 means there is no relationship between the regressand and the regressor.
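A minimal sketch (assuming numpy) of the TSS = ESS + RSS decomposition and of r² as defined above:

import numpy as np

def r_squared(Y, Y_hat):
    """r^2 = 1 - RSS/TSS from actual values Y and fitted values Y_hat."""
    Y, Y_hat = np.asarray(Y, float), np.asarray(Y_hat, float)
    rss = ((Y - Y_hat) ** 2).sum()         # residual (unexplained) sum of squares
    tss = ((Y - Y.mean()) ** 2).sum()      # total sum of squares
    return 1 - rss / tss                   # equivalently ESS/TSS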
2.6 Confidence Intervals and Hypothesis Testing
To test the significance of the OLS estimators we need:
 Variance of the parameter estimators
 Unbiased estimator of σ²
 The assumption of normal distribution of error term:
The OLS estimators α̂ and β̂ are obtained from a sample of observations on Y and X. Since sampling errors are inevitable in all estimates, it is necessary to apply tests of significance in order to measure the size of the error, determine the degree of confidence, and assess the validity of these estimates. This can be done by using various tests. The most common ones are:
i) Standard error test
ii) Student’s t-test
iii) Confidence interval
All of these testing procedures reach the same conclusion. Let us now see these testing methods one by one.
i) Standard error (s.e.) test
 This test is used to decide whether the estimators α̂ and β̂ are significantly different from zero, i.e. whether the sample from which they have been estimated might have come from a population whose true parameters are zero (α = 0 and/or β = 0).
Formally we test the null hypothesis
H_0: β_i = 0 against the
alternative hypothesis H_1: β_i ≠ 0
The standard error test may be outlined as follows.
1st: Compute the standard errors of the parameters:
SE(β̂) = √var(β̂)
SE(α̂) = √var(α̂)
2nd: Compare the standard errors with the numerical values of α̂ and β̂.
Decision rule:
 If SE(β̂_i) > ½β̂_i, accept H_0 and reject H_1.
 We conclude that β̂_i is statistically insignificant.
 If SE(β̂_i) < ½β̂_i, reject H_0 and accept H_1.
 We conclude that β̂_i is statistically significant.
The acceptance or rejection of the null hypothesis has a definite economic meaning.
 Acceptance of H_0: β = 0 (the slope parameter is zero) implies that:
- the explanatory variable to which this estimate relates does not influence the dependent variable Y;
- there is evidence that changes in X leave Y unaffected;
- Y = α + (0)X = α, i.e. there is no relationship between X and Y.
Numerical example:
Suppose that from a sample of size n = 30 we estimate the following supply function:
Q = 120 + 0.6P + e_i
SE:  (1.7)  (0.025)
Test the significance of the slope parameter at the 5% level of significance using the s.e. test.
SE(β̂) = 0.025, β̂ = 0.6, ½β̂ = 0.3
⇒ SE(β̂) < ½β̂
 The implication is that β̂ is statistically significant at the 5% level of significance.
Note: The SE test is an approximate test (derived from the z-test and the t-test) and implies a two-tail test conducted at the 5% level of significance.
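The decision rule is simple enough to encode; a tiny sketch (plain Python, function name ours):

def se_test(estimate, se):
    """Standard error test: significant if SE is less than half the estimate."""
    if se < abs(estimate) / 2:
        return "statistically significant"
    return "statistically insignificant"

print(se_test(0.6, 0.025))   # the supply-function example above: significant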

ii) Student’s t-test


 Important to test the significance of the parameters.
 a test of significance is a procedure by which sample results are used to verify the truth or
falsity of a null hypothesis.
 a statistic is said to be statistically significant if the value of the test statistic lies in the
critical region. In this case the null hypothesis is rejected.
 By the same token, a test is said to be statistically insignificant if the value of the test
statistic lies in the acceptance region.
From statistics, any variable X can be transformed into t using the general formula:

t = (X̄ − μ)/s_x, with n − 1 degrees of freedom,

where μ = the value of the population mean,
s_x = the sample estimate of the population standard deviation, s_x = √[Σ(X − X̄)²/(n − 1)], and
n = the sample size.

Recall that under the normality assumption we can derive the t-values of the OLS estimates:

t_α̂ = (α̂ − α)/SE(α̂)
t_β̂ = (β̂ − β)/SE(β̂)

with n − k degrees of freedom, where SE = the standard error and k = the number of parameters estimated in the model; for the two-parameter model the relevant t distribution has n − 2 degrees of freedom.

And since this test statistic follows the t distribution, the confidence-interval statement can be constructed as follows:

Pr[−t_{α/2} ≤ (β̂ − β*)/SE(β̂) ≤ t_{α/2}] = 1 − α

where β* is the value of β under H_0 and where −t_{α/2} and t_{α/2} are the values of t (the critical t values) obtained from the t table.
Rearranging the above equation gives:

Pr[β̂ − t_{α/2}·SE(β̂) ≤ β ≤ β̂ + t_{α/2}·SE(β̂)] = 1 − α

which gives the interval in which β will fall with probability 1 − α.

 The 100(1 − α)% interval established in the above equation is known as the region of acceptance of H_0, and
 the region(s) outside this interval is (are) called the region(s) of rejection (of H_0), or the critical region(s).
 The confidence limits, the endpoints of the confidence interval, are also called critical values.
Since we have two parameters in simple linear regression with an intercept different from zero, our degrees of freedom are n − 2.
Formal testing procedure of the hypothesis:
Step 1: State the null and alternative hypotheses:
H_0: β_i = 0 for the slope parameter, against H_1: β_i ≠ 0
and
H_0: α = 0 for the intercept, against H_1: α ≠ 0
Step 2: Compute the test statistic (t*), by taking the value of β in the null hypothesis:
t* = (β̂ − 0)/SE(β̂) = β̂/SE(β̂)
Step 3: Choose the level of significance α.
- α is the probability of making a ‘wrong’ decision, i.e.
- the probability of rejecting the hypothesis when it is actually true, or the probability of committing a Type I error.
Step 4: Check whether the test is a one-tail or a two-tail test.
- If the inequality sign in the alternative hypothesis is ≠, then it implies a two-tail test; divide the chosen level of significance by two and find the critical region or critical value of t, called t_c.
- If the sign is either > or <, it is a one-tail test, and there is no need to divide the chosen level of significance by two to obtain the critical value from the t table.
Example:
If we have H_0: β_i = 0
against H_1: β_i ≠ 0,
then this is a two-tail test. If the level of significance is 5%, divide it by two to obtain the critical value of t from the t table; that is, obtain t_c at α/2 with n − 2 degrees of freedom for the two-tail test.
Step 5: Compare t* (the computed value of t) with t_c (the critical value of t):
 If |t*| > t_c, reject H_0 and accept H_1. Conclusion: β̂ is statistically significant.
 If |t*| < t_c, accept H_0 and reject H_1. Conclusion: β̂ is statistically insignificant.
Numerical Example:
Suppose that from a sample of size n = 20 we estimate the following consumption function:
C = 100 + 0.70Y + e
     (75.5)  (0.21)
The values in the brackets are standard errors. We want to test the null hypothesis H_0: β_i = 0 against H_1: β_i ≠ 0, using the t-test at the 5% level of significance.
a. The t-value for the test statistic is:
t* = (β̂ − 0)/SE(β̂) = β̂/SE(β̂) = 0.70/0.21 ≈ 3.3
b. Since the alternative hypothesis (H_1) is stated with an inequality sign (≠), it is a two-tail test; hence we take α/2 = 0.05/2 = 0.025 and obtain the critical value of t at α/2 = 0.025 and 18 degrees of freedom (df), i.e. n − 2 = 20 − 2. From the t table, t_c at the 0.025 level of significance and 18 df is 2.10.
c. Since t* = 3.3 and t_c = 2.1, t* > t_c. This implies that β̂ is statistically significant.
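The same steps, as a sketch that assumes scipy is available for the critical value instead of reading the t table:

from scipy import stats

beta_hat, se, n, alpha = 0.70, 0.21, 20, 0.05
t_star = beta_hat / se                           # test statistic under H0: beta = 0
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)    # two-tail critical value at 18 df (about 2.10)
print(t_star, t_crit, abs(t_star) > t_crit)      # 3.33..., 2.10..., True -> reject H0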
iii) Confidence interval
a) Confidence Intervals for Regression Coefficients
Rejection of the null hypothesis doesn't mean that our estimates α̂ and β̂ are the correct estimates of the true population parameters α and β. It simply means that our estimate comes from a sample drawn from a population whose parameter β is different from zero.
In order to define how close the estimate is to the true parameter, we must construct a confidence interval for the true parameter; in other words, we must establish limiting values around the estimate within which the true parameter is expected to lie with a certain “degree of confidence”. In this respect we say that with a given probability the population parameter will be within the defined confidence interval (confidence limits).
We choose a probability in advance and refer to it as confidence level (interval coefficient). It is
customarily in econometrics to choose the 95% confidence level. This means that in repeated
sampling the confidence limits, computed from the sample, would include the true population
parameter in 95% of the cases. In the other 5% of the cases the population parameter will fall
outside the confidence interval.
In a two-tail test at the α level of significance, the probability of obtaining the specific t-value, either −t_c or t_c, is α/2 at n − 2 degrees of freedom. The probability of obtaining any value of t equal to (β̂ − β)/SE(β̂) which lies between −t_c and t_c at n − 2 degrees of freedom is 1 − (α/2 + α/2), i.e. 1 − α.

i.e. Pr(−t_c < t* < t_c) = 1 − α …………………………………………(2.57)

but t* = (β̂ − β)/SE(β̂) …………………………………………………….(2.58)

Substituting (2.58) in (2.57) we obtain the following expression:

Pr[−t_c < (β̂ − β)/SE(β̂) < t_c] = 1 − α ………………………………………..(2.59)

Pr[−SE(β̂)t_c < β̂ − β < SE(β̂)t_c] = 1 − α …… (multiplying by SE(β̂))
Pr[−β̂ − SE(β̂)t_c < −β < −β̂ + SE(β̂)t_c] = 1 − α …… (subtracting β̂)
Pr[β̂ + SE(β̂)t_c > β > β̂ − SE(β̂)t_c] = 1 − α …… (multiplying by −1)
Pr[β̂ − SE(β̂)t_c < β < β̂ + SE(β̂)t_c] = 1 − α …… (interchanging)

The limits within which the true β lies at the (1 − α) degree of confidence are:

[β̂ − SE(β̂)t_c, β̂ + SE(β̂)t_c]; where t_c is the critical value of t at the α/2 level of significance and n − 2 degrees of freedom. This is a 100(1 − α)% confidence interval for β.
The test procedure is outlined as follows.
H_0: β = 0
H_1: β ≠ 0
Decision rule: If the hypothesized value of β in the null hypothesis is within the confidence interval, accept H_0 and reject H_1. The implication is that β̂ is statistically insignificant. If the hypothesized value of β in the null hypothesis is outside the limits, reject H_0 and accept H_1. This indicates that β̂ is statistically significant.

Numerical Example:
Suppose we have estimated the following regression line from a sample of 20 observations:
Y = 128.5 + 2.88X + e
     (38.2)   (0.85)
The values in the brackets are standard errors.
a. Construct a 95% confidence interval for the slope parameter.
b. Test the significance of the slope parameter using the constructed confidence interval.
Solution:
a. The limits within which the true β lies at the 95% confidence interval are: β̂ ± SE(β̂)t_c
β̂ = 2.88, SE(β̂) = 0.85, and t_c at the 0.025 level of significance and 18 degrees of freedom is 2.10.
⇒ β̂ ± SE(β̂)t_c = 2.88 ± 2.10(0.85) = 2.88 ± 1.79
The confidence interval is: (1.09, 4.67)
b. The value of β in the null hypothesis is zero, which lies outside the confidence interval. Hence β is statistically significant.
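A sketch of the same interval computation (assuming scipy for the critical value):

from scipy import stats

beta_hat, se, n = 2.88, 0.85, 20
t_c = stats.t.ppf(0.975, df=n - 2)               # critical t at alpha/2 = 0.025, 18 df
lower, upper = beta_hat - t_c * se, beta_hat + t_c * se
print(round(lower, 2), round(upper, 2))          # about 1.09 and 4.67; 0 lies outside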
Review Questions
1. Econometrics deals with the measurement of economic relationships which are stochastic or random. The simplest form of economic relationship between two variables X and Y can be represented by:
Y_i = β_0 + β_1X_i + U_i; where β_0 and β_1 are regression parameters and U_i is the stochastic disturbance term.
What are the reasons for the inclusion of the U-term in the model?
2. The following data refer to the demand for money (M) and the rate of interest (R) in eight different economies:
M (in billions) 56 50 46 30 20 35 37 61
R% 6.3 4.6 5.1 7.3 8.9 5.3 6.7 3.5
a. Assuming a relationship M = α + βR + U_i, obtain the OLS estimators of α and β
b. Calculate the coefficient of determination for the data and interpret its value
c. If in a 9th economy the rate of interest is R=8.1, predict the demand for money(M) in
this economy.
3. The following data refer to the price of a good ‘P’ and the quantity of the good supplied, ‘S’.
P 2 7 5 1 4 8 2 8
S 15 41 32 9 28 43 17 40
a. Estimate the linear regression line S = α + βP
b. Estimate the standard errors of α̂ and β̂
c. Test the hypothesis that price influences supply
d. Obtain a 95% confidence interval for β
4. The following results have been obtained from a sample of 11 observations on the values of sales (Y) of a firm and the corresponding prices (X):
X̄ = 519.18
Ȳ = 217.82
ΣX_i² = 3,134,543
ΣX_iY_i = 1,296,836
ΣY_i² = 539,512
i) Estimate the regression line of sale on price and interpret the results
ii) What is the part of the variation in sales which is not explained by the
regression line?
iii) Estimate the price elasticity of sales.
5. The following table includes the GNP (X) and the demand for food (Y) for a country over a ten-year period.
year 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989
Y 6 7 8 10 8 9 10 9 11 10
X 50 52 55 59 57 58 62 65 68 70
a. Estimate the food function
b. Compute the coefficient of determination and find the explained and unexplained
variation in the food expenditure.
c. Compute the standard error of the regression coefficients and conduct test of
significance at the 5% level of significance.
6. A sample of 20 observations corresponding to the regression model Y_i = α + βX_i + U_i gave the following data:
ΣY_i = 21.9        Σ(Y_i − Ȳ)² = 86.9
ΣX_i = 186.2       Σ(X_i − X̄)² = 215.4
Σ(X_i − X̄)(Y_i − Ȳ) = 106.4
a. Estimate α and β
b. Calculate the variance of our estimates
c. Estimate the conditional mean of Y corresponding to a value of X fixed at X = 10.
7. Suppose that a researcher estimates a consumption function and obtains the following results:
C = 15 + 0.81Y_d        n = 19
     (3.1)  (18.7)      R² = 0.99
where C = consumption, Y_d = disposable income, and the numbers in the parentheses are the ‘t-ratios’.
a. Test the significance of Y_d statistically using the t-ratios
b. Determine the estimated standard deviations of the parameter estimates
8. State and prove the Gauss-Markov theorem
9. Given the model:
Y_i = β_0 + β_1X_i + U_i with the usual OLS assumptions, derive the expression for the error variance.

CHAPTER THREE
The Multiple Linear Regression Analysis

3.1 Introduction
A dependent variable Y can depend on many factors (explanatory variables) or regressors. For
instance, in demand studies we study the relationship between quantity demanded of a good and
price of the good, price of substitute goods and the consumer’s income. The model we assume
is:
Y_i = β_0 + β_1P_1 + β_2P_2 + β_3X_i + u_i -------------------- (3.1)
Where Y_i = quantity demanded, P_1 is the price of the good itself, P_2 is the price of substitute goods, X_i is the consumer's income, the β's are unknown parameters, and u_i is the disturbance.
Equation (3.1) is a multiple regression with three explanatory variables. In general, for k explanatory variables we can write the model as follows:
Y_i = β_0 + β_1X_1i + β_2X_2i + β_3X_3i + ……… + β_kX_ki + u_i ------- (3.2)
Where the X_ki are the explanatory variables, Y_i is the dependent variable, β_j (j = 0, 1, 2, …, k) are unknown parameters, and u_i is the disturbance term.
The disturbance term has similar nature to that in simple regression, reflecting:
- the basic random nature of human responses
- errors of aggregation
- errors of measurement
- errors in specification and any other factors, other than the X_i, that might influence Y.
Let’s start our discussion with the assumptions of the multiple regressions and we will proceed
our analysis with the case of two explanatory variables and then we will generalize the multiple
regression model in the case of k-explanatory variables using matrix algebra.

3.2 Assumptions of Multiple Regression Models


The assumptions of the multiple linear regression model are the same as those of the single-explanatory-variable model developed earlier, except for the additional assumption of no perfect multicollinearity.
These assumptions are:
1. Randomness of the error term: The variable u is a real random variable.
2. Zero mean of the error term: E(u_i) = 0
3. Homoscedasticity: The variance of each u_i is the same for all the X_i values,
i.e. E(u_i²) = σ_u² (constant variance)
4. Normality of u: The values of each u_i are normally distributed,
i.e. U_i ~ N(0, σ²)
5. No autocorrelation or serial correlation between the disturbance terms:
The values of u_i (corresponding to X_i) are independent from the values of any other u_j (corresponding to X_j) for i ≠ j,
i.e. Cov(u_i, u_j) = 0, i.e. E(u_iu_j) = 0 for i ≠ j
6. Independence of u_i and X_i:
Every disturbance term u_i is independent of the explanatory variables,
i.e. Cov(u_i, X_1i) = Cov(u_i, X_2i) = 0, i.e. E(u_iX_1i) = E(u_iX_2i) = 0
7. No specification bias: The model must be correctly specified.
8. No perfect multicollinearity:-The explanatory variables are not perfectly linearly
correlated.
- Informally, no collinearity means none of the regressors can be written as exact
linear combinations of the remaining regressors in the model.
- Formally, no collinearity means that there exists no set of numbers, λ_2 and λ_3, not both zero, such that: λ_2X_2i + λ_3X_3i = 0.
- If such an exact linear relationship exists, then X_2 and X_3 are said to be collinear or linearly dependent. On the other hand, if the relationship holds only when λ_2 = λ_3 = 0, then X_2 and X_3 are said to be linearly independent.
3.3. A Model with Two Explanatory Variables
In order to understand the nature of multiple regression model easily, we start our analysis with
two explanatory variables, then extend this to the case of k-explanatory variables.
3.3.1 Estimation of parameters of two-explanatory variables model
The model (PRF): Y = β_0 + β_1X_1 + β_2X_2 + U_i ……………………………………(3.3)
Let us suppose that sample data have been used to estimate the population regression equation (3.3) by the sample regression equation, which we write as:
Ŷ = β̂_0 + β̂_1X_1 + β̂_2X_2 ……………………………………………….(3.5)
Where the β̂_j are estimators of the β_j and Ŷ is known as the predicted value of Y.
Given sample observations on Y, X_1 and X_2, we estimate (3.3) using the method of least squares (OLS):
Y = β̂_0 + β̂_1X_1i + β̂_2X_2i + e_i ……………………………………….(3.6)
which is the sample relation between Y, X_1 and X_2.
e_i = Y_i − Ŷ = Y_i − β̂_0 − β̂_1X_1 − β̂_2X_2 …………………………………..(3.7)
To obtain expressions for the least squares estimators, we partially differentiate Σe_i² with respect to β̂_0, β̂_1 and β̂_2 and set the partial derivatives equal to zero:

∂(Σe_i²)/∂β̂_0 = −2Σ(Y_i − β̂_0 − β̂_1X_1i − β̂_2X_2i) = 0 ………………………. (3.8)
∂(Σe_i²)/∂β̂_1 = −2ΣX_1i(Y_i − β̂_0 − β̂_1X_1i − β̂_2X_2i) = 0 ……………………. (3.9)
∂(Σe_i²)/∂β̂_2 = −2ΣX_2i(Y_i − β̂_0 − β̂_1X_1i − β̂_2X_2i) = 0 ………… ………..(3.10)
Summing from 1 to n, the multiple regression equation produces three Normal Equations:

 Y  nˆ  ˆ X  ˆ X …………………………………….(3.11)
0 1 1i 2 2i

 X Y  ˆ X  ˆ X  ˆ X X …………………………(3.12)
2i i 0 1i 1
2
1i 2 1i 1i

 X Y  ˆ X  ˆ X X  ˆ X ………………………...(3.13)
2i i 0 2i 1 1i 2i 2
2
2i

From (3.11) we obtain ̂ 0


ˆ0  Y  ˆ1 X 1  ˆ2 X 2 ------------------------------------------------- (3.14)
Substituting (3.14) in (3.12) , we get:
 X 1iYi  (Y  ˆ1 X 1  ˆ2 X 2 )X 1i  ˆ1X 1i  ˆ2 X 2i
2

 X Y  Y X  ˆ1 (X 1i  X 1X 2i )  ˆ 2 (X 1i X 2i  X 2 X 2i )


2
 1i i 1i

  X Y  nY X  ˆ 2 (X 1i  nX 1i )  ˆ 2 (X 1i X 2  nX 1 X 2 ) ------- (3.15)


2 2
1i i 1i

We know that
 X  Yi   (X i Yi  nX i Yi )  xi y i
2
i

 X  X i   (  X i  nX i )   x i
2 2 2 2
i

Substituting the above equations in equation (3.14), the normal equation (3.12) can be written in
deviation form as follows:
 x y  ˆ x  ˆ x x …………………………………………(3.16)
2
1 1 1 2 1 2

Using the above procedure if we substitute (3.14) in (3.13), we get


 x y  ˆ x x  ˆ x ………………………………………..(3.17)
2
2 1 1 2 2 2

Let’s bring (2.17) and (2.18) together


 x y  ˆ x  ˆ x x ……………………………………….(3.18)
2
1 1 1 2 1 2

x y  ˆ1x1 x 2  ˆ 2 x 2 ……………………………………….(3.19)
2
2

ˆ1 and ˆ 2 can easily be solved using matrix


We can rewrite the above two equations in matrix form as follows.
 x12  x1 x 2 ˆ  x1y ………….(3.20) 1

x x 1 2  x2
2
ˆ 2 =  yx 2

If we use Cramer’s rule to solve the above matrix we obtain


2
x y . x  x x . x y
ˆ1  1 2 2 2 1 2 2 2 …………………………..…………….. (3.21)
x1 . x2  ( x1 x2 )
2
x y . x  x1 x2 . x1 y
ˆ2  2 2 1 2 ………………….……………………… (3.22)
x1 . x2  ( x1 x2 ) 2
3.3.2 Coefficient of Multiple Determination(R2)

In multiple linear regression models R2 measures the proportion of variation in the dependent
variable explained by all explanatory variables included in the model.
The coefficient of determination R² has been defined as:

R² = ESS/TSS = 1 − RSS/TSS = 1 − Σe_i²/Σy_i²

Σe_i² = Σy² − β̂_1Σx_1iy_i − β̂_2Σx_2iy_i

Σy² = (β̂_1Σx_1iy_i + β̂_2Σx_2iy_i) + Σe_i²
(total sum of squares / total variation) = (explained sum of squares / explained variation) + (residual sum of squares / unexplained variation)

In the three-variable case: R² = ESS/TSS = (β̂_1Σx_1iy_i + β̂_2Σx_2iy_i) / Σy²

 R² measures the prediction ability of the model over the sample period, or
 It measures how well the estimated regression line fits the data.
 The value of R² is equal to the squared sample correlation coefficient between Ŷ_t and Y_t. Since the sample correlation coefficient measures the linear association between two variables:
 If R² is high, there is a close association between the values of Y_t and the values predicted by the model, Ŷ_t. In this case, the model is said to “fit” the data well.
 If R² is low, there is no association between the values of Y_t and the values predicted by the model, Ŷ_t, and the model does not fit the data well.
Adjusted Coefficient of Multiple Determination (R̄²)
 One difficulty with R² is that it can be made large by adding more and more variables, even if the variables added have no economic justification. Algebraically, this is because as variables are added the sum of squared errors (RSS) goes down (it can remain unchanged, but this is rare) and thus R² goes up.
 Adjusted R², symbolized as R̄², is an alternative way of measuring goodness of fit. It is computed as:

R̄² = 1 − [Σe_i²/(n − k)] / [Σy²/(n − 1)] = 1 − (1 − R²)·(n − 1)/(n − k)

 This measure does not always go up when a variable is added, because of the degrees-of-freedom term (n − k) in the denominator.
 As the number of variables increases, RSS goes down, but so does n − k; the net effect on R̄² depends on the relative size of these two changes.
 While solving one problem, this corrected measure of goodness of fit unfortunately introduces another: it loses its interpretation, since R̄² is no longer the percent of variation explained.
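A one-function sketch of the formula (plain Python; the example numbers are hypothetical):

def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2)*(n - 1)/(n - k); k counts all estimated parameters."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

print(adjusted_r2(0.90, n=20, k=3))   # about 0.888: slightly below the raw R^2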
Variances and Standard Errors of OLS Estimators

Var(β̂_0) = [1/n + (X̄_1²Σx_2² + X̄_2²Σx_1² − 2X̄_1X̄_2Σx_1x_2) / (Σx_1²·Σx_2² − (Σx_1x_2)²)]·σ̂²

s.e.(β̂_0) = √Var(β̂_0), where β̂_0 is the intercept coefficient.

Var(β̂_1) = [Σx_2² / (Σx_1²·Σx_2² − (Σx_1x_2)²)]·σ̂²

Equivalently, Var(β̂_1) = σ̂² / [Σx_1²(1 − r_12²)], where r_12 is the sample coefficient of correlation between X_1 and X_2.

s.e.(β̂_1) = √Var(β̂_1)

Var(β̂_2) = [Σx_1² / (Σx_1²·Σx_2² − (Σx_1x_2)²)]·σ̂²

Equivalently, Var(β̂_2) = σ̂² / [Σx_2²(1 − r_12²)]

s.e.(β̂_2) = √Var(β̂_2)
3.4 Properties of OLS Estimators and Gauss-Markov Theorem
3.4.1 Statistical Properties of the Parameters (Matrix Approach)
(Gauss-Markov Theorem)
In multiple regression, the OLS estimators must satisfy the small-sample properties of estimators (the BLUE property). We now proceed to examine the desired properties of the estimators in matrix notation as follows:
1. Linearity
2. Unbiasedness.
3. Minimum variance
3.5. Hypothesis Testing in Multiple Regression Model
In multiple regression models we will undertake two tests of significance.
1. Significance of individual parameters of the model and
2. Overall significance of the model.
1. Tests of individual significance
Assuming that U_i ~ N(0, σ²), we can use either the t-test or the standard error test to test a hypothesis about any individual partial regression coefficient.
To illustrate, consider the following example.
Let Y = β̂_0 + β̂_1X_1 + β̂_2X_2 + e_i ………………………………… (3.51)
i) H_0: β_1 = 0
   H_1: β_1 ≠ 0
ii) H_0: β_2 = 0
    H_1: β_2 ≠ 0
The null hypothesis in (i) states that, holding X_2 constant, X_1 has no influence on Y.
Similarly, hypothesis (ii) states that, holding X_1 constant, X_2 has no influence on the dependent variable Y_i.
To test these null hypothesis we will use the following tests:
i- Standard error test: Under this testing method let's test only β̂_1; the test for β̂_2 is done in the same way.

SE(β̂_1) = √var(β̂_1) = √[σ̂²Σx_2i² / (Σx_1i²Σx_2i² − (Σx_1x_2)²)]; where σ̂² = Σe_i²/(n − 3)

 If SE(β̂_1) > ½β̂_1, we accept the null hypothesis; that is, we conclude that the estimate β̂_1 is not statistically significant.
 If SE(β̂_1) < ½β̂_1, we reject the null hypothesis, and we conclude that the estimate β̂_1 is statistically significant.
Note: The smaller the standard errors, the stronger the evidence that the estimates are statistically
reliable.
ii. The student's t-test: We compute the t-ratio for each β̂_i:

t* = (β̂_i − β)/SE(β̂_i) ~ t_{n−k}, where n is the number of observations and k is the number of parameters. If we have 3 parameters, the degrees of freedom will be (n − 3). So:

t* = (β̂_2 − β_2)/SE(β̂_2), with (n − 3) degrees of freedom.

Under our null hypothesis β_2 = 0, t* becomes:

t* = β̂_2/SE(β̂_2)
 If $t^*$ (calculated) $< t$ (tabulated), we accept the null hypothesis ($H_0$); we conclude that $\hat\beta_2$ is not significant. This implies the regressor does not contribute to the explanation of the variations in the dependent variable ($Y$).
 If $t^*$ (calculated) $> t$ (tabulated), we reject the null hypothesis ($H_0$) and accept the alternative ($H_1$); $\hat\beta_2$ is statistically significant. This implies the regressor does contribute to the explanation of the variations in the dependent variable ($Y$).
- Thus, the greater the value of $t^*$, the stronger the evidence that $\beta_i$ is statistically significant.
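A minimal sketch of this individual significance test (Python with scipy assumed; the function name and arguments are illustrative, not part of the module):

```python
from scipy import stats

def t_test_coefficient(beta_hat, se_beta, n, k, alpha=0.05):
    """Two-sided t-test of H0: beta = 0 against H1: beta != 0."""
    t_star = beta_hat / se_beta                    # computed t-ratio under H0
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - k)  # tabulated critical value
    return t_star, t_crit, abs(t_star) > t_crit    # True => reject H0
```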
2 Test of Overall Significance of the Model
In this section we extend significance test of the individual estimated partial regression
coefficient to joint test of the relevance of all the included explanatory variables. Now consider
the following:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + U_i$$
$$H_0: \beta_1 = \beta_2 = \beta_3 = \cdots = \beta_k = 0$$
$$H_1: \text{at least one of the } \beta_i \text{ is non-zero}$$
This null hypothesis is a joint hypothesis that $\beta_1, \beta_2, \ldots, \beta_k$ are jointly or simultaneously equal to zero. A test of such a hypothesis is called a test of the overall significance of the observed or estimated regression line, that is, whether $Y$ is linearly related to $X_1, X_2, \ldots, X_k$.
The test procedure for any set of hypothesis can be based on a comparison of the sum of squared
errors from the original, the unrestricted multiple regression model to the sum of squared errors
from a regression model in which the null hypothesis is assumed to be true. When a null
hypothesis is assumed to be true, we in effect place conditions or constraints, on the values that
the parameters can take, and the sum of squared errors increases. The idea of the test is that if these sums of squared errors are substantially different, then the assumption that the joint null hypothesis is true has significantly reduced the ability of the model to fit the data, and the data do not support the null hypothesis.
If the null hypothesis is true, we expect that the data are compatible with the conditions placed
on the parameters. Thus, there would be little change in the sum of squared errors when the null
hypothesis is assumed to be true.
Let the Restricted Residual Sum of Square (RRSS) be the sum of squared errors in the model
obtained by assuming that the null hypothesis is true and URSS be the sum of the squared error
of the original unrestricted model i.e. unrestricted residual sum of square (URSS). It is always
true that RRSS - URSS  0.
Consider $\hat{Y} = \hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_2 + \cdots + \hat\beta_k X_k + e_i$.
This model is called unrestricted. The test of the joint hypothesis is that:
$$H_0: \beta_1 = \beta_2 = \beta_3 = \cdots = \beta_k = 0$$
$$H_1: \text{at least one of the } \beta_k \text{ is different from zero.}$$
The sum of squared error when the null hypothesis is assumed to be true is called Restricted
Residual Sum of Square (RRSS) and this is equal to the total sum of square (TSS).
The ratio:
$$F = \frac{(RRSS - URSS)/(k-1)}{URSS/(n-k)} \sim F_{(k-1,\,n-k)}$$
(it has an F-distribution with $k-1$ and $n-k$ degrees of freedom for the numerator and denominator respectively)
$$RRSS = TSS$$
$$URSS = \sum e_i^2 = \sum y^2 - \hat\beta_1\sum yx_1 - \hat\beta_2\sum yx_2 - \cdots - \hat\beta_k\sum yx_k = RSS$$
$$F = \frac{(TSS - RSS)/(k-1)}{RSS/(n-k)}$$
$$F = \frac{ESS/(k-1)}{RSS/(n-k)} \text{ ………………………………………………. (3.54)}$$
If we divide the numerator and denominator above by $\sum y^2 = TSS$, then:
$$F = \frac{\frac{ESS}{TSS}/(k-1)}{\frac{RSS}{TSS}/(n-k)} = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} \text{ ………………………………………….. (3.55)}$$
This implies the computed value of F can be calculated either as a ratio of ESS and RSS or of $R^2$ and $1-R^2$. If the null hypothesis is not true, then the difference between RRSS and URSS (TSS and RSS) becomes large, implying that the constraints placed on the model by the null hypothesis have a large effect on the ability of the model to fit the data, and the value of F tends to be large. Thus, we reject the null hypothesis if the F test statistic becomes too large. This value is compared with the critical value of F, which leaves a probability of $\alpha$ in the upper tail of the F-distribution with $k-1$ and $n-k$ degrees of freedom.
If the computed value of F is greater than the critical value of F (k-1, n-k), then the parameters
of the model are jointly significant or the dependent variable Y is linearly related to the
independent variables included in the model.
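A small sketch of the overall F-test computed from $R^2$, as in equation (3.55) (Python with scipy assumed; names are illustrative):

```python
from scipy import stats

def overall_f_test(r2, n, k, alpha=0.05):
    """Overall significance test: F = [R^2/(k-1)] / [(1-R^2)/(n-k)],
    which follows F(k-1, n-k) under H0 that all slopes are zero."""
    f_star = (r2 / (k - 1)) / ((1 - r2) / (n - k))
    f_crit = stats.f.ppf(1 - alpha, dfn=k - 1, dfd=n - k)
    return f_star, f_crit, f_star > f_crit         # True => reject H0
```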
Application of Multiple Linear Regression Models
Example 1: Consider the data given below and fit the linear function:
$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + U$$
$\sum y_i^2 = 619$; $\sum x_1x_2 = 240$; $\sum x_2x_3 = -420$; $\sum x_1x_3 = -330$; $\sum x_1^2 = 270$; $\sum x_2^2 = 630$; $\sum x_3^2 = 750$; $\sum x_1y_i = 319$; $\sum x_2y_i = 492$; $\sum x_3y_i = -625$;
$n = 10$; $\bar{Y} = 52$; $\bar{X}_1 = 42$; $\bar{X}_2 = 62$; $\bar{X}_3 = 200$
Small letters are in deviation form. Based on the information given above and the model, answer the following questions.
i. Estimate the parameters using the matrix approach.
ii. Compute the variances of the parameters.
iii. Compute the coefficient of determination ($R^2$) and the adjusted coefficient of determination ($\bar{R}^2$) and interpret the result.
iv. Report the regression result.
v. Test the significance of each partial slope coefficient at the 5% significance level [use $t_{0.025}(6) = 2.447$].
vi. Test the overall significance of the model [use $F_{0.05}(3,6) = 4.76$].
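For part (i), a minimal numerical sketch using the deviation-form sums given above (Python with numpy assumed): in deviation form the slope estimates solve $(x'x)\beta = x'y$, and the intercept follows from the sample means. The code only sets up and solves the system; run it to obtain the estimates rather than relying on any value quoted here.

```python
import numpy as np

# Deviation-form cross-product sums transcribed from the example
xx = np.array([[270.0,  240.0, -330.0],    # Σx1², Σx1x2, Σx1x3
               [240.0,  630.0, -420.0],    # Σx2x1, Σx2², Σx2x3
               [-330.0, -420.0, 750.0]])   # Σx3x1, Σx3x2, Σx3²
xy = np.array([319.0, 492.0, -625.0])      # Σx1y, Σx2y, Σx3y

beta = np.linalg.solve(xx, xy)                       # slopes β1, β2, β3
alpha = 52.0 - beta @ np.array([42.0, 62.0, 200.0])  # intercept: Ȳ − Σ βj·X̄j

rss = 619.0 - beta @ xy                    # Σe² = Σy² − β'(x'y)
sigma2 = rss / (10 - 4)                    # σ² estimate with n − k = 6 df
var_beta = sigma2 * np.linalg.inv(xx).diagonal()   # var(βj) = σ²·[(x'x)^-1]jj
```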

Chapter Four
Violations of basic Classical Assumptions
4. Introduction
The classicalists set important assumptions about the distribution of $Y_t$ and the random error term $u_t$ in regression models.
 They assumed that the error term ($u_i$) follows a normal distribution with mean zero and constant variance, $var(u_t) = \sigma^2$;
 errors corresponding to different observations are uncorrelated, $cov(u_t, u_s) = 0$ (for $t \neq s$); and
 in multiple regression, there is no perfect correlation between the independent variables.
Now, we address the following ‘what if’ questions in this chapter.
 What if the error variance is not constant over all observations?
 What if the different errors are correlated?
 What if the explanatory variables are correlated?
4.1 Heteroscedasticity
4.1.1 The nature of Heteroscedasticity
In the linear regression model:
 The distribution of the disturbance term remains the same over all observations of X; i.e. the variance of each $u_i$ is the same for all values of the explanatory variable.
Symbolically,
$$var(u_i) = E[u_i - E(u_i)]^2 = E(u_i^2) = \sigma_u^2; \text{ a constant value.}$$
 This feature of homogeneity of variance (or constant variance) is known as homoscedasticity.
 If the disturbance terms do not have the same variance, this non-homogeneity (non-constancy) of variance is known as heteroscedasticity. Thus, we say that the U's are heteroscedastic when:
$$var(u_i) \neq \sigma_u^2 \text{ (a constant), but rather } var(u_i) = \sigma_{ui}^2 \text{ (a value that varies across observations).}$$
 The assumption of homoscedasticity states that the variation of each $u_i$ around its zero mean does not depend on the value of the explanatory variable. Mathematically, $\sigma_u^2$ is not a function of X; i.e. $\sigma_u^2 \neq f(X_i)$.
 If $\sigma_u^2$ depends on the value of X (i.e. $\sigma_{ui}^2 = f(X_i)$), the dispersion (variance) of the heteroscedastic error term may:
i) increase with X;
ii) be greater in the middle range of the X's, tapering off toward the extremes;
iii) be greater for low values of X.
 Its pattern depends on the signs and values of the coefficients of the relationship and takes forms such as:
i. $\sigma_{ui}^2 = K^2(X_i^2)$
ii. $\sigma_{ui}^2 = K^2(X_i)$
iii. $\sigma_{ui}^2 = \dfrac{K}{X_i}$, etc.
4.1.4 Examples of Heteroscedastic functions
a) Consumption Function:
Suppose: $C_i = \alpha + \beta Y_i + U_i$;
where $C_i$ = consumption expenditure of the $i^{th}$ household and $Y_i$ = disposable income of the $i^{th}$ household.
- At low levels of income, average consumption is low and its variation is low.
- At high incomes the $u$'s will be large, while at low incomes the $u$'s will be small.
Therefore, the assumption of constant variance of the $u$'s does not hold when estimating the consumption function from a cross section of family budgets.
b) Production Function:
Suppose the production function is $X = f(K, L)$.
- The disturbance term stands for many factors, like entrepreneurship, technological differences, selling and purchasing procedures, differences in organization, etc., other than the inputs labor (L) and capital (K) considered in the production function. These factors show considerably greater variation in large firms than in small ones. This leads to a breakdown of our assumption of homogeneous error variance.
4.1.5 Reasons for Heteroscedasticity
There are several reasons why the variances of $u_i$ may be variable. Some of these are:
a) Error learning models: as people learn, their errors of behavior become smaller over time. In this case $\sigma_i^2$ is expected to decrease.
Example: as the number of hours of typing practice increases, the average number of typing errors as well as their variance decreases.
b) As data collection techniques improve, $\sigma_{ui}^2$ is likely to decrease.
c) Heteroscedasticity can also arise as a result of the presence of outliers:
- observations that are much different (either very small or very large) from the rest of the sample.
d) Heteroscedasticity can arise from violating the assumption that the regression model is correctly specified.
e) Heteroscedasticity can arise due to skewness in one or more regressors in the model:
- an uneven distribution of the data of the regressors.
- Example: the distribution of income and wealth in most societies is uneven, most of it being owned by a few at the top.
f) Heteroscedasticity can also arise because of incorrect data transformation and incorrect functional form (e.g., linear versus log-linear models).
4.1.6 Consequences of Heteroscedasticity for the Least Squares Estimators
Heteroscedasticity has the following consequences:
i) The OLS estimators remain unbiased.
ii) The conventional formulas for the variances of the OLS estimators are incorrect.
iii) The OLS estimators are inefficient:
- they do not have the smallest variance in the class of unbiased estimators, in either small or large samples;
- i.e., the OLS estimator is linear, unbiased and consistent, but it is inefficient.
Under the heteroscedastic assumption:
- the true $s.e.(\hat\beta)$ is typically underestimated;
- the t-test and F-test statistics associated with it will be overestimated, which might lead to the conclusion that in a specific case at hand $\hat\beta$ is statistically significant (which in fact may not be true);
- our inference and prediction about the population coefficients would be incorrect.
- In the presence of heteroscedasticity, BLUE estimators are provided by the method of weighted least squares (WLS).
4.1.7 Detecting Heteroscedasticity
There are two groups of methods for testing heteroscedasticity:
i. Informal methods
ii. Formal methods
i. Informal method
- This is a test based on the nature of the graph of the residuals.
- When there exists a systematic relation between the squared residuals $e_i^2$ and the predicted values $\hat{Y}_i$ (or the regressor $X_i$), heteroscedasticity is likely present in the data.
- When there is no systematic pattern between the two variables, this may suggest that no heteroscedasticity is present in the data.
ii. Formal methods
The following are formal methods of testing heteroscedasticity.
a. Park test
Park suggests that the variance of $u_i$ ($\sigma_i^2$) is a function of the explanatory variable $X_i$. The functional form he suggested was:
$$\sigma_i^2 = \sigma^2 X_i^{\beta} e^{v_i}$$
or
$$\ln \sigma_i^2 = \ln \sigma^2 + \beta \ln X_i + v_i \;\;------\; 3.14$$
where $v_i$ is the stochastic disturbance term. Since $\sigma_i^2$ is generally unknown, we use $\hat{u}_i^2$ as a proxy for $\sigma_i^2$ and run the regression:
$$\ln \hat{u}_i^2 = \ln \sigma^2 + \beta \ln X_i + v_i = \alpha + \beta \ln X_i + v_i \;\;------\; 3.15, \text{ since } \sigma^2 \text{ is constant.}$$
 We then test whether there is a significant relation between $\ln X_i$ and $\ln\hat{u}_i^2$:
$$H_0: \beta = 0 \quad \text{vs} \quad H_1: \beta \neq 0$$
- If $\beta$ is statistically significant, it would suggest heteroscedasticity in the data.
- If $\beta$ is insignificant, we may accept the assumption of homoscedasticity.
- The Park test is thus a two-stage test procedure:
i) run the OLS regression disregarding the heteroscedasticity to obtain $\hat{u}_i^2$; and
ii) then run the regression in equation (3.15) above.
Example: Suppose that from a sample of size n = 100 we estimate the relation between compensation (Y) and productivity (X):
$$\hat{Y} = 1992.342 + 0.2329 X_i \;\;------\; 3.16$$
$$SE = (936.479)\;(0.0998), \quad t = (2.1275)\;(2.333), \quad R^2 = 0.4375$$
 The residuals obtained from (3.16) were regressed on $X_i$ as in (3.15), giving the following result:
$$\ln \hat{u}_i^2 = 35.817 - 2.8099 \ln X_i + v_i \;\;------\; (3.17)$$
$$SE = (38.319)\;(4.216), \quad t = (0.934)\;(-0.667), \quad R^2 = 0.0595$$
 The above result reveals that:
- the slope coefficient is statistically insignificant, implying there is no statistically significant relationship between the two variables;
- we may conclude that there is no heteroscedasticity in the error variance.
 Goldfeld and Quandt have criticized the Park test, arguing that the error term $v_i$ entering the equation $\ln \hat{u}_i^2 = \alpha + \beta \ln X_i + v_i$ may itself be heteroscedastic.
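A sketch of the two-stage Park procedure (Python with numpy and statsmodels assumed; the function and variable names are illustrative):

```python
import numpy as np
import statsmodels.api as sm

def park_test(y, x):
    """Two-stage Park test for heteroscedasticity, as in equation (3.15)."""
    # Stage 1: OLS disregarding heteroscedasticity; keep the squared residuals.
    stage1 = sm.OLS(y, sm.add_constant(x)).fit()
    ln_u2 = np.log(stage1.resid ** 2)
    # Stage 2: regress ln(u^2) on ln(X); a significant slope suggests heteroscedasticity.
    stage2 = sm.OLS(ln_u2, sm.add_constant(np.log(x))).fit()
    return stage2.params[1], stage2.tvalues[1], stage2.pvalues[1]
```

A statistically significant slope in the second stage suggests heteroscedasticity, exactly as in the worked example above.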
b. Glejser test:
 Glejser suggested regressing the absolute value of the residuals, $|\hat{u}_i|$, on the $X_i$ variable that is thought to be closely associated with $\sigma_i^2$.
 In his experiments, Glejser used the following functional forms:
$$|\hat{u}_i| = \alpha + \beta X_i + v_i \qquad\qquad |\hat{u}_i| = \alpha + \beta \sqrt{X_i} + v_i$$
$$|\hat{u}_i| = \alpha + \beta \frac{1}{X_i} + v_i \qquad\qquad |\hat{u}_i| = \alpha + \beta \frac{1}{\sqrt{X_i}} + v_i$$
$$|\hat{u}_i| = \sqrt{\alpha + \beta X_i} + v_i \qquad\qquad |\hat{u}_i| = \sqrt{\alpha + \beta X_i^2} + v_i$$
where $v_i$ is the error term.
Goldfeld and Quandt point out problems with the Glejser test:
- the expected value of the error term $v_i$ is non-zero, and it is serially correlated and heteroscedastic;
- models such as $|\hat{u}_i| = \sqrt{\alpha + \beta X_i} + v_i$ and $|\hat{u}_i| = \sqrt{\alpha + \beta X_i^2} + v_i$ are non-linear in parameters and therefore cannot be estimated with the usual OLS procedure.
Glejser has found that for large samples the first four preceding models give generally satisfactory results in detecting heteroscedasticity.
c. Goldfeld-Quandt test
 This method is applicable if the heteroscedastic variance $\sigma_i^2$ is positively related to one of the explanatory variables in the regression model. For simplicity, consider the usual two-variable model:
$$Y_i = \alpha + \beta X_i + U_i$$
 Suppose $\sigma_i^2$ is positively related to $X_i$ as:
$$\sigma_i^2 = \sigma^2 X_i^2 \;\;------\; 3.18; \text{ where } \sigma^2 \text{ is a constant.}$$
- In this case $\sigma_i^2$ would be larger for larger values of $X_i$. If that turns out to be the case, heteroscedasticity is most likely to be present in the model.
 To test this explicitly, Goldfeld and Quandt suggest the following steps:
Step 1: Order the observations according to the values of $X_i$, beginning with the lowest value.
Step 2: Omit $c$ central observations, where $c$ is specified a priori, and divide the remaining $(n-c)$ observations into two groups of $\frac{(n-c)}{2}$ observations each.
Step 3: Fit separate OLS regressions to the first $\frac{(n-c)}{2}$ observations and the last $\frac{(n-c)}{2}$ observations, and obtain the respective residual sums of squares RSS₁ and RSS₂: RSS₁ representing the RSS from the regression corresponding to the smaller $X_i$ values (the small variance group) and RSS₂ that from the larger $X_i$ values (the large variance group).
 These RSS each have $\frac{(n-c)}{2} - K = \frac{(n-c-2K)}{2}$ df, where K is the number of parameters to be estimated, including the intercept term, and df is the degrees of freedom.
Step 4: Compute
$$\lambda = \frac{RSS_2/df}{RSS_1/df} = \frac{RSS_2/[(n-c-2K)/2]}{RSS_1/[(n-c-2K)/2]} \sim F_{\left(\frac{n-c}{2}-K,\;\frac{n-c}{2}-K\right)}$$
 $\lambda$ follows an F distribution with numerator and denominator df each $(n-c-2K)/2$.
 If the computed $\lambda\;(=F)$ is greater than $F_{critical}$ at the chosen level of significance, then reject the hypothesis of homoscedasticity (heteroscedasticity is very likely).
Example: Consider data on consumption expenditure in relation to income for a cross-section of 30 families. Suppose that consumption expenditure is linearly related to income but that heteroscedasticity is present in the data. Dropping the middle four observations, the OLS regressions based on the first 13 and the last 13 observations and their associated residual sums of squares are shown next (standard errors in parentheses).
Regression based on the first 13 observations:
$$\hat{Y}_i = 3.4094 + 0.6968 X_i + e_i, \quad (8.7049)\;(0.0744), \quad R^2 = 0.8887, \quad RSS_1 = 377.17, \quad df = 11$$
Regression based on the last 13 observations:
$$\hat{Y}_i = -28.0272 + 0.7941 X_i + e_i, \quad (30.6421)\;(0.1319), \quad R^2 = 0.7681, \quad RSS_2 = 1536.8, \quad df = 11$$
From these results we obtain:
$$\lambda = \frac{RSS_2/df}{RSS_1/df} = \frac{1536.8/11}{377.17/11} = 4.07$$
 The critical F value for (11, 11) df at the 5% level is 2.82.
 Since $\lambda\;(=F) > F_{0.05}(11,11)$, we may conclude that there is heteroscedasticity in the error variance.
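A sketch of the Goldfeld-Quandt steps for the two-variable model (Python with numpy and statsmodels assumed; names are illustrative; statsmodels also ships a ready-made het_goldfeldquandt diagnostic that could be used instead):

```python
import numpy as np
import statsmodels.api as sm

def goldfeld_quandt(y, x, c):
    """Order the data by x, drop c central observations, and compare the
    residual sums of squares of the two sub-regressions."""
    order = np.argsort(x)
    y, x = y[order], x[order]
    m = (len(x) - c) // 2                   # size of each sub-sample
    rss = []
    for ys, xs in [(y[:m], x[:m]), (y[-m:], x[-m:])]:
        fit = sm.OLS(ys, sm.add_constant(xs)).fit()
        rss.append(np.sum(fit.resid ** 2))
    df = m - 2                              # K = 2 parameters per sub-regression
    lam = (rss[1] / df) / (rss[0] / df)     # lambda ~ F(df, df) under homoscedasticity
    return lam, df
```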
Spearman's rank correlation test, the Breusch-Pagan-Godfrey test and White's general heteroscedasticity test are other methods of testing for heteroscedasticity.
4.1.8 Remedial measures for the problems of heteroscedasticity
Knowing the consequences of heteroscedasticity, it may be necessary to seek remedial measures. The problem here is that we do not know the true heteroscedastic variances, $\sigma_i^2$, for they are rarely observed. If we could observe them, then we could obtain BLUE estimators by dividing each observation by the (heteroscedastic) $\sigma_i$ and estimating the transformed model by OLS. This method of estimation is known as the method of weighted (generalized) least squares (WLS).
Unfortunately, the true $\sigma_i^2$ is rarely known. Then what is the solution?
In practice, we make educated guesses about what $\sigma_i^2$ might be and transform the original regression model in such a way that in the transformed model the error variance might be homoscedastic. In short, GLS is OLS on transformed variables that satisfy the standard least squares assumptions. The estimators thus obtained are known as GLS estimators, and it is these estimators that are BLUE.
Some of the transformations used in practice are as follows:
1) If the true error variance is proportional to the square of one of the regressors, we can
divide both sides of Equation by that variable and run the transformed regression. We
then subject this regression to heteroscedasticity tests, such as the BP and White tests. If
these tests indicate that there is no evidence of heteroscedasticity, we may then assume
that the transformed error term is homoscedastic.
Example: Suppose the heteroscedasticity is of the form $E(u_i^2) = \sigma_i^2 = k^2 X_i^2$ in the model $Y = \alpha + \beta X_i + U_i$; the transforming variable is $\sqrt{X_i^2} = X_i$. The transformed model is:
$$\frac{Y}{X_i} = \frac{\alpha}{X_i} + \beta + \frac{U_i}{X_i}$$
Note that $var(U_i/X_i) = \sigma_i^2/X_i^2 = k^2$, a constant, so the transformed error term is homoscedastic (a code sketch of this transformation follows the list below).
2) If the true error variance is proportional to one of the regressors, we can use the so-called square-root transformation; that is, we divide both sides of the equation by the square root of the chosen regressor. We then estimate the regression thus transformed and subject that regression to heteroscedasticity tests. If these tests are satisfactory, we may rely on this regression.
Example: Suppose the heteroscedasticity is of the form $E(u_i^2) = \sigma_i^2 = k^2 X_i$.
The transforming variable is $\sqrt{X_i}$, and the transformed model is:
$$\frac{Y}{\sqrt{X_i}} = \frac{\alpha}{\sqrt{X_i}} + \beta\frac{X_i}{\sqrt{X_i}} + \frac{U_i}{\sqrt{X_i}} = \frac{\alpha}{\sqrt{X_i}} + \beta\sqrt{X_i} + \frac{U_i}{\sqrt{X_i}}$$
3) The logarithmic transformation: sometimes, instead of estimating regression, we can
regress the logarithm of the dependent variable on the regressors, which may be linear or
in log form. The reason for this is that the log transformation compresses the scales in
which the variables are measured, thereby reducing a tenfold difference between two
values to a twofold difference.
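As promised in transformation (1) above, here is a sketch of that transformation, assuming $var(u_i) = k^2X_i^2$ (Python with numpy and statsmodels; names are illustrative):

```python
import numpy as np
import statsmodels.api as sm

def ols_on_transformed(y, x):
    """Estimate Y = a + bX + U under var(U_i) = k^2 X_i^2 by OLS on the
    transformed model Y/X = a(1/X) + b + U/X, whose error is homoscedastic."""
    y_star = y / x
    Z = np.column_stack([1.0 / x, np.ones_like(x)])  # regressors: 1/X, new intercept
    fit = sm.OLS(y_star, Z).fit()
    a_hat, b_hat = fit.params                        # a is the coefficient on 1/X
    return a_hat, b_hat
```

Equivalently, `sm.WLS(y, sm.add_constant(x), weights=1.0/x**2)` should give the same estimates, since OLS on the transformed model is weighted least squares with weights $1/X_i^2$.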

4.2 Autocorrelation
4.2.1 The nature of Autocorrelation
The assumptions of the classicalists say that $cov(u_i, u_j) = E(u_iu_j) = 0$, which implies that successive values of the disturbance term U are temporally independent, i.e. a disturbance occurring at one point of observation is not related to any other disturbance. This means that when observations are made over time, the effect of a disturbance occurring at one period does not carry over into another period.
If the above assumption is not satisfied, that is, if the value of U in any particular period is
correlated with its own preceding value(s), we say there is autocorrelation of the random
variables. Hence, autocorrelation is defined as a ‘correlation’ between members of series of
observations ordered in time or space.
There is a difference between 'correlation' and autocorrelation. Autocorrelation is a special case of correlation which refers to the relationship between successive values of the same variable, while correlation may also refer to the relationship between two or more different variables. Autocorrelation is also sometimes called serial correlation, but some economists distinguish between these two terms. According to G. Tintner, autocorrelation is the lag correlation of a given series with itself, lagged by a number of time units, whereas serial correlation is the "lag correlation between two different series". Thus, correlation between two time series such as $u_1, u_2, \ldots, u_{10}$ and $u_2, u_3, \ldots, u_{11}$, where the former is the latter series lagged by one time period, is autocorrelation; whereas correlation between time series such as $u_1, u_2, \ldots, u_{10}$ and $v_2, v_3, \ldots, v_{11}$, where U and V are two different time series, is called serial correlation.
4.2.2 Graphical representation of Autocorrelation
Since autocorrelation is correlation between members of series of observations ordered in time,
we will see graphically the trend of the random variable by plotting time horizontally and the
random variable (U i ) vertically.
Consider the figures referred to below (not reproduced here).
Figures (a)-(d) show a cyclical pattern among the U's, indicating autocorrelation: figures (b) and (c) suggest an upward and a downward linear trend, respectively, and (d) indicates a quadratic trend in the disturbance terms. Figure (e) indicates no systematic pattern, supporting the non-autocorrelation assumption of the classical linear regression model.
We can also show autocorrelation graphically by plotting successive values of the random disturbance term against each other ($u_i$ vertically and $u_j$ horizontally).
Figures (f) and (g) indicate positive and negative autocorrelation, respectively, while (h) indicates no autocorrelation.
In general, if the disturbance terms follow a systematic pattern as in (f) and (g), there is autocorrelation or serial correlation; if there is no systematic pattern, this indicates no autocorrelation.
4.2.3 Reasons for Autocorrelation
There are several reasons why serial or autocorrelation arises. Some of these are:

a. Cyclical fluctuations
Time series data such as GNP, PI, production, employment and unemployment exhibit business cycles. When an economic recovery starts, most of these series move upward, and the value of a series at one point in time is greater than its previous value.
 Thus, there is a 'momentum' built into them, and it continues until something happens (e.g. an increase in interest rates or taxes, or both) to slow them down. Therefore, in regressions involving time series data, successive observations are likely to be interdependent.
b. Specification bias
This arises because of the following.
i. Exclusion of variables from the regression model
ii. Incorrect functional form of the model
iii. Neglecting lagged terms from the regression model
Let us see one by one how the above specification biases cause autocorrelation.
i. Exclusion of variables: The error term will show a systematic change when an important variable is excluded from the model.
For example, suppose the correct demand model is given by:
$$y_t = \alpha + \beta_1 x_{1t} + \beta_2 x_{2t} + \beta_3 x_{3t} + U_t \;\;------\; 3.21$$
where $y$ = quantity of beef demanded, $x_1$ = price of beef, $x_2$ = consumer income, $x_3$ = price of pork and $t$ = time. Now, suppose we run the following regression:
$$y_t = \alpha + \beta_1 x_{1t} + \beta_2 x_{2t} + V_t \;\;------\; 3.22$$
Running equation (3.22) when equation (3.21) is the 'correct' model or true relation amounts to letting $V_t = \beta_3 x_{3t} + U_t$. The error or disturbance term V will therefore reflect a systematic pattern, thus creating autocorrelation.
ii. Incorrect functional form of the model:
 This is also one source of autocorrelation in the error term.
Suppose the correct model in a cost-output study is:
$$\text{Marginal cost}_i = \beta_0 + \beta_1\,\text{output}_i + \beta_2\,\text{output}_i^2 + U_i \;\;------\; 3.23$$
However, we incorrectly fit the following model:
$$\text{Marginal cost}_i = \alpha_1 + \alpha_2\,\text{output}_i + V_i \;\;------\; 3.24$$
The MC curve corresponding to the 'true' model and the 'incorrect' linear cost curve can be depicted in a figure (not reproduced here).
 Between points A and B the linear marginal cost curve consistently overestimates the true marginal cost, whereas
 outside these points it consistently underestimates the true marginal cost. This result is to be expected because the disturbance term $V_i$ is, in fact, equal to $\beta_2(\text{output})^2 + u_i$, and hence will catch the systematic effect of the $(\text{output})^2$ term on marginal cost. In this case, $V_i$ will reflect autocorrelation because of the use of an incorrect functional form.
iii. Neglecting a lagged term from the model:
- If the dependent variable is affected by the lagged value of itself or of the explanatory variable and this lag is not included in the model, the error term of the incorrect model will reflect a systematic pattern, which indicates autocorrelation in the model.
Suppose the correct model for consumption expenditure is:
$$C_t = \alpha + \beta_1 y_t + \beta_2 C_{t-1} + U_t \;\;------\; 3.25$$
 This is known as an autoregression, because one of the explanatory variables is the lagged value of the dependent variable.
 The rationale for such a model is that consumers do not change their consumption habits readily, for psychological, technological, or institutional reasons.
If we neglect the lagged term from the equation, the resulting error term will reflect a systematic pattern due to the influence of lagged consumption on current consumption.
4.2.5 The coefficient of autocorrelation
Autocorrelation is a kind of lag correlation between successive values of the same variable. Thus, we treat autocorrelation in the same way as correlation in general.
If the value of U in any particular period depends on its own value in the preceding period alone, we say that the U's follow a first-order autoregressive scheme AR(1), i.e.
$$u_t = f(u_{t-1}) \;\;------\; 3.28$$
 If $u_t$ depends on the values of the two previous periods, $u_t = f(u_{t-1}, u_{t-2})$; this form of autocorrelation is called a second-order AR scheme, and so on.
Generally, when autocorrelation is present, we assume the simple first-order form:
$$u_t = \rho u_{t-1} + v_t \;\;------\; 3.30$$
where $\rho$ is the coefficient of autocorrelation and $v$ is a random variable satisfying all the basic assumptions of OLS: $E(v) = 0$, $E(v^2) = \sigma_v^2$ and $E(v_iv_j) = 0$ for $i \neq j$.
The above relationship states the simplest possible form of autocorrelation. If we apply OLS to the model given in (3.30), we obtain:
$$\hat\rho = \frac{\sum_{t=2}^{n} u_t u_{t-1}}{\sum_{t=2}^{n} u_{t-1}^2} \;\;------\; 3.31$$
Given that for large samples $\sum u_t^2 \approx \sum u_{t-1}^2$, we observe that the coefficient of autocorrelation $\rho$ represents a simple correlation coefficient r:
$$\hat\rho = \frac{\sum_{t=2}^{n} u_t u_{t-1}}{\sum_{t=2}^{n} u_{t-1}^2} \approx \frac{\sum_{t=2}^{n} u_t u_{t-1}}{\sqrt{\sum_{t=2}^{n} u_t^2}\,\sqrt{\sum_{t=2}^{n} u_{t-1}^2}} = r_{u_tu_{t-1}} \;\text{(Why?)}\;------\; 3.32$$
$$-1 \leq \hat\rho \leq 1, \;\text{ since } -1 \leq r \leq 1 \;\;------\; 3.33$$
This proves the statement "we can treat autocorrelation in the same way as correlation in general". From statistics:
 if r = 1, we call it perfect positive correlation;
 if r = -1, perfect negative correlation; and
 if r = 0, there is no correlation.
By the same analogy:
- if $\hat\rho = 1$, there is perfect positive autocorrelation;
- if $\hat\rho = -1$, perfect negative autocorrelation; and
- if $\hat\rho = 0$, no autocorrelation.
4.2.7 Effect of Autocorrelation on OLS Estimators.
If the error terms are correlated, the following consequences follow:
1. The OLS estimators are still unbiased and consistent.
2. They are still normally distributed in large samples.
3. But they are no longer efficient. That is, they are no longer BLUE (best linear unbiased
estimator). In most cases OLS standard errors are underestimated, which means the
estimated t-values are inflated, giving the appearance that a coefficient is more
significant than it actually may be.
4. As a result, as in the case of heteroscedasticity, the hypothesis-testing procedure becomes
suspect, since the estimated standard errors may not be reliable, even asymptotically (i.e.
in large samples). In consequence, the usual t- test and F- tests may not be valid.
That is, if $var(\hat\beta)$ is underestimated, $SE(\hat\beta)$ is also underestimated, which makes the t-ratio large. This large t-ratio may make $\hat\beta$ appear statistically significant when it is not, leading to wrong predictions and inferences about the characteristics of the population.
4.2.8 Detection (Testing) of Autocorrelation
There are two methods that are commonly used to detect the existence or absence of
autocorrelation in the disturbance terms. These are:
1. Graphic method
Detection of autocorrelation using graphs will be based on two ways.
a. Apply OLS to the given data (whether autocorrelated or not) and obtain the residuals. Plot $e_t$ horizontally and $e_{t-1}$ vertically, i.e. plot the observations $(e_1, e_2), (e_2, e_3), (e_3, e_4), \ldots, (e_{n-1}, e_n)$.
 If it is found that most of the points fall in quadrant I and III, as shown in fig (a) below,
we say that the given data is autocorrelated and the type of autocorrelation is positive
autocorrelation.
 If most of the points fall in quadrant II and IV, as shown in fig (b) below the
autocorrelation is said to be negative. But if the points are scattered equally in all the
quadrants as shown in fig (c) below, then we say there is no autocorrelation in the given
data.
2. Formal testing method
It is based on a standard test statistic (e.g., the t, F, or $\chi^2$ statistic). If a test applies any of these, it is called a formal testing method. The most frequently and widely used testing methods by researchers are the following:
A. Run test:
A run is an uninterrupted sequence of one sign of the error term, with the residuals arranged in sequence according to the values of the explanatory variable, as in "++++++++-------------++++++++------------++++++".
If there are too many runs, it would mean the $\hat{u}$'s change sign frequently, thus indicating negative serial correlation. Similarly, if there are too few runs, they may suggest positive autocorrelation.
Now let: $n$ = total number of observations = $n_1 + n_2$; $n_1$ = number of + symbols; $n_2$ = number of - symbols; and $k$ = number of runs.
Under the null hypothesis that successive outcomes (here, residuals) are independent, and assuming that $n_1 > 10$ and $n_2 > 10$, the number of runs is (asymptotically) normally distributed with:
$$\text{Mean: } E(k) = \frac{2n_1n_2}{n_1 + n_2} + 1$$
$$\text{Variance: } \sigma_k^2 = \frac{2n_1n_2(2n_1n_2 - n_1 - n_2)}{(n_1 + n_2)^2(n_1 + n_2 - 1)}$$
Decision rule:
 Do not reject the null hypothesis of randomness or independence with 95% confidence if $E(k) - 1.96\sigma_k \leq k \leq E(k) + 1.96\sigma_k$;
 reject the null hypothesis if the estimated k lies outside these limits.
In a hypothetical example with $n_1 = 14$, $n_2 = 18$ and $k = 5$, we obtain:
$$E(k) = 16.75, \quad \sigma_k^2 = 7.49395 \;\Rightarrow\; \sigma_k = 2.7375$$
Hence the 95% confidence interval is:
$$16.75 \pm 1.96(2.7375) = [11.3845,\; 22.1155]$$
Since k = 5 clearly falls outside this interval, we can reject the hypothesis that the observed sequence of residuals is random (independent) with 95% confidence.
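A sketch of these runs-test computations (Python with numpy assumed; the function name is illustrative):

```python
import numpy as np

def runs_test(residuals):
    """Asymptotic runs test on the signs of the residuals."""
    signs = np.sign(residuals)
    signs = signs[signs != 0]                   # drop exact zeros, if any
    n1 = np.sum(signs > 0)                      # number of + symbols
    n2 = np.sum(signs < 0)                      # number of - symbols
    k = 1 + np.sum(signs[1:] != signs[:-1])     # number of runs
    mean_k = 2 * n1 * n2 / (n1 + n2) + 1
    var_k = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
             / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    lower = mean_k - 1.96 * np.sqrt(var_k)
    upper = mean_k + 1.96 * np.sqrt(var_k)
    return k, (lower, upper), not (lower <= k <= upper)  # True => reject randomness
```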
B. The Durbin-Watson d test:
The most celebrated test for detecting serial correlation was developed by the statisticians Durbin and Watson. It is popularly known as the Durbin-Watson d statistic, which is defined as:
$$d = \frac{\sum_{t=2}^{n}(e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2} \;\;------\; 3.47$$
Note that, in the numerator of the d statistic, the number of observations is $n - 1$ because one observation is lost in taking successive differences.
It is important to note the assumptions underlying the d-statistic:
1. The regression model includes an intercept term. If such a term is not present, as in the case of regression through the origin, it is essential to rerun the regression including the intercept term to obtain the RSS.
2. The explanatory variables, the X's, are non-stochastic, or fixed in repeated sampling.
3. The disturbances $U_t$ are generated by the first-order autoregressive scheme:
$$u_t = \rho u_{t-1} + v_t$$
4. The regression model does not include lagged values of the dependent variable as explanatory variables. Thus, the test is inapplicable to models of the following type:
$$y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \cdots + \beta_k X_{kt} + \gamma y_{t-1} + U_t$$
where $y_{t-1}$ is the one-period lagged value of y; such models are known as autoregressive models. If the d-test is mistakenly applied to them, the value of d will often be around 2, which is the value of d expected in the absence of first-order autocorrelation. Durbin developed the so-called h-statistic to test serial correlation in such autoregressive models.
5. There are no missing observations in the data.
In using the Durbin-Watson test it is, therefore, important to note that it cannot be applied if any of the above five assumptions is violated.
t n

 (e t  et 1 ) 2
From equation 3.47 the value of d  t 2
t n

e
t 1
2
t

Squaring the numerator of the above equation, we obtain


n n

 et2   et21  2et et 1


d t 2 t 2
------------------3.48
et2
n n
However, for large samples  et2   et21 because in both cases one observation is lost.
t 2 t 2

Thus,
d  2(1  ˆ )
From the above relation, therefore:
$$\text{if } \hat\rho = 0,\; d = 2; \quad \text{if } \hat\rho = 1,\; d = 0; \quad \text{if } \hat\rho = -1,\; d = 4$$
Thus we obtain two important conclusions:
i. the values of d lie between 0 and 4;
ii. if there is no autocorrelation, $\hat\rho = 0$ and $d = 2$.
Whenever, therefore, the calculated value of d turns out to be sufficiently close to 2, we accept the null hypothesis, and if it is close to zero or four, we reject the null hypothesis that there is no autocorrelation.
However, because the exact sampling distribution of d is not known, there exist ranges of values within which we can either accept or reject the null hypothesis; we do not have a unique critical value of the d-statistic. Instead we have $d_L$ (a lower bound) and $d_U$ (an upper bound) on the critical values of d for accepting or rejecting the null hypothesis.
For the two-tailed Durbin-Watson test, we have five regions for the values of d, as depicted in the figure (not reproduced here).
The mechanisms of the D-W test are as follows, assuming that the assumptions underlying the
tests are fulfilled.
 Run the OLS regression and obtain the residuals
 Obtain the computed value of d using the formula given in equation (3.47).
 For the given sample size and given number of explanatory variables, find the critical $d_L$ and $d_U$ values.
 Now follow the decision rules given below.
1. If d is less than $d_L$ or greater than $(4 - d_L)$, we reject the null hypothesis of no autocorrelation in favor of the alternative, which implies the existence of autocorrelation.
2. If d lies between $d_U$ and $(4 - d_U)$, accept the null hypothesis of no autocorrelation.
3. If, however, the value of d lies between $d_L$ and $d_U$, or between $(4 - d_U)$ and $(4 - d_L)$, the D-W test is inconclusive.
Example 1. Suppose for a hypothetical model $Y = \alpha + \beta X + U_i$ we found:
$$d = 0.1380; \quad d_L = 1.37; \quad d_U = 1.50$$
Based on these values, test for autocorrelation.
Solution: First compute $(4 - d_L)$ and $(4 - d_U)$ and compare the computed value of d with $d_L$, $d_U$, $(4 - d_L)$ and $(4 - d_U)$:
$$(4 - d_L) = 4 - 1.37 = 2.63$$
$$(4 - d_U) = 4 - 1.50 = 2.50$$
Since d is less than $d_L$, we reject the null hypothesis of no autocorrelation.
Example 2. Consider the model $Y_t = \alpha + \beta X_t + U_t$ with the following observations on X and Y:
X 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Y 2 2 2 1 3 5 6 6 10 10 10 12 15 10 11
Test for autocorrelation using the Durbin-Watson method.
Solution:
1. Regress Y on X, i.e. $Y_t = \alpha + \beta X_t + U_t$. From the table we can compute the following values:
$$\sum xy = 255, \quad \bar{Y} = 7, \quad \sum(e_t - e_{t-1})^2 = 60.21$$
$$\sum x^2 = 280, \quad \bar{X} = 8, \quad \sum e_t^2 = 41.767, \quad \sum y^2 = 274$$
$$\hat\beta = \frac{\sum xy}{\sum x^2} = \frac{255}{280} = 0.91$$
$$\hat\alpha = \bar{Y} - \hat\beta\bar{X} = 7 - 0.91(8) = -0.29$$
$$\hat{Y} = -0.29 + 0.91X, \quad R^2 = 0.85$$
$$d = \frac{\sum(e_t - e_{t-1})^2}{\sum e_t^2} = \frac{60.213}{41.767} = 1.442$$
The values of $d_L$ and $d_U$ at the 5% level of significance, with n = 15 and one explanatory variable, are $d_L = 1.08$ and $d_U = 1.36$, so $(4 - d_U) = 2.64$.
Since $d^* = 1.442$ lies in the range $d_U < d < 4 - d_U$ (i.e. between 1.36 and 2.64), we accept $H_0$. This implies the data are not autocorrelated.
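A sketch reproducing the computation of d in Example 2 (Python with numpy assumed; the X and Y series are those in the table above):

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], dtype=float)
Y = np.array([2, 2, 2, 1, 3, 5, 6, 6, 10, 10, 10, 12, 15, 10, 11], dtype=float)

# Simple OLS in deviation form
b = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
a = Y.mean() - b * X.mean()
e = Y - (a + b * X)                           # residuals

d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)  # Durbin-Watson d, equation (3.47)
# Compare d with the tabulated bounds dL and dU for n = 15 and one regressor.
```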
Although the D-W test is extremely popular, it has one great drawback: if d falls in the inconclusive zone or region, one cannot conclude whether autocorrelation does or does not exist. Several authors have proposed modifications of the D-W test.
In many situations, however, it has been found that the upper limit $d_U$ is approximately the true significance limit. Thus, the modified D-W test is based on $d_U$: in case the estimated d value lies in the inconclusive zone, one can use the following modified d test procedure. Given the level of significance $\alpha$:
1. $H_0: \rho = 0$ versus $H_1: \rho > 0$: if the estimated $d < d_U$, reject $H_0$ at level $\alpha$; that is, there is statistically significant positive autocorrelation.
2. $H_0: \rho = 0$ versus $H_1: \rho \neq 0$: if the estimated $d < d_U$ or $(4 - d) < d_U$, reject $H_0$ at level $2\alpha$; statistically, there is significant evidence of autocorrelation, positive or negative.
4.2.9 Remedial Measures for the problems of Autocorrelation
Since in the presence of serial correlation the OLS estimators are inefficient, it is essential to
seek remedial measures. The remedy however depends on what knowledge one has about the
nature of interdependence among the disturbances. This means the remedy depends on whether
the coefficient of autocorrelation is known or not known.
A. When $\rho$ is known: When the structure of autocorrelation is known, i.e. $\rho$ is known, the appropriate corrective procedure is to transform the original model or data so that the error term of the transformed model or data is non-autocorrelated. When we transform, we wipe out the effect of $\rho$.
Suppose that our model is:
$$Y_t = \alpha + \beta X_t + U_t \;\;------\; 3.49$$
$$\text{and } U_t = \rho U_{t-1} + V_t, \quad |\rho| < 1 \;\;------\; 3.50$$
Equation (3.50) indicates the existence of autocorrelation. If $\rho$ is known, we can transform equation (3.49) into one that is not autocorrelated. The procedure of transformation is given below.
Take the lagged form of equation (3.49) and multiply through by $\rho$:
$$\rho Y_{t-1} = \rho\alpha + \rho\beta X_{t-1} + \rho U_{t-1} \;\;------\; 3.51$$
Subtracting (3.51) from (3.49), we have:
$$Y_t - \rho Y_{t-1} = \alpha(1 - \rho) + \beta(X_t - \rho X_{t-1}) + (U_t - \rho U_{t-1}) \;\;------\; 3.52$$
By rearranging the terms in (3.50), we have $V_t = U_t - \rho U_{t-1}$, which on substituting for the last term of (3.52) gives:
$$Y_t - \rho Y_{t-1} = \alpha(1 - \rho) + \beta(X_t - \rho X_{t-1}) + v_t \;\;------\; 3.53$$
Let: $Y_t^* = Y_t - \rho Y_{t-1}$, $\quad a = \alpha(1 - \rho)$, $\quad X_t^* = X_t - \rho X_{t-1}$.
Equation (3.53) may then be written as:
$$Y_t^* = a + \beta X_t^* + v_t \;\;------\; (3.54)$$
It may be noted that in transforming equation (3.49) into (3.54), one observation is lost because of the lagging and subtracting in (3.52). We can apply OLS to the transformed relation (3.54) to obtain $\hat{a}$ and $\hat\beta$ for our two parameters $\alpha$ and $\beta$, with
$$\hat\alpha = \frac{\hat{a}}{1 - \rho}, \quad \text{and it can be shown that} \quad var(\hat\alpha) = \left(\frac{1}{1-\rho}\right)^2 var(\hat{a})$$
because $\hat\alpha$ is perfectly and linearly related to $\hat{a}$. Again, since $v_t$ satisfies all the standard assumptions, the variances of $\hat{a}$ and $\hat\beta$ are given by our standard OLS formulas:
$$var(\hat{a}) = \frac{\sigma_v^2 \sum X_t^{*2}}{n\sum (X_t^* - \bar{X}^*)^2}, \qquad var(\hat\beta) = \frac{\sigma_v^2}{\sum (X_t^* - \bar{X}^*)^2}$$
The estimators obtained from (3.54) are efficient only if our sample size is large, so that the loss of one observation becomes negligible.
B. When $\rho$ is not known
When $\rho$ is not known, we describe below the methods through which the coefficient of autocorrelation can be estimated.
Method I: A priori information on $\rho$
Many times an investigator makes some reasonable guess about the value of the autoregressive coefficient by using his knowledge or intuition about the relationship under study. Many researchers usually assume that $\rho = 1$ or $-1$.
Under this method, the process of transformation is the same as when $\rho$ is known.
When $\rho = 1$, the transformed model becomes:
$$(Y_t - Y_{t-1}) = \beta(X_t - X_{t-1}) + V_t; \quad \text{where } V_t = U_t - U_{t-1}$$
Note that the constant term is suppressed in the above. $\hat\beta$ is obtained by merely taking the first differences of the variables and fitting a line that passes through the origin. Suppose instead that one assumes $\rho = -1$, i.e. the case of perfect negative autocorrelation. In such a case, the transformed model becomes:
$$Y_t + Y_{t-1} = 2\alpha + \beta(X_t + X_{t-1}) + v_t \quad \text{or} \quad \frac{Y_t + Y_{t-1}}{2} = \alpha + \beta\,\frac{(X_t + X_{t-1})}{2} + \frac{v_t}{2}$$
This model is then called a two-period moving average regression model, because we are actually regressing the value of one moving average, $\frac{(Y_t + Y_{t-1})}{2}$, on another, $\frac{(X_t + X_{t-1})}{2}$.
This method of first differencing is quite popular in applied research for its simplicity. But the method rests on the assumption that there is either perfect positive or perfect negative autocorrelation in the data.
Method II: Estimation of $\rho$ from the d-statistic:
From equation (3.47) we obtained $d \approx 2(1 - \hat\rho)$. Suppose we calculate a certain value of the d-statistic from the data. Given the d-value we can estimate $\rho$ from it:
$$d = 2(1 - \hat\rho) \;\Rightarrow\; \hat\rho = 1 - \frac{d}{2}$$
As already pointed out, $\hat\rho$ will not be accurate if the sample size is small; the above relationship is true only for large samples. For small samples, Theil and Nagar have suggested the following relation:
$$\hat\rho = \frac{n^2(1 - d/2) + k^2}{n^2 - k^2} \;\text{………………………………………………..} 3.55$$
where n = total number of observations; d = the Durbin-Watson statistic; and k = the number of coefficients (including the intercept term). Using this value of $\hat\rho$ we can perform the transformation above to remove autocorrelation from the model.
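A small sketch of both estimates of $\rho$ from d (Python; the function name is illustrative):

```python
def rho_from_d(d, n=None, k=None):
    """Estimate rho from the Durbin-Watson d: rho = 1 - d/2 for large samples,
    or the Theil-Nagar small-sample relation (3.55) when n and k are supplied."""
    if n is None or k is None:
        return 1 - d / 2
    return (n ** 2 * (1 - d / 2) + k ** 2) / (n ** 2 - k ** 2)
```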
Method III: The Cochrane-Orcutt iterative procedure: In this method, we remove autocorrelation gradually, starting from the simplest form of a first-order scheme. First we obtain the residuals and apply OLS to them:
$$e_t = \rho e_{t-1} + v_t \;\text{…………………………………………………….} 3.56$$
We estimate $\hat\rho$ from the above relation. With the estimated $\hat\rho$, we transform the original data and then apply OLS to the model:
$$(Y_t - \hat\rho Y_{t-1}) = \alpha(1 - \hat\rho) + \beta(X_t - \hat\rho X_{t-1}) + (u_t - \hat\rho u_{t-1}) \;\text{……………......…} 3.57$$
We once again apply OLS to the newly obtained residuals:
$$e_t^* = \rho e_{t-1}^* + w_t \;\text{……………………………………………………………} 3.58$$
We use this second estimate $\hat{\hat\rho}$ to transform the original observations, and we keep proceeding in this way until the value of the estimate of $\rho$ converges. It can be shown that the procedure is convergent. When the data are transformed only using the second-stage estimate of $\rho$, the procedure is called the two-stage Cochrane-Orcutt method. Alternatively, one can apply at each step of the iteration the Durbin-Watson d-statistic to the residuals to test for autocorrelation, and stop when the estimates of $\rho$ do not differ substantially from one another.
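A sketch of the iterative Cochrane-Orcutt procedure (Python with numpy and statsmodels assumed; names are illustrative, and the convergence handling is deliberately simple):

```python
import numpy as np
import statsmodels.api as sm

def cochrane_orcutt(y, x, tol=1e-6, max_iter=50):
    """Iterative Cochrane-Orcutt estimation of Y_t = a + b X_t + u_t
    with AR(1) errors u_t = rho*u_{t-1} + v_t."""
    rho = 0.0
    for _ in range(max_iter):
        # Transform the data with the current rho (one observation is lost).
        y_star = y[1:] - rho * y[:-1]
        x_star = x[1:] - rho * x[:-1]
        a_star, b = sm.OLS(y_star, sm.add_constant(x_star)).fit().params
        a = a_star / (1 - rho) if rho != 1 else a_star
        # Re-estimate rho from the residuals of the original model, as in (3.56).
        e = y - (a + b * x)
        rho_new = np.sum(e[1:] * e[:-1]) / np.sum(e[:-1] ** 2)
        if abs(rho_new - rho) < tol:
            break
        rho = rho_new
    return a, b, rho_new
```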
Method IV: Durbin's two-stage method: Assuming the first-order autoregressive scheme, Durbin suggests a two-stage procedure for resolving the serial correlation problem. The steps under this method are:
Given $Y_t = \alpha + \beta X_t + u_t \;------\; (3.59)$, with $U_t = \rho U_{t-1} + v_t$:
1. Take the lagged form of the above and multiply by $\rho$:
$$\rho Y_{t-1} = \rho\alpha + \rho\beta X_{t-1} + \rho u_{t-1} \;\;------\; (3.60)$$
2. Subtract (3.60) from (3.59):
$$Y_t - \rho Y_{t-1} = \alpha(1 - \rho) + \beta(X_t - \rho X_{t-1}) + u_t - \rho u_{t-1} \;\;------\; (3.61)$$
3. Rewrite (3.61) in the following form:
$$Y_t = \alpha(1 - \rho) + \rho Y_{t-1} + \beta X_t - \rho\beta X_{t-1} + v_t$$
$$Y_t = \alpha^* + \rho Y_{t-1} + \beta X_t - \rho\beta X_{t-1} + v_t$$
This equation is now treated as a regression equation with three explanatory variables, $X_t$, $X_{t-1}$ and $Y_{t-1}$. It provides an estimate of $\rho$ (the coefficient on $Y_{t-1}$), which is used to construct the new variables $(Y_t - \hat\rho Y_{t-1})$ and $(X_t - \hat\rho X_{t-1})$. In the second step, estimators of $\alpha$ and $\beta$ are obtained from the regression equation:
$$(Y_t - \hat\rho Y_{t-1}) = \alpha^* + \beta(X_t - \hat\rho X_{t-1}) + u_t^*; \quad \text{where } \alpha^* = \alpha(1 - \rho)$$
4.3 Multicollinearity
4.3.1 The nature of Multicollinearity
Multicollinearity means the existence of a 'perfect' or 'exact' linear relationship among some or all of the explanatory variables of a regression model. For a regression model involving k explanatory variables $x_1, x_2, \ldots, x_k$, an exact linear relationship is said to exist if the following condition is satisfied:
$$\lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_k x_k + v_i = 0 \;\;------\; (1)$$
where $\lambda_1, \lambda_2, \ldots, \lambda_k$ are constants such that not all of them are simultaneously zero (the relationship is exact when $v_i = 0$, and near-exact otherwise).
Note that multicollinearity refers only to linear relationships among the explanatory variables; it does not rule out non-linear relationships among the x-variables.
For example: $Y = \alpha + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 + v_i \;------\; (3.31)$
where Y is total cost and X is output.
The variables $x_i^2$ and $x_i^3$ are obviously functionally related to $x_i$, but the relationship is non-linear. Strictly, therefore, models such as (3.31) do not violate the assumption of no multicollinearity. However, in concrete applications, the conventionally measured correlation coefficient will show $x_i$, $x_i^2$ and $x_i^3$ to be highly correlated, which, as we shall show, will make it difficult to estimate the parameters with great precision (i.e. with small standard errors).
4.3.2 Reasons for Multicollinearity
1. The data collection method employed: Example: If we regress on small sample values of
the population; there may be multicollinearity but if we take all the possible values, it
may not show multicollinearity.
2. Constraints on the model or on the population being sampled.
For example: in the regression of electricity consumption on income (x₁) and house size (x₂), there is a physical constraint in the population, in that families with higher incomes generally have larger homes than families with lower incomes.
3. Overdetermined model: This happens when the model has more explanatory variables
than the number of observations. This could happen in medical research where there may
be a small number of patients about whom information is collected on a large number of
variables.
4.3.3 Consequences of Multicollinearity
Why does the classical linear regression model put the assumption of no multicollinearity among
the X’s? It is because of the following consequences of multicollinearity on OLS estimators.
1. If multicollinearity is perfect, the regression coefficients are indeterminate and their
standard errors are infinite.
Proof: Consider a multiple regression model in deviation form:
$$y_i = \hat\beta_1 x_{1i} + \hat\beta_2 x_{2i} + e_i$$
Recall the formulas of $\hat\beta_1$ and $\hat\beta_2$ from our discussion of multiple regression:
$$\hat\beta_1 = \frac{\sum x_1y\,\sum x_2^2 - \sum x_2y\,\sum x_1x_2}{\sum x_1^2\,\sum x_2^2 - (\sum x_1x_2)^2}$$
$$\hat\beta_2 = \frac{\sum x_2y\,\sum x_1^2 - \sum x_1y\,\sum x_1x_2}{\sum x_1^2\,\sum x_2^2 - (\sum x_1x_2)^2}$$
Assume $x_2 = \lambda x_1 \;------\; 3.32$, where $\lambda$ is a non-zero constant. Substituting (3.32) into the formula for $\hat\beta_1$:
$$\hat\beta_1 = \frac{\sum x_1y\,\lambda^2\sum x_1^2 - \lambda\sum x_1y\,\lambda\sum x_1^2}{\sum x_1^2\,\lambda^2\sum x_1^2 - \lambda^2(\sum x_1^2)^2} = \frac{0}{0}, \text{ which is indeterminate.}$$
Applying the same procedure, we obtain a similar result (an indeterminate value) for $\hat\beta_2$.
Likewise, from our discussion of the multiple regression model, the variance of $\hat\beta_1$ is given by:
$$var(\hat\beta_1) = \frac{\sigma^2 \sum x_2^2}{\sum x_1^2\,\sum x_2^2 - (\sum x_1x_2)^2}$$
Substituting $x_2 = \lambda x_1$ in the above variance formula, we get:
$$var(\hat\beta_1) = \frac{\sigma^2 \lambda^2\sum x_1^2}{\lambda^2(\sum x_1^2)^2 - \lambda^2(\sum x_1^2)^2} = \frac{\sigma^2 \lambda^2\sum x_1^2}{0} \to \infty, \text{ i.e. infinite.}$$
These are the consequences of perfect multicollinearity. One may raise the question of the consequences of less than perfect correlation. In cases of near or high multicollinearity, one is likely to encounter the following consequences.
2. If multicollinearity is less than perfect (i.e. near or high multicollinearity), the regression coefficients are determinate.
Proof: Consider the two explanatory variable model above in deviation form.
If we assume $x_2 = \lambda x_1$, it indicates perfect correlation between $x_1$ and $x_2$, because the change in $x_2$ is completely due to the change in $x_1$. Instead of exact multicollinearity, we may have:
$$x_{2i} = \lambda x_{1i} + v_i, \quad \text{where } \lambda \neq 0 \text{ and } v_i \text{ is a stochastic error term such that } \sum x_{1i}v_i = 0.$$
In this case $x_2$ is not only determined by $x_1$ but is also affected by other factors captured by $v_i$ (the stochastic error term).
Substituting $x_{2i} = \lambda x_{1i} + v_i$ in the formula for $\hat\beta_1$ above (and using $\sum x_2^2 = \lambda^2\sum x_1^2 + \sum v_i^2$, $\sum x_1x_2 = \lambda\sum x_1^2$ and $\sum x_2y = \lambda\sum x_1y + \sum y_iv_i$):
$$\hat\beta_1 = \frac{\sum x_1y\,(\lambda^2\sum x_1^2 + \sum v_i^2) - (\lambda\sum x_1y + \sum y_iv_i)\,\lambda\sum x_1^2}{\sum x_1^2\,(\lambda^2\sum x_1^2 + \sum v_i^2) - \lambda^2(\sum x_1^2)^2} = \frac{\sum x_1y\,\sum v_i^2 - \lambda\sum y_iv_i\,\sum x_1^2}{\sum x_1^2\,\sum v_i^2} \neq \frac{0}{0}, \text{ which is determinate.}$$
This proves that if we have less than perfect multicollinearity, the OLS coefficients are determinate.
The implication of the indeterminacy of the regression coefficients in the case of perfect multicollinearity is that it is not possible to observe the separate influences of $x_1$ and $x_2$. But such an extreme case is not very frequent in practical applications; most data exhibit less than perfect multicollinearity.
3. If multicollinearity is less than perfect (i.e. near or high multicollinearity), the OLS estimators retain the property of BLUE.
- Hence, as long as the basic assumptions needed to prove the BLUE property are not violated, the OLS estimators are BLUE whether multicollinearity exists or not.
4. Although BLUE, the OLS estimators have large variances and covariances.
$$var(\hat\beta_1) = \frac{\sigma^2 \sum x_2^2}{\sum x_1^2\sum x_2^2 - (\sum x_1x_2)^2}$$
Multiplying the numerator and the denominator by $\frac{1}{\sum x_2^2}$:
$$var(\hat\beta_1) = \frac{\sigma^2}{\sum x_1^2 - \frac{(\sum x_1x_2)^2}{\sum x_2^2}} = \frac{\sigma^2}{\sum x_1^2\left(1 - \frac{(\sum x_1x_2)^2}{\sum x_1^2\sum x_2^2}\right)} = \frac{\sigma^2}{\sum x_1^2\,(1 - r_{12}^2)}$$
where $r_{12}^2$ is the square of the correlation coefficient between $x_1$ and $x_2$.

If $x_2 = \lambda x_{1i} + v_i$, what happens to the variance of $\hat\beta_1$ as $r_{12}^2$ rises?
As $r_{12}$ tends to 1, i.e. as collinearity increases, the variance of the estimator increases, and in the limit when $r_{12} = 1$ the variance of $\hat\beta_1$ becomes infinite.
Similarly,
$$cov(\hat\beta_1, \hat\beta_2) = \frac{-r_{12}\,\sigma^2}{(1 - r_{12}^2)\sqrt{\sum x_1^2\,\sum x_2^2}} \;\text{(why?)}$$
As $r_{12}$ increases toward one, the covariance of the two estimators increases in absolute value. The speed with which the variances and covariance increase can be seen with the variance-inflating factor (VIF), which is defined as:
$$VIF = \frac{1}{1 - r_{12}^2}$$
VIF shows how the variance of an estimator is inflated by the presence of multicollinearity. As $r_{12}^2$ approaches 1, the VIF approaches infinity: as the extent of collinearity increases, the variance of an estimator increases, and in the limit the variance becomes infinite. If there is no multicollinearity between $x_1$ and $x_2$, VIF = 1.
Using this definition we can express $var(\hat\beta_1)$ and $var(\hat\beta_2)$ in terms of VIF:
$$var(\hat\beta_1) = \frac{\hat\sigma^2}{\sum x_1^2}\,VIF \quad \text{and} \quad var(\hat\beta_2) = \frac{\hat\sigma^2}{\sum x_2^2}\,VIF$$
which shows that the variances of $\hat\beta_1$ and $\hat\beta_2$ are directly proportional to the VIF.
5. Because of the large variances of the estimators, which mean large standard errors, the confidence intervals tend to be much wider, leading to the acceptance of the 'zero null hypothesis' (i.e. that the true population coefficient is zero) more readily.
6. Because of the large standard errors of the estimators, the computed t-ratios will be very small, leading one or more of the coefficients to appear statistically insignificant when tested individually.
7. Although the t-ratio of one or more of the coefficients is very small (which makes the coefficients statistically insignificant individually), $R^2$, the overall measure of goodness of fit, can be very high.
Example: if $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + v_i$, then in cases of high collinearity it is possible to find that one or more of the partial slope coefficients are individually statistically insignificant on the basis of the t-test, yet the $R^2$ may be so high that on the basis of the F test one can convincingly reject the hypothesis that $\beta_1 = \beta_2 = \cdots = \beta_k = 0$. Indeed, this is one of the signals of multicollinearity: insignificant t-values but a high overall $R^2$ (i.e. a significant F-value).
8. The OLS estimators and their standard errors can be sensitive to small changes in the data.
4.3.4 Detection of Multicollinearity
A recognizable set of symptoms for the existence of multicollinearity is:
a. a high coefficient of determination ($R^2$);
b. high correlation coefficients among the explanatory variables (the $r_{x_ix_j}$'s);
c. large standard errors and small t-ratios of the regression parameters.
Note: none of these symptoms by itself is a satisfactory indicator of multicollinearity, because:
i) large standard errors may arise for various reasons, and not only because of the presence of linear relationships among the explanatory variables;
ii) a high $r_{x_ix_j}$ is a sufficient but not a necessary condition for the existence of multicollinearity, because multicollinearity can exist even if the correlation coefficients are low.
However, the combination of all these criteria should help the detection of multicollinearity.
4.3.4.1 Test Based on Auxiliary Regressions:
Since multicollinearity arises because one or more of the regressors are exact or approximate linear combinations of the other regressors, one way of finding out which X variable is related to the other X variables is to regress each $X_i$ on the remaining X variables and compute the corresponding $R^2$, which we designate $R_i^2$; each of these regressions is called an auxiliary regression, auxiliary to the main regression of Y on the X's. Then, following the relationship between F and $R^2$ established in chapter three under overall significance, the variable
$$F_i = \frac{R^2_{x_i\cdot x_1,x_2,\ldots,x_k}/(k-2)}{(1 - R^2_{x_i\cdot x_1,x_2,\ldots,x_k})/(n-k+1)} \sim F_{(k-2,\; n-k+1)}$$
where n is the number of observations and k is the number of parameters including the intercept.
If the computed F exceeds the critical F at the chosen level of significance, it is taken to mean that the particular $X_i$ is collinear with the other X's; if it does not exceed the critical F, we say that it is not collinear with the other X's, in which case we may retain the variable in the model. If $F_i$ is statistically significant, we will still have to decide whether the particular $X_i$ should be dropped from the model.
Note that according to Klein's rule of thumb, multicollinearity may be a troublesome problem only if the $R^2$ obtained from an auxiliary regression is greater than the overall $R^2$, that is, the one obtained from the regression of Y on all the regressors.
4.3.4.2 The Farrar-Glauber test: Farrar and Glauber use three statistics for testing multicollinearity: chi-square, F-ratio and t-ratio. The test may be outlined in three steps.
A. Computation of $\chi^2$ to test orthogonality: two variables are called orthogonal if $r_{x_ix_j} = 0$, i.e. if there is no collinearity between them. In our discussion of multiple regression models, we have seen the matrix representation of a three explanatory variable model, which is given by:
$$x'x = \begin{bmatrix} \sum x_1^2 & \sum x_1x_2 & \sum x_1x_3 \\ \sum x_2x_1 & \sum x_2^2 & \sum x_2x_3 \\ \sum x_3x_1 & \sum x_3x_2 & \sum x_3^2 \end{bmatrix}$$
Dividing each element $\sum x_ix_j$ of $x'x$ by $\sqrt{\sum x_i^2\,\sum x_j^2}$ and computing the determinant, we obtain the standardized (correlation) determinant:
$$\begin{vmatrix} 1 & r_{12} & r_{13} \\ r_{12} & 1 & r_{23} \\ r_{13} & r_{23} & 1 \end{vmatrix}$$
The value of this determinant is equal to zero in the case of perfect multicollinearity (since then $r_{ij} = 1$). On the other hand, in the case of orthogonality of the x's, $r_{ij} = 0$ and the value of the determinant is unity. It follows, therefore, that if the value of this determinant lies between zero and unity, there exists some degree of multicollinearity. For detecting the degree of multicollinearity over the whole set of explanatory variables, Farrar and Glauber suggest the following $\chi^2$ test:
$$H_0: \text{the x's are orthogonal (i.e. } r_{x_ix_j} = 0)$$
$$H_1: \text{the x's are not orthogonal (i.e. } r_{x_ix_j} \neq 0)$$
Farrar and Glauber have found that the quantity
$$\chi^2 = -\left[n - 1 - \tfrac{1}{6}(2k+5)\right]\cdot \log_e\{\text{value of the standardized determinant}\}$$
has a $\chi^2$ distribution with $\tfrac{1}{2}k(k-1)$ df. If the computed $\chi^2$ is greater than the critical value of $\chi^2$, reject $H_0$ in favour of multicollinearity; but if it is less, accept $H_0$.

B. Computation of t-ratio to test the pattern of multicollinearity


The t-test helps to detect those variables which are the cause of multicollinearity. This test is
performed based on the partial correlation coefficients through the following procedure of
hypothesis.
H 0 : rxi x j . x1 , x2 , x3 ,...xk  0
H1 : rxi x j . x1 , x2 , x3 ,...xk  0
In the three variable model
r2x1x2·x3 = (r12 − r13r23)2 / [(1 − r132)(1 − r232)]   (How?)
r2x1x3·x2 = (r13 − r12r23)2 / [(1 − r232)(1 − r122)]
r2x2x3·x1 = (r23 − r12r13)2 / [(1 − r132)(1 − r122)]
t* = (rxixj·x1,x2,…,xk · √(n − k)) / √(1 − r2xixj·x1,x2,…,xk)   (How?)
If t* > t (tabulated), H0 is rejected: Xi and Xj are a cause of multicollinearity.
If t* < t (tabulated), H0 is accepted: Xi and Xj are not a cause of
multicollinearity (since rxixj is not significant).
4.3.4.3 Test of multicollinearity using eigenvalues and condition index:
Using eigenvalues we can derive a number called the condition number K as follows:
K = (maximum eigenvalue) / (minimum eigenvalue)
In addition, using these values we can derive the condition index (CI), defined as
CI = √[(max. eigenvalue) / (min. eigenvalue)] = √K
Decision rule: if K is between 100 and 1000 there is moderate to strong multicollinearity, and if
it exceeds 1000 there is severe multicollinearity. Alternatively, if CI (= √K) is between 10 and
30, there is moderate to strong multicollinearity, and if it exceeds 30 there is severe
multicollinearity.
Example: if K = 123,864 and CI = 352, this suggests the existence of severe multicollinearity.
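A small numerical sketch (simulated data, hypothetical variable names) of how K and CI could be computed from the eigenvalues of X'X with numpy:

```python
# Condition number K and condition index CI from the eigenvalues of X'X.
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + 0.01 * rng.normal(size=100)     # nearly collinear with x1
X = np.column_stack([np.ones(100), x1, x2])

eigvals = np.linalg.eigvalsh(X.T @ X)     # eigenvalues of the symmetric X'X
K = eigvals.max() / eigvals.min()
CI = np.sqrt(K)
print(f"K = {K:.1f}, CI = {CI:.1f}")      # CI > 30 signals severe collinearity
```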
4.3.4.4 Test of multicollinearity using Tolerance and variance inflation factor
2  1  2
var(ˆ1 )  2   VIF
x1  1  Ri2  xi2
where Ri2 is the R 2 in the auxiliary regression of Xj on the remaining (k-2) regressors and VIF is
variance inflation factor.
Some authors therefore use the VIF as an indicator of multicollinearity: The larger is the value
of VIFj, the more “troublesome” or collinear is the variable Xj. However, how high should VIF
be before a regressor becomes troublesome? As a rule of thumb, if the VIF of a variable exceeds
10 (this will happen if Rj2 exceeds 0.90), the variable is said to be highly collinear.
Other authors use the measure of tolerance to detect multicollinearity. It is defined as
TOLj = (1 − Rj2) = 1/VIFj
Clearly, TOLj = 1 if Xj is not correlated with the other regressors, whereas it is zero if it is
perfectly related to the other regressors.
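As an illustration, the sketch below computes VIF and tolerance with statsmodels' variance_inflation_factor on simulated, deliberately collinear data:

```python
# VIF and tolerance for each regressor; X includes the intercept column.
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
x2 = x1 + 0.05 * rng.normal(size=100)          # highly collinear with x1
X = np.column_stack([np.ones(100), x1, x2])

for j in range(1, X.shape[1]):                 # skip the constant column
    vif = variance_inflation_factor(X, j)
    print(f"x{j}: VIF = {vif:.1f}, TOL = {1 / vif:.4f}")
```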
VIF (or tolerance) as a measure of collinearity is not free of criticism. As we have seen earlier,
var(β̂j) = (σ2/Σxj2) · VIFj
depends on three factors: σ2, Σxj2 and VIFj. A high VIF can be counterbalanced by a low σ2 or
a high Σxj2. Put differently, a high VIF is neither necessary nor sufficient to get high variances
and high standard errors. Therefore, high multicollinearity, as measured by a high VIF, may not
necessarily cause high standard errors.
4.3.5. Remedial measures
The following corrective procedures have been suggested if the problem of multicollinearity is
found to be serious.
1. Increase the size of the sample:
- Multicollinearity may be avoided or reduced if the size of the sample is increased.
- If the variables are collinear in the population, the procedure of increasing the size of the
sample will not help to reduce multicollinearity.
2. Introduce additional equation in the model:
- The addition of new equation transforms our single equation (original) model to
simultaneous equation model.
- The reduced form method (which is usually applied for estimating simultaneous
equation models) can then be applied to avoid multicollinearity.
3. Use extraneous information: –
- Extraneous information is the information obtained from any other source outside the
sample which is being used for the estimation.
- Extraneous information may be available from economic theory or from some empirical
studies already conducted.
- There are three methods through which extraneous information is utilized in order to
deal with the problem of multicollinearity.
4. Methods of transforming variables: This method is used when the relationship between
certain parameters is known a priori.
PART TWO
ECONOMETRICS TWO
CHAPTER ONE
1. Regression Analysis with Qualitative Information: Binary (or Dummy Variables)
1.1. Describing Qualitative Information
Qualitative factors often come in the form of binary information: a person is female or
male; a person does or does not own a personal computer; a firm offers a certain kind of
employee pension plan or it does not; a state administers capital punishment or it does not. In
all of these examples, the relevant information can be captured by defining a binary variable
or a zero- one variable. In econometrics, binary variables are most commonly called dummy
variables, although this name is not especially descriptive.
In defining a dummy variable, we must decide which event is assigned the value one and
which is assigned the value zero. For example, in a study of individual wage determination,
we might define female to be a binary variable taking on the value one for females and the
value zero for males.
The name in this case indicates the event with the value one. The same information is captured
by defining male to be one if the person is male and zero if the person is female. Either of
these is better than using gender because this name does not make it clear when the
dummy variable is one: does gender =1 correspond to male or female? What we call our
variables is unimportant for getting regression results, but it always helps to choose names
that clarify equations and expositions. Variables that assume such 0 and 1 values are called
dummy variables. Such variables are thus essentially a device to classify data into mutually
exclusive categories such as male or female. Dummy variables usually indicate a dichotomy:
“presence” or “absence”, “yes” or “no”, etc. Such variables indicate a “quality” or
an attribute, such as “male” or “female”, “black” or “white”, “urban” or “non-urban”,
“before” or “after”, “North” or “South”, “East” or “West”, marital status, job category,
region, season, etc. We quantify such variables by artificially assigning values to them (for
example, assigning 0 and 1 to sex, where 0 indicates male and 1 indicates female), and use
them in the regression equation together with the other independent variables. Such variables
are called dummy variables. Alternative names are indicator variables, binary variables,
categorical variables, and dichotomous variables.
Dummy variables can be incorporated in regression models just as easily as quantitative
variables. As a matter of fact, a regression model may contain regressors that are all
exclusively dummy, or qualitative, in nature. Such models are called Analysis of Variance
(ANOVA) models. Regression models in most economic research involve quantitative
explanatory variables in addition to dummy variables. Such models are known as Analysis
of Covariance (ANCOVA) models.
1.2. Dummy as Independent Variables
1.2.1. A Single Dummy Independent Variable
How do we incorporate binary information into regression models? In the simplest case, with
only a single dummy explanatory variable, we just add it as an independent variable in the
equation.
For example, consider the following simple model of hourly wage determination:
wagei = β0 + δ0Di + β1educi + ui …………………… (1.1)
where wage = hourly wage rate, educ = level of education, and
D = 1 if female; 0 otherwise
In model (1.1), only two observed factors affect wage: gender and education. Since D = 1
when the person is female, and D = 0 when the person is male, the parameter δ0 has the
following interpretation: δ0 is the difference in hourly wage between females and
males, given the same amount of education (and the same error term u). Thus, the coefficient
δ0 determines whether there is discrimination against women: if δ0 < 0, then, for the same
level of other factors, women earn less than men on average.
In terms of expectations, if we assume the zero conditional mean assumption E(u|D, educ) = 0,
then
δ0 = E(wage | D = 1, educ) − E(wage | D = 0, educ)
The key here is that the level of education is the same in both expectations; the difference, δ0, is
due to gender only. The situation can be depicted graphically as an intercept shift between males
and females. In Figure 1.1, the case δ0 < 0 is shown, so that men earn a fixed amount more
per hour than women. The difference does not depend on the amount of education, and this
explains why the wage-education profiles for women and men are parallel.
At this point, you may wonder why we do not also include in (1.1) a dummy variable, say D,
which is one for males and zero for females. The reason is that this would be redundant.
In (1.1), the intercept for males is β0, and the intercept for females is β0 + δ0. Since there are
just two groups, we only need two different intercepts. This means that, in addition to β0, we
need to use only one dummy variable; we have chosen to include the dummy variable for
females. Using two dummy variables would introduce perfect collinearity because female +
male = 1, which means that male is a perfect linear function of female. Including dummy
variables for both genders is the simplest example of the so- called dummy variable trap,
which arises when too many dummy variables describe a given number of groups.
Fig. 1.1
Model (1.1) contains one quantitative variable (level of education) and one qualitative
variable (sex) that has two classes (or categories), namely, male and female. What is the
meaning of this equation? Assuming, as usual, that E(ui) = 0, we see that:
Mean salary of female college professors: E(wage | D = 1, educ) = (β0 + δ0) + β1educ …………. (1.2)
Mean salary of male college professors: E(wage | D = 0, educ) = β0 + β1educ ………………….. (1.3)
If the assumption of common slopes is valid, a test of the hypothesis that the two regressions
(1.2) and (1.3) have the same intercept (i.e., there is no sex discrimination) can be made
easily by running regression (1.1) and noting the statistical significance of the estimated
δ0 on the basis of the traditional t test. If the t test shows that it is statistically
significant, we reject the null hypothesis that the male and female college professors' levels
of mean annual salary are the same.
Before proceeding further, note the following features of the dummy variable regression
model considered previously.
1. To distinguish the two categories, male and female, we have introduced only one dummy
variable D. For example, if Di = 1 always denotes a male, then Di = 0 denotes
a female, since there are only two possible outcomes. Hence, one dummy
variable suffices to distinguish two categories. The general rule is this: If a qualitative
variable has 'm' categories, introduce only 'm − 1' dummy variables. In our
example, sex has two categories, and hence we introduced only a single dummy
variable. If this rule is not followed, we shall fall into what might be called the
dummy variable trap, that is, the situation of perfect multicollinearity.
2. The assignment of 1 and 0 values to two categories, such as male and female, is
arbitrary in the sense that in our example we could have assigned D=1 for female and
D=0 for male.
3. The group, category, or classification that is assigned the value of 0 is often referred
to as the base, benchmark, control, comparison, reference, or omitted category . It is
the base in the sense that comparisons are made with that category.
4. The coefficient attached to the dummy variable D can be called the differential
intercept coefficient because it tells by how much the value of the intercept term of the
category that receives the value of 1 differs from the intercept coefficient of the base
category.
Example 1.1 (Effects of Computer Ownership on College GPA)
In order to determine the effects of computer ownership on college grade point average, we
estimate the model: colGPA = β0 + δ0D + β1hsGPA + β2ACT + u, where the dummy
variable D equals one if a student owns a personal computer and zero otherwise. There are
various reasons PC ownership might have an effect on colGPA. A student’s work might be of
higher quality if it is done on a computer, and time can be saved by not having to wait at a
computer lab. Of course, a student might be more inclined to play computer games or surf the
Internet if he or she owns a PC, so it is not obvious that 0 is positive. The variables hsGPA
(high school GPA) and ACT (achievement test score) are used as controls: it could be that
stronger students, as measured by high school GPA and ACT scores, are more likely to own
computers. We control for these factors because we would like to know the average effect on
colGPA if a student is picked at random and given a personal computer.
If the estimated model is
colGPA-hat = 1.26 + 0.157D + 0.447hsGPA + 0.0087ACT,  R2 = 0.219
se = (0.33)  (0.057)  (0.049)  (0.0105)
This equation implies that a student who owns a PC has a predicted GPA about 0.16 point higher
than a comparable student without a PC (remember, both colGPA and hsGPA are on a four-point
scale). The effect is also very statistically significant, with tD = 0.157/0.057 ≈ 2.75.

1.2.2 Using Dummy Variables for Multiple Categories
Suppose that, on the basis of the cross- sectional data, we want to regress the annual
expenditure on health care by an individual on the income and education of the
individual. Since the variable education is qualitative in nature, suppose we consider three
mutually exclusive levels of education: less than high school, high school, and college.
Now, unlike the previous case, we have more than two categories of the qualitative variable
education. Therefore, following the rule that the number of dummies be one less than the
number of categories of the variable, we should introduce two dummies to take care of the
three levels of education. Assuming that the three educational groups have a common slope
but different intercepts in the regression of annual expenditure on health care on annual
income, we can use the following model:
�� = 1 + 2 �2� + 3 �3� + �� + �� …………………………………………………1.4
Where Yi = annual expenditure on health care
Xi = annual expenditure
1, �� ℎ��ℎ ��ℎ��� ���������
�2 =
0, ��ℎ������
1, �� ������� ���������
�3 =
0, ��ℎ������
Note that in the preceding assignment of the dummy variables we are arbitrarily treating the
“less than high school education” category as the base category. Therefore, the intercept 0
will reflect the intercept for this category. The differential intercepts 1 and 2 tell by how
much the intercepts of the other two categories differ from the intercept of the base category,
which can be readily checked as follows: Assuming E(ui) = 0, we obtain from (1.4)
� �� �2 = 0, �3 = 0, �� = 1 + ��
� �� �2 = 1, �3 = 0, �� = (1 + 2 ) + ��
� �� �2 = 0, �3 = 1, �� = (1 + 3 ) + ��
which are, respectively the mean health care expenditure functions for the three levels of
education, namely, less than high school, high school, and college. Geometrically, the
situation is shown in fig 1.2 (for illustrative purposes it is assumed that(2 > 1 ).
Figure 1.2. Expenditure on health care in relation to income for three levels of education
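As a practical aside, the "m categories, m − 1 dummies" rule can be implemented with pandas; in this sketch the category labels are hypothetical, and drop_first=True omits the base category:

```python
# Build m-1 dummies for an m-category qualitative variable.
import pandas as pd

d = pd.DataFrame({"educ_level": ["less_hs", "hs", "college", "hs", "college"]})
# Fix the category order so that "less_hs" is the (dropped) base category.
d["educ_level"] = pd.Categorical(d["educ_level"],
                                 categories=["less_hs", "hs", "college"])
dummies = pd.get_dummies(d["educ_level"], drop_first=True)
print(dummies)   # two dummy columns (hs, college) for three categories
```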
1.2.3. Regression on One Quantitative Variable and Two Qualitative Variables
The technique of dummy variable can be easily extended to handle more than one qualitative
variable. Let us revert to the college professors’ salary regression (1.1), but now assume that in
addition to years of teaching experience and sex, the skin color of the teacher is also an
important determinant of salary. For simplicity, assume that color has two categories: black
and white. We can now write (1.1) as:
�� = 1 + 2 �2� + 3 �3� + �� + �� …………………………………………………1.5
Where Yi = annual salary
Xi = years of teaching experience
1, �� ������
�2 =
0, ��ℎ������
1, �� �ℎ���
�3 =
0, ��ℎ������
Notice that each of the two qualitative variables, sex and color, has two categories and hence
needs one dummy variable for each. Note also that the omitted, or base, category now is “black
female professor.”
Assuming E(u)= 0, we can obtain the following regression from (1.5)
���� ������ ��� ����� ������ ���������: � �� �2 = 0, �3 = 0, �� = 1 + ��
���� ������ ��� ����� ���� ���������: � �� �2 = 1, �3 = 0, �� = (1 + 2 ) + ��
���� ������ ��� �ℎ��� ������ ��������� : � �� �2 = 0, �3 = 1, �� = (1 + 3 ) + ��
���� ������ ��� �ℎ��� ���� ���������: � �� �2 = 1, �3 = 1, �� = (1 + 2 + 3 ) + ��
Once again, it is assumed that the preceding regressions differ only in the intercept
coefficient but not in the slope coefficient β. An OLS estimation of (1.5) will enable us to test a
variety of hypotheses. Thus, if α3 is statistically significant, it will mean that color does affect
a professor's salary. Similarly, if α2 is statistically significant, it will mean that sex also affects
a professor's salary. If both these differential intercepts are statistically significant, it would
mean sex as well as color is an important determinant of professors' salaries.
From the preceding discussion it follows that we can extend our model to include more than
one quantitative variable and more than two qualitative variables. The only precaution to be
taken is that the number of dummies for each qualitative variable should be one less than
the number of categories of that variable.
Example 1.2
(Log Hourly Wage)
Let us estimate a model that allows for wage differences among four groups: married
men, married women, single men, and single women. To do this, we must select a base
group; we choose single men. Then, we must define dummy variables for each of the
remaining groups. Call these D1 (married male), D2 (married female), and D3 (single female).
Putting these three variables into (1.1), suppose the estimated equation gives the following result:
log(wage) = 0.321 + 0.213D1 − 0.198D2 − 0.110D3 + 0.079educ + 0.027exper
s.e. = (0.100)  (0.055)  (0.058)  (0.056)  (0.007)  (0.005)
To interpret the coefficients on the dummy variables, we must remember that the base group is
single males. Thus, the estimates on the three dummy variables measure the proportionate
difference in wage relative to single males. For example, married men are estimated to
earn about 21.3% more than single men, holding levels of education and experience fixed. A
married woman, on the other hand, earns a predicted 19.8% less than a single man with the
same levels of the other variables.
1.2.4. Interactions Among Dummy Variables
Consider the following model:
Yi = α1 + α2D2i + α3D3i + βXi + ui ……………………………………………… (1.6)
where Yi = annual expenditure on clothing, Xi = annual income
D2 = 1 if female, 0 otherwise
D3 = 1 if college graduate, 0 otherwise
Implicit in this model is the assumption that the differential effect of the sex dummy D2 is
constant across the two levels of education and the differential effect of the education
dummy D3 is also constant across the two sexes. That is, if, say, the mean expenditure on
clothing is higher for females than males this is so whether they are college graduates or
not. Likewise, if, say, college graduates on the average spend more on clothing than non-
college graduates, this is so whether they are female or males.
In many applications such an assumption may be untenable. A female college graduate may
spend more on clothing than a male college graduate. In other words, there may be interaction
between the two qualitative variables D2 and D3 . Therefore their effect on mean Y may
not be simply additive as in (1.6) but multiplicative as well, as in the following model:
�� = 1 + 2 �2� + 3 �3� + 4 �2� �3� + �� + �� …………………………………………. . 1.7
From (1.7) we obtain:
E(Yi | D2 = 1, D3 = 1, Xi) = (α1 + α2 + α3 + α4) + βXi ………………… (1.8)
which is the mean clothing expenditure of graduate females. Notice that
α2 = differential effect of being a female
α3 = differential effect of being a college graduate
α4 = differential effect of being a female graduate
which shows that the mean clothing expenditure of graduate females differs (by α4) from the
mean clothing expenditure of females or college graduates. If α2, α3 and α4 are all positive, the
average clothing expenditure of females is higher (than that of the base category, which here is
male nongraduate), but it is much more so if the females also happen to be graduates. Similarly,
the average expenditure on clothing by a college graduate tends to be higher than that of the
base category but much more so if the graduate happens to be a female. This shows how the
interaction dummy modifies the effect of the two attributes considered individually.
Whether the coefficient of the interaction dummy is statistically significant can be tested by the
usual t- test. If it turns out to be significant, the simultaneous presence of the two attributes will
attenuate or reinforce the individual effects of these attributes. Needless to say, omitting a
significant interaction term incorrectly will lead to a specification bias.
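A minimal sketch of estimating a model with an interaction dummy like (1.7) using statsmodels' formula syntax; the data are simulated and the variable names (female, college, income, clothing) are assumptions for illustration:

```python
# Model with two dummies and their interaction: female:college is alpha_4.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 300
d = pd.DataFrame({
    "female": rng.integers(0, 2, n),
    "college": rng.integers(0, 2, n),
    "income": rng.normal(500, 100, n),
})
d["clothing"] = (40 + 10 * d["female"] + 8 * d["college"]
                 + 6 * d["female"] * d["college"]
                 + 0.05 * d["income"] + rng.normal(0, 5, n))

res = smf.ols("clothing ~ female + college + female:college + income", d).fit()
print(res.summary().tables[1])   # t-test on female:college checks alpha_4
```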
1.2.5. Testing for Structural Stability of Regression Models
Until now, in the models considered in this chapter we assumed that the qualitative variables
affect the intercept but not the slope coefficient of the various subgroup regressions. But
what if the slopes are also different? If the slopes are in fact different, testing for differences
in the intercepts may be of little practical significance. Therefore, we need to develop a
general methodology to find out whether two (or more) regressions are different, where
the difference may be in the intercepts or the slopes or both.
Suppose we are interested in estimating a simple saving function that relates domestic
household savings (S) with gross domestic product (Y) for Ethiopia. Suppose further that, at
a certain point of time, a series of economic reforms have been introduced. The hypothesis
here is that such reforms might have considerably influenced the savings-income relationship,
that is, the relationship between savings and income might be different in the post-reform
period as compared to that in the pre-reform period. If this hypothesis is true, then we
say a structural change has happened. How do we check if this is so?
Write the savings function as:
�� = 0 + 1 �� + 2 �� + 3 �� �� + �� …………………………. ……………. . 1.8
�ℎ��� � �� ℎ����ℎ��� ������ �� ���� �, � �� ��� �� ���� � ���:
0, �� ��� − ������( < 1991
�� =
1, �� ���� − ������( ≥ 1992
Here 3 is the differential slope coefficient indicating how much the slope coefficient of the
prereform period savings function differs from the slope coeffici ent of the savings function
in the post reform period. If 1 and 3 are both statistically significant as judged by the t- test,
then the pre-reform and post- reform regressions differ in both the intercept and the slope .
However, if only 1 is statistically significant, then the pre- reform and post- reform
regressions differ only in the intercept (meaning the marginal propensity to save (MPS) is
the same for pre- reform and post- reform periods). Similarly, if only 3 is statistically
significant, then the two regressions differ only in the slope (MPS).
Example: Ŝt = −20.76 + 5.999Dt + 2.616Yt − 0.530(YtDt)
SE  (6.04)  (6.4)  (0.57)  (0.60)
Since β̂1 and β̂3 are both statistically insignificant, there is no difference between the pre-reform
and post-reform regressions for the saving model.
1.3. Dummy as Dependent Variable
In the last several sections, we studied how, through the use of binary independent variables,
we can incorporate qualitative information as explanatory variables in a multiple regression
model. In all of the models up until now, the dependent variable Y has had quantitative
meaning (for example, Y is a dollar amount, a test score, a percent, or the logs of these).
What happens if we want to use multiple regression to explain a qualitative event?
In the simplest case, and one that often arises in practice, the event we would like to
explain is a binary outcome. In other words, our dependent variable, Y, takes on only two
values: zero and one. For example, Y can be defined to indicate whether an adult has a high
school education; or Y can indicate whether a college student used illegal drugs during a
given school year; or Y can indicate whether a firm was taken over by another firm
during a given year. In each of these examples, we can let Y=1 denote one of the outcomes and
Y = 0 the other outcome.
There are several methods to analyze regression models where the dependent variable is binary.
The simplest procedure is to just use the usual OLS method. In this case the model is called the
linear probability model (LPM). The other alternative is to say that there is an underlying or
latent variable �∗ which we do not observe. What we observe is:
1, �� �∗ > 0
�= . This is the idea behind the logit and probit models.
0, ��ℎ������
1.3.1.The Linear Probability Models (LPM)
When we use a linear regression model to estimate probabilities, we call the model the
linear probability model. Consider the following model, where Y is a binary variable:
Y = β0 + β1X1 + β2X2 + … + βkXk + u ……………………………………………… (1.9)
Since Y can take on only two values, βj cannot be interpreted as the change in Y given a one-
unit increase in Xj, holding all other factors fixed: Y either changes from zero to one or
from one to zero. Nevertheless, the βj still have useful interpretations. If we assume that the
zero conditional mean assumption holds, that is, E(u|X) = 0, then we have, as always,
E(Y|X) = β0 + β1X1 + β2X2 + … + βkXk
The key point is that when Y is a binary variable taking on the values zero and one, it is
always true that P(Y = 1|X) = E(Y|X): the probability of “success”, that is, the probability
that Y = 1, is the same as the expected value of Y. Thus, we have the important equation
P(Y = 1|X) = β0 + β1X1 + β2X2 + … + βkXk ……………………… (1.10)
which says that the probability of success, say P(X) = P(Y = 1|X) = E(Y|X), is a linear
function of the Xj.
Equation (1.10) is an example of a binary response model, and P(Y = 1|X) is also called
the response probability. Because probabilities must sum to one, P(Y = 0|X) = 1 − P(Y = 1|X)
(which is called the non-response probability) is also a linear function of the Xj.
The multiple linear regression model with a binary dependent variable is called the linear
probability model (LPM) because the response probability is linear in the parameters βj.
In the LPM, βj measures the change in the probability of success when Xj changes, holding
other factors fixed. This is the usual linear regression model. This makes linear probability
models easy to estimate and interpret, but it also highlights some shortcomings of the LPM.
The drawbacks of this model are:
1. The right hand side of equation (1.9) is a combination of discrete and continuous
variables while the left hand side variable is discrete.
2. Usually we arbitrarily (or for convenience) use 0 and 1 for Y. If we use other values for
Y, say 3 and 4, β will also change even if the vector of factors X remains unchanged.
3. u assumes only two values: if Y = 1 then u = 1 − P, and if Y = 0 then u = −P, where
P = P(Y = 1|X). Consequently, u is not normally distributed but rather has a discrete
(binary) probability distribution.
4. It is easy to see that, if we plug in certain combinations of values for the
independent variables into (1.9), we can get predictions either less than zero or greater
than one. Since these are predicted probabilities, and probabilities must be between zero
and one, this can be a little embarrassing.
5. Due to problem 3, the variance of u is heteroscedastic.
1.3.2. The Logit and Probit Models
The linear probability model is simple to estimate and use, but it has some drawbacks
that we discussed in Section 1.3.1. The two most important disadvantages are that the fitted
probabilities can be less than zero or greater than one and the partial effect of any
explanatory variable (appearing in level form) is constant. These limitations of the LPM can
be overcome by using more sophisticated binary response models.
In a binary response model, interest lies primarily in the response probability
P(Y = 1|X) = P(Y = 1|X1, X2, …, Xk) ……………………………… (1.11)
where we use X to denote the full set of explanatory variables. For example, when Y is
an employment indicator, X might contain various individual characteristics such as
education, age, marital status, and other factors that affect employment status, including a
binary indicator variable for participation in a recent job training program.
Specifying Logit and Probit Models
In the LPM, we assume that the response probability is linear in a set of parameters, βj. To
avoid the LPM limitations, consider a class of binary response models of the form
P(Y = 1|X) = G(β0 + β1X1 + β2X2 + … + βkXk) = G(z) ……………………… (1.12)
where G is a function taking on values strictly between zero and one: 0 < G(z) < 1 for all
real numbers z. This ensures that the estimated response probabilities are strictly between zero
and one.
As in Econometrics I, we write z = β0 + β1X1 + β2X2 + … + βkXk.
Various nonlinear functions have been suggested for the function G in order to make sure
that the probabilities are between zero and one. The two we will cover here are used in the vast
majority of applications (along with the LPM). In the logit model, G is the logistic function:
G(z) = exp(z)/[1 + exp(z)] = e^z/(1 + e^z) ……………………………… (1.13)
which is between zero and one for all real numbers z. This is the cumulative distribution
function (cdf) for a standard logistic random variable.
Here the response probability P(Y = 1|X) is evaluated as
P = P(Y = 1|X) = e^z/(1 + e^z)
Similarly, the non-response probability is evaluated as:
1 − P = P(Y = 0|X) = 1 − e^z/(1 + e^z) = 1/(1 + e^z)
Note that the response and non- response probabilities both lie in the interval [0 , 1] , and
hence, are interpretable.
For the logit model, the ratio
P/(1 − P) = [P(Y = 1|X)] / [P(Y = 0|X)] = e^z = e^(β0 + β1X1 + β2X2 + … + βkXk)
is the ratio of the odds of Y = 1 against Y = 0. The natural logarithm of the odds (log-odds) is:
ln[P/(1 − P)] = z = β0 + β1X1 + β2X2 + … + βkXk
Thus, the log- odds is a linear function of the explanatory variables.
In the probit model, G is the standard normal cumulative distribution function (cdf),
which is expressed as an integral:
G(z) = Φ(z) = ∫ from −∞ to z of φ(v) dv ……………………………… (1.14)
where φ(v) is the standard normal density function,
φ(v) = (1/√(2π)) exp(−v²/2)
(More generally, the normal density with mean μ and variance σ² is
f(v) = (1/√(2πσ²)) exp(−(v − μ)²/(2σ²)); the standard normal sets μ = 0 and σ² = 1.)
The standard normal cdf has a shape very similar to that of the logistic cdf.
The estimating model that emerges from the normal CDF is popularly known as the probit
model, although sometimes it is also known as the normit model.
Note that both the probit and the logit models are estimated by Maximum Likelihood Estimation.
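To see how close the two link functions are, the short sketch below evaluates the logistic cdf e^z/(1 + e^z) and the standard normal cdf at a few points (scipy provides the normal cdf):

```python
# Compare the logistic cdf (logit) with the standard normal cdf (probit).
import numpy as np
from scipy.stats import norm

z = np.linspace(-4, 4, 9)
logistic_cdf = np.exp(z) / (1 + np.exp(z))
normal_cdf = norm.cdf(z)
for zi, g1, g2 in zip(z, logistic_cdf, normal_cdf):
    print(f"z = {zi:5.1f}: logistic = {g1:.3f}, normal = {g2:.3f}")
```

Both functions rise from 0 to 1, which is exactly what keeps the fitted probabilities inside the unit interval.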
1.3.3. Interpreting the Probit and Logit Model Estimates
Given modern computers, from a practical perspective, the most difficult aspect of logit or
probit models is presenting and interpreting the results. The coefficient estimates, their
standard errors, and the value of the log- likelihood function are reported by all software
packages that do logit and probit, and these should be reported in any application.
The coefficients give the signs of the partial effects of each Xj on the response probability,
and the statistical significance of Xj is determined by whether we can reject H0: βj = 0 at a
sufficiently small significance level.
(dZ/dX) has no particular interpretation. We care about the magnitude of dProb(Y)/dX. From
the computer output for a probit or logit estimation, you can interpret the statistical significance
and sign of each coefficient directly. Assessing magnitude is trickier.
Goodness of Fit Statistics
The conventional measure of goodness of fit, R2, is not particularly meaningful in binary
regressand models. Measures similar to R2, called pseudo R2, are available, and there are a
variety of them.
1. Measures based on likelihood ratios
Let L_UR be the value of the maximized likelihood function with respect to all the
parameters (the unrestricted model) and L_R be the maximum of the likelihood function
under the restrictions βj = 0. Then
R2 = 1 − (L_R/L_UR)^(2/n)
2. Cragg and Uhler (1970) suggested a pseudo R2 that lies between 0 and 1:
R2 = (L_UR^(2/n) − L_R^(2/n)) / [(1 − L_R^(2/n)) · L_UR^(2/n)]
3. McFadden (1974) defined R2 as:
R2 = 1 − (log L_UR / log L_R)
4. Another goodness-of-fit measure that is usually reported is the so-called percent
correctly predicted, which is computed as follows. For each i, we compute the
estimated probability that Yi takes on the value one, Ŷi. If Ŷi ≥ 0.5, the prediction of Yi
is unity, and if Ŷi < 0.5, Yi is predicted to be zero. The percentage of times the
predicted Ŷi matches the actual Yi (which we know to be zero or one) is the percent
correctly predicted:
count R2 = (no. of correct predictions) / (total no. of observations)
Numerical Example:
This session shows an example of probit and logit regression analysis with Stata. The data
in this example were gathered on undergraduates applying to graduate school and includes
undergraduate GPAs, the reputation of the school of the undergraduate (a topnotch indicator),
the students' GRE score, and whether or not the student was admitted to graduate school.
Using this dataset, we can predict admission to graduate school using undergraduate GPA,
GRE scores, and the reputation of the school of the undergraduate. Our outcome variable is
binary, and we will use either a probit or a logit model. Thus, our model will calculate a
predicted probability of admission based on our predictors.
Iteration History - This is a listing of the log likelihoods at each iteration for the
probit/logit model. Remember that probit/logit regression uses maximum likelihood estimation,
which is an iterative procedure. The first iteration (called Iteration 0) is the log likelihood
of the "null" or "empty" model; that is, a model with no predictors. At the next iteration
(called Iteration 1), the specified predictors are included in the model. In this example, the
predictors are GRE, topnotch and GPA. At each iteration, the log likelihood increases
because the goal is to maximize the log likelihood. When the difference between successive
iterations is very small, the model is said to have "converged" and the iterating stops.
Log likelihood - This is the log likelihood of the fitted model. It is used in the Likelihood
Ratio Chi - Square test of whether all predictors' regression coefficients in the model are
simultaneously zero.
LR chi2(3) - This is the Likelihood Ratio (LR) Chi - Square test that at least one of the
predictors' regression coefficient is not equal to zero. The number in the parentheses indicates
the degrees of freedom of the Chi - Square distribution used to test the LR Chi - Square
statistic and is defined by the number of predictors in the model (3).
Prob > chi2 - This is the probability of getting a LR test statistic as extreme as, or more so,
than the observed statistic under the null hypothesis; the null hypothesis is that all of the
regression coefficients are simultaneously equal to zero. In other words, this is the
probability of obtaining this chi - square statistic or one more extreme if there is in fact no
effect of the predictor variables. This p- value is compared to a specified alpha level, our
willingness to accept a type I error, which is typically set at 0.05 or 0.01. The small p- value
from the LR test, 0.0001, would lead us to conclude that at least one of the regression
coefficients in the model is not equal to zero. The parameter of the chi-square distribution
used to test the null hypothesis is defined by the degrees of freedom in the prior line, chi2(3).
Pseudo R2 - This is McFadden's pseudo R-squared. Because this statistic does not mean
what R-squared means in OLS regression (the proportion of variance of the response variable
explained by the predictors), it should be interpreted with great caution. The interpretation
of the coefficients can be awkward. For example, for a one unit increase in GPA, the log
odds of being admitted to graduate school (vs. not being admitted) increases by .667. For this
reason, many researchers prefer to exponentiate the coefficients and interpret them as
odds-ratios. Look at the following result.
Now we can say that for a one unit increase in GPA, the odds of being admitted to graduate
school (vs. not being admitted) increased by a factor of 1.94. Since GRE scores increase
only in units of 10, we can take the odds ratio and raise it to the 10th power, e.g. 1.00248 ^ 10
= 1.0250786, and say for a 10 unit increase in GRE score, the odds of admission to graduate
school increased by a factor of 1.025.
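A hedged sketch of this workflow in Python rather than Stata: fit a logit, then exponentiate the coefficients into odds ratios. The admission data here are simulated, and the variable names mirror the example only loosely:

```python
# Fit a logit model and report odds ratios (exp of the coefficients).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 400
d = pd.DataFrame({
    "gre": rng.normal(580, 100, n),
    "gpa": rng.normal(3.4, 0.4, n),
    "topnotch": rng.integers(0, 2, n),
})
z = -8 + 0.0025 * d["gre"] + 1.8 * d["gpa"] + 0.3 * d["topnotch"]
d["admit"] = (rng.random(n) < 1 / (1 + np.exp(-z))).astype(int)

res = smf.logit("admit ~ gre + gpa + topnotch", data=d).fit()
print(np.exp(res.params))   # odds ratio per one-unit increase in each X
```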
CHAPTER TWO
2. Introduction to Basic Regression Analysis with Time Series Data
2.1.The Nature of Time Series Data
Time series data are data for a single entity (person, firm, or country) observed at
multiple time periods. Examples:
 Aggregate consumption and GDP for a country (for example, 20 years of
quarterly observations= 80 observations)
 Birr/$, pound/$ and Euro/$ exchange rates (daily data for 2 years= 730 observations)
 Inflation rate for Ethiopia (quarterly data for 30 years = 120 observations )
 Gross domestic investment for Ethiopia (annual data for 40 years= 40 observations )
An obvious characteristic of time series data which distinguishes it from cross- sectional
data is that a time series data set comes with a temporal ordering. For example, in the
above data set mentioned in the examples, we must know that the data for 1970 immediately
precede the data for 1971. For analyzing time series data in the social sciences, we must
recognize that the past can affect the future, but not vice versa. To emphasize the proper
ordering of time series data, Table 2.1 gives a partial listing of the data on Ethiopian gross
capital formation (GCF) and gross domestic savings (GDS), both in million ETB, for 1969 to
2010.
Another difference between cross- sectional and time series data is more subtle. In Chapters 2
and 3 of Econometrics I, we studied statistical properties of the OLS estimators based on the
notion that samples were randomly drawn from the appropriate population. Understanding
why cross sectional data should be viewed as random outcomes is fairly straightforward: a
different sample drawn from the population will generally yield different values of the
independent and dependent variables (such as education, experience, wage, and so on).
Therefore, the OLS estimates computed from different random samples will generally differ,
and this is why we consider the OLS estimators to be random variables.
How should we think about randomness in time series data? Certainly, economic time series
satisfy the intuitive requirements for being outcomes of random variables. For example,
today we do not know what the stock price will be at its close at the end of the next trading
day. We do not know what the annual growth in output will be in Ethiopia during the coming
year. Since the outcomes of these variables are not foreknown, they should clearly be viewed as
random variables. Formally, a sequence of random variables indexed by time is called a
stochastic process or a time series process. (“Stochastic” is a synonym for random.) When we
collect a time series data set, we obtain one possible outcome, or realization, of the
stochastic process. We can only see a single realization, because we cannot go back in time and
start the process over again. (This is analogous to cross- sectional analysis where we can
collect only one random sample.) However, if certain conditions in history had been
different, we would generally obtain a different realization for the stochastic process, and
this is why we think of time series data as the outcome of random variables.
The set of all possible realizations of a time series process plays the role of the population in
cross sectional analysis.
2.2. Stationary and non-stationary Stochastic Processes
2.2.1. Stochastic Processes
A random or stochastic process is a collection of random variables ordered in time. If we let
Y denote a random variable, and if it is continuous, we denote it as Y(t), but if it is
discrete, we denote it as Yt. An example of the former is an electrocardiogram, and an
example of the latter is GDP, PDI, GDI, GDS, etc. Since most economic data are collected at
discrete points in time, for our purpose we will use the notation Yt rather than Y(t). If we
let Y represent GDS, for our data we have Y1, Y2, Y3, …, Y39, Y40, Y41, where the subscript 1
denotes the first observation (i.e., GDS of 1969) and the subscript 41 denotes the last
observation (i.e., GDS of 2010). Keep in mind that each of these Y's is a random variable.
Stationary Stochastic Processes
A type of stochastic process that has received a great deal of attention and analysis by time series
analysts is the so- called stationary stochastic process. Broadly speaking, a stochastic process
is said to be stationary if its mean and variance are constant over time and the value of
the covariance between the two time periods depends only on the distance or gap or lag between
the two time periods and not the actual time at which the covariance is computed. In the time
series literature, such a stochastic process is known as a weakly stationary, or covariance
stationary, or second-order stationary, or wide sense, stochastic process. For the purpose of this
chapter, and in most practical situations, this type of stationarity often suffices.
To explain weak stationarity, let Yt be a stochastic time series with these properties:
Mean: E(Yt) = μ ……………………………………………………………… (2.1)
Variance: var(Yt) = E(Yt − μ)2 = σ2 ………………………………………… (2.2)
Covariance: γk = E[(Yt − μ)(Yt+k − μ)] ……………………………………… (2.3)
where γk, the covariance (or autocovariance) at lag k, is the covariance between the values
of Yt and Yt+k, that is, between two Y values k periods apart. If k = 0, we obtain γ0, which is
simply the variance of Y (= σ2); if k = 1, γ1 is the covariance between two adjacent values of Y.
Suppose we shift the origin of Y from Yt to Yt+m (say, from 1969 to 1974 for our GDS data).
Now if Y t is to be stationary, the mean, variance, and autocovariances of Yt+m must be the
same as those of Yt . In short, if a time series is stationary, its mean, variance, and
autocovariance (at various lags) remain the same no matter at what point we measure them;
that is, they are time invariant . Such a time series will tend to return to its mean (called
mean reversion) and fluctuations around this mean (measured by its variance) will have a
broadly constant amplitude. If a time series is not stationary in the sense just defined, it is
called a nonstationary time series (keep in mind we are talking only about weak
stationarity). In other words, a nonstationary time series will have a time-varying mean or a
time-varying variance or both.
Why are stationary time series so important? Because if a time series is nonstationary, we
can study its behavior only for the time period under consideration. Each set of time series
data will therefore be for a particular episode. As a consequence, it is not possible to generalize
it to other time periods. Therefore, for the purpose of forecasting, such (nonstationary) time
series may be of little practical value.
How do we know that a particular time series is stationary? In particular, is the time series
shown in Figure 2.1 stationary? We will take this important topic up in Section 2.5,
where we will consider several tests of stationarity. But if we depend on common sense, it
would seem that the time series depicted in Figure 2.1 is nonstationary, at least in the
mean values.
Before we move on, we mention a special type of stochastic process (or time series),
namely, a purely random, or white noise, process. We call a stochastic process purely random if
it has zero mean, constant variance σ2, and is serially uncorrelated. You may recall that the
error term ut entering the classical normal linear regression model that we discussed in
Econometrics I was assumed to be a white noise process, which we denoted as ut ∼ IID
N(0, σ2); that is, ut is independently and identically distributed as a normal distribution
with zero mean and constant variance.
Nonstationary Stochastic Processes
Although our interest is in stationary time series, one often encounters nonstationary time
series, the classic example being the random walk model (RWM). It is often said that asset
prices, such as stock prices or exchange rates, follow a random walk; that is, they are
nonstationary. We distinguish two types of random walks: (1) random walk without drift (i.e.,
no constant or intercept term) and (2) random walk with drift (i.e., a constant term is present).
Random Walk without Drift: Suppose ut is a white noise error term with mean 0 and
variance σ2. Then the series Yt is said to be a random walk if:
Yt = Yt−1 + ut ……………………………………………………………… (2.4)
In the random walk model, as (2.4) shows, the value of Y at time t is equal to its value at time
(t−1) plus a random shock; thus it is an AR (1) model in the language of Chapter 4 of
Econometrics I. We can think of (2.4) as a regression of Y at time t on its value lagged one
period. Believers in the efficient capital market hypothesis argue that stock prices are
essentially random and therefore there is no scope for profitable speculation in the stock
market: If one could predict tomorrow’s price on the basis of today’s price, we would all be
millionaires.
Now from (2.4) we can write
Y1 = Y0 + u1
Y2 = Y1 + u2 = Y0 + u1 + u2
Y3 = Y2 + u3 = Y0 + u1 + u2 + u3
In general, if the process started at some time 0 with a value of Y0, we have
Yt = Y0 + Σut ……………………………………………… (2.5)
Therefore,
E(Yt) = E(Y0 + Σut) = Y0 ……………………………………………… (2.6)
In like fashion, it can be shown that
var(Yt) = tσ2 ……………………………………………………………… (2.7)
As the preceding expression shows, the mean of Y is equal to its initial, or starting, value,
which is constant, but as t increases, its variance increases indefinitely, thus violating a
condition of stationarity. In short, the RWM without drift is a nonstationary stochastic process.
In practice Y0 is often set at zero, in which case E (Yt ) =0.
An interesting feature of RWM is the persistence of random shocks (i.e., random errors),
which is clear from (2.5): Yt is the sum of initial Y0 plus the sum of random shocks. As a
result, the impact of a particular shock does not die away. For example, if u2= 2 rather than u2=0,
then all Yt ’s from Y 2 onward will be 2 units higher and the effect of this shock never dies
out. That is why the random walk is said to have an infinite memory: it remembers the
shock forever.
Interestingly, if you write (2.4) as
Yt − Yt−1 = ΔYt = ut ……………………………………………………… (2.8)
where Δ is the first-difference operator, it is easy to show that, while Yt is nonstationary, its first
difference is stationary. In other words, the first differences of a random walk time series
are stationary.
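This is easy to verify by simulation; in the sketch below a random walk shows growing dispersion over time, while its first difference has roughly constant variance:

```python
# Simulate a random walk without drift and inspect its first difference.
import numpy as np

rng = np.random.default_rng(6)
u = rng.normal(size=500)         # white noise, sigma^2 = 1
Y = np.cumsum(u)                 # Y_t = Y_{t-1} + u_t, with Y_0 = 0
dY = np.diff(Y)                  # first difference recovers u_t

print("var over first half:", Y[:250].var(), " second half:", Y[250:].var())
print("var of dY:", dY.var())    # roughly constant, close to 1
```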
Random Walk with Drift. Let us modify (2.4) as follows:
Yt = δ + Yt−1 + ut ……………………………………………………… (2.9)
where δ is known as the drift parameter. The name drift comes from the fact that if we write
the preceding equation as
Yt − Yt−1 = ΔYt = δ + ut ………………………………………………… (2.10)
it shows that Yt drifts upward or downward, depending on δ being positive or negative.
Note that model (2.9) is also an AR(1) model. Following the procedure discussed for random
walk without drift, it can be shown that for the random walk with drift model (2.9),
E(Yt) = Y0 + t·δ ………………………………………………………… (2.11)
var(Yt) = tσ2 ……………………………………………………………… (2.12)
As you can see, for RWM with drift the mean as well as the variance increases over time,
again violating the conditions of (weak) stationarity. In short, RWM, with or without drift,
is a nonstationary stochastic process. The random walk model is an example of what is
known in the literature as a unit root process .
Unit Root Stochastic Process
Let us write the RWM (2.4) as:
�� =��−1 + �� ; − 1 ≤ ≤ 1……………………………………………………………. 2.13
This model resembles the Markov first- order autoregressive model that we discussed on
autocorrelation. If ρ=1, (2.4.1) becomes a RWM (without drift). If ρ is in fact 1, we face
what is
known as the unit root problem, that is, a situation of nonstationary; we already know that
in this case the variance of Yt is not stationary. The name unit root is due to the fact that ρ=1.
Thus the terms nonstationary, random walk, and unit root can be treated as synonymous. If,
however, |ρ|<1, that is if the absolute value of ρ is less than one, then it can be shown that
the time series Yt is stationary in the sense we have defined it.
2.3. Trend Stationary and Difference Stationary Stochastic Processes
If the trend in a time series is completely predictable and not variable, we call it a deterministic
trend, whereas if it is not predictable, we call it a stochastic trend. To make the definition more
formal, consider the following model of the time series Yt.
�� = 1 + 2 � + 3 ��−1 + �� ………………………………………………………………….14
Where ut is a white noise error term and where t is time measured chronologically. Now we have
the following possibilities:
Pure random walk: If in (2.14) β1 = 0, β2 = 0, β3 = 1, we get
Yt = Yt−1 + ut ……………………………………………………………… (2.15)
which is nothing but a RWM without drift and is therefore nonstationary. But note that, if we
write (2.15) as
ΔYt = Yt − Yt−1 = ut ……………………………………………………… (2.16)
it becomes stationary, as noted before. Hence, a RWM without drift is a difference stationary
process (DSP).
Random walk with drift: If in (2.14) β1 ≠ 0, β2 = 0, β3 = 1, we get
Yt = β1 + Yt−1 + ut ……………………………………………………… (2.17)
which is a random walk with drift and is therefore nonstationary. If we write it as
Yt − Yt−1 = ΔYt = β1 + ut ………………………………………………… (2.17a)
this means Yt will exhibit a positive (β1 > 0) or negative (β1 < 0) trend. Such a trend is called a
stochastic trend. Equation (2.17a) is a DSP process because the nonstationarity in Yt can be
eliminated by taking first differences of the time series.
Deterministic trend: If in (2.14) β1 ≠ 0, β2 ≠ 0, β3 = 0, we get
Yt = β1 + β2t + ut ………………………………………………………… (2.17b)
which is called a trend stationary process (TSP). Although the mean of Yt is β1 + β2t, which
is not constant, its variance (=σ2) is. Once the values of β1 and β2 are known, the mean can
be forecasted perfectly. Therefore, if we subtract the mean of Yt from Yt , the resulting
series will be stationary, hence the name trend stationary. This procedure of removing the
(deterministic) trend is called detrending.
Random walk with drift and deterministic trend: If in (2.14) β1 ≠ 0, β2 ≠ 0, β3 = 1, we
obtain:
Yt = β1 + β2t + Yt−1 + ut ………………………………………………… (2.18)
we have a random walk with drift and a deterministic trend, which can be seen if we write this
equation as:
Yt − Yt−1 = ΔYt = β1 + β2t + ut ………………………………………… (2.18a)
which means that Yt is nonstationary.
Deterministic trend with stationary AR(1) component: If in (2.14) β1 ≠ 0, β2 ≠ 0, β3 < 1, we
obtain:
Yt = β1 + β2t + β3Yt−1 + ut ……………………………………………… (2.19)
which is stationary around the deterministic trend.
2.4. Integrated Stochastic Process
The random walk model is but a specific case of a more general class of stochastic
processes known as integrated processes. Recall that the RWM without drift is
nonstationary, but its first difference, as shown in (2.8), is stationary. Therefore, we call the
RWM without drift integrated of order 1, denoted as I(1). Similarly, if a time series has to be
differenced twice (i.e., take the first difference of the first differences) to make it stationary,
we call such a time series integrated of order 2. In general, if a (nonstationary) time series
has to be differenced d times to make it stationary, that time series is said to be integrated
of order d. A time series Y t integrated of order d is denoted as Yt ∼I(d). If a time series Yt is
stationary to begin with (i.e., it does not require any differencing), it is said to be integrated
of order zero, denoted by Yt ∼I(0).Thus, we will use the terms “stationary time series” and
“time series integrated of order zero” to mean the same thing .
Most economic time series are generally I(1); that is, they generally become stationary only
after taking their first differences.
Properties of Integrated Series
The following properties of integrated time series may be noted: Let Xt , Y t , and Z t be
three time series.
i. If Xt ∼I(0) and Yt ∼I(1),then Zt = (Xt + Yt ) = I(1); that is, a linear combination
or sum of stationary and nonstationary time series is nonstationary.
ii. If Xt∼I(d), then Zt = (a+bXt ) = I(d), where a and b are constants. That is, a
linear combination of an I(d) series is also I(d).
Thus, if Xt∼I(0), then Zt = (a+bXt )∼I(0).
iii. If Xt ∼ I(d1) and Yt ∼ I(d2), then Zt =(aXt + bYt )∼I(d2), where d1<d2.
iv. If Xt ∼I(d) and Yt ∼ I(d), then Z t =(aXt +bYt )∼I(d*); d* is generally equal to d, but
in some cases d*<d.
2.5. Tests of Stationarity: The Unit Root Test
A test of stationarity (or nonstationarity) that has become widely popular over the past
several years is the unit root test. The starting point is the unit root (stochastic) process that we
discussed in Section 2.2. We start with:
�� = ��−1 + �� ; − 1 ≤ ≤ 1………………………………………………………………. 2.20
Where ut is a white noise error term.
We know that if ρ =1, that is, in the case of the unit root, (2.20) becomes a random walk
model without drift, which we know is a nonstationary stochastic process. Therefore, why
not simply regress Yt on its (one period) lagged value Yt−1 and find out if the estimated ρ is
statistically equal to 1? If it is, then Yt is nonstationary. This is the general idea behind the unit
root test of stationarity.
For theoretical reasons, we manipulate (2.20) as follows: subtract Yt−1 from both sides of
(2.20) to obtain:
Yt − Yt−1 = ρYt−1 − Yt−1 + ut …………………………………………… (2.21)
which can be alternatively written as:
ΔYt = δYt−1 + ut …………………………………………………………… (2.22)
where δ = (ρ − 1) and Δ is the first-difference operator.
In practice, therefore, instead of estimating (2.20), we estimate (2.22) and test the (null)
hypothesis that δ = 0. If δ = 0, then ρ = 1, that is we have a unit root, meaning the
time series under consideration is nonstationary. Unfortunately, under the null hypothesis that
δ = 0 (i.e., ρ = 1), the t value of the estimated coefficient of Yt−1 does not follow the t
distribution even in large samples; that is, it does not have an asymptotic normal
distribution. Dickey and Fuller have shown that under the null hypothesis that δ = 0, the
estimated t value of the coefficient of Yt−1 in (2.22) follows the τ(tau) statistic. In the literature
the tau statistic or test is known as the Dickey–Fuller (DF) test, in honor of its discoverers.
To allow for the various possibilities, the DF test is estimated in three different forms, that is,
under three different null hypotheses.
Yt is a random walk: ΔYt = δYt−1 + ut ……………………………………… (2.23a)
Yt is a random walk with drift: ΔYt = β1 + δYt−1 + ut ……………………… (2.23b)
Yt is a random walk with drift around a trend: ΔYt = β1 + β2t + δYt−1 + ut … (2.23c)
Where t is the time or trend variable. In each case, the null hypothesis is that δ = 0; that is, there
is a unit root—the time series is nonstationary. The alternative hypothesis is that δ is less than
zero; that is, the time series is stationary. If the null hypothesis is rejected, it means that Yt is a
stationary time series.
The Augmented Dickey–Fuller (ADF) Test
In conducting the DF test as in (2.23a- c), it was assumed that the error term ut was uncorrelated.
But in case the ut are correlated, Dickey and Fuller have developed a test, known as the
Augmented Dickey–Fuller (ADF) test. This test is conducted by “augmenting” the preceding
three equations by adding the lagged values of the dependent variable ΔYt -i. To be specific,
suppose we use (2.23c). The ADF test here consists of estimating the following regression:
ΔYt = β1 + β2t + δYt−1 + Σi αiΔYt−i + εt …………………………………… (2.24)

196 | P a g e
Where εt is a pure white noise error term and where ΔYt−1= (Y t−1−Yt−2), ΔYt−2 = (Yt−2−Yt−3 ),
etc. In ADF we still test whether δ = 0 and the ADF test follows the same asymptotic distribution
as the DF statistic, so the same critical values can be used.
To give a glimpse of this procedure, we estimated (2.24) for the GDP series using one
lagged difference of the natural log of GDP of Ethiopia; the results were as follows:

ΔlnGDPt = −0.2095 + 0.0016 t + 0.0197157 lnGDPt−1 + 0.0269 ΔlnGDPt−1
t (= τ)     (−0.28)    (0.67)    (0.27)               (0.15)
1% critical τ = −4.242        5% critical τ = −3.540
The t (= τ) value of the lnGDPt−1 coefficient (= δ̂) is 0.27, but in absolute terms this value is
much less than even the 1% and 5% critical τ values of −4.242 and −3.540, respectively,
again suggesting that even after taking care of possible autocorrelation in the error term, the
GDP series is nonstationary.
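The regression above corresponds to dfuller with a deterministic trend and one augmenting lag; a minimal sketch, assuming annual data with variables year and gdp (names are illustrative):

tsset year
gen lngdp = ln(gdp)
dfuller lngdp, trend lags(1) regress   // ADF test, eq. (2.24)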
The Phillips–Perron (PP) Unit Root Tests
An important assumption of the DF test is that the error terms ut are independently and
identically distributed. The ADF test adjusts the DF test to take care of possible serial
correlation in the error terms by adding the lagged difference terms of the regressand. Phillips
and Perron use non parametric statistical methods to take care of the serial correlation in the
error terms without adding lagged difference terms.
The Phillips–Perron test involves fitting the following regression:
ΔYt = β1 + β2t + δYt−1 + ut
Under the null hypothesis that δ = 0, the PP Z(t) and Z(ρ) statistics have the same
asymptotic distributions as the ADF t- statistic and normalized bias statistics. One advantage
of the PP tests over the ADF tests is that the PP tests are robust to general forms of
heteroscedasticity in the error term ut . Another advantage is that the user does not have to
specify a lag length for the test regression. Now let us test whether lnGDP is stationary or
not using the PP test.
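In Stata the PP test is available through the pperron command; a minimal sketch (the variable name lnrgdp is an assumption):

pperron lnrgdp, lags(3)

which produced the output below.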
Phillips-Perron test for unit root                 Number of obs   =  34
                                                   Newey-West lags =   3

                    ---------- Interpolated Dickey-Fuller ----------
             Test          1% Critical     5% Critical     10% Critical
             Statistic     Value           Value           Value
  Z(rho)      2.343         -17.812         -12.788         -10.380
  Z(t)        3.635          -3.689          -2.975          -2.619

MacKinnon approximate p-value for Z(t) = 1.0000
The above result shows that lnRGDP is nonstationary at level: the Z(t) statistic of 3.635 is far above the critical values, so the unit-root null cannot be rejected.

2.6 The Phenomenon of Spurious Regression
Consider the following two random walk models:
Yt = Yt−1 + ut
Xt = Xt−1 + vt
Where ut ∼ N(0, 1) and vt ∼ N(0, 1). We also assumed that ut and vt are serially uncorrelated as
well as mutually uncorrelated. As you know by now, both these time series are nonstationary;
that is, they are I(1) or exhibit stochastic trends.
Suppose we regress Yt on Xt. Since Yt and Xt are uncorrelated I(1) processes, the R² from the
regression of Y on X should tend to zero; that is, there should not be any relationship between the
two variables. Regressing Y on X, however, we obtain results such as the following:
Ŷt = 13.26 + 0.337Xt ,  R² = 0.104 ,  d = 0.0121
se = (0.62)   (0.044)
The coefficient of X is highly statistically significant, and, although the R2 value is low, it is
statistically significantly different from zero. From these results, you may be tempted to
conclude that there is a significant statistical relationship between Y and X, whereas a priori
there should be none. This is in a nutshell the phenomenon of spurious or nonsense regression,
first discovered by Yule. That there is something wrong in the preceding regression is suggested
by the extremely low Durbin–Watson d value, which suggests very strong first-order
autocorrelation. According to Granger and Newbold, if R² > d, the estimated regression should be
suspected of being spurious. The R² and the t statistic from such a spurious regression are
misleading, and the t statistics are not distributed as (Student's) t distribution and, therefore,
cannot be used for testing hypotheses about the parameters.
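The phenomenon is easy to reproduce by simulation; a minimal sketch in Stata (all names are illustrative):

clear
set obs 500
set seed 12345
gen t = _n
tsset t
gen u = rnormal()
gen v = rnormal()
gen y = sum(u)        // random walk: Yt = Yt-1 + ut
gen x = sum(v)        // independent random walk: Xt = Xt-1 + vt
regress y x           // often "significant" despite no true relationship
estat dwatson         // expect a very low Durbin-Watson d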

2.7 Cointegration: Regression of a Unit Root Time Series on Another Unit Root Time Series
We have warned that the regression of a nonstationary time series on another nonstationary time
series may produce a spurious regression. Suppose that Yt and Xt are time series data and
individually they both are I(1); that is, they contain a stochastic trend. It is quite possible that the
two series share the same common trend so that the regression of one on the other will not be
necessarily spurious.
�� = 1 + 2 �� + ��
�� = �� − 1 − 2 ��
Suppose we now subject �� to unit root analysis and find that it is stationary; that is, it is I(0).
The variables �� ��� �� are individually I(1), that is, they have stochastic trends, their linear
combination is I(0). So, the linear combination cancels out the stochastic trends in the two series.
In this case we say that the two variables are cointegrated. Economically speaking, two
variables will be cointegrated if they have a long-term, or equilibrium, relationship between them.
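Testing for cointegration along these lines (the Engle–Granger two-step approach) amounts to testing the residuals of the cointegrating regression for a unit root; a minimal sketch in Stata (variable names assumed):

regress y x
predict uhat, residuals
dfuller uhat, noconstant   // note: Engle-Granger critical values differ from standard DF tables

If the unit-root null is rejected for the residuals, y and x are cointegrated.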

Error Correction Mechanism (ECM)

If the variables Yt and Xt are cointegrated, there is a long-term, or equilibrium,
relationship between the two. Of course, in the short run there may be disequilibrium. Therefore,
we can treat the error term in the following equation as the "equilibrium error," and we can
use this error term to tie the short-run behavior of Yt to its long-run value:
ut = Yt − β1 − β2Xt − β3t ……………………………………………………………………… 2.25
If two variables Y and X are cointegrated, the relationship between the two can be expressed as
ECM.
Now consider the following model:
��� = 0 + 1 ��� + 2 ��−1 + � ………………………………………………………………. . 2.26
ECM equation (2.26) states that ΔYt depends on ΔXt and also on the equilibrium error term. If
the equilibrium error term is nonzero, then the model is out of equilibrium. Suppose ΔXt is
zero and ut−1 is positive. This means Yt−1 is too high to be in equilibrium; that is, Yt−1 is above its
equilibrium value of (β1 + β2Xt−1). Since α2 is expected to be negative, the term α2ut−1 is
negative and, therefore, ΔYt will be negative to restore the equilibrium. That is, if Yt is above its
equilibrium value, it will start falling in the next period to correct the equilibrium error; hence
the name ECM. By the same token, if ut−1 is negative (i.e., Yt is below its equilibrium value),
α2ut−1 will be positive, which will cause ΔYt to be positive, leading Yt to rise in period t. Thus,
the absolute value of α2 decides how quickly the equilibrium is restored.
In practice, we estimate ut−1 by the lagged OLS residual from (2.25):
ût−1 = Yt−1 − β̂1 − β̂2Xt−1 − β̂3(t − 1)
Note that the error-correction coefficient α2 is expected to be negative.
Example:  ΔŶt = 0.061 + 0.29 ΔXt − 0.122 ût−1 ,  R² = 0.1658 ,  d = 2.15
          t = (9.6753)  (6.2282)  (−3.8461)
Statistically, the ECM term is significant, suggesting that Yt adjusts to Xt with a lag; only
about 12 percent of the discrepancy between the long-run and short-run values of Yt is corrected
within each period.
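The two steps can be run as follows; a minimal sketch in Stata, assuming the data are tsset and t is a linear trend variable (all names are illustrative):

regress y x t              // cointegrating regression with trend, eq. (2.25)
predict uhat, residuals
regress d.y d.x l.uhat     // ECM, eq. (2.26); coefficient on l.uhat is the speed of adjustment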
CHAPTER THREE
3. INTRODUCTION TO SIMULTANEOUS EQUATION MODELS
3.1. The Nature of Simultaneous Equation Models
In many situations, a one-way or unidirectional cause-and-effect relationship is not
meaningful. This occurs if Y is determined by the X's, and some of the X's are, in turn,
determined by Y. In short, there is a two-way, or simultaneous, relationship between Y and
(some of) the X's, which makes the distinction between dependent and explanatory variables of
dubious value. It is better to lump together a set of variables that can be determined
simultaneously by the remaining set of variables—precisely what is done in simultaneous-
equation models. In such models there is more than one equation—one for each of the
mutually, or jointly, dependent or endogenous variables. And unlike the single- equation
models, in the simultaneous- equation models one may not estimate the parameters of a single
equation without taking into account information provided by other equations in the system.
Example:
The classic example of simultaneous causality in economics is supply and demand. Both prices
and quantities adjust until supply and demand are in equilibrium. A shock to demand or supply
causes both prices and quantities to move. As is well known, the price P of a commodity and
the quantity Q sold are determined by the intersection of the demand - and- supply curves
for that commodity.
Figure 3.1. Interdependence of Price and Quantity
Thus, assuming for simplicity that the demand-and-supply curves are linear, and adding
the stochastic disturbance terms u1 and u2, we may write the empirical demand-and-supply
functions as:
Demand function:  Qdt = α0 + α1Pt + α2Yt + u1t ,  α1 < 0 …………………………………… 3.1
Supply function:  Qst = β0 + β1Pt + u2t ,  β1 > 0 ……………………………………………… 3.2
Equilibrium condition:  Qdt = Qst
Where Qdt = quantity demanded, Qst = quantity supplied, Yt = income, t = time, and the α's and β's are the
parameters.
Now it is not too difficult to see that P and Q are jointly dependent variables. If, for example, u1t
in (3.1) changes because of changes in other variables affecting Qdt (such as wealth
and tastes), the demand curve will shift upward if u1t is positive and downward if u1t is
negative. These shifts are shown in Figure 3.1. As the figure shows, a shift in the demand
curve changes both P and Q. Similarly, a change in u2t (because of strikes, weather, import
or export restrictions, etc.) will shift the supply curve, again affecting both P and Q.
Because of this simultaneous dependence between Q and P, u1t and Pt in (3.1) and u2t
and Pt in (3.2) cannot be independent. Therefore, a regression of Q on P as in (3.1)
would violate an important assumption of the classical linear regression model, namely,
the assumption of no correlation between the explanatory variable(s) and the disturbance
term.
Definitions of Some Concepts
• The variables P and Q are called endogenous variables because their values are determined within the system we have created.
• The income variable Y has a value that is given to us, and which is determined outside this system. It is called an exogenous variable.
• Predetermined variables are exogenous variables, lagged exogenous variables and lagged endogenous variables. Predetermined variables are non-stochastic and hence independent of the disturbance terms.
• Structural models: A structural model describes the complete structure of the relationships among the economic variables. Structural equations of the model may be expressed in terms of endogenous variables, exogenous variables and disturbances (random variables).
• Reduced form of the model: The reduced form of a structural model is the model in which the endogenous variables are expressed as a function of the predetermined variables and the error term only.
Example: The following simple Keynesian model of income determination can be considered as
a structural model.
� = +� + �, ���  > 0, 0 < > 1………………………………………………. . 3.3
� = � + �…………………………………………………………………………………3.4
where: C = Consumption expenditure
Z = non-consumption expenditure
Y = national income
C and Y are endogenous variables while Z is exogenous variable.
Reduced form of the model:
The reduced form of a structural model is the model in which the endogenous variables
are expressed a function of the predetermined variables and the error term only.
Illustration: Find the reduced form of the above structural model.
Since C and Y are endogenous variables and only Z is the exogenous variable, we have to
express C and Y in terms of Z. To do this, substitute (3.4) into equation (3.3):
C = α + β(C + Z) + u
C = α + βC + βZ + u
C − βC = α + βZ + u
C(1 − β) = α + βZ + u
C = α/(1 − β) + [β/(1 − β)]Z + [1/(1 − β)]u ………………………………………………… 3.5
Substituting again (3.5) into (3.4) we get:
Y = α/(1 − β) + [1/(1 − β)]Z + [1/(1 − β)]u ………………………………………………… 3.6
Equations (3.5) and (3.6) are called the reduced form of the above structural model. We can
write this more formally as:

Structural form of equations        Reduced form of equations
C = α + βY + u                      C = α/(1 − β) + [β/(1 − β)]Z + [1/(1 − β)]u = π01 + π11Z + v1
Y = C + Z                           Y = α/(1 − β) + [1/(1 − β)]Z + [1/(1 − β)]u = π02 + π12Z + v2
The parameters of the reduced form measure the total effect (direct and indirect) of a change
in the exogenous variables on the endogenous variables.

3.2. Simultaneity Bias
If one applies OLS to estimate the parameters of each equation disregarding other
equations of the model, the estimates so obtained are not only biased but also inconsistent;
i.e., even if the sample size increases indefinitely, the estimators do not converge to their
true values.
The bias arising from the application of such a procedure of estimation, which treats each
equation of the simultaneous equations model as though it were a single-equation model, is known as
simultaneity bias or simultaneous equation bias. It is useful to see, in a simple model, that an
explanatory variable that is determined simultaneously with the dependent variable is generally
correlated with the error term, which leads to bias and inconsistency in OLS. The two-way
causation in a relationship leads to violation of an important assumption of the linear regression
model: one variable can be the dependent variable in one of the equations but becomes an
explanatory variable in the other equations of the simultaneous-equation model. In this case
E[XiUi] may be different from zero. To show simultaneity bias, let's consider the following
simple simultaneous equation model.
� = 0 + 1 � + � …………………..3.7
� = 0 + 1 � + 2� + �
Suppose that the following assumptions hold true.
� � =0 � � =0
� �2 = 2� � �2 = 2�
� �� �� = 0 � �� �� = 0, ��� � �� �� = 0
where X and Y are endogenous variables and Z is an exogenous variable. The reduced form of
X of the above model is obtained by substituting Y in the equation of X.
� = 0 + 1 (0 + 1 � + �) + 2 � + �.
0+01 2  �+�
�= + �+ 1 ………………………………………………….3.8
1−11 1−11 1−1 1
Applying OLS to the first equation of the above structural model will result in a biased
estimator because cov(X, u) = E(Xu) ≠ 0. Let us now prove this.
cov(X, u) = E[{X − E(X)}{u − E(u)}] = E[(X − E(X)) u] ,  since E(u) = 0 ………………… 3.9
Taking the expectation of (3.8) and noting that E(u) = E(v) = 0, we have
X − E(X) = (β1u + v)/(1 − α1β1) ……………………………………………………………… 3.10
Substituting (3.10) into (3.9):
cov(X, u) = E[(β1u + v)u]/(1 − α1β1)
          = [1/(1 − α1β1)] E(β1u² + uv)
          = β1σu²/(1 − α1β1) ≠ 0 ,  since E(uv) = 0
That is, the covariance between X and u is not zero. As a consequence, if OLS is applied to each
equation of the model separately, the coefficients will turn out to be biased. Now, let's examine
how this non-zero covariance between the error term and the explanatory variable leads to
bias in the OLS estimates of the parameters.
If we apply OLS to the first equation of the above structural model (3.7),
Y = α0 + α1X + u ,
we obtain:
α̂1 = Σxy/Σx² = Σx(Y − Ȳ)/Σx² = ΣxY/Σx² ,  where x = X − X̄
   = Σx(α0 + α1X + u)/Σx²
   = α0Σx/Σx² + α1ΣxX/Σx² + Σxu/Σx²
But Σx = 0 and ΣxX/Σx² = 1; hence,
α̂1 = α1 + Σxu/Σx² ……………………………………………………………………………… 3.11
Taking the expected values on both sides:
E(α̂1) = α1 + E(Σxu/Σx²)
Since we have already proved that cov(X, u) ≠ 0, which is the same as E(Xu) ≠ 0, it follows that
E(Σxu) ≠ 0 and hence E(α̂1) ≠ α1; that is, α̂1 will be biased by an amount equivalent to E(Σxu/Σx²).

3.3. The Identification Problem
In simultaneous equation models, the problem of identification is a problem of model
formulation; it does not concern the estimation of the model. The estimation of the model
depends upon the empirical data and the form of the model. If the model is not in the proper
statistical form, it may turn out that the parameters cannot be uniquely estimated even
though adequate and relevant data are available. In the language of econometrics, a model is said
to be identified only when it is in a unique statistical form that enables us to obtain unique estimates
of its parameters from the sample data.
By the identification problem we mean whether numerical estimates of the parameters of a
structural equation can be obtained from the estimated reduced-form coefficients. If this can be
done, we say that the particular equation is identified. If this cannot be done, then we say that the
equation under consideration is unidentified, or underidentified. An identified equation may be
either exactly (or fully or just) identified or overidentified.

It is said to be exactly identified if unique numerical values of the structural parameters can be
obtained. It is said to be overidentified if more than one numerical value can be obtained for
some of the parameters of the structural equations. The circumstances under which each of these
cases occurs will be shown in the following discussion. The identification problem arises
because different sets of structural coefficients may be compatible with the same set of data. To
put the matter differently, a given reduced-form equation may be compatible with different
structural equations or different hypotheses (models), and it may be difficult to tell which
particular hypothesis (model) we are investigating. In the remainder of this section we consider
several examples to show the nature of the identification problem.

Consider the demand-and-supply model, together with the market-clearing, or equilibrium, condition that demand is equal to supply:
Qdt = α0 + α1Pt + u1t …………………………………………………………………………… 3.12
Qst = β0 + β1Pt + u2t …………………………………………………………………………… 3.13
By the equilibrium condition, we obtain:
α0 + α1Pt + u1t = β0 + β1Pt + u2t
Solving this equation, we obtain the equilibrium price:
Pt = π0 + vt ……………………………………………………………………………………… 3.14
where π0 = (β0 − α0)/(α1 − β1) ,  vt = (u2t − u1t)/(α1 − β1)
Substituting Pt from Eq. (3.14) into Eq. (3.13) or (3.12), we obtain the following equilibrium
quantity:
Qt = π1 + wt ……………………………………………………………………………………… 3.15
where π1 = (α1β0 − α0β1)/(α1 − β1) ,  wt = (α1u2t − β1u1t)/(α1 − β1)
Equations (3.14) and (3.15) are reduced-form equations. Now our demand-and-supply model
contains four structural coefficients α0, α1, β0 and β1, but there are only two reduced-form
coefficients. Although these two reduced-form coefficients contain all four structural parameters,
there is no way in which the four structural unknowns can be estimated from only two
reduced-form coefficients.
There is an alternative way of looking at the identification problem. Suppose we multiply Eq.
(3.12) by λ (0 ≤ λ ≤ 1) and Eq. (3.13) by (1 − λ) to obtain the following equations:
λQt = λα0 + λα1Pt + λu1t
(1 − λ)Qt = (1 − λ)β0 + (1 − λ)β1Pt + (1 − λ)u2t
Adding these two equations gives the following linear combination of the original demand and
supply equations:
Qt = γ0 + γ1Pt + wt ……………………………………………………………………………… 3.16
Where γ0 = λα0 + (1 − λ)β0 ,  γ1 = λα1 + (1 − λ)β1 ,  wt = λu1t + (1 − λ)u2t
The "mongrel" equation (3.16) is observationally indistinguishable from either the supply or the
demand equation because both involve a regression of Q on P. For an equation to be identified,
that is, for its parameters to be estimated, it must be shown that the given set of data will not
produce a structural equation that looks similar in appearance to the one in which we are
interested. If we set out to estimate the demand function, we must show that the given data are
not consistent with the supply function or some mongrel equation.
A function (an equation) belonging to a system of simultaneous equations is identified if it
has a unique statistical form, i.e., if there is no other equation in the system, and none formed by
algebraic manipulation of the other equations of the system, that contains the same variables as the
function (equation) in question.
Identification problems do not arise only in two-equation models. Using the above
procedure, we can check identification easily if we have two or three equations
in a given simultaneous equation model. For a simultaneous equation model of n equations,
however, such a procedure is very cumbersome. In general, for any number of equations in a given
simultaneous equation model, there are two conditions that need to be satisfied to determine whether the
model is identified or not. In the following section we will see these formal
conditions for identification.

3.4. Order and Rank Conditions of Identification (without proof)


i. The order condition for identification
This condition is based on a counting rule of the variables included and excluded from the
particular equation. It is a necessary but not sufficient condition for the identification of an
equation. The order condition may be stated as follows. For an equation to be identified the total
number of variables (endogenous and exogenous) excluded from it must be equal to or greater
than the number of endogenous variables in the model less one. Given that in a complete model
the number of endogenous variables is equal to the number of equations of the model, the order
condition for identification is sometimes stated in the following equivalent form. For an equation
to be identified the total number of variables excluded from it but included in other equations
must be at least as great as the number of equations of the system less one.
Let: G = total number of equations (= total number of endogenous variables)
K= number of total variables in the model (endogenous and predetermined)
M= number of variables, endogenous and exogenous, included in a particular equation.
Then the order condition for identification may be symbolically expressed as:

(K-M )  ( G-1)

excluded

For example, if a system contains 10 equations with 15 variables, ten endogenous and five
exogenous, an equation containing 11 variables is not identified, while another containing 5
variables is identified.
a. For the first equation we have:
G = 10 ,  K = 15 ,  M = 11
Order condition: (K − M) ≥ (G − 1)? Here (15 − 11) = 4 < (10 − 1) = 9; that is, the order condition is not satisfied, and the equation is not identified.
b. For the second equation we have:
G = 10 ,  K = 15 ,  M = 5
Order condition: (15 − 5) = 10 > (10 − 1) = 9; the order condition is satisfied, and the equation is identified.
The order condition for identification is necessary for a relation to be identified, but it is not
sufficient; that is, it may be fulfilled in any particular equation and yet the relation may not be
identified.
ii. The rank condition for identification
The rank condition states that: in a system of G equations any particular equation is identified if
and only if it is possible to construct at least one non-zero determinant of order (G-1) from the
coefficients of the variables excluded from that particular equation but contained in the other
equations of the model. The practical steps for tracing the identifiability of an equation of a
structural model may be outlined as follows.
Firstly. Write the parameters of all the equations of the model in a separate table, noting that the
parameter of a variable excluded from an equation is equal to zero.
For example let a structural model be:
y1 = 3y2 − 2x1 + x2 + u1
y2 = y3 + x3 + u2
y3 = y1 − y2 + 2x3 + u3
where the y’s are the endogenous variables and the x’s are the predetermined variables. This
model may be rewritten in the form
y1 − 3y2 − 0y3 + 2x1 − x2 + 0x3 = u1
0y1 + y2 − y3 + 0x1 + 0x2 − x3 = u2
−y1 + y2 + y3 + 0x1 + 0x2 − 2x3 = u3
Ignoring the random disturbance the table of the parameters of the model is as follows:
            y1    y2    y3    x1    x2    x3
1st eq.      1    −3     0     2    −1     0
2nd eq.      0     1    −1     0     0    −1
3rd eq.     −1     1     1     0     0    −2
Secondly. Strike out the row of coefficients of the equation which is being examined for
identification. For example, if we want to examine the identifiability of the second equation of
the model we strike out the second row of the table of coefficients.
Thirdly. Strike out the columns in which a non-zero coefficient of the equation being
examined appears. By deleting the relevant row and columns we are left with the
coefficients of variables not included in the particular equation, but contained in the other
equations of the model. For example, if we are examining for identification the second
equation of the system, we will strike out the second, third and the sixth columns of the above
table, thus obtaining the following table:
            y1    x1    x2
1st eq.      1     2    −1
3rd eq.     −1     0     0
Fourthly. Form the determinant(s) of order (G -1) and examine their value. If at least one of
these determinants is non-zero, the equation is identified. If all the determinants of order (G-1)
are zero, the equation is underidentified.
In the above example of exploration of the identifiability of the second structural equation we
have three determinants of order (G-1) =3-1=2. They are:
Δ1 = | 1   2 ; −1   0 | = 2 ,   Δ2 = | 1  −1 ; −1   0 | = −1 ,   Δ3 = | 2  −1 ; 0   0 | = 0
(the symbol Δ stands for 'determinant'). We see that we can form two non-zero determinants of
order G − 1 = 3 − 1 = 2; hence the second equation of our system is identified.
Fifthly. To see whether the equation is exactly identified or overidentified we use the order
condition (K − M) ≥ (G − 1). With this criterion, if the equality sign is satisfied, that is, if
(K − M) = (G − 1), the equation is exactly identified. If the inequality sign holds, that is, if
(K − M) > (G − 1), the equation is overidentified.
In the case of the second equation we have:
G=3 K=6 M=3
And the counting rule (K − M) ≥ (G − 1) gives (6 − 3) = 3 > (3 − 1) = 2. Therefore the second
equation of the model is overidentified.
3.5 Estimation of Simultaneous Equations Models
1. Indirect Least Squares (ILS) Method
In this method, we first obtain the estimates of the reduced form parameters by applying
OLS to the reduced form equations and then indirectly get the estimates of the parameters of
the structural model. This method is applied to exactly identified equations.
Steps:
a. Obtain the reduced form equations (that is, express the endogenous variables in terms of
predetermined variables).
b. Apply OLS to the reduced form equations individually. OLS will yield consistent
estimates of the reduced form parameters (since each equation involves only
non-stochastic (predetermined) variables that appear as 'independent' variables).
c. Obtain (or recover back) the estimates of the original structural coefficients from the
reduced-form estimates in step (b), as in the sketch below.
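For the exactly identified Keynesian model of (3.3)–(3.4), the ILS steps can be sketched in Stata as follows (the variable names c, y and z are assumptions). From (3.5), π11 = β/(1 − β), so β̂ = π̂11/(1 + π̂11) and α̂ = π̂01/(1 + π̂11):

regress c z                              // steps (a)-(b): reduced form (3.5)
nlcom (beta:  _b[z]/(1 + _b[z]))         // step (c): recover beta-hat
nlcom (alpha: _b[_cons]/(1 + _b[z]))     // step (c): recover alpha-hat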
2. Two-Stage Least Squares (2SLS) Method
The 2SLS procedure is generally applicable for estimation of over- identified equations as it
provides unique estimators.
Steps:
a) Estimate the reduced form equations by OLS and obtain the predicted Ŷi.
b) Replace the right-hand-side endogenous variables in the structural equations by the
corresponding Ŷi and estimate them by OLS.
Consider the following simultaneous equations model:
y1 = α1 + β1y2 + γ1x1 + γ2x2 + u1 …………………………………………………………… (a)
y2 = α2 + β2y1 + γ3x3 + u2 ……………………………………………………………………… (b)
Where y1 and y2 are endogenous while x1, x2 and x3 are predetermined.
The 2SLS procedure for estimating equation (b) (which is over-identified) is:
• We first estimate the reduced form equations by OLS; that is, we regress y1 on x1, x2 and x3
using OLS and obtain ŷ1. We then replace y1 by ŷ1 and estimate equation (b) by OLS;
that is, we apply OLS to: y2 = α2 + β2ŷ1 + γ3x3 + u2
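In Stata the two stages can be run by hand or, preferably, in one step with ivregress, which also computes correct standard errors; a minimal sketch (the variable names y1, y2, x1, x2, x3 are assumptions):

regress y1 x1 x2 x3                 // first stage: reduced form for y1
predict y1hat, xb
regress y2 y1hat x3                 // second stage (standard errors not corrected)

ivregress 2sls y2 x3 (y1 = x1 x2)   // one-step 2SLS with correct standard errors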

CHAPTER FOUR
4. INTRODUCTION TO PANEL DATA REGRESSION MODELS
4.1. Introduction
In panel data the same cross-sectional unit (say a family or a firm or a state) is surveyed over
time. In short, panel data have space as well as time dimensions.
Hypothetical examples:
• Data on 200 Ethiopian Somali regional state schools in 2004 and again in 2005, for 400 observations in total.
• Data on 9 regional states of Ethiopia, each state observed in 5 years, for a total of 45 observations.
• Data on 1000 individuals, in four different months, for 4000 observations in total.
There are other names for panel data, such as pooled data (pooling of time series and
cross-sectional observations), combination of time series and cross-section data (cross-sectional
time-series data), micro panel data, and longitudinal data (a study over time of a
variable or a group of subjects).
Why Should We Use Panel Data? Their Benefits and Limitations
Baltagi (2005) lists several benefits from using panel data. These include the following.
1. Controlling for individual heterogeneity. Panel data allows you to control for variables
you cannot observe or measure like cultural factors or difference in business
practices across companies; or variables that change over time but not across entities
(i.e. national policies, federal regulations, international agreements, etc.). That is, it
accounts for individual heterogeneity. Time-series and cross-section studies not
controlling for this heterogeneity run the risk of obtaining biased results.
2. Panel data give more informative data, more variability, less collinearity among
the variables, more degrees of freedom and more efficiency. Time - series studies are
plagued with multicollinearity.
3. Panel data are better able to study the dynamics of adjustment. Cross- sectional
distributions that look relatively stable hide a multitude of changes.
4. Panel data are better able to identify and measure effects that are simply not detectable in
pure cross- section or pure time- series data.
5. Panel data models allow us to construct and test more complicated behavioral models
than purely cross- section or time- series data. For example, technical efficiency is
better studied and modeled with panels.
6. Micro panel data gathered on individuals, firms and households may be more
accurately measured than similar variables measured at the macro level. Biases
resulting from aggregation over firms or individuals may be reduced or eliminated.
Limitations of panel data include:
1. Design and data collection problems. These include problems of coverage
(incomplete account of the population of interest), non response (due to lack of
cooperation of the respondent or because of interviewer error), recall (respondent not
remembering correctly), frequency of interviewing, interview spacing, reference
period, the use of bounding and time- in - sample bias.
2. Distortions of measurement errors. Measurement errors may arise because of
faulty responses due to unclear questions, memory errors, deliberate distortion of
responses (e.g. prestige bias), inappropriate informants, misrecording of responses
and interviewer effects.
3. Selectivity problems. These include:
(a) Self - selectivity. People choose not to work because the reservation wage is higher
than the offered wage. In this case we observe the characteristics of these individuals
but not their wage. Since only their wage is missing, the sample is censored.
However, if we do not observe all data on these people this would be a truncated
sample.
(b) Non response. This can occur at the initial wave of the panel due to refusal to
participate, nobody at home, untraced sample unit, and other reasons. Item (or partial)
non response occurs when one or more questions are left unanswered or are found
not to provide a useful response.
(c) Attrition. While non response occurs also in cross- section studies, it is a more
serious problem in panels because subsequent waves of the panel are still
subject to nonresponse. Respondents may die, or move, or find that the cost of
responding is high.
4. Short time- series dimension. Typical micro panels involve annual data covering a
short time span for each individual. This means that asymptotic arguments rely
crucially on the number of individuals tending to infinity. Increasing the time span
of the panel is not without cost either. In fact, this increases the chances of
attrition and increases the computational difficulty for limited dependent variable
panel data models.
5. Cross- section dependence. Macro panels on countries or regions with long time
series that do not account for cross- country dependence may lead to misleading
inference.
Notation for panel data
A double subscript is used to distinguish entities (states, family, country, individuals,
etc.) and time periods.
Consider the following simple panel data regression model:
��� = 0 + 1 �1�� + 2 �� + ��� …………………………………………4.1
i =1,…,n, T = 1,…,T
Where i = entity (state), n = number of entities, so i = 1,…,n ,t = time period (year, month,
quarter, etc.), T = number of time periods, so that t =1,…,T
Panel data with k regressors:
��� = 0 + 1 �1�� + 2 �2�� + …����� + ��� …………………………………………4.2

4.2. Estimation of Panel Data Regression Model


4.2.1. The Fixed Effects (Entity/Time Fixed) Approach
You may apply entity fixed effects regression when you want to control for omitted variables
that differ among panels but are constant over time. On the other hand, if there are unobserved
effects that vary across time rather than across panels, we apply the time fixed effects regression
model. Use fixed effects (FE) whenever you are only interested in analyzing the impact of
variables that vary over time. FE explores the relationship between predictor and outcome
variables within an entity (country, person, company, etc.). Each entity has its own individual
characteristics that may or may not influence the predictor variables (for example being a
male or female could influence the opinion toward certain issue or the political system of a
particular country could have some effect on trade or GDP or the business practices of a
company may influence its stock price).When using FE we assume that something within
the individual may impact or bias the predictor or outcome variables and we need to control
for this. This is the rationale behind the assumption of the correlation between entity’s error
term and predictor variables. FE removes the effect of those time-invariant characteristics from
the predictor variables so we can assess the predictors' net effect.
Another important assumption of the FE model is that those time- invariant characteristics
are unique to the individual and should not be correlated with other individual
characteristics. Each entity is different; therefore the entity's error term and the constant
(which captures individual characteristics) should not be correlated with the others. If the
error terms are correlated, then FE is not suitable, since inferences may not be correct, and you
need to model that relationship (probably using random effects).
Entity-demeaned OLS Regression
Think of the following two-variable panel regression model in fixed effect form:
Yit = αi + β1Xit + uit …………………………………………………………………………… 4.3
• αi is called an "entity fixed effect" or "entity effect": it is the constant (fixed) effect of being in entity i.
The entity averages satisfy (the sums run over t = 1, …, T):
(1/T) Σt Yit = αi + β1 (1/T) Σt Xit + (1/T) Σt uit ,  i.e.  Ȳi = αi + β1X̄i + ūi
Deviations from the entity averages:
Yit − Ȳi = β1(Xit − X̄i) + (uit − ūi)
or  Ỹit = β1X̃it + ũit ,  where  Ỹit = Yit − Ȳi ,  X̃it = Xit − X̄i ,  ũit = uit − ūi
Then we apply OLS to Ỹit = β1X̃it + ũit to estimate β1.
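The within (entity-demeaned) estimator is what Stata's xtreg, fe computes; a minimal sketch, assuming a panel identifier id and a time variable year (names are illustrative):

xtset id year
xtreg y x, fe      // entity fixed effects (within estimator)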
4.2.2. The Random Effects (RE) Approach
If you believe that some omitted variables may be constant over time but vary among panels,
and others may be fixed among panels but vary over time, then you can apply random effects
regression model. Random effects assumes that the entity's error term is not correlated with the
predictors, which allows time-invariant variables to play a role as explanatory variables. In
random effects you need to specify those individual characteristics that may or may not
influence the predictor variables.
The basic idea of the random effects model is to start with (4.3):
Yit = αi + β1Xit + uit …………………………………………………………………………… 4.3a
Instead of treating αi as fixed, we assume that it is a random variable with a mean value of α
(no subscript i here), and the intercept value for an individual entity can be expressed as:
αi = α + εi ,  i = 1, …, n ………………………………………………………………………… 4.4
Where εi is a random error term with a mean value of zero and a variance of σε².
What we are essentially saying is that the entities included in our sample are a drawing
from a much larger universe of such a population and that they have a common mean value
for the intercept (= α); the individual differences in the intercept values of each entity are
reflected in the error term εi.
Substituting (4.4) into (4.3a), we get:
Yit = α + β1Xit + εi + uit ………………………………………………………………………… 4.5
    = α + β1Xit + wit ,  where wit = εi + uit
In random effects model (REM) or error component model (ECM) it is assumed that the
intercept of an individual unit is a random drawing from a much larger population with a
constant mean value. The individual intercept is then expressed as a deviation from this
constant mean value. One advantage of ECM over FEM is that it is economical in degrees of
freedom, as we do not have to estimate N cross- sectional intercepts. We need only to estimate
the mean value of the intercept and its variance. ECM is appropriate in situations where the
(random) intercept of each cross sectional unit is uncorrelated with the regressors.
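The RE model is estimated by (feasible) GLS; in Stata, a minimal sketch with the same assumed variable names as above:

xtreg y x, re      // random effects (GLS) estimator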
4.3. Choosing Between Fixed and Random Effects
If you are not exactly sure which model, fixed effects or random effects, you should use, you can
run a test called the Hausman test. To run a Hausman test in Stata, you need to save the coefficients
from each of the models and use the stored results in the test. To store the coefficients, you can
use the "estimates store" command.

The Hausman test tests the null hypothesis that the coefficients estimated by the efficient
random effects estimator are the same as the ones estimated by the consistent fixed effects
estimator. If they are, then it is safe to use random effects. If you get a statistically significant
p-value, however, you should use fixed effects. In this example, the p-value is statistically
significant. Therefore, fixed effects would be more appropriate in this case.
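The full sequence is sketched below (variable names are assumptions):

xtreg y x, fe
estimates store fixed
xtreg y x, re
estimates store random
hausman fixed random      // H0: RE coefficients equal FE coefficients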
                    Fixed Effect Model                      Random Effect Model
Functional form     Yit = (α + εi) + β1Xit + uit            Yit = α + β1Xit + (εi + uit)
Intercepts          Varying across groups and/or times      Constant
Error variances     Constant                                Varying across groups and/or times
Slopes              Constant                                Constant
Estimation          LSDV, within-effect method              GLS, FGLS
Hypothesis test     Incremental F-test                      Breusch–Pagan LM test

Other Tests/Diagnostics
Testing for Time-fixed Effects
It is a joint test to see if the dummies for all years are equal to 0; if they are, then no time
fixed effects are needed.
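In Stata this is a joint Wald test on the year dummies; a minimal sketch (assumed names):

xtreg y x i.year, fe
testparm i.year      // H0: all year dummies are jointly zero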
We failed to reject the null that all years' coefficients are jointly equal to zero; therefore no time
fixed effects are needed.

Testing for Random Effects:
Breusch-Pagan Lagrange Multiplier (LM)
The LM test helps you decide between a random effects regression and a simple OLS regression.
The null hypothesis in the LM test is that variances across entities are zero. That is, there is no
significant difference across units (i.e., no panel effect).
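In Stata the LM test is run right after a random-effects regression; a minimal sketch:

xtreg y x, re
xttest0      // Breusch-Pagan LM test; H0: Var(entity effect) = 0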
Here we reject the null and conclude that random effects is appropriate. That is, there is
evidence of significant differences across countries; therefore you should run a random effects
regression.
Testing for Cross-Sectional Dependence/Contemporaneous Correlation:
Using Breusch-Pagan LM Test of Independence
According to Baltagi, cross-sectional dependence is a problem in macro panels with long time
series (over 20–30 years). This is not much of a problem in micro panels (few years and a large
number of cases). The null hypothesis in the B-P/LM test of independence is that residuals
across entities are not correlated.
The command to run this test is xttest2 (run it after xtreg, fe):
xtreg fatality beertax, fe
xttest2
Testing for Cross-Sectional Dependence/Contemporaneous Correlation:
Using Pesaran CD Test
The Pesaran CD (cross-sectional dependence) test is used to test whether the residuals are correlated
across entities. Cross-sectional dependence can lead to bias in test results (also called
contemporaneous correlation). The null hypothesis is that residuals are not correlated.
The command for the test is xtcsd; you have to install it by typing ssc install xtcsd:
xtreg fatality beertax, fe
xtcsd, pesaran      // alternatives: xtcsd, frees  or  xtcsd, friedman

Since there is cross-sectional dependence in our model, it is suggested to use Driscoll and
Kraay standard errors.
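These are available through the user-written xtscc command; a minimal sketch (install with ssc install xtscc):

xtscc fatality beertax, fe      // FE regression with Driscoll-Kraay standard errors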
A test for heteroscedasticity is available for the fixed-effects model using the command xttest3.
The null is homoscedasticity (or constant variance). Above we reject the null and conclude
heteroscedasticity. Serial correlation tests apply to macro panels with long time series (over 20–30
years); serial correlation is not a problem in micro panels (with very few years). Serial correlation
causes the standard errors of the coefficients to be smaller than they actually are and the
R-squared to be higher than it actually is.
A Lagrange multiplier test for serial correlation is available using the command xtserial.
The null is no serial correlation. Above we reject the null and conclude the data do have first-order
autocorrelation.
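A minimal sketch of both diagnostics (xttest3 and xtserial are user-written commands; install them with ssc install xttest3 and ssc install xtserial):

xtreg fatality beertax, fe
xttest3                        // H0: homoscedastic (panel-level) error variances
xtserial fatality beertax      // H0: no first-order serial correlation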
Testing for Unit Roots/Stationarity
The Levin–Lin–Chu (2002), Harris–Tzavalis (1999), Breitung (2000; Breitung and Das 2005),
Im–Pesaran–Shin (2003), and Fisher-type (Choi 2001) tests have as the null hypothesis that
all the panels contain a unit root. The Hadri (2000) Lagrange multiplier (LM) test has as the
null hypothesis that all the panels are (trend) stationary. The top of the output for each test makes
explicit the null and alternative hypotheses. Options allow you to include panel-specific means
(fixed effects) and time trends in the model of the data-generating process.
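All of these are available through Stata's xtunitroot command; a minimal sketch (the variable name y and an xtset panel are assumptions):

xtunitroot llc y        // Levin-Lin-Chu; H0: panels contain unit roots
xtunitroot ips y        // Im-Pesaran-Shin; H0: all panels contain unit roots
xtunitroot hadri y      // Hadri LM; H0: all panels are stationary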