Econometrics Module-1

This document provides an overview of econometrics. It defines econometrics as dealing with measuring economic relationships between variables. Econometrics integrates economic theory, economic statistics, mathematical statistics, and mathematical economics. It differs from these disciplines by incorporating random factors not reflected in economic theory or mathematics. The document outlines the goals of econometrics as testing economic theories, informing policymaking, and enabling forecasting. It also divides econometrics into theoretical and applied branches.

Uploaded by

Firanbek Bg


Prepared by

Alemayehu Kebede

Edited by
Tesfaye Melaku

December 25/06

Chapter 1: Definition, Scope, & Division of Econometrics
1.1 Econometrics & Other Disciplines of Economics
1.2 Goals of Econometrics
1.2.1 Analysis: Testing of Economic Theories
1.2.2 Policy Making
1.2.3 Forecasting
1.3 Division of Econometrics
1.3.1 Theoretical Econometrics
1.3.2 Applied Econometrics
1.4 Methodology of Econometric Research
Chapter 2: Simple linear regression model, or the Classical linear regression model
2.1 The simple linear regression Analysis
2.2 Assumptions of the linear stochastic regression model
2.2.1 Assumption about Ui
2.2.1 Assumption about Ui & Xi
2.2.2 Relationship about explanatory variables
2.3 Estimation of the model
2.4 Statistical tests of Estimates
2.5 The relationship between r² and the slope coefficient β̂
2.6 Significance test of the Parameter estimates
Chapter 3: Properties of the least squares Estimates
3.1 The Gauss-Markov Theorem
3.2 Importance of the BLUE Properties
3.3 Maximum Likelihood Estimation
Chapter 4: Multiple Linear Regression
4.1 Variance & standard errors of OLS estimators
4.2 Test of significance of the parameter estimates of multiple regressions
4.3 Importance of the statistical Test of Significance
Chapter 5: Relaxing the assumptions of the classical model
5.1 Violation of the important assumptions
5.1.1 Heteroscedasticity
5.1.1.1 Consequences of heteroscedasticity
5.2.1 Graphic methods of detecting autocorrelation
5.3.3 Test for detecting multicollinearity
Chapter 6: Estimation with Dummy Variables
6.1 Some important uses of Dummy variables
6.2 Use of Dummy variables in Seasonal analysis
Chapter 7: Simultaneous equation models
7.1 The nature of simultaneous-equation models
7.2 Inconsistency & Simultaneity Bias of OLS Estimators
7.3 Solution to the simultaneous equations
7.4 Some definitions of simultaneous equations
7.4 The Identification Problem
7.4.1 Under identification
7.4.2 Exact (Just) Identification
7.4.3 Over identification
7.5.1 Order Condition for identification
7.5.2 Rank Condition for identification
7.5.3 Rank Condition
7.6 Estimation of Simultaneous equation models
7.6.1 Ordinary Least Squares

Chapter 1: Definition, Scope, & Division of Econometrics

Definition of Econometrics: Econometrics deals with the measurement of economic relationships between economic variables (dependent and independent variables). The term econometrics is derived from two Greek words meaning "economy" and "measure". Different economists define econometrics differently, but they all arrive at the same conclusion, and the definitions can be boiled down to the following:
"Econometrics is the positive interaction between data & ideas about the way the economy works."
The central role of econometrics has often been regarded as estimating the parameters of a model as efficiently as possible, given a particular set of data, by applying statistical and mathematical techniques. To test the validity of economic theory, econometrics provides numerical values for the parameters of economic relationships, and with these numerical values we can verify economic theories. To arrive at these numerical values we use economic theory, mathematics, and statistics. Although econometrics uses all of these, it differs from each of them. Its most distinctive feature is that it contains a random term, which is not reflected in mathematical economics or economic theory.

1.1 Econometrics & Other Disciplines of Economics

Econometrics is an integration of economic theory, economic statistics, mathematical statistics, and mathematical economics.
A) Economic theory states a qualitative relationship between the explanatory and explained variables under the ceteris paribus assumption.
Ex. 1. Consumption depends on the current income ($Y_t$) and the previous income ($Y_{t-1}$) of an individual, other things being constant. The theory does not say, in numerical terms, how much current and previous income affect consumption.
B) Mathematical economics expresses economic theory in equation form, that is, it states the theory as a mathematical relationship between variables. The theoretical relationship of Ex. 1 can be written in mathematical form as follows.

Ex. 2.
$$C_t = \alpha + \beta_1 Y_t + \beta_2 Y_{t-1} \tag{1.1}$$
Where
$C_t$: consumption expenditure
$Y_t$: current income
$Y_{t-1}$: previous income
This mathematical relation still does not capture the other factors that affect consumption expenditure. Mathematical economics states an exact relationship between the dependent variable ($C_t$) and the independent variables ($Y_t$ and $Y_{t-1}$), ignoring all other variables that affect consumption expenditure.
C) Economic statistics is the descriptive side of economics: collecting, processing, and presenting economic data in the form of tables and charts. Although economic statistics provides numerical summaries such as the mean, median, and standard deviation, it does not establish or measure the relationships between economic variables.
D) Mathematical statistics is based on probability theory, whose methods were developed on the basis of controlled experiments. These statistical methods cannot be applied directly to economic relationships, because such experiments cannot be designed for most economic phenomena. They apply in only a few cases in economics, such as agricultural or industrial experiments.
All of the above approaches ignore the other factors that affect an economic relationship. Econometrics differs from them by providing a method for dealing with the random term that captures these factors:
$$C_t = \alpha + \beta_1 Y_t + \beta_2 Y_{t-1} + u_t \tag{1.2}$$
All variables have the same meaning as in equation (1.1).
$u_t$ is the random term, which represents all other factors that affect consumption expenditure. These factors may be many, such as the invention of new products, wealth, windfall gains and losses, migration, and tradition. Each has its own influence on consumption expenditure. By allowing for these other factors (represented by $u_t$), econometrics finds numerical values for the coefficients of the variables; these values describe the relationship and allow economic theories to be verified.

1.2 Goals of Econometrics

There are three main goals of Econometrics. These are:
1. Testing of Economic theories /Analysis/
2. Estimation of the coefficients of Economic relationship /Policy making/
3. Forecasting the future values of economic magnitudes. /Forecasting/
Successful econometric application should include some combination of all the three goals.

1.2.1 Analysis: Testing of Economic Theories

In the early stages of the development of economic theory, so-called armchair economists formulated basic economic principles through verbal explanation and deductive reasoning (from the general to the particular). During this period economists derived general conclusions (laws) about the working of the economic system by pure logical reasoning. No attempt was made to examine whether the theories adequately explained actual economic behavior.
Nowadays any economic theory is subject to the empirical test of econometrics, i.e. the power of an economic theory to explain economic behavior is tested by econometrics. Econometrics therefore aims primarily at the verification of economic theories, and thereby at deciding how well they explain the observed behavior of economic units.

1.2.2 Policy Making

By applying different econometric techniques we can obtain individual numerical values for the coefficients of economic relationships. Economic agents can base their decisions on these numerical values. Econometrics can supply the MPC, elasticities, MC, MR, etc., and decisions are taken using these magnitudes.
Ex.
$$D = \alpha + \beta_1 I + \beta_2 \mathit{Ex} + \beta_3 \mathit{PI} + \beta_4 \mathit{PEx} + U_i \tag{1.3}$$
Where
D is devaluation,
I is the volume of imports,
Ex is the volume of exports,
PI is the price of imports,
PEx is the price of exports.
Devaluation then depends on the coefficients of all these explanatory variables. From these coefficients we have
$\beta_1$ = marginal propensity to import,
$\beta_2$ = marginal propensity to export,
$\beta_3$ and $\beta_4$ = the propensities with respect to import prices and export prices respectively.
On the basis of the numerical values of these coefficients, the government decides whether devaluation will eliminate the country's deficit or not.

1.2.3 Forecasting
Using the numerical values of the coefficients of economic relationships, we can forecast the future values of economic variables and judge whether any policy measure is needed to influence them.
Suppose the following result was estimated from data on the Ethiopian economy for the years 1985-1995:
$$\hat{Y}_i = -261.09 + 0.2453 X_i \tag{1.4}$$
Where $Y_i$ = Ethiopian expenditure on imported goods,
$X_i$ = personal disposable income.
On the basis of this result the government can predict expenditure on imported goods in any year after 1995. If disposable income ($X_i$) is 1 million in 1999, expenditure on imported goods in 1999 will be $\hat{Y}_i = -261.09 + 0.2453(1{,}000{,}000) = 245{,}038.91$. Knowing the future value of expenditure on imported goods and services, the government can take measures to increase or cut down imports.
Forecasting is used by both developed and developing countries, but in different ways: developed countries use it to regulate their economies, whereas developing countries use it for planning.
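The forecast arithmetic above can be reproduced in a few lines. The sketch below simply plugs an income figure into the estimated equation (1.4); the function and constant names are illustrative assumptions, not part of the module.

```python
# Coefficients copied from equation (1.4); all names here are illustrative.
INTERCEPT = -261.09
SLOPE = 0.2453


def forecast_imports(disposable_income):
    """Point forecast of expenditure on imported goods from equation (1.4)."""
    return INTERCEPT + SLOPE * disposable_income


# Disposable income of 1 million, as in the 1999 example in the text.
forecast_1999 = forecast_imports(1_000_000)
print(round(forecast_1999, 2))
```

The same function can be evaluated for any other assumed income level, which is how a fitted equation is used for planning in practice.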

1.3 Division of Econometrics

Like many subjects, econometrics is divided into two branches: theoretical and applied econometrics.
1.3.1 Theoretical Econometrics: This is the development of appropriate econometric methods for measuring economic relationships between variables. In theoretical econometrics:
i. the data used for measurement are observations from the real world and are not derived from controlled experiments;
ii. econometric relationships are not exact.
The econometric methods of theoretical econometrics may be classified into two groups.
a) Single-equation techniques, i.e. methods applied to a one-way relationship between variables, one equation at a time.
Ex.
$$Q_d = \alpha + \beta_1 P_i + u_i \tag{1.5}$$
This means that quantity demanded depends on the price of the commodity, but price does not depend on quantity. We have causation in one direction only, so econometric techniques are applied to this equation alone.
b) Simultaneous-equation models: used when there is two-way causation.
Ex. Equation (1.5) states that quantity demanded depends on the price of the commodity. If the price of the commodity in turn depends on the quantity supplied, we have two-way causation, and econometric techniques are applied to three equations (the subscripts 1 and 2 distinguish the parameters and error terms of the two behavioural equations):
$$Q_d = \alpha_1 + \beta_1 P_i + u_{1i} \quad \text{(demand equation)} \tag{1.6}$$
$$P_i = \alpha_2 + \beta_2 Q_s + u_{2i} \quad \text{(price equation)} \tag{1.7}$$
$$Q_d = Q_s \quad \text{(identity)} \tag{1.8}$$
In this case we apply econometric techniques to all the equations simultaneously.
1.3.2 Applied Econometrics: This is the application of the methods of theoretical econometrics to specific branches of economic theory, i.e. to the verification and forecasting of demand, cost, supply, production, investment, consumption, and other related fields.

1.4 Methodology of Econometric Research

In any econometrics research we may distinguish the following steps.

1. Economic theory
2. Mathematical model of the theory
3. Econometric model of the theory
4. Collecting data
5. Estimation of the econometric model
6. Evaluation of estimates (hypothesis testing)
7. Application (forecasting)

Stage 1. The first step in econometrics is to formulate the economic theory that will be tested against reality.
Ex.
• The theory may hypothesize that aggregate saving in the economy is affected by the average interest rate and by income lagged one year (previous year's income).
• Or take Keynes's psychological law of consumption, which hypothesizes that consumption is a function of income. To be precise, the relationship between aggregate consumption in terms of wage units (Cw) and aggregate income in terms of wage units (Yw) is called the propensity to consume.
• Or the consumption of an individual in period t depends on the individual's income in period t and on the future rate of interest.

Stage 2. Specification of the Model: This is the transformation of economic theory into a mathematical model that explains the relationship between economic variables. This stage involves the following.

A) Selecting variables: determining the dependent (endogenous or explained) variable and the independent (exogenous or explanatory) variables of the theory. From the examples above we can identify the following.
Ex. 1.
Aggregate saving is the dependent variable, and the remaining variables (interest rate and previous year's income) are the independent variables.
Ex. 2.
Aggregate consumption in terms of wage units is the dependent variable, and aggregate income in terms of wage units is the independent variable.
Ex. 3.
Consumption of an individual at time t is the dependent variable, and the future rate of interest and income are the independent variables.
B) Determining the theoretical values: a priori expectations of the sign and magnitude of the parameters. This requires only a theoretical background to determine whether the relationship between the dependent and independent variables is positive or negative. For our examples the expected directions are as follows.
i. (Ex. 1) The interest rate and saving are positively related, and so are income and saving. The parameters linking aggregate saving to the interest rate and to income should therefore be positive.
ii. (Ex. 2) Aggregate consumption in terms of wage units and aggregate income in terms of wage units are positively related, so the sign of the parameter should be positive.
iii. (Ex. 3) Consumption at time t and income at time t are positively related, while consumption at time t and the future rate of interest are negatively related (if the future rate of interest is high, an individual will cut down consumption at time t, postpone it to another period, and increase savings).
C) Specification of the model: At this stage we specify the relationships between the dependent and independent variables on the basis of economic theory. We also determine the number of equations (single-equation or simultaneous-equation model) and the type of equation, i.e. whether the relationship between the economic variables is linear or non-linear. Let us specify the theoretical relationships from our examples.
Ex. 1.
$$S_t = \alpha + \beta_1 Y_{t-1} + \beta_2 r \tag{1.9}$$
Where $S_t$ is aggregate saving,
$Y_{t-1}$ is previous income,
$r$ is the rate of interest.
Ex. 2.
$$C_w = \alpha + \beta_1 Y_w \tag{1.10}$$
Where $C_w$ is aggregate consumption in terms of wage units,
$Y_w$ is aggregate income in terms of wage units.
Ex. 3.
$$C_t = \alpha Y_t^{\beta_1} r_e^{\beta_2} \tag{1.11}$$
Where $C_t$ is consumption at time t,
$Y_t$ is income at time t,
$r_e$ is the future rate of interest.
All of the above are single-equation models; equations 1.9 and 1.10 are linear, while equation 1.11 is non-linear.
What are the likely magnitudes of the coefficients ($\alpha$, $\beta_1$ and $\beta_2$)? The size of their numerical values is determined by economic theory and by empirical observation of the real world. In equations 1.9 and 1.10 the coefficient $\beta_1$ is a marginal propensity (to save out of previous income in 1.9, to consume in 1.10), and economic theory requires $0 < \beta_1 < 1$. The coefficients of equations 1.9 and 1.10 are interpreted differently from those of equation 1.11: in the linear equations 1.9 and 1.10 the coefficients measure marginal magnitudes, whereas in equation 1.11 they measure elasticities.
Ex. In equation 1.10, if income increases by 1 birr, consumption increases on average by $\beta_1$ birr. The interpretation of $\beta_2$ in equation 1.9 is similar: if the rate of interest increases by one unit, saving increases on average by $\beta_2$. In equation 1.11, by contrast, $\beta_1$ and $\beta_2$ are elasticities: if income increases by 1%, consumption increases on average by $\beta_1$%, and if the future rate of interest increases by 1%, consumption changes on average by $\beta_2$% (a fall, since $\beta_2$ is expected to be negative).
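The difference between the marginal and the elasticity interpretation can be checked with a short derivation. The sketch below uses the functional forms of equations 1.10 and 1.11 with the error term omitted, writing $Y$ for income and $r_e$ for the future rate of interest.

```latex
% Linear form (as in 1.10): the coefficient is a marginal effect.
C_w = \alpha + \beta_1 Y_w
\quad\Rightarrow\quad
\frac{\partial C_w}{\partial Y_w} = \beta_1

% Power form (as in 1.11): take logarithms, then differentiate.
C_t = \alpha\, Y^{\beta_1} r_e^{\beta_2}
\quad\Rightarrow\quad
\ln C_t = \ln\alpha + \beta_1 \ln Y + \beta_2 \ln r_e

\frac{\partial \ln C_t}{\partial \ln Y}
  = \frac{\partial C_t}{\partial Y}\cdot\frac{Y}{C_t}
  = \beta_1
```

The last line is the definition of the income elasticity of consumption, which is why $\beta_1$ in the power form is read as a percentage response rather than a birr-for-birr response.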

Possible errors committed at this stage are

i. Mis-specification of the model, e.g. a relationship that is non-linear is specified in linear form, or the reverse.
ii. Omission of relevant explanatory variables.
iii. Inclusion of irrelevant explanatory variables.

Step 3. Specification of the econometric model

Equations 1.9 to 1.11 state the mathematical relationship between the explained and explanatory variables in exact, or deterministic, form: the dependent variable is affected by the listed independent variables alone, and no other variable can affect it. In reality, many factors that are not captured by these equations affect economic relationships.
Ex. Equation 1.9 states that saving depends on previous income and the rate of interest alone. In reality many other variables affect saving, such as wealth, consumption, windfall gains and losses, and the health of the individual. The factors that are not incorporated in the model make the relationship between the dependent and independent variables inexact, and this inexact relationship can be specified as follows.
$$S_t = \alpha + \beta_1 Y_{t-1} + \beta_2 r + U_i \tag{1.12}$$
$$C_w = \alpha + \beta_1 Y_w + U_i \tag{1.13}$$
$$C_t = \alpha Y_t^{\beta_1} r_e^{\beta_2} e^{U_i} \tag{1.14}$$
In all the above equations $U_i$ represents all the factors that affect the dependent variable but are not taken into account explicitly in the model. $U_i$ is called the disturbance term, error term, random term, or stochastic variable. Including $U_i$ transforms the model of mathematical economics (an exact relationship between variables) into an econometric model (an inexact relationship, in which the influence of the omitted variables is captured by $U_i$).

Step 4. Obtaining Data

The data used in the estimation of an econometric model may be of various types.
• Time series data: a sequence of observations over time on an individual or a group of individuals. Ex. Consumption represented by $C_t$, where t indicates time from 1980 to 1996 E.C.
• Cross-sectional data: data on one or more variables collected at a particular point in time. Ex. The number of children registered in all K.G. schools of Bahir Dar in 1999 E.C., by sex, age, religion, etc.
• Panel data: the results of repeated surveys of a single (cross-sectional) sample in different periods of time. Ex. Consumption expenditure on teff, coffee, and cloth of the same sample of the population of Bahir Dar city, recorded in 1985, 1990, and 1996.
• Pooled data: data that combine time series and cross-sectional observations.
• Dummy variables: data constructed by econometricians when they are faced with qualitative information, such as sex, religion, race, or profession, that cannot be measured in any of the above ways. Such information can be represented by dummy variables. Ex. If religion appears on the independent side of the equation, since we have no quantitative measure of it, we can assign 1 for Christian and 0 otherwise.
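As a minimal sketch of how a dummy variable is constructed, the snippet below codes a qualitative attribute as 1/0. The survey responses and the helper name `to_dummy` are made up for illustration and do not come from the module.

```python
# Hypothetical survey responses; the values are invented for illustration.
religions = ["Christian", "Muslim", "Christian", "Other"]


def to_dummy(values, category):
    """Return 1 where the value equals `category`, and 0 otherwise."""
    return [1 if value == category else 0 for value in values]


# 1 for Christian, 0 otherwise, as in the example in the text.
christian_dummy = to_dummy(religions, "Christian")
print(christian_dummy)  # [1, 0, 1, 0]
```

The resulting column of ones and zeros can then enter a regression like any other explanatory variable.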
Accuracy of data: Although plenty of data are available for research purposes, the quality of the data matters for arriving at good results. The quality of data may be poor for several reasons.
i. Most social science data are non-experimental in nature, so there may be omissions, errors, etc.
ii. Approximation and rounding of numbers introduce errors of measurement.
iii. In questionnaire-type surveys, non-response and incomplete answers may lead to selectivity bias (when non-respondents are dropped and unanswered questions are excluded).
iv. The data obtained from one sample may differ from the data obtained from another sample, which makes it difficult to compare the results of the two samples.
v. Economic data are usually available at an aggregated level, and errors may be committed in aggregation.
vi. Because some data are confidential, they may be impossible to obtain or may be published only in aggregated form.
For these reasons, the results obtained by any researcher depend heavily on the quality of the data. If you obtain unsatisfactory results even though the model is correctly specified, the reason may be the quality of the data.

Step 5. Estimation of the Econometric Model

The coefficients of the independent variables, which describe the relationship between economic variables, can be estimated by two kinds of methods: single-equation or simultaneous-equation methods.
• Single-equation methods: these estimation techniques are applied to one equation at a time.
Ex.
$$Q_d = \alpha + \beta P_i + U_i \tag{1.15}$$
Where $Q_d$ is quantity demanded,
$P_i$ is price.
For this and similar equations we can apply different methods of estimation: ordinary least squares (OLS), indirect least squares or reduced-form techniques, two-stage least squares (2SLS), limited information maximum likelihood (LIML), and mixed estimation methods.

• Simultaneous-equation techniques: when we have more than one equation and the numerical values of the coefficients are determined simultaneously, we use one of the following methods of estimation: three-stage least squares (3SLS) or full information maximum likelihood (FIML).
The selection of the estimation technique depends on many factors.
a) The nature of the relationship between the economic variables and its identification. If the economic relationship is studied with a single equation, the best method is OLS. If the relationships between the economic variables form a system of simultaneous equations, we may use any of the simultaneous-equation techniques stated above.
b) The properties of the estimated coefficients obtained from each method. A good estimate should have the properties of unbiasedness, consistency, efficiency, and sufficiency, or a combination of them. If one method gives estimates that possess more of these desirable properties than the estimates from other methods, that technique is selected.
c) The purpose of the econometric research. If the purpose of the model is forecasting, the property of minimum variance is very important, so the technique that gives the minimum variance of the coefficient estimates is selected. If the purpose of the research is policy making (analysis), the technique that gives unbiased coefficient estimates is selected.
d) The simplicity of the technique. If we want simple computation, we can select the technique that involves simple computation and lower data requirements.
e) The time and cost required to compute the coefficients may also determine the choice of technique.
The coefficients of the model can be computed using any one of the econometric techniques stated above. Some techniques that are theoretically applicable may not be usable in practice because data are not available or because the statistical results obtained from the technique are defective.
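To make the single-equation case concrete, the sketch below estimates a demand equation of the form (1.15) by OLS, using the standard closed-form formulas for a regression with one explanatory variable. The price and quantity figures are hypothetical, invented only for this illustration, and the function name `ols_simple` is likewise an assumption.

```python
# Hypothetical price/quantity observations, made up for illustration only.
prices = [2.0, 3.0, 4.0, 5.0, 6.0]
quantities = [18.1, 15.9, 14.2, 11.8, 10.0]


def ols_simple(x, y):
    """OLS estimates (alpha_hat, beta_hat) for y = alpha + beta * x + u."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares, both in deviations from means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


alpha_hat, beta_hat = ols_simple(prices, quantities)
print(f"alpha_hat = {alpha_hat:.3f}, beta_hat = {beta_hat:.3f}")
```

With these invented data the slope estimate is negative, which is the sign a demand relationship would be expected to show; the formal derivation of the estimator belongs to the chapter on the simple linear regression model.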
Having selected the econometric method for estimating the model, one should consider whether the model is linear in the variables and in the parameters.
a) If the model is non-linear in the parameters, it is beyond this level of econometric analysis.
Ex.
$$Y = \alpha + \beta_1^2 X_1 + \beta_2^3 X_2 + U_i \tag{1.16}$$
Since the coefficient $\beta_1$ is raised to the power 2 and $\beta_2$ to the power 3, we call this kind of model non-linear in the coefficients.
b) If the model is non-linear in the variables, it has to be transformed into a linear model before estimation.
To find out whether the model is linear or non-linear in the variables, take the first derivative with respect to the explanatory variable. If the first derivative is a constant, the model is linear in the variables; if it is not a constant, the model is non-linear in the variables.
Example (1)
$$Y_i = \alpha + \beta_1 X_i + U_i \tag{1.17}$$
Taking the first derivative of $Y_i$ with respect to $X_i$ gives
$$\frac{\partial Y_i}{\partial X_i} = \beta_1$$
which is the coefficient of $X_i$ and a constant, so the equation is linear in the variable.
Example (2)
$$Y_i = \alpha + \beta X_i^2 + U_i \tag{1.18}$$
$$\frac{\partial Y_i}{\partial X_i} = 2\beta X_i$$
The model is non-linear in the variable, because the first derivative with respect to $X_i$ is not a constant.
Example (3)
$$Y_i = \beta_1 + \beta_2 \left(\frac{1}{X_i^2}\right) + U_i \tag{1.19}$$
$$\frac{\partial Y_i}{\partial X_i} = -2\beta_2 X_i^{-3} = -2\beta_2 \left(\frac{1}{X_i^3}\right)$$
Since the first derivative is not a constant, the model is again non-linear in the variable.
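The first-derivative rule can also be checked numerically. The sketch below approximates the derivative at several points and tests whether it stays constant; the parameter values and function names are illustrative assumptions, and the error term is left out.

```python
# Illustrative parameter values (assumptions, not estimates from the module).
ALPHA, BETA = 1.0, 0.5


def linear(x):
    """Form of equation (1.17), without the error term."""
    return ALPHA + BETA * x


def quadratic(x):
    """Form of equation (1.18), without the error term."""
    return ALPHA + BETA * x ** 2


def slope(f, x, h=1e-6):
    """Central-difference approximation of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


def is_linear_in_variable(f, points, tol=1e-4):
    """True if the numerical slope is approximately equal at every point."""
    slopes = [slope(f, x) for x in points]
    return max(slopes) - min(slopes) < tol


points = [1.0, 2.0, 5.0, 10.0]
print(is_linear_in_variable(linear, points))     # True
print(is_linear_in_variable(quadratic, points))  # False
```

A numerical check of this kind only samples a few points, so it supports the analytical derivative rather than replacing it.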
To estimate a model that is non-linear in the variables, we should first transform it into a linear model.
Equation (1.18) becomes
$$Y_i = \alpha + \beta X_i^* + U_i \tag{1.20}$$
where $X_i^* = X_i^2$.
Equation (1.19) becomes
$$Y_i = \beta_1 + \beta_2 X_i^* + U_i \tag{1.21}$$
where $X_i^* = 1/X_i^2$.
Similarly, the following models should first be transformed as shown:
$Y_t = \alpha X_t^{-\beta} e^{u_t}$ transforms into $\ln Y_t = \ln\alpha - \beta \ln X_t + u_t$
$Y_t = e^{\alpha + \beta X_t + u_t}$ transforms into $\ln Y_t = \alpha + \beta X_t + u_t$
$e^{Y_t} = e^{\alpha} X_t^{\beta} e^{u_t}$ transforms into $Y_t = \alpha + \beta \ln X_t + u_t$
$Y_t = e^{\alpha + \beta / X_t + u_t}$ transforms into $\ln Y_t = \alpha + \beta (1/X_t) + u_t$
Having transformed the model from non-linear to linear in the variables, we can estimate it using the selected econometric method.
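The logarithmic transformation can be illustrated as follows. The sketch generates data from a power model of the form $Y = \alpha X^{-\beta}$ with the error term set to zero, takes logs, and fits a straight line to the transformed data. The parameter values and data are hypothetical, and the closed-form least-squares formulas are the standard ones for a single regressor.

```python
import math

# Hypothetical parameters and data, chosen only to illustrate the transformation.
ALPHA, BETA = 3.0, 0.8
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [ALPHA * xi ** (-BETA) for xi in x]  # Y = alpha * X^(-beta), no error term

# Transformed model: ln Y = ln(alpha) - beta * ln X, which is linear in ln X.
ln_x = [math.log(xi) for xi in x]
ln_y = [math.log(yi) for yi in y]

# Least-squares line through the transformed observations.
n = len(ln_x)
mean_x = sum(ln_x) / n
mean_y = sum(ln_y) / n
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(ln_x, ln_y))
s_xx = sum((a - mean_x) ** 2 for a in ln_x)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Map the linear estimates back to the original parameters.
beta_hat = -slope
alpha_hat = math.exp(intercept)
print(f"alpha_hat = {alpha_hat:.4f}, beta_hat = {beta_hat:.4f}")
```

Because the generated data contain no error term, the recovered values coincide with the assumed parameters up to rounding; with real data they would only be estimates.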

Step - 6 Evaluation of Estimates:

After estimating the model and obtaining the coefficients of the variables, one has to proceed to the
evaluation of the results. At this stage we evaluate the reliability of the results, that is, whether
they are theoretically meaningful and statistically satisfactory.
To evaluate the reliability of the estimates we follow these steps:
i. Economic interpretation of the results: the economic a priori criterion.
ii. Statistical interpretation of the results: statistical analysis on the basis of R2, the t-test,
the F-test and the standard error (s.e.).
iii. Test of the econometric criterion.

Step A. Economic a priori criterion: at this stage we check whether the estimated values agree
with economic theory, i.e. we examine the sign and magnitude of the coefficients of the
variables.
Ex. 1. Suppose we have the following consumption function
$$C_t = \alpha + \beta_1 Y_t + U_t \tag{1.22}$$
where Ct is consumption expenditure and Yt is income.
From economic theory (the economic relationship between consumption and income) it is known
that $\beta_1$ represents the MPC (marginal propensity to consume). On the basis of the a priori
economic criterion, the sign of $\beta_1$ has to be positive and its magnitude (size) has to lie
between zero and one ($0 < \beta_1 < 1$). Suppose the estimation of the above consumption function gives
$$\hat{C}_t = -3.32 + 0.2033 Y_t \tag{1.23}$$
Economic theory states that if income increases by 1 birr, consumption increases on average by
less than one birr; here the estimated increase is 0.2033 birr (about 20 cents). The estimate of
$\beta_1$ is greater than zero and less than one in magnitude, and its sign is positive. Therefore
the estimated model agrees with economic theory (the economic relationship between consumption
and income), i.e. it satisfies the a priori economic criterion. Suppose another estimation of the
model using other data gives
$$\hat{C}_t = 24.45 - 5.091 Y_t \tag{1.24}$$
where Ct is consumption expenditure and Yt is income. From economic theory it is known that $\beta_1$
has to be positive with a magnitude between zero and one. In this estimated model, however, the
sign of $\beta_1$ is negative and its magnitude is greater than one, so we reject the model because
the results contradict economic theory.
In evaluating the estimates of a model we should take into consideration the sign and magnitude
of the estimated coefficients. If they do not agree with the economic relationship between the
variables explained by economic theory, the model is rejected. If there is nevertheless a good
reason to accept the model, the reason should be clearly stated. In most cases deficiencies of the
empirical data used for the estimation are responsible for the wrong sign or size of the
estimated parameters. Deficiency of the empirical data means either that the sample observations
do not represent the population (because of problems in the sampling procedure or inadequate
data collection) or that some assumptions of the method employed are violated. In general, if the
a priori criterion is not satisfied, the estimates should be considered unsatisfactory.
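The a priori check on the marginal propensity to consume can be written as a small test. The sketch below is illustrative only: the function name `satisfies_mpc_prior` is hypothetical, and the two slope values are the ones reported in equations (1.23) and (1.24).

```python
def satisfies_mpc_prior(beta_hat):
    """A priori criterion for the MPC: positive sign and magnitude below one."""
    return 0.0 < beta_hat < 1.0

# Slope estimates taken from equations (1.23) and (1.24)
estimates = {"(1.23)": 0.2033, "(1.24)": -5.091}
for label, beta_hat in estimates.items():
    verdict = "satisfies" if satisfies_mpc_prior(beta_hat) else "violates"
    print(f"Equation {label}: beta_hat = {beta_hat} {verdict} 0 < beta < 1")
```

The first estimate passes the criterion and the second fails it, which matches the conclusions drawn in the text.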

Step-B- First order test or statistical criterion: If the model passes the a priori economic
criterion, the reliability of the estimates of the parameters is evaluated using statistical
criteria. The most widely used statistical criteria are:
• The coefficient of determination (the square of the correlation coefficient) - R2 /r2/
• The standard error (S.E.) of the estimate
• The t-ratio or t-test of the estimates.
Since the estimated values are obtained from a sample of observations taken from the population,
the statistical tests help to find out how accurate these estimates are
(how accurately do they explain the population?).
R2 gives the percentage of the total variation of the dependent variable that is
explained by changes in the explanatory variables.
S.E. (standard error) measures the dispersion of the sample estimates
around the true population parameters. The lower the S.E., the higher the reliability of the
estimates (the sample estimates are closer to the population parameters), and vice versa.

Step -C- Second order test /Econometric criterion/: after applying the a priori and statistical
tests, the investigator should check whether the assumptions of the econometric method hold.
If any one of the econometric assumptions is violated, either
• the estimates of the parameters cease to possess some of the desirable properties
(unbiasedness, consistency, sufficiency etc.), or
• the statistical criteria lose their validity and become unreliable.
If the assumptions of the econometric technique are violated, the researcher has to re-specify
the model, for example by introducing additional variables into the model, omitting some
variables from it or transforming the original variables.
After re-specifying the model, the investigator proceeds with re-estimation and re-application of
all the tests (a priori, statistical and econometric) until the estimates satisfy all of them.

Step 7. Forecasting or Prediction
Forecasting is one of the prime aims of econometric research. The estimated model may be
economically meaningful and statistically and econometrically correct for the sample period, and
yet it may not have good forecasting power because of inaccuracy in the explanatory variables or
deficiencies in the data used to obtain the estimates.
To assess this, the forecast value is compared with the actual realized value of the relevant
dependent variable. The difference between the actual and the forecast value is tested
statistically. If the difference is significant, we conclude that the forecasting power of the
model is poor. If it is statistically insignificant, the forecasting power of the model is good.

Exercise for chapter one

1) Define econometrics. How does it differ from mathematical economics and
mathematical statistics?
2) What are the goals of econometrics? Explain them using examples.
3) What is the difference between theoretical and applied econometrics?
4) What is the difference between a model which is linear in the variables and one which is
non-linear in the variables? How would you interpret the parameters?
5) What steps are applied to evaluate the reliability of the estimates?
6) Explain the difference between economic theory and econometrics.
7) Consider the following theory, attributed to the well-known monetary
economist Milton Friedman: "the demand for money has a strong
positive relationship with price and income but has no relationship with the rate of
interest."
• Write the mathematical relationship
• Formulate the econometric relationship
• What will be the sign and magnitude of the relationship
between the dependent and independent variables?
8) Explain the stages in the methodology of econometrics.

Chapter 2: Simple linear regression model

The Classical linear regression model

Introduction:- In economics the relationships between variables are mainly expressed in terms of
dependent and independent variables. The dependent variable is the variable whose average value
is computed using the already known values of the explanatory variable(s). The values of the
explanatory variables are taken as fixed in repeated sampling from the population.
Ex. Suppose the amount of a commodity demanded by an individual depends on the price of the
commodity, the income of the individual, the prices of other goods etc. Then quantity demanded is
the dependent variable, whose value is determined by the price of the commodity, the income of
the individual, the prices of other goods etc. The price of the commodity, the income of the
individual and the prices of other goods are the independent (explanatory) variables, whose
values are obtained from the population using repeated sampling. The relationship between the
dependent and independent variables is the concern of regression analysis, i.e.

Qd = f (P, P0, Y etc) -------------------- (2.1)

If we study the relationship between the dependent variable and one independent variable, i.e.
Qd = f(P), we have the simple two-variable regression model, because it contains one dependent
variable (Qd) and one independent variable (P). If, however, the dependent variable depends on
more than one independent variable, such as Qd = f(P, P0, Y), we have multiple regression analysis.
The functional relationship between the dependent and independent variables may be linear or
non-linear.

2.1 The simple linear regression Analysis

The relationship between the dependent and independent variables suggested by economic theory is
usually specified as an exact or deterministic relationship. In reality, however, relationships
between economic variables are inexact or stochastic (non-deterministic) in nature.
Ex. Suppose consumption expenditure on a commodity depends on the current income of the
individual, ceteris paribus, and assume that the functional relationship is linear. Then we can write
$$C_t = \alpha + \beta Y_t \tag{2.2}$$
For each specific value of current income (Yt) there is then only one corresponding value of
consumption expenditure. But consumption expenditure is not determined by income alone; other
variables such as wealth, previous income, tradition etc. also affect it. There is therefore an
inexact relationship between the two variables, and to capture the other factors which affect
consumption expenditure we incorporate a variable U into equation 2.2:
$$C_t = \alpha + \beta Y_t + U_t \tag{2.3}$$
where Ct is the dependent variable, Yt is the independent variable, α and β are the regression
parameters, and Ut is the stochastic disturbance term or error term.
We introduce the random term U for the following reasons.
i. Omission of variables from the function. In economic reality each variable is
influenced by a very large number of factors, and not all of them can be included in
the function because:
a) Some of the factors may not be known.
b) Even if we know them, some factors cannot be measured statistically; for example
psychological factors (tastes, preferences, expectations etc.) are not measurable.
c) Some factors are random, appearing in an unpredictable way and at unpredictable times,
for example epidemics, earthquakes etc.
d) Some factors may be omitted because of their small influence on the dependent variable.
e) Even if all factors are known, the available data may not be adequate for measuring
all the factors influencing a relationship.
ii. The erratic nature of human beings: human behavior may deviate from the
normal pattern to a certain extent in an unpredictable way.
iii. Misspecification of the mathematical model: we may specify the relationship between
variables wrongly. We may fit a linear function to a non-linear relationship, or use a
single-equation model for simultaneously determined relationships.
iv. Error of aggregation: aggregation of data introduces error into the relationship. Many
economic data are available only in aggregate form; for example consumption, income etc.
are aggregates obtained by adding magnitudes that refer to individuals whose behavior is
dissimilar.
v. Errors of measurement: when collecting data we may commit errors of measurement.
To take the above sources of error into account, we introduce into econometric functions a
random variable, usually denoted by the letter U and called the error term, random
disturbance term or stochastic term of the function. With this random term the model takes the
form of equation (2.3), and the relationship between the variables is split into two parts.
Ex. From equation (2.3)
• α + βYt represents the exact relationship, the part explained by the line
• the part represented by the random term Ut is the part left unexplained by the line. This
can be shown using the following graph.
Figure 1: Consumption (Ct) plotted against current income (Yt). The straight line is Ct = α + βYt; the vertical distance of each observation from the line is its disturbance U.
The line Ct = α + βYt shows the exact relationship between consumption and income. Because other
variables also affect consumption expenditure, the observations are scattered around the straight
line. The true relationship is therefore described by the scatter of observations of Ct and Yt:
$$C_t = \alpha + \beta Y_t + U_t \tag{2.4}$$
Variation in consumption = Explained variation + Unexplained variation
To estimate this equation we would need data on Ct, Yt and Ut. Since Ut, unlike Ct and Yt, is never
observed, we have to guess its values; that is, we make some assumptions about the distribution
of each Ui (mean, variance, covariance etc.).

2.2 Assumptions of the linear stochastic regression model.

To guess the values of U we make some assumptions, which fall into three groups:
a) Some refer to the distribution of the random variable Ui.
b) Some refer to the relationship between Ui and the explanatory variables.
c) Some refer to the relationship between the explanatory variables themselves.

2.2.1 Assumptions about Ui

i) Ui is a random real variable: the value which Ui may assume in any one period depends
on chance; it may be positive, negative or zero.
ii) The mean value of Ui in any particular period is zero, i.e.
$$E(U_i) = 0$$
iii) Homoscedasticity (constant variance): the variation of each Ui is the same for all values of
the explanatory variable, i.e. the dispersion of Ui around the straight line (in Figure 1)
remains the same:
$$\mathrm{Var}(U_i) = \sigma_u^{2}$$
iv) The variable Ui has a normal distribution with mean zero and variance $\sigma_u^{2}$:
$$U_i \sim N(0, \sigma_u^{2})$$
v) Ui is serially independent: the value of U in one period does not depend on its value in any
other period, so the covariance between Ui and Uj ($i \neq j$) is zero:
$$\mathrm{Cov}(U_i, U_j) = E\{[U_i - E(U_i)][U_j - E(U_j)]\}$$
By assumption ii, $E(U_i) = E(U_j) = 0$, so
$$\mathrm{Cov}(U_i, U_j) = E[(U_i - 0)(U_j - 0)] = E(U_i U_j)$$
Because Ui and Uj are independent, $E(U_i U_j) = E(U_i)E(U_j)$, and again by assumption ii
$$\mathrm{Cov}(U_i, U_j) = E(U_i)E(U_j) = 0$$

2.2.2 Assumptions about Ui & Xi

i- The disturbance term Ui is not correlated with the explanatory variables. This means that the
Ui and the Xi do not move together, i.e. the covariance between Ui and Xi is zero:
$$\mathrm{Cov}(U_i, X_i) = 0$$
To see this, write
$$\mathrm{Cov}(U_i, X_i) = E\{[U_i - E(U_i)][X_i - E(X_i)]\}$$
By assumption $E(U_i) = 0$, so
$$\mathrm{Cov}(U_i, X_i) = E\{U_i[X_i - E(X_i)]\} = E(U_i X_i) - E(U_i)E(X_i) = E(U_i X_i)$$
Since the values of the Xi are fixed,
$$E(U_i X_i) = X_i E(U_i) = 0$$
Hence $\mathrm{Cov}(U_i, X_i) = 0$.
ii- The explanatory variables Xi are measured without error, i.e. there are no problems of
aggregation, rounding off etc. If there is such a problem in the measurement, it is
absorbed by the random term Ui.

2.2.3 Assumption about the relationship between explanatory variables

If there is more than one explanatory variable, they are assumed not to be
perfectly correlated with each other.
Ex.
$$Y_t = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + U_t$$
X1 and X2, X2 and X3, and X1 and X3 are not perfectly correlated with each other, i.e. there is no
multicollinearity.

The distribution of the dependent variable Y

Given the following relationship between the variables
$$Y_i = \alpha + \beta X_i + U_i$$
the mean (expected value) of Yi is found as follows:
$$E(Y_i) = E(\alpha + \beta X_i + U_i) = \alpha + \beta X_i + E(U_i)$$
Since $E(U_i) = 0$ by assumption,
$$E(Y_i) = \alpha + \beta X_i$$
is the mean value of the dependent variable Yi.
The variance of Yi is
$$\mathrm{Var}(Y_i) = E[Y_i - E(Y_i)]^{2}$$
Substituting $E(Y_i) = \alpha + \beta X_i$ and $Y_i = \alpha + \beta X_i + U_i$:
$$\mathrm{Var}(Y_i) = E[\alpha + \beta X_i + U_i - \alpha - \beta X_i]^{2} = E(U_i^{2})$$
From our previous assumption the variance of Ui is $E(U_i^{2}) = \sigma_u^{2}$, so
$$\mathrm{Var}(Y_i) = E(U_i^{2}) = \sigma_u^{2}$$
which is constant.
Yi is therefore normally distributed with this mean and variance:
$$Y_i \sim N(\alpha + \beta X_i, \sigma_u^{2})$$
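The mean and variance of Y at a fixed value of X can be checked by simulation. The following is a rough sketch, not part of the module: the parameter values, the fixed X value and the number of draws are all hypothetical choices made for illustration.

```python
import random
import statistics

# Hypothetical parameter values chosen only for illustration
alpha, beta, sigma_u = 2.0, 0.5, 2.0
x_fixed = 10.0          # one fixed value of the explanatory variable
rng = random.Random(1)  # fixed seed for reproducibility

# Draw many disturbances U ~ N(0, sigma_u^2) and build Y = alpha + beta*X + U
draws = [alpha + beta * x_fixed + rng.gauss(0.0, sigma_u) for _ in range(20000)]

mean_y = statistics.fmean(draws)
var_y = statistics.pvariance(draws)
print(f"sample mean of Y     = {mean_y:.3f} (theory: {alpha + beta * x_fixed})")
print(f"sample variance of Y = {var_y:.3f} (theory: {sigma_u ** 2})")
```

With many draws the sample mean should be close to α + βX and the sample variance close to the variance of U, as the derivation states.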

2.3 Estimation of the model
The relationship
$$Y_i = \alpha + \beta X_i + U_i \tag{2.5}$$
holds for the population of values of X and Y. Since the values of the population are unknown, we
do not know the exact numerical values of α and β. To obtain numerical values for α and β we take
sample observations on Y and X. Fitting the regression to these sample values gives the sample
regression, which provides estimates of α and β denoted by $\hat{\alpha}$ and $\hat{\beta}$
respectively. The sample regression line is given by
$$\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i \tag{2.6}$$
The true relationship between the variables (the one that explains the population) is given by
$$Y_i = \alpha + \beta X_i + U_i \tag{2.7}$$
If we estimate this relationship using sample observations we get the estimated relationship
$$Y_i = \hat{\alpha} + \hat{\beta} X_i + e_i \tag{2.8}$$
where $e_i$ is the residual, the sample counterpart of $U_i$.
We can estimate the values of α and β using the ordinary least squares (OLS) method, also called
classical least squares (CLS). There are many reasons to start with OLS:
i. The parameter estimates obtained by this method have some optimal properties, i.e. they are
BLUE (Best Linear Unbiased Estimators).
ii. The computational procedure of OLS is fairly simple compared to other econometric
methods.
iii. OLS is one of the most commonly employed methods in estimating econometric models.
iv. The mechanics of OLS are simple to understand.
v. OLS is an essential component of most other econometric techniques.
From the sample observations we have
$$Y_i = \hat{\alpha} + \hat{\beta} X_i + e_i \tag{2.9}$$
$$e_i = Y_i - \hat{\alpha} - \hat{\beta} X_i \tag{2.10}$$
We look for the values of the estimates $\hat{\alpha}$ and $\hat{\beta}$ which minimize the sum of
the squared residuals $\sum e_i^{2}$:
$$\sum_{i=1}^{n} e_i^{2} = \sum_{i=1}^{n} (Y_i - \hat{\alpha} - \hat{\beta} X_i)^{2} \tag{2.11}$$
To find the values of $\hat{\alpha}$ and $\hat{\beta}$ that minimize this sum, we differentiate it
with respect to $\hat{\alpha}$ and $\hat{\beta}$ and set the partial derivatives equal to zero:
$$\frac{\partial \sum e_i^{2}}{\partial \hat{\alpha}} = -2 \sum (Y_i - \hat{\alpha} - \hat{\beta} X_i) = 0 \tag{2.12}$$
$$\frac{\partial \sum e_i^{2}}{\partial \hat{\beta}} = -2 \sum (Y_i - \hat{\alpha} - \hat{\beta} X_i) X_i = 0 \tag{2.13}$$
First take equation 2.12 to find the value of $\hat{\alpha}$.

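Summing equation 2.12 term by term over all observations:
$$-2 \sum (Y_i - \hat{\alpha} - \hat{\beta} X_i) = 0$$
$$-2 \sum Y_i + 2 \sum \hat{\alpha} + 2 \hat{\beta} \sum X_i = 0$$
Since $\hat{\alpha}$ is a constant, $\sum \hat{\alpha} = n\hat{\alpha}$, so
$$-2 \sum Y_i + 2 n \hat{\alpha} + 2 \hat{\beta} \sum X_i = 0$$
$$2 \sum Y_i - 2 \hat{\beta} \sum X_i = 2 n \hat{\alpha}$$
Dividing by 2n to get $\hat{\alpha}$:
$$\hat{\alpha} = \frac{\sum Y_i}{n} - \hat{\beta} \frac{\sum X_i}{n}$$
Since $\sum Y_i / n = \bar{Y}$ and $\sum X_i / n = \bar{X}$,
$$\hat{\alpha} = \bar{Y} - \hat{\beta} \bar{X} \tag{2.14}$$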
Take equation 2.13 to find the value of $\hat{\beta}$:
$$-2 \sum (Y_i - \hat{\alpha} - \hat{\beta} X_i) X_i = 0$$
Multiplying through by $X_i$ and summing term by term:
$$-2 \left( \sum X_i Y_i - \hat{\alpha} \sum X_i - \hat{\beta} \sum X_i^{2} \right) = 0$$
$$\sum X_i Y_i = \hat{\alpha} \sum X_i + \hat{\beta} \sum X_i^{2} \tag{2.15}$$
Substitute equation (2.14) into equation (2.15):
$$\sum X_i Y_i = (\bar{Y} - \hat{\beta} \bar{X}) \sum X_i + \hat{\beta} \sum X_i^{2}$$
$$\sum X_i Y_i - \bar{Y} \sum X_i = -\hat{\beta} \bar{X} \sum X_i + \hat{\beta} \sum X_i^{2}$$
Substituting $\bar{Y} = \sum Y_i / n$ and $\bar{X} = \sum X_i / n$:
$$\sum X_i Y_i - \frac{\sum Y_i \sum X_i}{n} = -\hat{\beta} \frac{(\sum X_i)^{2}}{n} + \hat{\beta} \sum X_i^{2}$$
Multiplying both sides by n:
$$n \sum X_i Y_i - \sum Y_i \sum X_i = \hat{\beta} \left[ n \sum X_i^{2} - \left(\sum X_i\right)^{2} \right]$$
$$\hat{\beta} = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^{2} - \left(\sum X_i\right)^{2}} \tag{2.16}$$

The numerical values of $\hat{\alpha}$ and $\hat{\beta}$ can also be found in deviation form. To
write equation 2.16 in deviation form, take the numerator
$$n \sum X_i Y_i - \sum Y_i \sum X_i$$
and add and subtract $\sum X_i \sum Y_i$:
$$n \sum X_i Y_i - \sum Y_i \sum X_i = n \sum X_i Y_i - \sum Y_i \sum X_i - \sum X_i \sum Y_i + \sum X_i \sum Y_i$$
Since $\sum Y_i = n\bar{Y}$ and $\sum X_i = n\bar{X}$,
$$= n \sum X_i Y_i - n \bar{Y} \sum X_i - n \bar{X} \sum Y_i + n^{2} \bar{X} \bar{Y}$$
Taking n as a common factor:
$$= n \left[ \sum X_i Y_i - \bar{Y} \sum X_i - \bar{X} \sum Y_i + n \bar{X} \bar{Y} \right]$$
$$= n \sum (X_i - \bar{X})(Y_i - \bar{Y}) \tag{2.17}$$
This is equal to the numerator of equation 2.16.

Again from equation 2.16 take the denominator:
$$n \sum X_i^{2} - \left(\sum X_i\right)^{2} = n \sum X_i^{2} - 2 \left(\sum X_i\right)^{2} + \left(\sum X_i\right)^{2}$$
$$= n \sum X_i^{2} - 2 n \bar{X} \sum X_i + n^{2} \bar{X}^{2}$$
$$= n \left[ \sum X_i^{2} - 2 \bar{X} \sum X_i + n \bar{X}^{2} \right]$$
$$= n \sum (X_i - \bar{X})^{2} \tag{2.18}$$
Taking equation 2.17 as numerator and 2.18 as denominator:
$$\hat{\beta} = \frac{n \sum (X_i - \bar{X})(Y_i - \bar{Y})}{n \sum (X_i - \bar{X})^{2}}$$
Writing $X_i - \bar{X} = x_i$ and $Y_i - \bar{Y} = y_i$ and substituting in the above equation:
$$\hat{\beta} = \frac{n \sum x_i y_i}{n \sum x_i^{2}} = \frac{\sum x_i y_i}{\sum x_i^{2}} \tag{2.19}$$
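The two expressions for the slope can be compared numerically. The sketch below is illustrative: the five observations are made up, and the function names `ols_raw` and `ols_deviation` are hypothetical labels for equations (2.16) and (2.19), each combined with (2.14) for the intercept.

```python
def ols_raw(X, Y):
    """Slope from equation (2.16) and intercept from equation (2.14)."""
    n = len(X)
    sum_x, sum_y = sum(X), sum(Y)
    sum_xy = sum(x * y for x, y in zip(X, Y))
    sum_x2 = sum(x * x for x in X)
    beta_hat = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    alpha_hat = sum_y / n - beta_hat * sum_x / n
    return alpha_hat, beta_hat

def ols_deviation(X, Y):
    """Slope from the deviation form (2.19) and intercept from (2.14)."""
    n = len(X)
    x_bar, y_bar = sum(X) / n, sum(Y) / n
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
    sum_x2 = sum((x - x_bar) ** 2 for x in X)
    beta_hat = sum_xy / sum_x2
    return y_bar - beta_hat * x_bar, beta_hat

# Made-up sample, not taken from the module
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

alpha_raw, beta_raw = ols_raw(X, Y)
alpha_dev, beta_dev = ols_deviation(X, Y)
print(alpha_raw, beta_raw)
print(alpha_dev, beta_dev)
```

For this sample both routes give a slope of 0.6 and an intercept of 2.2 (up to floating-point rounding), since (2.19) is only a rearrangement of (2.16).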

2.4 Statistical tests of Estimates

The two most commonly used tests in econometrics are r2, i.e. the square of the correlation
coefficient, and the standard error (s.e.) of the estimates.

• The Square of the Correlation Coefficient = r2 /R2/

When we estimate a two-variable model (one independent variable X and one dependent variable
Y) we obtain r2. If the model has more than two variables (one dependent variable and more than
one independent variable X1, X2, ..., Xn) we obtain the coefficient of determination R2.
Definition of r2 /R2/: After estimating $\hat{\beta}$ and $\hat{\alpha}$ from the sample
observations of Y and X using the OLS method, we need to know how 'good' the fit of the line to
the sample observations is, that is, we need to measure the dispersion of the sample observations
around the regression line. The closer the observations are to the line, the better the variation
of Y is explained by changes in the explanatory variable(s).
r2 shows the percentage of the total variation of the dependent variable that can be explained by
the independent variable X.
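Figure: Scatter of Y against X with the fitted line Ŷi = α̂ + β̂Xi and the sample mean of Y, showing the total variation, the explained variation and the unexplained variation of an observation.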
Suppose a researcher has the model Yi = α + βXi + Ui and takes sample observations to estimate
the values of α and β. The observations may fall below, above or on the estimated line. Using r2
he can judge how well the regression line fits these data.
• $Y_i$ is the observed sample value
• $\bar{Y}$ is the mean value of the sample
• $\hat{Y}_i$ is the value estimated by the regression line using sample data
• $Y_i - \bar{Y}$ shows by how much the actual sample value deviates from the sample mean.
This is called the total variation and is represented by small $y_i$.
• $\hat{Y}_i - \bar{Y}$ shows by how much the estimated value deviates from the sample mean.
This is called the explained variation and is represented by $\hat{y}_i$.
• $Y_i - \hat{Y}_i$ shows the difference between the actual value of Yi and the
estimated value of Yi. This is called the unexplained variation and is represented by $e_i$.
Therefore:
Total variation: $$y_i = Y_i - \bar{Y} \tag{2.20}$$
Explained variation: $$\hat{y}_i = \hat{Y}_i - \bar{Y} \tag{2.21}$$
Unexplained variation: $$e_i = Y_i - \hat{Y}_i \tag{2.22}$$
Squaring each of these and summing over all observations, we have
$$\sum y_i^{2} = \sum (Y_i - \bar{Y})^{2} \tag{2.23}$$
We square the deviations because the sum of the deviations of any variable around its mean is
zero; squaring avoids this cancellation.
$$\sum \hat{y}_i^{2} = \sum (\hat{Y}_i - \bar{Y})^{2} \tag{2.24}$$
$$\sum e_i^{2} = \sum (Y_i - \hat{Y}_i)^{2} \tag{2.25}$$
We can write equation (2.20) as follows:
$$y_i = Y_i - \bar{Y} \;\Rightarrow\; Y_i = y_i + \bar{Y} \tag{2.26}$$
From equation (2.21):
$$\hat{y}_i = \hat{Y}_i - \bar{Y} \;\Rightarrow\; \hat{Y}_i = \hat{y}_i + \bar{Y} \tag{2.27}$$
Substituting equations (2.26) and (2.27) into equation 2.22, $e_i = Y_i - \hat{Y}_i$:
$$e_i = (y_i + \bar{Y}) - (\hat{y}_i + \bar{Y})$$
$$e_i = y_i - \hat{y}_i$$
$$y_i = \hat{y}_i + e_i \tag{2.28}$$
This shows that each deviation of the observed values of Y from the mean, $Y_i - \bar{Y} = y_i$,
consists of two components:
i. $\hat{y}_i = \hat{Y}_i - \bar{Y}$, the amount explained by the regression line
ii. $e_i = Y_i - \hat{Y}_i$, the variation left unexplained by the regression line
Taking equation 2.28, $y_i = \hat{y}_i + e_i$, squaring it and summing over all observations:
$$\sum y_i^{2} = \sum (\hat{y}_i + e_i)^{2}$$
$$\sum y_i^{2} = \sum \hat{y}_i^{2} + 2 \sum \hat{y}_i e_i + \sum e_i^{2} \tag{2.29}$$
In this equation $2 \sum \hat{y}_i e_i$ is equal to zero. We can prove it as follows. By definition
$\hat{y}_i = \hat{Y}_i - \bar{Y}$, where
$$\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i$$
and, from equation 2.14,
$$\bar{Y} = \hat{\alpha} + \hat{\beta} \bar{X}$$
Substituting these into $\hat{y}_i = \hat{Y}_i - \bar{Y}$:
$$\hat{y}_i = \hat{\alpha} + \hat{\beta} X_i - \hat{\alpha} - \hat{\beta} \bar{X} = \hat{\beta} (X_i - \bar{X})$$
Since $(X_i - \bar{X}) = x_i$ is X in deviation form,
$$\hat{y}_i = \hat{\beta} x_i \tag{2.30}$$
We also know that $e_i = y_i - \hat{y}_i$, so with $\hat{y}_i = \hat{\beta} x_i$
$$e_i = y_i - \hat{\beta} x_i \tag{2.31}$$
Then
$$\sum \hat{y}_i e_i = \sum \hat{\beta} x_i (y_i - \hat{\beta} x_i) = \hat{\beta} \sum x_i y_i - \hat{\beta}^{2} \sum x_i^{2} = \hat{\beta} \left[ \sum x_i y_i - \hat{\beta} \sum x_i^{2} \right]$$
From equation 2.19 we know that $\hat{\beta} = \sum x_i y_i / \sum x_i^{2}$. Substituting for
$\hat{\beta}$ inside the bracket:
$$\sum \hat{y}_i e_i = \hat{\beta} \left[ \sum x_i y_i - \frac{\sum x_i y_i}{\sum x_i^{2}} \sum x_i^{2} \right] = \hat{\beta} \left[ \sum x_i y_i - \sum x_i y_i \right] = 0$$
Since $\sum \hat{y}_i e_i = 0$, equation 2.29 can be written as follows:
$$\sum y_i^{2} = \sum \hat{y}_i^{2} + \sum e_i^{2} \tag{2.32}$$
Total variation = Explained variation + Unexplained variation
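Dividing both sides of equation (2.32) by $\sum y_i^{2}$ gives
$$\frac{\sum y_i^{2}}{\sum y_i^{2}} = \frac{\sum \hat{y}_i^{2}}{\sum y_i^{2}} + \frac{\sum e_i^{2}}{\sum y_i^{2}}$$
$$1 = \frac{\sum \hat{y}_i^{2}}{\sum y_i^{2}} + \frac{\sum e_i^{2}}{\sum y_i^{2}}$$
The ratio $\sum \hat{y}_i^{2} / \sum y_i^{2}$ is the proportion of the total sum of squares that is
explained, which is r2:
$$1 = r^{2} + \frac{\sum e_i^{2}}{\sum y_i^{2}} \tag{2.33}$$
Then
$$r^{2} = \frac{\sum \hat{y}_i^{2}}{\sum y_i^{2}} \tag{2.34}$$
Again, from equation 2.33,
$$r^{2} = 1 - \frac{\sum e_i^{2}}{\sum y_i^{2}} \tag{2.35}$$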
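The decomposition of the total variation can be verified on a small sample. This is a minimal sketch with made-up data; the variable names `tss`, `ess` and `rss` (total, explained and unexplained sums of squares) are labels introduced here, not notation from the module.

```python
# Made-up sample, not taken from the module
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n

# OLS estimates in deviation form, equations (2.19) and (2.14)
sum_xy = sum((Xi - x_bar) * (Yi - y_bar) for Xi, Yi in zip(X, Y))
sum_x2 = sum((Xi - x_bar) ** 2 for Xi in X)
beta_hat = sum_xy / sum_x2
alpha_hat = y_bar - beta_hat * x_bar

Y_hat = [alpha_hat + beta_hat * Xi for Xi in X]

tss = sum((Yi - y_bar) ** 2 for Yi in Y)               # total, eq. (2.23)
ess = sum((Yh - y_bar) ** 2 for Yh in Y_hat)           # explained, eq. (2.24)
rss = sum((Yi - Yh) ** 2 for Yi, Yh in zip(Y, Y_hat))  # unexplained, eq. (2.25)

r2_explained = ess / tss     # equation (2.34)
r2_residual = 1 - rss / tss  # equation (2.35)
print(tss, ess + rss, r2_explained, r2_residual)
```

For this sample the total sum of squares is 6, split into 3.6 explained and 2.4 unexplained, so both formulas give r2 = 0.6 up to rounding.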

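2.5 The relationship between r2 and the slope coefficient β̂

We know that
$$r^{2} = \frac{\sum \hat{y}_i^{2}}{\sum y_i^{2}}$$
From equation 2.21,
$$\hat{y}_i = \hat{Y}_i - \bar{Y} = \hat{\alpha} + \hat{\beta} X_i - \hat{\alpha} - \hat{\beta} \bar{X} = \hat{\beta} (X_i - \bar{X}) = \hat{\beta} x_i$$
Substituting $\hat{y}_i = \hat{\beta} x_i$ in equation 2.34:
$$r^{2} = \frac{\sum (\hat{\beta} x_i)^{2}}{\sum y_i^{2}} = \frac{\hat{\beta}^{2} \sum x_i^{2}}{\sum y_i^{2}}$$
From equation 2.19 we know that
$$\hat{\beta} = \frac{\sum x_i y_i}{\sum x_i^{2}}$$
Substituting this for one of the two factors $\hat{\beta}$:
$$r^{2} = \hat{\beta} \cdot \frac{\sum x_i y_i}{\sum x_i^{2}} \cdot \frac{\sum x_i^{2}}{\sum y_i^{2}} = \hat{\beta} \, \frac{\sum x_i y_i}{\sum y_i^{2}}$$
Or, substituting for $\hat{\beta}$ once more:
$$r^{2} = \frac{\sum x_i y_i}{\sum x_i^{2}} \cdot \frac{\sum x_i y_i}{\sum y_i^{2}} = \frac{(\sum x_i y_i)^{2}}{\sum x_i^{2} \sum y_i^{2}} \tag{2.36}$$
The value of r2 can therefore be found in several equivalent ways:
$$r^{2} = \frac{\sum \hat{y}_i^{2}}{\sum y_i^{2}} = 1 - \frac{\sum e_i^{2}}{\sum y_i^{2}} = \hat{\beta} \, \frac{\sum x_i y_i}{\sum y_i^{2}} = \frac{\hat{\beta}^{2} \sum x_i^{2}}{\sum y_i^{2}} = \frac{(\sum x_i y_i)^{2}}{\sum x_i^{2} \sum y_i^{2}}$$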

Limiting values of r2
$$r^{2} = 1 - \frac{\sum e_i^{2}}{\sum y_i^{2}}$$
There are three cases:
1) If all the sample observations lie on the regression line, there is no scatter around it, i.e.
the difference between the actual value and the estimated value is zero:
$e_i = Y_i - \hat{Y}_i = 0$. Then r2 will be 1.
2) If the regression line explains only part of the variation in the actual values, some of the
$e_i$ are non-zero and $\sum e_i^{2} / \sum y_i^{2}$ is greater than zero but less than one, so r2
lies between zero and one.
3) If the regression line does not explain any part of the variation, then
$\sum e_i^{2} / \sum y_i^{2}$ will be one and r2 will be zero.
In general,
$$0 \le r^{2} \le 1$$
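The alternative formulas for r2 and its limiting values can be illustrated as follows. This is a sketch on made-up data: the helper `r_squared_forms` is hypothetical, and the three Y series are chosen so that the fit is partial, perfect and absent respectively.

```python
def r_squared_forms(X, Y):
    """Return r2 computed three ways, all in deviation form."""
    n = len(X)
    x_bar, y_bar = sum(X) / n, sum(Y) / n
    x = [Xi - x_bar for Xi in X]
    y = [Yi - y_bar for Yi in Y]
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    beta_hat = sum_xy / sum_x2
    return (
        beta_hat * sum_xy / sum_y2,       # beta_hat * sum(xy) / sum(y^2)
        beta_hat ** 2 * sum_x2 / sum_y2,  # beta_hat^2 * sum(x^2) / sum(y^2)
        sum_xy ** 2 / (sum_x2 * sum_y2),  # equation (2.36)
    )

X = [1, 2, 3, 4, 5]
scattered = r_squared_forms(X, [2, 4, 5, 4, 5])  # line explains part of Y
perfect = r_squared_forms(X, [3, 5, 7, 9, 11])   # Y = 1 + 2X exactly
unrelated = r_squared_forms(X, [2, 1, 0, 1, 2])  # sum(xy) = 0, nothing explained

print(scattered)
print(perfect)
print(unrelated)
```

Within each case the three formulas agree, and the three cases land inside, at the top of, and at the bottom of the interval from zero to one.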

2.6 Significance test of the Parameter estimates

The values of $\hat{\alpha}$ and $\hat{\beta}$ obtained from the sample observations must be tested
to see whether they are statistically reliable estimates of the population parameters. To test the
statistical reliability of the sample estimates we need to know their mean and variance. To do
this we follow these steps:
a) Develop the formulas for the mean and variance of the least squares
estimates
b) Assess the statistical significance of the estimates using the standard error and the t-test
c) Construct confidence intervals for the estimates $\hat{\alpha}$ and $\hat{\beta}$.

Mean & Variance of the least square estimates

• Mean of β̂

We can obtain the mean value of $\hat{\beta}$ starting from equation 2.19:
$$\hat{\beta} = \frac{\sum x_i y_i}{\sum x_i^{2}}$$
In place of $y_i$ substitute equation 2.20, $y_i = Y_i - \bar{Y}$:
$$\hat{\beta} = \frac{\sum x_i (Y_i - \bar{Y})}{\sum x_i^{2}} = \frac{\sum x_i Y_i - \bar{Y} \sum x_i}{\sum x_i^{2}}$$
The sum of the deviations of any variable from its mean is equal to zero, so $\sum x_i = 0$ and
therefore $\bar{Y} \sum x_i = 0$. Hence
$$\hat{\beta} = \frac{\sum x_i Y_i}{\sum x_i^{2}} \tag{2.37}$$
The values of the independent variable are a set of fixed values which do not change from sample
to sample. Then $x_i / \sum x_i^{2}$ is a constant for each observation; let us denote it by
$k_i$, so equation 2.37 can be written as
$$\hat{\beta} = \sum k_i Y_i \tag{2.38}$$
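From equation 2.5 we know that
$$Y_i = \alpha + \beta X_i + U_i$$
Substituting into equation 2.38:
$$\hat{\beta} = \sum k_i (\alpha + \beta X_i + U_i)$$
$$\hat{\beta} = \alpha \sum k_i + \beta \sum k_i X_i + \sum k_i U_i \tag{2.39}$$
Here $\sum k_i = 0$. We can prove it as follows. By definition
$$\sum k_i = \sum \frac{x_i}{\sum x_i^{2}} = \frac{\sum x_i}{\sum x_i^{2}}$$
and $\sum x_i = 0$ because the sum of the deviations of any variable from its mean is zero. Then
$$\sum k_i = \frac{\sum x_i}{\sum x_i^{2}} = 0$$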
Again, $\sum k_i X_i = 1$. To prove it, recall that $k_i = x_i / \sum x_i^{2}$, so
$$\sum k_i X_i = \frac{\sum x_i X_i}{\sum x_i^{2}} \tag{2.40}$$
Substituting $x_i = X_i - \bar{X}$ in the numerator of equation 2.40:
$$\sum k_i X_i = \frac{\sum (X_i - \bar{X}) X_i}{\sum x_i^{2}} = \frac{\sum X_i^{2} - \bar{X} \sum X_i}{\sum x_i^{2}} \tag{2.41}$$
We can simplify this further. Take the denominator:
$$\sum x_i^{2} = \sum (X_i - \bar{X})^{2} = \sum \left[ X_i^{2} - 2 \bar{X} X_i + \bar{X}^{2} \right] = \sum X_i^{2} - 2 \bar{X} \sum X_i + n \bar{X}^{2}$$
Since $\bar{X}$ is a constant, summing $\bar{X}^{2}$ over the observations means multiplying it by
n. Substituting $\bar{X} = \sum X_i / n$ in the last term gives $n \bar{X}^{2} = \bar{X} \sum X_i$, so
$$\sum x_i^{2} = \sum X_i^{2} - 2 \bar{X} \sum X_i + \bar{X} \sum X_i = \sum X_i^{2} - \bar{X} \sum X_i$$
Substituting this in the denominator of equation 2.41:
$$\sum k_i X_i = \frac{\sum X_i^{2} - \bar{X} \sum X_i}{\sum X_i^{2} - \bar{X} \sum X_i} = 1$$
Then $\sum k_i X_i = 1$.
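The same result can be obtained in another way. Since $x_i = X_i - \bar{X}$, we have
$X_i = x_i + \bar{X}$. Substituting this for $X_i$ in $\sum k_i X_i$, with
$k_i = x_i / \sum x_i^{2}$:
$$\sum k_i X_i = \sum k_i (x_i + \bar{X}) = \frac{\sum x_i (x_i + \bar{X})}{\sum x_i^{2}} = \frac{\sum x_i^{2}}{\sum x_i^{2}} + \frac{\bar{X} \sum x_i}{\sum x_i^{2}}$$
Again by definition $\sum x_i = 0$, so
$$\sum k_i X_i = \frac{\sum x_i^{2}}{\sum x_i^{2}} = 1$$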
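The two properties of the weights can be checked on a small made-up sample. In this sketch the data are hypothetical and the names `sum_k`, `sum_kX` and `beta_hat` are introduced only for illustration.

```python
# Made-up sample, not taken from the module
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(X)
x_bar = sum(X) / n

x = [Xi - x_bar for Xi in X]          # deviations of X from its mean
sum_x2 = sum(xi ** 2 for xi in x)
k = [xi / sum_x2 for xi in x]         # weights k_i = x_i / sum(x_i^2)

sum_k = sum(k)                                   # should be 0
sum_kX = sum(ki * Xi for ki, Xi in zip(k, X))    # should be 1
beta_hat = sum(ki * Yi for ki, Yi in zip(k, Y))  # equation (2.38)
print(sum_k, sum_kX, beta_hat)
```

Here the weights are -0.2, -0.1, 0, 0.1 and 0.2, which sum to zero, and the weighted sum of Y reproduces the slope of 0.6 obtained from equation (2.19).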

Now we return to equation 2.39 and, remembering that $\sum k_i = 0$ and $\sum k_i X_i = 1$, write
it as follows:
$$\hat{\beta} = \beta + \sum k_i U_i$$
Substituting $k_i = x_i / \sum x_i^{2}$:
$$\hat{\beta} = \beta + \frac{\sum x_i U_i}{\sum x_i^{2}}$$
Taking the expected (average or mean) value of $\hat{\beta}$, and noting that the $x_i$ are constants:
$$E(\hat{\beta}) = E(\beta) + E\left( \frac{\sum x_i U_i}{\sum x_i^{2}} \right) = \beta + \frac{\sum x_i E(U_i)}{\sum x_i^{2}}$$
By assumption ii about Ui, $E(U_i) = 0$, so
$$E(\hat{\beta}) = \beta \tag{2.42}$$
The mean of the OLS estimate $\hat{\beta}$ is equal to the true population parameter $\beta$.
• Variance of β̂
$$\mathrm{Var}(\hat{\beta}) = E[\hat{\beta} - E(\hat{\beta})]^{2}$$
From equation 2.42 we know that $E(\hat{\beta}) = \beta$; substituting,
$$\mathrm{Var}(\hat{\beta}) = E[\hat{\beta} - \beta]^{2}$$
From equation 2.38 we know that
$$\hat{\beta} = \frac{\sum x_i y_i}{\sum x_i^{2}} = \frac{\sum x_i (Y_i - \bar{Y})}{\sum x_i^{2}} = \sum k_i Y_i, \qquad k_i = \frac{x_i}{\sum x_i^{2}}$$
Since the $k_i$ are constants and the $Y_i$ are uncorrelated with each other,
$$\mathrm{Var}(\hat{\beta}) = \mathrm{Var}\left( \sum k_i Y_i \right) = \sum k_i^{2} \, \mathrm{Var}(Y_i) \tag{2.43}$$
First let us find $\mathrm{Var}(Y_i) = E[Y_i - E(Y_i)]^{2}$. We know that
$Y_i = \alpha + \beta X_i + U_i$, so
$$E(Y_i) = \alpha + \beta X_i + E(U_i) = \alpha + \beta X_i$$
because $E(U_i) = 0$ by assumption ii. Substituting $\alpha + \beta X_i + U_i$ in place of $Y_i$
and $\alpha + \beta X_i$ in place of $E(Y_i)$:
$$\mathrm{Var}(Y_i) = E[\alpha + \beta X_i + U_i - (\alpha + \beta X_i)]^{2} = E(U_i^{2}) = \sigma_u^{2} \tag{2.44}$$
Substituting equation 2.44 into equation 2.43:
$$\mathrm{Var}(\hat{\beta}) = \sigma_u^{2} \sum k_i^{2}$$
We know that
$$\sum k_i^{2} = \sum \left( \frac{x_i}{\sum x_i^{2}} \right)^{2} = \frac{\sum x_i^{2}}{(\sum x_i^{2})^{2}} = \frac{1}{\sum x_i^{2}}$$
so
$$\mathrm{Var}(\hat{\beta}) = \frac{\sigma_u^{2}}{\sum x_i^{2}} \tag{2.45}$$
Now we can say that $\hat{\beta}$ has a mean of $\beta$ and a variance of
$\sigma_u^{2} / \sum x_i^{2}$:
$$\hat{\beta} \sim N\left( \beta, \frac{\sigma_u^{2}}{\sum x_i^{2}} \right)$$
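The mean and variance of the slope estimate can be illustrated by repeated sampling with X held fixed. The following is a simulation sketch under stated assumptions: the true parameter values, the X values, the seed and the number of replications are all hypothetical.

```python
import random
import statistics

alpha, beta, sigma_u = 2.0, 0.6, 1.0  # hypothetical true values
X = [1, 2, 3, 4, 5]                   # fixed in repeated sampling
n = len(X)
x_bar = sum(X) / n
x = [Xi - x_bar for Xi in X]
sum_x2 = sum(xi ** 2 for xi in x)
rng = random.Random(42)               # fixed seed for reproducibility

def draw_beta_hat():
    """Generate one sample of Y and return its OLS slope, equation (2.19)."""
    Y = [alpha + beta * Xi + rng.gauss(0.0, sigma_u) for Xi in X]
    y_bar = sum(Y) / n
    return sum(xi * (Yi - y_bar) for xi, Yi in zip(x, Y)) / sum_x2

estimates = [draw_beta_hat() for _ in range(20000)]
mean_beta_hat = statistics.fmean(estimates)
var_beta_hat = statistics.pvariance(estimates)
theoretical_var = sigma_u ** 2 / sum_x2  # sigma_u^2 / sum(x_i^2)

print(f"mean of beta_hat     = {mean_beta_hat:.4f} (true beta = {beta})")
print(f"variance of beta_hat = {var_beta_hat:.4f} (theory = {theoretical_var:.4f})")
```

Across many samples the average slope should be close to the true β, and the spread of the slopes should be close to the variance of U divided by the sum of squared deviations of X, which is 0.1 for these values.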

• Mean of α̂
From equation 2.14 we have $\hat{\alpha} = \bar{Y} - \hat{\beta} \bar{X}$, and from equation 2.38 we
have $\hat{\beta} = \sum k_i Y_i$. Substituting:
$$\hat{\alpha} = \bar{Y} - \bar{X} \sum k_i Y_i = \frac{\sum Y_i}{n} - \bar{X} \sum k_i Y_i$$
Taking $Y_i$ as a common factor, because n, $\bar{X}$ and $k_i$ are constants:
$$\hat{\alpha} = \sum \left( \frac{1}{n} - \bar{X} k_i \right) Y_i \tag{2.45}$$
Now take the expected value of $\hat{\alpha}$. Since n, $\bar{X}$ and $k_i$ are constants, and the
expected value of a constant is the constant itself,
$$E(\hat{\alpha}) = \sum \left( \frac{1}{n} - \bar{X} k_i \right) E(Y_i)$$
We know that
$$E(Y_i) = E(\alpha + \beta X_i + U_i) = \alpha + \beta X_i + E(U_i) = \alpha + \beta X_i$$
because $E(U_i) = 0$ by assumption. Then
$$E(\hat{\alpha}) = \sum \left( \frac{1}{n} - \bar{X} k_i \right) (\alpha + \beta X_i)$$
$$= \frac{n \alpha}{n} - \alpha \bar{X} \sum k_i + \beta \frac{\sum X_i}{n} - \beta \bar{X} \sum k_i X_i$$
$$= \alpha - \alpha \bar{X} \sum k_i + \beta \bar{X} - \beta \bar{X} \sum k_i X_i$$
We know that $\sum k_i = 0$ and $\sum k_i X_i = 1$, so
$$= \alpha + \beta \bar{X} - \beta \bar{X}$$
$$E(\hat{\alpha}) = \alpha \tag{2.46}$$

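• Variance of α̂
$$\mathrm{Var}(\hat{\alpha}) = E[\hat{\alpha} - E(\hat{\alpha})]^{2}$$
From the expression for $\hat{\alpha}$ obtained above (numbered 2.45) we know that
$\hat{\alpha} = \sum \left( \frac{1}{n} - \bar{X} k_i \right) Y_i$. Substituting:
$$\mathrm{Var}(\hat{\alpha}) = \mathrm{Var}\left[ \sum \left( \frac{1}{n} - \bar{X} k_i \right) Y_i \right]$$
Since n, $\bar{X}$ and $k_i$ are constants, they come out of the variance squared:
$$\mathrm{Var}(\hat{\alpha}) = \sum \left( \frac{1}{n} - \bar{X} k_i \right)^{2} \mathrm{Var}(Y_i)$$
We know from equation 2.44 that $\mathrm{Var}(Y_i) = \sigma_u^{2}$. Substituting in place of
$\mathrm{Var}(Y_i)$:
$$\mathrm{Var}(\hat{\alpha}) = \sigma_u^{2} \sum \left( \frac{1}{n} - \bar{X} k_i \right)^{2}$$
$$= \sigma_u^{2} \sum \left( \frac{1}{n^{2}} + \bar{X}^{2} k_i^{2} - \frac{2 \bar{X} k_i}{n} \right)$$
The summation of a constant over the observations is equal to the constant multiplied by n, so
$$= \sigma_u^{2} \left( \frac{n}{n^{2}} + \bar{X}^{2} \sum k_i^{2} - \frac{2 \bar{X} \sum k_i}{n} \right)$$
We proved that $\sum k_i = 0$, and
$$\sum k_i^{2} = \sum \left( \frac{x_i}{\sum x_i^{2}} \right)^{2} = \frac{\sum x_i^{2}}{(\sum x_i^{2})^{2}} = \frac{1}{\sum x_i^{2}}$$
Substituting these values:
$$= \sigma_u^{2} \left( \frac{1}{n} + \frac{\bar{X}^{2}}{\sum x_i^{2}} - \frac{2 \bar{X} (0)}{n} \right) = \sigma_u^{2} \left( \frac{1}{n} + \frac{\bar{X}^{2}}{\sum x_i^{2}} \right)$$
Taking the common denominator $n \sum x_i^{2}$:
$$\mathrm{Var}(\hat{\alpha}) = \sigma_u^{2} \, \frac{\sum x_i^{2} + n \bar{X}^{2}}{n \sum x_i^{2}}$$
We proved that $\sum x_i^{2} = \sum (X_i - \bar{X})^{2} = \sum X_i^{2} - n \bar{X}^{2}$ (see the
derivation following equation 2.41), so $\sum x_i^{2} + n \bar{X}^{2} = \sum X_i^{2}$ and the
variance can also be written as
$$\mathrm{Var}(\hat{\alpha}) = \sigma_u^{2} \, \frac{\sum X_i^{2}}{n \sum x_i^{2}}$$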
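We can summarize the mean and variance of the variables as follows:
$Y_i \sim N\left(\alpha + \beta X_i, \; \sigma_u^2\right)$
$\hat{\alpha} \sim N\left(\alpha, \; \sigma_u^2 \, \dfrac{\sum x_i^2 + n \bar{X}^2}{n \sum x_i^2}\right)$
$\hat{\beta} \sim N\left(\beta, \; \dfrac{\sigma_u^2}{\sum x_i^2}\right)$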
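• Standard Error (S.E) values of $\hat{\alpha}$ and $\hat{\beta}$
The S.E is the square root of the variance.
$S.E(\hat{\beta}) = \sqrt{\mathrm{Var}(\hat{\beta})} = \sqrt{\dfrac{\hat{\sigma}^2}{\sum x_i^2}}$   (2.47)
$S.E(\hat{\alpha}) = \sqrt{\mathrm{Var}(\hat{\alpha})} = \sqrt{\hat{\sigma}^2 \, \dfrac{\sum X_i^2}{n \sum x_i^2}}$   (2.48)
Since the true variance $\sigma_u^2$ is unknown, it is replaced by its estimate $\hat{\sigma}^2 = \dfrac{\sum e_i^2}{n - 2}$. Equations 2.47 and 2.48 can then be written as follows:
$S.E(\hat{\beta}) = \sqrt{\dfrac{\sum e_i^2}{n - 2} \cdot \dfrac{1}{\sum x_i^2}}$
$S.E(\hat{\alpha}) = \sqrt{\dfrac{\sum e_i^2 \cdot \sum X_i^2}{(n \sum x_i^2)(n - 2)}}$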
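
The formulas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `ols_with_standard_errors` and the five data points are hypothetical and do not come from the module; the computations follow equations 2.47 and 2.48 with $\hat{\sigma}^2 = \sum e_i^2 / (n-2)$.

```python
import math

def ols_with_standard_errors(X, Y):
    """Two-variable OLS: returns (alpha_hat, beta_hat, se_alpha, se_beta)."""
    n = len(X)
    x_bar = sum(X) / n
    y_bar = sum(Y) / n
    # Deviations from the sample means (the lower-case x_i and y_i of the text).
    x = [xi - x_bar for xi in X]
    y = [yi - y_bar for yi in Y]
    sum_x2 = sum(xi ** 2 for xi in x)
    beta_hat = sum(xi * yi for xi, yi in zip(x, y)) / sum_x2
    alpha_hat = y_bar - beta_hat * x_bar
    # Residuals and the estimate of sigma_u^2 with n - 2 degrees of freedom.
    residuals = [Yi - alpha_hat - beta_hat * Xi for Xi, Yi in zip(X, Y)]
    sigma2_hat = sum(e ** 2 for e in residuals) / (n - 2)
    # Equation 2.47: S.E of the slope.
    se_beta = math.sqrt(sigma2_hat / sum_x2)
    # Equation 2.48: S.E of the intercept (uses the raw sum of squares of X).
    se_alpha = math.sqrt(sigma2_hat * sum(Xi ** 2 for Xi in X) / (n * sum_x2))
    return alpha_hat, beta_hat, se_alpha, se_beta

# Hypothetical data, for illustration only.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]
alpha_hat, beta_hat, se_alpha, se_beta = ols_with_standard_errors(X, Y)
print(alpha_hat, beta_hat, se_alpha, se_beta)
```

Note that the ratio of the two variances depends only on the X values: $\mathrm{Var}(\hat{\alpha}) / \mathrm{Var}(\hat{\beta}) = \sum X_i^2 / n$.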

• Standard error test of the least square estimates

From the sample observations of Y and X we can obtain the values of $\hat{\alpha}$ and $\hat{\beta}$. Since the observations are a sample taken from the population, the estimates may, due to sampling errors, not truly explain the population. It is therefore necessary to apply a test of significance in order to
a) Measure the size of the error committed
b) Determine the degree of confidence in the limits within which the estimates will lie.
Though there are other methods used for the purpose, we use the standard error test, which is very popular in econometric research. The standard error (S.E) test helps us to determine whether the estimates $\hat{\alpha}$ and $\hat{\beta}$ come from a population whose parameters are zero. This is called the null hypothesis:
$H_0: \beta = 0$
This means there is no relationship between the dependent and independent variables.
The statement that $\hat{\alpha}$ and $\hat{\beta}$ come from a population whose parameters are different from zero is known as the alternative hypothesis:
$H_1: \beta \neq 0$
This means there is a relationship between the dependent and independent variables.
• Interpretation of S.E

Once we obtain the standard errors of $\hat{\beta}$ and $\hat{\alpha}$ using equations 2.47 and 2.48 respectively, we compare the estimates with their standard errors and interpret the results as follows. (The comparison uses the estimate in absolute value.)
Economic interpretation
Suppose we have $Y_i = \hat{\alpha} + \hat{\beta} X_i$.
Case A) Acceptance of the null hypothesis
i. Accept $H_0: \beta = 0$ and reject the alternative $H_1: \beta \neq 0$. This will occur if $S.E(\hat{\beta}) > \dfrac{\hat{\beta}}{2}$.
It means the estimate is statistically insignificant, or
- We accept the null hypothesis that the true parameter $\beta$ is zero, or
- The independent variable is insignificant, or
- The sample estimate does not explain the population parameter, or
- The independent variable does not influence the dependent variable (no relationship between the dependent and independent variables).
Then the above equation will be written as
$Y_i = \hat{\alpha}$
because $\hat{\beta}$ is not different from zero, i.e. the slope of the line is zero.
ii. Again, in the case of the intercept $\hat{\alpha}$, we accept the null hypothesis $H_0: \alpha = 0$ and reject the alternative $H_1: \alpha \neq 0$ if $S.E(\hat{\alpha}) > \dfrac{\hat{\alpha}}{2}$.
This means $\hat{\alpha}$ is statistically insignificant: the equation will not have an intercept, or the intercept is not different from zero. Then
$Y_i = \hat{\beta} X_i$, since $\hat{\alpha} = 0$.
Case B) Rejection of the null hypothesis and acceptance of the alternative
i. Reject $H_0: \beta = 0$ and accept the alternative $H_1: \beta \neq 0$. This will occur if $S.E(\hat{\beta}) < \dfrac{\hat{\beta}}{2}$.
It means
- The estimate is statistically significant
- The estimate is significantly different from zero
- The independent variable influences the dependent variable, or there is a relationship between the dependent and independent variables.
ii. Reject $H_0: \alpha = 0$ and accept the alternative $H_1: \alpha \neq 0$. This will be observed if $S.E(\hat{\alpha}) < \dfrac{\hat{\alpha}}{2}$.
Again, it means the equation will have an intercept, or $\hat{\alpha}$ is statistically significant.

• Geometric interpretation of the S.E. test

1) If we find that $S.E(\hat{\alpha}) > \dfrac{\hat{\alpha}}{2}$, we accept the null hypothesis $H_0: \alpha = 0$ and reject the alternative $H_1: \alpha \neq 0$. In this case, if we have the equation $Y_i = \hat{\alpha} + \hat{\beta} X_i$, then since $\hat{\alpha} = 0$ we write it as $Y_i = \hat{\beta} X_i$ and there is no intercept. The line will pass through the origin (Fig. a).
2) If we find that $S.E(\hat{\alpha}) < \dfrac{\hat{\alpha}}{2}$, we reject the null hypothesis $H_0: \alpha = 0$ and accept the alternative $H_1: \alpha \neq 0$. In this case we have an intercept term and the equation will be $Y_i = \hat{\alpha} + \hat{\beta} X_i$ (Fig. b).

[Fig. (a): the line $Y_i = \hat{\beta} X_i$ passing through the origin. Fig. (b): the line $Y_i = \hat{\alpha} + \hat{\beta} X_i$ with a non-zero intercept. Horizontal axis: X.]

3) If we find that $S.E(\hat{\beta}) > \dfrac{\hat{\beta}}{2}$, we accept the null hypothesis $H_0: \beta = 0$ and reject the alternative $H_1: \beta \neq 0$. The equation will be $Y_i = \hat{\alpha}$ because $\hat{\beta}$ is set to zero (Fig. c).
4) If we find that $S.E(\hat{\beta}) < \dfrac{\hat{\beta}}{2}$, we reject the null hypothesis $H_0: \beta = 0$ and accept the alternative $H_1: \beta \neq 0$. The equation will be $\hat{Y} = \hat{\alpha} + \hat{\beta} X_i$ (Fig. d).

[Fig. (c): the horizontal line $Y = \hat{\alpha}$. Fig. (d): the sloped line $\hat{Y} = \hat{\alpha} + \hat{\beta} X_i$. Vertical axis: Y; horizontal axis: X.]

• The Student t-test

The second statistical test, next to the S.E. test, is the t-test. The t-test is applicable if the sample size is less than 30 and the population is normally distributed. To apply the t-test we follow these steps.
i. Define the null hypothesis and the alternative hypothesis.
The null hypothesis is $H_0: \beta = 0$ and the alternative hypothesis is $H_1: \beta \neq 0$.
ii. Choose the level of significance (5% or 1%).
The level of significance is the probability of rejecting a null hypothesis when it is true (committing a Type I error). In other words, it is the level of probability with which we are willing to risk a Type I error.
Ex. If the level of significance is 5%, then there are 5 chances out of 100 that we would reject the null hypothesis when it is correct (i.e. we commit an error and reject $H_0: \beta = 0$ when it should be accepted). OR we are 95% confident that we have made the right decision, and there is only a 5% probability that we have made the wrong one.
iii. Define the number of degrees of freedom (d.f.), i.e. N − K, where
N = total sample size
K = number of estimated parameters ($\hat{\alpha}$ and $\hat{\beta}$)
The d.f. = N − K.
Then compute the t-value from the sample,
$t^* = \dfrac{\hat{\beta}}{S.E(\hat{\beta})}$
and compare it with the table value $t$ read at the chosen level of significance and N − K degrees of freedom.
a) If $|t^*| > t$, we reject the null hypothesis $H_0: \beta = 0$ and accept the alternative $H_1: \beta \neq 0$. OR $t^*$ falls in the critical region.
b) If $|t^*| < t$, we accept the null hypothesis $H_0: \beta = 0$ and reject the alternative $H_1: \beta \neq 0$. OR $t^*$ falls in the acceptance region.

[Figure (e): t-distribution with the acceptance region ($H_0: \beta = 0$) between −2.228 and +2.228 and a rejection region ($H_1: \beta \neq 0$) in each tail.]
• Interpretation of the t-test
If the calculated $t^* = \dfrac{\hat{\beta}}{S.E(\hat{\beta})}$ is greater in absolute value than the table value, it means
- Reject the null hypothesis $H_0: \beta = 0$
- Accept the alternative $H_1: \beta \neq 0$
- The estimated values are statistically significant, or
- The sample observations can explain the population parameters
- There is a relationship between the dependent and independent variables.
Then the calculated t-value will lie in the critical (shaded) region. Suppose we choose the 5% level of significance, the sample size is 12, and the number of estimated parameters ($\hat{\alpha}$ and $\hat{\beta}$) is only two. Then the d.f. will be 12 − 2 = 10. If we are undertaking a two-tailed test, we will have $\dfrac{5\%}{2} = 0.025$ on each side (see Figure e).
To find the t-value at 5% (0.025 on each side) and 10 d.f., we read the t-table as follows:
a) In the first row find 0.025.
b) In the first column find 10. At the intersection of the row and the column you will get the t-table value 2.228.
Again, if the calculated $t^*$ is smaller in absolute value than the table value, it means
- Accept $H_0: \beta = 0$
- Reject the alternative hypothesis $H_1: \beta \neq 0$
- The estimated values are statistically insignificant
- There is no relationship between the dependent and independent variables
- The sample observations do not explain the population parameters.
Then the calculated t-value will lie in the acceptance region.

• Simple inspection rule for the t-test
At the 5% level of significance we can follow this rule. Calculate the t-value by $t = \dfrac{\hat{\beta}}{S.E(\hat{\beta})}$.
If this value is greater than +2 or less than −2, we reject the null hypothesis $H_0: \beta = 0$ and accept the alternative $H_1: \beta \neq 0$, and the t-value lies in the critical region. If the calculated t is smaller than +2 and greater than −2 (i.e. −2 < t < 2), we accept the null hypothesis and reject the alternative.
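
The inspection rule can be written as a small helper. This is a minimal sketch: the function names `t_ratio` and `is_significant` and the numbers passed to them are hypothetical, chosen only to show one rejection and one acceptance of the null hypothesis.

```python
def t_ratio(estimate, standard_error):
    """Calculated t-value: the estimate divided by its standard error."""
    return estimate / standard_error

def is_significant(estimate, standard_error, critical_value=2.0):
    """True if |t| exceeds the critical value (rule of thumb: 2 at the 5% level)."""
    return abs(t_ratio(estimate, standard_error)) > critical_value

# Hypothetical numbers, for illustration only.
reject_null = is_significant(1.5, 0.5)   # t = 3.0, outside (-2, 2)
accept_null = is_significant(1.5, 1.0)   # t = 1.5, inside (-2, 2)
print(reject_null, accept_null)
```

The default critical value of 2 is only the approximation described above; for a small sample the exact table value (for example 2.228 at 10 d.f.) can be passed as `critical_value` instead.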

• Confidence intervals for $\hat{\alpha}$ and $\hat{\beta}$

When we accept the alternative hypothesis $H_1: \beta \neq 0$ and reject the null hypothesis $H_0: \beta = 0$, it does not mean that our estimate $\hat{\beta}$ is the correct estimate of the population parameter $\beta$. It simply means that our estimate comes from a population whose parameter is different from zero (there is a relationship between the dependent and independent variables). By constructing confidence intervals we can define how close our estimates are to the true population parameters.
Interpretation of confidence intervals. If we have a 95% confidence level:
i. In the long run, in 95 out of 100 cases the interval will contain the true parameter $\beta$. OR
ii. We are 95% confident that the unknown population parameter $\beta$ will lie within the interval. OR
iii. In 5% of the cases the population parameter will lie outside the confidence limits.
But we cannot say that a specific computed interval contains the true population parameter $\beta$ with probability 95%, because once the interval is fixed, the probability that it contains $\beta$ is either 1 or 0.
Confidence interval from the Student t-distribution:
$t = \dfrac{\hat{\beta} - \beta}{S.E(\hat{\beta})}$   (2.48)
with (n − k) d.f., where
$\hat{\beta}$ = the estimated value from the sample
$\beta$ = the population parameter
If we choose a confidence level, say 95%, we find from the t-table the value $\pm t_{0.025}$ with (n − k) d.f.
$P\{-t_{0.025} < t < t_{0.025}\} = 0.95$   (2.49)
Substitute equation 2.48 in equation 2.49:
$P\left\{-t_{0.025} < \dfrac{\hat{\beta} - \beta}{S.E(\hat{\beta})} < t_{0.025}\right\} = 0.95$
Multiply through by $S.E(\hat{\beta})$:
$P\{-t_{0.025} \, S.E(\hat{\beta}) < \hat{\beta} - \beta < t_{0.025} \, S.E(\hat{\beta})\} = 0.95$
$\beta = \hat{\beta} \pm t_{0.025} \, S.E(\hat{\beta})$   (2.50)
This confidence interval shows that the unknown population parameter $\beta$ will lie within the defined limits 95 times out of 100. OR we are 95% confident that the unknown population parameter $\beta$ will lie within these limits.
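
Equation 2.50 can be sketched as a short function. The name `confidence_interval` and the values of the estimate and its standard error are hypothetical; the critical value 2.228 is the table value for 10 d.f. at the 5% level quoted earlier in this chapter.

```python
def confidence_interval(beta_hat, se_beta, t_critical):
    """Interval beta_hat +/- t_critical * S.E(beta_hat), as in equation 2.50."""
    margin = t_critical * se_beta
    return beta_hat - margin, beta_hat + margin

# Hypothetical estimate and standard error; 2.228 is the 5% two-tailed
# table value for 10 degrees of freedom.
lower, upper = confidence_interval(0.5, 0.1, 2.228)
print(lower, upper)
```

If the interval computed this way does not include zero, the estimate is significant at the corresponding level, which agrees with the t-test.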
Ex.1.
In Table 2.1 the investment expenditure and the long-run interest rate over a ten-year period are given. Test the hypothesis that investment is interest elastic by fitting a regression line to the data and conducting the relevant tests of significance. To answer the question we follow the steps of econometric methodology.
i. The economic theory states that "investment is interest sensitive."
ii. Mathematical model of the theory
a) Selecting variables: Investment is the dependent (endogenous) variable and the interest rate is exogenously determined (independent variable).
b) A priori expectation of the sign of the parameter: There is a negative (inverse) relationship between investment and the rate of interest.
c) Magnitude of the parameter: The theory says that investment is interest elastic, so the coefficient is greater than one in absolute value.
d) Specification of the model: This will be a single equation model in which investment is affected by the rate of interest but investment does not affect the rate of interest. The relationship between investment and the rate of interest is assumed to be linear (this is determined by the researcher).
Then
$Y_t = \alpha + \beta X_t$
where $Y_t$ is the level of investment, $X_t$ is the rate of interest, $|\beta| > 1$ and $\alpha$ is the intercept.
iii. Specification of the econometric model
There are other variables which affect investment, such as the marginal efficiency of capital (MEC), saving, consumption, political stability, etc. Since these and other variables are not incorporated in the mathematical model of the investment function, we capture them by incorporating a random term $U_t$ in our model.

$Y_t = \alpha + \beta X_t + U_t$
By adding the random (error, stochastic or disturbance) term $U_t$ we convert the mathematical (exact) relationship between investment and the rate of interest into the inexact relationship of an econometric model.
iv. Obtaining data: A sample of 10 yearly observations is given to estimate the model, and the data are time series data.
v. Estimation of the econometric model
The economic relationship is explained using a single equation model, so the most appropriate method of estimating this equation is the OLS method. To estimate the model we use Table 2.1 and obtain the following results in deviation form:
$\sum y_i x_i = -1.424$,  $\sum x_i^2 = 0.00064$
$\sum y_i^2 = 25826.4$,  $\sum e_i^2 = 22658$
$\bar{Y} = \dfrac{\sum Y_i}{n} = 753.4$,  $\bar{X} = \dfrac{\sum X_i}{n} = 0.056$
From equation 2.18 we have $\hat{\beta} = \dfrac{\sum x_i y_i}{\sum x_i^2}$, so
$\hat{\beta} = \dfrac{-1.424}{0.00064} = -2225$
From equation 2.14, $\hat{\alpha} = \bar{Y} - \hat{\beta} \bar{X}$:
$\hat{\alpha} = 753.4 - (-2225 \times 0.056) = 878$
Then the estimated regression line will be
$\hat{Y} = 878 - 2225 X_i$
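
As a quick check of the arithmetic, the two estimates can be recomputed from the summary values of Table 2.1. This is a minimal sketch; the variable names are illustrative and the four inputs are the sums and means quoted in step v.

```python
# Summary values from Table 2.1 (deviation form).
sum_xy = -1.424    # sum of y_i * x_i
sum_x2 = 0.00064   # sum of x_i squared
y_bar = 753.4      # mean of investment
x_bar = 0.056      # mean of the interest rate

beta_hat = sum_xy / sum_x2             # slope estimate
alpha_hat = y_bar - beta_hat * x_bar   # intercept estimate

# The rounded values should agree with the fitted line 878 - 2225 X.
print(round(beta_hat), round(alpha_hat))
```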
vi. Evaluation of the estimates
After estimating and obtaining the values of the coefficients ($\hat{\alpha}$ and $\hat{\beta}$), we have to evaluate the results obtained.
a) Economic interpretation of the results: at this stage we check whether the obtained results are economically meaningful or not.
$\hat{Y} = 878 - 2225 X_i$
i. If the interest rate is zero, i.e. $X_t = 0$, then $\hat{Y} = 878$. This means that if the interest rate is zero, investment will be equal to 878 birr. This is the interpretation of the constant term $\hat{\alpha}$ in the case of a linear equation.
ii. $\hat{\beta}$ indicates that if the rate of interest increases/decreases by one unit, then investment will decrease/increase on average by 2225 birr. Therefore it passes the economic criterion, because it shows the inverse relationship between investment and the interest rate.
b) Since the model passes the a priori economic criterion, the next step is to test the reliability of the estimated parameters using statistical tests: the $r^2$ and S.E tests.
i. The coefficient of determination $r^2$
To estimate $r^2$ we can use any of the formulas 2.31 to 2.35. Let us use
$r^2 = \hat{\beta} \, \dfrac{\sum x_i y_i}{\sum y_i^2}$
We know that $\hat{\beta} = -2225$, $\sum x_i y_i = -1.424$ and $\sum y_i^2 = 25826.4$, so
$r^2 = \dfrac{-2225 \times (-1.424)}{25826.4} = 0.123$
This means 12.3% of the change in investment is accounted for (explained) by the interest rate and the remaining 87.7% is not explained by the rate of interest but by other factors represented by $U_i$ in our model.
ii. Standard error test
$S.E(\hat{\alpha}) = \sqrt{\hat{\sigma}_u^2 \, \dfrac{\sum X_i^2}{n \sum x_i^2}}$
Since $\sigma_u^2$ is unknown, we estimate it using $\dfrac{\sum e_i^2}{n - 2}$:
$S.E(\hat{\alpha}) = \sqrt{\dfrac{\sum e_i^2}{n - 2} \cdot \dfrac{\sum X_i^2}{n \sum x_i^2}}$
From our estimation we obtain the following values: $\sum e_i^2 = 22658$, $\sum X_i^2 = 0.032$, $n - 2 = 10 - 2 = 8$, $\sum x_i^2 = 0.00064$.
$S.E(\hat{\alpha}) = \sqrt{\dfrac{22658}{8} \cdot \dfrac{0.032}{10 (0.00064)}} = \sqrt{\dfrac{90.632}{0.0064}} = \sqrt{14161.25} = 119.0$
Again, $S.E(\hat{\beta})$ can be obtained using equation 2.47:
$S.E(\hat{\beta}) = \sqrt{\dfrac{\sum e_i^2}{(n - 2) \sum x_i^2}} = \sqrt{\dfrac{22658}{8 (0.00064)}} = \sqrt{\dfrac{22658}{0.00512}} = 2103.661$
Having obtained the values of $S.E(\hat{\alpha})$ and $S.E(\hat{\beta})$, we can undertake the S.E test using hypothesis testing.
Test of $S.E(\hat{\alpha})$: If $\dfrac{\hat{\alpha}}{2} > S.E(\hat{\alpha})$, then we reject the null hypothesis $H_0: \alpha = 0$ and accept the alternative $H_1: \alpha \neq 0$.
$\hat{\alpha} = 878$ and $S.E(\hat{\alpha}) = 119.0$
$\dfrac{\hat{\alpha}}{2} = \dfrac{878}{2} = 439 > 119.0$
Since $\dfrac{\hat{\alpha}}{2} > S.E(\hat{\alpha})$, we reject the null hypothesis and accept the alternative.
Test of $S.E(\hat{\beta})$: If $S.E(\hat{\beta}) < \dfrac{|\hat{\beta}|}{2}$, we reject the null hypothesis and accept the alternative.
$S.E(\hat{\beta}) = 2103.661$ and $\dfrac{|\hat{\beta}|}{2} = \dfrac{2225}{2} = 1112.5$
Since $S.E(\hat{\beta}) > \dfrac{|\hat{\beta}|}{2}$, we accept the null hypothesis and reject the alternative.

The Student t-test

Given the 5% level of significance, we can use the following approximate t-tests.
$t = \dfrac{\hat{\alpha}}{S.E(\hat{\alpha})} = \dfrac{878}{119.0} = 7.38$
Since the t-value is greater than 2, we reject the null hypothesis and accept the alternative.
$t = \dfrac{\hat{\beta}}{S.E(\hat{\beta})} = \dfrac{-2225}{2103.661} = -1.058$
Since the t-value lies between −2 and +2, we accept the null hypothesis and reject the alternative. To summarize, we can write the results as follows:
$\hat{Y}_t = 878 - 2225 X_t$
S.E.   (119.0)   (2103.66)
t      (7.38)    (−1.058)
The equation passed the economic (a priori) criterion, but we found different results in the statistical tests.
a) $\hat{\alpha}$ satisfies the statistical test because $S.E(\hat{\alpha}) < \dfrac{\hat{\alpha}}{2}$ and also t > 2. We reject the null hypothesis $H_0: \alpha = 0$ and accept the alternative $H_1: \alpha \neq 0$. It means that the equation should have an intercept term.
b) $\hat{\beta}$ does not satisfy the statistical test because $S.E(\hat{\beta}) > \dfrac{|\hat{\beta}|}{2}$, which means acceptance of the null hypothesis $H_0: \beta = 0$ and rejection of the alternative $H_1: \beta \neq 0$:
- $\hat{\beta}$ is statistically insignificant
- Investment (Y) is not affected by the change in the interest rate, or investment is not interest sensitive
- There is no relationship between investment and the interest rate, because $\hat{\beta}$ is not different from zero.
Again, this is supported by the t-test. Since t = −1.058 lies between −2 and +2, we accept the null hypothesis $H_0: \beta = 0$ and conclude that
- The estimate $\hat{\beta}$ is statistically insignificant
- $\hat{\beta}$ is not different from zero
- There is no relationship between investment and the interest rate.
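
The statistical evaluation of Ex.1 can be recomputed from the sums in Table 2.1. This is a sketch under the formulas of equations 2.47 and 2.48 (with n = 10 and n − 2 = 8); the variable names are illustrative, and the inputs are the table totals and the estimates $\hat{\alpha} = 878$, $\hat{\beta} = -2225$.

```python
import math

# Totals from Table 2.1 and the OLS estimates of Ex.1.
n = 10
sum_e2 = 22658.0    # sum of squared residuals
sum_X2 = 0.032      # sum of squared raw X values
sum_x2 = 0.00064    # sum of squared deviations of X
sum_xy = -1.424     # sum of y_i * x_i (deviation form)
sum_y2 = 25826.4    # sum of squared deviations of Y
alpha_hat, beta_hat = 878.0, -2225.0

r_squared = beta_hat * sum_xy / sum_y2
sigma2_hat = sum_e2 / (n - 2)                               # estimate of sigma_u^2
se_beta = math.sqrt(sigma2_hat / sum_x2)                    # equation 2.47
se_alpha = math.sqrt(sigma2_hat * sum_X2 / (n * sum_x2))    # equation 2.48
t_alpha = alpha_hat / se_alpha
t_beta = beta_hat / se_beta

# Rule of thumb at the 5% level: significant if |t| > 2.
print(r_squared, se_alpha, se_beta, t_alpha, t_beta)
print(abs(t_alpha) > 2, abs(t_beta) > 2)
```

Under these inputs the intercept passes the rule-of-thumb test and the slope does not, which is the conclusion reached in the text.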
Example 2
Given the data in Table 2.2, fit the regression to the data and estimate the income elasticity.

Table 2.1

Yi = investment, Xi = rate of interest, yi = Yi − Ȳ, xi = Xi − X̄, Ŷi = estimated Yi, ei = Yi − Ŷi

Year   Yi     Xi      yi      xi       yi²        xi²        yi·xi     Ŷi        ei         ei²
1958   656    0.050   -97.4   -0.006   9486.76    0.000036    0.5844   766.750   -110.750   12265.56
1959   804    0.045    50.6   -0.011   2560.36    0.000121   -0.5566   777.875     26.125     682.5156
1960   836    0.045    82.6   -0.011   6822.76    0.000121   -0.9086   777.875     58.125    3378.516
1961   765    0.055    11.6   -0.001    134.56    0.000001   -0.0116   755.625      9.375      87.89
1962   777    0.060    23.6    0.004    556.96    0.000016    0.0944   744.500     32.500    1056.25
1963   711    0.060   -42.4    0.004   1797.76    0.000016   -0.1696   744.500    -33.500    1122.25
1964   755    0.060     1.6    0.004      2.56    0.000016    0.0064   744.500     10.500     110.25
1965   747    0.050    -6.4   -0.006     40.96    0.000036    0.0384   766.750    -19.750     390.0625
1966   696    0.070   -57.4    0.014   3294.76    0.000196   -0.8036   722.250    -26.250     689.0625
1967   787    0.065    33.6    0.009   1128.96    0.000081    0.3024   733.375     53.625    2875.641
Sum    7534   0.56      0       0     25826.4     0.00064    -1.424    7534         0       22658

Ȳ = 753.4, X̄ = 0.056

Table 2.2

Yt = consumption, Xt = income. Lower-case log y and log x denote deviations of log Yt and log Xt from their means; log Ŷ is the estimated value and e = log Y − log Ŷ.

Year   Yt   Xt   log Yt   log Xt   log y     log x     (log y)²   (log x)²   (log y)(log x)   log Ŷ    e          e²
58      8   17   0.900    1.230    -0.317    -0.399    0.1004     0.1596     0.1286           0.9163   -0.0132    0.000176
59     12   27   1.079    1.431    -0.141    -0.1986   0.0198     0.0394     0.02797          1.069     0.0101    0.000103
60     15   36   1.176    1.556    -0.044    -0.0737   0.00193    0.00543    0.00323          1.164     0.0121    0.000146
61     18   46   1.255    1.663     0.0352    0.0327   0.001244   0.00107    0.00115          1.245     0.01037   0.000108
62     22   57   1.342    1.756     0.1224    0.1258   0.0149     0.0158     0.0154           1.316     0.0267    0.000716
63     23   67   1.361    1.826     0.1417    0.196    0.02       0.0384     0.0278           1.369    -0.00729   0.0000531
64     26   81   1.415    1.908     0.1949    0.278    0.038      0.0776     0.0543           1.431    -0.01668   0.000278
Sum              8.53     11.37                        0.1965     0.3374     0.2564           8.51                0.00158

Mean of log Y = 1.22, mean of log X = 1.624

Exercise for chapter two

1) Given the following table about the relationship between corn production and fertilizer utilization:

Corn        40  44  46  48  52  58  60  68  74  80
Fertilizer   6  10  12  14  16  18  22  24  26  32

If the relationship between the production of corn and fertilizer is linear:
a) Estimate the parameters (coefficients) and their standard errors, and test the significance of the parameters.
b) What proportion of the variation in corn production is explained by fertilizer?
c) Interpret the results obtained from the linear relationship. Had the relationship been non-linear, what would be the interpretation of the coefficients?
2) Suppose that Mr. A is estimating a consumption function and obtains the following results:
$\hat{Y} = 15 + 0.81 X_i$
t   (3.1)   (18.7)
Y is consumption and X is disposable income; $r^2$ is 0.99 and n is 19.
a) Test the significance of X statistically using the t-ratios.
b) Determine the estimated standard errors of the parameters.
c) Construct a 95% confidence interval for the coefficient of disposable income.

Chapter 3: Properties of the least square Estimates

3.1 The Gauss-Markov Theorem

Given the assumptions of the CLS model, the least square estimates obtained from a sample possess some optimal properties. The least square estimators are linear and unbiased estimators. They are also the best of the linear unbiased estimators, which means they have minimum variance within the class of linear unbiased estimators. This theorem is sometimes referred to as the BLUE theorem. An OLS estimator is said to be BLUE (best linear unbiased estimator) if the following hold true.
a) It is linear: the sample estimates are linear functions of the dependent variable, i.e. $\hat{\alpha}$ and $\hat{\beta}$ are linear functions of $Y_i$.
b) It is unbiased: the expected value of the sample estimate is equal to the true value of the population parameter.
$E(\hat{\alpha}) = \alpha$,  $E(\hat{\beta}) = \beta$
where $\hat{\alpha}$ and $\hat{\beta}$ are the sample estimates, $\alpha$ and $\beta$ are the true population parameters, and E means the expected or average value.
c) It has minimum variance: the variances of $\hat{\alpha}$ and $\hat{\beta}$ are the smallest as compared to the variance of any other linear unbiased estimator.
Gauss-Markov Theorem: Given the assumptions of the CLS model, the least square estimators satisfy the properties of BLUE.
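1) The property of linearity

From equation 2.19 we have the value of $\hat{\beta}$:
$\hat{\beta} = \dfrac{\sum x_i y_i}{\sum x_i^2}$
and from equation 2.38 we proved that
$\hat{\beta} = \sum k_i Y_i$, where $k_i = \dfrac{x_i}{\sum x_i^2}$
Since the values of X are fixed in the sample, $k_i$ is constant. Then $\hat{\beta} = \sum k_i Y_i$, i.e. the estimate $\hat{\beta}$ is a linear function of the Y's, the values of the dependent variable.
By the same analogy, from equation 2.14 we have
$\hat{\alpha} = \bar{Y} - \hat{\beta} \bar{X}$
Substitute equation 2.38 in place of $\hat{\beta}$:
$\hat{\alpha} = \bar{Y} - \bar{X} \sum k_i Y_i$
$\hat{\alpha} = \dfrac{\sum Y_i}{n} - \bar{X} \sum k_i Y_i$
$\hat{\alpha} = \dfrac{1}{n} \sum Y_i - \bar{X} \sum k_i Y_i$
$\hat{\alpha} = \sum \left(\dfrac{1}{n} - \bar{X} k_i\right) Y_i$
We know that $\dfrac{1}{n}$, $\bar{X}$ and $k_i$ are constants, so $\hat{\alpha}$ is a linear function of the dependent variable $Y_i$.
Thus both $\hat{\alpha}$ and $\hat{\beta}$ are expressed as linear functions of the Y's.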
2) Unbiasedness: The bias of an estimator is defined as the difference between its expected value and the true parameter.
Bias = $E(\hat{\beta}) - \beta$
If the estimator is unbiased, its bias is zero, i.e. $E(\hat{\beta}) = \beta$. We proved this in equation 2.42.
Again, bias = $E(\hat{\alpha}) - \alpha$. If the estimator is unbiased, its bias is zero, i.e. $E(\hat{\alpha}) = \alpha$. We also proved this, and you can refer to equation 2.46.
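
Unbiasedness is a statement about repeated sampling, so it can be illustrated with a small simulation. The sketch below is not part of the module: the "true" parameters, the X values, the seed and the number of repetitions are all hypothetical. It draws many samples from $Y_i = \alpha + \beta X_i + U_i$ with normal disturbances and averages the OLS estimates.

```python
import random

def ols_estimates(X, Y):
    """Return (alpha_hat, beta_hat) for the two-variable model."""
    n = len(X)
    x_bar, y_bar = sum(X) / n, sum(Y) / n
    sum_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))
    sum_x2 = sum((xi - x_bar) ** 2 for xi in X)
    beta_hat = sum_xy / sum_x2
    return y_bar - beta_hat * x_bar, beta_hat

random.seed(0)                                    # fixed seed so the run is repeatable
alpha_true, beta_true, sigma_u = 2.0, 0.5, 1.0    # hypothetical population values
X = [float(i) for i in range(1, 21)]              # X is held fixed across samples
n_samples = 2000

alpha_draws, beta_draws = [], []
for _ in range(n_samples):
    # Each repetition draws a new set of disturbances U_i ~ N(0, sigma_u^2).
    Y = [alpha_true + beta_true * xi + random.gauss(0.0, sigma_u) for xi in X]
    a_hat, b_hat = ols_estimates(X, Y)
    alpha_draws.append(a_hat)
    beta_draws.append(b_hat)

mean_alpha = sum(alpha_draws) / n_samples
mean_beta = sum(beta_draws) / n_samples
print(mean_alpha, mean_beta)   # both should be close to the true values
```

Any single estimate can be far from the true value; it is the average over many samples that comes close to $\alpha$ and $\beta$.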
3) The minimum variance property (best estimator)
The property of minimum variance is the main reason for the popularity of the OLS method. "Best" in this sense means having the smallest variance. When we say that the OLS estimator is the best estimator, we mean that it has the minimum variance as compared to any other linear unbiased estimator, including those obtained using other econometric methods such as 2SLS, 3SLS, maximum likelihood, etc.

3.2 Importance of the BLUE Properties

One may ask why we attach so much importance to the BLUE properties of the OLS estimates. The reasons are:
a) The property of linearity is desirable because it facilitates the computation of the estimates.
b) The property of unbiasedness is not important by itself. It only means that if we had a very large number of samples, the estimates of the parameters obtained from these samples would on average give the true value of the $\beta$'s.
c) Best (minimum variance as compared to others). The minimum variance property is desirable if it is combined with small bias, because an estimator may have zero variance and yet have an enormous bias. The importance of this property is apparent when we want to apply the standard tests of significance for $\hat{\alpha}$ and $\hat{\beta}$ and to construct confidence intervals for the estimates.

3.3 Maximum Likelihood Estimation

A concept frequently utilized in econometrics is that of maximum likelihood. The basic idea of maximum likelihood is that different statistical populations generate different samples, i.e. any one sample being scrutinized is more likely to have come from some populations than from others.
Ex.1
If one were sampling coin tosses and a sample mean of 0.5 were obtained (representing half heads and half tails), it would be clear that the most likely population from which the sample was drawn would be a population with a true mean of 0.5.
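
The coin example can be made concrete with a short sketch. Assume, purely for illustration, a sample of 10 tosses with 5 heads; the function name, the sample and the grid of candidate probabilities are hypothetical. The code evaluates the binomial likelihood of that sample under each candidate population mean and picks the one that makes the sample most likely.

```python
from math import comb

def binomial_likelihood(p, heads, tosses):
    """Probability of observing `heads` heads in `tosses` tosses when P(head) = p."""
    return comb(tosses, heads) * p ** heads * (1 - p) ** (tosses - heads)

# Hypothetical sample: 5 heads in 10 tosses (sample mean 0.5).
heads, tosses = 5, 10

# Candidate population means 0.01, 0.02, ..., 0.99.
candidates = [i / 100 for i in range(1, 100)]

# The maximum likelihood estimate is the candidate with the largest likelihood.
p_ml = max(candidates, key=lambda p: binomial_likelihood(p, heads, tosses))
print(p_ml)
```

The same logic carries over to the regression model: the sample is held fixed and the search runs over the unknown parameters.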
Ex.2
Assume that [X1, X2, X3, X4, X5, X6, X7, X8] are drawn from a normal population with a given variance but unknown mean. Again, assume that these sample observations are drawn from either distribution A or distribution B.

[Figure: probability density curves of Distribution A and Distribution B, with the sample observations x6, x2, x3, x4, x5, x8, x7, x1 marked on the horizontal axis.]

Given distributions A and B, if the true population were B, the probability that we would have obtained the sample shown would be quite small. But if the true population were A, the probability that we would have drawn the sample would be substantially larger. We therefore select population A as the one most likely to have yielded the observed data.
We define the maximum likelihood estimator of $\beta$ as the value $\hat{\beta}$ which would most likely generate the observed sample observations $Y_1, Y_2, Y_3, \ldots, Y_n$. If $Y_i$ is normally distributed and each of the Y's is drawn independently, then maximum likelihood estimation maximizes $P(Y_1) P(Y_2) \cdots P(Y_n)$, where each P represents a probability associated with the normal distribution. $P(Y_1) P(Y_2) \cdots P(Y_n)$ is often referred to as the likelihood function.
The likelihood function depends not only on the sample values but also on the unknown parameters of the problem ($\alpha$ and $\beta$). In describing the likelihood function we often think of the unknown parameters as varying while the Y's (dependent variable) are fixed. This seems reasonable because finding the maximum likelihood estimates involves a search over alternative parameter values for those which would be most likely to generate the given sample. For this reason the likelihood function must be interpreted differently from the joint probability distribution. In the latter case the Y's are allowed to vary and the underlying parameters are fixed; the reverse is true in the case of maximum likelihood. Now we are in a position to search for the maximum likelihood estimators of the parameters of the two-variable regression model.

$Y_i = \alpha + \beta X_i + U_i$   (3.1)
We know that $Y_i \sim N(\alpha + \beta X_i, \sigma^2)$, i.e. $Y_i$ is normally distributed with mean $\alpha + \beta X_i$ and variance $\sigma^2$.
Assume that all the assumptions of least squares hold and further assume that the disturbance term has a normal distribution. Will the ML estimators of $\alpha$ and $\beta$ be different from the least square estimators? Will such estimators possess the desirable properties? In our model $Y_i = \alpha + \beta X_i + U_i$ the sample consists of n observations on Y and X. Then the $Y_i$ will have means $\alpha + \beta X_1$, $\alpha + \beta X_2$, $\alpha + \beta X_3$, ..., $\alpha + \beta X_n$ but a common variance $\sigma^2$. Why do we have different means but a constant variance? The reason is simple: the mean of $Y_i$ depends on $X_i$, which takes a different value for each observation, while the variance of the disturbance term is assumed to be constant. The joint probability density function can be written as a product of n individual density functions:
$f(Y_1, Y_2, \ldots, Y_n \mid \alpha + \beta X_i, \sigma^2) = f(Y_1 \mid \alpha + \beta X_1, \sigma^2) \, f(Y_2 \mid \alpha + \beta X_2, \sigma^2) \cdots f(Y_n \mid \alpha + \beta X_n, \sigma^2)$   (3.2)
Each of these densities may be written as follows:
$f(Y_i) = (2\pi\sigma^2)^{-1/2} \exp\left[-\dfrac{1}{2} \left(\dfrac{Y_i - \alpha - \beta X_i}{\sigma}\right)^2\right]$   (3.3)
where exp denotes the exponential function. Then the likelihood function is
$L(Y_1, Y_2, \ldots, Y_n; \alpha, \beta, \sigma^2) = P(Y_1) P(Y_2) \cdots P(Y_{n-1}) P(Y_n)$
Let the likelihood function be represented by L:
$L = (2\pi\sigma^2)^{-1/2} \exp\left[-\dfrac{1}{2} \left(\dfrac{Y_1 - \alpha - \beta X_1}{\sigma}\right)^2\right] \cdot (2\pi\sigma^2)^{-1/2} \exp\left[-\dfrac{1}{2} \left(\dfrac{Y_2 - \alpha - \beta X_2}{\sigma}\right)^2\right] \cdots (2\pi\sigma^2)^{-1/2} \exp\left[-\dfrac{1}{2} \left(\dfrac{Y_n - \alpha - \beta X_n}{\sigma}\right)^2\right]$   (3.4)
Collecting the n factors, the exponents are summed from 1 up to n:
$L = (2\pi\sigma^2)^{-n/2} \exp\left[-\dfrac{1}{2} \sum_{i=1}^{n} \left(\dfrac{Y_i - \alpha - \beta X_i}{\sigma}\right)^2\right]$   (3.5)
If you represent the product of the n factors by $\prod$:
$L = \prod_{i=1}^{n} (2\pi\sigma^2)^{-1/2} \exp\left[-\dfrac{1}{2} \left(\dfrac{Y_i - \alpha - \beta X_i}{\sigma}\right)^2\right]$   (3.6)
The values of $Y_1, Y_2, \ldots, Y_n$ are given but the values of $\alpha$, $\beta$ and $\sigma^2$ are not known, so this function is called the likelihood function and is denoted by $L_f(\alpha, \beta, \sigma^2)$:
$L_f(\alpha, \beta, \sigma^2) = \prod_{i=1}^{n} (2\pi\sigma^2)^{-1/2} \exp\left[-\dfrac{1}{2} \left(\dfrac{Y_i - \alpha - \beta X_i}{\sigma}\right)^2\right]$   (3.7)
And the equation can be written as
$L_f(\alpha, \beta, \sigma^2) = \prod_{i=1}^{n} \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\dfrac{1}{2} \left(\dfrac{Y_i - \alpha - \beta X_i}{\sigma}\right)^2\right]$   (3.8)
Take the log of L:
$\ln L_f = -\dfrac{n}{2} \ln 2\pi - \dfrac{n}{2} \ln \sigma^2 - \dfrac{1}{2\sigma^2} \sum (Y_i - \alpha - \beta X_i)^2$   (3.9)
Differentiate with respect to $\alpha$, $\beta$ and $\sigma^2$, setting the derivatives equal to zero:
$\dfrac{\partial \ln L_f}{\partial \alpha} = -\dfrac{1}{2\sigma^2} \sum 2 (Y_i - \alpha - \beta X_i)(-1) = 0$   (3.10)
$\dfrac{\partial \ln L_f}{\partial \beta} = -\dfrac{1}{2\sigma^2} \sum 2 (Y_i - \alpha - \beta X_i)(-X_i) = 0$   (3.11)
$\dfrac{\partial \ln L_f}{\partial \sigma^2} = -\dfrac{n}{2\sigma^2} + \dfrac{1}{2\sigma^4} \sum (Y_i - \alpha - \beta X_i)^2 = 0$   (3.12)
Since $\dfrac{1}{\sigma^2} \neq 0$, equation 3.10 reduces to
$\sum (Y_i - \alpha - \beta X_i) = 0$   (3.13)
and, for the same reason, equation 3.11 reduces to
$\sum (Y_i - \alpha - \beta X_i) X_i = 0$   (3.14)
Rearranging equations 3.13 and 3.14, and writing $\tilde{\alpha}$ and $\tilde{\beta}$ for the ML estimators, we have
$\sum Y_i = n \tilde{\alpha} + \tilde{\beta} \sum X_i$   (3.15)
$\sum Y_i X_i = \tilde{\alpha} \sum X_i + \tilde{\beta} \sum X_i^2$   (3.16)
These two equations are the same as the normal equations of OLS, so the ML estimators of $\alpha$ and $\beta$ are the same as the OLS estimators $\hat{\alpha}$ and $\hat{\beta}$. From equation 3.12 we can obtain the value of $\sigma^2$:
$-\dfrac{n}{2\sigma^2} + \dfrac{1}{2\sigma^4} \sum (Y_i - \alpha - \beta X_i)^2 = 0$
Multiplying through by $2\sigma^4$, we are left with
$-n\sigma^2 + \sum (Y_i - \alpha - \beta X_i)^2 = 0$
$\tilde{\sigma}^2 = \dfrac{1}{n} \sum (Y_i - \hat{\alpha} - \hat{\beta} X_i)^2$
We know that $\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i$, so $Y_i - \hat{Y}_i = e_i$. Therefore
$\tilde{\sigma}^2 = \dfrac{1}{n} \sum e_i^2$   (3.17)
The ML estimator of the variance is different from the OLS estimator. The variance estimator was $\dfrac{\sum e_i^2}{n - 2}$ in OLS but it is $\dfrac{1}{n} \sum e_i^2$ in the case of ML. Thus the ML variance estimator is a biased estimator of $\sigma^2$, whereas it is unbiased in the case of OLS. But as the sample size increases, the ML variance estimator converges to the true population variance.
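
The difference between the two variance estimators can be seen with the residual sum of squares from Table 2.1 ($\sum e_i^2 = 22658$, n = 10). This is a minimal sketch; the function name `variance_estimates` is illustrative.

```python
def variance_estimates(sum_e2, n):
    """Return (ML estimate, OLS estimate) of the disturbance variance."""
    ml = sum_e2 / n           # equation 3.17: divides by n
    ols = sum_e2 / (n - 2)    # OLS: divides by the degrees of freedom n - 2
    return ml, ols

# Residual sum of squares and sample size from Table 2.1.
ml_var, ols_var = variance_estimates(22658.0, 10)
print(ml_var, ols_var)

# The ratio ML / OLS equals (n - 2) / n, which approaches 1 as n grows.
ratio_small = variance_estimates(1.0, 10)[0] / variance_estimates(1.0, 10)[1]
ratio_large = variance_estimates(1.0, 1000)[0] / variance_estimates(1.0, 1000)[1]
print(ratio_small, ratio_large)
```

The ML estimate is always the smaller of the two, and the gap shrinks as the sample size increases.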

Exercise for chapter three

a) What are the properties that make the OLS estimators BLUE?
b) What are the important properties of BLUE as compared to other methods of econometrics?

Chapter 4: Multiple Linear Regression

In a simple regression we study the relationship between the dependent variable ($Y_i$) and only one independent (explanatory) variable $X_i$. In simple regression analysis, for example, the quantity demanded (Y) depends upon the price of the product alone, other things being constant (absorbed by $U_i$). In multiple regression analysis, the dependent variable ($Y_i$) depends upon many explanatory (independent) variables.
Ex. Quantity demanded is a function of the price of the product, the prices of other goods (substitute and complementary goods), income, wealth, previous year's income, consumption behavior, etc. For the sake of simplicity let's consider a three-variable case:

$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + U_i$   (4.1)
where $Y_i$ is the dependent variable, $X_1$ and $X_2$ are independent (explanatory) variables and $U_i$ is the stochastic disturbance term. Alternatively, we can write this equation as follows.
Yi = α 1−12 + β 11− 2 X 1i + β 12.−1 X 2 + Ui − − − − − − − − − 4.2

• $\alpha_{1.12}$ is the constant term; it measures the average value of Y when X1 & X2 are zero.
• $\beta_{11.2}$ measures the change in Y for a unit change in X1 alone (the effect of X1 on Y given that X2 is held constant).
• $\beta_{12.1}$ measures the change in Y for a unit change in X2 alone (the effect of X2 on Y given that X1 is held constant).
The values of $\beta_{11.2}$ & $\beta_{12.1}$ are called partial regression coefficients or marginal coefficients. They are not elasticities: because the regression is linear, each measures only the average change in Y per unit change in the corresponding X. But if the regression equation is nonlinear, like the following Cobb-Douglas production function

$$Y_i = \alpha L_i^{\beta_1} K_i^{\beta_2} e^{U_i} \tag{4.3}$$

where Y is output, L is labor, K is capital and $\alpha$, $\beta_1$ & $\beta_2$ are parameters, then to estimate the equation we first transform it into a linear form by taking logs.

$$\log Y_i = \log \alpha + \beta_1 \log L_i + \beta_2 \log K_i + U_i \tag{4.4}$$

Here $\beta_1$ & $\beta_2$ measure elasticities, not average (marginal) changes. In order to find the values of $\alpha$ & the $\beta$'s we use all the OLS assumptions regarding the disturbance term Ui:

$$E(U_i) = 0, \qquad E(U_i^2) = \sigma^2, \qquad \operatorname{Cov}(U_i, U_j) = 0 \ (i \neq j)$$

$$E(U_i X_{1i}) = E(U_i X_{2i}) = 0$$
In addition we assume that there is no exact linear relationship between the explanatory variables X1 & X2 (no multicollinearity between X1 & X2).
Given n observations on Y, X1 & X2, our problem is to estimate $\alpha$, $\beta_1$ & $\beta_2$. We apply OLS to obtain the estimates $\hat{\alpha}$, $\hat{\beta}_1$ & $\hat{\beta}_2$. In the simple linear regression model we have already defined the residual as

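$$e_i = Y_i - \hat{Y}_i \tag{4.5}$$
$$\hat{Y}_i = \hat{\alpha} + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i}$$
$$\therefore \sum e_i^2 = \sum \left(Y_i - \hat{\alpha} - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i}\right)^2$$

Differentiate $\sum e_i^2$ with respect to $\hat{\alpha}$, $\hat{\beta}_1$ & $\hat{\beta}_2$ and set the partial derivatives equal to zero.

$$\frac{\partial \sum e_i^2}{\partial \hat{\alpha}} = -2 \sum \left(Y_i - \hat{\alpha} - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i}\right) = 0 \tag{4.6}$$
$$\frac{\partial \sum e_i^2}{\partial \hat{\beta}_1} = -2 \sum X_{1i}\left(Y_i - \hat{\alpha} - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i}\right) = 0 \tag{4.7}$$
$$\frac{\partial \sum e_i^2}{\partial \hat{\beta}_2} = -2 \sum X_{2i}\left(Y_i - \hat{\alpha} - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i}\right) = 0 \tag{4.8}$$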
From the above equations we obtain the following normal equations.

$$\sum Y_i = n\hat{\alpha} + \hat{\beta}_1 \sum X_{1i} + \hat{\beta}_2 \sum X_{2i} \tag{4.9}$$
$$\sum Y_i X_{1i} = \hat{\alpha} \sum X_{1i} + \hat{\beta}_1 \sum X_{1i}^2 + \hat{\beta}_2 \sum X_{1i} X_{2i} \tag{4.10}$$
$$\sum Y_i X_{2i} = \hat{\alpha} \sum X_{2i} + \hat{\beta}_1 \sum X_{1i} X_{2i} + \hat{\beta}_2 \sum X_{2i}^2 \tag{4.11}$$
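From equation 4.9, dividing through by n, we have
$$\hat{\alpha} = \bar{Y} - \hat{\beta}_1 \bar{X}_1 - \hat{\beta}_2 \bar{X}_2 \tag{4.12}$$
Substituting this value of $\hat{\alpha}$ into equation 4.10 (and in the same way into 4.11) yields
$$\sum Y_i X_{1i} = \left(\bar{Y} - \hat{\beta}_1 \bar{X}_1 - \hat{\beta}_2 \bar{X}_2\right) \sum X_{1i} + \hat{\beta}_1 \sum X_{1i}^2 + \hat{\beta}_2 \sum X_{1i} X_{2i} \tag{4.13}$$
Rearranging equation 4.13 and its counterpart for X2, we obtain
$$\sum (X_{1i} - \bar{X}_1)(Y_i - \bar{Y}) = \hat{\beta}_1 \sum (X_{1i} - \bar{X}_1)^2 + \hat{\beta}_2 \sum (X_{1i} - \bar{X}_1)(X_{2i} - \bar{X}_2)$$
$$\sum (X_{2i} - \bar{X}_2)(Y_i - \bar{Y}) = \hat{\beta}_1 \sum (X_{1i} - \bar{X}_1)(X_{2i} - \bar{X}_2) + \hat{\beta}_2 \sum (X_{2i} - \bar{X}_2)^2$$
Writing lower-case letters for deviations from the means ($x_{1i} = X_{1i} - \bar{X}_1$, $x_{2i} = X_{2i} - \bar{X}_2$, $y_i = Y_i - \bar{Y}$), these equations become

$$\sum x_{1i} y_i = \hat{\beta}_1 \sum x_{1i}^2 + \hat{\beta}_2 \sum x_{1i} x_{2i}$$

$$\sum x_{2i} y_i = \hat{\beta}_1 \sum x_{1i} x_{2i} + \hat{\beta}_2 \sum x_{2i}^2$$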

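If you solve for $\hat{\beta}_1$ & $\hat{\beta}_2$, we obtain

$$\hat{\beta}_1 = \frac{\sum x_{1i} y_i \sum x_{2i}^2 - \sum x_{1i} x_{2i} \sum x_{2i} y_i}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i} x_{2i}\right)^2} \tag{4.14}$$

$$\hat{\beta}_2 = \frac{\sum x_{2i} y_i \sum x_{1i}^2 - \sum x_{1i} x_{2i} \sum x_{1i} y_i}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i} x_{2i}\right)^2} \tag{4.15}$$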
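
Equations 4.12, 4.14 and 4.15 translate directly into code. The following is a minimal Python sketch and not part of the original module: the function name `ols_two_regressors` and the small data set are hypothetical, and Y is generated exactly from Y = 1 + 2X1 + 3X2 so that the estimates can be checked by eye.

```python
def ols_two_regressors(y, x1, x2):
    """Estimate Y = a + b1*X1 + b2*X2 by OLS from deviation sums
    (equations 4.12, 4.14 and 4.15 of the text)."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    # Deviations from the sample means (the lower-case variables in the text).
    dy = [v - my for v in y]
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    s11 = sum(u * u for u in d1)
    s22 = sum(u * u for u in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    s1y = sum(u * v for u, v in zip(d1, dy))
    s2y = sum(u * v for u, v in zip(d2, dy))
    det = s11 * s22 - s12 ** 2  # common denominator of 4.14 and 4.15
    if det == 0:
        raise ValueError("X1 and X2 are perfectly collinear")
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s2y * s11 - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2


# Hypothetical data generated exactly from Y = 1 + 2*X1 + 3*X2 (no disturbance).
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]
a, b1, b2 = ols_two_regressors(y, x1, x2)
print(a, b1, b2)  # approximately 1, 2 and 3
```

The denominator check mirrors the no-multicollinearity assumption: if X1 and X2 are exactly linearly related, the denominator of 4.14 and 4.15 is zero and the estimates are undefined.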

4.1 Variance & standard errors of OLS estimators.
Having obtained the partial regression coefficients ($\hat{\alpha}$, $\hat{\beta}_1$ & $\hat{\beta}_2$), we can derive the variances & standard errors of these estimators as follows.

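$$\operatorname{Var}(\hat{\alpha}) = \left[\frac{1}{n} + \frac{\bar{X}_1^2 \sum x_{2i}^2 + \bar{X}_2^2 \sum x_{1i}^2 - 2\bar{X}_1 \bar{X}_2 \sum x_{1i} x_{2i}}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i} x_{2i}\right)^2}\right] \hat{\sigma}_u^2 \tag{4.16}$$

$$\operatorname{Var}(\hat{\beta}_1) = \hat{\sigma}_u^2 \, \frac{\sum x_{2i}^2}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i} x_{2i}\right)^2} \tag{4.17}$$

$$\operatorname{Var}(\hat{\beta}_2) = \hat{\sigma}_u^2 \, \frac{\sum x_{1i}^2}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i} x_{2i}\right)^2} \tag{4.18}$$

where $\hat{\sigma}_u^2 = \dfrac{\sum e_i^2}{n - k}$, n is the number of observations & k is the number of estimated parameters.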

Standard errors of $\hat{\alpha}$ & the $\hat{\beta}$'s

$$S.E.(\hat{\alpha}) = \sqrt{\operatorname{Var}(\hat{\alpha})} \tag{4.19}$$
$$S.E.(\hat{\beta}_1) = \sqrt{\operatorname{Var}(\hat{\beta}_1)} \tag{4.20}$$
$$S.E.(\hat{\beta}_2) = \sqrt{\operatorname{Var}(\hat{\beta}_2)} \tag{4.21}$$
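
The variance formulas 4.16 to 4.18 and the standard errors 4.19 to 4.21 can be sketched in Python as below. This is an illustrative sketch only: the function name and the four-observation data set are hypothetical, chosen so that X1 and X2 are uncorrelated and the results are easy to verify by hand (all three standard errors come out as 0.5).

```python
import math


def ols_standard_errors(y, x1, x2):
    """OLS estimates and standard errors for Y = a + b1*X1 + b2*X2,
    following equations 4.12 and 4.14-4.21 of the text."""
    n, k = len(y), 3  # k = number of estimated parameters
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(u * u for u in d1)
    s22 = sum(u * u for u in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    s1y = sum(u * v for u, v in zip(d1, dy))
    s2y = sum(u * v for u, v in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s2y * s11 - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2
    # Residual sum of squares and the estimated disturbance variance.
    rss = sum((yi - a - b1 * u - b2 * v) ** 2 for yi, u, v in zip(y, x1, x2))
    sigma2 = rss / (n - k)
    var_a = sigma2 * (1 / n + (m1 ** 2 * s22 + m2 ** 2 * s11
                               - 2 * m1 * m2 * s12) / det)   # eq. 4.16
    var_b1 = sigma2 * s22 / det                               # eq. 4.17
    var_b2 = sigma2 * s11 / det                               # eq. 4.18
    return {"a": a, "b1": b1, "b2": b2, "sigma2": sigma2,
            "se_a": math.sqrt(var_a),
            "se_b1": math.sqrt(var_b1),
            "se_b2": math.sqrt(var_b2)}


# Hypothetical data with uncorrelated regressors (sum of x1*x2 deviations is 0).
x1 = [-1, 1, -1, 1]
x2 = [-1, -1, 1, 1]
y = [1, 2, 3, 6]
res = ols_standard_errors(y, x1, x2)
print(res)
```

With only four observations and three parameters there is a single degree of freedom, so the example is for checking the arithmetic, not for inference.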

4.2 Test of significance of the parameter estimates of multiple regressions.

In the two-variable regression case we have seen how to undertake statistical tests of significance. In multiple regression analysis (more than two variables) we follow the same procedures as in the two-variable case.

The Standard error test of significance. (S.E)

The S.E. test compares the numerical values of the estimates ($\hat{\alpha}$, $\hat{\beta}_1$, $\hat{\beta}_2$) with their standard errors. First we set the null and alternative hypotheses. The null hypotheses are
$H_0: \alpha = 0$
$H_0: \beta_i = 0$ (i = 1, 2)
The null hypothesis means that the intercept term ($\alpha$) and the coefficients of the variables ($\beta_1$, $\beta_2$) are equal to zero in
$\hat{Y}_i = \hat{\alpha} + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i}$
If the estimates $\hat{\alpha}$, $\hat{\beta}_1$ & $\hat{\beta}_2$ are not significantly different from zero, i.e. if we accept the null hypothesis, it means
a) The dependent variable Yi is not explained by (does not depend on) the explanatory variables X1i & X2i, i.e. there is no relationship between Y & the X's.
OR
b) We accept the null hypothesis that there is no relationship between Yi & the X's.
OR
c) The estimates are not significantly different from zero, i.e. the estimates are insignificant.
The alternative hypotheses
$H_1: \alpha \neq 0$
$H_1: \beta_i \neq 0$
mean
a) The values of $\alpha$, $\beta_1$ & $\beta_2$ are different from zero, i.e. there is a relationship between Yi & X1 & X2, or Y is explained by X1 & X2.
OR
b) The estimators $\hat{\alpha}$, $\hat{\beta}_1$ & $\hat{\beta}_2$ are significantly different from zero (they are not equal to zero).
OR
c) $\hat{\alpha}$, $\hat{\beta}_1$ & $\hat{\beta}_2$ are statistically significant.
With these ideas in mind we can undertake the test as follows.
The null hypotheses: $H_0: \alpha = 0$, $H_0: \beta_i = 0$

The alternative hypotheses: $H_1: \alpha \neq 0$, $H_1: \beta_i \neq 0$
1. If we find that $S.E.(\hat{\alpha}) > \dfrac{|\hat{\alpha}|}{2}$ & $S.E.(\hat{\beta}_i) > \dfrac{|\hat{\beta}_i|}{2}$ (i.e. the estimated S.E. is greater than half of the estimate in absolute value), we accept the null hypothesis & interpret the result as stated above:
• There is no relationship between Y & X1, X2
• $\hat{\alpha}$, $\hat{\beta}_1$ & $\hat{\beta}_2$ are insignificant (X1 & X2 do not affect Yi)
• $\alpha$, $\beta_1$ & $\beta_2$ are not significantly different from zero
• We reject the alternative hypothesis, which says that $\alpha$, $\beta_1$ & $\beta_2$ are different from zero
2. If we find that $S.E.(\hat{\alpha}) < \dfrac{|\hat{\alpha}|}{2}$ & $S.E.(\hat{\beta}_i) < \dfrac{|\hat{\beta}_i|}{2}$, i.e. the standard errors are less than half of the estimates in absolute value, we interpret the result as follows.
a) We accept the alternative hypothesis, that $\alpha$, $\beta_1$ & $\beta_2$ are different from zero, and reject the null hypothesis, that they are equal to zero.
b) There is a relationship between the dependent variable Yi and the independent variables X1 & X2, i.e. X1 & X2 explain Yi.
c) $\hat{\alpha}$, $\hat{\beta}_1$ & $\hat{\beta}_2$ are significant.

The student's t- test

In the t-test we first obtain the t-value for each estimator & compare it with the theoretical (table) value, as follows.
a) Calculated value of t.
$t = \dfrac{\hat{\alpha}}{S.E.(\hat{\alpha})}$ is the calculated t-value for the intercept term $\alpha$

$t = \dfrac{\hat{\beta}_i}{S.E.(\hat{\beta}_i)}$ is the calculated t-value for each slope parameter
b) Theoretical or table value of t. This can be obtained from the table as follows.
• First, count the sample size & represent it by N.
• Second, determine the level of significance, represented by $\alpha$.
• Third, determine whether you are taking a one-tail or a two-tail test.
• Fourth, count the number of estimated parameters, i.e. $\hat{\alpha}$, $\hat{\beta}_1$ & $\hat{\beta}_2$, & represent it by k.
• Having determined these, we can write the table value of t as
$$t_{\alpha/2,\,(N-k)}$$
where $\alpha/2$ indicates a two-tail test and N - k gives the degrees of freedom (d.f.).
This value can be obtained from the t-table of any statistics book. In the table, $\alpha/2$ is found in the top row & the d.f. in the first column. The point where the $\alpha/2$ (significance level) column & the d.f. row intersect gives the table value of t.
c) The last stage is to compare a & b, i.e. the calculated value with the table value.
Note that a t-value is calculated for each estimator, but there is only one table value of t, obtained from the statistical table. Each calculated t-value is compared with this fixed table value & the procedure of comparison is as follows.
1. If the absolute value of the calculated t is less than the table value of t, accept the null hypothesis & reject the alternative. That is, if
$$\left|\frac{\hat{\alpha}}{S.E.(\hat{\alpha})}\right| < t_{\alpha/2,\,(N-k)}$$
$$\left|\frac{\hat{\beta}_i}{S.E.(\hat{\beta}_i)}\right| < t_{\alpha/2,\,(N-k)}$$
we accept the null hypotheses $H_0: \alpha = 0$ & $H_0: \beta_i = 0$
and reject the alternatives $H_1: \alpha \neq 0$ & $H_1: \beta_i \neq 0$. This means
i. There is no relationship between Y & the X's, i.e. the dependent variable Yi is not explained by (does not depend on) the independent variables X1 & X2.
ii. $\hat{\alpha}$, $\hat{\beta}_1$ & $\hat{\beta}_2$ are not significantly different from zero, i.e. they are insignificant.
2. If the absolute value of the calculated t is greater than the table t-value, we reject the null hypotheses $H_0: \alpha = 0$ & $H_0: \beta_i = 0$ and accept the alternatives $H_1: \alpha \neq 0$ & $H_1: \beta_i \neq 0$. That is, if
$$\left|\frac{\hat{\alpha}}{S.E.(\hat{\alpha})}\right| > t_{\alpha/2,\,(N-k)}$$
$$\left|\frac{\hat{\beta}_i}{S.E.(\hat{\beta}_i)}\right| > t_{\alpha/2,\,(N-k)}$$
In this case we interpret the result as follows:
i. the estimates are significantly different from zero, i.e. $\alpha$, $\beta_1$ & $\beta_2$ are not equal to zero
ii. the estimates ($\hat{\alpha}$, $\hat{\beta}_1$ & $\hat{\beta}_2$) are statistically significant
iii. the explanatory variables X1 & X2 affect the dependent variable Yi.
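
The two decision rules, the S.E. test and the t-test, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of the original module: the function names are hypothetical, the table value of t must be looked up and passed in by the user, and the comparison uses the absolute value of the calculated t so that negative coefficients are handled correctly.

```python
def t_test(estimate, std_error, t_table):
    """Two-tail t-test of H0: parameter = 0 against H1: parameter != 0.
    t_table is the tabulated value t(alpha/2, N - k) supplied by the user."""
    t_calc = estimate / std_error
    reject_h0 = abs(t_calc) > t_table
    return t_calc, reject_h0


def se_test(estimate, std_error):
    """Rule-of-thumb S.E. test: significant when S.E. < |estimate| / 2,
    which is the same as |t| > 2."""
    return std_error < abs(estimate) / 2


# Illustrative numbers: an estimate of -7.47 with standard error 2.529,
# and 2.179 as the tabulated t for alpha/2 = 0.025 with 12 d.f.
t_calc, reject = t_test(-7.47, 2.529, 2.179)
print(round(t_calc, 3), reject)   # t is about -2.954, so H0 is rejected
print(se_test(-7.47, 2.529))      # True: the S.E. is below half the estimate
```

Because the S.E. test always compares |t| with 2 while the table value depends on the degrees of freedom, the two tests can disagree when |t| lies between 2 and the table value.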

Unadjusted R², Adjusted (Corrected) R² & F-test

$R^2$ is the coefficient of determination, also called the unadjusted $R^2$. It has the same meaning as in the two-variable case; the only difference is the inclusion of additional explanatory variables. $R^2$ measures the proportion of the variation in Y explained by the variables X1 & X2 jointly.
$R^2$ can be derived from equation 3.1 by following the same procedure used to calculate $r^2$ in the two-variable model.
$$R^2 = \frac{ESS}{TSS} = \frac{\hat{\beta}_1 \sum x_{1i} y_i + \hat{\beta}_2 \sum x_{2i} y_i}{\sum y_i^2}$$
If you compare this equation with the previous $r^2$,
$$r^2 = \frac{\hat{\beta} \sum x_i y_i}{\sum y_i^2}$$
the only difference is that we added $\hat{\beta}_2 \sum x_{2i} y_i$ because we incorporated the third variable X2i. As you add explanatory variables the value of $R^2$ never decreases and almost always increases, because the numerator (ESS) grows while the denominator (TSS) remains the same. For this reason we call it the unadjusted $R^2$. Therefore, if we have two models with different numbers of explanatory variables, we cannot compare them on the basis of $R^2$. To compare two equations with different numbers of explanatory variables we have to use the adjusted (corrected) $R^2$, written $\bar{R}^2$.

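Adjusted $R^2$
$$\bar{R}^2 = 1 - \frac{\sum e_i^2 / (n - k)}{\sum y_i^2 / (n - 1)}$$
where n is the sample size and k is the number of estimated parameters (so n - k is the degrees of freedom of the residuals),

OR
$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k}$$
For k > 1, $\bar{R}^2 < R^2$. The gap is noticeable when the sample size is small, but if the sample size is very large $\bar{R}^2$ & $R^2$ will be very close to each other. For a very small sample $\bar{R}^2$ may be negative, in which case it is taken as zero. Note that if $R^2 = 1$ then $\bar{R}^2 = 1$; when $R^2 = 0$, $\bar{R}^2 = (1 - k)/(n - k)$, so $\bar{R}^2$ will be negative if k > 1.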
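
A short Python sketch of the two formulas, added here for illustration (the function names are hypothetical), reproduces the limiting cases just noted:

```python
def r_squared(ess, tss):
    """Unadjusted R^2 = explained sum of squares / total sum of squares."""
    return ess / tss


def adjusted_r_squared(r2, n, k):
    """Adjusted R^2; n = sample size, k = number of estimated parameters
    (including the intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k)


print(adjusted_r_squared(1.0, 15, 3))   # 1.0 when R^2 = 1
print(adjusted_r_squared(0.0, 15, 3))   # (1 - k)/(n - k), about -0.1667
print(adjusted_r_squared(0.5, 15, 3))   # lies below the unadjusted 0.5
```

The penalty factor (n - 1)/(n - k) grows with k, which is why adding a regressor that explains little can lower the adjusted value even though the unadjusted value rises.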

F-tests:
Here we compare the computed F-value with the table value of F. Importance of F-tests: this test is undertaken in multiple regression analysis for the following reasons.
a) To test the overall significance of the explanatory variables, i.e. whether the explanatory variables X1, X2 actually influence the explained variable Yi or not.
b) Test of improvement: after introducing one additional variable into the model, to test whether the additional variable improves the explanatory power of the model or not.

Ex. $Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + U_i$

$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + U_i$
The first equation has only two explanatory variables, X1 & X2, but in the second we included X3. The addition of X3 may affect positively or negatively the relationship between Y & X1, X2.
c) To test the equality of coefficients obtained from different samples (Chow test). Suppose you have sample data on the agricultural output of Adet woreda from 1974 E.C. up to 1994 E.C. and you want to know whether agricultural output changed after the fall of the Derg (1983). By splitting the data into two periods you can compare the coefficients, undertake an F-test and see whether there is a change in agricultural output or not.
d) To test the stability of the coefficients of the variables as the sample size increases.
Ex. You may first take a 10-year sample for your study & estimate the coefficients. If you then increase the sample to 15 years, whether the coefficients remain stable or not can be tested using an F-test.
The F statistic can be calculated for a given estimated equation using the following formula.
$$F^* = \frac{\sum \hat{y}_i^2 / (k - 1)}{\sum e_i^2 / (N - k)} = \frac{R^2 / (k - 1)}{(1 - R^2) / (N - k)}$$
The table value of F can be read from the statistical table as
$$F_{v_1, v_2} \quad \text{or} \quad F_{(k-1),\,(N-k)}$$
where $v_1 = k - 1$ is the numerator degrees of freedom & $v_2 = N - k$ is the denominator degrees of freedom. To undertake the F-test we compare the calculated value with the table value.
1) If the calculated F value is less than the table value, i.e.
$$F^* = \frac{R^2 / (k - 1)}{(1 - R^2) / (N - k)} < F_{v_1, v_2}$$
where $F^*$ is the calculated F value and $F_{v_1, v_2}$ is the table value, we accept the null hypothesis $H_0: \beta_1 = \beta_2 = 0$ and reject the alternative hypothesis $H_1$: at least one $\beta_i \neq 0$. The overall F-test concerns the slope coefficients jointly; the intercept is not part of the hypothesis. The interpretation is
a) The slope coefficients are jointly not different from zero, so the estimates are insignificant.
b) The explanatory variables (X1, X2) do not influence the explained variable Yi.
2) If the calculated F value is greater than its table value, i.e.
$$F^* = \frac{R^2 / (k - 1)}{(1 - R^2) / (N - k)} > F_{v_1, v_2}$$
we reject the null hypothesis $H_0: \beta_1 = \beta_2 = 0$ & accept the alternative that at least one $\beta_i \neq 0$. The interpretation is just the opposite of a & b above.
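
The computation and the decision rule can be sketched as follows. This is a minimal illustration with hypothetical function names; the table value of F has to be looked up and supplied by the user, and the R² of 0.5 is an invented figure.

```python
def f_statistic(r2, n, k):
    """Overall F statistic from R^2; n = sample size,
    k = number of estimated parameters including the intercept."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))


def overall_significance(r2, n, k, f_table):
    """Return the calculated F and whether H0 (all slopes zero) is rejected."""
    f_calc = f_statistic(r2, n, k)
    return f_calc, f_calc > f_table


# Hypothetical R^2 = 0.5 with n = 15 and k = 3;
# 3.89 is the tabulated 5% value of F with (2, 12) degrees of freedom.
f_calc, reject = overall_significance(0.5, 15, 3, 3.89)
print(f_calc, reject)   # F is about 6.0, so H0 is rejected
```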

Example:
Suppose the quantity supplied of a commodity is assumed to be a linear function of the price of the commodity itself & the wage rate of the labor used in its production. The supply equation is given by
$$Q = \alpha + \beta_1 P_X + \beta_2 W + U$$
where Q is the quantity supplied, $P_X$ is the price of commodity X & W is the wage rate.
Using the following sample data
a) Estimate the parameters using OLS
b) Test the statistical significance of the individual coefficients ($\hat{\alpha}$, $\hat{\beta}_1$ & $\hat{\beta}_2$) at the 5% significance level
c) Test the overall significance of the regression (F-test)
d) Compute the price elasticity of supply

Y (Q)    20  35  30  47  60  68  76  90  100  105  130  140  125  120  135
X (Px)   10  15  21  26  40  37  42  33   30   38   60   65   50   35   42
Z (W)    12  10   9   8   5   7   4   5    7    5    3    4    3    1    2

From these data we obtain the following values, where lower-case letters denote deviations from the means.

Let Q = Y, Px = X & W = Z
$\bar{Y} = 85.4$, $\bar{X} = 36.27$, $\bar{Z} = 5.67$, $\sum xy = 7207.4$, $\sum zy = -1553$, $\sum xz = -514.667$, $\sum y^2 = 23211.6$,
$\sum x^2 = 3192.93$, $\sum z^2 = 135.33$, $\sum e^2 = 4019.71$, N = 15
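Answer
a) Estimate the parameters using OLS (intermediate figures are computed from the unrounded sums)

$$\hat{\beta}_1 = \frac{\sum xy \sum z^2 - \sum xz \sum zy}{\sum x^2 \sum z^2 - \left(\sum xz\right)^2} = \frac{(7207.4)(135.33) - (-514.667)(-1553)}{(3192.93)(135.33) - (-514.667)^2} \approx \frac{176{,}124}{167{,}229} \approx 1.053$$

$$\hat{\beta}_2 = \frac{\sum zy \sum x^2 - \sum xz \sum xy}{\sum x^2 \sum z^2 - \left(\sum xz\right)^2} = \frac{(-1553)(3192.93) - (-514.667)(7207.4)}{167{,}229} \approx \frac{-4{,}958{,}626 + 3{,}709{,}409}{167{,}229} = \frac{-1{,}249{,}217}{167{,}229} \approx -7.470$$

$$\hat{\alpha} = \bar{Y} - \hat{\beta}_1 \bar{X} - \hat{\beta}_2 \bar{Z} = 85.4 - (1.0532)(36.267) - (-7.4701)(5.667) = 85.4 - 38.196 + 42.331 \approx 89.535$$

$$\hat{Y} = \hat{\alpha} + \hat{\beta}_1 X + \hat{\beta}_2 Z$$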
$\hat{Y} = 89.535 + 1.053X - 7.470Z$. This equation is read as follows.
• $\hat{\alpha} = 89.535$ means that if the price of the commodity & the wage rate are both zero, the supplier will supply 89.535 units of the good. In many analyses, however, the constant term has no meaningful economic interpretation.
• $\hat{\beta}_1$ is the coefficient of the price of the commodity. The value 1.053 signifies that if the price of the commodity increases by one birr, with the wage rate held constant, quantity supplied will increase on average by 1.053 units.
• $\hat{\beta}_2$ is the coefficient of the wage rate. If the wage rate increases by 1 birr, with the price of the commodity held constant, quantity supplied will decrease on average by 7.470 units.
• $\hat{\beta}_1$ & $\hat{\beta}_2$ are the coefficients of the explanatory variables & they are marginal values: taking the first partial derivatives of the estimated equation gives them directly.
Ex. $\hat{Y}_i = 89.535 + 1.053P_X - 7.470Z$
$$\frac{\partial \hat{Y}_i}{\partial P_X} = 1.053 \quad \& \quad \frac{\partial \hat{Y}_i}{\partial Z} = -7.470$$
b) Test the statistical significance of ($\hat{\alpha}$, $\hat{\beta}_1$ & $\hat{\beta}_2$).

Standard error test

For this test we need the variances & the standard errors of the estimates ($\hat{\alpha}$, $\hat{\beta}_1$ & $\hat{\beta}_2$). The variances all depend on $\hat{\sigma}_u^2$ (see equations 4.16 to 4.18), where
$$\hat{\sigma}_u^2 = \frac{\sum e_i^2}{n - k}$$
In our example n = 15 is the sample size & k = 3 is the number of estimated parameters ($\hat{\alpha}$, $\hat{\beta}_1$ & $\hat{\beta}_2$).
$$\hat{\sigma}_u^2 = \frac{\sum e_i^2}{N - k} = \frac{4019.714}{15 - 3} = 334.98$$
Using equation 4.16 we can calculate $\operatorname{Var}(\hat{\alpha})$.
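$$\operatorname{Var}(\hat{\alpha}) = \left[\frac{1}{15} + \frac{(36.267)^2(135.333) + (5.667)^2(3192.933) - 2(36.267)(5.667)(-514.667)}{167{,}229}\right] 334.98$$
$$= \left[0.0667 + \frac{178{,}000 + 102{,}529 + 211{,}539}{167{,}229}\right] 334.98 = \left[0.0667 + 2.9425\right] 334.98 \approx 1008.0$$
$\operatorname{Var}(\hat{\beta}_1)$ can be calculated using equation 4.17:
$$\operatorname{Var}(\hat{\beta}_1) = 334.98 \times \frac{135.333}{167{,}229} \approx 0.271$$
$\operatorname{Var}(\hat{\beta}_2)$ can be calculated using equation 4.18:
$$\operatorname{Var}(\hat{\beta}_2) = 334.98 \times \frac{3192.933}{167{,}229} \approx 6.396$$
From the above values we can calculate $S.E.(\hat{\alpha})$, $S.E.(\hat{\beta}_1)$ & $S.E.(\hat{\beta}_2)$ as follows.
$S.E.(\hat{\alpha}) = \sqrt{\operatorname{Var}(\hat{\alpha})} = \sqrt{1008.0} \approx 31.75$
$S.E.(\hat{\beta}_1) = \sqrt{\operatorname{Var}(\hat{\beta}_1)} = \sqrt{0.271} \approx 0.521$
$S.E.(\hat{\beta}_2) = \sqrt{\operatorname{Var}(\hat{\beta}_2)} = \sqrt{6.396} \approx 2.529$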
Having calculated the S.E. of the estimates ($\hat{\alpha}$, $\hat{\beta}_1$ & $\hat{\beta}_2$) we can undertake the S.E. test as follows.
- If $S.E.(\hat{\alpha}) > |\hat{\alpha}|/2$ we accept the null hypothesis & reject the alternative.
$S.E.(\hat{\alpha}) = 31.75$ & $\hat{\alpha}/2 = 89.535/2 = 44.77$
$S.E.(\hat{\alpha}) = 31.75$ is less than $\hat{\alpha}/2 = 44.77$. Therefore we conclude that $\hat{\alpha}$ is statistically significant.
$S.E.(\hat{\beta}_1) = 0.521$ & $\hat{\beta}_1/2 = 1.053/2 = 0.527$
$S.E.(\hat{\beta}_1)$ is less than $\hat{\beta}_1/2$, though only narrowly, so by this test $\hat{\beta}_1$ is significant.
Lastly $S.E.(\hat{\beta}_2) = 2.529$ & $|\hat{\beta}_2|/2 = 7.470/2 = 3.735$. Again $S.E.(\hat{\beta}_2)$ is less than $|\hat{\beta}_2|/2$.
By the S.E. test all the estimates are statistically significant: we reject the null hypotheses $H_0: \alpha = 0$, $H_0: \beta_i = 0$ (which say that the parameters are equal to zero) & accept the alternatives $H_1: \alpha \neq 0$, $H_1: \beta_i \neq 0$ (the parameters are different from zero).
The economic interpretation of rejecting the null hypothesis and accepting the alternative is the following.
i) The estimates are statistically significant.
ii) The explanatory variables X & Z (the price of commodity X & the wage rate) influence the supply of the commodity (Y).
Student's t-test
In the t-test we compare the calculated t with the table value of t. The calculated t-values are obtained as follows.
$$t = \frac{\hat{\alpha}}{S.E.(\hat{\alpha})} = \frac{89.535}{31.75} = 2.82$$
$$t = \frac{\hat{\beta}_1}{S.E.(\hat{\beta}_1)} = \frac{1.053}{0.521} = 2.02$$
$$t = \frac{\hat{\beta}_2}{S.E.(\hat{\beta}_2)} = \frac{-7.470}{2.529} = -2.95$$

How to get the t-value from the t-table

We are given a 5% significance level, a sample of 15 observations & 3 estimated parameters ($\hat{\alpha}$, $\hat{\beta}_1$ & $\hat{\beta}_2$). If we take a two-tail test, the table value is
$$t_{\alpha/2,\,(N-K)}$$
where $\alpha$ is the significance level ($\alpha/2$ indicates a two-tail test), N is the sample size, K is the number of estimated parameters and N - K is the degrees of freedom.
$$t_{0.05/2,\,(15-3)} = t_{0.025,\,12}$$
In the t-table find 0.025 in the top row & 12 in the first column; the point where the two intersect gives the table value of t. For $t_{0.025,\,12}$ the table value is 2.179. Compare this table value with the absolute value of the computed t: if the computed value is greater than the table value we reject the null hypothesis & accept the alternative; if the computed value is less than the table value we accept the null hypothesis & reject the alternative.

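Compare computed with table value

The computed t-value of $\hat{\alpha}$ is 2.82 & the table value is 2.179, so the computed t-value is greater than the table value for $\hat{\alpha}$.
The computed t-value for $\hat{\beta}_2$ is -2.95; its absolute value, 2.95, is also greater than the table value.
The computed t-value for $\hat{\beta}_1$ is 2.02, which is less than the table value 2.179.
We therefore have the following interpretation.
i. $\hat{\alpha}$ & $\hat{\beta}_2$ are statistically significant at the 5% level, but $\hat{\beta}_1$ is not: it falls just short of the critical value.
ii. At the 5% level the quantity supplied is significantly influenced by the wage rate; the influence of the price of the commodity is only marginal.
The S.E. test judged $\hat{\beta}_1$ significant because that rule amounts to comparing |t| with 2, which is slightly less strict than the table value 2.179 for 12 degrees of freedom.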

Coefficient of determination ($R^2$)

$$R^2 = \frac{\hat{\beta}_1 \sum xy + \hat{\beta}_2 \sum zy}{\sum y^2} = \frac{(1.0532)(7207.4) + (-7.4701)(-1553)}{23211.6}$$
$$R^2 = \frac{7{,}590.8 + 11{,}601.1}{23{,}211.6} = \frac{19{,}191.9}{23{,}211.6} = 0.8268$$
This means that 82.68% of the variation in quantity supplied is explained by the price of the commodity & the wage rate.
Adjusted $R^2$
$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k} = 1 - (1 - 0.8268)\,\frac{15 - 1}{15 - 3} \approx 0.798$$
F-test
The overall significance of the explanatory variables can be tested using the F-test. Just as in the t-test, we have a computed & a table value of F. The calculated value $F^*$ is obtained from the following formula.
$$F^* = \frac{R^2 / (k - 1)}{(1 - R^2) / (N - k)} = \frac{0.8268 / (3 - 1)}{0.1732 / (15 - 3)} = \frac{0.4134}{0.01443} \approx 28.6$$
The table value of F is obtained by taking the numerator degrees of freedom (k - 1) & the denominator degrees of freedom (N - k), i.e. $F_{(k-1),\,(N-k)}$. At the 5% level of significance we need $F_{2,\,12}$, since k - 1 = 3 - 1 = 2 & N - k = 15 - 3 = 12. In the table we find the numerator degrees of freedom (k - 1) in the top row & the denominator degrees of freedom (N - k) in the first column; the intersection of these gives the table value. In our example $F_{2,\,12}$ at the 5% level is 3.89. If the calculated value of F is greater than the table value, we reject the null hypothesis & accept the alternative. In our example the calculated F value, 28.6, is greater than the table value 3.89. The economic interpretation of this is
i) The slope coefficients are jointly significant, i.e. $\beta_1$ & $\beta_2$ are not both zero
ii) Quantity supplied is affected by the price of the commodity & the wage rate taken together
Presentation of regression results: Different books use different presentation methods, but the most common is the one shown below for our example.

$\hat{Y} = 89.535 + 1.053P_X - 7.470Z$

S.E. = (31.75) (0.521) (2.529)
t = (2.82) (2.02) (-2.95)
$R^2$ = 0.8268
$\bar{R}^2$ = 0.798
F = 28.6
N = 15
$\sum e_i^2$ = 4019.714
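
As a cross-check, the whole worked example can be recomputed from the raw data in the table. The following Python sketch is an addition for verification only (the helper name `dev_sum` and the variable names are hypothetical); it applies equations 4.12 and 4.14 to 4.18 together with the R² and F formulas. Hand calculations that round the intermediate sums may differ from its output in the last digit.

```python
import math

# Raw data of the supply example.
Y = [20, 35, 30, 47, 60, 68, 76, 90, 100, 105, 130, 140, 125, 120, 135]  # quantity
X = [10, 15, 21, 26, 40, 37, 42, 33, 30, 38, 60, 65, 50, 35, 42]         # price
Z = [12, 10, 9, 8, 5, 7, 4, 5, 7, 5, 3, 4, 3, 1, 2]                      # wage rate

n, k = len(Y), 3
my, mx, mz = sum(Y) / n, sum(X) / n, sum(Z) / n


def dev_sum(p, mp, q, mq):
    """Sum of cross products of deviations from the means."""
    return sum((u - mp) * (v - mq) for u, v in zip(p, q))


sxx, szz, sxz = dev_sum(X, mx, X, mx), dev_sum(Z, mz, Z, mz), dev_sum(X, mx, Z, mz)
sxy, szy, syy = dev_sum(X, mx, Y, my), dev_sum(Z, mz, Y, my), dev_sum(Y, my, Y, my)

det = sxx * szz - sxz ** 2
b1 = (sxy * szz - sxz * szy) / det          # eq. 4.14
b2 = (szy * sxx - sxz * sxy) / det          # eq. 4.15
a = my - b1 * mx - b2 * mz                  # eq. 4.12

ess = b1 * sxy + b2 * szy                   # explained sum of squares
rss = syy - ess                             # residual sum of squares
r2 = ess / syy
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
f_calc = (r2 / (k - 1)) / ((1 - r2) / (n - k))

sigma2 = rss / (n - k)
se_a = math.sqrt(sigma2 * (1 / n + (mx ** 2 * szz + mz ** 2 * sxx
                                    - 2 * mx * mz * sxz) / det))
se_b1 = math.sqrt(sigma2 * szz / det)
se_b2 = math.sqrt(sigma2 * sxx / det)

print("a, b1, b2      :", round(a, 3), round(b1, 3), round(b2, 3))
print("standard errors:", round(se_a, 3), round(se_b1, 3), round(se_b2, 3))
print("t-values       :", round(a / se_a, 2), round(b1 / se_b1, 2), round(b2 / se_b2, 2))
print("RSS, R2, adj R2:", round(rss, 3), round(r2, 4), round(adj_r2, 4))
print("F              :", round(f_calc, 2))
```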

4.3 Importance of the statistical Test of Significance

There is no general agreement among econometricians as to which of the two statistical criteria is more important: a high $R^2$ ($\bar{R}^2$) or low standard errors of the parameter estimates.
The choice would not be difficult if the model produced both a high $R^2$ and low standard errors. However, this is often not the case: in many applications we find a high $R^2$ together with high standard errors, i.e. insignificant estimates.

- In this event some econometricians tend to attach great importance to $R^2$ & accept the parameter estimates, despite the fact that some of them are statistically insignificant.
- Others suggest that acceptance or rejection of estimates which are not statistically significant depends on the aim of the model.
i) If the aim of the model is forecasting, the majority of econometricians attach importance to a high $R^2$.
ii) If the aim of the model is policy analysis of economic phenomena, great importance is attached to low standard errors.
A high $R^2$ has clear merit only when it is combined with significant estimates (low standard errors). When a high $R^2$ & low standard errors are not found together in a particular study, the researcher should be very careful in interpreting & accepting the results. Priority should always be given to the fulfillment of the economic a priori criteria (sign & magnitude of the estimates). Only when the economic criteria are satisfied should one proceed with the application of the different tests.

Exercise for chapter four

1) From the following data compute the regression of automobile expenditure on consumer expenditure & other travel expenditure (use a linear regression).

Automobile expenditure    212   158   180   253   175   429   437   419   318   355
Other travel expense       29    46    28    26    29    64   119    81    74    66
Consumer expenditure     2437  2476  2132  2256  2258  3566  4486  3602  3446  3736

a) Calculate the OLS coefficients for the regression of automobile expenditure on consumer expenditure & other travel expense.
b) Write the estimated equation.
c) Test the significance of the parameters using the standard error test & the t-test.
d) Construct the 95% confidence intervals for the parameters.
e) Calculate the unadjusted & adjusted $R^2$. Why does the difference arise?
f) Test the overall significance of the regression.
g) If the relationship is nonlinear, how would you interpret the results (coefficients)?
h) Calculate the partial correlation coefficients & interpret the results.

2) Given the following data, answer parts a to h of question one.

Year  1985  1986  1987  1988  1989  1990  1991  1992  1993  1994  1995  1996  1997  1998  1999
Y       40    45    50    55    60    70    65    65    75    75    80   100    90    95    85
X1       9     8     9     8     7     6     6     8     5     5     5     3     4     3     4
X2     400   500   600   700   800   900  1000  1100  1200  1300  1400  1500  1600  1700  1800

Y = quantity demanded, X1 = the price of the commodity & X2 = consumers' income

Assume that income is the dependent variable & the remaining variables are the independent variables.

Chapter 5: Relaxing the assumptions of the classical model

In the previous chapters, to estimate the coefficients in the two-variable & multiple-variable cases, we relied on two basic sets of assumptions, one about the random term Ui & the other about the X's (independent variables). These are:

• The first set of assumptions, about the random term Ui

a) Ui is a random term. We introduce this random term to capture variables omitted from the model, misspecification of the model, errors committed in the measurement of variables & the erratic behavior of human beings.
Though we incorporate Ui to handle these problems, there is no formal test of its randomness. The Ui's are not observable & their estimates, the e's, are obtained under the assumption of randomness.
b) Assumption of zero mean of Ui. We have said that the expected (average) value of Ui is equal to zero, E(Ui) = 0. In reality it is impossible to prove or disprove this, but the zero-mean assumption is necessary so that the rules of algebra can be applied to stochastic relationships. Otherwise it would be impossible to estimate with the common rules of mathematics.
If E(Ui) > 0 or E(Ui) < 0 the estimated line would be biased. This is the basic reason why the assumption E(Ui) = 0 is forced upon us if we are to establish the true relationship.
c) The assumption of homoscedasticity: we assume that the distribution of the random term Ui remains the same over all observations of the explanatory variables; in particular its variance is constant, Var(Ui) = $\sigma^2$. This assumption is known as homoscedasticity. If it does not hold, i.e. if Var(Ui) = $\sigma_{ui}^2$ is not constant, we have heteroscedasticity.
d) Assumption of no autocorrelation (no serial correlation).
We assume that cov(Ui, Uj) = 0 for i ≠ j, i.e. the covariance between successive random terms is zero, or successive random terms do not influence each other: Ut (the current random term) is not affected by Ut-1 (the previous period's random term). But there is a chance that the Ui's influence each other: if Ut is influenced by Ut-1, then cov(Ui, Uj) ≠ 0 and there is autocorrelation, i.e. the Ui's are serially correlated.

• The second assumption, about the X's (explanatory variables)

In estimating the model by OLS we assume that the explanatory variables are not perfectly correlated with each other; we call this the absence of multicollinearity. But in reality the X's may be correlated with each other & the problem of multicollinearity may be observed in the model. Now the question is: if we violate these assumptions about Ui & the X's, what will be the consequence for the reliability of the estimated coefficients? Will the estimates still possess the desirable properties of OLS estimates? The short answer is that the estimates will no longer satisfy all the desirable properties of OLS.

5.1 Violation of the important assumptions

5.1.1 Heteroscedasticity
If the probability distribution of Ui remains the same over all values of the explanatory variables, the assumption is called homoscedasticity, i.e. Var(Ui) = $\sigma_u^2$, a constant variance. In this case the variation of Ui around the regression line remains constant. But if the spread of Ui is not constant we say that the Ui's are heteroscedastic (non-constant variance): Var(Ui) = $\sigma_{ui}^2$, where the subscript i signifies that the individual variances may differ.
The assumption of homoscedasticity states that the variation of each random term Ui around its zero mean is constant and does not change as the explanatory variables change; whether the sample size increases, decreases or stays the same, the variance of Ui is unaffected.
$$\operatorname{Var}(U_i) = \sigma_u^2 \neq f(X_i) \tag{5.1}$$
This says that the variation of the random term around its mean does not depend upon the explanatory variable Xi. Such a constant variance is called homoscedastic.
But the dispersion of the random term around the regression line may not be constant, i.e. the variance of the random term Ui may be a function of the explanatory variables, Var(Ui) = $\sigma_{ui}^2$ = f(Xi), where the subscript i again signifies that the individual variances may all be different. This is called heteroscedasticity or non-constant variance. Heteroscedasticity shows up as an increasing or decreasing dispersion of the random term around the regression line, as in Fig. (b), (c) & (d).

[Figure: four panels, Fig. (a) to Fig. (d), each plotting Y (vertical axis) against X (horizontal axis).]

In Fig. (a) the random term Ui is dispersed with a constant variance around the regression line.
In Fig. (b), as the value of X increases the variance of the random term Ui also increases.
Ex1.
$$S_t = \alpha + \beta Y_t + U_t \tag{5.2}$$
where St is saving, Yt is income, $\alpha$ & $\beta$ are parameters & Ut is the random term.
Suppose you want to estimate this saving function from cross-sectional data. Among lower-income families the variation in saving is small, whereas there is greater variation in the saving behavior of high-income families. Thus the variance of the random term is low at low incomes and tends to increase as income increases.
Ex2. Suppose we study consumption expenditure from a cross-sectional sample of family budgets.
$$C_t = \alpha + \beta Y_t + U_t \tag{5.3}$$
where Ct = consumption expenditure
Yt = disposable income of the household
Again, at low income levels consumption expenditure is almost equal across households, i.e. there is little variation, but at higher income levels consumption varies more. The variation of the random term may therefore increase with income, which is heteroscedasticity. In Fig. (c), as the value of X increases the variation of the random term Ui decreases. For example, with learning by doing the errors people commit decrease over time, so the variation of Ui decreases as Xi increases. Here again we have heteroscedasticity. In Fig. (d) we may have a more complicated pattern of heteroscedasticity, i.e. a high variation of Ui at lower values of Xi & a higher variation of Ui at higher values of Xi.
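
The pattern of Fig. (b) can be imitated with a small simulation. The sketch below is an illustration only, with invented numbers: it assumes 400 hypothetical households whose disturbance has a standard deviation equal to 5% of income, and compares the spread of the disturbances in the lower-income and higher-income halves of the sample.

```python
import random
import statistics

random.seed(0)  # fixed seed so that the run is repeatable

# Hypothetical cross-section: 400 households with incomes 100, 200, ..., 40000.
incomes = [100 * i for i in range(1, 401)]

# Disturbance whose standard deviation is proportional to income (5% of income),
# i.e. Var(U_i) = f(X_i): the heteroscedastic case of Fig. (b).
errors = [random.gauss(0, 0.05 * inc) for inc in incomes]

low = errors[:200]    # lower-income half of the sample
high = errors[200:]   # higher-income half of the sample
var_low = statistics.pvariance(low)
var_high = statistics.pvariance(high)
print(var_low, var_high, var_high / var_low)
```

Under homoscedasticity the two variances would be of similar size; here the variance in the higher-income half is several times larger, which is the fanning-out pattern described for the saving and consumption examples.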

5.1.1.1 Consequences of heteroscedasticity

If the assumption of homoscedasticity is violated, it will have the following consequences.
1) The usual formulas for the variances of the OLS coefficients will be incorrect. Under the assumption of homoscedasticity we have
$$\operatorname{Var}(\hat{\alpha}) = \sigma_u^2 \, \frac{\sum X_i^2}{n \sum x_i^2}$$
$$\operatorname{Var}(\hat{\beta}) = \sigma_u^2 \, \frac{1}{\sum x_i^2}$$
Since $\sigma^2$ was assumed to be constant we took it out of the summation in equations 2.47 and 2.48. But under heteroscedasticity $\sigma^2$ is not constant & we have
$$\operatorname{Var}(\hat{\alpha}) = \sum \left(\frac{1}{n} - \bar{X} k_i\right)^2 \sigma_{ui}^2$$
&
$$\operatorname{Var}(\hat{\beta}) = \sum k_i^2 \sigma_{ui}^2, \qquad k_i = \frac{x_i}{\sum x_i^2}$$
Here $\sigma_{ui}^2$ is not a constant but varies as X changes. To calculate $\operatorname{Var}(\hat{\alpha})$ and $\operatorname{Var}(\hat{\beta})$ we would need to know $\sigma_{ui}^2$ for i = 1, 2, ..., n. The problem is that, since the Ui are unobservable, we have to estimate them from the sample data using the residuals $e_i = Y_i - \hat{Y}_i$ as proxies for the unobservable errors. We would then have to estimate n variances from n observations, i.e. one variance for each observation, a situation in which estimation is impossible.

2) OLS estimators are inefficient. If the random term Ui is heteroscedastic, the OLS estimates do not have the minimum variance in the class of unbiased estimators. Therefore they are not efficient in either small or large samples.
Under heteroscedasticity, with the OLS weights $k_i = x_i / \sum x_i^2$,
$\mathrm{var}(\hat{\beta}) = \sum k_i^2 \sigma_{ui}^2 = \sum k_i^2 E(U_i^2) = \sum \left(\dfrac{x_i}{\sum x_i^2}\right)^2 E(U_i^2)$
$\mathrm{var}(\hat{\beta}) = \dfrac{\sum x_i^2 \sigma_{ui}^2}{(\sum x_i^2)^2}$ under heteroscedasticity.
The variance of $\hat{\beta}$ under homoscedasticity is
$\mathrm{var}(\hat{\beta}) = \dfrac{\sigma_u^2}{\sum x_i^2}$, since the constant $\sigma_u^2$ can be taken outside the summation.
Now suppose that under heteroscedasticity the variance of Ui is proportional to a weight $K_i$ (increasing, decreasing or a combination of both; whatever its relation with X, it is captured by $K_i$). Then
$\sigma_{ui}^2 = K_i \sigma_u^2$ (5.4)

That is, the heteroscedastic variance is a multiple $K_i$ of the homoscedastic variance, where $K_i$ is a non-stochastic constant weight (not to be confused with the OLS weight $k_i$ above).
Substituting $\sigma_{ui}^2 = K_i \sigma_u^2$ in the variance of $\hat{\beta}$ under heteroscedasticity gives
$\mathrm{var}(\hat{\beta}) = \dfrac{\sum x_i^2 \sigma_{ui}^2}{(\sum x_i^2)^2} = \dfrac{\sum x_i^2 (K_i \sigma_u^2)}{(\sum x_i^2)^2} = \sigma_u^2 \dfrac{\sum K_i x_i^2}{(\sum x_i^2)^2}$
$\mathrm{var}(\hat{\beta}) = \dfrac{\sigma_u^2}{\sum x_i^2} \cdot \dfrac{\sum K_i x_i^2}{\sum x_i^2}$
This expression has two components: $\sigma_u^2 / \sum x_i^2$ is the variance of $\hat{\beta}$ under the homoscedastic assumption, and $\sum K_i x_i^2 / \sum x_i^2$ is a factor that reflects the heteroscedasticity. If $x_i^2$ and $K_i$ are positively correlated, this factor is greater than 1.

a) The variance of β̂ under heteroscedasticity will then be greater than the variance given by the homoscedastic formula. Consequently the usual formula underestimates the true standard error of β̂, and the associated t-value is overestimated.
i. This may lead to the conclusion that β̂ is statistically significant in the case at hand when in fact it is not.
ii. With the variance of β̂ affected by heteroscedasticity, the usual confidence limits and tests of significance are no longer applicable.
iii. If we proceed with our model under the false assumption of homogeneous variances, our inferences and predictions about the population coefficients will be incorrect.
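
The size of the heteroscedasticity factor can be checked numerically. The following Python sketch rests on illustrative assumptions that are not in the text: ten equally spaced X values and weights K_i that rise with x_i² and are scaled to average one. It compares the variance of β̂ given by the homoscedastic formula with the variance implied by a disturbance variance of K_i σ²u. The function name beta_variances is hypothetical.

```python
def beta_variances(X, K, sigma2=1.0):
    """Return (homoscedastic-formula variance, variance under
    sigma_i^2 = K_i * sigma2, ratio of the two) for the OLS slope."""
    n = len(X)
    mean_x = sum(X) / n
    x = [xi - mean_x for xi in X]            # deviations from the mean
    sxx = sum(xi ** 2 for xi in x)
    var_homo = sigma2 / sxx
    var_hetero = sigma2 * sum(k * xi ** 2 for k, xi in zip(K, x)) / sxx ** 2
    return var_homo, var_hetero, var_hetero / var_homo


# Hypothetical data: X = 1..10, weights proportional to x_i^2, mean weight = 1.
X = [float(v) for v in range(1, 11)]
mean_x = sum(X) / len(X)
x2 = [(xi - mean_x) ** 2 for xi in X]
K = [v / (sum(x2) / len(x2)) for v in x2]

var_homo, var_hetero, ratio = beta_variances(X, K)
print(f"homoscedastic formula : {var_homo:.5f}")
print(f"heteroscedastic value : {var_hetero:.5f}")
print(f"factor sum(K x^2)/sum(x^2): {ratio:.4f}")
```

With weights that rise with x_i² the factor exceeds one, so the homoscedastic formula understates the variance; with all K_i equal to one the two formulas coincide.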

5.1.1.2 How to detect heteroscedasticity

Heteroscedasticity has a serious effect on OLS estimates. How, then, does one establish whether it is present in a model? Various tests have been suggested for detecting heteroscedasticity, and some of them are described below.

• Test 1. The Spearman rank-correlation test

This is the simplest, approximate test for detecting heteroscedasticity. It can be applied to either small or large samples and may be outlined as follows.
i. Regress Y on X:
$Y_i = \alpha + \beta X_i + U_i$
and obtain the residuals $e_i = Y_i - \hat{Y}_i$, which are estimates of the U's.
ii. Ignoring the sign of the residuals (i.e. taking the absolute value |ei|), rank both |ei| and Xi in ascending or descending order and compute the rank correlation coefficient
$r_{e,x} = 1 - \dfrac{6 \sum D_i^2}{n(n^2 - 1)}$ (5.5)
where Di is the difference between the ranks of corresponding pairs of X and |e|, and n is the number of observations in the sample.
A high rank correlation coefficient suggests the presence of heteroscedasticity. If there is more than one explanatory variable, we may compute the rank correlation coefficient between |ei| and each of the explanatory variables separately.
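
The ranking step and formula 5.5 can be sketched in a few lines of Python. The function names and the five-point data set below are hypothetical and only illustrate the arithmetic; ties between values are not handled in this sketch.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; ties are not handled."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rank_correlation(x, abs_resid):
    """Formula 5.5: r = 1 - 6*sum(D^2) / (n*(n^2 - 1))."""
    n = len(x)
    rank_x, rank_e = ranks(x), ranks(abs_resid)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_e))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical X values and absolute residuals |e|.
x_demo = [10, 20, 30, 40, 50]
abs_e_demo = [0.5, 1.1, 0.9, 2.0, 2.4]
r_demo = spearman_rank_correlation(x_demo, abs_e_demo)
print(round(r_demo, 3))
```

Here only two neighbouring ranks are interchanged, so the coefficient is close to one, which is the pattern the test reads as a sign of heteroscedasticity.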

• Test 2. The Goldfeld-Quandt test

This test is applicable to large samples: the number of observations must be at least twice the number of explanatory variables. For example, in
$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + U_i$
there are 3 explanatory variables (X1, X2, X3), so the sample size must be at least 6. In addition to this requirement on the sample size, the test assumes normality and no autocorrelation. The steps are as follows.
a) Formulate the hypotheses.
The null hypothesis H0: the U's are homoscedastic.
The alternative hypothesis H1: the U's are not homoscedastic (the U's are heteroscedastic).
b) Order the observations according to the magnitude of the explanatory variable Xi.
c) Omit a certain number c of central observations, which leaves n − c observations. Divide the remaining observations into two sub-samples of (n − c)/2 observations each: one contains the small values of X and the other the large values of X.
d) Fit a separate regression to each sub-sample by OLS and obtain the sum of squared residuals from each of them.
e) Let $\sum e_1^2$ be the sum of squared residuals from the sub-sample of low values of X and $\sum e_2^2$ that from the sub-sample of large values of X. Then calculate the F statistic as the ratio of the residual variances.
$F^* = \dfrac{\sum e_2^2 \Big/ \left[\frac{n-c}{2} - k\right]}{\sum e_1^2 \Big/ \left[\frac{n-c}{2} - k\right]} = \dfrac{\sum e_2^2}{\sum e_1^2}$ (5.6)
The numerator degrees of freedom are $v_1 = \frac{n-c}{2} - k$ and the denominator degrees of freedom are $v_2 = \frac{n-c}{2} - k$, where k is the number of estimated parameters, so that v1 = v2. Using v1 and v2 we find the table value $F_{v_1,v_2}$ and compare it with the calculated value.
1) If $\sum e_2^2 / \sum e_1^2 > F_{v_1,v_2}$, i.e. the calculated value is greater than the table value of F, then our decision is to
a. reject the null hypothesis H0 that the Ui's are homoscedastic;
b. accept the alternative H1 that the Ui's are heteroscedastic, i.e. the variance of the Ui's is not constant.
2) If $\sum e_2^2 / \sum e_1^2 < F_{v_1,v_2}$, the above decision is reversed.
Ex. Consider the following hypothetical data on consumption expenditure Y and income X. The left pair of columns gives the data as observed and the right pair gives the data ranked in ascending order of X.

        Observed data     Ranked by X
No.     Y      X          Y      X
1       55     80         55     80
2       65     100        70     85
3       70     85         75     90
4       80     110        65     100
5       79     120        74     105
6       84     115        80     110
7       98     130        84     115
8       95     140        79     120
9       90     125        90     125
10      75     90         98     130
11      74     105        95     140
12      110    160        108    145
13      113    150        113    150
14      125    165        110    160
15      108    145        125    165
16      115    180        115    180
17      140    225        130    185
18      120    240        135    190
19      145    185        120    200
20      130    220        140    205
21      152    210        144    210
22      144    245        152    220
23      175    260        140    225
24      180    190        137    230
25      135    205        145    240
26      140    265        175    245
27      178    270        189    250
28      191    230        180    260
29      137    250        178    265
30      189    275        191    270

In the ranked data, rows 1 to 13 are the lower values of X, rows 14 to 17 are the c = 4 middle observations that will be omitted, and rows 18 to 30 are the higher values of X.

Take the first 13 observations, i.e. those with the lower values of X, and run the regression

$Y_i = \alpha + \beta X_i + U_i$
This gives
$\hat{Y}_i = 3.4094 + 0.6968 X_i$
s.e. (8.7049) (0.0744)
$r^2 = 0.8887$, $\sum e_1^2 = 377.17$
Running the regression of Y on X for the last 13 observations gives

$\hat{Y}_i = -28.0272 + 0.7941 X_i$
s.e. (30.6421) (0.1319)
$r^2 = 0.7681$, $\sum e_2^2 = 1536.8$
The computed value of F is
$F^* = \dfrac{\sum e_2^2}{\sum e_1^2} = \dfrac{1536.8}{377.17} = 4.07$
The degrees of freedom for the table value $F_{v_1,v_2}$ are
$v_1 = v_2 = \dfrac{n-c}{2} - k = \dfrac{30-4}{2} - 2 = 11$

The table value of $F_{11,11}$ at the 5% level is 2.82. If the computed value is greater than the table value, we reject the null hypothesis of homoscedasticity and accept the alternative of heteroscedasticity. Here the computed value 4.07 is greater than the table value 2.82, so we conclude that there is heteroscedasticity.
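
The worked example can be reproduced with a short Python sketch using only the standard library. The helper names simple_ols and goldfeld_quandt are illustrative and not part of the text; the data are the ranked observations of the table above, and the critical value still has to be read from an F table.

```python
def simple_ols(x, y):
    """Return (alpha, beta, rss) for y = alpha + beta*x fitted by OLS."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = my - beta * mx
    rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    return alpha, beta, rss


def goldfeld_quandt(x, y, c, k=2):
    """Order by x, drop c central observations, return (F*, df)."""
    pairs = sorted(zip(x, y))                 # step b: order by X
    m = (len(pairs) - c) // 2                 # size of each sub-sample
    low, high = pairs[:m], pairs[-m:]         # step c
    rss1 = simple_ols(*zip(*low))[2]          # step d, small X
    rss2 = simple_ols(*zip(*high))[2]         # step d, large X
    df = m - k
    return (rss2 / df) / (rss1 / df), df      # step e


X = [80, 85, 90, 100, 105, 110, 115, 120, 125, 130, 140, 145, 150,
     160, 165, 180, 185,
     190, 200, 205, 210, 220, 225, 230, 240, 245, 250, 260, 265, 270]
Y = [55, 70, 75, 65, 74, 80, 84, 79, 90, 98, 95, 108, 113,
     110, 125, 115, 130,
     135, 120, 140, 144, 152, 140, 137, 145, 175, 189, 180, 178, 191]

f_star, df = goldfeld_quandt(X, Y, c=4)
print(f"F* = {f_star:.2f} with ({df}, {df}) degrees of freedom")
```

Because both sub-samples have the same degrees of freedom, the ratio of residual variances reduces to the ratio of the two sums of squared residuals, as in equation 5.6.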

• Test 3. The Glejser test for heteroscedasticity

This test can be outlined as follows.
a) Regress Y on the explanatory variables (X's) and compute the residuals ei:
$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + U_i$ (5.7)
Estimating this gives
$\hat{Y}_i = \hat{\alpha} + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \hat{\beta}_3 X_{3i}$ (5.8)
from which
$e_i = Y_i - \hat{Y}_i$ (5.9)
b) Regress |ei| on each explanatory variable separately, trying different powers of X, for example
$|e_i| = \alpha_0 + \beta_1 X_i^2 + v_i$
$|e_i| = \alpha_0 + \beta_1 X_i + v_i$
$|e_i| = \alpha_0 + \beta_1 \dfrac{1}{X_i} + v_i$ (5.10)
$|e_i| = \alpha_0 + \beta_1 \dfrac{1}{\sqrt{X_i}} + v_i$, etc.

Here it is assumed that vi satisfies all the assumptions of OLS. We then proceed as follows.
i. Choose the form of regression that gives the best fit in light of R² and the standard errors of the coefficients $\hat{\alpha}_0$ and $\hat{\beta}_1$.
ii. Formulate the hypotheses.
The null hypothesis H0: the Ui's are homoscedastic.
The alternative hypothesis H1: the Ui's are heteroscedastic.
Using the standard error test we accept or reject the null hypothesis as follows.
• If $S.E.(\hat{\alpha}_0) < \hat{\alpha}_0/2$ and $S.E.(\hat{\beta}_1) < \hat{\beta}_1/2$, we reject the null hypothesis of homoscedasticity and accept the alternative that there is heteroscedasticity. This case, in which both $\hat{\alpha}_0$ and $\hat{\beta}_1$ are significant, is called mixed heteroscedasticity.
• If $S.E.(\hat{\alpha}_0) > \hat{\alpha}_0/2$ and $S.E.(\hat{\beta}_1) < \hat{\beta}_1/2$, i.e. $\hat{\alpha}_0$ is not significant but $\hat{\beta}_1$ is, we have what is called pure heteroscedasticity.
• In addition to the standard error test we can use the F and t tests of significance for the coefficients. If, on the basis of the t and F tests, $\hat{\alpha}_0$ and $\hat{\beta}_1$ are significantly different from zero, we reject the null hypothesis, accept the alternative and conclude that there is heteroscedasticity.
One advantage of the Glejser test is that it indicates the form of the heteroscedasticity, i.e. the relationship between $\sigma_{ui}^2$ and f(X) (increasing, decreasing, etc.). This is very useful in correcting or removing heteroscedasticity.
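
The two steps of the Glejser test can be sketched in Python as follows. This is a minimal illustration for a single positive explanatory variable; the function names, the four candidate forms and the artificial data set (errors whose size grows with X) are assumptions made here for the example and are not taken from the text.

```python
import math


def fit(z, y):
    """OLS of y on a constant and z: (intercept, slope, se_slope, r2)."""
    n = len(z)
    mz, my = sum(z) / n, sum(y) / n
    szz = sum((a - mz) ** 2 for a in z)
    szy = sum((a - mz) * (b - my) for a, b in zip(z, y))
    slope = szy / szz
    intercept = my - slope * mz
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(z, y))
    tss = sum((b - my) ** 2 for b in y)
    se_slope = math.sqrt(rss / (n - 2) / szz)
    return intercept, slope, se_slope, 1 - rss / tss


def glejser(x, y):
    """Step a: residuals of y on x. Step b: regress |e| on forms of x."""
    a, b, _, _ = fit(x, y)
    abs_e = [abs(yi - a - b * xi) for xi, yi in zip(x, y)]
    forms = {
        "X": lambda v: v,
        "X^2": lambda v: v ** 2,
        "1/X": lambda v: 1 / v,
        "1/sqrt(X)": lambda v: 1 / math.sqrt(v),
    }
    out = {}
    for name, f in forms.items():
        _, b1, se_b1, r2 = fit([f(v) for v in x], abs_e)
        out[name] = {"beta1": b1, "t": b1 / se_b1, "r2": r2}
    return out


# Artificial data: the error alternates in sign and grows with X.
x_demo = [float(i) for i in range(1, 21)]
y_demo = [2 + 3 * xi + 0.5 * xi * (-1) ** int(xi) for xi in x_demo]

results = glejser(x_demo, y_demo)
for name, res in results.items():
    print(f"{name:10s} beta1={res['beta1']:8.4f} t={res['t']:7.2f} R2={res['r2']:.3f}")
```

The form with the best fit and a significant slope suggests how the disturbance variance is related to X, which is the information needed to choose a transformation.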

5.1.1.3 Solutions for heteroscedasticity

When heteroscedasticity is found in the model under consideration, the appropriate solution is as follows.
a) Transform the original model so as to obtain a homoscedastic disturbance term, i.e. one with constant variance. Then apply OLS to the transformed model.
b) The transformation of the model changes the original values of the variables.
c) The transformation depends on the nature of the heteroscedasticity, i.e. on the functional relationship between the variance of the disturbance term $\sigma_{ui}^2$ and the values of the explanatory variables (X's):

$\sigma_{ui}^2 = f(X_i)$ (5.11)
d) The transformation consists in dividing the original relationship through by the square root of the term that is responsible for the heteroscedasticity. Suppose we have the following original model:

$Y_i = \alpha + \beta X_i + U_i$ (5.12)
where Ui satisfies all the assumptions except that it is heteroscedastic:
$U_i \sim N(0, \sigma_{ui}^2)$ (5.13)
$E(U_i^2) = \sigma_{ui}^2 = f(X_i)$
When there is heteroscedasticity there is a functional relationship between $\sigma_{ui}^2$ and Xi. The problem is to know the exact form of this relationship. In practice the actual form cannot be observed; we can only assume plausible types of heteroscedastic structure.
Case (a) Suppose the heteroscedasticity is of the form

$E(U_i^2) = \sigma_{ui}^2 = K^2 X_i^2$ (5.14)
where $K^2$ is a finite constant to be estimated from the model. That is, the variance of the random term increases in proportion to the square of the explanatory variable, with factor of proportionality $K^2$. Solving for $K^2$:
$K^2 = \dfrac{\sigma_{ui}^2}{X_i^2}$ (5.15)
$K = \dfrac{\sigma_{ui}}{X_i}$ (5.16)
This suggests that the appropriate transformation is to divide the original model by $\sqrt{X_i^2} = X_i$:
$\dfrac{Y_i}{\sqrt{X_i^2}} = \dfrac{\alpha}{\sqrt{X_i^2}} + \dfrac{\beta X_i}{\sqrt{X_i^2}} + \dfrac{U_i}{\sqrt{X_i^2}}$ (5.17)
$\dfrac{Y_i}{X_i} = \dfrac{\alpha}{X_i} + \dfrac{\beta X_i}{X_i} + \dfrac{U_i}{X_i}$ (5.18)
The new random term $U_i/X_i$ of the transformed model is homoscedastic (has constant variance). To prove this:
$\mathrm{Var}\left(\dfrac{U_i}{X_i}\right) = E\left(\dfrac{U_i}{X_i}\right)^2 = \dfrac{1}{X_i^2} E(U_i^2)$ (5.19)
Since $E(U_i^2) = \sigma_{ui}^2$, substituting in equation 5.19 gives
$\mathrm{Var}\left(\dfrac{U_i}{X_i}\right) = \dfrac{1}{X_i^2} \sigma_{ui}^2$ (5.20)
The form of heteroscedasticity assumed is that of equation 5.14; substituting it in equation 5.20 gives
$\mathrm{Var}\left(\dfrac{U_i}{X_i}\right) = \dfrac{1}{X_i^2} (K^2 X_i^2) = K^2$
which is a constant. This proves that the new random term has a finite constant variance $K^2$, so we can apply OLS to the transformed model of equation 5.18:

$\dfrac{Y_i}{X_i} = \alpha \dfrac{1}{X_i} + \beta + \dfrac{U_i}{X_i}$ (5.21)
In the transformed model the positions of the coefficients have changed: the constant term α of the original model (equation 5.12) is now the coefficient of $1/X_i$, and β, which was the coefficient of Xi in the original model, is now the constant term of the transformed model (5.21). To return to the original model, multiply the transformed model (equation 5.21) by Xi.
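
The exchange of roles between α and β in the transformed model can be seen in a small Python sketch. The function names and the six data points are hypothetical; the errors are constructed to grow in proportion to X, as case (a) assumes.

```python
def simple_ols(z, y):
    """Return (intercept, slope) of the OLS regression of y on z."""
    n = len(z)
    mz, my = sum(z) / n, sum(y) / n
    slope = (sum((a - mz) * (b - my) for a, b in zip(z, y))
             / sum((a - mz) ** 2 for a in z))
    return my - slope * mz, slope


def fit_case_a(X, Y):
    """Estimate Y = alpha + beta*X + U when var(U_i) = K^2 * X_i^2.

    Regress Y/X on 1/X: the slope estimates alpha and the
    intercept estimates beta (equation 5.21)."""
    y_star = [y / x for x, y in zip(X, Y)]
    z = [1 / x for x in X]
    intercept, slope = simple_ols(z, y_star)
    return slope, intercept          # (alpha_hat, beta_hat)


# Hypothetical data with errors proportional to X.
X_demo = [2.0, 4.0, 5.0, 8.0, 10.0, 12.0]
U_demo = [0.10, -0.20, 0.15, -0.10, 0.20, -0.15]
Y_demo = [3 + 0.5 * x + x * u for x, u in zip(X_demo, U_demo)]

alpha_hat, beta_hat = fit_case_a(X_demo, Y_demo)
print(f"alpha_hat = {alpha_hat:.3f}, beta_hat = {beta_hat:.3f}")
```

Reading the slope as α̂ and the intercept as β̂ is what "multiplying the transformed model by Xi" amounts to in practice.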
Case (b) Suppose the form of heteroscedasticity is

$E(U_i^2) = \sigma_{ui}^2 = K^2 X_i$ (5.22)

That is, the variance of the random term Ui increases or decreases in proportion to Xi, with factor of proportionality $K^2$. From equation 5.22 we get
$K^2 = \dfrac{\sigma_{ui}^2}{X_i}$
$K = \dfrac{\sigma_{ui}}{\sqrt{X_i}}$ (5.23)
This means that the original model (equation 5.12) is divided by $\sqrt{X_i}$:

$\dfrac{Y_i}{\sqrt{X_i}} = \dfrac{\alpha}{\sqrt{X_i}} + \beta \dfrac{X_i}{\sqrt{X_i}} + \dfrac{U_i}{\sqrt{X_i}}$ (5.24)
or
$\dfrac{Y_i}{\sqrt{X_i}} = \alpha \dfrac{1}{\sqrt{X_i}} + \beta \sqrt{X_i} + \dfrac{U_i}{\sqrt{X_i}}$ (5.25)
The disturbance term $U_i/\sqrt{X_i}$ of the transformed model is homoscedastic:
$\mathrm{Var}\left(\dfrac{U_i}{\sqrt{X_i}}\right) = E\left(\dfrac{U_i}{\sqrt{X_i}}\right)^2 = \dfrac{1}{X_i} E(U_i^2) = \dfrac{1}{X_i} \sigma_{ui}^2$ (5.26)
Substituting the assumed form of heteroscedasticity (equation 5.22) in equation 5.26 gives
$\mathrm{Var}\left(\dfrac{U_i}{\sqrt{X_i}}\right) = \dfrac{1}{X_i} \sigma_{ui}^2 = \dfrac{1}{X_i} (K^2 X_i) = K^2$ (5.27)
so the variance of the transformed random term is a constant equal to $K^2$, and we can apply OLS to equation 5.25. The transformed model has no intercept term, so α̂ and β̂ are estimated by a regression through the origin. To return to the original model, multiply the transformed model by $\sqrt{X_i}$.
Case (c) Suppose the heteroscedasticity is of the form

$E(U_i^2) = \sigma_{ui}^2 = K^2 [E(Y_i)]^2$ (5.28)

Here the variance of the disturbance term is assumed to be proportional to the square of the expected value of the dependent variable Y:

$\sigma_{ui}^2 = K^2 [E(Y_i)]^2 = K^2 (\alpha + \beta X_i)^2$
Solving for K:
$K^2 = \dfrac{\sigma_{ui}^2}{(\alpha + \beta X_i)^2}$
$K = \dfrac{\sigma_{ui}}{\alpha + \beta X_i}$ (5.29)
The required transformation of equation 5.12 is
$\dfrac{Y_i}{\alpha + \beta X_i} = \dfrac{\alpha}{\alpha + \beta X_i} + \dfrac{\beta X_i}{\alpha + \beta X_i} + \dfrac{U_i}{\alpha + \beta X_i}$ (5.30)

The new random term $U_i/(\alpha + \beta X_i)$ is homoscedastic:

$\mathrm{Var}\left(\dfrac{U_i}{\alpha + \beta X_i}\right) = E\left(\dfrac{U_i}{\alpha + \beta X_i}\right)^2 = \dfrac{1}{(\alpha + \beta X_i)^2} E(U_i^2) = \dfrac{1}{(\alpha + \beta X_i)^2} \sigma_{ui}^2$ (5.31)

Substituting equation 5.28 in equation 5.31:

$\mathrm{Var}\left(\dfrac{U_i}{\alpha + \beta X_i}\right) = \dfrac{1}{(\alpha + \beta X_i)^2} K^2 (\alpha + \beta X_i)^2 = K^2$
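
All three cases divide the model through by a weight w_i and then estimate α and β by a regression through the origin on 1/w_i and X_i/w_i. The Python sketch below illustrates this for cases (b) and (c). The function names and data are hypothetical. For case (c) the sketch makes a further assumption that the text does not state: since α and β are unknown, the fitted values from a first-stage OLS regression are used in place of α + βX_i.

```python
import math


def simple_ols(x, y):
    """Return (intercept, slope) of the OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def fit_through_origin(z1, z2, y):
    """Solve the normal equations of y = a*z1 + b*z2 (no intercept)."""
    s11 = sum(a * a for a in z1)
    s22 = sum(b * b for b in z2)
    s12 = sum(a * b for a, b in zip(z1, z2))
    s1y = sum(a * c for a, c in zip(z1, y))
    s2y = sum(b * c for b, c in zip(z2, y))
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s2y * s12) / det, (s11 * s2y - s12 * s1y) / det


def transformed_fit(X, Y, w):
    """Divide Y = alpha + beta*X + U by w and return (alpha_hat, beta_hat)."""
    z1 = [1 / wi for wi in w]
    z2 = [xi / wi for xi, wi in zip(X, w)]
    y_star = [yi / wi for yi, wi in zip(Y, w)]
    return fit_through_origin(z1, z2, y_star)


# Hypothetical data.
X_demo = [2.0, 4.0, 5.0, 8.0, 10.0, 12.0]
Y_demo = [4.1, 4.8, 5.7, 6.8, 8.3, 8.8]

# Case (b): weights sqrt(X_i), as in equation 5.25.
alpha_b, beta_b = transformed_fit(X_demo, Y_demo, [math.sqrt(x) for x in X_demo])

# Case (c): first-stage fitted values stand in for alpha + beta*X_i.
a0, b0 = simple_ols(X_demo, Y_demo)
fitted = [a0 + b0 * x for x in X_demo]
alpha_c, beta_c = transformed_fit(X_demo, Y_demo, fitted)

print(f"case (b): alpha = {alpha_b:.3f}, beta = {beta_b:.3f}")
print(f"case (c): alpha = {alpha_c:.3f}, beta = {beta_c:.3f}")
```

Setting the weights equal to X_i reproduces case (a), because the regressor X_i/w_i is then a column of ones and plays the role of the intercept.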

Let us illustrate with an example.

Suppose personal saving depends on personal disposable income, observed over a period of 31 years. The data are given in Table 5.1, and the model is
$S_t = \alpha + \beta Y_{dt} + U_t$ (5.32)
where S = personal saving and Yd = disposable income.
Table 5.1: Saving (S), disposable income (Yd) and OLS computations for the full sample. Lower-case letters denote deviations from the means, s = S − S̄ and yd = Yd − Ȳd; Ŝ = α̂ + β̂Yd is the fitted value and e = S − Ŝ the residual.
No.  S  Yd  s  yd  s²  yd²  yd·s  Ŝ  e  e²
1  264  8777  -986.32258  -13646.03226  972832.2318  186214196.4  13459389.75  95.06082  168.9392  28540.45
2  105  9210  -1145.32258  -13213.03226  1311763.812  174584221.5  15133184.2  131.7186  -26.7186  713.8836
3  90  9954  -1160.32258  -12469.03226  1346348.49  155476765.5  14468099.68  194.7056  -104.706  10963.27
4  131  10508  -1119.32258  -11915.03226  1252883.038  141967993.8  13336764.65  241.6073  -110.607  12233.97
5  122  10979  -1128.32258  -11444.03226  1273111.845  130965874.4  12912560.01  281.4821  -159.482  25434.55
6  107  11912  -1143.32258  -10511.03226  1307186.522  110481799.2  12017500.52  360.4699  -253.47  64247
7  406  12747  -844.32258  -9676.03226  712880.6191  93625600.3  8169692.522  431.161  -25.161  633.0769
8  503  13499  -747.32258  -8924.03226  558491.0386  79638351.78  6669130.813  494.8253  8.17466  66.82507
9  431  14269  -819.32258  -8154.03226  671289.4901  66488242.1  6680782.749  560.0135  -129.014  16644.49
10  588  15522  -662.32258  -6901.03226  438671.2  47624246.25  4570709.491  666.0925  -78.0925  6098.442
11  898  16730  -352.32258  -5693.03226  124131.2004  32410616.31  2005783.814  768.3618  129.6382  16806.06
12  950  17663  -300.32258  -4760.03226  90193.65206  22657907.12  1429545.169  847.3496  102.6504  10537.11
13  779  18575  -471.32258  -3848.03226  222144.9744  14807352.27  1813664.493  924.5595  -145.56  21187.57
14  819  19635  -431.32258  -2788.03226  186039.168  7773123.883  1202541.268  1014.299  -195.299  38141.74
15  1222  21163  -28.32258  -1260.03226  802.1685379  1587681.296  35687.36449  1143.66  78.34042  6137.221
16  1702  22880  451.67742  456.96774  204012.4917  208819.5154  206402.0098  1289.021  412.9792  170551.8
17  1578  24127  327.67742  1703.96774  107372.4916  2903506.059  558351.7528  1394.592  183.4082  33638.56
18  1654  25604  403.67742  3180.96774  162955.4594  10118555.76  1284084.85  1519.635  134.3654  18054.05
19  1400  26500  149.67742  4076.96774  22403.33006  16621665.95  610230.0127  1595.49  -195.49  38216.34
20  1829  27670  578.67742  5246.96774  334867.5564  27530670.46  3036301.755  1694.542  134.4578  18078.9
21  2200  28300  949.67742  5876.96774  901887.2021  34538749.82  5581223.561  1747.878  452.122  204414.3
22  2017  27430  766.67742  5006.96774  587794.2663  25069725.95  3838729.109  1674.224  342.7762  117495.5
23  2105  29560  854.67742  7136.96774  730473.4923  50936308.52  6099805.175  1854.55  250.4504  62725.4
24  1600  28150  349.67742  5726.96774  122274.2981  32798159.5  2002591.304  1735.179  -135.179  18273.36
25  2250  32100  999.67742  9676.96774  999354.9441  93643704.64  9673846.144  2069.586  180.414  32549.21
26  2420  32500  1169.67742  10076.96774  1368145.267  101545278.8  11786801.63  2103.45  316.55  100203.9
27  2570  35250  1319.67742  12826.96774  1741548.493  164531101.4  16927459.69  2336.265  233.735  54632.05
28  1720  33500  469.67742  11076.96774  220596.8789  122699214.3  5202601.63  2188.11  -468.11  219127
29  1900  36000  649.67742  13576.96774  422080.7501  184334053  8820649.373  2399.76  -499.76  249760.1
30  2100  36200  849.67742  13776.96774  721951.7181  189804840.1  11705978.4  2416.692  -316.692  100293.8
31  2300  38200  1049.67742  15776.96774  1101822.686  248912711.1  16560726.79  2586.012  -286.012  81802.86
Sum: S = 38760, Yd = 695114, s² = 20218310.77, yd² = 2572501037, yd·s = 217800819.7, e = -0.35124, e² = 1778203
Mean: S = 1250.32258, Yd = 22423.03226
Table 5.2: Lower values of disposable income (first 11 observations). Deviations are taken from the means of this sub-sample.
No.  S  Yd  s  yd  s²  yd²  yd·s  Ŝ  e  e²
1  264  8777  -67.363  -3414.54  4537.773769  11659083.41  230013.658  32.6414  231.3586  53526.8
2  105  9210  -226.363  -2981.54  51240.20777  8889580.772  674910.339  70.832  34.168  1167.452
3  90  9954  -241.363  -2237.54  58256.09777  5006585.252  540059.367  136.4528  -46.4528  2157.863
4  131  10508  -200.363  -1683.54  40145.33177  2834306.932  337319.125  185.3156  -54.3156  2950.184
5  122  10979  -209.363  -1212.54  43832.86577  1470253.252  253861.012  226.8578  -104.858  10995.16
6  107  11912  -224.363  -279.54  50338.75577  78142.6116  62718.43302  309.1484  -202.148  40863.98
7  406  12747  74.637  555.46  5570.681769  308535.8116  41457.86802  382.7954  23.2046  538.4535
8  503  13499  171.637  1307.46  29459.25977  1709451.652  224408.512  449.1218  53.8782  2902.86
9  431  14269  99.637  2077.46  9927.531769  4315840.052  206991.882  517.0358  -86.0358  7402.159
10  588  15522  256.637  3330.46  65862.54977  11091963.81  854719.263  627.5504  -39.5504  1564.234
11  898  16730  566.637  4538.46  321077.4898  20597619.17  2571659.359  734.096  163.904  26864.52
Sum: S = 3645, Yd = 134107, s = 0.007, yd = 0.06, s² = 680248.5455, yd² = 67961362.73, yd·s = 5998118.818, e = -26.8474, e² = 150933.7
Mean: S = 331.36364, Yd = 12191.5455
Table 5.3: Large values of disposable income (last 11 observations). Deviations are taken from the means of this sub-sample.
No.  S  Yd  s  yd  s²  yd²  yd·s  Ŝ  e  e²
1  1829  27670  -261.363  -4823.63  68310.61777  23267406.38  1260718.408  1936.483  -107.483  11552.6
2  1600  28150  -490.363  -4343.63  240455.8718  18867121.58  2129955.438  1951.795  -351.795  123759.7
3  2200  28300  109.637  -4193.63  12020.27177  17586532.58  -459777.0123  1956.58  243.42  59253.3
4  2105  29560  14.637  -2933.63  214.241769  8606184.977  -42939.54231  1996.774  108.226  11712.87
5  2250  32100  159.637  -393.63  25483.97177  154944.5769  -62837.91231  2077.8  172.2  29652.84
6  2420  32500  329.637  6.37  108660.5518  40.5769  2099.78769  2090.56  329.44  108530.7
7  1720  33500  -370.363  1006.37  137168.7518  1012780.577  -372722.2123  2122.46  -402.46  161974.1
8  2570  35250  479.637  2756.37  230051.6518  7597575.577  1322057.038  2178.285  391.715  153440.6
9  1900  36000  -190.363  3506.37  36238.07177  12294630.58  -667483.1123  2202.21  -302.21  91330.88
10  2100  36200  9.637  3706.37  92.871769  13737178.58  35718.28769  2208.59  -108.59  11791.79
11  2300  38200  209.637  5706.37  43947.67177  32562658.58  1196266.288  2272.39  27.61  762.3121
Sum: S = 22994, Yd = 357430, s = 0.007, yd = 0.07, s² = 902644.5455, yd² = 135687054.546, yd·s = 4341055.455, e = 0.073, e² = 763761.7
Mean: S = 2090.3636, Yd = 32493.6364
Table 5.4: The observations of Table 5.1 in ascending order of disposable income (Yd). The deviations, fitted values and residuals for the full sample are as given in Table 5.1.
No.  S  Yd
Small values of Yd (Table 5.2):
1  264  8777
2  105  9210
3  90  9954
4  131  10508
5  122  10979
6  107  11912
7  406  12747
8  503  13499
9  431  14269
10  588  15522
11  898  16730
Central observations (omitted):
12  950  17663
13  779  18575
14  819  19635
15  1222  21163
16  1702  22880
17  1578  24127
18  1654  25604
19  1400  26500
20  2017  27430
Large values of Yd (Table 5.3):
21  1829  27670
22  1600  28150
23  2200  28300
24  2105  29560
25  2250  32100
26  2420  32500
27  1720  33500
28  2570  35250
29  1900  36000
30  2100  36200
31  2300  38200
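From Table 5.1 we can find the following values:

$\bar{S} = 1{,}250.322$, $\bar{Y}_d = 22{,}423.032$, N = 31
$\sum s^2 = 20{,}218{,}311$, $\sum y_d^2 = 2{,}572{,}501{,}037$, $\sum y_d s = 217{,}800{,}819.7$
$\sum Y_d^2 = 18{,}159{,}064{,}682$, $\sum e_i^2 = 1{,}778{,}203$

From the saving model we can estimate α and β:

$\hat{\beta} = \dfrac{\sum y_d s}{\sum y_d^2} = \dfrac{217{,}800{,}819.7}{2{,}572{,}501{,}037} = 0.08466$
$\hat{\alpha} = \bar{S} - \hat{\beta}\bar{Y}_d = 1250.322 - 0.08466(22{,}423) = -648$
$\hat{S} = -648 + 0.08466 Y_d$

$S.E.(\hat{\beta}) = \sqrt{\dfrac{\sum e^2}{(n-2)\sum y_d^2}} = \sqrt{\dfrac{1{,}778{,}203}{(31-2)(2{,}572{,}501{,}037)}} = 0.004882$

$S.E.(\hat{\alpha}) = \sqrt{\dfrac{\sum e^2 \sum Y_d^2}{(n-2)\, n \sum y_d^2}} = 118.162$
$R^2 = 1 - \dfrac{\sum e_i^2}{\sum s^2} = 1 - \dfrac{1{,}778{,}203}{20{,}218{,}311} = 0.912$ (adjusted $R^2$ = 0.909)
$t_{\hat{\beta}} = \dfrac{\hat{\beta}}{S.E.(\hat{\beta})} = \dfrac{0.08466}{0.004882} = 17.34$
$t_{\hat{\alpha}} = \dfrac{\hat{\alpha}}{S.E.(\hat{\alpha})} = \dfrac{-648}{118.162} = -5.48$
In summary:
$\hat{S} = -648 + 0.08466 Y_d$
s.e. (118.16) (0.00488) R² = 0.912
t (-5.48) (17.34) F = 300.73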
Now we test whether there is heteroscedasticity in our model, using the Goldfeld-Quandt test and Tables 5.2 to 5.4.

The Goldfeld-Quandt test

1st, order the observations in ascending order of disposable income Yd (Table 5.4).
2nd, omit the 9 central observations (between a quarter and a third of the 31 observations). This leaves two sub-samples of 11 observations each, one containing the small values of Yd (Table 5.2) and one containing the large values of Yd (Table 5.3); a separate regression is fitted to each.
For the small values of Yd (Table 5.2) we have

$\bar{S}_1 = 331.36$, $\bar{Y}_{d1} = 12{,}191.54$
$\sum s^2 = 680{,}248.54$, $\sum y_d^2 = 67{,}961{,}362.73$, $\sum s\,y_d = 5{,}998{,}118.81$, $\sum e_1^2 = 150{,}933.7$
$\hat{\beta} = \dfrac{\sum s\,y_d}{\sum y_d^2} = \dfrac{5{,}998{,}118.81}{67{,}961{,}362.73} = 0.0882$
$\hat{\alpha} = \bar{S} - \hat{\beta}\bar{Y}_d = 331.364 - (0.0882)(12{,}191.54) \approx -743$
The standard errors are $S.E.(\hat{\alpha}) = 189.4$ and $S.E.(\hat{\beta}) = 0.015$, so the calculated t values are
$t_{\beta} = \dfrac{\hat{\beta}}{S.E.(\hat{\beta})} = \dfrac{0.0882}{0.015} = 5.88$
$t_{\alpha} = \dfrac{\hat{\alpha}}{S.E.(\hat{\alpha})} = \dfrac{-743}{189.4} = -3.922$
The estimated equation for the small values of Yd can therefore be written as

$\hat{S} = -743 + 0.0882 Y_d$
s.e. (189.4) (0.015)
t (-3.922) (5.88)
$R^2 = 1 - \dfrac{\sum e_1^2}{\sum s^2} = 0.778$
$\sum e_1^2 = 150{,}933.7$
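For the large values of Yd (Table 5.3) we have

$\bar{S}_2 = 2{,}090.36$, $\bar{Y}_{d2} = 32{,}493.63$
$\sum s^2 = 902{,}644.54$, $\sum y_d^2 = 135{,}687{,}054.54$, $\sum s\,y_d = 4{,}341{,}055.45$, $\sum e_2^2 = 763{,}761.7$
$\hat{\beta} = \dfrac{\sum s\,y_d}{\sum y_d^2} = 0.0319$
$\hat{\alpha} = \bar{S} - 0.0319\,\bar{Y}_d = 1053.47$
with $s.e.(\hat{\alpha}) = 710.3$ and $s.e.(\hat{\beta}) = 0.025$. The calculated t values are
$t_{\beta} = \dfrac{\hat{\beta}}{S.E.(\hat{\beta})} = \dfrac{0.0319}{0.025} = 1.276$
$t_{\alpha} = \dfrac{\hat{\alpha}}{S.E.(\hat{\alpha})} = \dfrac{1053}{710.3} = 1.48$
The estimated line for the large values of Yd is

$\hat{S} = 1053.47 + 0.0319 Y_d$
s.e. (710.3) (0.025)
t (1.48) (1.276)
$R^2 = 1 - \dfrac{\sum e_2^2}{\sum s^2} = 1 - \dfrac{763{,}761.7}{902{,}644.54} = 0.154$
$\sum e_2^2 = 763{,}761.7$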
To test for heteroscedasticity we now carry out the Goldfeld-Quandt F test:
$F^* = \dfrac{\sum e_2^2}{\sum e_1^2} = \dfrac{763{,}761.7}{150{,}933.7} \approx 5.06$
This is the computed F value. The table value has $v_1 = \frac{N-c}{2} - k$ numerator degrees of freedom and the same number of denominator degrees of freedom, where
N = 31 is the number of observations, c = 9 is the number of omitted observations, and
k = 2 is the number of estimated parameters (α̂ and β̂).
$v_1 = v_2 = \dfrac{31-9}{2} - 2 = 9$
The table value of $F_{9,9}$ at the 5% level is 3.18. Since the calculated value F* = 5.06 is greater than the table value 3.18, we reject the null hypothesis of homoscedasticity and accept the alternative: in this model there is heteroscedasticity, i.e. the variance of Ui is not constant.

The Spearman test of heteroscedasticity

To apply the Spearman rank-correlation test, first rank the observations by the value of Yd and then rank the residuals e obtained from Table 5.1. The ranks and their differences are given in Table 5.5.

Table 5.5: Ranks of disposable income Yd and of the residuals ei from Table 5.1 (D = rank of ei − rank of Yd)
No.   Yd      Rank of Yd   Rank of ei   D     D²
1     8777    1            23           22    484
2     9210    2            15           13    169
3     9954    3            13           10    100
4     10508   4            12           8     64
5     10979   5            8            3     9
6     11912   6            5            -1    1
7     12747   7            16           9     81
8     13499   8            17           9     81
9     14269   9            11           2     4
10    15522   10           14           4     16
11    16730   11           20           9     81
12    17663   12           19           7     49
13    18575   13           9            -4    16
14    19635   14           7            -7    49
15    21163   15           18           3     9
16    22880   16           30           14    196
17    24127   17           25           8     64
18    25604   18           21           3     9
19    26500   19           6            -13   169
20    27670   21           22           1     1
21    28300   23           31           8     64
22    27430   20           29           9     81
23    29560   24           27           3     9
24    28150   22           10           -12   144
25    32100   25           24           -1    1
26    32500   26           28           2     4
27    35250   28           26           -2    4
28    33500   27           2            -25   625
29    36000   29           1            -28   784
30    36200   30           3            -27   729
31    38200   31           4            -27   729
Sum                                           4826

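$r = 1 - \dfrac{6\sum D^2}{n(n^2-1)} = 1 - \dfrac{6(4826)}{31(31^2-1)} = 1 - \dfrac{28956}{29760} = 0.027$
Table 5.5 ranks the signed residuals, and on that basis the rank correlation is close to zero. The test, however, calls for ranking the absolute residuals |ei|. Ranking |ei| from Table 5.1 against Yd gives $\sum D^2 = 1558$ and
$r = 1 - \dfrac{6(1558)}{29760} = 0.686$
Since this rank correlation is fairly high, it indicates the existence of heteroscedasticity.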
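
The rank correlation for the saving data can be computed in a few lines of Python. This sketch fits the full-sample regression, ranks the absolute residuals |e| against Yd and applies formula 5.5; the helper names are illustrative, and ranking absolute rather than signed residuals follows the description of the test given earlier.

```python
S = [264, 105, 90, 131, 122, 107, 406, 503, 431, 588, 898, 950, 779,
     819, 1222, 1702, 1578, 1654, 1400, 1829, 2200, 2017, 2105, 1600,
     2250, 2420, 2570, 1720, 1900, 2100, 2300]
Yd = [8777, 9210, 9954, 10508, 10979, 11912, 12747, 13499, 14269,
      15522, 16730, 17663, 18575, 19635, 21163, 22880, 24127, 25604,
      26500, 27670, 28300, 27430, 29560, 28150, 32100, 32500, 35250,
      33500, 36000, 36200, 38200]


def simple_ols(x, y):
    """Return (intercept, slope) of the OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def ranks(values):
    """Rank values from 1 (smallest) upward; ties are not handled."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


alpha_hat, beta_hat = simple_ols(Yd, S)
abs_e = [abs(s - alpha_hat - beta_hat * y) for s, y in zip(S, Yd)]

n = len(S)
sum_d2 = sum((re - ry) ** 2 for re, ry in zip(ranks(abs_e), ranks(Yd)))
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(f"sum of D^2 = {sum_d2}, rank correlation = {r_s:.3f}")
```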

5.1.1.4 How to remove heteroscedasticity

Using Table 5.1 we detected heteroscedasticity; the question now is how to remove it. Let us assume that the pattern of heteroscedasticity is
$\sigma_{ui}^2 = K^2 Y_d^2$
so that $K^2 = \dfrac{\sigma_{ui}^2}{Y_d^2}$ and $K = \dfrac{\sigma_{ui}}{Y_d}$.
To remove the heteroscedasticity, transform the original model by dividing through by Yd. The saving equation is then transformed as follows:

$\dfrac{S}{Y_d} = \alpha \dfrac{1}{Y_d} + \beta + \dfrac{U_t}{Y_d}$
The data of Table 5.1 transformed in this way are presented in Table 5.6.
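Table 5.6: Transformed data. S/Yd is shown multiplied by 10³ and 1/Yd multiplied by 10⁴; the squared and cross-product columns are computed from the deviations of these scaled values from their means.
No.  S  Yd  S/Yd  1/Yd  (s/yd)²  (1/yd)²  (1/yd)(s/yd)  e = S/Yd − Ŝ/Yd  e²
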
1
264 8777 30.0786 1.1393 349.0196 0.3532 -11.1035 -722.4 521858.3
2
105 9210 11.4007 1.0858 1395.7717 0.2924 -20.2034 -722.4 521865.2
3
90 9954 9.0416 1.0046 1577.6060 0.2113 -18.2557 -722.4 521875.8
4
131 10508 12.4667 0.9517 1317.2536 0.1654 -14.7592 -722.4 521882.6
5
122 10979 11.1121 0.9108 1417.4138 0.1338 -13.7730 -722.4 521887.9
6
107 11912 8.9825 0.8395 1582.3005 0.0867 -11.7142 -722.4 521897.2
7
406 12747 31.8506 0.7845 285.9497 0.0574 -4.0499 -722.4 521904.3
8
503 13499 37.2620 0.7408 132.2192 0.0383 -2.2514 -722.4 521910.0
9
431 14269 30.2053 0.7008 344.3006 0.0243 -2.8913 -722.4 521915.2
10
588 15522 37.8817 0.6442 118.3519 0.0098 -1.0797 -722.4 521922.6
11
898 16730 53.6760 0.5977 24.1607 0.0028 0.2592 -722.4 521928.6
12
950 17663 53.7847 0.5662 25.2413 0.0004 0.1063 -722.4 521932.7
13
779 18575 41.9381 0.5384 46.5478 0.0000 0.0453 -722.5 521936.3
14
819 19635 41.7112 0.5093 49.6947 0.0013 0.2517 -722.5 521940.1
15
1222 21163 57.7423 0.4725 80.6692 0.0053 -0.6510 -722.5 521944.9
16
1702 22880 74.3881 0.4371 656.7653 0.0117 -2.7661 -722.5 521949.5
17
1578 24127 65.4039 0.4145 276.9969 0.0170 -2.1724 -722.5 521952.4
18
1654 25604 64.5993 0.3906 250.8613 0.0239 -2.4461 -722.5 521955.5

19
1400 26500 52.8302 0.3774 16.5609 0.0281 -0.6822 -722.5 521957.2
20
1829 27670 66.1005 0.3614 300.6683 0.0337 -3.1835 -722.5 521959.3
21
2200 28300 77.7385 0.3534 839.7150 0.0367 -5.5534 -722.5 521960.3
22
2017 27430 73.5326 0.3646 613.6494 0.0326 -4.4697 -722.5 521958.9
23
2105 29560 71.2111 0.3383 504.0212 0.0427 -4.6406 -722.5 521962.3
24
1600 28150 56.8384 0.3552 65.2490 0.0360 -1.5328 -722.5 521960.1
25
2250 32100 70.0935 0.3115 455.0874 0.0545 -4.9806 -722.5 521965.8
26
2420 32500 74.4615 0.3077 660.5341 0.0563 -6.0990 -722.5 521966.3
27
2570 35250 72.9078 0.2837 583.0835 0.0683 -6.3099 -722.5 521969.4
28
1720 33500 51.3433 0.2985 6.6698 0.0608 -0.6366 -722.5 521967.5
29
1900 36000 52.7778 0.2778 16.1371 0.0714 -1.0735 -722.5 521970.2
30
2100 36200 58.0110 0.2762 85.5693 0.0722 -2.4861 -722.5 521970.4
31
2300 38200 60.2094 0.2618 131.0737 0.0802 -3.2425 -722.5 521972.2
Sum 16.8957 2.1086 -152.3450 16179999.0

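From Table 5.6, with lower-case letters denoting deviations from the means,
$\sum \left(\dfrac{1}{y_d}\right)^2 = 2.1085 \times 10^{-8}$, $\sum \left(\dfrac{s}{y_d}\right)^2 \approx 1.42 \times 10^{-2}$, $\sum \left(\dfrac{s}{y_d}\right)\left(\dfrac{1}{y_d}\right) = -1.52345 \times 10^{-5}$

In the transformed regression of S/Yd on 1/Yd the slope estimates α and the intercept estimates β:
$\hat{\alpha} = \dfrac{\sum (s/y_d)(1/y_d)}{\sum (1/y_d)^2} = -722.50$
$\hat{\beta} = \overline{(S/Y_d)} - \hat{\alpha}\,\overline{(1/Y_d)} = 0.088$
$\dfrac{\hat{S}_t}{Y_{dt}} = \hat{\alpha} \dfrac{1}{Y_{dt}} + \hat{\beta} = -722.5 \dfrac{1}{Y_{dt}} + 0.088$, R² = 0.77
Multiplying the transformed equation by Yd gives
$\hat{S}_t = -722.5 + 0.088\,Y_{dt}$. This equation is free of heteroscedasticity.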
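
The transformed regression can be checked against the raw data of Table 5.1. The Python sketch below divides saving by disposable income, regresses S/Yd on 1/Yd and reads α̂ off the slope and β̂ off the intercept; it assumes, as the text does, that the disturbance variance is proportional to Yd². The helper name simple_ols is illustrative.

```python
S = [264, 105, 90, 131, 122, 107, 406, 503, 431, 588, 898, 950, 779,
     819, 1222, 1702, 1578, 1654, 1400, 1829, 2200, 2017, 2105, 1600,
     2250, 2420, 2570, 1720, 1900, 2100, 2300]
Yd = [8777, 9210, 9954, 10508, 10979, 11912, 12747, 13499, 14269,
      15522, 16730, 17663, 18575, 19635, 21163, 22880, 24127, 25604,
      26500, 27670, 28300, 27430, 29560, 28150, 32100, 32500, 35250,
      33500, 36000, 36200, 38200]


def simple_ols(x, y):
    """Return (intercept, slope) of the OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


# Transformed variables: S/Yd regressed on 1/Yd.
y_star = [s / y for s, y in zip(S, Yd)]
z = [1 / y for y in Yd]

# The intercept of the transformed model estimates beta,
# the slope on 1/Yd estimates alpha.
beta_hat, alpha_hat = simple_ols(z, y_star)
print(f"S_hat = {alpha_hat:.1f} + {beta_hat:.4f} * Yd")
```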

5.2 Autocorrelation
One of the assumptions of OLS is that the successive values of the random term Ui are temporally independent, i.e. the value of U at time t is not correlated with its value at period t-1. We then say that the U's are not correlated with each other, that there is no autocorrelation, or that Ut and Ut-1 are serially independent.
This assumption of no autocorrelation (serial independence) states that the covariance between the successive values Ut and Ut-1 is equal to zero:
$\mathrm{Cov}(U_t, U_{t-1}) = E\{[U_t - E(U_t)][U_{t-1} - E(U_{t-1})]\}$
We know from our assumptions that $E(U_t) = 0$ and $E(U_{t-1}) = 0$, so we are left with
$E(U_t U_{t-1}) = 0$
If the assumption is violated, i.e. if the value of the random term at time t is correlated with (depends on) its own previous value Ut-1, we say that there is autocorrelation or that the random term is serially dependent. Autocorrelation is a special type of correlation: it refers only to the relationship between successive values of the same variable, not between values of different variables.

5.2.1 Graphic methods of detecting autocorrelation

There are two methods that are commonly used to obtain a rough idea of the existence or absence of autocorrelation in the disturbance term U. Since the residuals $e_t = Y_t - \hat Y_t$ are estimates of the true U's, finding that the e's are correlated suggests that the U's are autocorrelated. We can use the following methods.
1. Plot the scatter diagram of the residuals, i.e. plot et against et-1. The points to be plotted are (e1, e2), (e2, e3), (e3, e4), ..., (en-1, en). Suppose you have the model
$$Y_t = \alpha + \beta X_t + U_t$$
• First estimate the model using the data to find $\hat Y_t = \hat\alpha + \hat\beta X_t$, then find the residuals $e_t = Y_t - \hat Y_t$. Once the e's are found, the values of et-1 are obtained by pairing each residual with its predecessor as follows.
et     et-1
e2     e1
e3     e2
e4     e3
.      .
.      .
.      .
en     en-1

In the first case, when the time period is 2, et is e2 and et-1 is e1; proceeding in this way you get the pairs (e1, e2), (e2, e3), (e3, e4) etc. Now plot these pairs on a two-dimensional diagram. If most of the points (et-1, et) fall in the 1st and 3rd quadrants (as shown in fig a), we can say that there is positive autocorrelation, because the products et·et-1 are positive. If most of the points fall in the 2nd and 4th quadrants, there is negative autocorrelation, because the products et·et-1 are negative (as shown in fig b).

[Figure: scatter diagrams of et against et-1]
Fig. (a) Positive outcome          Fig. (b) Negative outcome

Autocorrelation may be positive or negative, but in most practical cases it is positive. The main reason is that economic variables tend to move in the same direction. For example, in a period of boom, employment, investment, output, growth of GNP, consumption etc. move upwards and the random term follows the same pattern; in periods of recession all these variables move downwards and the random term again follows the same pattern.
2. Another method commonly used in applied econometric research for the detection of autocorrelation is to plot the residuals against time (t). There are two possibilities.
a) If successive residuals et and et-1 change sign rapidly, we can say there is negative autocorrelation.
b) If successive residuals do not change sign frequently, i.e. several positive residuals are followed by several negative ones, we can conclude that there is positive autocorrelation. This can be seen in the following diagram.

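[Figure: residuals et plotted against time, with +ve et above and -ve et below the time axis. Left panel: negative autocorrelation (successive residuals alternate in sign). Right panel: positive autocorrelation (runs of residuals with the same sign).]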
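
The two graphical checks can also be summarised with simple counts. The sketch below is only an illustration: the residual series `e` is hypothetical (it is not taken from the text), and the function name is our own. It counts how many points (et-1, et) fall in the 1st/3rd quadrants versus the 2nd/4th quadrants; the second count is also the number of sign changes seen when the residuals are plotted against time.

```python
def quadrant_counts(e):
    """Return (same_sign, opposite_sign) counts for the pairs (e[t-1], e[t]).

    same_sign     -> points in quadrants 1 and 3 (product positive)
    opposite_sign -> points in quadrants 2 and 4 (product negative),
                     i.e. the number of sign changes over time
    """
    pairs = list(zip(e, e[1:]))
    same_sign = sum(1 for prev, cur in pairs if prev * cur > 0)
    opposite_sign = sum(1 for prev, cur in pairs if prev * cur < 0)
    return same_sign, opposite_sign


# Hypothetical residuals: long runs of one sign, so few sign changes.
e = [1.2, 0.8, 0.5, -0.3, -0.9, -1.1, -0.4, 0.6, 1.0]
same_sign, opposite_sign = quadrant_counts(e)
print(same_sign, opposite_sign)
```

Many more same-sign pairs than opposite-sign pairs points towards positive autocorrelation; the reverse points towards negative autocorrelation. This is a rough screening device, not a formal test.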

5.2.3 Other methods of detecting autocorrelations

1) The Runs Test: In this test we take into consideration the signs of the residuals, i.e. the residuals are listed in sequence as positive and negative ones.
Ex. we may have the following sequence
(-e1,-e2,-e3,-e4,-e5,-e6,-e7,-e8) (+e9, +e10, +e11, +e12, +e13, +e14, +e15, +e16, +e17, +e18, +e19, +e20, +e21) (-e22) (+e23) (-e24,-e25,-e26,-e27,-e28,-e29,-e30,-e31,-e32)
Thus there are 8 negative residuals followed by 13 positive residuals, followed by one negative and one positive residual, and finally followed by 9 negative residuals.
• We define a run as an uninterrupted sequence of one symbol or attribute, such as -ve or +ve.
• The length of a run is the number of elements in the run.
In the above sequence of e's we have 5 runs: 8 -ve e's, followed by 13 +ve e's, 1 -ve e, 1 +ve e and then 9 -ve e's. By examining how runs behave in strictly random sequences of observations one can derive a test of the randomness of runs. If there are too many runs, the e's are changing sign frequently, which indicates negative autocorrelation. Similarly, if there are too few runs, this may suggest positive autocorrelation.
Now let
n = total number of sample observations, equal to n1 + n2
n1 = number of +ve residuals
n2 = number of -ve residuals
K = number of runs
Assuming that n1 > 10 and n2 > 10, the number of runs is approximately normally distributed with
$$\text{Mean} = E(K) = \frac{2n_1n_2}{n_1+n_2} + 1$$
$$\text{Variance} = \sigma_K^2 = \frac{2n_1n_2(2n_1n_2 - n_1 - n_2)}{(n_1+n_2)^2(n_1+n_2-1)}$$
We set up the hypotheses as follows.
The null hypothesis H0: Cov(et, et-1) = 0, no autocorrelation, against
the alternative H1: Cov(et, et-1) ≠ 0, there is autocorrelation,
and establish the confidence interval with 95% confidence:
$$E(K) - 1.96\sigma_K \le K \le E(K) + 1.96\sigma_K$$
If the observed K lies within these limits, accept the null hypothesis that there is no autocorrelation. If the observed K lies outside these limits, reject the null hypothesis and accept the alternative that there is autocorrelation.
In our example
n1 = 14, i.e. the total number of +ve residuals (e's)
n2 = 18, i.e. the total number of -ve residuals (e's)
Using our formulas
$$E(K) = \frac{2(14)(18)}{14+18} + 1 = 16.75$$
$$Var(K) = \sigma_K^2 = \frac{2(14)(18)[2(14)(18) - 14 - 18]}{(14+18)^2(14+18-1)} = 7.49395$$
$$S.E(K) = \sigma_K = \sqrt{7.49395} = 2.7375$$
The 95% confidence interval is calculated as follows. With 95% confidence the level of significance is 1 - 0.95 = 0.05, and since we have a two-tailed test we use α/2 = 0.05/2 = 0.025. In the normal table the value 0.025 is found at the intersection of 1.9 in the first column and 0.06 in the first row, so the critical value of the normal distribution for a two-tailed test at the 5% significance level is 1.96. Hence the 95% confidence interval for K (the number of runs) is E(K) ± 1.96σK, which is
16.75 ± 1.96(2.7375) = [11.3845, 22.1155]
Or this can be written as
11.3845 < K < 22.1155

But the observed number of runs in our example is K = 5 (1st run -ve e's, 2nd +ve e's, 3rd -ve, 4th +ve, 5th -ve e's). Since the observed K lies outside the confidence interval, we reject the null hypothesis that there is no autocorrelation between the e's and accept the alternative that the e's are correlated with each other. Because there are too few runs, the evidence points to positive autocorrelation.
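
The runs test above can be reproduced with a few lines of code. The following is a minimal sketch, not a library routine: the function name `runs_test` and the encoding of the residual signs as a string of "+" and "-" characters are our own assumptions. It uses the mean and variance formulas given above and the 1.96 critical value.

```python
import math


def runs_test(signs, z=1.96):
    """Runs test on a sequence of residual signs ('+' or '-').

    Returns the number of runs K, its mean and variance under
    randomness, the confidence limits, and whether H0 (no
    autocorrelation) is rejected.
    """
    n1 = signs.count("+")          # number of positive residuals
    n2 = signs.count("-")          # number of negative residuals
    n = n1 + n2
    # A new run starts whenever the sign differs from the previous one.
    k = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / (n ** 2 * (n - 1))
    se = math.sqrt(var)
    lower, upper = mean - z * se, mean + z * se
    reject = not (lower <= k <= upper)
    return {"K": k, "mean": mean, "var": var,
            "lower": lower, "upper": upper, "reject_H0": reject}


# Sign pattern of the example: 8 minus, 13 plus, 1 minus, 1 plus, 9 minus.
signs = "-" * 8 + "+" * 13 + "-" + "+" + "-" * 9
result = runs_test(signs)
print(result)
```

With the example's sign pattern the function counts 5 runs against an expected 16.75, so the observed value falls below the lower confidence limit.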
2) The Von Neumann ratio test for the existence of autocorrelation. This is the ratio of the variance of the first differences of any variable X to the variance of X. The test is applicable to directly observed series and to variables which are random, i.e. variables whose successive values are not autocorrelated.
$$\frac{\delta^2}{S_x^2} = \frac{\sum_{t=2}^{n}(x_t - x_{t-1})^2/(n-1)}{\sum_{t=1}^{n}(x_t - \bar x)^2/n}$$
In the case of the random term U the values are not directly observable but are estimated by the OLS residuals (e's). For large samples (n > 30) the Von Neumann ratio is
$$\frac{\delta^2}{S_e^2} = \frac{\sum_{t=2}^{n}(e_t - e_{t-1})^2/(n-1)}{\sum_{t=1}^{n}(e_t - \bar e)^2/n}$$
The method is applicable only for large samples, i.e. n > 30. When the sample size is very large, $\delta^2/S_e^2$ is approximately normally distributed with
$$E\left(\frac{\delta^2}{S_e^2}\right) = \frac{2n}{n-1}$$
$$Var\left(\frac{\delta^2}{S_e^2}\right) = \frac{4n^2(n-2)}{(n+1)(n-1)^3}$$
To undertake the test, first compute the Von Neumann ratio; second, use the formulas for the mean and variance to determine the confidence interval. Then
• If the calculated value of the Von Neumann ratio lies within the confidence interval, accept the null hypothesis that there is no autocorrelation and reject the alternative.
• If the calculated value of the Von Neumann ratio lies outside the confidence interval, reject the null hypothesis and accept the alternative that there is autocorrelation in the model.
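
As a rough illustration of the Von Neumann procedure, the sketch below computes the ratio from a list of residuals and builds the large-sample confidence interval. The function names are our own, the 1.96 critical value assumes a 95% two-tailed interval, and the variance is coded with (n - 1) cubed in the denominator, the form under which the variance shrinks as n grows.

```python
import math


def von_neumann_ratio(e):
    """Variance of first differences divided by the variance of the series."""
    n = len(e)
    mean_e = sum(e) / n
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, n)) / (n - 1)
    den = sum((v - mean_e) ** 2 for v in e) / n
    return num / den


def von_neumann_moments(n):
    """Large-sample mean and variance of the ratio under no autocorrelation."""
    mean = 2 * n / (n - 1)
    var = 4 * n ** 2 * (n - 2) / ((n + 1) * (n - 1) ** 3)
    return mean, var


def von_neumann_test(e, z=1.96):
    """Return (ratio, lower, upper, reject_H0); intended for n > 30."""
    ratio = von_neumann_ratio(e)
    mean, var = von_neumann_moments(len(e))
    half_width = z * math.sqrt(var)
    lower, upper = mean - half_width, mean + half_width
    return ratio, lower, upper, not (lower <= ratio <= upper)
```

For example, `von_neumann_test(residuals)` on a list of more than 30 OLS residuals returns the ratio together with the limits it is compared against.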
3) Durbin-Watson Test
The most celebrated test for the detection of serial correlation (autocorrelation) was developed by two statisticians, Durbin and Watson. It is popularly known as the Durbin-Watson d-statistic. This test
a) is applicable to small samples;
b) is based upon the sum of the squared differences in successive values of the estimated disturbance term;
c) is appropriate only for the first-order autoregressive scheme, i.e. when successive values are correlated with each other (et with et-1, et-1 with et-2, etc.) but not when et is correlated directly with et-2 and so on.
The d-statistic is obtained using the following formula.

$$d = \frac{\sum_{t=2}^{n}(e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$$
From the above formula we can observe the following.
i. If successive values of the disturbance term tend to be close to one another, i.e. a positive value of et-1 is followed by a positive value of et, the terms $(e_t - e_{t-1})^2$ in the numerator will be small. We would therefore expect positive autocorrelation to result in small values of d.
ii. If there are large differences between et and et-1, i.e. when successive residuals have different signs, the numerator will be large. The signal for this type of autocorrelation is an unusually large value of d, which indicates negative autocorrelation.
The d-statistic test may be outlined as follows.
Step 1: The null hypothesis H0: P = 0 states that there is no autocorrelation (the e's are serially independent). The alternative H1: P ≠ 0 states that the e's are serially correlated (dependent), i.e. that there is autocorrelation between the successive values et and et-1. We compute the d-statistic to test the null hypothesis:
$$d = \frac{\sum_{t=2}^{n}(e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$$
Expanding the square in the numerator,
$$d = \frac{\sum_{t=2}^{n}\left(e_t^2 - 2e_te_{t-1} + e_{t-1}^2\right)}{\sum_{t=1}^{n} e_t^2} = \frac{\sum_{t=2}^{n} e_t^2 + \sum_{t=2}^{n} e_{t-1}^2 - 2\sum_{t=2}^{n} e_te_{t-1}}{\sum_{t=1}^{n} e_t^2}$$
If n is very large (if we have a large sample) the following sums are approximately equal:
$$\sum_{t=2}^{n} e_t^2 \approx \sum_{t=2}^{n} e_{t-1}^2 \approx \sum_{t=1}^{n} e_t^2$$
Therefore we may write
$$d \approx \frac{2\sum e_{t-1}^2}{\sum e_{t-1}^2} - \frac{2\sum e_te_{t-1}}{\sum e_{t-1}^2} = 2\left(1 - \frac{\sum e_te_{t-1}}{\sum e_{t-1}^2}\right)$$
Let $\hat P = \dfrac{\sum e_te_{t-1}}{\sum e_{t-1}^2}$ (we will prove this later on). Then
$$d \approx 2(1 - \hat P)$$
From the above relation we have the following.
a) If there is no autocorrelation, i.e. $\hat P = 0$, then d = 2.
b) If $\hat P = 1$, then d = 0 and we have perfect positive autocorrelation.
c) If $\hat P = -1$, then d = 4 and there is perfect negative autocorrelation.
We can conclude from these values of $\hat P$ that d lies between 0 and 4 (0 ≤ d ≤ 4). The next step is to compare the computed value d* with the table value of d. The problem associated with this test is that the exact distribution of d is not known. For this reason Durbin and Watson established a range of values within which we can neither accept nor reject the null hypothesis. These are the upper (dU) and lower (dL) limits for the significance level of d, which are appropriate for testing no autocorrelation against the alternative hypothesis that autocorrelation exists. For the two-tailed Durbin-Watson test we have a set of five regions for the value of d, as shown in the following diagram.
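[Figure: the five regions of the d-statistic along the d axis, with the points dL, dU, (4-dU) and (4-dL) marked. Below dL: critical region, reject H0 (positive autocorrelation). Between dL and dU: inconclusive region. Between dU and (4-dU): accept H0, no autocorrelation. Between (4-dU) and (4-dL): inconclusive region. Above (4-dL): critical region, reject H0 and accept that there is -ve autocorrelation.]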

i) If the calculated d is less than dL (i.e. d < dL), we reject the null hypothesis and accept the alternative that there is autocorrelation; the type of autocorrelation is positive.
ii) If the calculated d is greater than (4-dL), i.e. d > (4-dL), we reject the null hypothesis of no autocorrelation and accept the alternative that there is autocorrelation; the type of autocorrelation is negative.
iii) If the calculated d lies between dU and 4-dU, i.e. dU < d < (4-dU), we accept the null hypothesis of no autocorrelation and reject the alternative.
iv) If the calculated d satisfies dL < d < dU or (4-dU) < d < (4-dL), the test is inconclusive.
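
The decision rule can be written as a small function. This is a sketch under our own naming (`dw_decision` is not from the text); the critical values dL and dU must still be read from the Durbin-Watson table for the relevant n and number of explanatory variables.

```python
def dw_decision(d, d_lower, d_upper):
    """Classify a Durbin-Watson statistic using the table bounds dL and dU."""
    if d < d_lower:
        return "positive autocorrelation"      # reject H0
    if d > 4 - d_lower:
        return "negative autocorrelation"      # reject H0
    if d_upper < d < 4 - d_upper:
        return "no autocorrelation"            # accept H0
    return "inconclusive"                      # between dL and dU, or 4-dU and 4-dL


# Illustrative calls with bounds dL = 1.08 and dU = 1.36.
for d in (0.90, 1.20, 2.00, 2.80, 3.50):
    print(d, dw_decision(d, 1.08, 1.36))
```

Values that fall exactly on a bound are returned as "inconclusive" here; that boundary treatment is a convention of this sketch.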

Shortcomings of the Durbin-Watson test
a) The test requires an intercept term. If the regression equation has no intercept, i.e. Yi = βXi + Ui, the regression is assumed to pass through the origin; the econometrician should then re-run the regression including the intercept term to obtain the e's.
b) The U's are assumed to be generated by the first-order autoregressive scheme, i.e. et depends upon et-1, et-1 on et-2 and so on; the test is not applicable if et depends directly upon et-2.
c) The regression equation must not include lagged values of the dependent variable as one of the explanatory variables.

4) An alternative test for autocorrelation

This alternative test has the advantage of being applicable to any form of autocorrelation, and it provides estimates of the coefficients of the autocorrelation relationship, which are required for the remedial transformation of the original observations. The procedure is as follows.
1st: Apply OLS to the sample observations, i.e. fit
$$Y_t = \alpha + \beta X_t + U_t$$
and obtain the values of the estimated residuals, the e's.
2nd: Since we are not sure a priori about the form of the autocorrelation, we may experiment with various forms of autoregressive structure, for example
$$e_t = Pe_{t-1} + v_t$$
or
$$e_t = Pe_{t-1}^2 + v_t$$
or
$$e_t = P_1e_{t-1} + P_2e_{t-2} + v_t$$
and so on, where vt satisfies all the assumptions of OLS. In the above equations et represents the values $(Y_t - \hat Y_t)$ and et-1 represents the values of et lagged one period; $\hat P$ is the estimated coefficient. The statistical significance of the estimate $\hat P$ and the overall significance of the fitted regression are then judged on the basis of the standard error test, the t test and the F test. $\hat P$ is statistically significant if
$$\frac{\hat P}{2} > S.E(\hat P) \qquad\text{and}\qquad t^* = \frac{\hat P}{S.E(\hat P)} > t$$
and the regression is significant overall if
$$F^* = \frac{R^2/(k-1)}{(1-R^2)/(N-k)} > F$$
That is, if the calculated values (t*, F*) are greater than the table values (t, F), we accept that there is autocorrelation in the model.

5.2.4 Consequence of autocorrelation

The OLS technique is based on basic assumptions about the mean, variance and covariance of the random term Ui. If these assumptions do not hold, on whatever account, the estimators derived using the OLS procedure may not be efficient.

a) OLS estimates are unbiased

Even when the disturbances are serially correlated, the OLS parameter estimates are statistically unbiased, i.e. their expected value is equal to the true parameters.
$$\hat\beta = \beta + \frac{\sum x_iU_i}{\sum x_i^2}$$
We have already derived the above. Taking expected values,
$$E(\hat\beta) = \beta + \frac{\sum x_iE(U_i)}{\sum x_i^2}$$
We know that E(Ui) = 0, so
$$E(\hat\beta) = \beta + 0 = \beta$$
Thus, irrespective of whether the random term Ui is serially independent or not, the parameter estimate $\hat\beta$ has no statistical bias as long as the Ui's and Xi's are uncorrelated.

b) The variances of OLS estimates are underestimated by the usual formula.

When there is autocorrelation in the random term Ui, the variances of the OLS parameter estimates are likely to be larger than those of estimates obtained by other econometric methods.
$$\hat\beta = \beta + \frac{\sum x_iu_i}{\sum x_i^2}$$
$$Var(\hat\beta) = E[\hat\beta - E(\hat\beta)]^2$$
We know that $E(\hat\beta) = \beta$. Substituting this and the expression for $\hat\beta$ into $Var(\hat\beta)$,
$$Var(\hat\beta) = E\left[\beta + \frac{\sum x_iu_i}{\sum x_i^2} - \beta\right]^2 = E\left[\frac{\sum x_iu_i}{\sum x_i^2}\right]^2$$
$$Var(\hat\beta) = \sum\left(\frac{x_i}{\sum x_i^2}\right)^2E(u_i^2) + 2\sum_{i<j}\frac{x_ix_j}{\left(\sum x_i^2\right)^2}E(u_iu_j)$$
If the Ui's are serially independent, E(uiuj) = 0, the second term vanishes and we are left with
$$Var(\hat\beta) = \sum\left(\frac{x_i}{\sum x_i^2}\right)^2E(u_i^2)$$
When the ui are serially dependent (when there is autocorrelation), E(uiuj) ≠ 0 and the second term remains:
$$Var(\hat\beta) = \sum\left(\frac{x_i}{\sum x_i^2}\right)^2E(u_i^2) + 2\sum_{i<j}\frac{x_ix_j}{\left(\sum x_i^2\right)^2}E(u_iu_j)$$
If you compare the two expressions, the variance of $\hat\beta$ with no autocorrelation is smaller than the variance of $\hat\beta$ with autocorrelation whenever the second term is positive, as is typical when the disturbances are positively autocorrelated. The usual OLS formula, which ignores the second term, then understates the true variance.
c) The variance of the random term Ui may be seriously underestimated if the U's are autocorrelated. In particular, the variance of Ui will be seriously underestimated in the case of positive autocorrelation of the error term. As a result the R², t and F statistics are exaggerated.
d) Finally, if the values of U are autocorrelated, predictions based upon OLS estimates will be inefficient. Forecasts made from OLS estimates will be less reliable, because these predictions have a larger variance than predictions based on estimators obtained from other techniques.
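
To see the effect of the cross-product term numerically, the sketch below evaluates the variance formula under an extra assumption that is not stated in the text: the disturbances follow a first-order scheme with constant variance `sigma2`, so that E(ui uj) = sigma2 · rho^|i-j|. The data `x` (deviations from the mean) and the function name are hypothetical.

```python
def var_beta_hat(x, sigma2, rho):
    """Variance of the OLS slope when E(u_i u_j) = sigma2 * rho**|i - j|.

    x holds the regressor in deviations from its mean.
    With rho = 0 the result reduces to sigma2 / sum(x_i^2).
    """
    sxx = sum(v * v for v in x)
    own_term = sigma2 / sxx
    cross = sum(x[i] * x[j] * rho ** (j - i)
                for i in range(len(x))
                for j in range(i + 1, len(x)))
    return own_term + 2 * sigma2 * cross / sxx ** 2


x = [-2, -1, 0, 1, 2]                 # hypothetical trending regressor
print(var_beta_hat(x, 1.0, 0.0))      # no autocorrelation
print(var_beta_hat(x, 1.0, 0.5))      # positive autocorrelation
```

With a trending regressor and positive rho the second value is larger than the first, which is the comparison made in the text. With other data or a negative rho the cross term can have the opposite sign.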

5.2.5 Sources of autocorrelation

The random term Ui may be correlated with the preceding random term, i.e. autocorrelation may be observed, for many reasons.
i. Omission of explanatory variables: Most economic variables tend to be autocorrelated. If we exclude explanatory variables which are autocorrelated, their influence will be captured by the random term Ui. This case may be called "quasi-autocorrelation", since the autocorrelation arises not from the behavior of the random term Ui itself but from the pattern of the omitted explanatory variable. If several autocorrelated explanatory variables are omitted, their effect on the random term may not be observed, because the autocorrelation patterns of the omitted regressors may cancel each other out.
ii. Misspecification of the mathematical model: If we have adopted a mathematical form which differs from the true form of the relationship, the U's may show serial correlation. For example, suppose we have chosen the linear function
$$Y_t = \alpha + \beta_1X_{1t} + \beta_2X_{2t} + U_t$$
while the true relationship is
$$Y_t = \alpha X_{1t}^{\beta_1}X_{2t}^{\beta_2}e^{u_t}$$
which transforms into $\log Y_t = \log\alpha + \beta_1\log X_{1t} + \beta_2\log X_{2t} + U_t$.
If we estimate the linear function, Ui will appear to be serially dependent; had the researcher used the log model, Ui would not be serially dependent.
iii. Interpolation in the statistical data: In most published time series data some values are obtained by interpolation or smoothing; as a result the successive values of the true disturbance term Ui will be interrelated and exhibit an autocorrelation pattern.
iv. Misspecification of the true random term Ui: In some cases the true values of Ui may be successively correlated. For example, a war, which is captured by the random term Ui, may affect the current year's production and exert its influence on production in some future periods. Or a drought in Ethiopia will affect crop yield in the agricultural sector, which in turn influences the performance of other sectors of the economy over several time periods. Such causes result in serially dependent values of the disturbance term Ui. If in such a period we specify E(UiUj) = 0, we are misspecifying the true pattern of Ui. This kind of autocorrelation is called "true autocorrelation" because its root lies in the random term Ui itself.

5.2.6 Solution for autocorrelation

The solution for autocorrelation, i.e. the type of corrective action in each particular case, depends on the cause or source of the autocorrelation.
a) If the source is omitted variables, the appropriate measure is to incorporate the excluded explanatory variables.
Ex. Suppose we have quasi-autocorrelation. Consumption in period t (Ct) depends upon current income (Xt), previous income (Xt-1), wealth (Wt) and the previous consumption behavior of the individual (Ct-1):
$$C_t = \alpha + \beta_1X_t + \beta_2X_{t-1} + \beta_3W_t + \beta_4C_{t-1} + U_t$$
- If you omit lagged income, its influence will be reflected in the random term Ui and autocorrelation will occur.
- If you omit both lagged income and wealth, their effects may cancel each other out; their influence on the random term may then not be observed and Ui will appear serially independent.
To eliminate the autocorrelation caused by an omitted variable (e.g. Xt-1), introduce the omitted variable into the function.
b) If the source of autocorrelation is misspecification of the mathematical form of the relationship, the relevant approach is to change the functional form. This can be investigated by regressing the residuals against higher powers of the explanatory variables, or by computing a linear-in-logs form and re-examining the results.
If autocorrelation is still observed once the above cases have been dealt with, the appropriate procedure is
a) to transform the original data, and
b) to apply OLS to the transformed data.

Transformation of the original data
The transformation of the original data depends upon the pattern of the autoregressive structure, which may be first-order or higher-order autoregressive.

First-order autoregressive structure

Suppose we have the model
$$Y_t = \alpha + \beta X_t + U_t$$
If the autocorrelation in this model follows the first-order scheme, Ut depends upon Ut-1 (in terms of the residuals, e2 on e1, e3 on e2, e4 on e3, etc.). This is also called the first-order autoregressive scheme: Ut is correlated with its preceding value,
$$U_t = pU_{t-1} + v_t$$
where vt satisfies all the OLS assumptions about the random term and p is the coefficient of Ut-1. Ut is not observable, but we can approximate it from the sample observations by the residuals et. Therefore
$$e_t = pe_{t-1} + v_t$$
The estimated value is $\hat e_t = \hat pe_{t-1}$, so that $v_t = e_t - \hat e_t$ and
$$\sum v_t^2 = \sum(e_t - \hat e_t)^2 = \sum(e_t - \hat pe_{t-1})^2$$
Minimizing this sum with respect to $\hat p$:
$$\frac{\partial\sum v_t^2}{\partial\hat p} = -2\sum(e_t - \hat pe_{t-1})e_{t-1} = 0$$
$$\sum e_te_{t-1} - \hat p\sum e_{t-1}^2 = 0$$
$$\hat p\sum e_{t-1}^2 = \sum e_te_{t-1}$$
$$\hat p = \frac{\sum e_te_{t-1}}{\sum e_{t-1}^2}$$

This is the estimated coefficient of the first-order autoregressive scheme, i.e. when et is correlated with et-1. To transform the original data, take the lagged form of the equation:
$$Y_{t-1} = \alpha + \beta X_{t-1} + U_{t-1}$$
Multiply the above equation by $\hat p$:
$$\hat pY_{t-1} = \hat p\alpha + \hat p\beta X_{t-1} + \hat pU_{t-1}$$
Subtract this equation from the original equation:
$$(Y_t - \hat pY_{t-1}) = (\alpha - \hat p\alpha) + (\beta X_t - \hat p\beta X_{t-1}) + (U_t - \hat pU_{t-1})$$
Now let
$$Y_t^* = Y_t - \hat pY_{t-1}$$
$$\alpha^* = \alpha - \hat p\alpha$$
$$\beta X_t^* = \beta X_t - \hat p\beta X_{t-1}$$
$$v_t = U_t - \hat pU_{t-1}$$
We can write the above equation as follows.
$$Y_t^* = \alpha^* + \beta X_t^* + v_t$$
Now we can apply OLS to the transformed relation to obtain $\hat\alpha^*$ and $\hat\beta$. Since
$$\alpha^* = \alpha - \hat p\alpha = (1 - \hat p)\alpha$$
the intercept of the original model is recovered as
$$\hat\alpha = \frac{\hat\alpha^*}{1 - \hat p}, \qquad Var(\hat\alpha) = \left(\frac{1}{1 - \hat p}\right)^2Var(\hat\alpha^*)$$
The estimators obtained are efficient only if the sample is large, so that the loss of one observation becomes negligible. The above procedure is possible only when the value of $\hat p$ is known. We now describe the methods through which the parameters of the autocorrelated model can be estimated.
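
The transformation can be sketched in code as follows. This is a minimal illustration, assuming the value of the autoregressive coefficient `rho` is already known or estimated; the function names and the small data set are hypothetical and are not taken from the text.

```python
def quasi_difference(series, rho):
    """Return z_t - rho * z_{t-1} for t = 2..n (the first observation is lost)."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]


def ols(x, y):
    """Simple two-variable OLS; returns (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def transformed_ols(x, y, rho):
    """OLS on the transformed data; the intercept is converted back
    to the original model by dividing by (1 - rho)."""
    alpha_star, beta = ols(quasi_difference(x, rho), quasi_difference(y, rho))
    return alpha_star / (1 - rho), beta


# Hypothetical data lying exactly on Y = 2 + 3X, so the answer is known.
x = [1, 2, 3, 4, 5, 6]
y = [2 + 3 * v for v in x]
alpha_hat, beta_hat = transformed_ols(x, y, rho=0.5)
print(alpha_hat, beta_hat)
```

Because the hypothetical data lie exactly on a line, the transformed regression recovers the same intercept and slope. With real data the two sets of estimates will differ. The division by (1 - rho) fails when rho equals 1, in which case the intercept drops out of the transformed model.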

Method I. A priori information on $\hat p$

In much econometric research the investigator makes a reasonable guess about the value of the autoregressive coefficient, using his knowledge of the relationship between the variables under study. A usual case is to assume that $\hat p = 1$. The model is then transformed as follows.
$$Y_t = \alpha + \beta X_t + U_t$$
Lag this model by one period:
$$Y_{t-1} = \alpha + \beta X_{t-1} + U_{t-1}$$
Multiply by $\hat p$:
$$\hat pY_{t-1} = \hat p\alpha + \hat p\beta X_{t-1} + \hat pU_{t-1}$$
If we assume that $\hat p = 1$ (perfect positive autocorrelation) we are left with
$$Y_{t-1} = \alpha + \beta X_{t-1} + U_{t-1}$$
Subtract this equation from the original:
$$(Y_t - Y_{t-1}) = (\alpha - \alpha) + \beta(X_t - X_{t-1}) + (U_t - U_{t-1})$$
Let
$$Y_t^* = Y_t - Y_{t-1}, \qquad X_t^* = X_t - X_{t-1}, \qquad v_t = U_t - U_{t-1}$$
Then we can write
$$Y_t^* = \beta X_t^* + v_t$$
Here α is suppressed, and the equation passes through the origin.
Suppose instead one assumes perfect negative autocorrelation, i.e. $\hat p = -1$.
The original model is
$$Y_t = \alpha + \beta X_t + U_t$$
Lag the original model by one period:
$$Y_{t-1} = \alpha + \beta X_{t-1} + U_{t-1}$$
Since $\hat p = -1$, multiply the lagged equation by (-1):
$$-Y_{t-1} = -\alpha - \beta X_{t-1} - U_{t-1}$$
Subtract it from the original model:
$$Y_t - (-Y_{t-1}) = \alpha - (-\alpha) + \beta X_t - (-\beta X_{t-1}) + U_t - (-U_{t-1})$$
$$Y_t + Y_{t-1} = 2\alpha + \beta(X_t + X_{t-1}) + (U_t + U_{t-1})$$
Divide both sides by 2 (to free the intercept term α):
$$\frac{Y_t + Y_{t-1}}{2} = \alpha + \beta\,\frac{X_t + X_{t-1}}{2} + \frac{U_t + U_{t-1}}{2}$$
This model is called the two-period moving average regression model, because we are regressing one moving average, $(Y_t + Y_{t-1})/2$, on another, $(X_t + X_{t-1})/2$.
The method of first differences is quite popular in applied research for its simplicity. The problem with this approach is that it depends upon the assumption of perfect positive or perfect negative autocorrelation in the data. The question then arises: how do we know whether the value of $\hat p$ is equal to +1 or -1? The answer is to estimate $\hat p$ in the following ways.

Method II. Estimation of $\hat p$ from the d* statistic

Another crude method for estimating the coefficient of the autoregressive scheme is to solve the relation
$$d \approx 2(1 - \hat p)$$
for $\hat p$. Given a calculated value d* of the d-statistic,
$$\hat p = 1 - \frac{1}{2}d^*$$
$\hat p$ will not be accurate if the sample size is small, since the above relation holds for large samples. For small samples Theil and Nagar have suggested the following relation:
$$\hat p = \frac{n^2\left(1 - \frac{d}{2}\right) + k^2}{n^2 - k^2}$$
where n = total number of observations
d = Durbin-Watson statistic
k = number of coefficients (including the intercept) to be estimated
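
Both relations are easy to evaluate. In the sketch below the function names are our own, and the inputs (d = 1.442, n = 15, k = 2) are used purely as illustrative values.

```python
def rho_from_d(d):
    """Large-sample approximation: rho = 1 - d/2."""
    return 1 - d / 2


def rho_theil_nagar(d, n, k):
    """Small-sample correction suggested by Theil and Nagar.

    n = number of observations,
    k = number of coefficients estimated (including the intercept).
    """
    return (n ** 2 * (1 - d / 2) + k ** 2) / (n ** 2 - k ** 2)


d, n, k = 1.442, 15, 2            # illustrative inputs
print(round(rho_from_d(d), 3))
print(round(rho_theil_nagar(d, n, k), 3))
```

As n grows, the k² terms become negligible and the two estimates converge.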

Example - Given the following model Yt = α + βXt + Ut and the data for Y and X in Table 5.7

Table 5.7
  t      Y     X    x²=(X-X̄)²   y²=(Y-Ȳ)²    xy    Ŷ=α̂+β̂X   e=(Y-Ŷ)      e²
  1      2     1       49          25        35     0.63      1.37     1.8769
  2      2     2       36          25        30     1.54      0.46     0.2116
  3      2     3       25          25        25     2.45     -0.45     0.2025
  4      1     4       16          36        24     3.36     -2.36     5.5696
  5      3     5        9          16        12     4.27     -1.27     1.6129
  6      5     6        4           4         4     5.18     -0.18     0.0324
  7      6     7        1           1         1     6.09     -0.09     0.0081
  8      6     8        0           1         0     7        -1        1
  9     10     9        1           9         3     7.91      2.09     4.3681
 10     10    10        4           9         6     8.82      1.18     1.3924
 11     10    11        9           9         9     9.73      0.27     0.0729
 12     12    12       16          25        20    10.64      1.36     1.8496
 13     15    13       25          64        40    11.55      3.45    11.9025
 14     10    14       36           9        18    12.46     -2.46     6.0516
 15     11    15       49          16        28    13.37     -2.37     5.6169
Sum    105   120      280         274       255                       41.768
Average  7     8

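From the table we get the following values
$$\bar Y = 7, \quad \bar X = 8, \quad \sum y_i^2 = 274, \quad \sum x_i^2 = 280, \quad \sum x_iy_i = 255, \quad \sum e_i^2 = 41.768$$
$$\hat\beta = \frac{\sum x_iy_i}{\sum x_i^2} = \frac{255}{280} = 0.91$$
$$\hat\alpha = \bar Y - \hat\beta\bar X = 7 - (0.91)(8) = 7 - 7.28 = -0.28$$
$$r^2 = \hat\beta\,\frac{\sum x_iy_i}{\sum y_i^2} = 0.91\,\frac{255}{274} = 0.85$$
$$\hat Y_t = -0.28 + 0.91X_t$$
$$S.E(\hat\beta) = \sqrt{\frac{\sum e_i^2}{n-k}\cdot\frac{1}{\sum x_i^2}} = \sqrt{\frac{41.768}{3640}} = 0.107$$
where n = 15 and k = 2 ($\hat\alpha$ and $\hat\beta$).
$$S.E(\hat\alpha) = \sqrt{\frac{\sum e_i^2}{n-k}\cdot\frac{\sum X_i^2}{n\sum x_i^2}} = \sqrt{\frac{41.768\times1240}{(13)(15)(280)}} = \sqrt{\frac{51792}{54600}} = \sqrt{0.9486}$$
$$S.E(\hat\alpha) = 0.974$$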
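
The calculations for this example can be checked with a short script. The sketch below uses the Y and X values of Table 5.7; the variable names are our own. Because the code keeps the slope unrounded (about 0.9107 rather than 0.91), individual residuals can differ in the second decimal place from those tabulated.

```python
import math

# Data from Table 5.7
Y = [2, 2, 2, 1, 3, 5, 6, 6, 10, 10, 10, 12, 15, 10, 11]
X = list(range(1, 16))

n, k = len(Y), 2                      # k counts the intercept and the slope
x_bar, y_bar = sum(X) / n, sum(Y) / n

sxx = sum((x - x_bar) ** 2 for x in X)                       # sum of x^2
syy = sum((y - y_bar) ** 2 for y in Y)                       # sum of y^2
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))   # sum of xy

beta_hat = sxy / sxx
alpha_hat = y_bar - beta_hat * x_bar
residuals = [y - (alpha_hat + beta_hat * x) for x, y in zip(X, Y)]
sse = sum(e * e for e in residuals)                          # sum of e^2
r_squared = beta_hat * sxy / syy

s2 = sse / (n - k)                                           # residual variance
se_beta = math.sqrt(s2 / sxx)
se_alpha = math.sqrt(s2 * sum(x * x for x in X) / (n * sxx))

print(round(beta_hat, 4), round(alpha_hat, 4), round(r_squared, 2))
print(round(sse, 3), round(se_beta, 3), round(se_alpha, 3))
```

The standard error of the intercept is the square root of the variance expression, so it should be read from `se_alpha` rather than from the ratio under the root.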

Table 5.8 (derived from Table 5.7)
  t      et      et-1       et²     et-et-1   (et-et-1)²   (et-1)²
  1     1.37     ----     1.8769     ----       ----        ----
  2     0.46     1.37     0.2116    -0.91      0.8281      1.8769
  3    -0.45     0.46     0.2025    -0.91      0.8281      0.2116
  4    -2.36    -0.45     5.5696    -1.91      3.6481      0.2025
  5    -1.27    -2.36     1.6129     1.09      1.1881      5.5696
  6    -0.18    -1.27     0.0324     1.09      1.1881      1.6129
  7    -0.09    -0.18     0.0081     0.09      0.0081      0.0324
  8    -1       -0.09     1         -0.91      0.8281      0.0081
  9     2.09    -1        4.3681     3.09      9.5481      1
 10     1.18     2.09     1.3924    -0.91      0.8281      4.3681
 11     0.27     1.18     0.0729    -0.91      0.8281      1.3924
 12     1.36     0.27     1.8496     1.09      1.1881      0.0729
 13     3.45     1.36    11.9025     2.09      4.3681      1.8496
 14    -2.46     3.45     6.0516    -5.91     34.9281     11.9025
 15    -2.37    -2.46     5.6169     0.09      0.0081      6.0516
Sum                      41.768               60.2134     36.1511

Test for the existence of autocorrelation using the Durbin-Watson test

$$d^* = \frac{\sum(e_t - e_{t-1})^2}{\sum e_t^2}$$
$$\sum e_t^2 = 41.768, \qquad \sum(e_t - e_{t-1})^2 = 60.2134$$
$$d^* = \frac{60.2134}{41.768} = 1.442$$
This is the calculated value of d. The table values of d at the 5% significance level with n = 15 and one explanatory variable (k = 1) are dL = 1.08 and dU = 1.36. Since d* (1.44) is greater than dU (1.36) and less than 4-dU (2.64), the calculated statistic lies between dU and 4-dU, so there is no autocorrelation in the given sample of X and Y.

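[Figure: distribution F(d) of the d-statistic with dL = 1.08, dU = 1.36 and 4-dU = 2.64 marked on the d axis; the region between dU and 4-dU is labelled "No autocorrelation".]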
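
The d-statistic of this example can be recomputed directly from the residuals of Table 5.7. The sketch below is illustrative; the function name is our own, and the residuals are the rounded values printed in the table.

```python
def durbin_watson(e):
    """d = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2."""
    numerator = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    denominator = sum(v * v for v in e)
    return numerator / denominator


# Residuals e = Y - Y-hat from Table 5.7
e = [1.37, 0.46, -0.45, -2.36, -1.27, -0.18, -0.09, -1.0,
     2.09, 1.18, 0.27, 1.36, 3.45, -2.46, -2.37]

d = durbin_watson(e)
rho_approx = 1 - d / 2        # from d ≈ 2(1 - rho)
print(round(d, 3), round(rho_approx, 3))
```

Note that the numerator starts at t = 2, so the first residual has no difference term.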

Example 2
Suppose hypothetical data on consumption expenditure and disposable income are given in Table 5.9 and the function to be estimated is
$$C_t = \alpha + \beta Y_t + U_t$$
where Ct = consumption expenditure
Yt = disposable income

Table 5.9
S.n    Ct      Yt        c          y         c²         y²         cy         ĉ     et=(C-ĉ)   et²     et-1   (et-et-1)  (et-et-1)²  (et-1)²  et*et-1
 1    206.3   226.5   -140.647   -152.84   19781.69   23358.84   21495.99   208.48   -2.18     4.75    ----     ----       ----       0.00     0.00
 2    216.7   238.6   -130.247   -140.74   16964.39   19806.62   18330.50   219.44   -2.74     7.52   -2.18    -0.56       0.32       4.75     5.97
 3    230     252.6   -116.947   -126.74   13676.69   16062.01   14821.45   232.13   -2.13     4.52   -2.74     0.62       0.38       7.52     5.83
 4    236.5   257.4   -110.447   -121.94   12198.63   14868.39   13467.51   236.47    0.03     0.00   -2.13     2.15       4.63       4.52    -0.05
 5    254.4   275.3    -92.5474  -104.04    8565.02   10823.49    9628.26   252.69    1.71     2.92    0.03     1.68       2.83       0.00     0.04
 6    266.7   293.2    -80.2474   -86.14    6439.65    7419.41    6912.19   268.91   -2.21     4.88    1.71    -3.92      15.35       2.92    -3.77
 7    281.4   308.5    -65.5474   -70.84    4296.46    5017.74    4643.12   282.77   -1.37     1.88   -2.21     0.84       0.70       4.88     3.03
 8    290.1   318.8    -56.8474   -60.54    3231.63    3664.61    3441.31   292.10   -2.00     4.01   -1.37    -0.63       0.40       1.88     2.75
 9    311.2   337.3    -35.7474   -42.04    1277.88    1767.03    1502.68   308.86    2.34     5.46   -2.00     4.34      18.83       4.01    -4.68
10    325.2   350      -21.7474   -29.34     472.95     860.60     637.98   320.37    4.83    23.33    2.34     2.49       6.22       5.46    11.28
11    335.2   364.4    -11.7474   -14.94     138.00     223.08     175.46   333.42    1.78     3.18    4.83    -3.05       9.28      23.33     8.61
12    355.1   385.5      8.1526     6.16      66.46      37.99      50.25   352.53    2.57     6.59    1.78     0.78       0.61       3.18     4.58
13    375     404.6     28.0526    25.26     786.95     638.27     708.72   369.84    5.16    26.65    2.57     2.60       6.74       6.59    13.25
14    401.2   438.1     54.2526    58.76    2943.34    3453.21    3188.10   400.19    1.01     1.02    5.16    -4.15      17.23      26.65     5.22
15    432.8   473.2     85.8526    93.86    7370.67    8810.45    8058.47   431.99    0.81     0.66    1.01    -0.20       0.04       1.02     0.82
16    466.3   511.9    119.3526   132.56   14245.04   17573.21   15821.86   467.05   -0.75     0.56    0.81    -1.56       2.44       0.66    -0.61
17    492.1   546.3    145.1526   166.96   21069.28   27876.98   24235.26   498.22   -6.12    37.43   -0.75    -5.37      28.80       0.56     4.60
18    536.2   591      189.2526   211.66   35816.55   44801.65   40057.96   538.72   -2.52     6.33   -6.12     3.60      12.97      37.43    15.39
19    579.6   634.2    232.6526   254.86   54127.23   64955.66   59294.77   577.86    1.74     3.04   -2.52     4.26      18.15       6.33    -4.39
Sum   6592    7207.4                       223468.51  272019.24  246471.84                   144.73                      145.92     141.68    67.87

where c = Ct - C̄ and y = Yt - Ȳ

From the table we find the following information
$$\sum C_t = 6592, \qquad \sum Y_t = 7207.4$$
$$\bar C = 346.95, \qquad \bar Y = 379.3368$$
$$\sum c_t^2 = 223{,}468.5, \qquad \sum y_t^2 = 272{,}019.2, \qquad \sum c_ty_t = 246{,}471.8$$
$$\hat\beta = \frac{\sum c_ty_t}{\sum y_t^2} = \frac{246{,}471.8}{272{,}019.2} = 0.906$$
$$\hat\alpha = \bar C - \hat\beta\bar Y = 346.95 - (0.906)(379.3368) = 3.27$$
$$\hat C_t = 3.27 + 0.906Y_t$$
$$R^2 = \frac{\hat\beta\sum c_ty_t}{\sum c_t^2} = 0.99$$
The estimated consumption function shows that almost all the variation in consumption is explained by disposable income, because R² = 0.99 (99%). Let us examine whether the error term Ut is autocorrelated or not.
$$d = \frac{\sum(e_t - e_{t-1})^2}{\sum e_t^2}$$
$$\sum(e_t - e_{t-1})^2 = 145.916, \qquad \sum e_t^2 = 144.7274$$
$$d = \frac{145.916}{144.7274} = 1.0082$$
This is the calculated value of d. At the 5% significance level with n = 19 and k = 1 the table values are dL = 1.18 and dU = 1.40. Since our calculated d (1.008) falls below the lower bound dL, we reject the null hypothesis of no autocorrelation and accept the alternative that there is positive autocorrelation in the disturbance term.

5.2.7 Solution to remove autocorrelation in our model

Assume that there exists autocorrelation of the first-order form, i.e. e2 depends upon e1, e3 on e2 and so on. Then
$$e_t = Pe_{t-1} + v_t$$
where vt satisfies all the assumptions about Ut, and
$$\hat p = \frac{\sum e_te_{t-1}}{\sum e_{t-1}^2}$$
From the above data $\sum e_te_{t-1} = 67.8732$ and $\sum e_{t-1}^2 = 141.683$, so
$$\hat p = \frac{67.8732}{141.683} = 0.479$$
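
The estimate of the autoregressive coefficient can be recomputed from the residuals listed in Table 5.9. The sketch below uses the residuals as printed (rounded to two decimals), so the sums differ slightly from the unrounded figures quoted in the text; the function name is our own.

```python
def rho_hat(e):
    """Slope of the regression of e_t on e_{t-1} through the origin:
    sum(e_t * e_{t-1}) / sum(e_{t-1}^2), for t = 2..n."""
    numerator = sum(e[t] * e[t - 1] for t in range(1, len(e)))
    denominator = sum(e[t - 1] ** 2 for t in range(1, len(e)))
    return numerator / denominator


# Residuals et = C - C-hat from Table 5.9 (rounded to two decimals)
e = [-2.18, -2.74, -2.13, 0.03, 1.71, -2.21, -1.37, -2.00, 2.34, 4.83,
     1.78, 2.57, 5.16, 1.01, 0.81, -0.75, -6.12, -2.52, 1.74]

p_hat = rho_hat(e)
print(round(p_hat, 3))
```

The denominator runs over the lagged residuals only, so the last residual is excluded from it.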
Now we transform the consumption function as follows.
$$C_t = \alpha + \beta Y_t + U_t$$
Lag by one period:
$$C_{t-1} = \alpha + \beta Y_{t-1} + U_{t-1}$$
Multiply it by $\hat p$, which is 0.48 in our case:
$$0.48C_{t-1} = 0.48\alpha + 0.48\beta Y_{t-1} + 0.48U_{t-1}$$
Subtract it from the original consumption function and you will have
$$C_t - 0.48C_{t-1} = (1 - 0.48)\alpha + \beta(Y_t - 0.48Y_{t-1}) + (U_t - 0.48U_{t-1})$$
Now let
$$C_t^* = C_t - 0.48C_{t-1}$$
$$Y_t^* = Y_t - 0.48Y_{t-1}$$
$$v_t = U_t - 0.48U_{t-1}$$
First transform the previous data, i.e. Table 5.9. The new model to be estimated is
$$C_t^* = a + \beta Y_t^* + v_t$$
where a = (1 - 0.48)α. From the transformed data:
$$\sum c_t^{*2} = 36{,}975.91, \qquad \sum y_t^{*2} = 45{,}270.8, \qquad \sum c_t^*y_t^* = 40{,}854.01$$
$$\bar Y^* = 179.02, \qquad \bar C^* = 163.74$$

Table 5.10
sn    Ct-1    Yt-1    .48Ct-1   .48Yt-1     C*        Y*         c*          y*        c*²        y*²        c*y*
 1    216.7   238.6   104.016   114.528   102.284   111.972   -61.45289   -67.028   3776.458   4492.753   4119.064
 2    230     252.6   110.4     121.248   106.3     117.352   -57.43689   -61.648   3298.996   3800.476   3540.869
 3    236.5   257.4   113.52    123.552   116.48    129.048   -47.25689   -49.952   2233.214   2495.202   2360.576
 4    254.4   275.3   122.112   132.144   114.388   125.256   -49.34889   -53.744   2435.313   2888.418   2652.207
 5    266.7   293.2   128.016   140.736   126.384   134.564   -37.35289   -44.436   1395.238   1974.558   1659.813
 6    281.4   308.5   135.072   148.08    131.628   145.12    -32.10889   -33.88    1030.981   1147.854   1087.849
 7    290.1   318.8   139.248   153.024   142.152   155.476   -21.58489   -23.524    465.9074   553.3786   507.7629
 8    311.2   337.3   149.376   161.904   140.724   156.896   -23.01289   -22.104    529.5931   488.5868   508.6769
 9    325.2   350     156.096   168       155.104   169.3      -8.632889   -9.7       74.52677   94.09      83.73902
10    335.2   364.4   160.896   174.912   164.304   175.088     0.567111   -3.912      0.321615  15.30374   -2.218539
11    355.1   385.5   170.448   185.04    164.752   179.36      1.015111    0.36       1.030451   0.1296     0.36544
12    375     404.6   180       194.208   175.1     191.292    11.36311    12.292    129.1203   151.0933   139.6754
13    401.2   438.1   192.576   210.288   182.424   194.312    18.68711    15.312    349.2081   234.4573   286.137
14    432.8   473.2   207.744   227.136   193.456   210.964    29.71911    31.964    883.2256  1021.697    949.9417
15    466.3   511.9   223.824   245.712   208.976   227.488    45.23911    48.488   2046.577   2351.086   2193.554
16    492.1   546.3   236.208   262.224   230.092   249.676    66.35511    70.676   4403.001   4995.097   4689.714
17    536.2   591     257.376   283.68    234.724   262.62     70.98711    83.62    5039.17    6992.304   5935.942
18    579.6   634.2   278.208   304.416   257.992   286.584    94.25511   107.584   8884.026  11574.32   10140.34
sum                                       2947.264  3222.368   -2E-07       0.368   36975.91   45270.8    40854.01

where C* = Ct - .48Ct-1, Y* = Yt - .48Yt-1,
c* = C* - C̄*, y* = Y* - Ȳ*

$$ \hat{\beta} = \frac{\sum c^*_t y^*_t}{\sum c_t^{*2}} = \frac{40854.01}{36975.91} = 1.10488 $$
$$ \hat{a} = \bar{C}^* - \hat{\beta}\bar{Y}^* = 163.74 - (1.10488)(179.02) = -34.056 $$
$$ \hat{\alpha} = \frac{\hat{a}}{1 - \hat{P}} = \frac{-34.056}{1 - 0.48} = -65.49 $$

The regression model after the data are transformed can be written as

$$ \hat{C}^*_t = -34.056 + 1.10\,Y^*_t $$
Expressed in terms of the original variables, using $\hat{\alpha}$ as the intercept, the model is

$$ \hat{C}_t = -65.49 + 1.10\,Y_t $$
n=18
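
The estimates can be recomputed from the sums and means of the transformed data, which is a useful check on the arithmetic. The sketch below only restates the formulas for the slope, the transformed intercept and the original intercept; the variable names are chosen here for illustration.

```python
# Sums and means of the transformed data, as quoted from Table 5.10.
sum_cy = 40854.01      # sum of c* times y*
sum_cc = 36975.91      # sum of c* squared
mean_c_star = 163.74   # mean of C*
mean_y_star = 179.02   # mean of Y*
rho_hat = 0.48         # rounded estimate of P

beta_hat = sum_cy / sum_cc                       # slope of the transformed model
a_hat = mean_c_star - beta_hat * mean_y_star     # intercept of the transformed model
alpha_hat = a_hat / (1 - rho_hat)                # intercept of the original model

print(round(beta_hat, 5), round(a_hat, 3), round(alpha_hat, 2))
```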
Example 3: Consider the following models

Model A: $Y_t = \alpha + \beta_1 t + u_t$
Model B: $Y_t = \alpha + \beta_1 t + \beta_2 t^2 + u_t$
where t is time. Regression on data for 1949-1964 gave the following results.

Model A: $\hat{Y}_t = 0.4529 - 0.041t$
t = (-3.9608)   R2 = 0.5284   d = 0.8252

Model B: $\hat{Y}_t = 0.4786 - 0.0127t + 0.0005t^2$
t = (-3.2724) (2.777)   R2 = 0.6629   d = 1.82

a) Is there serial correlation in model A or in model B? The period is 1949-1964, so n = 16 years is the sample size in both models. In model A there is only one explanatory variable, K = 1. At the 5% significance level with n = 16 and K = 1, the d-table gives dL = 1.10 and dU = 1.37. The calculated d for model A is 0.8252. Since it is less than the lower bound dL, we reject the null hypothesis and accept that in model A the random term Ut is positively autocorrelated.
b) In model B there are two explanatory variables, K = 2, with n = 16. From the table at the 5% significance level, dL = 0.98 and dU = 1.54. The calculated d for model B is 1.82, which is greater than dU and less than 4 - dU = 2.46. We therefore do not reject the null hypothesis: there is no evidence of positive or negative autocorrelation in the random term of model B.

5.3 Multicollinearity

One of the assumptions of OLS is that the explanatory variables are not perfectly linearly correlated. That is, in the model
$$ Y_t = \alpha + \beta_1 X_{1t} + \beta_2 X_{2t} + \beta_3 X_{3t} + U_t $$
we assume that the explanatory variables X1, X2 and X3 are not perfectly correlated, meaning rX1X2 ≠ 1, rX1X3 ≠ 1 and rX2X3 ≠ 1. If this assumption does not hold, we say that there is perfect multicollinearity among the explanatory variables. If the explanatory variables are multicollinear, we cannot identify their independent effects on the explained variable.
Suppose
$$ Y_t = \alpha + \beta_1 L_t + \beta_2 K_t + \beta_3 Z_t + \beta_4 T_t + U_t $$
where Y is output, L is labour, K is capital, Z is land and T is technology.
If rLK = 1, we cannot trace the independent contribution of each input to output. When the explanatory variables are perfectly correlated with each other, the method of OLS breaks down. In theory there are two extreme situations in the relationship between explanatory variables.
a) rx1x2 = 1, i.e. the explanatory variables are perfectly correlated. In this case the parameters estimated by OLS, the β̂'s, are indeterminate.
b) rx1x2 = 0, i.e. the explanatory variables are not correlated with each other at all. Such variables are called orthogonal.
In practice neither of these extreme cases is often met. In most cases there is some degree of intercorrelation among the explanatory variables, due to the interdependence of many economic variables over time. The following basic points should be kept in mind when dealing with multicollinearity.
1) Multicollinearity is a sample phenomenon. When we specify a theoretical or population regression model, we believe that every explanatory variable included in the model has an independent effect (influence) on the dependent variable. In the sample, however, one of two things may happen.
• All the explanatory variables are so highly correlated in our sample that we cannot isolate their individual influences on the dependent variable.
• Some of the explanatory variables are correlated with each other in the sample.
In either case the assumption of independence among the explanatory variables is not satisfied.
2) Multicollinearity is a question of degree and not of kind. It is not a question of whether the correlation among the explanatory variables is negative or positive; what matters is the degree of correlation among them.
3) Multicollinearity is a problem that arises from the presence of linear relationships among the explanatory variables.
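
The difference between perfect and merely high collinearity can be seen numerically. The following sketch uses NumPy and small made-up series (all values are hypothetical): with X2 = kX1 the data matrix loses a rank, so the normal equations have no unique solution, whereas a highly but not perfectly correlated X2 keeps full rank.

```python
import numpy as np

# Hypothetical data: x2_perfect is an exact multiple of x1, x2_high is only close to one.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2_perfect = 2.0 * x1
x2_high = np.array([2.1, 3.9, 6.2, 7.8, 10.1])


def design(a, b):
    """Data matrix with a constant column and two explanatory variables."""
    return np.column_stack([np.ones_like(a), a, b])


rank_perfect = np.linalg.matrix_rank(design(x1, x2_perfect))
rank_high = np.linalg.matrix_rank(design(x1, x2_high))
r_high = np.corrcoef(x1, x2_high)[0, 1]

print("rank with perfect collinearity:", rank_perfect, "of 3 columns")
print("rank with high collinearity:   ", rank_high, "of 3 columns")
print("correlation in the second case:", round(r_high, 4))
```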

5.3.1 Sources of multicollinearity

The problem of multicollinearity may arise for various reasons.
1st: Because of their inherent characteristics, economic variables tend to move together over time. Economic magnitudes are influenced by the same factors, and once such factors become operative in the economy, all variables tend to change in the same direction.
Ex. During a boom, profit, income, saving, output, employment, investment, consumption, prices etc. tend to rise (move in the same direction), and they fall in periods of recession. In time series data, growth and trend factors are the main cause of multicollinearity.
2nd: Multicollinearity may arise from the use of lagged explanatory variables in the regression model.
Ex. $C_t = \alpha + \beta_1 Y_t + \beta_2 Y_{t-1} + U_t$
where Ct is consumption expenditure, Yt is current income and Yt-1 is previous income. Here Yt and Yt-1 may be correlated with each other. Hence the problem of multicollinearity may be observed in distributed lag models.
3rd: Multicollinearity is usually connected with time series data, because economic variables move together in the same direction.
4th: Multicollinearity is also quite frequent in cross-sectional data. Consider the production function
$$ Q_t = A L_t^{\alpha} K_t^{\beta} e^{u_t} $$
where Qt is output, Lt is labour and Kt is capital.
If you collect data from different manufacturing firms at a single point in time, you will find that in large firms both capital and labour are very high compared with small firms, i.e. capital and labour tend to move in the same direction and will be correlated with each other.
5th: An over-determined model. When the model has more explanatory variables than observations, there will be multicollinearity. This could happen in medical research, where information on a large number of variables may be collected about a small number of patients.

5.3.2 Consequence of multicollinearity

The presence of multicollinearity affects the least squares estimators and makes them unreliable. The problem of multicollinearity must therefore be regarded as a 'black mark' that reduces confidence in conventional tests of the least squares estimators.
The basic consequences of perfect multicollinearity are
a) The least squares estimators are indeterminate.
b) The variances and covariances of the estimators become infinitely large, which means that the standard errors of these estimates are infinitely large. We prove these as follows.
Suppose the relationship to be estimated is
$$ Y_t = \alpha + \beta_1 X_{1t} + \beta_2 X_{2t} + U_t $$
Assume further that X1 and X2 are related by X2 = kX1, where k is an arbitrary non-zero number. In deviation form, the formulae for the estimates β̂1 and β̂2 are

$$ \hat{\beta}_1 = \frac{(\sum x_1 y)(\sum x_2^2) - (\sum x_2 y)(\sum x_1 x_2)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2} $$

$$ \hat{\beta}_2 = \frac{(\sum x_2 y)(\sum x_1^2) - (\sum x_1 y)(\sum x_1 x_2)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2} $$

Now substitute $kx_1$ in place of $x_2$ and you will find that

$$ \hat{\beta}_1 = \frac{(\sum x_1 y)(k^2 \sum x_1^2) - (k \sum x_1 y)(k \sum x_1^2)}{(\sum x_1^2)(k^2 \sum x_1^2) - (k \sum x_1^2)^2} = \frac{k^2 (\sum x_1 y)(\sum x_1^2) - k^2 (\sum x_1 y)(\sum x_1^2)}{k^2 (\sum x_1^2)^2 - k^2 (\sum x_1^2)^2} = \frac{0}{0} $$

Again

$$ \hat{\beta}_2 = \frac{(k \sum x_1 y)(\sum x_1^2) - (\sum x_1 y)(k \sum x_1^2)}{k^2 (\sum x_1^2)^2 - k^2 (\sum x_1^2)^2} = \frac{k (\sum x_1 y)(\sum x_1^2) - k (\sum x_1 y)(\sum x_1^2)}{k^2 (\sum x_1^2)^2 - k^2 (\sum x_1^2)^2} = \frac{0}{0} $$
β̂1 and β̂2 are indeterminate, i.e. there is no way of finding separate values for β̂1 and β̂2. The standard errors of the estimates also become infinitely large.
Suppose we have
$$ Y_t = \alpha + \beta_1 X_{1t} + \beta_2 X_{2t} + U_t $$
If X1 and X2 are perfectly correlated (X2 = kX1), the variances of β̂1 and β̂2 are obtained as follows. In general,
$$ Var(\hat{\beta}_1) = \frac{\sigma_u^2 \sum x_2^2}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2} $$

Substitute $kx_1$ in place of $x_2$:

$$ Var(\hat{\beta}_1) = \frac{\sigma_u^2 k^2 \sum x_1^2}{(\sum x_1^2)(k^2 \sum x_1^2) - (k \sum x_1^2)^2} = \frac{\sigma_u^2 k^2 \sum x_1^2}{k^2 (\sum x_1^2)^2 - k^2 (\sum x_1^2)^2} = \frac{\sigma_u^2 k^2 \sum x_1^2}{0} = \infty $$

Similarly, for the variance of β̂2,

$$ Var(\hat{\beta}_2) = \frac{\sigma_u^2 \sum x_1^2}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2} $$

Substitute $kx_1$ in place of $x_2$:

$$ Var(\hat{\beta}_2) = \frac{\sigma_u^2 \sum x_1^2}{(\sum x_1^2)(k^2 \sum x_1^2) - (k \sum x_1^2)^2} = \frac{\sigma_u^2 \sum x_1^2}{k^2 (\sum x_1^2)^2 - k^2 (\sum x_1^2)^2} = \frac{\sigma_u^2 \sum x_1^2}{0} = \infty $$
Thus the variances of the estimates become infinite unless $\sigma_u^2 = 0$.
To summarize, when multicollinearity is high but not perfect:
a) Although we can obtain the OLS estimates, their standard errors tend to become large as the degree of correlation between the variables increases.
b) Because of the large standard errors, the probability of accepting a false hypothesis (i.e. a type II error) increases.
c) The OLS estimates and their standard errors become very sensitive to the slightest change in the sample data.
d) If multicollinearity is high, one may obtain a high R2 although none or very few of the estimated regression coefficients are statistically significant.

5.3.3 Test for detecting multicollinearity

1) Method based on Frisch's confluence analysis, or bunch-map analysis. According to this test the seriousness of multicollinearity may be judged from
a) the coefficient of determination R2,
b) the partial correlation coefficients rx1x2,
c) the standard errors of the regression coefficients.
But none of these symptoms by itself is a satisfactory indicator of multicollinearity, because
i. large standard errors may arise for various reasons, not only because of multicollinearity;
ii. a high rx1x2 is a sufficient but not a necessary condition for the existence of multicollinearity;
iii. R2 may be high and yet the estimates may be insignificant and imprecise.
However, a combination of all these criteria should help in detecting multicollinearity, as suggested in Frisch's test. To gain as much knowledge as possible about the seriousness of multicollinearity, the following method is suggested.
If the regression model is assumed to be
$$ Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + U_i $$
that is, Y = f(X1, X2, X3), the procedure is to regress Y on X1, X2 and X3 separately, i.e. stepwise: first regress Y on X1, then on X2 and finally on X3. Thus
a) We examine the results of these elementary regressions on the basis of a priori and statistical criteria, and choose the elementary regression that appears to give the most plausible results on both. Then we gradually insert additional variables and examine their effects on the individual coefficients, on their standard errors and on R2. A new variable is classified as useful, superfluous or detrimental as follows.
i. If the new variable improves R2 without rendering the individual coefficients unacceptable (wrong) on a priori considerations, the variable is considered useful and is retained as an explanatory variable.
ii. If the new variable does not improve R2 and does not affect the values of the individual coefficients to any considerable extent, it is considered superfluous and is rejected.
iii. If the new variable considerably affects the signs or values of the coefficients, it is considered detrimental.
If the individual coefficients are affected in such a way as to become unacceptable on theoretical, a priori considerations, this is a warning that multicollinearity exists. In this case the new variable is important, but because of its intercorrelation with the other explanatory variables its influence cannot be assessed statistically by OLS. This does not mean that we must reject the detrimental variable. If we omit the detrimental variable completely to avoid multicollinearity,
a) we must bear in mind that the influence of the detrimental variable will be absorbed by the other coefficients, and
b) its influence may be absorbed by the random term, which may then become correlated with the variables left in the model, i.e. E(XiUi) ≠ 0. This violates the OLS assumption that Cov(Ui, Xi) = 0.
2) Examination of partial correlations. If in the regression of Y on X2, X3 and X4 the R2 is very high but the partial correlation coefficients r2X3X4, r2X2X4 and r2X2X3 are comparatively low, this may suggest that the variables X2, X3 and X4 are highly intercorrelated.
3) Auxiliary regressions. Here we regress each Xi on the remaining X variables and compute the corresponding R2, which we denote by R2i. Each of these regressions is called an auxiliary regression, that is, auxiliary to the main regression of Y on the X variables. Then, based on the relation between the F test and R2, the statistic

$$ F_i = \frac{R^2_{x_i \cdot x_2 x_3 \ldots x_k} / (k - 2)}{(1 - R^2_{x_i \cdot x_2 x_3 \ldots x_k}) / (n - k + 1)} $$

follows the F distribution with k - 2 numerator and n - k + 1 denominator degrees of freedom. In this equation
n is the sample size,
k is the number of explanatory variables including the intercept term,
$R^2_{x_i \cdot x_2 x_3 \ldots x_k}$ is the coefficient of determination in the regression of variable Xi on the remaining X variables.
If the computed F exceeds the table value of F at the chosen level of significance, this indicates that Xi is collinear with the other X variables. If it does not, we say that the particular Xi is not collinear, and we may retain the variable in the model. This rule is somewhat simplified if we use Klein's rule of thumb, which states that we should suspect multicollinearity if the R2 computed from an auxiliary regression is greater than the overall R2 obtained from the regression of Y on all regressors. The above test shows us the location of the multicollinearity.
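
A minimal sketch of the auxiliary-regression test using NumPy is given below. The function names `r_squared`, `auxiliary_f` and `kleins_rule` and the data are hypothetical; the computed F still has to be compared with the tabulated F value at the chosen significance level.

```python
import numpy as np


def r_squared(y, Z):
    """R^2 from an OLS regression of y on Z (Z already contains a constant column)."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)


def auxiliary_f(X, i):
    """Auxiliary R^2 and F statistic for column i of X regressed on the other columns.

    X holds the explanatory variables only (no constant column).
    """
    n, m = X.shape
    k = m + 1  # explanatory variables including the intercept, as k is defined in the text
    Z = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
    r2 = r_squared(X[:, i], Z)
    f = (r2 / (k - 2)) / ((1.0 - r2) / (n - k + 1))
    return r2, f


def kleins_rule(aux_r2, overall_r2):
    """Klein's rule of thumb: suspect collinearity if the auxiliary R^2 exceeds the overall R^2."""
    return aux_r2 > overall_r2


# Hypothetical data: x2 is close to 2 * x1, x3 is unrelated to both.
x1 = np.arange(1.0, 9.0)
x2 = np.array([2.2, 3.9, 6.1, 8.2, 9.8, 12.1, 13.9, 16.2])
x3 = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
X = np.column_stack([x1, x2, x3])

r2_x1, f_x1 = auxiliary_f(X, 0)
print("auxiliary R2 for X1:", round(r2_x1, 4), " F:", round(f_x1, 1))
```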
4) Tolerance and variance inflation factor. For the multiple regression model
$$ Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + U_i $$
the variances and covariance of the estimators can be written as
$$ Var(\hat{\beta}_1) = \frac{\sigma_u^2}{\sum x_{1i}^2 (1 - r^2_{x_1 x_2})} $$
$$ Var(\hat{\beta}_2) = \frac{\sigma_u^2}{\sum x_{2i}^2 (1 - r^2_{x_1 x_2})} $$
$$ Cov(\hat{\beta}_1, \hat{\beta}_2) = \frac{-r_{x_1 x_2} \sigma_u^2}{(1 - r^2_{x_1 x_2}) \sqrt{\sum x_{1i}^2} \sqrt{\sum x_{2i}^2}} $$
From these formulae, as rx1x2 tends towards 1, that is, as collinearity increases, the variances of the two estimators increase, and in the limit when rx1x2 = 1 they are infinite. It is equally clear that as rx1x2 increases towards 1, the covariance of the two estimators also increases in absolute value. The speed with which the variances and covariance increase can be seen from the variance inflation factor (VIF), which is defined as
$$ VIF = \frac{1}{1 - r^2_{x_1 x_2}} $$
The VIF shows how the variance of an estimator is inflated by the presence of multicollinearity. As r2x1x2 approaches 1, the VIF approaches infinity. That is, as the extent of collinearity increases, the variance of an estimator increases, and in the limit it can become infinite. If there is no collinearity (rx1x2 = 0), the VIF is 1. We can then write the variances as follows:
$$ Var(\hat{\beta}_1) = \frac{\sigma^2}{\sum x_{1i}^2} \, VIF $$

$$ Var(\hat{\beta}_2) = \frac{\sigma^2}{\sum x_{2i}^2} \, VIF $$

This shows that the variances of β̂1 and β̂2 are directly proportional to the VIF. We could also use what is known as the measure of tolerance, which is defined as
$$ TOL_j = 1 - R_j^2 = \frac{1}{VIF_j} $$
TOLj is 1 if there is no collinearity, whereas it is zero when there is perfect collinearity.
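
How quickly the VIF grows as the correlation approaches 1 is easy to tabulate. The sketch below implements the two-regressor formulas given above; the function names and the chosen values of r are illustrative only.

```python
def vif(r):
    """Variance inflation factor for two regressors with correlation r."""
    return 1.0 / (1.0 - r ** 2)


def tolerance(r):
    """Tolerance, the reciprocal of the VIF."""
    return 1.0 - r ** 2


# Illustrative correlations between X1 and X2.
for r in (0.0, 0.5, 0.9, 0.99):
    print(f"r = {r:4.2f}   VIF = {vif(r):8.2f}   TOL = {tolerance(r):6.4f}")
```

The variance of each estimator under collinearity is the variance it would have with orthogonal regressors multiplied by the VIF.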
5) Computation of t-ratios to test the pattern of multicollinearity. The t-test helps to detect those variables which are the cause of multicollinearity. The test is performed on the partial correlation coefficients through the following hypothesis-testing procedure.
The null hypothesis is H0: $r_{x_i x_j \cdot x_1 x_2 x_3 \ldots x_k} = 0$
The alternative is H1: $r_{x_i x_j \cdot x_1 x_2 x_3 \ldots x_k} \neq 0$
In the three-variable model,
$$ r^2_{x_1 x_2 \cdot x_3} = \frac{(r_{12} - r_{13} r_{23})^2}{(1 - r_{13}^2)(1 - r_{23}^2)} $$
$$ r^2_{x_1 x_3 \cdot x_2} = \frac{(r_{13} - r_{12} r_{32})^2}{(1 - r_{12}^2)(1 - r_{32}^2)} $$
$$ r^2_{x_2 x_3 \cdot x_1} = \frac{(r_{23} - r_{21} r_{31})^2}{(1 - r_{21}^2)(1 - r_{31}^2)} $$
From these we can compute the t-statistic for each partial correlation as follows:
$$ t^* = \frac{r_{x_i x_j \cdot x_1 x_2 \ldots x_k} \sqrt{n - k}}{\sqrt{1 - r^2_{x_i x_j \cdot x_1 x_2 \ldots x_k}}} $$
If the calculated t* is greater than the table value of t, reject the null hypothesis of no multicollinearity and accept that there is multicollinearity. If the calculated t* is less than the table value of t, accept the null hypothesis that there is no multicollinearity and reject the alternative that there is multicollinearity.
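
The partial correlation and its t-ratio can be computed as follows. This is a sketch for the three-variable case only; the simple correlations, n and k used in the example are hypothetical, and the critical t value must still be taken from the t-table.

```python
import math


def partial_corr(r12, r13, r23):
    """Partial correlation between variables 1 and 2, holding variable 3 constant."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


def t_statistic(r_partial, n, k):
    """t* = r * sqrt(n - k) / sqrt(1 - r^2) for a partial correlation r."""
    return r_partial * math.sqrt(n - k) / math.sqrt(1 - r_partial ** 2)


# Hypothetical simple correlations among X1, X2 and X3, with n = 20 and k = 3.
r12_3 = partial_corr(0.9, 0.8, 0.7)
t_star = t_statistic(r12_3, n=20, k=3)
print(round(r12_3, 4), round(t_star, 3))
```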

5.3.4 Solutions to the problem of multicollinearity

The solutions which may be adopted if multicollinearity exists in a function vary depending on
a) the severity of the multicollinearity,
b) the availability of other sources of data,
c) the importance of the factors which are multicollinear,
d) the purpose for which the function is being estimated, and other conditions.
To deal with the problem:
i. Some writers have suggested that if multicollinearity does not seriously affect the estimates of the coefficients, one may tolerate its presence in the function, although the integrity of the OLS estimates is to a certain extent impaired.
ii. Others have suggested that if multicollinearity affects some of the unimportant factors, one may exclude these factors from the function. In that case, however, specification error may be expected to undermine the BLUE character of the ordinary least squares estimates.
iii. Multicollinearity may affect only some of the β̂'s, while other estimates remain fairly stable and reliable. In this case
• the reliable β̂'s may be used for any purpose, i.e. forecasting or policy formulation;
• all the estimates may be used for forecasting, provided that the same multicollinearity pattern continues to exist in the forecast period.
Given all the above, if multicollinearity has a serious effect on the coefficient estimates of important variables, one should adopt one of the following corrective solutions.

1) Ridge regression: In this case a constant λ is added to the variances of the explanatory variables before solving the normal equations. This is found to reduce the problem of multicollinearity, but it is a mechanical solution, as there is no clear-cut method of choosing λ.
2) Increase the size of the sample: It is suggested that multicollinearity may be avoided or reduced by increasing the sample size. The reason is that as the sample size increases, the covariances among the parameter estimates are reduced. But one should remember that this is true only when the intercorrelation happens to exist in the sample but not in the population. If the variables are also intercorrelated in the population, increasing the size of the sample will not help to reduce the multicollinearity.
3) Dropping a variable(s): The problem of multicollinearity may be reduced or solved if we drop the variable(s) that is (are) collinear. But here we must be careful not to commit a specification bias or specification error.
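Let
$$ Y_i = \beta_1 X_{1i} + \beta_2 X_{2i} + U_i $$
For this model
$$ E(\hat{\beta}_1) = \beta_1 $$
$$ Var(\hat{\beta}_1) = \frac{\sigma_u^2}{\sum X_{1i}^2 (1 - r^2_{x_1 x_2})} $$
Now let us omit the variable X2, on the grounds that it is collinear with X1. We then have the following model:
$$ Y_i = \beta_1 X_{1i} + U_i $$
After omitting X2, the estimator is
$$ \hat{\beta}_1^* = \frac{\sum X_{1i} Y_i}{\sum X_{1i}^2} $$
In place of Yi substitute $\beta_1 X_{1i} + \beta_2 X_{2i} + U_i$:

$$ \hat{\beta}_1^* = \frac{\beta_1 \sum X_{1i}^2 + \beta_2 \sum X_{1i} X_{2i} + \sum X_{1i} U_i}{\sum X_{1i}^2} = \beta_1 + \beta_2 \frac{\sum X_{1i} X_{2i}}{\sum X_{1i}^2} + \frac{\sum X_{1i} U_i}{\sum X_{1i}^2} $$
By assumption $E(\sum X_{1i} U_i) = 0$, so
$$ E(\hat{\beta}_1^*) = \beta_1 + \beta_2 \frac{\sum X_{1i} X_{2i}}{\sum X_{1i}^2} $$
$$ Var(\hat{\beta}_1^*) = Var\left(\frac{\sum X_{1i} U_i}{\sum X_{1i}^2}\right) = \frac{\sigma_u^2 \sum X_{1i}^2}{(\sum X_{1i}^2)^2} = \frac{\sigma_u^2}{\sum X_{1i}^2} $$
Thus the estimator obtained after omitting the variable X2 is biased, but it has a smaller variance than β̂1 (the estimator from the complete model). Therefore, dropping a variable in the hope of eliminating the problem of multicollinearity may lead to biased estimators.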
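
The bias formula can be verified with a small numerical sketch. The data and the "true" coefficients below are made up, and the disturbance is set to zero so that the difference between the estimate and β1 is exactly the bias term β2 ΣX1X2 / ΣX1².

```python
import numpy as np

# Hypothetical data: x2 is correlated with x1; beta1 and beta2 are assumed true values.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
beta1, beta2 = 2.0, 3.0
y = beta1 * x1 + beta2 * x2          # no disturbance term, to isolate the bias

# Estimator from the model that omits x2 (regression through the origin).
beta1_star = np.sum(x1 * y) / np.sum(x1 ** 2)

# Value predicted by the expression for E(beta1*).
expected = beta1 + beta2 * np.sum(x1 * x2) / np.sum(x1 ** 2)

print(round(beta1_star, 4), round(expected, 4), "true beta1 =", beta1)
```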
4) Introducing an additional equation in the model. The problem of multicollinearity may be overcome by expressing explicitly the relationship between the multicollinear variables. Such a relation, in the form of an equation, may then be added to the original model. The addition of the new equation transforms our single-equation (original) model into a simultaneous equation model. The reduced form method can then be applied to avoid multicollinearity.
5) Use extraneous information: Extraneous information is information obtained from any source outside the sample being used for estimation. It may be available from economic theory or from empirical studies already conducted in the field in which we are interested. The following methods make use of extraneous information to deal with the problem of multicollinearity.
A) A priori information. Suppose we consider the following model
$$ Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + U_i $$
where Yi is consumption, X1i is income and X2i is wealth. Income and wealth may move together and create multicollinearity. But suppose we know a priori that $\beta_2 = 0.10\beta_1$, i.e. the rate of change of consumption with respect to wealth is one-tenth of the corresponding rate with respect to income. Then we can write the regression as
$$ Y_i = \alpha + \beta_1 X_{1i} + 0.10\beta_1 X_{2i} + U_i $$
$$ Y_i = \alpha + \beta_1 (X_{1i} + 0.10 X_{2i}) + U_i $$
Let $X^*_i = X_{1i} + 0.10 X_{2i}$ and substitute:
$$ Y_i = \alpha + \beta_1 X^*_i + U_i $$
Apply OLS to this model to obtain the estimate β̂1. Then, from the assumed relationship between β1 and β2, calculate β̂2 = 0.10 β̂1.
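
The restricted estimation can be sketched with NumPy as follows. The income and wealth figures and the "true" parameters are invented for illustration, and the data are generated without a disturbance term so that the recovered coefficients can be checked exactly.

```python
import numpy as np

# Hypothetical income and wealth series that move closely together.
income = np.array([100.0, 120.0, 150.0, 170.0, 200.0, 230.0])
wealth = np.array([1000.0, 1250.0, 1480.0, 1720.0, 1990.0, 2310.0])

# Assumed true parameters; the a priori restriction beta2 = 0.10 * beta1 holds by construction.
alpha, beta1 = 20.0, 0.6
consumption = alpha + beta1 * income + 0.10 * beta1 * wealth

# Combine the two collinear regressors into the single variable X*.
x_star = income + 0.10 * wealth
X = np.column_stack([np.ones_like(x_star), x_star])
(alpha_hat, beta1_hat), *_ = np.linalg.lstsq(X, consumption, rcond=None)
beta2_hat = 0.10 * beta1_hat         # recovered from the restriction

print(round(alpha_hat, 4), round(beta1_hat, 4), round(beta2_hat, 4))
```

The reliability of β̂2 obtained this way depends entirely on the a priori restriction being correct.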
B) Combining cross-sectional and time series data (pooling cross-section and time series data). Suppose we want to study the demand function for automobiles, and assume that we have time series and cross-sectional data on the number of cars sold, the average price of cars and consumers' income. Suppose the demand function is
$$ Y_t = \alpha P_t^{\beta_1} I_t^{\beta_2} e^{u_t} $$

where Yt is the demand for cars (number of cars sold), Pt is the average price of cars, It is income and t is time. Our objective is to estimate the price elasticity of demand β1 and the income elasticity β2. First transform the non-linear function into a linear function, i.e.
$$ \ln Y_t = \ln \alpha + \beta_1 \ln P_t + \beta_2 \ln I_t + u_t $$
In time series data the price and income variables generally tend to be highly collinear, i.e. one cannot separate the income and price effects on quantity demanded. On the other hand, it is not possible to obtain the price effect β1 from the cross-sectional data (because the price structure is the same for all consumers at a particular point in time). Under such conditions it is suggested to use pooling techniques, which avoid to a certain extent the problems associated with both cross-sectional and time series samples. The pooling technique can be outlined as follows.
• 1st stage. The cross-section sample is used to obtain an estimate β̂2* of the income coefficient, using
$$ \ln Y_t = \alpha + \beta_2^* \ln I_t + U_t $$
The influence of changes in income (It) on demand Yt is then eliminated from the time series data:
$$ Z_t = \ln Y_t - \hat{\beta}_2^* \ln I_t $$
• 2nd stage. The new variable Z is regressed on the price variable, using time series data, to estimate the price coefficient β1:
$$ Z_t = \alpha + \beta_1 \ln P_t + U_t $$

By combining the results, the relationship becomes
$$ \ln Y_t = \ln \hat{\alpha} + \hat{\beta}_1 \ln P_t + \hat{\beta}_2^* \ln I_t $$
where β̂1 is derived from the time series data and β̂2* is obtained from the cross-sectional data. By pooling we have skirted the multicollinearity between income and price. The estimators obtained by pooling time series and cross-sectional data in the manner just suggested may, however, create problems of interpretation, because we are implicitly assuming that the cross-sectionally estimated income elasticity β̂2* is the same as that which would be obtained from a pure time series analysis.
C) Transformation of variables: Suppose we have time series data on consumption (Ct), income (Yt) and wealth (Wt):
$$ C_t = \alpha + \beta_1 Y_t + \beta_2 W_t + U_t $$
In this model income and wealth may tend to move in the same direction and create multicollinearity. One way of minimizing this dependence is to proceed as follows. Take the lagged (t-1) form of the above model:
$$ C_{t-1} = \alpha + \beta_1 Y_{t-1} + \beta_2 W_{t-1} + U_{t-1} $$
Subtract it from the previous model:
$$ C_t - C_{t-1} = (\alpha - \alpha) + \beta_1 (Y_t - Y_{t-1}) + \beta_2 (W_t - W_{t-1}) + (U_t - U_{t-1}) $$
Let C*t = Ct - Ct-1
Y*t = Yt - Yt-1
W*t = Wt - Wt-1
Vt = Ut - Ut-1
$$ C^*_t = \beta_1 Y^*_t + \beta_2 W^*_t + V_t $$
This equation is known as the first difference form, because we run the regression not on the original variables but on the differences of successive values of the variables. The first difference regression model often reduces the severity of multicollinearity because, even though the levels of Y and W may be highly correlated, there is no a priori reason to believe that their first differences will also be highly correlated. The problem with this transformation is that the error term Vt may not satisfy one of the assumptions of the classical linear regression model, namely that the disturbances are not serially correlated.
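
A minimal NumPy sketch of the first-difference regression is given below. The income and wealth series and the coefficients are hypothetical, and no disturbance is added, so the differenced regression recovers the assumed coefficients exactly; the point of interest is the drop in correlation between the regressors after differencing.

```python
import numpy as np

# Hypothetical trending series for income and wealth.
income = np.array([100.0, 110.0, 125.0, 135.0, 150.0, 170.0])
wealth = np.array([500.0, 560.0, 610.0, 690.0, 740.0, 850.0])
consumption = 10.0 + 0.7 * income + 0.05 * wealth   # assumed coefficients, no disturbance

# First differences: one observation is lost and the intercept drops out.
c_star = np.diff(consumption)
X_star = np.column_stack([np.diff(income), np.diff(wealth)])
coef, *_ = np.linalg.lstsq(X_star, c_star, rcond=None)

r_levels = np.corrcoef(income, wealth)[0, 1]
r_diffs = np.corrcoef(np.diff(income), np.diff(wealth))[0, 1]

print("coefficients from differenced model:", np.round(coef, 4))
print("correlation in levels:", round(r_levels, 3), " in differences:", round(r_diffs, 3))
```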

Exercise unit 6:

1) a) What is meant by perfect multicollinearity? What is its effect?

b) What is meant by high but not perfect multicollinearity?
c) How can multicollinearity be detected? How can you avoid or reduce multicollinearity?
2) A researcher collects the following data from 15 firms about their capital, labour and output.

Firm Output in tonnes Labour employed Capital

1 2350 2334 1570
2 2470 2425 1850
3 2110 2230 1150
4 2560 2463 1940
5 2650 2565 2450
6 2240 2278 1340
7 2430 2380 1700
8 2530 2437 1860
9 2550 2446 1880
10 2450 2403 1790
11 2290 2301 1480
12 2160 2253 1240
13 2400 2367 1660
14 2490 2430 1850
15 2590 2470 2000

a) Fit a Cobb-Douglas production function and find the coefficient of determination.

b) Regress Q on L only.
c) Regress Q on K only.
d) What can you say about the multicollinearity?
3) a) What is meant by heteroscedasticity?
b) Why is heteroscedasticity a problem?
c) How is the presence of heteroscedasticity tested?
d) Given the following data on consumption expenditure C and disposable income Yd for 30 families

Family  Consumption  Yd      Family  Consumption  Yd
1       10600        12000   16      14400        17000
2       10800        12000   17      14900        17000
3       11100        12000   18      15300        17000
4       11400        13000   19      15000        18000
5       11700        13000   20      15700        18000
6       12100        13000   21      16400        18000
7       12300        14000   22      15900        19000
8       12600        14000   23      16500        19000
9       13200        14000   24      16900        19000
10      13000        15000   25      16900        20000
11      13300        15000   26      17500        20000
12      13600        15000   27      18100        20000
13      13800        16000   28      17200        21000
14      14000        16000   29      17800        21000
15      14200        16000   30      18500        21000

i. Regress consumption on disposable income and test for the existence of heteroscedasticity.

ii. If there is heteroscedasticity, correct for it using an appropriate method (assuming the variance is proportional to Yd2, the square of disposable income).

4) a) What is meant by autocorrelation? Why is autocorrelation a problem?

b) How can autocorrelation be corrected?
c) How is the presence of positive or negative first-order autocorrelation tested?
d) The presence of autocorrelation can be tested by calculating the Durbin-Watson statistic. Explain how.
e) Given the following data on imports (M) and GDP from 1980 to 1999, regress M on GDP and test for autocorrelation at the 5% level of significance.

Year Imports GDP
1980 299.2 2918.8
1981 319.4 3203.1
1982 294.9 3315.6
1983 358 3688.8
1984 416.4 4033.5
1985 438.9 4319
1986 467.7 4537.5
1987 536.7 4891.6
1988 573.5 5258.3
1989 599.6 5588
1990 649.2 5847.3
1991 639 6080.7
1992 687.1 6469.8
1993 744.9 6795.5
1994 859.6 7217.7
1995 909.3 7529.3
1996 992.8 7981.4
1997 1087 8478.6
1998 1147.3 8974.9
1999 1330.1 9559.7

f) Correct for autocorrelation if it is found in the above data.

Chapter 6 : Estimation with Dummy Variables

The variables used in regression equations usually take values over some continuous range. In regression analysis, however, the dependent variable is frequently influenced not only by variables that are quantitative (income, output, prices, costs, heights etc.) but also by variables that are essentially qualitative in nature (sex, race, profession etc.). Dummy variables are constructed by econometricians mainly as proxies for variables which cannot be measured in a particular case for various reasons. They are commonly used as proxies for qualitative factors such as sex, profession, religion etc. Such qualitative variables usually indicate the presence or absence of a quality or an attribute, such as male or female, black or white, etc. One method of quantifying such attributes is to construct artificial variables that take on the values 1 or 0, where 1 indicates the presence of an attribute and 0 indicates its absence. Variables that assume the values 1 or 0 are called dummy variables, indicator variables, binary variables or dichotomous variables. Suppose a firm uses two types of production process to obtain its output.

$$ Y_i = \alpha + \beta X_i + U_i $$
where Yi is the output obtained and Xi is a dummy variable:
Xi = 1 if the output is obtained from machine A
Xi = 0 if the output is obtained from machine B

By applying OLS you obtain α̂ and β̂.

• α̂ measures the mean expected output associated with machine B.

• β̂ measures the difference in mean output associated with a change from machine B to machine A (by how much the mean output of machine A deviates from the mean output of machine B).
• α̂ + β̂ is the mean expected output from machine A.
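
The interpretation of α̂ and β̂ can be checked numerically. In the sketch below the output figures are hypothetical; the OLS intercept equals the mean output of machine B and the slope equals the difference between the two group means.

```python
import numpy as np

# Hypothetical output figures: first three from machine B, last three from machine A.
output = np.array([20.0, 22.0, 24.0, 30.0, 31.0, 35.0])
machine_a = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])   # dummy: 1 = machine A, 0 = machine B

X = np.column_stack([np.ones_like(machine_a), machine_a])
(alpha_hat, beta_hat), *_ = np.linalg.lstsq(X, output, rcond=None)

mean_b = output[machine_a == 0].mean()
mean_a = output[machine_a == 1].mean()

print("alpha_hat:", round(alpha_hat, 4), " mean of B:", mean_b)
print("alpha_hat + beta_hat:", round(alpha_hat + beta_hat, 4), " mean of A:", mean_a)
```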
The dummy variable procedure can easily be modified if more than two distinct categories are involved. Assume that in the above example three different processes may be employed, and one wishes to allow for the possibility that the three processes do not give identical output. In this case we construct two dummy variables to take into account the three different production processes. (In general, when a qualitative variable has N categories we introduce N-1 dummy variables.)
Suppose
$$ Y_t = \alpha + \beta_1 X_{1t} + \beta_2 X_{2t} + U_t $$
The output is produced using three machines, A, B and C.
X1t = 1 if the output is obtained from machine A and 0 otherwise (if it comes from B or C the value is zero).
X2t = 1 if the output is obtained from machine B and 0 otherwise (if it comes from A or C the value is zero).
• In this case α̂ represents the mean value of output obtained from machine C.
• β̂1 represents the difference in output associated with a change from machine C to machine A.
• α̂ + β̂1 is the mean value of output obtained from machine A.
• β̂2 represents the difference in output associated with a change from machine C to machine B.
• α̂ + β̂2 is the mean value of output obtained from machine B.
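
The three-machine case, and the reason for using N-1 rather than N dummies, can be sketched as follows. The output data are hypothetical. Including a dummy for every category together with the intercept makes the columns of the data matrix perfectly collinear (the three dummies sum to the constant column), a situation often called the dummy variable trap.

```python
import numpy as np

# Hypothetical output figures for machines A, B and C (two observations each).
output = np.array([30.0, 32.0, 25.0, 27.0, 20.0, 22.0])
machine = np.array(["A", "A", "B", "B", "C", "C"])

d_a = (machine == "A").astype(float)
d_b = (machine == "B").astype(float)
d_c = (machine == "C").astype(float)
ones = np.ones(len(output))

# N - 1 = 2 dummies; machine C is the base category.
X = np.column_stack([ones, d_a, d_b])
coef, *_ = np.linalg.lstsq(X, output, rcond=None)
alpha_hat, beta1_hat, beta2_hat = coef
print("mean C:", round(alpha_hat, 4),
      " mean A:", round(alpha_hat + beta1_hat, 4),
      " mean B:", round(alpha_hat + beta2_hat, 4))

# Adding a third dummy alongside the intercept gives a rank-deficient matrix.
X_trap = np.column_stack([ones, d_a, d_b, d_c])
rank_trap = np.linalg.matrix_rank(X_trap)
print("columns:", X_trap.shape[1], " rank:", rank_trap)
```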
The above examples contain only dummy variables as explanatory variables. Such models are called analysis of variance (ANOVA) models, i.e. the dependent variable is quantitative but the explanatory variables are qualitative. In most economic research, however, the regression model contains a mixture of quantitative and qualitative variables, such as

$$ C_t = \alpha + \beta_1 X_{1t} + \beta_2 D_t + U_t $$
where Ct is consumption expenditure,
X1t is the income of the consumer,
Dt is a dummy variable indicating windfall gain or loss.
In this case we have both quantitative and qualitative variables as explanatory variables. Such models are called analysis of covariance (ANCOVA) models. Another example is

$$ C_t = \alpha + \beta_1 Y_t + \beta_2 D_{1t} + \beta_3 D_{2t} + \beta_4 D_{3t} + U_t $$

where Ct is the consumption expenditure of the household, Yt is its income, and
D1t = 1 if the household has children, 0 otherwise
D2t = 1 if the household owns its own house, 0 otherwise
D3t = 1 if the age of the household head is over 70 years, 0 otherwise

Notice that each of the three qualitative variables (having children, owning a house and age of the household head) has two categories and hence needs one dummy variable. Note also that the omitted, or base, category is now "a household with no children, no house of its own and a head less than 70 years old", which is described by the following equation:

$$ C_t = \alpha + \beta_1 Y_t + U_t $$
The mean consumption of this household is given by

$$ E(C_t) = \alpha + \beta_1 Y_t $$
since E(Ut) = 0.
Now consider the mean consumption of a household that has children but no house and whose head is less than 70 years old, i.e. D1t = 1, D2t = 0 and D3t = 0. Then
$$ C_t = \alpha + \beta_1 Y_t + \beta_2 D_{1t} + U_t $$
$$ E(C_t) = E[\alpha + \beta_1 Y_t + \beta_2 D_{1t} + U_t] $$
$$ E(C_t) = \alpha + \beta_1 Y_t + \beta_2 $$
because E(Ut) = 0 and D1t = 1.
This is the mean consumption expenditure of such a household. By the same reasoning you can obtain the mean consumption for the other categories.

6.1 Some important uses of dummy variables
Analysis of covariance model
A) Use of a dummy variable for measuring the shift of a function. A shift of a function means that the
constant (intercept) changes between periods while the other coefficients remain the same.
Such a shift can be examined by introducing a dummy variable into the function under study.
Suppose we want to study the consumption expenditure of society as a whole for the period
1910-1950. We know that major disruptive events occurred during this
period. These are:
• The First World War in the 1910s & the Second World War in the 1940s
• The Great Depression in the 1930s
Apart from these events, the remaining years are assumed to be normal. We can therefore
divide the period into normal years & abnormal (war & depression) years. The consumption
expenditure pattern of society is expected to differ between the two. Suppose we assume that
the marginal propensity to consume is the same in abnormal & normal years but that
autonomous (subsistence level) consumption differs. The consumption function will
be

Ct = α + β1Yt + β2Dt + Ut
Where Ct = consumption expenditure
Yt = income of the consumers
Dt = 1 for normal years, 0 for abnormal years

The normal-years function (when Dt=1 in all periods) would be

Ct = α + β1Yt + β2 + Ut
Ct = (α + β2) + β1Yt + Ut
Let α + β2 = γ; then
Ct = γ + β1Yt + Ut
Ĉt = γ̂ + β̂1Yt
The abnormal-years function (when Dt=0 in all periods) would be
Ct = α + β1Yt + β2(0) + Ut
Ĉt = α̂ + β̂1Yt
Then we apply OLS with two explanatory variables, Yt & Dt, & obtain estimates of their coefficients in the
regression equation

Ct = α + β1Yt + β2Dt + Ut
Suppose OLS gives the following estimated equation:

Ĉt = 19.5 + 0.85Yt + 10Dt
From this estimate you can get the normal & abnormal period equations as follows. In a
war or depression (abnormal) year Dt=0 & the equation is
Ĉt = 19.5 + 0.85Yt     Abnormal year equation.
In a normal year Dt has a value of 1, so
Ĉt = 19.5 + 0.85Yt + 10Dt
Ĉt = 19.5 + 0.85Yt + 10
Ĉt = (19.5 + 10) + 0.85Yt
Ĉt = 29.5 + 0.85Yt     Normal year equation.
To summarize:
Ĉt = 19.5 + 0.85Yt     Abnormal year
Ĉt = 29.5 + 0.85Yt     Normal year
These functions differ only in the intercept. Since the abnormal-period intercept is lower than the
normal-period intercept, we can conclude that the depression & the two war periods had a negative
effect on consumption expenditure, provided the coefficient of Dt is statistically significant.
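The splitting of one estimated equation into a normal-year and an abnormal-year equation can be sketched as follows. Everything here is assumed for illustration: the dating of the abnormal years, the income series, and the noise-free consumption series, which is built from the fitted equation in the text so that OLS reproduces its coefficients.

```python
import numpy as np

years = np.arange(1910, 1951)

# Assumed dating of the abnormal years (wars and depression); illustrative only.
abnormal = ((years >= 1914) & (years <= 1918)) | ((years >= 1930) & (years <= 1945))
D = (~abnormal).astype(float)          # D = 1 in normal years, 0 in abnormal years

# Hypothetical income series and noise-free consumption built from
# the fitted equation C = 19.5 + 0.85*Y + 10*D quoted in the text.
Y = np.linspace(100.0, 500.0, years.size)
C = 19.5 + 0.85 * Y + 10.0 * D

# OLS of C on a constant, income and the intercept dummy.
X = np.column_stack([np.ones_like(Y), Y, D])
a_hat, b1_hat, b2_hat = np.linalg.lstsq(X, C, rcond=None)[0]

abnormal_intercept = a_hat             # D = 0
normal_intercept = a_hat + b2_hat      # D = 1

print(f"Abnormal year: C = {abnormal_intercept:.1f} + {b1_hat:.2f} Y")
print(f"Normal year:   C = {normal_intercept:.1f} + {b1_hat:.2f} Y")
```

Both lines share the slope 0.85; only the intercept moves, from 19.5 to 29.5.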

[Figure: Ĉt plotted against income. Two parallel lines with slope β̂1: the normal-year line Ĉt = (α̂ + β̂2) + β̂1Yt lies above the abnormal-year line Ĉt = α̂ + β̂1Yt.]

B) Use of a dummy variable for measuring a change in the slope parameter over time. The abnormal
period may leave autonomous (subsistence level) consumption unchanged but affect the marginal propensity to
consume, i.e. the abnormal period changes the slope but not the constant term.
Suppose we have
Ct = α + β1Yt + β2(YtDt) + Ut
Dt = 1 in normal years, 0 in abnormal years

Since we assume that the type of year affects the slope (MPC), the dummy enters multiplied by
income (an interaction term). If the period is a normal year (Dt=1) the estimated function will be
Ĉt = α̂ + β̂1Yt + β̂2Yt
Ĉt = α̂ + (β̂1 + β̂2)Yt     Normal year equation.
If the period is an abnormal year, i.e. Dt=0, the estimated function will be

Ĉt = α̂ + β̂1Yt     Abnormal year equation.

To obtain both the abnormal & normal year estimated functions, first use the data to estimate

Ct = α + β1Yt + β2(YtDt) + Ut
Ĉt = α̂ + β̂1Yt + β̂2(YtDt)     will be the estimated function.

Check the significance of all the parameters (α̂, β̂1, β̂2) on the basis of statistical criteria. Then when
Dt=1 you will have

Ĉt = α̂ + (β̂1 + β̂2)Yt
Let β̂1 + β̂2 = β̂*

Ĉt = α̂ + β̂*Yt     This is the normal year equation.

When Dt=0 the estimated function will be
Ĉt = α̂ + β̂1Yt     This is the abnormal year equation.
If β̂2 is positive & significant, β̂* = (β̂1 + β̂2) is greater than β̂1, & we can conclude that although the
constant term (autonomous consumption, α̂) is the same in both periods, the marginal propensity to consume
differs between normal & abnormal periods.

[Figure: Ĉt plotted against income. Both lines start from the common intercept α̂; the normal-year line Ĉt = α̂ + β̂*Yt is steeper than the abnormal-year line Ĉt = α̂ + β̂1Yt.]

C) The final possibility is a change in both the intercept & the slope over time. A regression
equation that allows for both effects simultaneously, i.e. on the slope coefficient & the
constant term, can be written as follows.
Ct = α + β1Yt + β2Dt + β3YtDt + Ut
Dt=1 for normal years
Dt=0 for abnormal years.
Then the function would be
a) Normal year, when Dt=1
Ct = α + β1Yt + β2(1) + β3Yt(1) + Ut
Ct = α + β1Yt + β2 + β3Yt + Ut
Ct = (α + β2) + (β1 + β3)Yt + Ut
Let γ1 = α + β2 & γ2 = β1 + β3
Ct = γ1 + γ2Yt + Ut
Ĉt = γ̂1 + γ̂2Yt
b) Abnormal year, when Dt=0
Ct = α + β1Yt + β2(0) + β3Yt(0) + Ut
Ct = α + β1Yt + Ut
Ĉt = α̂ + β̂1Yt
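A regression with both an intercept dummy and a slope (interaction) dummy can be sketched as below. The data are simulated and the "true" parameter values are assumptions chosen purely for illustration; the point is that the normal-year line is recovered as (α̂ + β̂2) + (β̂1 + β̂3)Yt and the abnormal-year line as α̂ + β̂1Yt.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
Y = rng.uniform(100.0, 500.0, n)             # hypothetical income
D = rng.integers(0, 2, n).astype(float)      # 1 = normal year, 0 = abnormal year
U = rng.normal(0.0, 0.5, n)                  # random disturbance

# Assumed "true" parameters of C = alpha + beta1*Y + beta2*D + beta3*Y*D + U.
alpha, beta1, beta2, beta3 = 20.0, 0.70, 8.0, 0.10
C = alpha + beta1 * Y + beta2 * D + beta3 * Y * D + U

# Design matrix: constant, income, intercept dummy, slope (interaction) dummy.
X = np.column_stack([np.ones(n), Y, D, Y * D])
a_hat, b1_hat, b2_hat, b3_hat = np.linalg.lstsq(X, C, rcond=None)[0]

# Normal-year line: intercept gamma1 and slope gamma2.
gamma1_hat = a_hat + b2_hat
gamma2_hat = b1_hat + b3_hat

print("Abnormal year: intercept", round(a_hat, 2), "slope", round(b1_hat, 3))
print("Normal year:   intercept", round(gamma1_hat, 2), "slope", round(gamma2_hat, 3))
```

Because the model is fully interacted, the same two lines would be obtained by running separate regressions on the normal and the abnormal years.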

[Figure: Ĉt plotted against income. The normal-year line Ĉt = (α̂ + β̂2) + (β̂1 + β̂3)Yt differs from the abnormal-year line Ĉt = α̂ + β̂1Yt in both intercept and slope.]

6.2 Use of dummy variables in seasonal analysis
To see how dummy variables are used to capture seasonal (quarterly) effects in economic
time series data, we can form the regression equation as follows. There are four quarters in a year:
quarter one, two, three & four.
a. To test for a seasonal pattern using the intercept

Yt = α + β1Xt + β2D1t + β3D2t + β4D3t + Ut
Where Yt = volume of profit
Xt = volume of sales
D1t = 1 if it is the second quarter, 0 otherwise
D2t = 1 if it is the third quarter, 0 otherwise
D3t = 1 if it is the fourth quarter, 0 otherwise

If there are seasonal patterns in the various quarters, the estimated differential intercepts will reflect them. By
applying OLS you can estimate the function & obtain the coefficients.
Ŷt = α̂ + β̂1Xt + β̂2D1t + β̂3D2t + β̂4D3t
If D1=1 (second quarter)
Ŷt = α̂ + β̂1Xt + β̂2
   = (α̂ + β̂2) + β̂1Xt
If D2=1 (third quarter)
Ŷt = α̂ + β̂1Xt + β̂3
   = (α̂ + β̂3) + β̂1Xt
If D3=1 (fourth quarter)
Ŷt = α̂ + β̂1Xt + β̂4
   = (α̂ + β̂4) + β̂1Xt
If it is the first quarter
Ŷt = α̂ + β̂1Xt

because all the dummies take the value zero. In the above case only the
intercept term is allowed to vary, so it alone indicates the presence of seasonal patterns.
Ex. Suppose we obtain the following results from a given data set, where D2, D3 & D4 denote the
second, third & fourth quarter dummies (D1, D2 & D3 in the notation above).

Ŷt = 6688 + 1322D2 - 217D3 + 183D4 + 0.0383Xt
S.E  (1711)   (638)    (632)    (654)   (0.0115)
t    (3.908)  (2.07)   (-0.344) (0.281) (3.33)
R2 = 0.5255

The results show that, apart from the intercept, only the sales coefficient (Xt) & the differential intercept
associated with the second quarter are statistically significant at the 5% level. On the basis of the S.E & t-test
you can check this as follows. As a rule of thumb a coefficient is significant when its standard error is less
than half of its absolute value:
S.E(α̂) < |α̂|/2,  S.E(β̂1) < |β̂1|/2,  S.E(β̂2) < |β̂2|/2,  S.E(β̂3) < |β̂3|/2,  S.E(β̂4) < |β̂4|/2

Equivalently, whenever |t| > 2 you can accept that the explanatory variable affects the dependent variable
(i.e. profit is affected by sales (Xt) or by the dummy variable concerned). In many studies only some of the β̂'s
are significant, which means that only some quarters affect profit. In the above example only
α̂, the sales coefficient & the second-quarter dummy coefficient are statistically significant. Thus we can
conclude that there is a seasonal factor operating in the second quarter of each year but no seasonal factor
affecting profit in the other quarters.
For a given level of sales, the intercept for the first quarter is 6688 & for the second quarter it is
6688 + 1322 = 8010. The sales coefficient tells us that if sales increase by 1 unit, average profit
increases by about 0.0383 units.
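The rule |t| > 2 can be applied mechanically to the reported coefficients and standard errors. The short sketch below recomputes the t-ratios as coefficient divided by standard error, using the figures from the example; the labels "Q2", "Q3" and "Q4" for the quarterly dummies are our own.

```python
# Coefficients and standard errors as reported in the example.
coef = {"const": 6688.0, "Q2": 1322.0, "Q3": -217.0, "Q4": 183.0, "sales": 0.0383}
se = {"const": 1711.0, "Q2": 638.0, "Q3": 632.0, "Q4": 654.0, "sales": 0.0115}

# t-ratio = estimated coefficient / its standard error.
t_ratio = {name: coef[name] / se[name] for name in coef}

# Rule of thumb: a coefficient is significant at about the 5% level if |t| > 2,
# which is the same as saying its standard error is below half its absolute value.
significant = [name for name in coef if abs(t_ratio[name]) > 2]

for name in coef:
    print(f"{name:6s} t = {t_ratio[name]:7.3f}")
print("significant:", significant)
```

Only the constant, the second-quarter dummy and sales pass the test, in line with the discussion of the example.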
b. If the seasonal factors do not affect the intercept but affect the slope:
Yt = α + β1Xt + β2D1Xt + β3D2Xt + β4D3Xt + Ut
Where
D1 = 1 if it is the second quarter, 0 otherwise
D2 = 1 if it is the third quarter, 0 otherwise
D3 = 1 if it is the fourth quarter, 0 otherwise

The seasonal effect is then captured by the slope coefficients. If D1, D2 & D3 are all zero
Yt = α + β1Xt + Ut
This is the first quarter (the base quarter).
If D1=1 (second quarter)
Yt = α + β1Xt + β2(1)Xt + Ut
Yt = α + (β1 + β2)Xt + Ut
If D2=1 (third quarter)
Yt = α + β1Xt + β3(1)Xt + Ut
Yt = α + (β1 + β3)Xt + Ut
If D3=1 (fourth quarter)
Yt = α + (β1 + β4)Xt + Ut
The seasonal effect can now be examined with an F-test of the joint
hypothesis β2 = β3 = β4 = 0.
c. The final possibility is that both the intercept and the slope are affected by the seasonal
factors. The function would then be
Yt = α + β 1 Xt + β 2 D1 Xt + β 3 D2 Xt + β 4 D3 Xt + β 5 D1 + β 6 D2 + β 7 D3 + Ut
Given the previous meanings for all variables
Yt = α + β1 Xt is for the base quarter (1st quarter)
If D1= 1 for the second quarter
Yt = α + β 1 Xt + β 2 D1 Xt + β 5 D1 + Ut
= (α + β 5 ) + ( β 1 + β 2 ) Xt + Ut
If D2= 1 for the third quarter
Yt = α + β 1 Xt + β 3 D2 Xt + β 6 D2 + Ut
= (α + β 6 ) + ( β 1 + β 3 ) Xt + Ut
If D3= 1 for the fourth quarter
Yt = α + β 1 Xt + β 4 D3 Xt + β 7 D3 + Ut
= (α + β 7 ) + ( β 1 + β 4 ) Xt + Ut
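A joint F-test for seasonal effects in the model with both intercept and slope dummies can be sketched as follows. The data are simulated under assumed parameter values (a seasonal effect in the second quarter only), and the helper `rss` is our own. The statistic compares the restricted model Yt = α + β1Xt with the full model, using F = [(RSSr - RSSu)/q] / [RSSu/(n - k)].

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return float(resid @ resid)

rng = np.random.default_rng(1)
n = 80                                        # 20 years of quarterly data
quarter = np.arange(n) % 4 + 1
sales = rng.uniform(100.0, 200.0, n)          # hypothetical sales
D1, D2, D3 = [(quarter == q).astype(float) for q in (2, 3, 4)]

# Assumed data-generating process: only the second quarter differs.
profit = 50.0 + 0.5 * sales + 20.0 * D1 + 0.1 * D1 * sales + rng.normal(0.0, 1.0, n)

ones = np.ones(n)
X_restricted = np.column_stack([ones, sales])
X_full = np.column_stack([ones, sales, D1 * sales, D2 * sales, D3 * sales, D1, D2, D3])

q = X_full.shape[1] - X_restricted.shape[1]   # number of restrictions (6)
k = X_full.shape[1]                           # parameters in the full model (8)
rss_r, rss_u = rss(X_restricted, profit), rss(X_full, profit)
F = ((rss_r - rss_u) / q) / (rss_u / (n - k))

print("restrictions:", q, " F statistic:", round(F, 1))
```

The computed F is compared with the critical value of the F distribution with q and n - k degrees of freedom; a large value leads to rejection of the hypothesis that all seasonal coefficients are zero.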

• Exercise for chapter six

1) The following table gives the consumption expenditure C, the disposable income Yd &
the sex of the household head for 12 random families.

Family   Consumption   Disposable income   Sex of head
1        18535         22550               M
2        11350         14035               M
3        12130         13040               F
4        15210         17500               M
5        8680          9430                F
6        16760         20635               M
7        13480         16470               M
8        9680          10720               F
9        17840         22350               M
10       11180         12200               F
11       14320         16810               F
12       19860         23000               M

A) Regress C on Yd. B) Test for a different intercept for families with a male or a female as
head of the household. C) Test for a different slope, or MPC, for families with a male or a female as head
of the household. D) Test for both a different intercept & a different slope. E) Which is the best result?

2) Given the following sales in four quarters from 1995 - 1999

Year   Quarter   Time trend   Sales
1995   1         1            540.5
1995   2         2            608.5
1995   3         3            606.6
1995   4         4            648.3
1996   1         5            568.4
1996   2         6            632.8
1996   3         7            626
1996   4         8            674.6
1997   1         9            587
1997   2         10           640.2
1997   3         11           645.9
1997   4         12           686.9
1998   1         13           597
1998   2         14           675.3
1998   3         15           663.6
1998   4         16           723.3
1999   1         17           639.5
1999   2         18           716.5
1999   3         19           721.9
1999   4         20           779.9

Using the above data, run a regression of sales on the seasonal dummies & interpret the result.

Chapter 7: Simultaneous equation models

7.1 The nature of simultaneous – equation model

In the estimation of single-equation models the dependent variable is expressed as a linear function of one
or more explanatory variables. The relationship between the dependent & independent
variables is one-sided or unidirectional: the explanatory variables are the causes that bring about
changes in the dependent variable, e.g.

Qd = f(P) ------------------------------- 7.1

Qd (quantity demanded) is a function of price, meaning that any change in price brings a change in the
quantity demanded, while a change in quantity demanded is not expected to have any effect on the price of
the commodity. In reality causation often does not run one way only: there are situations where the dependent
variable (Qd) also affects the price of the good, i.e.

P = f(Qd) -------------------------------------------------------- 7.2
Under such conditions we need to consider more than one regression equation, one for each
interdependent variable, to understand the multi-directional flow of influence among the variables. This is
precisely what is done in simultaneous equation models. A system of equations describing the joint dependence
of variables is called a system of simultaneous equations or a simultaneous equation
model. In a simultaneous equation model variables are classified as endogenous & exogenous.
Endogenous variables are those that are determined within the
economic model. Exogenous variables are those that are determined outside
the model.
Example.
Qd = α + β1P + β2Y + U1t   - demand function
Qs = α1 + β3P + β4R + U2t  - supply function
Where Qd = quantity demanded, P = price, R = rainfall
Qs = quantity supplied, Y = income

In this model we have five variables (Qd, Qs, P, Y & R). Quantity supplied, quantity demanded & price
are endogenous variables, determined within the model; Y & R are exogenous variables, which are
determined outside the model. In a complete simultaneous model the total number of equations is equal to
the number of endogenous variables. The above model has three endogenous variables (Qd,
Qs & P), so the total number of equations is three:
Qd = α + β1P + β2Y + U1t
Qs = α1 + β3P + β4R + U2t
Qd = Qs
In simultaneous equation models it is not appropriate to estimate each equation independently; the
values of the endogenous variables are determined jointly, taking into account all the information provided by
the other equations of the system. If you apply OLS to each equation independently,
disregarding the other equations of the model, the estimates obtained are not only biased but also
inconsistent: as the sample size increases indefinitely, the estimates do not converge to
the true parameter values. The bias arising from the application of OLS to each equation of a
simultaneous equation model separately is known as simultaneity bias or simultaneous equation bias.
Ex. Demand function: the demand for a commodity depends on its own price (P), on the price of other goods
(P0) & on income (I), so the equation is
Qd = α0 + β1P + β2P0 + β3I + U1 ……………………………..7.3
If you apply OLS to the above equation, one of the assumptions of the classical linear regression model
breaks down, namely that the explanatory variables are independent of the random term: here E(PU1) ≠ 0.
Since the price of the commodity is also affected by the quantity demanded of that commodity, the above
single-equation model cannot be treated independently. There is a two-way causation of the
following type.

Qd = f(P) & in turn P = f(Qd).

This creates the need for at least one more equation in order to estimate the given demand function. Assume
that the required relation is

P = f(Qd) = α1 + β4Qd + β5W + U2 ..................................................7.4

W = weather conditions
Substitute the value of Qd (equation 7.3) into equation 7.4 & you will have

P = α1 + β4[α0 + β1P + β2P0 + β3I + U1] + β5W + U2

For OLS to give valid estimates α̂0, β̂1, β̂2, β̂3 of the demand function, the explanatory
variables must be distributed independently of U1, & for the price equation Qd & W must be distributed
independently of U2. The substitution above shows that P depends on U1, so this condition fails for P.

7.2 Inconsistency & Simultaneity Bias of OLS Estimators

Biasedness
Qd = α + β1P + U
Where Qd = quantity demanded
P is the price of the commodity
We have Qd = f(P)
P = f(Qd). This two-way causation in the relationship leads to violation of an
important assumption of the linear regression model.

P = α0 + β2Qd + β3Z + V...........................................................7.5
Where Z is advertisement
E(Ui)=0        E(Vi)=0
E(Ui²)=σ²u     E(Vi²)=σ²v
E(UiUj)=0      E(ViVj)=0 (i ≠ j)      Also E(UiVi)=0

There are two endogenous variables, i.e. Qd & P. In addition there is one exogenous variable, Z.
Substitute the quantity demanded equation into the price equation:

P = α0 + β2(α + β1P + Ui) + β3Z + Vi
P - β1β2P = α0 + β2α + β3Z + β2Ui + Vi
P(1 - β1β2) = (α0 + β2α) + β3Z + (β2Ui + Vi)
P = (α0 + β2α)/(1 - β1β2) + [β3/(1 - β1β2)]Z + (β2Ui + Vi)/(1 - β1β2)

From here we can prove that the covariance of P & U is not zero:

Cov(P,U) = E[(P - E(P))(U - E(U))]
         = E[(P - E(P))U]      since E(Ui)=0

From the expression for P above, E(P) = (α0 + β2α)/(1 - β1β2) + [β3/(1 - β1β2)]Z, so

P - E(P) = (β2Ui + Vi)/(1 - β1β2)

& therefore

Cov(P,U) = E[(β2Ui + Vi)Ui]/(1 - β1β2)
         = [β2E(Ui²) + E(UiVi)]/(1 - β1β2)

By our assumption E(UiVi)=0, so

Cov(P,U) = β2E(Ui²)/(1 - β1β2) = β2σ²u/(1 - β1β2) ≠ 0 ------------- 7.6

The covariance between P & U is not zero, so the assumption E(PiUi)=0 is violated. As a consequence,
applying OLS to each equation of the model separately gives coefficient estimates that are biased &
inconsistent. Let's show that the estimates are biased.
From our demand equation Qd = α + β1P + Ui, written in deviation form (q & p are deviations of Qd & P
from their means), we have
Σpq = β1Σp² + Σpu
Dividing both sides by Σp²:
Σpq/Σp² = β1 + Σpu/Σp²
We know that β̂1 = Σpq/Σp²
Then β̂1 = β1 + Σpu/Σp²
Taking expected values:
E(β̂1) = E[β1 + Σpu/Σp²]
E(β̂1) = β1 + E[Σpu/Σp²] ------------------------- 7.7
Since E(PU) ≠ 0, the above equation gives E(β̂1) ≠ β1, i.e. the expected value of β̂1 is not equal to the
true population parameter β1: β̂1 is biased by the amount E[Σpu/Σp²].
Consistency problems: An estimator is said to be consistent if its probability limit is equal to the
population value of the parameter. Therefore, to show that β̂1, the estimator of β1, is consistent, it must be
shown that its plim is equal to β1, i.e.

plim(β̂1) = β1
Applying the rules of probability limits to
β̂1 = β1 + Σxu/Σx²
(where x denotes the explanatory variable in deviation form, here price) gives
plim(β̂1) = plim(β1) + plim(Σxu/Σx²)
plim(β̂1) = β1 + plim(Σxu/N)/plim(Σx²/N) ---------------- 7.8
The quantities in the brackets are explained as follows:
Σxu/N is the sample covariance of X & U, & Σx²/N is the sample variance of X. As the sample size N increases,
the sample covariance between X & U approaches the true population covariance, which from equation 7.6 is
β2σ²u/(1 - β1β2).
Similarly, as N→∞ the sample variance of X approaches its population variance, say σ²x.
Therefore
plim(β̂1) = β1 + [β2σ²u/(1 - β1β2)]/σ²x
plim(β̂1) = β1 + [β2/(1 - β1β2)]·(σ²u/σ²x)
β̂1 is therefore a biased estimator whose bias does not disappear even when N→∞. The direction of the bias
depends upon the structure of the particular model & the values of the regression coefficients.
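The persistence of the bias in large samples can be illustrated by simulation. The sketch below is a hedged illustration, not part of the original derivation: all parameter values and the unit variances of U, V & Z are assumptions. It generates data from the two-equation system, applies OLS to the demand equation, and compares the estimate with β1 + Cov(P,U)/Var(P).

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000                                  # large sample to mimic N -> infinity

# Assumed structural parameters (illustrative values only).
alpha, beta1 = 10.0, -0.5                    # demand: Q = alpha + beta1*P + U
alpha0, beta2, beta3 = 2.0, 0.4, 1.0         # price:  P = alpha0 + beta2*Q + beta3*Z + V

U = rng.normal(0.0, 1.0, n)                  # sigma_u^2 = 1
V = rng.normal(0.0, 1.0, n)                  # sigma_v^2 = 1
Z = rng.normal(0.0, 1.0, n)                  # exogenous variable with variance 1

# Solve the system: reduced form for P, then Q from the demand equation.
denom = 1.0 - beta1 * beta2
P = (alpha0 + beta2 * alpha + beta3 * Z + beta2 * U + V) / denom
Q = alpha + beta1 * P + U

# OLS slope of Q on P in deviation form: sum(pq) / sum(p^2).
p, q = P - P.mean(), Q - Q.mean()
beta1_ols = (p * q).sum() / (p * p).sum()

# Probability limit implied by the derivation: beta1 + Cov(P,U)/Var(P).
cov_PU = beta2 * 1.0 / denom
var_P = (beta3**2 * 1.0 + beta2**2 * 1.0 + 1.0) / denom**2
plim_beta1 = beta1 + cov_PU / var_P

print("true beta1      :", beta1)
print("OLS estimate    :", round(beta1_ols, 4))
print("theoretical plim:", round(plim_beta1, 4))
```

Even with 200,000 observations the OLS estimate stays close to the theoretical plim rather than to the true β1, which is what inconsistency means.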

7.3 Solution to the simultaneous equations

Application of OLS to an equation belonging to a system of simultaneous equations yields
A) Biased estimates
E(β̂1) = β1 + E[Σpu/Σp²]
where p is price (in deviation form) from our equation. The expected value of β̂1 differs from the
population parameter β1 by the amount E[Σpu/Σp²].
B) Inconsistent estimates
plim(β̂1) = β1 + [β2/(1 - β1β2)]·(σ²u/σ²x)
The obvious solution is to apply other methods of estimation which give better estimates of the parameters.
There are several methods used to estimate such equations. The most common are
a. The reduced form method or indirect least squares (ILS)
b. The method of instrumental variables
c. Two-stage least squares (2SLS)
d. Limited information maximum likelihood (LIML)
e. The mixed estimation method
f. Three-stage least squares (3SLS)
g. Full information maximum likelihood (FIML)
The first five methods can be applied to one equation at a time & are therefore called single-equation
methods. The remaining two are called systems methods because they are applied to all equations of the
system simultaneously.

7.4 Some definitions of simultaneous equations

One has to be acquainted with some important definitions & notations frequently used in the estimation
of simultaneous equations. These concern structural, reduced form & recursive models.

i. Structural models
A structural model is a complete system of equations which describes the structure of the relationships among
the economic variables. Structural equations express the endogenous variables (which are determined within the
model) as functions of
a) Other endogenous variables
b) Exogenous variables (determined outside the model, i.e. given)
c) The random term Ui.
A system of simultaneous equations is complete if the number of equations is equal to the number of
endogenous variables. Ex. Let's consider the following closed-economy model:

Ct = α0 + β1Yt + u1.....................................7.9
It = b + b1Yt + b2Yt-1 + u2.............................7.10
Yt = Ct + It + Gt.............................................7.11

Equation 7.9 is the consumption function, 7.10 is the investment function & 7.11 is the national income
identity (a definitional equation).
In the above equations consumption (Ct), investment (It) & national income (Yt) are endogenous
variables because their values are determined by the system. Since we have three endogenous
variables, as a rule the number of equations must be three. Yt-1 & Gt are exogenous or
predetermined variables: their values are determined outside our model. The system therefore has
3 endogenous variables & two exogenous variables, & it is complete because the number of endogenous
variables equals the number of equations. The structural parameters in general express propensities if the
equations are linear & elasticities if the equations are non-linear & have been transformed into linear
(logarithmic) equations. The structural parameters (the α's & β's), whether propensities or elasticities,
express the direct effect of each explanatory variable on the dependent variable. Indirect effects of the
explanatory variables on the dependent variable can be computed from the solution of the structural system.
Factors not appearing explicitly in a function may still have an indirect influence on the dependent variable
of that function.

Direct effects
Ex. In our equations (7.9 - 7.11) β1, b1 & b2 express the direct effects of the explanatory variables on the
dependent variables.
β1 = the marginal propensity to consume
b1 = if income increases by 1 birr, investment increases on
average by b1
b2 = if lagged income increases by 1 birr, investment
increases on average by b2.

Indirect effects: The indirect effects cannot be read directly from the structural parameters.
Ex. A change in consumption affects investment indirectly, because an increase in consumption affects income
(Y), which is a determinant of investment (C→Y→I). The effect of C on I is not measured directly by
any of the parameters (β1, b1, b2), but it is taken into account by the simultaneous solution of the
system. Traditionally the following notation is used:
• The coefficients of the endogenous variables (Ct, It & Yt) are denoted by β's.
• The coefficients of the exogenous variables (Yt-1 & Gt) are denoted by γ's.
• The endogenous variables (Ct, It & Yt) are denoted by y's & the exogenous variables by x's.
• For the sake of simplicity we will ignore the constant terms.
The structural system, in the above notation, becomes

y1 = β13y3 + u1....................................................7.12
y2 = β23y3 + γ21x1 + u2..........................................7.13
y3 = y1 + y2 + x2......................................................7.14
Where y1 = C, y2 = I, y3 = Y
x1 = Yt-1, x2 = G
Transfer all the observable variables to the left-hand side & leave the ui on the right-hand side.

y1 + 0y2 - β13y3 + 0x1 + 0x2 = u1 ....................................7.15
0y1 + y2 - β23y3 - γ21x1 + 0x2 = u2 ....................................7.16
-y1 - y2 + y3 + 0x1 - x2 = 0 .................................................7.17
Take the coefficients of the endogenous & exogenous variables & arrange them in a table:

Equation    y1     y2     y3      x1      x2
7.15         1      0    -β13      0       0
7.16         0      1    -β23    -γ21      0
7.17        -1     -1      1       0      -1

This is the table of structural coefficients.

Values of the structural parameters may be obtained by using sample observations on the variables of the
model & applying an appropriate econometric method.

ii. Reduced form models:

In structural models we explain the endogenous variables in terms of other endogenous variables,
exogenous variables & a random term. In the reduced form the endogenous variables are expressed as
functions of the exogenous (predetermined) variables only. The reduced form can be obtained in two ways.

1st method
Express each endogenous variable directly as a function of all the predetermined variables of the system,
yi = πi1x1 + πi2x2 + ... + πikxk + vi .........................7.18
& proceed with the estimation of the π's by applying some appropriate method. Here yi is
endogenous whereas the x's are exogenous variables. In our simple three-equation model the
reduced form would be (writing the endogenous variables Ct, It & Yt as functions of Yt-1 & Gt)
Ct = π11Yt-1 + π12Gt + v1 .........................7.19
It = π21Yt-1 + π22Gt + v2 .........................7.20
Yt = π31Yt-1 + π32Gt + v3 .........................7.21

2nd method
The second way of obtaining the reduced form of a model is to solve the structural system of equations
(7.9 - 7.11)
Ct = α + β1Yt + u1
It = b + b1Yt + b2Yt-1 + u2
Yt = Ct + It + Gt
Drop the constant terms α & b.
a) Substitute equations 7.9 & 7.10 into 7.11 & you will get

Yt = (β1Yt + u1) + (b1Yt + b2Yt-1 + u2) + Gt
Yt = (β1 + b1)Yt + b2Yt-1 + Gt + (u1 + u2)
[1 - β1 - b1]Yt = b2Yt-1 + Gt + (u1 + u2)
Yt = [b2/(1 - β1 - b1)]Yt-1 + [1/(1 - β1 - b1)]Gt + (u1 + u2)/(1 - β1 - b1) ----------7.22

This is the reduced form of the identity equation.

b) Substitute Yt into the consumption function (substitute equation
7.22 into equation 7.9).

Ct = β1{[b2/(1 - β1 - b1)]Yt-1 + [1/(1 - β1 - b1)]Gt + (u1 + u2)/(1 - β1 - b1)} + u1

   = [β1b2/(1 - β1 - b1)]Yt-1 + [β1/(1 - β1 - b1)]Gt + (β1u1 + β1u2)/(1 - β1 - b1) + u1

Putting the disturbance terms over the common denominator, β1u1 + β1u2 + (1 - β1 - b1)u1 = (1 - b1)u1 + β1u2, so

Ct = [β1b2/(1 - β1 - b1)]Yt-1 + [β1/(1 - β1 - b1)]Gt + [(1 - b1)u1 + β1u2]/(1 - β1 - b1) -----7.23

This is the reduced form of the consumption function.

c) Substitute Yt into the investment function (i.e. equation 7.22 into
equation 7.10).

It = b1{[b2/(1 - β1 - b1)]Yt-1 + [1/(1 - β1 - b1)]Gt + (u1 + u2)/(1 - β1 - b1)} + b2Yt-1 + u2

   = [b1b2/(1 - β1 - b1)]Yt-1 + [b1/(1 - β1 - b1)]Gt + (b1u1 + b1u2)/(1 - β1 - b1) + b2Yt-1 + u2

Collecting the terms in Yt-1: b1b2/(1 - β1 - b1) + b2 = b2(b1 + 1 - β1 - b1)/(1 - β1 - b1) = b2(1 - β1)/(1 - β1 - b1).
Collecting the disturbance terms: b1u1 + b1u2 + (1 - β1 - b1)u2 = b1u1 + (1 - β1)u2. Hence

It = [b2(1 - β1)/(1 - β1 - b1)]Yt-1 + [b1/(1 - β1 - b1)]Gt + [b1u1 + (1 - β1)u2]/(1 - β1 - b1) -----7.24
This is the reduced form of the investment function. The reduced form parameters measure the
total effect, i.e. the direct & indirect effects, of the exogenous variables on the endogenous variables of the
model. Let's write the reduced forms of equations 7.9 to 7.11 as follows.

Ct= π11 Yt-1 + π12 Gt +v1 .........................7.26

It= π21 Yt-1 + π22 Gt +v2 .........................7.27
Yt= π31 Yt-1 + π32 Gt +v3 .........................7.28
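The reduced form coefficients can also be obtained numerically by writing the structural system in matrix form and solving it. The sketch below is only an illustration under assumed parameter values (β1 = 0.6, b1 = 0.2, b2 = 0.1); the matrix names `B` and `Gamma` are our own. It checks the matrix solution against the formulas obtained by substitution.

```python
import numpy as np

beta1, b1, b2 = 0.6, 0.2, 0.1        # assumed illustrative structural parameters

# Structural form B @ y = Gamma @ x + u, with y = (C, I, Y) and x = (Y_{t-1}, G):
#   C - beta1*Y            = u1
#   I - b1*Y    = b2*Y_{t-1} + u2
#  -C - I + Y   = G
B = np.array([[1.0, 0.0, -beta1],
              [0.0, 1.0, -b1],
              [-1.0, -1.0, 1.0]])
Gamma = np.array([[0.0, 0.0],
                  [b2, 0.0],
                  [0.0, 1.0]])

# Reduced form coefficients: Pi = B^{-1} Gamma (rows C, I, Y; columns Y_{t-1}, G).
Pi = np.linalg.solve(B, Gamma)

# The same coefficients from the formulas obtained by substitution.
d = 1.0 - beta1 - b1
Pi_formula = np.array([[beta1 * b2 / d, beta1 / d],
                       [b2 * (1.0 - beta1) / d, b1 / d],
                       [b2 / d, 1.0 / d]])

# Total effect of Y_{t-1} on I = direct effect + indirect effect.
direct = b2
indirect = b2 * b1 / d

print(Pi)
print("pi21 =", Pi[1, 0], "= direct", direct, "+ indirect", indirect)
```

With these assumed values π21 = 0.2, made up of a direct effect of 0.1 and an indirect effect of 0.1.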

All the π's are coefficients of the predetermined variables & measure the total effect (direct &
indirect) of those variables on the endogenous variables. For example π21 in the
investment function measures the total effect of Yt-1 on It, which can be split into two parts:
a) Direct effect: the effect of Yt-1 on It, which is captured by b2 in equation 7.10.
b) Indirect effect: when Yt-1 affects investment (It) through b2, investment in turn
affects income (Yt), income affects consumption (Ct), & both feed back on income & hence on investment.
This can be shown as follows
Yt-1 → It → Yt → It
Yt → Ct → Yt → It
Hence the direct effect of Yt-1 on It (investment) is captured by b2, while the effects of lagged
income working through income & consumption are indirect effects. From the reduced form of the investment
function, π21 is the following function of the parameters of equations 7.9 & 7.10:

π21 = b2(1 - β1)/(1 - β1 - b1) ---------------------7.29
π21 involves three parameters, b2, β1 & b1, which appear as coefficients in equations 7.9 &
7.10. It can be decomposed into the following components.

π21 = b2[1 + b1/(1 - β1 - b1)]

π21 = b2 + b2b1/(1 - β1 - b1)

Total effect = direct effect + indirect effect

• b2 is the direct effect of Yt-1 on investment.
• b2b1/(1 - β1 - b1) is the indirect effect of Yt-1 on investment (It).
• It (investment) influences Yt, which in turn affects Ct.
• The denominator 1 - β1 - b1 reflects the feedback: Yt affects Ct through β1, Ct in turn affects Yt, & Yt
affects It through b1.
These reduced form coefficients are used for forecasting & policy analysis, since they give the total effect
of a change in the exogenous variables on the endogenous variables. The two ways of obtaining the
reduced form model described above suggest that estimates of the reduced form coefficients may be obtained in
two ways.

1. Direct estimation of the reduced form coefficients.

The reduced form π's may be estimated by the method of least squares no restriction (LSNR). This
means we can apply OLS to equations 7.26 - 7.28, because they express all the endogenous
variables in terms of exogenous variables only. The method is called least squares no
restriction because it does not take into consideration any information on the structural parameters.
All that is required is knowledge of the predetermined variables appearing in the system, not
of the coefficients in equations 7.9 - 7.11 (β1, b1, b2). There is no need to worry about whether β1=0, b1=0
or b2=0.

2. Indirect estimation of the reduced form coefficients.
It is known that there is a relationship between the reduced form coefficients (π's) & the structural
parameters. Therefore, to obtain the reduced form coefficients we can estimate the structural
parameters (of 7.15 - 7.17) by any appropriate econometric technique and then substitute these estimates
into the system of parameter relationships to obtain the π's indirectly. This indirect method involves three
steps.
a. Solve the system of endogenous variables so that each equation contains only predetermined
explanatory variables. This is done by repeated substitution of variables until we arrive at the
reduced form of all the equations. In this way we obtain the system of parameter relations, that is
to say the system which defines the relations between the π's & the β's & γ's.
b. Obtain estimates of the structural parameters by any appropriate econometric method.
c. Substitute the estimates of the β's & γ's into the system of parameter relations to find the estimates of
the reduced form coefficients.

Advantages of indirect estimation of the reduced form coefficients.

Though this method of indirect estimation of the reduced form is more complicated, it has important advantages.
a) The π's derived from the structural parameters (the β's & γ's) are more efficient, because in this way we
take into account all the information incorporated in the structural model.
b) Structural changes occur continuously over time.
If we know the changes in the structural parameters we can easily recompute the π's, whereas if the π's are
computed with the LSNR method it is not possible to take this information into account, because each π is a
function of several structural parameters.
3. Recursive models
A model is called recursive if its structural equations can be ordered in such a way that:
i) The first equation includes only predetermined (exogenous) variables on the right-hand side.
y1 = f(x1, x2, ..., xk, u1)
The first endogenous variable y1 is a function of the exogenous variables only, so we can write
y1 = γ11x1 + γ12x2 + ... + γ1kxk + u1
ii) The second equation contains the predetermined variables (the x's) and only one endogenous variable, y1.
y2 = f(x1, x2, ..., xk, y1, u2)
y2 = γ21x1 + γ22x2 + ... + γ2kxk + β21y1 + u2
and so on, up to the n-th equation, which contains the x's and the endogenous variables y1, ..., yn-1:
yn = γn1x1 + γn2x2 + ... + γnkxk + βn1y1 + ... + βn,n-1yn-1 + un
Given values of the exogenous variables (x), we may apply OLS to each equation individually because, by assumption,
cov(ui, uj) = 0 for i ≠ j
and therefore, for example,
cov(y1, u2) = 0
Recursive systems are also called triangular systems because the coefficients of the endogenous variables (the β's) form a triangular array: the main diagonal of the array of β's contains units and no coefficients appear above the main diagonal. Take our equations 7.9-7.11.

Ct = y1 = f(Yt-1, Gt)
It = y2 = f(Ct, Yt-1, Gt)
Yt = y3 = f(Ct, It, Yt-1, Gt)
y1 = γ11Yt-1 + γ12Gt + u1
y2 = β21Ct + γ21Yt-1 + γ22Gt + u2
y3 = β31Ct + β32It + γ31Yt-1 + γ32Gt + u3

Move all the right-hand-side variables to the left-hand side, leaving only the disturbance ui on the right.

y1 - γ11Yt-1 - γ12Gt = u1
y2 - β21Ct - γ21Yt-1 - γ22Gt = u2
y3 - β31Ct - β32It - γ31Yt-1 - γ32Gt = u3

                β's of endogenous variables      γ's of exogenous variables
Equation        y1 (Ct)   y2 (It)   y3 (Yt)      Yt-1      Gt
1st             1         0         0            -γ11      -γ12
2nd             -β21      1         0            -γ21      -γ22
3rd             -β31      -β32      1            -γ31      -γ32

Looking at the coefficients of the endogenous variables (the β's), every entry on the main diagonal is 1, the entries above the diagonal are zero, and the entries below it are non-zero coefficients. Such a system can be estimated equation by equation using OLS, and it will yield unbiased and consistent estimates.
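The triangular pattern described above can be checked mechanically. The following is a minimal Python sketch, assuming NumPy is available; the function name `is_recursive` and the numerical values used for β21, β31 and β32 are hypothetical and chosen only for illustration.

```python
import numpy as np

def is_recursive(B, tol=1e-12):
    """Return True if B (coefficients of the endogenous variables, one row per
    equation) has ones on the main diagonal and zeros above it."""
    B = np.asarray(B, dtype=float)
    unit_diagonal = np.allclose(np.diag(B), 1.0, atol=tol)
    zeros_above = np.allclose(np.triu(B, k=1), 0.0, atol=tol)
    return bool(unit_diagonal and zeros_above)

# Hypothetical values for beta21, beta31 and beta32 (illustration only).
b21, b31, b32 = 0.4, 0.3, 0.5

# Coefficient array of the endogenous variables in the consumption,
# investment and income equations, arranged as in the table above.
B_recursive = [[1, 0, 0],
               [-b21, 1, 0],
               [-b31, -b32, 1]]

# Hypothetical variant: letting y3 enter the first equation puts a
# non-zero entry above the diagonal, so the system is no longer recursive.
B_simultaneous = [[1, 0, -0.2],
                  [-b21, 1, 0],
                  [-b31, -b32, 1]]

print(is_recursive(B_recursive))
print(is_recursive(B_simultaneous))
```

The check only looks at the β array; the additional requirement that the disturbances of different equations are uncorrelated is an assumption about the model and cannot be read off the coefficient table.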

7.4 The Identification Problem

The problem of identification refers to model formulation; it does not concern the estimation of the model, which depends on the empirical data and the form of the model. By the identification problem we mean whether numerical estimates of the parameters of a structural equation can be obtained from the estimated reduced-form coefficients (whether we can obtain the α's and β's from the π's). If they can, we say that the equation under consideration is identified; if not, we say that the equation is unidentified, or under identified. An identified equation may be either exactly (fully, or just) identified or over identified. It is exactly identified if unique numerical values are obtained for the structural parameters. It is over identified if more than one numerical value is obtained for some of the parameters of the structural equation. Let's illustrate using the simple Keynesian model.

$$C_t = \alpha + \beta_1 Y_t + U_t \tag{7.29}$$
$$Y_t = C_t + I_t \tag{7.30}$$
where Ct = consumption expenditure, Yt = income and It = investment. Ct and Yt are endogenous variables and we have two equations, but only one exogenous variable, It. The model is complete because the number of equations equals the number of endogenous variables.
To obtain the reduced form, substitute equation 7.29 into 7.30:

$$Y_t = \alpha + \beta_1 Y_t + U_t + I_t$$
$$Y_t - \beta_1 Y_t = \alpha + I_t + U_t$$
$$(1 - \beta_1) Y_t = \alpha + I_t + U_t$$
$$Y_t = \frac{\alpha}{1 - \beta_1} + \frac{1}{1 - \beta_1} I_t + \frac{U_t}{1 - \beta_1}$$
Let
$$\pi_0 = \frac{\alpha}{1 - \beta_1}, \qquad \pi_1 = \frac{1}{1 - \beta_1}, \qquad w_0 = \frac{U_t}{1 - \beta_1}$$
Substituting these into the equation above gives
$$Y_t = \pi_0 + \pi_1 I_t + w_0 \tag{7.31}$$
Equation 7.31 is the reduced-form equation for income. The reduced-form coefficients (π0 and π1) are non-linear combinations of the structural coefficients. Substitute equation 7.31 into equation 7.29:
$$C_t = \alpha + \beta_1 [\pi_0 + \pi_1 I_t + w_0] + U_t$$
$$= \alpha + \frac{\beta_1 \alpha}{1 - \beta_1} + \frac{\beta_1}{1 - \beta_1} I_t + \frac{\beta_1 U_t}{1 - \beta_1} + U_t$$
$$= \frac{\alpha - \alpha \beta_1 + \alpha \beta_1}{1 - \beta_1} + \frac{\beta_1}{1 - \beta_1} I_t + \frac{\beta_1 U_t + U_t - \beta_1 U_t}{1 - \beta_1}$$
$$C_t = \frac{\alpha}{1 - \beta_1} + \frac{\beta_1}{1 - \beta_1} I_t + \frac{U_t}{1 - \beta_1} \tag{7.32}$$
Let
$$\pi_2 = \frac{\alpha}{1 - \beta_1}, \qquad \pi_3 = \frac{\beta_1}{1 - \beta_1}, \qquad w_1 = \frac{U_t}{1 - \beta_1}$$
Then
$$C_t = \pi_2 + \pi_3 I_t + w_1 \tag{7.33}$$
The reduced-form coefficients π1 and π3 are the coefficients of investment in the income and consumption equations respectively. They are called impact, or short-run, multipliers because they measure the immediate impact of a change in the exogenous variable on the endogenous variables: π1 gives the immediate impact of investment on income and π3 gives the immediate impact of investment on consumption.
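To make the multipliers concrete, the short Python sketch below maps the structural parameters of the Keynesian model into the four reduced-form coefficients. The function name `keynesian_reduced_form` and the values α = 10 and β1 = 0.75 are assumptions made for illustration; they do not come from the text.

```python
def keynesian_reduced_form(alpha, beta1):
    """Map the structural parameters (alpha, beta1) of the simple Keynesian
    model to the reduced-form coefficients pi0, pi1, pi2 and pi3."""
    if not 0 < beta1 < 1:
        raise ValueError("beta1 is assumed to lie strictly between 0 and 1")
    k = 1.0 / (1.0 - beta1)          # common factor 1 / (1 - beta1)
    return {
        "pi0": alpha * k,            # intercept of the income equation
        "pi1": k,                    # impact multiplier of I on Y
        "pi2": alpha * k,            # intercept of the consumption equation
        "pi3": beta1 * k,            # impact multiplier of I on C
    }

# Hypothetical structural values: alpha = 10, marginal propensity to consume 0.75.
pis = keynesian_reduced_form(alpha=10.0, beta1=0.75)
print(pis)
```

With these assumed values one extra unit of investment raises income by π1 = 4 units and consumption by π3 = 3 units on impact; the difference π1 − π3 = 1 is the unit of investment itself, as the identity Yt = Ct + It requires.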

7.4.1 Under identification

Consider the following demand and supply equations:

$$Q_d = \alpha_0 + \alpha_1 P_1 + U_1 \tag{7.34}$$
$$Q_s = \beta_0 + \beta_1 P_1 + U_2 \tag{7.35}$$
$$Q_d = Q_s \quad \text{(equilibrium condition)}$$
where Qd is quantity demanded, Qs is quantity supplied and P1 is price. Setting Qd = Qs:

$$\alpha_0 + \alpha_1 P_1 + U_1 = \beta_0 + \beta_1 P_1 + U_2 \tag{7.36}$$
Rearranging,
$$\alpha_1 P_1 - \beta_1 P_1 = (\beta_0 - \alpha_0) + (U_2 - U_1)$$
$$(\alpha_1 - \beta_1) P_1 = (\beta_0 - \alpha_0) + (U_2 - U_1)$$
$$P_1 = \frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1} + \frac{U_2 - U_1}{\alpha_1 - \beta_1} \tag{7.37}$$
Let
$$\pi_0 = \frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1}, \qquad V_t = \frac{U_2 - U_1}{\alpha_1 - \beta_1}$$
so that
$$P_1 = \pi_0 + V_t \tag{7.38}$$
Substitute equation 7.38 into equation 7.34:
$$Q_d = \alpha_0 + \alpha_1 \left[ \frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1} + \frac{U_2 - U_1}{\alpha_1 - \beta_1} \right] + U_1$$
$$Q_d = \alpha_0 + \frac{\alpha_1 \beta_0 - \alpha_0 \alpha_1}{\alpha_1 - \beta_1} + \frac{\alpha_1 (U_2 - U_1)}{\alpha_1 - \beta_1} + U_1$$
$$Q_d = \frac{\alpha_0 \alpha_1 - \alpha_0 \beta_1 + \alpha_1 \beta_0 - \alpha_0 \alpha_1}{\alpha_1 - \beta_1} + \frac{\alpha_1 U_2 - \alpha_1 U_1 + \alpha_1 U_1 - \beta_1 U_1}{\alpha_1 - \beta_1}$$
$$Q_d = \frac{\alpha_1 \beta_0 - \alpha_0 \beta_1}{\alpha_1 - \beta_1} + \frac{\alpha_1 U_2 - \beta_1 U_1}{\alpha_1 - \beta_1} \tag{7.39}$$
Let
$$\pi_1 = \frac{\alpha_1 \beta_0 - \alpha_0 \beta_1}{\alpha_1 - \beta_1}, \qquad w = \frac{\alpha_1 U_2 - \beta_1 U_1}{\alpha_1 - \beta_1}$$
Then we can write equation 7.39 as follows:
$$Q_d = \pi_1 + w \tag{7.40}$$
Equations 7.38 and 7.40 are the two reduced-form equations derived from the structural equations 7.34 and 7.35. The structural equations contain four coefficients (α0, α1, β0 and β1), whereas the reduced-form equations contain only two (π0 and π1). The reduced-form coefficients are functions of the structural coefficients, i.e. α0, α1, β0 and β1 all appear in π0 and π1. But how can we find the values of α0, α1, β0 and β1 from π0 and π1? We cannot: the structural coefficients outnumber the reduced-form coefficients, and four structural coefficients cannot be computed from two reduced-form coefficients. We therefore say that the equations are under identified.
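The impossibility of recovering four structural coefficients from two reduced-form coefficients can be seen numerically. In the sketch below, two different demand and supply structures (both sets of numbers are hypothetical) produce exactly the same π0 and π1, so no amount of data on P and Q could tell them apart. The function name `reduced_form` is likewise an assumption made for illustration.

```python
def reduced_form(a0, a1, b0, b1):
    """Return (pi0, pi1) of equations 7.38 and 7.40 from the structural
    coefficients of demand (a0, a1) and supply (b0, b1)."""
    d = a1 - b1
    pi0 = (b0 - a0) / d              # mean equilibrium price
    pi1 = (a1 * b0 - a0 * b1) / d    # mean equilibrium quantity
    return pi0, pi1

# Two hypothetical structures whose demand and supply lines cross at the same point.
structure_a = dict(a0=100.0, a1=-2.0, b0=10.0, b1=1.0)
structure_b = dict(a0=130.0, a1=-3.0, b0=-20.0, b1=2.0)

print(reduced_form(**structure_a))
print(reduced_form(**structure_b))
```

Both calls return the same pair of reduced-form coefficients, which is exactly what under identification means: the mapping from structural to reduced-form coefficients cannot be inverted.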

7.4.2 Exact /Just/ Identification

The previous demand and supply model was under identified because
a) the same variables, P and Q, appear in both functions, and
b) there is no additional information.
Now let's incorporate an additional variable in the demand function.

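$$Q_d = \alpha_0 + \alpha_1 P_1 + \alpha_2 I + U_1 \tag{7.41}$$
$$Q_s = \beta_0 + \beta_1 P_1 + U_2 \tag{7.42}$$
Here the only new variable is I, which represents income and is exogenous. In this model P and Q are endogenous and I is the only exogenous variable. Setting Qd = Qs:

$$\alpha_0 + \alpha_1 P_1 + \alpha_2 I + U_1 = \beta_0 + \beta_1 P_1 + U_2$$
$$\alpha_1 P_1 - \beta_1 P_1 = (\beta_0 - \alpha_0) - \alpha_2 I + (U_2 - U_1)$$
$$P_1 (\alpha_1 - \beta_1) = (\beta_0 - \alpha_0) - \alpha_2 I + (U_2 - U_1)$$
$$P_1 = \frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1} - \frac{\alpha_2}{\alpha_1 - \beta_1} I + \frac{U_2 - U_1}{\alpha_1 - \beta_1} \tag{7.43}$$
Let
$$\pi_0 = \frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1}, \qquad \pi_1 = \frac{-\alpha_2}{\alpha_1 - \beta_1}, \qquad v_1 = \frac{U_2 - U_1}{\alpha_1 - \beta_1}$$
$$P_1 = \pi_0 + \pi_1 I + v_1 \tag{7.44}$$
Substituting 7.44 into equation 7.41:
$$Q_d = \alpha_0 + \alpha_1 \left[ \frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1} - \frac{\alpha_2}{\alpha_1 - \beta_1} I + \frac{U_2 - U_1}{\alpha_1 - \beta_1} \right] + \alpha_2 I + U_1$$
$$Q_d = \alpha_0 + \frac{\alpha_1 \beta_0 - \alpha_0 \alpha_1}{\alpha_1 - \beta_1} - \frac{\alpha_1 \alpha_2}{\alpha_1 - \beta_1} I + \frac{\alpha_1 U_2 - \alpha_1 U_1}{\alpha_1 - \beta_1} + \alpha_2 I + U_1$$
$$Q_d = \frac{\alpha_0 \alpha_1 - \alpha_0 \beta_1 + \alpha_1 \beta_0 - \alpha_0 \alpha_1}{\alpha_1 - \beta_1} + \frac{-\alpha_1 \alpha_2 + \alpha_1 \alpha_2 - \alpha_2 \beta_1}{\alpha_1 - \beta_1} I + \frac{\alpha_1 U_2 - \alpha_1 U_1 + \alpha_1 U_1 - \beta_1 U_1}{\alpha_1 - \beta_1}$$
$$Q_d = \frac{\alpha_1 \beta_0 - \alpha_0 \beta_1}{\alpha_1 - \beta_1} - \frac{\alpha_2 \beta_1}{\alpha_1 - \beta_1} I + \frac{\alpha_1 U_2 - \beta_1 U_1}{\alpha_1 - \beta_1}$$
Let
$$\pi_2 = \frac{\alpha_1 \beta_0 - \alpha_0 \beta_1}{\alpha_1 - \beta_1}, \qquad \pi_3 = \frac{-\alpha_2 \beta_1}{\alpha_1 - \beta_1}, \qquad w = \frac{\alpha_1 U_2 - \beta_1 U_1}{\alpha_1 - \beta_1}$$
$$Q_d = \pi_2 + \pi_3 I + w \tag{7.45}$$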

Equations 7.44 and 7.45 are reduced-form equations and OLS can be applied to estimate their parameters. The structural equations (7.41 and 7.42) contain five structural coefficients: α0, α1, α2, β0 and β1. But there are only four reduced-form coefficients (π0, π1, π2 and π3). Since the number of π's (four) is less than the number of structural coefficients (five), we cannot find unique solutions for all of them. The supply function, however, is identified on its own, because the equilibrium quantity also satisfies the supply equation:
Qs = π2 + π3It + wt
The supply equation 7.42 has two structural parameters (β0 and β1), and the reduced-form equation for quantity has two coefficients (π2 and π3), which is why the supply function is identified.

From equation 7.42 we have

Qs = β0 + β1P1 + U2
β0 = Qs - β1P1 - U2
Substituting equation 7.45 for Qs and equation 7.44 for P1 and comparing the constant terms on both sides gives (after simplification)
$$\beta_0 = \pi_2 - \beta_1 \pi_0 \tag{7.46}$$
In the same way, comparing the coefficients of I on both sides of equation 7.42 after the substitution gives π3 = β1π1, so that
$$\beta_1 = \frac{\pi_3}{\pi_1}$$
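As a quick numerical check of equation 7.46 and of the ratio π3/π1, the sketch below first computes the four reduced-form coefficients from a set of structural values and then recovers the supply parameters from the π's alone. All numbers and both function names are hypothetical, chosen only for illustration.

```python
def reduced_form_with_income(a0, a1, a2, b0, b1):
    """Reduced-form coefficients of equations 7.44 and 7.45."""
    d = a1 - b1
    pi0 = (b0 - a0) / d
    pi1 = -a2 / d
    pi2 = (a1 * b0 - a0 * b1) / d
    pi3 = -a2 * b1 / d
    return pi0, pi1, pi2, pi3

def supply_from_reduced_form(pi0, pi1, pi2, pi3):
    """Recover the supply parameters (beta0, beta1) from the pi's."""
    b1 = pi3 / pi1
    b0 = pi2 - b1 * pi0
    return b0, b1

# Hypothetical structural values: demand (100, -2, 0.5), supply (10, 1).
pis = reduced_form_with_income(a0=100.0, a1=-2.0, a2=0.5, b0=10.0, b1=1.0)
b0_rec, b1_rec = supply_from_reduced_form(*pis)
print(b0_rec, b1_rec)
```

The recovered values reproduce the supply parameters that generated the π's. No comparable pair of formulas exists for the three demand parameters, which is the point made in the next paragraph.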
In the demand function, by contrast, there are three structural coefficients (α0, α1 and α2), but the reduced-form equation 7.45 supplies only two coefficients (π2 and π3). Since the reduced-form coefficients are fewer than the structural coefficients of equation 7.41, we conclude that the demand function is under identified. For the supply function the two reduced-form coefficients π2 and π3 match the two structural coefficients β0 and β1, so it is just identified. In conclusion, the supply function is identified but the demand function is not; on this basis the system as a whole is not identified. Now suppose we have the following equations.

Demand function:
$$Q_d = \alpha_0 + \alpha_1 P_1 + \alpha_2 I_t + U_1 \tag{7.47}$$
Supply function:
$$Q_s = \beta_0 + \beta_1 P_1 + \beta_2 P_{t-1} + U_2 \tag{7.48}$$
The demand function is the same as 7.41, but the supply function now includes the lagged price as an explanatory variable. We have two endogenous variables (P1 and Q) and two exogenous (predetermined) variables (It and Pt-1).
At the market clearing point Qd = Qs, so
$$\alpha_0 + \alpha_1 P_1 + \alpha_2 I_t + U_1 = \beta_0 + \beta_1 P_1 + \beta_2 P_{t-1} + U_2$$
Solving for P1:
$$\alpha_1 P_1 - \beta_1 P_1 = (\beta_0 - \alpha_0) - \alpha_2 I_t + \beta_2 P_{t-1} + (U_2 - U_1)$$
$$(\alpha_1 - \beta_1) P_1 = (\beta_0 - \alpha_0) - \alpha_2 I_t + \beta_2 P_{t-1} + (U_2 - U_1)$$
$$P_1 = \frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1} - \frac{\alpha_2}{\alpha_1 - \beta_1} I_t + \frac{\beta_2}{\alpha_1 - \beta_1} P_{t-1} + \frac{U_2 - U_1}{\alpha_1 - \beta_1}$$
Let
$$\pi_0 = \frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1}, \qquad \pi_1 = \frac{-\alpha_2}{\alpha_1 - \beta_1}, \qquad \pi_2 = \frac{\beta_2}{\alpha_1 - \beta_1}, \qquad V_t = \frac{U_2 - U_1}{\alpha_1 - \beta_1}$$
$$P_1 = \pi_0 + \pi_1 I_t + \pi_2 P_{t-1} + V_t \tag{7.49}$$
Substituting this price into either the demand or the supply equation gives
$$Q_t = \pi_3 + \pi_4 I_t + \pi_5 P_{t-1} + w_t \tag{7.50}$$
where
$$\pi_3 = \frac{\alpha_1 \beta_0 - \alpha_0 \beta_1}{\alpha_1 - \beta_1}, \qquad \pi_4 = \frac{-\alpha_2 \beta_1}{\alpha_1 - \beta_1}, \qquad \pi_5 = \frac{\alpha_1 \beta_2}{\alpha_1 - \beta_1}, \qquad w_t = \frac{\alpha_1 U_2 - \beta_1 U_1}{\alpha_1 - \beta_1}$$
The structural equations 7.47 and 7.48 contain six structural coefficients (α0, α1, α2, β0, β1 and β2), and there are six reduced-form coefficients in equations 7.49 and 7.50 (π0, π1, π2, π3, π4 and π5). Since the number of structural coefficients equals the number of reduced-form coefficients, we conclude that the system as a whole is identified.

7.4.3 Over identification

Keeping the supply function as in equation 7.48, let us modify the demand function by incorporating wealth (R). We then have the following equations.
Demand function
$$Q_d = \alpha_0 + \alpha_1 P_1 + \alpha_2 I_t + \alpha_3 R_t + U_1 \tag{7.51}$$
Supply function
$$Q_s = \beta_0 + \beta_1 P_1 + \beta_2 P_{t-1} + U_2 \tag{7.52}$$
Now we have two endogenous variables (Q and P1) and three exogenous variables (Pt-1, It and Rt).
In equilibrium Qd = Qs:
$$\alpha_0 + \alpha_1 P_1 + \alpha_2 I_t + \alpha_3 R_t + U_1 = \beta_0 + \beta_1 P_1 + \beta_2 P_{t-1} + U_2$$
$$\alpha_1 P_1 - \beta_1 P_1 = (\beta_0 - \alpha_0) + \beta_2 P_{t-1} - \alpha_2 I_t - \alpha_3 R_t + U_2 - U_1$$
$$P_1 (\alpha_1 - \beta_1) = (\beta_0 - \alpha_0) + \beta_2 P_{t-1} - \alpha_2 I_t - \alpha_3 R_t + U_2 - U_1$$
$$P_1 = \frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1} - \frac{\alpha_2}{\alpha_1 - \beta_1} I_t - \frac{\alpha_3}{\alpha_1 - \beta_1} R_t + \frac{\beta_2}{\alpha_1 - \beta_1} P_{t-1} + \frac{U_2 - U_1}{\alpha_1 - \beta_1}$$
where
$$\pi_0 = \frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1}, \qquad \pi_1 = \frac{-\alpha_2}{\alpha_1 - \beta_1}, \qquad \pi_2 = \frac{-\alpha_3}{\alpha_1 - \beta_1}, \qquad \pi_3 = \frac{\beta_2}{\alpha_1 - \beta_1}, \qquad V_t = \frac{U_2 - U_1}{\alpha_1 - \beta_1}$$

$$P_1 = \pi_0 + \pi_1 I_t + \pi_2 R_t + \pi_3 P_{t-1} + V_t \tag{7.53}$$

Substitute this price into the demand or the supply function, for example the demand function:
$$Q_t = \alpha_0 + \alpha_1 [\pi_0 + \pi_1 I_t + \pi_2 R_t + \pi_3 P_{t-1} + V_t] + \alpha_2 I_t + \alpha_3 R_t + U_1$$
After simplification you will get
$$Q_t = \pi_4 + \pi_5 I_t + \pi_6 R_t + \pi_7 P_{t-1} + w_t \tag{7.54}$$
where
$$\pi_4 = \frac{\alpha_1 \beta_0 - \alpha_0 \beta_1}{\alpha_1 - \beta_1}, \qquad \pi_5 = \frac{-\alpha_2 \beta_1}{\alpha_1 - \beta_1}, \qquad \pi_6 = \frac{-\alpha_3 \beta_1}{\alpha_1 - \beta_1}, \qquad \pi_7 = \frac{\alpha_1 \beta_2}{\alpha_1 - \beta_1}, \qquad w_t = \frac{\alpha_1 U_2 - \beta_1 U_1}{\alpha_1 - \beta_1}$$
Equations 7.51 and 7.52 contain seven structural coefficients, but equations 7.53 and 7.54 contain eight reduced-form coefficients. Since the number of reduced-form coefficients is greater than the number of structural coefficients, we say that the system as a whole is over identified.
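As a sketch of why the extra reduced-form coefficient matters, take π1 and π2 to be the coefficients of It and Rt in the price equation 7.53, and π5 and π6 to be the coefficients of the same two variables in the quantity equation 7.54, and assume α2 and α3 are non-zero. The supply slope β1 can then be recovered in two different ways:

```latex
\beta_1 = \frac{\pi_5}{\pi_1}
        = \frac{-\alpha_2\beta_1/(\alpha_1-\beta_1)}{-\alpha_2/(\alpha_1-\beta_1)},
\qquad
\beta_1 = \frac{\pi_6}{\pi_2}
        = \frac{-\alpha_3\beta_1/(\alpha_1-\beta_1)}{-\alpha_3/(\alpha_1-\beta_1)}
```

With the true π's the two ratios coincide, but when the π's are replaced by sample estimates the two ratios will generally differ. There is then more than one numerical value for β1, which is what over identification of the supply equation means here.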

7.5 Rules for identification

Deriving the reduced-form equations is a time-consuming process. The so-called order and rank conditions of identification lighten the task by providing a systematic routine. These two formal rules set the conditions for the identifiability of a relationship:
a) The order condition
b) The rank condition
The identification of a system means the identification of each of its equations, and an equation is identified when there is a unique value for each of its parameters. In econometrics there are two possible situations of identifiability:
a) Equation under identified. An equation is under identified when its statistical form is not unique. When one or more equations of the model are under identified, the system as a whole is under identified.
b) Equation identified. A system is identified when all of its equations are identified. An identified equation may be either exactly identified or over identified.
If an equation is under identified, it is impossible to estimate all its parameters with any econometric technique. If the equation is identified, its coefficients (parameters) can be statistically estimated.
i) If the equation is exactly identified, the appropriate method of estimation is indirect least squares (ILS).
ii) If the equation is over identified, ILS will not yield unique estimates of the structural parameters. In this case we use other methods, such as two-stage least squares (2SLS) or maximum likelihood methods.

7.5.1 Order Condition for identification

The order condition is a necessary but not sufficient condition for the identification of an equation.
Let
• G = total number of equations (= total number of endogenous variables)
• K = total number of variables in the model (endogenous and exogenous)
• M = number of variables (endogenous and exogenous) included in the specific equation.
If (K-M) ≥ (G-1) the equation is identified
If (K-M) = (G-1) the equation is exactly identified
If (K-M) > (G-1) the equation is over identified
If (K-M) < (G-1) the equation is under identified

Example 1. Qd = α0 + α1P1 + α2I + u1...........................................7.55

Qs = β0 + β1P1 + u2 ..................................................7.56

Take the demand equation (7.55).

G = total number of equations = 2
K = total number of variables in the model (endogenous and exogenous) = 3 (Q, P1, I)
M = number of endogenous and exogenous variables in equation 7.55, the demand equation = 3
(K-M) = (3-3) = 0 < (G-1) = (2-1) = 1
0 < 1, so the demand function is under identified.
Take the supply equation (7.56).
G = 2, K = 3, M = 2
(K-M) = (3-2) = 1 = (G-1) = (2-1) = 1
1 = 1, so the supply function is exactly identified.
Example 2. Given the following structural model, determine whether each equation is identified or under identified.
y1 = 3y2 - 2x1 + x2 + u1 .......................................7.57
y2 = y3 + x3 + u2 ..............................................7.58
y3 = y1 - y2 - 2x3 + u3 ..........................................7.59
G = total number of endogenous variables = 3 (y1, y2, y3)
K = total number of endogenous and exogenous variables = 6 (y1, y2, y3, x1, x2, x3)
Take equation 7.57. In this equation M, the number of endogenous and exogenous variables it contains, is 4 (y1, y2, x1, x2).
(K-M) ≥ (G-1)
(6-4) = (3-1)
2 = 2, so the equation is exactly identified.
Take equation 7.58. M = 3 (y2, y3, x3)
(K-M) ≥ (G-1)
(6-3) > (3-1)
3 > 2, so the equation is over identified.
Take equation 7.59. M = 4 (y1, y2, y3, x3)
(K-M) ≥ (G-1)
(6-4) = (3-1)
2 = 2, so the equation is exactly identified.
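The counting rule of the order condition is easy to automate. The following Python sketch (the function name `order_condition` is an assumption, not something defined in the text) applies the rule to the equations of Examples 1 and 2, using the K, M and G defined above.

```python
def order_condition(K, M, G):
    """Classify an equation by the order condition.

    K: total number of variables (endogenous and exogenous) in the model
    M: number of variables appearing in the equation under study
    G: number of equations (endogenous variables) in the model
    """
    excluded = K - M                 # variables left out of this equation
    if excluded < G - 1:
        return "under identified"
    if excluded == G - 1:
        return "exactly identified"
    return "over identified"

# Example 1: demand (7.55) and supply (7.56), with K = 3 and G = 2.
print("7.55:", order_condition(K=3, M=3, G=2))
print("7.56:", order_condition(K=3, M=2, G=2))

# Example 2: equations 7.57 to 7.59, with K = 6 and G = 3.
for label, M in [("7.57", 4), ("7.58", 3), ("7.59", 4)]:
    print(label + ":", order_condition(K=6, M=M, G=3))
```

Because the order condition is only necessary, a result of "exactly identified" or "over identified" from this function still has to be confirmed with the rank condition.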

7.5.2 Rank Condition for identification

The rank condition states that, in a system of G equations (G being the total number of equations in the model), any particular equation is identified if and only if it is possible to construct at least one non-zero determinant of order (G-1) from the coefficients of the variables excluded from that particular equation but contained in the other equations of the model.
Example 1. Given the structural model

y1 = 3y2 - 2x1 + x2 + u1
y2 = y3 + x3 + u2
y3 = y1 + y2 - 2x3 + u3
where the y's are endogenous and the x's are exogenous variables. To apply the rank condition, first move all the left-hand-side variables to the right-hand side:

-y1 + 3y2 + 0y3 - 2x1 + x2 + 0x3 + u1 = 0
0y1 - y2 + y3 + 0x1 + 0x2 + x3 + u2 = 0
y1 + y2 - y3 + 0x1 + 0x2 - 2x3 + u3 = 0
Ignoring the random terms, put the coefficients in table form:

                 Variables
Equation         y1    y2    y3    x1    x2    x3
1st equation     -1     3     0    -2     1     0
2nd equation      0    -1     1     0     0     1
3rd equation      1     1    -1     0     0    -2

Now let's examine the identifiability of the second equation, following these steps:
a) Strike out the row of coefficients of equation 2.
b) Strike out the columns in which equation 2 has a non-zero coefficient (here the columns of y2, y3 and x3).
We are then left with

y1    x1    x2
-1    -2     1
 1     0     0

c) From this table of remaining coefficients, form the determinants of order (G-1) = 2 and check whether at least one of them is non-zero.
• If all the determinants of order (G-1) are zero, the equation is under identified.
• If at least one determinant of order (G-1) is non-zero, the equation is identified.
From the above table we obtain the following determinants of order 2:

$$\Delta_1 = \begin{vmatrix} -1 & -2 \\ 1 & 0 \end{vmatrix} = 0 + 2 = 2 \neq 0, \qquad \Delta_2 = \begin{vmatrix} -2 & 1 \\ 0 & 0 \end{vmatrix} = 0, \qquad \Delta_3 = \begin{vmatrix} -1 & 1 \\ 1 & 0 \end{vmatrix} = 0 - 1 = -1 \neq 0$$

Since two of the determinants are non-zero, we conclude that the second equation is identified.
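Checking every determinant of order (G-1) by hand is equivalent to asking whether the table left after the strike-outs has rank (G-1). The sketch below does this with NumPy for the coefficient table of this example, taking −2 as the coefficient of x1 in the first equation (as in the reduced table). The function name `rank_condition` is hypothetical.

```python
import numpy as np

def rank_condition(A, eq):
    """Return True if equation number `eq` (0-based row of A) satisfies the
    rank condition. A holds the structural coefficients, one row per equation
    and one column per variable."""
    A = np.asarray(A, dtype=float)
    n_eq = A.shape[0]
    excluded = np.isclose(A[eq], 0.0)      # variables absent from this equation
    others = np.arange(n_eq) != eq         # rows of the other equations
    sub = A[others][:, excluded]           # table left after the strike-outs
    if sub.shape[1] == 0:                  # nothing excluded: cannot be identified
        return False
    return bool(np.linalg.matrix_rank(sub) == n_eq - 1)

# Columns: y1, y2, y3, x1, x2, x3
A = [[-1,  3,  0, -2, 1,  0],
     [ 0, -1,  1,  0, 0,  1],
     [ 1,  1, -1,  0, 0, -2]]

print(rank_condition(A, eq=1))   # second equation
print(rank_condition(A, eq=2))   # third equation
```

For the second equation the function agrees with the determinants computed above. For the third equation the remaining table consists of the x1 and x2 columns of the first two equations, whose second row is all zeros, so its rank is only 1. This illustrates why the order condition, being necessary but not sufficient, has to be supplemented by the rank condition.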
Example 2: Given the model

D = α0 + α1P1 + α2P2 + α3Y + α4t + U

S = β0 + β1P1 + β2P2 + β3C + β4t + W
D = S
where D = quantity demanded, S = quantity supplied, P1 = price of the product, P2 = price of another product, Y = income, C = costs and t = time.
There are three endogenous variables (D, S and P1) and four exogenous variables (P2, Y, C and t).
Check whether the second equation (the supply equation) is identified.
i. Order condition
K = 7 (total number of exogenous and endogenous variables)
M = 5 (number of exogenous and endogenous variables in the supply equation: S, P1, P2, C and t)
G = 3 (number of equations, i.e. endogenous variables, in the model)
(K - M) ≥ (G - 1)
(7 - 5) = (3 - 1)
2 = 2, so by the order condition the supply equation is exactly identified.

7.5.3 Rank Condition
For the supply equation of Example 2, first write the structural equations as follows, ignoring the constant terms and the random terms:

-D + α1P1 + α2P2 + α3Y + α4t + 0C + 0S = 0
0D + β1P1 + β2P2 + 0Y + β4t + β3C - S = 0
-D + 0P1 + 0P2 + 0Y + 0t + 0C + S = 0

                 Variables
Equation         D     P1    P2    Y     t     C     S
1st equation     -1    α1    α2    α3    α4    0     0
2nd equation      0    β1    β2    0     β4    β3    -1
3rd equation     -1    0     0     0     0     0     1

Second, strike out the row of the 2nd equation.

Third, strike out all the columns in which the 2nd equation has a non-zero coefficient (P1, P2, t, C and S). What remains are the columns of D and Y in the 1st and 3rd equations:

$$\begin{vmatrix} -1 & \alpha_3 \\ -1 & 0 \end{vmatrix} = 0 - (-\alpha_3) = \alpha_3$$

Since the value of the determinant, α3, is non-zero, the supply equation is identified. Combined with the order condition,
(K - M) = (7 - 5) = 2 = (G - 1) = (3 - 1)
the supply equation is exactly identified.

7.6 Estimation of Simultaneous equation models

To estimate simultaneous equation models we can adopt two approaches.
• The first is the single-equation method, also known as the limited information method. In this method we estimate each equation in the system individually.
• The second is the system method, also known as the full information method. In this case we estimate all the equations in the model simultaneously.
In practice system methods are not commonly used, for a variety of reasons; single-equation methods are used more often. The major single-equation methods applied in the estimation of simultaneous equation models are
a. Ordinary least squares (OLS)
b. Indirect least squares (ILS)
c. Two-stage least squares (2SLS)

7.6.1 Ordinary Least Squares

We have seen that applying OLS to an equation of a simultaneous system produces biased and inconsistent estimates. But there is one situation where OLS can be applied appropriately even in the context of simultaneous equations: the recursive model.

Y1t = α0 + α1x1 + α2x2 + u1 .............................................7.57
Y2t = β0 + β1Y1 + β2x1 + β3x2 + u2 ................................7.58
Y3t = r0 + r1Y1 + r2Y2 + r3x1 + r4x2 + u3 ......................7.59

In equation 7.57 the endogenous variable appears on the left-hand side and only exogenous variables appear on the right-hand side. Hence OLS can be applied straightforwardly to this equation, provided all the assumptions of OLS hold. In equation 7.58 we can apply OLS provided that Y1 and u2 are uncorrelated. Likewise, we can apply OLS to the last equation if both Y1 and Y2 are uncorrelated with u3. In this recursive system OLS can be applied to each equation separately and we do not face a simultaneous equation problem. The reason is clear: there is no interdependence among the endogenous variables. Y1 affects Y2 without being affected by it, and Y1 and Y2 influence Y3 without being influenced by Y3. In other words, each equation exhibits a unilateral causal dependence.

7.6.2 Indirect least square (ILS method)

ILS is applicable only to just identified equations [(K-M) = (G-1)]. The method of obtaining estimates of the structural coefficients from the OLS estimates of the reduced-form coefficients is known as indirect least squares (ILS), and the estimates obtained are known as indirect least squares estimates.

ILS involves the following steps:

• Step 1. Obtain the reduced-form equations from the structural equations, i.e. express each endogenous variable as a function of the predetermined (exogenous) variables and a stochastic term only.
• Step 2. Apply OLS to the reduced-form equations individually. This is legitimate because the exogenous variables are uncorrelated with the stochastic term.
• Step 3. Obtain estimates of the original structural coefficients from the estimated reduced-form coefficients obtained in Step 2. The name ILS derives from the fact that the structural coefficients are obtained indirectly from the OLS estimates of the reduced-form coefficients.
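Example
$$Q_d = \alpha_0 + \alpha_1 P_1 + \alpha_2 Y + u_1 \tag{7.60}$$
$$Q_s = \beta_0 + \beta_1 P_1 + u_2 \tag{7.61}$$

where Qd = quantity demanded, Qs = quantity supplied, P1 = price and Y = income. Assume that Y is exogenous and that Q and P1 are endogenous. First check whether each equation is identified. For equation 7.60 (K = 3, M = 3, G = 2):
(K-M) = (3-3) = 0 < (G-1) = (2-1) = 1
0 < 1, so the demand function is under identified.
For equation 7.61 (M = 2):
(K-M) = (3-2) = 1 = (G-1) = (2-1) = 1
1 = 1, so the supply function is just (exactly) identified.
Setting Qd = Qs to obtain the reduced form:
$$\alpha_0 + \alpha_1 P_1 + \alpha_2 Y + u_1 = \beta_0 + \beta_1 P_1 + u_2$$
$$\alpha_1 P_1 - \beta_1 P_1 = (\beta_0 - \alpha_0) - \alpha_2 Y + u_2 - u_1$$
$$P_1 (\alpha_1 - \beta_1) = (\beta_0 - \alpha_0) - \alpha_2 Y + u_2 - u_1$$
$$P_1 = \frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1} - \frac{\alpha_2}{\alpha_1 - \beta_1} Y + \frac{u_2 - u_1}{\alpha_1 - \beta_1} \tag{7.62}$$
where
$$\pi_0 = \frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1}, \qquad \pi_1 = \frac{-\alpha_2}{\alpha_1 - \beta_1}, \qquad w = \frac{u_2 - u_1}{\alpha_1 - \beta_1}$$
$$P_1 = \pi_0 + \pi_1 Y + w \tag{7.63}$$
Substitute equation 7.62 into 7.60:
$$Q_d = \alpha_0 + \alpha_1 \left[ \frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1} - \frac{\alpha_2}{\alpha_1 - \beta_1} Y + \frac{u_2 - u_1}{\alpha_1 - \beta_1} \right] + \alpha_2 Y + u_1$$
$$= \alpha_0 + \frac{\alpha_1 \beta_0 - \alpha_0 \alpha_1}{\alpha_1 - \beta_1} - \frac{\alpha_1 \alpha_2}{\alpha_1 - \beta_1} Y + \frac{\alpha_1 u_2 - \alpha_1 u_1}{\alpha_1 - \beta_1} + \alpha_2 Y + u_1$$
$$= \frac{\alpha_1 \beta_0 - \alpha_0 \beta_1}{\alpha_1 - \beta_1} - \frac{\alpha_2 \beta_1}{\alpha_1 - \beta_1} Y + \frac{\alpha_1 u_2 - \beta_1 u_1}{\alpha_1 - \beta_1}$$
Let
$$\pi_2 = \frac{\alpha_1 \beta_0 - \alpha_0 \beta_1}{\alpha_1 - \beta_1}, \qquad \pi_3 = \frac{-\alpha_2 \beta_1}{\alpha_1 - \beta_1}, \qquad v_t = \frac{\alpha_1 u_2 - \beta_1 u_1}{\alpha_1 - \beta_1}$$
$$Q_d = \pi_2 + \pi_3 Y + v_t \tag{7.64}$$
In equations 7.63 and 7.64 the π's are reduced-form coefficients and are non-linear combinations of the structural coefficients (the α's and β's). The reduced-form parameters (the π's) can be estimated by OLS. From equation 7.63, with lower-case letters denoting deviations from the sample means,
$$\hat\pi_1 = \frac{\sum p_i y_i}{\sum y_i^2}, \qquad \hat\pi_0 = \bar P - \hat\pi_1 \bar Y$$
and π̂2 and π̂3 are obtained in the same way from equation 7.64.
Since the supply function is exactly identified, its parameters can be obtained uniquely from the reduced-form coefficients as follows:
$$\beta_0 = \pi_2 - \beta_1 \pi_0 \qquad \text{and} \qquad \beta_1 = \frac{\pi_3}{\pi_1}$$
Replacing the π's by their estimated values,
$$\hat\beta_1 = \frac{\hat\pi_3}{\hat\pi_1}, \qquad \hat\beta_0 = \hat\pi_2 - \hat\beta_1 \hat\pi_0$$
These are the ILS estimates of the parameters of the supply function.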
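The three ILS steps can be tried out on simulated data. The sketch below is an illustration under stated assumptions: the structural values (α0 = 100, α1 = −2, α2 = 0.5, β0 = 10, β1 = 1), the normal distributions, the sample size and the helper name `ols_line` are all hypothetical and are not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Hypothetical structural parameters of demand (a) and supply (b).
a0, a1, a2 = 100.0, -2.0, 0.5
b0, b1 = 10.0, 1.0

Y = rng.normal(50.0, 10.0, n)        # exogenous income
u1 = rng.normal(0.0, 1.0, n)         # demand disturbance
u2 = rng.normal(0.0, 1.0, n)         # supply disturbance

# Equilibrium price from equating demand and supply, quantity from supply.
P = ((b0 - a0) - a2 * Y + (u2 - u1)) / (a1 - b1)
Q = b0 + b1 * P + u2

def ols_line(x, y):
    """Intercept and slope of the simple OLS regression of y on x."""
    xd, yd = x - x.mean(), y - y.mean()
    slope = (xd @ yd) / (xd @ xd)
    return y.mean() - slope * x.mean(), slope

# Step 2: OLS on the two reduced-form equations.
pi0_hat, pi1_hat = ols_line(Y, P)
pi2_hat, pi3_hat = ols_line(Y, Q)

# Step 3: recover the supply parameters from the reduced-form estimates.
b1_ils = pi3_hat / pi1_hat
b0_ils = pi2_hat - b1_ils * pi0_hat

# For comparison: OLS applied directly to the supply equation.
_, b1_ols = ols_line(P, Q)

print("ILS:", b0_ils, b1_ils)
print("OLS slope:", b1_ols)
```

In this simulated setting the ILS estimates should lie close to the assumed supply parameters, while the direct OLS slope is pulled away from β1 because the price is correlated with the supply disturbance.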

7.6.4 Two-stage least squares method (2SLS)

When an equation is over identified we use the 2SLS method. Consider the following model.
Income function
Y1 = α0 + α1Y2 + α2X1 + α3X2 + U1 ..........................7.65
Money supply function
Y2 = β0 + β1Y1 + U2 .................................................7.66
where Y1 = income, Y2 = stock of money, X1 = investment and X2 = government expenditure.
Y1 and Y2 are endogenous variables and X1 and X2 are exogenous variables.
Take equation 7.65: K = 4, G = 2, M = 4
(K-M) ≥ (G-1)
(4-4) < (2-1), so the income equation is not identified (under identified).
Take equation 7.66: M = 2
(K-M) ≥ (G-1)
(4-2) > (2-1)
2 > 1, so the money supply equation is over identified.
If one applies OLS to the money supply equation, the estimates obtained will be inconsistent because of the correlation between Y1 and U2. Suppose instead that we can find a proxy for Y1 which is highly correlated with Y1 but not correlated with U2; such a proxy is called an instrumental variable. With such a proxy we can apply OLS and estimate the money supply equation. Such an instrumental variable can be obtained by two-stage least squares (2SLS). As the name indicates, the method involves two successive applications of OLS. The process is as follows.
• Step 1. To get rid of the likely correlation between Y1 and U2 (in equation 7.66), first regress Y1 on all the predetermined variables in the whole system (X1 and X2):
$$Y_1 = \hat\alpha_0 + \hat\alpha_1 X_1 + \hat\alpha_2 X_2 + \hat u \tag{7.67}$$
where û are the usual OLS residuals. The fitted values are
$$\hat Y_1 = \hat\alpha_0 + \hat\alpha_1 X_1 + \hat\alpha_2 X_2 \tag{7.68}$$
Equation 7.67 is a reduced-form regression because only the exogenous variables appear on its right-hand side. It can be written as
$$Y_1 = \hat Y_1 + \hat u \tag{7.69}$$
In this equation Y1 consists of two parts: Ŷ1, which is a linear combination of the non-stochastic X's, and a random component û. Following OLS theory, Ŷ1 and û are uncorrelated.
• Step 2. The over identified money supply equation can now be written as
$$Y_2 = \beta_0 + \beta_1 (\hat Y_1 + \hat u) + U_2$$
$$= \beta_0 + \beta_1 \hat Y_1 + \beta_1 \hat u + U_2$$
$$= \beta_0 + \beta_1 \hat Y_1 + (U_2 + \beta_1 \hat u)$$
$$= \beta_0 + \beta_1 \hat Y_1 + U^* \tag{7.70}$$
where U* = U2 + β1û.
Equations 7.70 and 7.66 look very similar; the only difference is that Y1 is replaced by Ŷ1. The advantage of this replacement is that, whereas Y1 in the original money supply equation is correlated with U2 (rendering OLS inappropriate), Ŷ1 is asymptotically uncorrelated with U*. As a result OLS can be applied to equation 7.70, which will give consistent estimates of the parameters of the money supply function. The basic idea behind 2SLS is to purify the stochastic explanatory variable Y1 of the influence of the stochastic disturbance U2. This goal is achieved by
• 1st, regressing Y1 on all the predetermined variables (X1 and X2),
• 2nd, obtaining Ŷ1, and
• 3rd, replacing Y1 by Ŷ1 and applying OLS.
The estimators obtained are consistent, i.e. they converge to their true values as the sample size increases indefinitely. To further illustrate 2SLS we use the following model.

Y1 = α0 + α1Y2 + α2X1 + α3X2 + U1 ..........................7.71
Y2 = β0 + β1Y1 + β2X3 + β3X4 + U2 ...........................7.72
where Y1 = income, Y2 = money supply, X1 = investment, X2 = government expenditure, X3 = money supply in the previous period and X4 = income in the previous period.

X1, X2, X3 and X4 are exogenous (predetermined) variables, and Y1 and Y2 are endogenous variables.
Take equation 7.71, the income equation:
K = 6, G = 2, M = 4
(K - M) ≥ (G-1)
(6-4) > (2-1), so the income equation is over identified.
Take equation 7.72, the money supply equation:
K = 6, G = 2, M = 4
(K-M) ≥ (G-1)
(6-4) > (2-1), so the money supply equation is also over identified.
Since both the income and the money supply equations are over identified, we use 2SLS to estimate their parameters, in the following steps.
• Step 1. Regress each endogenous variable (Y1 and Y2) on all the exogenous variables (X1, X2, X3 and X4). These are reduced-form regressions; their coefficients and disturbances are not the structural ones of equations 7.71 and 7.72, even though similar symbols are used.
Y1 = α0 + α1X1 + α2X2 + α3X3 + α4X4 + U1 ..........................7.73
Y2 = β0 + β1X1 + β2X2 + β3X3 + β4X4 + U2 ...........................7.74
Obtain the fitted values Ŷ1 and Ŷ2, with residuals û1 and û2, so that Y1 = Ŷ1 + û1 and Y2 = Ŷ2 + û2.
• Step 2. Replace the original Y2 and Y1 on the right-hand sides of equations 7.71 and 7.72 by these expressions:
$$Y_1 = \alpha_0 + \alpha_1 (\hat Y_2 + \hat u_2) + \alpha_2 X_1 + \alpha_3 X_2 + U_1$$
$$Y_2 = \beta_0 + \beta_1 (\hat Y_1 + \hat u_1) + \beta_2 X_3 + \beta_3 X_4 + U_2$$
For the income equation,
$$Y_1 = \alpha_0 + \alpha_1 \hat Y_2 + \alpha_2 X_1 + \alpha_3 X_2 + \alpha_1 \hat u_2 + U_1$$
where U1* = α1û2 + U1, so that
$$Y_1 = \alpha_0 + \alpha_1 \hat Y_2 + \alpha_2 X_1 + \alpha_3 X_2 + U_1^* \tag{7.75}$$
For the money supply equation,
$$Y_2 = \beta_0 + \beta_1 \hat Y_1 + \beta_2 X_3 + \beta_3 X_4 + \beta_1 \hat u_1 + U_2$$
where U2* = β1û1 + U2, so that
$$Y_2 = \beta_0 + \beta_1 \hat Y_1 + \beta_2 X_3 + \beta_3 X_4 + U_2^* \tag{7.76}$$
Apply OLS to equations 7.75 and 7.76; the estimates obtained will be consistent.
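The two stages can be sketched in a few lines of Python with NumPy. Everything numerical here is an assumption made for illustration: the structural values in `a` and `b`, the standard normal X's and disturbances, the sample size and the helper name `ols` do not come from the text.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients of y on the columns of X (X already holds a constant)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(1)
n = 20000
a = np.array([1.0, 0.5, 1.0, 1.0])   # hypothetical alpha0..alpha3 of the income equation
b = np.array([2.0, 0.4, 1.0, 1.0])   # hypothetical beta0..beta3 of the money supply equation
X = rng.normal(size=(n, 4))          # columns: X1, X2, X3, X4
U1 = rng.normal(size=n)
U2 = rng.normal(size=n)

# Solve the two structural equations jointly for the endogenous variables.
det = 1.0 - a[1] * b[1]
Y1 = (a[0] + a[1] * (b[0] + b[2] * X[:, 2] + b[3] * X[:, 3] + U2)
      + a[2] * X[:, 0] + a[3] * X[:, 1] + U1) / det
Y2 = b[0] + b[1] * Y1 + b[2] * X[:, 2] + b[3] * X[:, 3] + U2

const = np.ones(n)
Z = np.column_stack([const, X])      # all predetermined variables

# Stage 1: fitted values from the reduced-form regressions.
Y1_hat = Z @ ols(Z, Y1)
Y2_hat = Z @ ols(Z, Y2)

# Stage 2: replace each endogenous regressor by its fitted value.
alpha_2sls = ols(np.column_stack([const, Y2_hat, X[:, 0], X[:, 1]]), Y1)
beta_2sls = ols(np.column_stack([const, Y1_hat, X[:, 2], X[:, 3]]), Y2)

# For comparison: OLS applied directly to the income equation.
alpha_ols = ols(np.column_stack([const, Y2, X[:, 0], X[:, 1]]), Y1)

print("2SLS income equation:", alpha_2sls)
print("2SLS money supply equation:", beta_2sls)
print("OLS income equation:", alpha_ols)
```

In this simulated setting the second-stage coefficients should be close to the assumed structural values, while the direct OLS coefficient on Y2 is not. The sketch reports point estimates only: standard errors taken from a plain second-stage OLS regression would not be the correct 2SLS standard errors, because the second-stage residuals are computed with Ŷ rather than Y.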

Exercises for chapter seven

1) a) What is meant by a simultaneous equations system or model?
b) What do we mean by a structural equation?
c) Explain simultaneous equation bias and reduced-form equations.
2) The following structural equations represent a demand and supply model:

Demand: Qt = α0 + α1X1 + α2Y1 + U1
Supply: Qt = β0 + β1X1 + U2

where Q is quantity, X1 is price and Y1 is income.

a) Why do we need this simultaneous equation model?
b) Why does estimating the demand and supply functions by OLS give biased and inconsistent parameter estimates?
3) With reference to the demand and supply model given below:

Demand: Qt = α0 + α1P1 + α2Yt + U1
Supply: Qt = β0 + β1P1 + β2T2 + U2

a) Determine whether the demand and supply functions are exactly identified, over identified or under identified.
b) Find the reduced-form equations.
c) Derive the formulas for the structural parameters.

5) a) When can indirect least squares and 2SLS be used?
b) What are the shortcomings of indirect least squares?
c) What are the advantages of 2SLS over ILS?

6) Let a model consist of the following equations:

Y1 = 4Y2 - 3X1 + U1
Y2 = 2Y3 + 2X3 + U2
Y3 = 2Y1 - 3Y2 - X2 - X3 + U3

where Y1, Y2 and Y3 are endogenous and X1, X2 and X3 are exogenous variables. Discuss the identification of each equation of the model, based on the order and rank conditions.

34 pages
1 Introduction To Econometrics
100% (6)
1 Introduction To Econometrics
18 pages
Ass 1 2019 RMBA
100% (3)
Ass 1 2019 RMBA
8 pages
International Economics II Answers 1
100% (2)
International Economics II Answers 1
3 pages
Econometrics Lecture Chapter 2 Note pdf-1
No ratings yet
Econometrics Lecture Chapter 2 Note pdf-1
34 pages
Worksheet Econometrics I
100% (3)
Worksheet Econometrics I
6 pages
Managerial Economics Assignment Biruk Tesfa
100% (1)
Managerial Economics Assignment Biruk Tesfa
13 pages
Econometrics Chapter 1 7 2d AgEc 1
No ratings yet
Econometrics Chapter 1 7 2d AgEc 1
89 pages
Econometrics II CH 1
No ratings yet
Econometrics II CH 1
48 pages
Chapter One
No ratings yet
Chapter One
5 pages
A) Chart of Accounts: 1. On March 1, 2020, Tahir Muktar, A Famous Businessman in Addis, Opened A Business
No ratings yet
A) Chart of Accounts: 1. On March 1, 2020, Tahir Muktar, A Famous Businessman in Addis, Opened A Business
12 pages
MoE S Model Exit Exam Solution (Economics) July 04, 2023
No ratings yet
MoE S Model Exit Exam Solution (Economics) July 04, 2023
118 pages
Econometrics II CH 2
100% (1)
Econometrics II CH 2
18 pages
Hypothesis Testing
63% (24)
Hypothesis Testing
59 pages
Econometrics MTU
No ratings yet
Econometrics MTU
31 pages
CH 1
No ratings yet
CH 1
15 pages
Six Sigma Green Belt 2009
100% (1)
Six Sigma Green Belt 2009
3 pages
Practice Questions Econometrics II
100% (1)
Practice Questions Econometrics II
5 pages
At 5
100% (1)
At 5
9 pages
Kxu Stat Anderson Ch10 Student
No ratings yet
Kxu Stat Anderson Ch10 Student
55 pages
Econometrics II Handout For Students
No ratings yet
Econometrics II Handout For Students
29 pages
ECO - Chapter 01 The Subject Matter of Econometrics
No ratings yet
ECO - Chapter 01 The Subject Matter of Econometrics
42 pages
Institutional and Behavioral Economics Forth Years by Shimels M
No ratings yet
Institutional and Behavioral Economics Forth Years by Shimels M
46 pages
Sta630 Mcqs Most Imp File For Final
No ratings yet
Sta630 Mcqs Most Imp File For Final
141 pages
Econometrics I Ch2
No ratings yet
Econometrics I Ch2
105 pages
Ethnography in The Performing Arts
No ratings yet
Ethnography in The Performing Arts
146 pages
Econ 3049: Econometrics: Department of Economics The University of The West Indies, Mona
No ratings yet
Econ 3049: Econometrics: Department of Economics The University of The West Indies, Mona
16 pages
MA3251 Stastical and Numerical Methods 1 - by LearnEngineering - in
No ratings yet
MA3251 Stastical and Numerical Methods 1 - by LearnEngineering - in
238 pages
CHAPTER TWO-Econometrics I (Econ 2061) Edited1 PDF
No ratings yet
CHAPTER TWO-Econometrics I (Econ 2061) Edited1 PDF
35 pages
Sample Thesis Using Anova
100% (3)
Sample Thesis Using Anova
6 pages
Section 1: Multiple Choice Questions (1 X 12) Time: 50 Minutes
No ratings yet
Section 1: Multiple Choice Questions (1 X 12) Time: 50 Minutes
7 pages
Econometrics 2 Exam Answers
67% (3)
Econometrics 2 Exam Answers
6 pages
8 Using Secondary Data
100% (1)
8 Using Secondary Data
15 pages
01 - Selecting The Study Subjects - Hulley, Designing Clinical Research, 4th Edition PDF
100% (1)
01 - Selecting The Study Subjects - Hulley, Designing Clinical Research, 4th Edition PDF
9 pages
Quantitative and Qualitative Research
No ratings yet
Quantitative and Qualitative Research
16 pages
f4649504 Using Mixed Methods Approach To Enhance and Validate Your Research PDF
No ratings yet
f4649504 Using Mixed Methods Approach To Enhance and Validate Your Research PDF
82 pages
Sriii Q2 M5
No ratings yet
Sriii Q2 M5
9 pages
EC203 WEEK 1 Slides 17 N
No ratings yet
EC203 WEEK 1 Slides 17 N
41 pages
Rubric For STEM Paper
No ratings yet
Rubric For STEM Paper
2 pages
One-Way ANOVA & Kruskal-Wallis Test: DR Elaine Chan Wan Ling
No ratings yet
One-Way ANOVA & Kruskal-Wallis Test: DR Elaine Chan Wan Ling
26 pages
Notes in Statistics
No ratings yet
Notes in Statistics
36 pages
Immanuel Mission International School: Title in Bold
No ratings yet
Immanuel Mission International School: Title in Bold
13 pages
Tugas Critical Appraisal
No ratings yet
Tugas Critical Appraisal
4 pages
Lecture 2 HYPOTHESIS TESTING Real
No ratings yet
Lecture 2 HYPOTHESIS TESTING Real
10 pages
Hypothesis Testing
No ratings yet
Hypothesis Testing
2 pages
Worksheet November 21 Solutions - 2
No ratings yet
Worksheet November 21 Solutions - 2
8 pages
Nutrition & Food Science: Article Information
No ratings yet
Nutrition & Food Science: Article Information
9 pages
Lampiran 6 Hasil Uji Statistik
No ratings yet
Lampiran 6 Hasil Uji Statistik
3 pages
Efektivitas Pengadaan Barang Dan Jasa Se PDF
No ratings yet
Efektivitas Pengadaan Barang Dan Jasa Se PDF
13 pages
IE486 Design and Analysis of Experiments PDF
No ratings yet
IE486 Design and Analysis of Experiments PDF
2 pages
Existing Data Based Research
No ratings yet
Existing Data Based Research
2 pages
Albert Einstein
No ratings yet
Albert Einstein
2 pages
CRITICAL APPRAISAL SKILLS PROGRAMME Edit
No ratings yet
CRITICAL APPRAISAL SKILLS PROGRAMME Edit
5 pages
Close: Sources of Collinearity. John Wiley
No ratings yet
Close: Sources of Collinearity. John Wiley
1 page