
Subject: Business Economics

Paper No and Title: 8, Fundamentals of Econometrics

Module No and Title: 1, Introduction to Two Variable Regression Analysis

Module Tag: BSE_P8_M1


CONTENTS

1. INTRODUCTION

2. WHAT IS REGRESSION ANALYSIS

3. ECONOMIC DATA

4. TYPES OF REGRESSION MODEL

5. POPULATION REGRESSION MODELS

6. THE STOCHASTIC DISTURBANCE TERM

7. SAMPLE REGRESSION FUNCTION

8. SUMMARY


1. INTRODUCTION

The term "regression" has its historical origin in the work of Francis Galton. While studying the heights of parents and their children, Galton found the following result: tall parents usually have tall children and short parents usually have short children, but the average or mean height of the children tends to move towards the average or mean height of the population as a whole. In other words, the heights of children of both tall and short parents tend to "regress" towards the mean height of the population. Karl Pearson confirmed Galton's findings by analysing more than a thousand records of the heights of parents and children.

2. WHAT IS REGRESSION ANALYSIS

Regression analysis is a statistical tool or method that is very useful for studying the relationships between two (or more) variables. For example, in economics we have Keynes's fundamental Psychological Law of Consumption. It states that when income increases, consumption also increases, but by less than the increase in income; that is, the marginal propensity to consume lies between 0 and 1. There are many other examples of relationships between variables in the social sciences. In such cases, regression analysis can be employed to build a model to predict the value of one variable (the dependent variable) on the basis of other given variables (the independent variables). We briefly explain the notation and terminology used commonly in regression analysis.

Notation:

Dependent Variable: $Y$

Independent Variables: $X_1, X_2, X_3, \ldots, X_k$

Terminology:

The dependent variable is also known by various other names: explained variable, predictand, regressand, response, endogenous variable, outcome, controlled variable.

The independent variables are likewise known by various other names: explanatory variables, predictors, regressors, stimulus variables, exogenous variables, covariates, control variables.

Though the choice of terminology is a matter of personal preference, we will simply use "dependent" and "independent/explanatory" variables in this chapter.

2.1 Deterministic versus Statistical Relationship

In regression analysis we look for a statistical relationship (and not a deterministic relationship) between variables. In a statistical relationship we essentially deal with stochastic or random variables that possess probability distributions. For instance, crop yield depends on temperature, rainfall and fertilizers, but agronomists cannot predict crop yield exactly, because there are errors involved in measuring these explanatory variables and there are other factors affecting crop yield. Therefore, in a statistical relationship the dependent variable and the explanatory variables do not have an exact relationship.

In a deterministic relationship, the relationship between variables can be determined exactly. One such example is Newton's law of gravity: the force of attraction between any two particles is directly proportional to the product of their masses and inversely proportional to the square of the distance between them, i.e. $F = k\,m_1 m_2 / r^2$. There are other examples, such as Ohm's law or Boyle's gas law.

2.2 Regression versus Causation

A statistical relationship between two (or more) variables does not in any manner imply causation. For instance, we know that crop yield depends on rainfall. Statistically speaking, there is no reason why we could not instead assume that rainfall depends on crop yield. However, simple common sense suggests that this is not the case, as we cannot change rainfall by increasing or decreasing crop yield. Therefore crop yield is treated as the dependent variable and rainfall as the explanatory variable.

To determine causality between variables we must take into account a priori or theoretical considerations.
Statistical relationship in itself is not at all sufficient to imply causation between variables.

2.3 Regression versus Correlation

Both correlation and regression are closely related but conceptually different. In correlation analysis our main
objective is to measure the strength or degree of linear association between variables. For example we are
interested in measuring the correlation (coefficient) between smoking and lung cancer.

In regression analysis we are interested in predicting the average or mean value of one variable (the dependent variable) on the basis of the given values of other variables (the independent variables). For instance, we may want to predict the occurrence of AIDS among drug users. Secondly, in correlation analysis both variables are treated as random, whereas in regression analysis the dependent variable is a random variable while the explanatory variables are treated as fixed and given.

3. ECONOMIC DATA

The success of any regression analysis will depend on the availability of high quality data necessary for
economic research. So we briefly look at the types of data and issue of data accuracy.

Types of Data

There are three types of data available for empirical analysis: time series, cross-section and pooled data (i.e.,
combination of time series and cross section).

Time series:
In a time series, we measure a set of observations at different points in time. We can have daily data (e.g., stock prices, weather reports), weekly data (e.g., money supply, weekly sales), monthly data (e.g., the consumer price index,
unemployment rate), quarterly data (e.g., GDP, industrial production), annual data (e.g., government budgets), quinquennial data (e.g., the census of manufacturing) and decennial data (e.g., the census of population). The problem with time series econometrics is that we need to assume that the underlying time series is stationary. Simply speaking, a time series is said to be stationary if its mean and variance remain constant over time.
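This idea can be illustrated with a small simulation. The following minimal Python sketch (not part of the original module, and only an informal check rather than a formal stationarity test) compares the mean and variance of the two halves of a series:

```python
# Informal check of (weak) stationarity: compare the mean and variance
# of the first and second half of a series.
import numpy as np

rng = np.random.default_rng(0)
noise = rng.normal(size=200)        # white noise: stationary
walk = np.cumsum(noise)             # random walk: non-stationary

def halves_summary(y):
    """Return (mean, variance) for each half of y."""
    h = len(y) // 2
    return [(part.mean(), part.var()) for part in (y[:h], y[h:])]

print("white noise:", halves_summary(noise))
print("random walk:", halves_summary(walk))
# For the white noise the two halves have similar moments; for the
# random walk the mean and variance drift between the halves.
```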

Cross-section data:
In cross-section data we measure one or more variables at the same point in time. Examples: the census of population carried out every 10 years, consumer expenditure surveys, etc. The main problem with cross-sectional data is heterogeneity: when we include heterogeneous units in a statistical analysis, size and scale effects need to be taken into account.

Pooled data:
In pooled data we combine both time series and cross-sectional data. For instance, we may have data on the price and output of different varieties of cloth over a period of, say, 10 years.

The Issue of Data Accuracy


A lot of data are available for empirical research in economics. However, the quality of the data is often not very good. There are several reasons for this:
(1) Most data in the social sciences are non-experimental in nature. Therefore observational errors of omission or commission may creep in during data collection.
(2) Measurement errors may arise due to approximations and rounding off, even in experimental data.
(3) The problem of non-response is rampant in questionnaire-type surveys.
(4) It is often difficult to compare different sample data sets, as there are wide differences in the methods of obtaining the data.
(5) Sometimes economic data are available only in highly aggregated form and may not contain much information about the micro units.

Therefore, if the quality of the data is poor, the results may be unsatisfactory for the economic researcher.

4. TYPES OF REGRESSION MODEL

There are usually two types of regression models:

(1) Simple regression model: this is further divided into the simple linear regression model and the simple non-linear regression model.
(2) Multiple regression model: this is further divided into the multiple linear regression model and the multiple non-linear regression model.


Regression Model
- Simple Regression Model (only one explanatory variable): Linear or Non-Linear
- Multiple Regression Model (two or more explanatory variables): Linear or Non-Linear

In a simple regression model the number of independent variables is one, while in a multiple regression model it is more than one. The term "linearity" can refer to linearity in the parameters or linearity in the variables. For regression analysis, a linear regression model means a model that is linear in the parameters, and a non-linear regression model means a model that is not linear in the parameters. By linearity in the parameters we mean that the parameters, say $\beta_1$ and $\beta_2$, appear with power 1 and are not multiplied or divided by any other parameter (terms such as $\beta_1 \beta_2$ or $\beta_2 / \beta_1$ are ruled out).

                              Linear in variables            Non-linear in variables
Linear in parameters          Linear regression model        Linear regression model
Non-linear in parameters      Non-linear regression model    Non-linear regression model

Examples:

Linear regression models

(a) $Y_i = \beta_1 + \beta_2 X_i$ (simple linear regression model)
(b) $Y_i = \beta_1 + \beta_2 X_i^2$ (simple linear regression model: linear in the parameters, non-linear in the variable)
(c) $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i}$ (multiple linear regression model)
(d) $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i}$ (multiple linear regression model)

Non-linear regression models

(e) $Y_i = \beta_1 + \beta_2^2 X_i$ (simple non-linear regression model: non-linear in the parameter $\beta_2$)
(f) $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3^2 X_{3i}$ (multiple non-linear regression model)

5. POPULATION REGRESSION MODELS


In regression analysis the main aim is to estimate and/or predict the (population) average or mean value of the dependent variable on the basis of the known explanatory variable(s). Table 1 gives an illustrative example of the weekly income and weekly consumption expenditure of a hypothetical community. The total population comprises 36 families, divided into 6 income groups from Rs 1000 to Rs 3500. We thus have 6 fixed values of $X$, with the corresponding $Y$ values shown in the table, i.e., 6 subpopulations. The bottom row shows the conditional expected values of $Y$ for the 6 subpopulations.

Table 1: An illustrative example of weekly income and weekly expenditure on consumption.

                         Weekly income X (in Rupees)
                 1000    1500    2000    2500    3000    3500
Weekly            700     850    1150    1500    1800    2000
consumption       750     950    1200    1600    1850    2200
Y (in Rupees)     800    1000    1350    1700    1900    2250
                  850    1200    1450    1800    2050    2300
                  900    1250    1500    1900    2150    2350
                   -     1350    1550     -      2250    2400
                   -      -      1600     -       -      2600
Total            4000    6600    9800    8500   12000   16100
E(Y | X_i)        800    1100    1400    1700    2000    2300

Corresponding to table 1, we can also find the conditional probabilities for the population of 36 observations. These can be seen in table 2.

Table 2: Conditional probabilities for the population of 36 observations.

                 1000    1500    2000    2500    3000    3500
Conditional       1/5     1/6     1/7     1/5     1/6     1/7
probabilities     1/5     1/6     1/7     1/5     1/6     1/7
p(Y | X_i)        1/5     1/6     1/7     1/5     1/6     1/7
                  1/5     1/6     1/7     1/5     1/6     1/7
                  1/5     1/6     1/7     1/5     1/6     1/7
                   -      1/6     1/7      -      1/6     1/7
                   -       -      1/7      -       -      1/7
Conditional       800    1100    1400    1700    2000    2300
means of Y

We plot the graph of the relationship between expenditure on consumption and income as shown in fig 1. We can
clearly see that there is enough variation in expenditure on consumption in each income group. However the
average expenditure on consumption rises as income rises. This phenomenon can be seen in table 1 itself. For
instance, when the income level is Rs 1000, the corresponding average expenditure on consumption is Rs 800.
When the income is increased to Rs 1500 the average expenditure on consumption rises to Rs 1100. Thus we
clearly see a positive relationship between income and expenditure on consumption. These mean/average values
of dependent variable (here consumption expenditure) are known as conditional expected values as they are
conditioned or depend on the given explanatory variable (here income group).

Apart from the conditional expected values, we can also find the unconditional expected value of consumption expenditure, $E(Y)$. We add the consumption expenditure of all 36 families and divide by the total number of families, which gives Rs 1583.33. It is unconditional because it includes the consumption expenditure of all income groups. The conditional expected values differ from the unconditional expected value: the unconditional expected value gives the average consumption expenditure over all income groups, while a conditional expected value gives the average consumption expenditure of a particular income group. For instance, the conditional expected value of consumption expenditure for families in the Rs 1000 income group is Rs 800, while for the Rs 1500 income group it is Rs 1100. The concept of the conditional expected value helps us to predict the average or mean value of the dependent variable at different values of the independent variable, which is in fact the essence of regression analysis.
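To make the distinction concrete, here is a minimal Python sketch (not part of the original module) that reproduces the conditional means $E(Y \mid X)$ and the unconditional mean $E(Y)$ from the Table 1 data:

```python
# Conditional vs unconditional means for the Table 1 data.
table1 = {
    1000: [700, 750, 800, 850, 900],
    1500: [850, 950, 1000, 1200, 1250, 1350],
    2000: [1150, 1200, 1350, 1450, 1500, 1550, 1600],
    2500: [1500, 1600, 1700, 1800, 1900],
    3000: [1800, 1850, 1900, 2050, 2150, 2250],
    3500: [2000, 2200, 2250, 2300, 2350, 2400, 2600],
}

# Conditional mean E(Y | X = x): average within each income group.
for x, ys in table1.items():
    print(f"E(Y | X = {x}) = {sum(ys) / len(ys):.2f}")

# Unconditional mean E(Y): average over all 36 families.
all_y = [y for ys in table1.values() for y in ys]
print(f"E(Y) = {sum(all_y) / len(all_y):.2f}")   # 1583.33
```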

The next objective is to obtain the population regression line. The population regression line is obtained by joining the conditional mean values of $Y$ for the various levels of $X$. Geometrically, the population regression line is the locus of the conditional means of the dependent variable for the given values of the explanatory variable(s); the regression curve thus passes through these conditional mean values. We assume for simplicity's sake that the $Y$ values are symmetrically distributed around their respective conditional mean values. This can be seen in figure 1 below:


5.1 Population Regression Function

Continuing with the above illustrative example, we have already seen that each conditional mean of consumption expenditure depends on the particular income level. Therefore $E(Y \mid X_i)$ is a function of $X_i$, where $X_i$ is a given value of $X$. Mathematically,

$$E(Y \mid X_i) = f(X_i) \qquad (1)$$

where $f(X_i)$ denotes some function of the independent variable $X$.

Equation (1) is known as the Conditional Expectation Function (CEF) or Population Regression Function (PRF). It tells us how the mean value of the distribution of $Y$ is related functionally to the given value of $X_i$.

The next question is: what functional form should $f(X_i)$ take? The functional form of the Population Regression Function is both an empirical and a theoretical question. For instance, economists assume that consumption expenditure and income are linearly related. For simplicity's sake, and as an initial working hypothesis, we assume that the Population Regression Function $E(Y \mid X_i)$ is a linear function of $X_i$:

$$E(Y \mid X_i) = \beta_1 + \beta_2 X_i \qquad (2)$$

where $\beta_1$ and $\beta_2$ are known as the regression coefficients; they are fixed but unknown. $\beta_1$ is known as the intercept or constant term, and $\beta_2$ as the slope coefficient.


Equation (2) is known as the linear population regression function. Our main objective is to estimate (i) the values of $\beta_1$ and $\beta_2$ and (ii) the standard errors of these estimates.

5.2 Stochastic Specification of the Population Regression Function

Coming back to our illustrative example, we find that the average consumption expenditure increases as income increases. However, for a particular family this need not be true. For instance, there is a family with an income of Rs 1500 whose consumption expenditure (Rs 850) is below the expenditure of a family with an income of Rs 1000 (Rs 900). Therefore there are families whose consumption expenditure deviates from the average expenditure.

The deviation of an individual $Y_i$ from its expected/mean value $E(Y \mid X_i)$ can be expressed as follows:

$$u_i = Y_i - E(Y \mid X_i)$$
$$Y_i = E(Y \mid X_i) + u_i \qquad (3)$$
$$Y_i = \beta_1 + \beta_2 X_i + u_i \qquad (4)$$

The deviation of $Y_i$ from its expected value is denoted by $u_i$; it is an unobservable random variable that can take either positive or negative values. It is popularly known as the stochastic disturbance term.

Interpretation of equation (4)

The consumption expenditure of an individual family for a given income level can be expressed as the sum of two components:

(1) The systematic or deterministic component: this is represented by $E(Y \mid X_i)$, which is the average consumption expenditure of all the families with the same income level.

(2) The random or non-systematic component: this is represented by the stochastic disturbance term $u_i$.

To understand the issue more clearly, we write the hypothetical example of table 1 in the form of equation (4). The individual consumption expenditures for $X_i = 1000$, for example, can be written as follows:

$$Y_1 = 700 = \beta_1 + \beta_2 (1000) + u_1$$
$$Y_2 = 750 = \beta_1 + \beta_2 (1000) + u_2$$
$$Y_3 = 800 = \beta_1 + \beta_2 (1000) + u_3$$
$$Y_4 = 850 = \beta_1 + \beta_2 (1000) + u_4$$
$$Y_5 = 900 = \beta_1 + \beta_2 (1000) + u_5$$

Conditional mean of disturbance term



Consider equation (3) and take expectations on both sides, conditional on $X_i$:

$$E(Y_i \mid X_i) = E[E(Y \mid X_i)] + E(u_i \mid X_i) = E(Y \mid X_i) + E(u_i \mid X_i)$$

(since the expected value of a constant is that constant itself). Since $E(Y_i \mid X_i)$ is the same thing as $E(Y \mid X_i)$, we get

$$E(u_i \mid X_i) = 0$$

Thus the conditional mean value of $u_i$ is zero. This is because we assume that the regression curve/line passes through the conditional means of $Y$.

The stochastic specification in equation (4) clearly shows that there are other variables apart from income that affect consumption expenditure; income alone cannot fully explain an individual family's consumption expenditure.

6. THE STOCHASTIC DISTURBANCE TERM

The disturbance term $u_i$ captures all the omitted variables that collectively affect $Y$ but are not included in the model. The question is why it is not possible to introduce all the variables that affect the dependent variable explicitly into the model. There are numerous reasons for this:

1. There is always some element of intrinsic randomness in human responses, arising from the unpredictability of human choices, errors in decision-making, and so on.
2. The effect of a large number of omitted variables is contained in $u_i$. Owing to the incompleteness of theory or the unavailability of data, many explanatory variables are excluded from the model.
3. There could be errors in measuring $Y$.
4. The functional form of the model is not known. In reality it is very difficult to know the exact functional form of the relationship between the dependent variable and the independent variables.

7. SAMPLE REGRESSION FUNCTION

Our objective is to estimate the Population Regression Function (PRF). In reality we cannot observe the population relationship between the dependent variable $Y$ and the explanatory variable $X$, so we use sample information to estimate the population values. Consider the two random samples drawn from Table 1:

Table 3: Two Random Samples

   Sample 1          Sample 2
   Y       X         Y       X
  700    1000       800    1000
  950    1500      1200    1500
 1350    2000      1450    2000
 1800    2500      1500    2500
 2150    3000      1850    3000
 2400    3500      2250    3500

The Sample Regression Function (SRF) is the sample counterpart of the PRF and can be expressed as follows:

$$\hat{Y}_i = b_1 + b_2 X_i$$

where $\hat{Y}_i$ is an estimator of $E(Y \mid X_i)$, $b_1$ is an estimator of $\beta_1$, and $b_2$ is an estimator of $\beta_2$.
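As an aside (not in the original module), the following Python sketch fits the SRF to each of the two samples in Table 3 and shows that the estimates $b_1$ and $b_2$ differ from sample to sample:

```python
# Fit the SRF  Y-hat = b1 + b2*X  by least squares to each Table 3 sample.
import numpy as np

x = np.array([1000, 1500, 2000, 2500, 3000, 3500], dtype=float)
samples = {
    "Sample 1": np.array([700, 950, 1350, 1800, 2150, 2400], dtype=float),
    "Sample 2": np.array([800, 1200, 1450, 1500, 1850, 2250], dtype=float),
}

for name, y in samples.items():
    b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b1 = y.mean() - b2 * x.mean()
    print(f"{name}: b1 = {b1:8.2f}, b2 = {b2:.4f}")
# The two samples yield different regression lines even though both were
# drawn from the same population of Table 1.
```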

We draw sample regression curve/line based on table 3. This can be seen from figure 2 which is given below:

Recall that the stochastic Population Regression Function can be written as follows:

$$Y_i = \beta_1 + \beta_2 X_i + u_i$$

The stochastic Sample Regression Function is

$$Y_i = b_1 + b_2 X_i + e_i$$

where $\hat{Y}_i = b_1 + b_2 X_i$ is the conditional expected value / predicted value of $Y_i$, and $e_i$ is the residual, i.e., the deviation between the predicted value $\hat{Y}_i$ and the actual $Y_i$. Note that $e_i \neq u_i$: the residual is only an estimate of the unobservable disturbance.

The main objective of regression analysis is to estimate the Population Regression Function with the help of the Sample Regression Function, using the sample estimates as approximations of the population parameters.

These sample estimates will vary from one sample to another. So our task is to estimate the SRF in a way that makes this approximation as close as possible.

8. SUMMARY

• The main idea behind any regression analysis is to study the statistical dependence of the dependent variable on one or more explanatory variables.

• The objective of any regression analysis is to estimate and/or predict the mean value of the dependent variable on the basis of the known values of the explanatory variables.

• The success of any regression analysis depends upon the availability of high quality data.

• Regression models are of two types: simple regression models and multiple regression models. In both cases, we can have linear and non-linear regression models.

• We study population regression functions that are linear in the parameters only; they may be non-linear in the explanatory variables.

• The population regression function (PRF) or conditional expectation function (CEF) is the key concept behind regression analysis. We study how the average value of the dependent variable changes with the given values of the explanatory variables.

• We study the stochastic PRF as it is useful for empirical analysis. The stochastic disturbance term $u_i$ plays an important role in estimating the PRF.

• The stochastic disturbance term $u_i$ captures all the factors that influence the dependent variable but are not explicitly incorporated in the model.

• In reality one rarely has access to the entire population of interest, so we use the stochastic sample regression function to estimate the PRF.

____________________________________________________________________________________________________

Subject: Business Economics

Paper No and Title: 8, Fundamentals of Econometrics

Module No and Title: 2, Estimation of Regression Analysis

Module Tag: BSE_P8_M2


TABLE OF CONTENTS
1. POPULATION AND SAMPLE REGRESSION FUNCTION
2. METHODS FOR ESTIMATING REGRESSION MODEL
3. THE LEAST SQUARES METHOD
3.1. NECESSARY REQUIREMENT FOR OLS ESTIMATES
3.2. INTERPRETATION OF THE COEFFICIENTS
4. VARIANCE AND STANDARD ERROR OF THE OLS ESTIMATES
4.1. THE VARIANCE AND STANDARD ERROR OF THE OLS ESTIMATES
4.2. VARIANCE ESTIMATES OF DISTURBANCE TERM
5. SPURIOUS REGRESSION MODEL
6. NUMERICAL PROPERTIES OF OLS
7. SUMMARY


1. POPULATION AND SAMPLE REGRESSION FUNCTION


The Population Regression Function (PRF) can be defined as follows:

$$Y_i = \beta_1 + \beta_2 X_i + u_i$$

where $Y$ is the dependent variable, $X$ is the explanatory variable, $u$ is the disturbance term, and $\beta_1$ and $\beta_2$ are fixed but unknown parameters.

The right hand side of the Population Regression Function can be divided into two parts:
(1) the systematic component: $\beta_1 + \beta_2 X_i = E(Y_i \mid X_i)$;
(2) the disturbance term: $u_i$.

We need to estimate the population parameters $\beta_1$ and $\beta_2$ from a given sample, since it is not possible to observe the whole population. The sample counterpart of the PRF is known as the Sample Regression Function (SRF), which can be expressed as follows:

$$Y_i = b_1 + b_2 X_i + e_i, \qquad i = 1, 2, \ldots, n$$
$$Y_i = \hat{Y}_i + e_i$$

where $\hat{Y}_i = b_1 + b_2 X_i$ is known as the fitted value of $Y_i$; it is also the estimated conditional mean of $Y_i$. The residual $e_i$ measures the deviation of the sample value $Y_i$ from the estimated conditional mean $\hat{Y}_i$.

The sample counterparts of the various population parameters and terms are as follows:

Population            Sample
$\beta_1$             $b_1$
$\beta_2$             $b_2$
$u_i$                 $e_i$
$E(Y_i \mid X_i)$     $\hat{Y}_i$

The sample estimates $b_1$ and $b_2$ can be calculated for a given sample, but these estimates change from sample to sample. The population parameters $\beta_1$ and $\beta_2$, in contrast, are fixed but remain unknown. The relationship between the sample and population regression lines can be seen in figure 1 below:


2. METHODS FOR ESTIMATING REGRESSION MODEL

We want the deviation of $Y_i$ from $\hat{Y}_i$ to be as small as possible; in other words, we want the residuals $e_i$ to be as small as possible. This can be done in three different ways.

(1) Minimize the sum of the deviations


(2) Minimize the sum of the absolute deviations
(3) Minimize the sum of the squared deviations
Criterion 1: Minimize the sum of the deviations

According to this criterion, the values of $b_1$ and $b_2$ would be chosen in such a way that the sum of all the residuals is (near) zero. This amounts to minimizing

$$\min \sum_{i=1}^{n} e_i$$

Although this criterion is intuitively appealing, it has a serious problem: residuals with positive signs can be compensated by residuals with negative signs. So there could be an infinite number of lines with the same sum of residuals $\sum_{i=1}^{n} e_i$ equal to zero, no matter what their slopes or intercepts are.

Criterion 2: Minimize the sum of the absolute deviations


According to this criterion, the values of $b_1$ and $b_2$ would be chosen in such a way as to minimize the sum of the absolute deviations:

$$\min \sum_{i=1}^{n} |e_i|$$

This approach is also known as the "minimum absolute distance" (MAD) estimator because it minimizes the distance between $Y_i$ and $\hat{Y}_i$. It avoids the possibility of positive errors being compensated by negative errors. In this approach all deviations are given equal weight, which makes it more resistant to the influence of outliers. However, it is not very popular, because the calculations are complicated and involve linear programming or iterative methods.

Criterion 3: Minimize the sum of the squared deviations

According to this criterion, the values of $b_1$ and $b_2$ would be chosen in such a way as to minimize the sum of the squared deviations:

$$\min \sum_{i=1}^{n} e_i^2$$

This approach is known as the least squares estimation method. It avoids the problem of compensating residuals, since the residuals are squared. It puts more weight on observations with large deviations and less weight on observations with small deviations. The least squares estimates are also easy to calculate, and they have some very useful properties under relatively general conditions.

3. THE LEAST SQUARES METHOD


Recall from the Sample Regression Function:

$$Y_i = \hat{Y}_i + e_i$$
$$e_i = Y_i - \hat{Y}_i = Y_i - b_1 - b_2 X_i$$

We want to minimize the sum of squared errors, i.e. the residual sum of squares (RSS): $\min \sum_{i=1}^{n} e_i^2$. This is shown graphically in figure 2 below:


$$Q = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} (Y_i - b_1 - b_2 X_i)^2$$

The estimates of $b_1$ and $b_2$ are obtained by partially differentiating $Q$ with respect to $b_1$ and $b_2$:

$$\frac{\partial Q}{\partial b_1} = -2 \sum_{i=1}^{n} (Y_i - b_1 - b_2 X_i) = -2 \sum_{i=1}^{n} e_i$$

$$\frac{\partial Q}{\partial b_2} = -2 \sum_{i=1}^{n} (Y_i - b_1 - b_2 X_i) X_i = -2 \sum_{i=1}^{n} e_i X_i$$

We equate the two derivatives to zero:

$$-2 \sum_{i=1}^{n} (Y_i - b_1 - b_2 X_i) = 0$$
$$-2 \sum_{i=1}^{n} (Y_i - b_1 - b_2 X_i) X_i = 0$$

We then obtain

$$\sum_{i=1}^{n} Y_i = n b_1 + b_2 \sum_{i=1}^{n} X_i$$
$$\sum_{i=1}^{n} X_i Y_i = b_1 \sum_{i=1}^{n} X_i + b_2 \sum_{i=1}^{n} X_i^2$$

These two equations are known as the Ordinary Least Squares (OLS) normal equations: two equations in the two unknowns $b_1$ and $b_2$. Solving them gives

$$b_2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \qquad b_1 = \bar{Y} - b_2 \bar{X}$$
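As an illustration (not from the original module), the normal equations can also be solved directly as a 2×2 linear system, and the result matches the closed-form expressions above. The data are those of Numerical Example 1 below.

```python
# Solve the OLS normal equations as a 2x2 linear system and compare
# with the closed-form formulas for b1 and b2.
import numpy as np

x = np.array([1000, 1500, 2000, 2500, 3000, 3500], dtype=float)
y = np.array([800, 1200, 1450, 1500, 1850, 2250], dtype=float)
n = len(x)

# Normal equations:  [ n       sum(x)   ] [b1]   [ sum(y)   ]
#                    [ sum(x)  sum(x^2) ] [b2] = [ sum(x*y) ]
A = np.array([[n, x.sum()], [x.sum(), (x ** 2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])
b1, b2 = np.linalg.solve(A, rhs)

# Closed-form solution.
b2_cf = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1_cf = y.mean() - b2_cf * x.mean()

assert np.allclose([b1, b2], [b1_cf, b2_cf])
print(f"b1 = {b1:.4f}, b2 = {b2:.5f}")   # 319.0476, 0.52857
```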

3.1. Necessary Requirement for OLS Estimates

We can always compute the OLS estimates for a particular sample as long as $\sum_{i=1}^{n} (X_i - \bar{X})^2 > 0$. In other words, the $X_i$ must not all be equal: there should be some variation in $X_i$.

3.2. Interpretation of the Coefficients

$b_1$: geometrically, $b_1$ represents the intercept, the point where the regression line cuts the y-axis. Econometrically, it represents the average value of $Y$ when $X = 0$. It may or may not have a substantive meaning, depending on the problem.

$b_2$: this captures the change in $Y$ when $X$ changes by one unit. It can also be interpreted in terms of derivatives and marginal effects.

Consider a simple regression model:

$$Y_i = b_1 + b_2 X_i + e_i$$

The marginal effect of $X$ on $Y$ is calculated as

$$\frac{dY}{dX} = b_2$$

Thus $b_2$ measures the marginal effect of $X$ on $Y$. In an additive model like the one above, the marginal effect of $X$ on $Y$ is the same as the effect of a one-unit increase in $X$ on $Y$. However, in a multiplicative or non-linear model, the two are not the same.

4. VARIANCE AND STANDARD ERROR OF THE OLS ESTIMATES

We know that the variance of a random variable measures the dispersion of that variable around its mean. If the variance is small, the values of the variable lie close to the mean. An estimator with a smaller variance also yields a narrower confidence interval for the parameter. The precision of an estimator is therefore captured by its variance, and it is worth computing the variances of the ordinary least squares estimates.

4.1 The variance and standard error of the OLS estimates

It should be noted that the OLS estimates $b_1$ and $b_2$ depend on the dependent variable $Y_i$, and the $Y_i$ in turn depend on the disturbance terms $u_1, u_2, \ldots, u_n$. Therefore the OLS estimates are random variables with associated distributions. The variances and standard errors of the OLS estimates are given below:

$$\operatorname{var}(b_1) = \frac{\sum_i X_i^2}{n \sum_i (X_i - \bar{X})^2} \, \sigma^2, \qquad \operatorname{se}(b_1) = \sqrt{\frac{\sum_i X_i^2}{n \sum_i (X_i - \bar{X})^2}} \; \sigma$$

$$\operatorname{var}(b_2) = \frac{\sigma^2}{\sum_i (X_i - \bar{X})^2}, \qquad \operatorname{se}(b_2) = \frac{\sigma}{\sqrt{\sum_i (X_i - \bar{X})^2}}$$

where $\sigma^2$ is the homoscedastic variance of the disturbance term $u_i$.

The covariance between the OLS estimates $b_1$ and $b_2$ is

$$\operatorname{cov}(b_1, b_2) = -\bar{X} \operatorname{var}(b_2) = \frac{-\bar{X} \, \sigma^2}{\sum_i (X_i - \bar{X})^2}$$

From the above formulae it is clear that when there is large variation in $X_i$ and the sample size is large, the variances and the corresponding standard errors of the estimates are small; and smaller variances improve the precision of the ordinary least squares estimates.

The problem with the above expressions is that the population variance $\sigma^2$ is unknown.
4.2 Variance estimate of the disturbance term

The population variance $\sigma^2$ can be estimated from the sample. Consider the fitted line

$$\hat{Y}_i = b_1 + b_2 X_i$$

The estimate of $u_i$ is the residual $e_i = Y_i - b_1 - b_2 X_i$. An estimator of $\sigma^2$ is found by estimating the variance of the residuals and correcting for the loss of the two degrees of freedom used in calculating $b_1$ and $b_2$. An unbiased estimator of $\sigma^2$ is therefore

$$s^2 = \hat{\sigma}^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - 2} = \frac{RSS}{n - 2}$$

where $s^2$ is the OLS estimator of the true but unknown $\sigma^2$ and $n - 2$ is the degrees of freedom. The standard error of the regression is found by taking the square root of $s^2$; it is also known as the root mean square error or the standard error of the disturbance term:

$$s = \sqrt{\frac{\sum_i e_i^2}{n - 2}}$$

Since $s^2$ is an estimate of the variance of the disturbance term $u_i$, it is also an estimate of the variance of $Y_i$ conditional on $X_i$.
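The formulas above translate directly into code. Here is a minimal Python sketch (an illustration, not part of the module) that computes the standard errors and the covariance of $b_1$ and $b_2$ from the $X$ values and an estimate $s^2$ of $\sigma^2$:

```python
# Precision of the OLS estimates: se(b1), se(b2) and cov(b1, b2).
import numpy as np

def ols_precision(x, s2):
    """Return se(b1), se(b2), cov(b1, b2) for a simple regression."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    sxx = np.sum((x - x.mean()) ** 2)   # sum of squared deviations of x
    var_b1 = s2 * np.sum(x ** 2) / (n * sxx)
    var_b2 = s2 / sxx
    cov_b1b2 = -x.mean() * var_b2
    return np.sqrt(var_b1), np.sqrt(var_b2), cov_b1b2

# Income values of the numerical example that follows, with the module's
# estimate s^2 of about 11198.6:
print(ols_precision([1000, 1500, 2000, 2500, 3000, 3500], 11198.6))
```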

Numerical Example 1

We illustrate an economic theory by considering a Keynesian consumption function. The fundamental psychological law states that, on average, consumption rises as income increases, but the increase in consumption is less than the increase in income. In other words, the marginal propensity to consume is greater than zero but less than one.

The exact functional relationship between income and consumption is not specified by Keynes. However, for the sake of simplicity, we assume that the relationship between consumption and income is linear. The raw data for the weekly consumption expenditure and weekly income of six families are given below:

Weekly consumption
expenditure (in Rupees):     800   1200   1450   1500   1850   2250
Weekly income (in Rupees):  1000   1500   2000   2500   3000   3500

From the above data:

(1) Find the ordinary least squares estimates of the intercept and slope coefficients.
(2) Find the estimated regression equation for weekly consumption expenditure and weekly income.
(3) Interpret the economic meaning of the intercept and slope coefficients.
(4) Find the predicted value of weekly consumption expenditure when weekly income is Rs 3300.
(5) Calculate the variance and standard deviation of the disturbance term and interpret their meaning.

Solution:
Here weekly consumption expenditure is the dependent variable and weekly income is the explanatory variable. Let $Y$ represent the dependent variable and $X$ the explanatory variable. In the simple linear regression model, the relationship between weekly consumption expenditure and weekly income can be written as follows:

$$Y_i = b_1 + b_2 X_i + e_i$$

The calculations for the slope and intercept coefficients are given in the table below:

   Y       X      Y - Ȳ      X - X̄     (X - X̄)²    (X - X̄)(Y - Ȳ)
  800    1000    -708.33     -1250      1562500       885412.5
 1200    1500    -308.33      -750       562500       231247.5
 1450    2000     -58.33      -250        62500        14582.5
 1500    2500      -8.33       250        62500        -2082.5
 1850    3000     341.67       750       562500       256252.5
 2250    3500     741.67      1250      1562500       927087.5
 ΣY = 9050   ΣX = 13500      Σ(X - X̄)² = 4375000    Σ(X - X̄)(Y - Ȳ) = 2312500


The mean value of weekly consumption expenditure is

$$\bar{Y} = \frac{\sum Y_i}{n} = \frac{9050}{6} = 1508.33$$

Thus the average weekly consumption expenditure is Rs 1508.33.

Again, the mean value of weekly income is

$$\bar{X} = \frac{\sum X_i}{n} = \frac{13500}{6} = 2250$$

Thus the average weekly income of a family is Rs 2250.

Now we calculate the slope and intercept coefficients:

$$b_2 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{2312500}{4375000} = 0.52857$$

$$b_1 = \bar{Y} - b_2 \bar{X} = 1508.33 - 0.52857 \times 2250 = 319.0476$$

(a) The slope coefficient is 0.52857 and the intercept coefficient is 319.0476.

(b) The estimated regression equation for weekly consumption expenditure and weekly income can be written as follows:

$$\hat{Y}_i = 319.0476 + 0.52857 X_i$$

(c) The value $b_2 = 0.52857$ measures the slope of the regression line. It shows that, within the observed income range of Rs 1000 to Rs 3500, when weekly income increases by Re 1, the estimated weekly consumption expenditure increases on average by Re 0.52857. So if weekly income increases by Rs 100, weekly consumption expenditure will rise on average by approximately Rs 53.

The value $b_1 = 319.0476$ measures the intercept of the regression line. It indicates that the average level of weekly consumption expenditure is about Rs 319 when weekly income is zero: a family with no weekly income would have to finance some basic level of consumption either by dissaving or by borrowing. This interpretation makes economic sense here. However, such a mechanical interpretation of the intercept term may not always be meaningful in other regression models.

Therefore it is often best to interpret the intercept term as the average or mean effect on the dependent variable of all the variables omitted from the regression model.

(d) The predicted value of weekly consumption expenditure when weekly income is Rs 3300 can be obtained as follows:

$$\hat{Y} = 319.0476 + 0.52857 \times 3300 = 2063.33$$

Therefore, when weekly income is Rs 3300, the expected weekly consumption expenditure is about Rs 2063. This is the usefulness of regression analysis: it can be used to predict the value of one variable from a given value of another variable.

(e) The variance and standard deviation of the disturbance term can be calculated as follows:

$$\hat{\sigma}^2 = \frac{RSS}{n - 2}, \qquad \text{where } RSS = \sum Y_i^2 - b_1 \sum Y_i - b_2 \sum X_i Y_i$$

Here $\sum Y_i^2 = 14917500$, $\sum Y_i = 9050$, and $\sum X_i Y_i = 22675000$.

So $RSS = 14917500 - 319.0476 \times 9050 - 0.52857 \times 22675000 = 44794.47$.

Therefore $\hat{\sigma}^2 = \dfrac{44794.47}{4} = 11198.62$, and $\hat{\sigma} = \sqrt{11198.62} = 105.82$.

The variance and standard deviation of the disturbance term are approximately 11198.62 and 105.82 respectively.

The interpretation of the standard deviation is as follows: 105.82 is the magnitude of a typical deviation of an observation from the estimated regression line; some points are closer to the regression line and other points are farther away from it.
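The whole example can be verified with a few lines of Python (a sketch, not part of the module; note that exact arithmetic differs from the hand-rounded figures above in the last digits):

```python
# Verify Numerical Example 1: coefficients, prediction, and sigma-hat.
import numpy as np

x = np.array([1000, 1500, 2000, 2500, 3000, 3500], dtype=float)
y = np.array([800, 1200, 1450, 1500, 1850, 2250], dtype=float)

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()
print(f"b1 = {b1:.4f}, b2 = {b2:.5f}")      # 319.0476, 0.52857

print("prediction at X = 3300:", b1 + b2 * 3300)   # about 2063.33

resid = y - (b1 + b2 * x)
s2 = np.sum(resid ** 2) / (len(x) - 2)
# s2 is about 11190.5 here; the text's 11198.62 reflects rounding of b1, b2.
print(f"sigma^2-hat = {s2:.2f}, sigma-hat = {np.sqrt(s2):.2f}")
```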

Numerical Example 2

It is widely believed in labour economics that wage earnings depend on the level of education. Suppose the estimated relationship between monthly earnings and the number of years of education is as follows:


$$\widehat{Wage}_i = b_1 + b_2 \, Educ_i$$

(the estimated coefficients $b_1$ and $b_2$, their standard errors, and $\hat{\sigma}$ are reported with the original exercise)

(a) Interpret the slope and intercept coefficients of the above wage-education regression model.
(b) What does the standard deviation of the disturbance term indicate?

Answer:
(a) There is a positive relationship between the level of education and monthly wage earnings: each additional year of schooling raises monthly wage earnings by approximately 42%. The intercept term is positive, but no economic meaning is attached to it.

(b) The standard deviation of the disturbance term is small, indicating that the individual sample values do not deviate far from the regression line.

5. SPURIOUS REGRESSION MODEL

Consider the following model:

$$Y_i = \beta_1 + \beta_2 X_i + u_i$$

In the above specification of the model we implicitly assumed that $X$ causes $Y$. We generally use $R^2$ as a measure of goodness of fit; however, it cannot be used to identify the direction of causality. So even if $X$ and $Y$ are highly correlated, this provides no clue as to whether changes in $X$ cause $Y$ or changes in $Y$ cause $X$. For instance, the correlation coefficient between the elephant population and the human population in Assam may be quite high. Does this mean that changes in the elephant population cause changes in the human population, or vice versa? Clearly not: we have a situation of spurious correlation. In such cases, if we regress one variable on the other, we obtain a spurious regression.

Consider a second, more realistic example. Suppose we run a regression model in which the number of crimes in a city is taken as the dependent variable and the number of policemen as the explanatory variable, and suppose we obtain a positive slope coefficient. Can we then say that a larger number of policemen in a city increases the number of crimes in that city? Common sense tells us that the answer is definitely no. Instead, it is plausible that an increase in crime leads the city to employ more policemen. A more plausible regression model would therefore take the number of crimes as the independent variable and the number of policemen as the dependent variable. It is also possible that there are other factors that need to be incorporated in the regression model. Therefore, due care based on economic theory and other information is needed before formulating a regression model.
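The spurious regression problem can be demonstrated with simulated data. In the following Python sketch (an illustration, not from the module), two random walks are independent by construction, yet regressing one on the other typically produces a seemingly strong relationship:

```python
# Spurious regression: two independent random walks often appear related.
import numpy as np

rng = np.random.default_rng(42)
y = np.cumsum(rng.normal(size=500))   # random walk 1
x = np.cumsum(rng.normal(size=500))   # random walk 2, independent of y

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()
resid = y - (b1 + b2 * x)
r2 = 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
print(f"slope = {b2:.3f}, R^2 = {r2:.3f}")
# Despite the absence of any true relationship, R^2 is often
# deceptively large for such non-stationary series.
```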

6. NUMERICAL PROPERTIES OF OLS


The OLS estimates obtained from the sample data always satisfy the least squares criterion, and they can be used to draw the sample regression line. The sample regression line obtained from the OLS estimates has the following numerical properties.

1. The sample regression line always passes through the sample means of $X$ and $Y$:

$$\bar{Y} = b_1 + b_2 \bar{X}$$

This property holds when the sample regression model has an intercept term. To derive it, recall that $b_1 = \bar{Y} - b_2 \bar{X}$, which can be rewritten as $\bar{Y} = b_1 + b_2 \bar{X}$. So the predicted value of the dependent variable is $\bar{Y}$ when the explanatory variable equals $\bar{X}$.

2. The sum and the average value of the residuals $e_i$ are zero:

$$\sum_{i=1}^{n} e_i = 0 \quad \text{and} \quad \bar{e} = 0$$

To prove this, recall the first of the equations used in deriving the least squares estimates:

$$-2 \sum_{i=1}^{n} (Y_i - b_1 - b_2 X_i) = 0, \quad \text{i.e.,} \quad -2 \sum_{i=1}^{n} e_i = 0$$

So $\sum_{i=1}^{n} e_i = 0$ and hence $\bar{e} = 0$.

3. The mean value of the predicted values $\hat{Y}_i$ is equal to the mean value of the actual $Y_i$:

$$\hat{Y}_i = b_1 + b_2 X_i = (\bar{Y} - b_2 \bar{X}) + b_2 X_i = \bar{Y} + b_2 (X_i - \bar{X})$$

Summing both sides over the sample values and dividing by the sample size $n$, we get

$$\frac{\sum_{i=1}^{n} \hat{Y}_i}{n} = \frac{\sum_{i=1}^{n} \bar{Y}}{n} + b_2 \frac{\sum_{i=1}^{n} (X_i - \bar{X})}{n} = \bar{Y}$$

since $\sum_i (X_i - \bar{X}) = 0$. Hence $\bar{\hat{Y}} = \bar{Y}$.

4. The regressors and the residuals are uncorrelated, i.e., the covariance between the regressors and the residuals is zero:

$$\operatorname{cov}(X_i, e_i) = 0, \quad \text{or} \quad \sum_{i=1}^{n} X_i e_i = 0$$

To prove this, recall the second of the equations defining the OLS estimates:

$$-2 \sum_{i=1}^{n} (Y_i - b_1 - b_2 X_i) X_i = 0$$

which gives

$$\sum_{i=1}^{n} X_i e_i = 0$$
5. The predicted values $\hat{Y}_i$ and the residuals are uncorrelated, i.e., the covariance between the predicted values and the residuals is zero:

$$\operatorname{cov}(\hat{Y}_i, e_i) = 0, \quad \text{or} \quad \sum_{i=1}^{n} \hat{Y}_i e_i = 0$$

Proof:

$$\sum_{i=1}^{n} \hat{Y}_i e_i = \sum_{i=1}^{n} (b_1 + b_2 X_i) e_i = b_1 \sum_{i=1}^{n} e_i + b_2 \sum_{i=1}^{n} X_i e_i = 0$$

because $\sum_{i=1}^{n} e_i = 0$ and $\sum_{i=1}^{n} X_i e_i = 0$.
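These numerical properties are easy to confirm empirically. The following Python sketch (an illustration, not from the module) checks them on the data of Numerical Example 1:

```python
# Check the numerical properties of OLS on the Example 1 data.
import numpy as np

x = np.array([1000, 1500, 2000, 2500, 3000, 3500], dtype=float)
y = np.array([800, 1200, 1450, 1500, 1850, 2250], dtype=float)

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()
yhat = b1 + b2 * x
e = y - yhat

assert abs(y.mean() - (b1 + b2 * x.mean())) < 1e-8  # 1: line passes through means
assert abs(e.sum()) < 1e-8                          # 2: residuals sum to zero
assert abs(yhat.mean() - y.mean()) < 1e-8           # 3: mean(y-hat) = mean(y)
assert abs((x * e).sum()) < 1e-5                    # 4: sum(x_i * e_i) = 0
assert abs((yhat * e).sum()) < 1e-5                 # 5: sum(yhat_i * e_i) = 0
print("all five numerical properties hold")
```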

7. SUMMARY
1. The sample regression function is used to estimate the population regression function
because it is not possible to observe the population parameters.


2. There are basically three methods that can be used to estimate the sample regression function: (a) minimizing the sum of the deviations, (b) minimizing the sum of the absolute deviations, and (c) minimizing the sum of the squared deviations.

3. The least squares method is chosen as the best way to estimate the sample regression function. The least squares estimates possess interesting statistical properties.

4. The intercept coefficient of the ordinary least squares regression represents the average value of the dependent variable when the explanatory variable is zero. The slope coefficient captures the change in the dependent variable when the explanatory variable changes by one unit.

5. The standard errors measure the precision of the ordinary least squares estimates. If the standard errors are small, the parameters are precisely estimated.

6. To avoid running a spurious regression model, one must choose the dependent and independent variables carefully, using economic theory and other prior information.

7. The ordinary least squares estimates have many interesting numerical properties: (a) the sample regression line always passes through the sample means of $X$ and $Y$; (b) the sum and the average value of the residuals $e_i$ are zero; (c) the mean value of the predicted values $\hat{Y}_i$ is equal to the mean value of the actual $Y_i$; (d) the regressors and the residuals are uncorrelated; (e) the predicted values $\hat{Y}_i$ and the residuals are uncorrelated.

____________________________________________________________________________________________________

Subject: Business Economics

Paper No and Title: 8, Fundamentals of Econometrics

Module No and Title: 3, The Gauss Markov Theorem

Module Tag: BSE_P8_M3


TABLE OF CONTENTS

1. INTRODUCTION
2. ASSUMPTIONS OF GAUSS MARKOV THEOREM
3. GAUSS MARKOV THEOREM AND PROOF
3.1. PROOF THAT THE OLS ESTIMATOR IS LINEAR AND UNBIASED
3.2. PROOF THAT OLS ESTIMATOR IS EFFICIENT
3.3. PROOF THAT OLS ESTIMATOR IS CONSISTENT
4. GOODNESS OF FIT
4.1. MEASURES OF VARIATION
4.2. COEFFICIENT OF DETERMINATION
4.3. COEFFICIENT OF CORRELATION
5. SUMMARY


1. INTRODUCTION

Using OLS we estimate the parameters $b_1$ and $b_2$ of the sample regression function. However, these are estimates computed from a sample. So we need to make some assumptions about the population regression function, so that the sample estimates $b_1$ and $b_2$ can be used to make inferences about the population parameters $\beta_1$ and $\beta_2$. These sets of assumptions are known as the Classical Linear Regression Model (CLRM) assumptions.

Under these assumptions the OLS estimators have very good statistical properties, so the assumptions are also known as the Gauss Markov theorem assumptions. We now look at these Gauss Markov assumptions for the Classical Linear Regression Model (CLRM).

2. ASSUMPTIONS OF GAUSS MARKOV THEOREM

Assumption 1 (Linear Regression Model): The regression model is linear in the parameters; it need not be linear in the explanatory variables:

$$Y_i = \beta_1 + \beta_2 X_i + u_i$$

Assumption 2 ($X_i$ Values are Non-Stochastic): The values taken by the explanatory variables remain unchanged in repeated samples. Regression analysis is thus conditional regression analysis: it is conditional on the given values of $X_i$.

Assumption 3 (Conditional mean of the disturbance term is zero): Given the values of the explanatory variables, the conditional mean of the disturbance term is zero:

$$E(u_i \mid X_i) = 0$$

If this assumption is violated, then

$$E(Y_i \mid X_i) \neq \beta_1 + \beta_2 X_i$$

which is certainly not desirable. This assumption also implies that the information not captured by the explanatory variable(s), which falls into the error term, is unrelated to the explanatory variable(s) and hence does not systematically affect the dependent variable.

Assumption 4 (Homoscedasticity): The conditional variance of the disturbance term, given the values of the explanatory variables, is the same for all observations:

$$\operatorname{var}(u_i \mid X_i) = \sigma^2$$

By definition,

$$\operatorname{var}(u_i \mid X_i) = E[u_i - E(u_i \mid X_i)]^2$$

Since by assumption 3 $E(u_i \mid X_i) = 0$, we have

$$\operatorname{var}(u_i \mid X_i) = E(u_i^2 \mid X_i) = \sigma^2 \quad \text{for all } i$$

Diagrammatically, the concept of homoscedasticity is shown in figure 1, where the variation around the regression line is the same for all values of $X_i$. By contrast, the concept of heteroscedasticity is shown in figure 2, where the conditional variance of the population varies with $X$.


Assumption 5 (No Autocorrelation): The correlation between any two disturbance terms $u_i$ and $u_j$ ($i \neq j$), given any two values $X_i$ and $X_j$, is zero:

$$\operatorname{cov}(u_i, u_j \mid X_i, X_j) = E\{[u_i - E(u_i)] \mid X_i\}\{[u_j - E(u_j)] \mid X_j\} = E(u_i \mid X_i)\, E(u_j \mid X_j) = 0$$

Assumption 6 (Zero covariance between the disturbance term and the explanatory variable), i.e. $\operatorname{cov}(u_i, X_i) = 0$:

$$\operatorname{cov}(u_i, X_i) = E[u_i - E(u_i)][X_i - E(X_i)]$$
$$= E[u_i (X_i - E(X_i))] \quad \text{since } E(u_i) = 0$$
$$= E(u_i X_i) - E(X_i) E(u_i) = E(u_i X_i) = 0$$

This basically says that the explanatory variables are uncorrelated with the disturbance term: the values of the explanatory variables tell us nothing about the disturbance term.

Assumption 7: (Identification):

To find unique estimates from the normal equations, the number of observations must be greater than the number of parameters to be estimated. Otherwise it would not be possible to find unique OLS estimates of the parameters.

Assumption 8 (Variability in the $X$ values): $0 < \operatorname{var}(X) < \infty$


To find the OLS estimates there should be some variability in the values of the explanatory variable. In other words, the values of $X_i$ cannot all be the same, i.e.

$$0 < \sum_i (X_i - \bar{X})^2 < \infty$$

If all the values of $X_i$ were the same, we would have $\sum_i (X_i - \bar{X})^2 = 0$, and it would not be possible to compute the OLS estimates.

Assumption 9: The disturbance term $u_i$ is assumed to be normally distributed:

$$u_i \sim NID(0, \sigma^2), \qquad i = 1, 2, \ldots, n$$

where NID stands for normally and independently distributed. The normality assumption on the disturbance term implies that $Y_i$ is also normally distributed. This assumption is necessary for constructing confidence intervals for $\beta_1$ and $\beta_2$, and hence for conducting hypothesis tests.

Assumption 10 (Correct Functional Form Specification):

The functional form of the regression model needs to be correctly specified; otherwise there will be specification bias or specification error in the estimation of the regression model.

Assumption 11 (No Multicollinearity): When the regression model has more than one explanatory variable, there should not be any perfect linear relationship among these variables.

The above assumptions relate to the population regression function. Since we can only observe the sample regression function and not the population regression function, we cannot really know whether the above assumptions are actually valid.

3. GAUSS MARKOV THEOREM

The Gauss Markov Theorem basically states that under the assumptions of the Classical
Linear Regression Model (assumptions 1-8), the least squares estimators are the
minimum variance estimators among the class of unbiased linear estimators; that is, they
are BLUE.


We need to prove that the OLS estimators are (i) unbiased, (ii) efficient, and (iii) consistent.

3.1. Proof that the OLS estimator is linear and unbiased

The OLS estimator $b_2$ is unbiased if its expected value is equal to the population parameter $\beta_2$. The estimator $b_2$ is a random variable and takes on different values from sample to sample; the unbiasedness property implies that on average the value of $b_2$ equals the population parameter $\beta_2$.

We know that the OLS estimate is

$$b_2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{\sum_{i=1}^{n} (X_i - \bar{X}) Y_i - \bar{Y} \sum_{i=1}^{n} (X_i - \bar{X})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{\sum_{i=1}^{n} (X_i - \bar{X}) Y_i}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \sum_{i=1}^{n} w_i Y_i$$

where

$$w_i = \frac{X_i - \bar{X}}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$

The weights $w_i$ have the following properties:

$$\sum_{i=1}^{n} w_i = 0$$

$$\sum_{i=1}^{n} w_i^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right]^2} = \frac{1}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$

$$\sum_{i=1}^{n} w_i X_i = \frac{\sum_{i=1}^{n} (X_i - \bar{X}) X_i}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = 1$$

To prove the unbiasedness of the OLS estimator, we rewrite the estimator in terms of the population parameters:

$$b_2 = \sum_{i=1}^{n} w_i Y_i = \sum_{i=1}^{n} w_i (\beta_1 + \beta_2 X_i + u_i) = \beta_1 \sum_{i=1}^{n} w_i + \beta_2 \sum_{i=1}^{n} w_i X_i + \sum_{i=1}^{n} w_i u_i = \beta_2 + \sum_{i=1}^{n} w_i u_i$$

The OLS estimator $b_2$ is thus a linear function of $Y_i$. The explanatory variable(s) are assumed to be non-stochastic, so the $w_i$ are non-stochastic as well. Taking expectations on both sides, we have

$$E(b_2) = \beta_2 + \sum_{i=1}^{n} w_i E(u_i) = \beta_2$$

Therefore the OLS estimator $b_2$ is an unbiased linear estimator of $\beta_2$.
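Unbiasedness can also be illustrated by simulation. The following Python sketch (an illustration, not part of the module) draws many samples from a known population and shows that the average of the $b_2$ estimates is close to the true $\beta_2$:

```python
# Monte Carlo illustration of the unbiasedness of b2.
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(0, 10, 50)           # fixed regressors, as the CLRM assumes
beta1, beta2, sigma = 2.0, 0.5, 1.0  # true population parameters

estimates = []
for _ in range(5000):
    y = beta1 + beta2 * x + rng.normal(scale=sigma, size=x.size)
    b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    estimates.append(b2)

print(np.mean(estimates))   # close to 0.5: on average b2 equals beta2
```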

3.2. Proof that the OLS estimator is efficient

The OLS estimator has the second desirable property of being an efficient estimator. Efficiency relates to the variance of the estimator: we have to prove that the OLS estimator has the smallest variance among all linear unbiased estimators. To prove this, we first define an arbitrary estimator $\tilde{\beta}_2$ which is linear in $Y$; secondly, we impose the restrictions implied by unbiasedness; lastly, we show that the variance of the arbitrary estimator $\tilde{\beta}_2$ is larger than (or at least equal to) the variance of the OLS estimator $b_2$.

Let $\tilde{\beta}_2$ be an arbitrary estimator which is linear in $Y$:

$$\tilde{\beta}_2 = \sum_{i=1}^{n} c_i Y_i$$

Next we substitute the population regression function for $Y_i$:

$$\tilde{\beta}_2 = \sum_{i=1}^{n} c_i (\beta_1 + \beta_2 X_i + u_i) = \beta_1 \sum_{i=1}^{n} c_i + \beta_2 \sum_{i=1}^{n} c_i X_i + \sum_{i=1}^{n} c_i u_i$$

For the estimator $\tilde{\beta}_2$ to be unbiased, we need the following restrictions to hold:

$$\sum_{i=1}^{n} c_i = 0 \quad \text{and} \quad \sum_{i=1}^{n} c_i X_i = 1$$

so that

$$\tilde{\beta}_2 = \beta_2 + \sum_{i=1}^{n} c_i u_i$$

The variance of this arbitrary estimator is

$$E[\tilde{\beta}_2 - \beta_2]^2 = E\left[\sum_{i=1}^{n} c_i u_i\right]^2 = \sigma^2 \sum_{i=1}^{n} c_i^2$$

Now write $c_i = (c_i - w_i) + w_i$, where $w_i = \dfrac{X_i - \bar{X}}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$ as before. Then

$$\sigma^2 \sum_{i=1}^{n} c_i^2 = \sigma^2 \sum_{i=1}^{n} (c_i - w_i)^2 + \sigma^2 \sum_{i=1}^{n} w_i^2 + 2\sigma^2 \sum_{i=1}^{n} (c_i - w_i) w_i$$

It can be shown that the last term in the above equation is zero:

$$\sum_{i=1}^{n} (c_i - w_i) w_i = \sum_{i=1}^{n} c_i w_i - \sum_{i=1}^{n} w_i^2 = \frac{\sum_{i=1}^{n} c_i X_i}{\sum_{i=1}^{n} (X_i - \bar{X})^2} - \frac{1}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = 0$$

using the restrictions $\sum c_i = 0$ and $\sum c_i X_i = 1$ and the properties of the $w_i$. Therefore

$$E[\tilde{\beta}_2 - \beta_2]^2 = \sigma^2 \sum_{i=1}^{n} (c_i - w_i)^2 + \operatorname{var}(b_2)$$

The first term on the right hand side is always positive, except when $c_i = w_i$ for all values of $i$. So

$$\operatorname{var}(\tilde{\beta}_2) \geq \operatorname{var}(b_2)$$


3.3. Proof that OLS estimator is consistent

The property of consistency is a large-sample or asymptotic property, unlike unbiasedness, which holds for any sample size. By consistency we mean that as the sample size tends to infinity, the density function of the estimator collapses onto the parameter value. The OLS estimator is said to be consistent if

$$\underset{n \to \infty}{\text{plim}} \; b_1 = \beta_1$$

where plim means probability limit. In other words, $b_1$ converges in probability to $\beta_1$.

The plim operator has an invariance property for any continuous function: if $\hat{\theta}$ is a consistent estimator of $\theta$ and $h(\hat{\theta})$ is any continuous function of $\hat{\theta}$, then $\text{plim}_{n\to\infty}\, h(\hat{\theta}) = h(\theta)$. Therefore, if $\hat{\theta}$ is a consistent estimator of $\theta$, then $\ln\hat{\theta}$ and $\hat{\theta}^2$ are also consistent estimators of $\ln\theta$ and $\theta^2$ respectively.

This invariance property does not hold for the expectation operator $E$. For instance, if $\hat{\theta}$ is an unbiased estimator of $\theta$, i.e. $E(\hat{\theta}) = \theta$, this does not mean that $\hat{\theta}^2$ is an unbiased estimator of $\theta^2$; in general $E(\hat{\theta}^2) \neq [E(\hat{\theta})]^2 \neq \theta^2$. This is because the expectation operator applies only to linear functions of random variables, while the plim operator is valid for any continuous function.

We know that

$$b_1 = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sum_{i=1}^{n}(X_i-\bar{X})^2} = \frac{\sum_{i=1}^{n}(X_i-\bar{X})Y_i}{\sum_{i=1}^{n}(X_i-\bar{X})^2}$$

Substituting $Y_i = \beta_0 + \beta_1 X_i + U_i$, and using $\sum(X_i-\bar{X}) = 0$ and $\sum(X_i-\bar{X})X_i = \sum(X_i-\bar{X})^2$,

$$b_1 = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(\beta_0 + \beta_1 X_i + U_i)}{\sum_{i=1}^{n}(X_i-\bar{X})^2} = \beta_1 + \frac{\sum_{i=1}^{n}(X_i-\bar{X})U_i}{\sum_{i=1}^{n}(X_i-\bar{X})^2}$$

Taking the plim operator on both sides:

$$\underset{n \to \infty}{\text{plim}} \; b_1 = \underset{n \to \infty}{\text{plim}}\left[\beta_1 + \frac{\sum_{i=1}^{n}(X_i-\bar{X})U_i}{\sum_{i=1}^{n}(X_i-\bar{X})^2}\right]$$


$$= \beta_1 + \frac{\underset{n \to \infty}{\text{plim}} \; \frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})U_i}{\underset{n \to \infty}{\text{plim}} \; \frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})^2}$$

We divide both the numerator and the denominator of the second term by $n$ so that the sums do not diverge as $n \to \infty$, and then apply the law of large numbers to each. According to the law of large numbers, under fairly general conditions sample moments converge to their corresponding population moments. Hence

$$\underset{n \to \infty}{\text{plim}} \; b_1 = \beta_1 + \frac{Cov(X, U)}{Var(X)} = \beta_1$$

provided $Var(X) \neq 0$. Note that $Cov(X, U) = E[(X_i - \mu_X)U_i] = E(X_i U_i) - \mu_X E(U_i) = 0$.

Therefore the OLS estimator is a consistent estimator.
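A short simulation sketch illustrates consistency (the parameter values are again assumed): as $n$ grows, $b_1$ settles ever closer to $\beta_1 = 0.5$.

```python
# Minimal sketch: b1 converges in probability to beta1 as the sample size grows.
import numpy as np

rng = np.random.default_rng(2)
beta0, beta1, sigma = 2.0, 0.5, 1.0                  # assumed population values

for n in (20, 200, 2000, 20000):
    X = rng.uniform(0, 10, size=n)
    U = rng.normal(0, sigma, size=n)
    Y = beta0 + beta1 * X + U
    x = X - X.mean()
    b1 = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
    print(n, b1)                                     # drifts towards 0.5 as n increases
```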

4. GOODNESS OF FIT
We have estimated the model parameters using OLS and seen that they have various desirable statistical properties under certain assumptions. But we are still not sure whether the estimated model fits the data well. If all the sample observations lie exactly on the regression line, the regression model fits the data perfectly. Usually, however, there will be some negative and some positive residuals. We want these residuals around the regression line to be as small as possible. The coefficient of determination provides a summary measure of how well the sample regression line fits the data.

4.1 Measures of variation

Recall that the Sample Regression Function is

$$Y_i = b_0 + b_1 X_i + e_i \qquad (1)$$

Summing both sides and dividing by the sample size $n$ (and using $\sum e_i = 0$) we have

$$\bar{Y} = b_0 + b_1\bar{X} \qquad (2)$$

Subtracting (2) from (1) we have

$$Y_i - \bar{Y} = b_1(X_i - \bar{X}) + e_i \qquad (3)$$

Writing equation (3) in deviation form, with $y_i = Y_i - \bar{Y}$ and $x_i = X_i - \bar{X}$:

$$y_i = b_1 x_i + e_i = \hat{y}_i + e_i$$

Squaring both sides and summing over the sample we have

$$\sum_{i=1}^{n} y_i^2 = \sum_{i=1}^{n}\hat{y}_i^2 + \sum_{i=1}^{n}e_i^2 + 2\sum_{i=1}^{n}\hat{y}_i e_i$$

The last term is zero because, by construction, the fitted values and the residuals are uncorrelated. Hence

$$\sum_{i=1}^{n}(Y_i - \bar{Y})^2 = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n}e_i^2$$

$$TSS = ESS + RSS$$

or, Total Sum of Squares (TSS) = Explained Sum of Squares (ESS) + Residual Sum of Squares (RSS), where

$TSS = \sum_{i=1}^{n}(Y_i - \bar{Y})^2$ is the total variation of the actual $Y$ values about their sample mean;

$ESS = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2 = \sum_{i=1}^{n}\hat{y}_i^2 = b_1^2\sum_{i=1}^{n}(X_i - \bar{X})^2$ is the variation of the estimated $Y$ values about the sample mean;

$RSS = \sum_{i=1}^{n}e_i^2$ is the residual or unexplained variation of the actual $Y$ values about the regression line.

Therefore the total variation in $Y$ can be decomposed into two parts: (1) ESS, the part accounted for by $X$, and (2) RSS, the unexplained and unaccounted part. RSS is known as the unexplained part of the variation because the residual term captures the effect of variables other than the explanatory variable that are not included in the regression model.

4.2. Coefficient of Determination

We have TSS = ESS + RSS. Dividing both sides by TSS:

$$1 = \frac{ESS}{TSS} + \frac{RSS}{TSS} = \frac{\sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2} + \frac{\sum_{i=1}^{n}e_i^2}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}$$

We define $R^2$ as

$$R^2 = \frac{ESS}{TSS} = \frac{\sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}$$


Therefore $R^2$ measures the proportion (or percentage) of the total variation in $Y_i$ that is explained by the regression model. In other words, it is the share of the Total Sum of Squares (TSS) that is accounted for by the Explained Sum of Squares (ESS).

Alternatively, $R^2$ can be defined in another form by a little manipulation of the formulae:

$$R^2 = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum_{i=1}^{n}e_i^2}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}$$

So $R^2$ is now equal to 1 minus the proportion of the total sum of squares that is not explained by the regression model (the Residual Sum of Squares). When the observed points are close to the estimated regression line, the model fits the data well: ESS is high and RSS is small. We want $R^2$, which is a measure of goodness of fit, to be high. When $R^2$ is low, there is a lot of variation in $Y$ which cannot be explained by $X$.

There is another interpretation of $R^2$: it equals the squared correlation between the observed value $Y_i$ and the predicted value $\hat{Y}_i$. That is,

$$R^2 = \frac{\left[\widehat{Cov}(Y_i, \hat{Y}_i)\right]^2}{\widehat{Var}(Y_i)\,\widehat{Var}(\hat{Y}_i)} = r_{Y,\hat{Y}}^2$$

So squaring the simple correlation between $Y_i$ and $\hat{Y}_i$ gives the coefficient of determination $R^2$. This result is valid for multiple regression models as well, provided the regression model has a constant term.

A question which commonly arises relates to the value of the goodness of fit. There is no rule that says what value of $R^2$ is high and what is low. For time series data the value of $R^2$ is usually high, often above 0.9; for cross-sectional data a value of 0.6 or 0.7 may be considered good. We should be cautious not to depend too much on the value of $R^2$: it is simply one measure of model adequacy. We should be more concerned about the signs of the regression coefficients and whether they conform to economic theory or prior information.

Properties of $R^2$

1. $R^2$ is a non-negative number.
2. It is unit-free, as both the numerator and the denominator have the same units.
3. The coefficient of determination satisfies $0 \leq R^2 \leq 1$.

When there is a perfect linear relationship between $X$ and $Y$, all residuals are zero, so $RSS = 0$ and $ESS = TSS$; all the variation in $Y$ is explained by the linear regression model and $R^2 = 1$. When there is no relationship between $X$ and $Y$, $b_1 = 0$ and hence $\hat{Y}_i = \bar{Y}$; thus $ESS = 0$, $RSS = TSS$ and $R^2 = 0$, so all the variation in $Y$ is left unaccounted for by the model.
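The decomposition and the equivalent definitions of $R^2$ are easy to verify numerically. A minimal sketch on simulated data (all numbers illustrative):

```python
# Minimal sketch: TSS = ESS + RSS, and the two definitions of R^2 coincide with the
# squared correlation between Y and Y_hat (simulated data, assumed parameter values).
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(0, 10, size=50)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, size=50)

x = X - X.mean()
b1 = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X
e = Y - Y_hat

TSS = ((Y - Y.mean()) ** 2).sum()
ESS = ((Y_hat - Y.mean()) ** 2).sum()
RSS = (e ** 2).sum()

print(np.isclose(TSS, ESS + RSS))                    # True: the decomposition holds
print(ESS / TSS, 1 - RSS / TSS)                      # two identical ways to get R^2
print(np.corrcoef(Y, Y_hat)[0, 1] ** 2)              # equals R^2 as claimed above
```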

4.3. Coefficient of Correlation

The concept of the coefficient of correlation is quite different from that of goodness of fit, although the two are closely connected. The coefficient of correlation measures the degree of linear association between two variables. The sample correlation coefficient is

$$r = \frac{\sum(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum(X_i - \bar{X})^2 \sum(Y_i - \bar{Y})^2}}$$

Alternatively, the coefficient of correlation can be obtained as

$$r = \pm\sqrt{R^2}$$

Properties of the Coefficient of Correlation

1. The sign of the coefficient of correlation can be positive or negative, depending upon the sign of the sample covariance between $X$ and $Y$.

2. It lies between -1 and +1: $-1 \leq r \leq 1$.

3. The coefficient of correlation is symmetrical in nature: the correlation between $X$ and $Y$ is equal to the correlation between $Y$ and $X$.

4. A change in the origin and scale of measurement does not affect the coefficient of correlation. Suppose $X_i^* = aX_i + c$ and $Y_i^* = bY_i + d$, where $a > 0$, $b > 0$, $c$ and $d$ are constants. The correlation coefficient between $X_i^*$ and $Y_i^*$ is the same as the correlation coefficient between $X_i$ and $Y_i$.

5. If $X$ and $Y$ are statistically independent, the coefficient of correlation between them is zero. However, a zero correlation coefficient does not necessarily mean that $X$ and $Y$ are independent of each other.

6. The coefficient of correlation measures only linear association or dependence, so it is not meaningful for describing nonlinear relationships.


7. The coefficient of correlation does not imply any cause-and-effect relationship between the variables.

In the regression context, the goodness of fit is more meaningful than the coefficient of correlation. The goodness of fit measures the proportion of the variation in the dependent variable that is explained by the explanatory variable, and thus indicates the extent to which the variation in one variable accounts for the variation in the other. The coefficient of correlation does not have such a significant meaning.

5. SUMMARY
1. The Classical Linear Regression Model is based on a set of assumptions known as the Gauss Markov assumptions.

2. The Gauss Markov assumptions include: linearity in the parameters, non-stochastic values of the explanatory variable, zero expectation of the disturbance term, homoscedasticity of the disturbance term, no autocorrelation between error terms, no covariance between the error term and the explanatory variable, identification of the equation, variability in the explanatory variable, normality of the error term, and correct functional form.

3. The assumptions of the Classical Linear Regression Model are necessary to prove the Gauss Markov Theorem. The theorem states that under these assumptions the least squares estimators are the minimum variance estimators among the class of linear unbiased estimators; that is, they are BLUE (Best Linear Unbiased Estimators).

4. The OLS estimator is unbiased: its expected value is equal to the population parameter, $E(b_1) = \beta_1$. The property of unbiasedness implies that on average the value of $b_1$ equals the population parameter $\beta_1$.

5. The efficiency property relates to the variance of an estimator. The OLS estimator has the smallest variance among all linear unbiased estimators.

6. The property of consistency is a large sample property: as the sample size tends to infinity, the density function of the estimator collapses onto the parameter value.

7. The Total Variation in $Y$ (TSS) is the sum of two parts: (1) the Explained Sum of Squares (ESS), the part accounted for by $X$, and (2) the Residual Sum of Squares (RSS), the unexplained and unaccounted part.

8. The coefficient of determination measures the overall goodness of fit of the regression model. It tells us what proportion of the variation in the dependent variable is explained by the explanatory variable.

9. The coefficient of determination lies between 0 and 1; the closer it is to 1, the better the overall fit of the model. There is no rule that says which level of the coefficient of determination is high and which is low. The sign of the regression coefficient is also very important.

10. The Coefficient of Correlation measures the degree of association between two variables. It lies between -1 and +1. Statistical independence of two variables implies a zero correlation coefficient, but not necessarily vice versa.

11. The coefficient of determination and the correlation coefficient are related as follows: $r = \pm\sqrt{R^2}$.

Subject Business Economics

Paper No and Title 8, Fundamentals of Econometrics

Module No and Title 4, Further Aspects of the Two Variable Linear Regression Model
Module Tag BSE_P8_M4

TABLE OF CONTENTS

1. Introduction
2. Regression through the origin
3. R² for the regression model through the origin
4. Change of Scale and Origin of Measurement
4.1 Changing the scale of measurement
4.2 Changing the origin
5. Regression on Standardized Variables
6. Functional Forms
6.1. Various Functional Forms
6.2. Choosing the Best Functional Forms
7. Summary

1. Introduction

In this chapter we discuss further aspects of linear regression analysis. First we study regression through the origin, that is, the situation where the intercept term $\beta_0$ is missing from the model. Second, we study the units of measurement of $X$ and $Y$ and how changes in those units affect the regression results. Finally we study various functional forms of the linear regression model, i.e., models that are linear in the parameters but non-linear in the variables.

2. Regression through the origin

Consider the two-variable population regression function in which the intercept is absent:

$$Y_i = \beta_1 X_i + U_i$$

This model is called the regression model through the origin.

The sample counterpart of the above PRF is the sample regression function (SRF):

$$Y_i = b_1 X_i + e_i$$

We apply the OLS method to obtain the formula for $b_1$ and its variance. We want to minimize

$$\sum e_i^2 = \sum (Y_i - b_1 X_i)^2$$

Differentiating with respect to $b_1$ we have

$$\frac{\partial \sum e_i^2}{\partial b_1} = -2\sum (Y_i - b_1 X_i)X_i$$

Equating this to zero we get

$$\sum X_i Y_i - b_1 \sum X_i^2 = 0$$

$$\text{or,} \qquad b_1 = \frac{\sum X_i Y_i}{\sum X_i^2}$$

Now substitute the PRF $Y_i = \beta_1 X_i + U_i$ into this equation:

$$b_1 = \frac{\sum X_i(\beta_1 X_i + U_i)}{\sum X_i^2} = \beta_1 + \frac{\sum X_i U_i}{\sum X_i^2}$$

$$b_1 - \beta_1 = \frac{\sum X_i U_i}{\sum X_i^2}$$

Taking the expectation operator on both sides, and noting that the $X_i$ are non-stochastic and $E(U_i) = 0$, we get $E(b_1) = \beta_1$. Since the $U_i$ are homoscedastic and uncorrelated, the variance is

$$Var(b_1) = E[(b_1 - \beta_1)^2] = \frac{\sigma^2}{\sum X_i^2}$$

where $\sigma^2$ is estimated by

$$\hat{\sigma}^2 = \frac{\sum e_i^2}{n - 1}$$

Let us briefly compare the formulas of the regression model with and without the intercept term.

Without intercept, $Y_i = \beta_1 X_i + U_i$:
$b_1 = \dfrac{\sum X_i Y_i}{\sum X_i^2}$, $\quad Var(b_1) = \dfrac{\sigma^2}{\sum X_i^2}$, $\quad \hat{\sigma}^2 = \dfrac{\sum e_i^2}{n-1}$

With intercept, $Y_i = \beta_0 + \beta_1 X_i + U_i$:
$b_1 = \dfrac{\sum (X_i-\bar{X})(Y_i-\bar{Y})}{\sum (X_i-\bar{X})^2}$, $\quad Var(b_1) = \dfrac{\sigma^2}{\sum (X_i-\bar{X})^2}$, $\quad \hat{\sigma}^2 = \dfrac{\sum e_i^2}{n-2}$

An illustration:
Consider the Capital Asset Pricing Model (CAPM) of modern portfolio theory, which states that if the capital market works efficiently, then security $i$'s expected risk premium $(ER_i - r_f)$ is equal to that security's $\beta$ coefficient times the expected market risk premium $(ER_m - r_f)$. The CAPM postulate can be expressed empirically as

$$R_i - r_f = \beta_i(R_m - r_f) + u_i$$

where
$R_i$ is the return on security $i$;
$r_f$ is the risk-free return, say the return on treasury bills;
$R_m$ is the return on the market portfolio;
$\beta_i$ is a measure of systematic risk, i.e. risk which cannot be eliminated through diversification;
$u_i$ represents the disturbance term.

Note that the model contains no intercept, so it is a regression through the origin. The postulate of the CAPM is shown diagrammatically in figure 1.
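As a sketch of how such a model is estimated, the snippet below applies the through-the-origin formula $b_1 = \sum X_iY_i / \sum X_i^2$ to simulated excess returns (the market premia, the noise and the true beta of 1.2 are all assumed for illustration):

```python
# Minimal sketch: estimating a CAPM-style beta by regression through the origin
# (simulated monthly excess returns; the true beta of 1.2 is assumed).
import numpy as np

rng = np.random.default_rng(4)
mkt_excess = rng.normal(0.01, 0.05, size=120)                  # market risk premium
sec_excess = 1.2 * mkt_excess + rng.normal(0, 0.02, size=120)  # security risk premium

beta_hat = (mkt_excess * sec_excess).sum() / (mkt_excess ** 2).sum()
print(beta_hat)                                                # close to the assumed 1.2
```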

3. R² for the regression model through the origin

For a regression model without an intercept, the coefficient of determination can be defined as the raw $R^2$:

$$\text{raw } R^2 = \frac{\left(\sum X_i Y_i\right)^2}{\sum X_i^2 \sum Y_i^2}$$

The raw $R^2$ satisfies $0 \leq \text{raw } R^2 \leq 1$, but it is not directly comparable to the conventional $R^2$. It is good to stick to the intercept model unless there is a very strong a priori expectation, for two reasons. First, we commit a specification error if an intercept truly belongs in the model and we insist on fitting a regression through the origin. Second, if the intercept term is truly absent but is nonetheless included in the model, it will simply turn out to be statistically insignificant.

4. Change of Scale and Origin of Measurement

4.1. Changing the scale of measurement

Consider the following regression model:

$$Y_i = b_0 + b_1 X_i + e_i \qquad (1)$$

Define $Y_i^* = w_1 Y_i$ and $X_i^* = w_2 X_i$, where $w_1$ and $w_2$ are scale factors which may or may not be equal.

The regression model in terms of $Y_i^*$ and $X_i^*$ is

$$Y_i^* = b_0^* + b_1^* X_i^* + e_i^* \qquad (2)$$

where $e_i^* = w_1 e_i$.

The OLS estimates of equation (1) are

$$b_0 = \bar{Y} - b_1\bar{X}$$

$$b_1 = \frac{\sum(X_i - \bar{X})(Y_i - \bar{Y})}{\sum(X_i - \bar{X})^2}$$

$$Var(b_0) = \frac{\sigma^2 \sum X_i^2}{n\sum(X_i - \bar{X})^2}$$

$$Var(b_1) = \frac{\sigma^2}{\sum(X_i - \bar{X})^2}$$

$$\hat{\sigma}^2 = \frac{\sum e_i^2}{n-2}$$

The OLS estimates of equation (2) are:

$$b_0^* = \bar{Y}^* - b_1^*\bar{X}^*$$

$$b_1^* = \frac{\sum(X_i^* - \bar{X}^*)(Y_i^* - \bar{Y}^*)}{\sum(X_i^* - \bar{X}^*)^2}$$

$$Var(b_0^*) = \frac{\sigma^{*2} \sum X_i^{*2}}{n\sum(X_i^* - \bar{X}^*)^2}$$

$$Var(b_1^*) = \frac{\sigma^{*2}}{\sum(X_i^* - \bar{X}^*)^2}$$

$$\hat{\sigma}^{*2} = \frac{\sum e_i^{*2}}{n-2}$$

Comparing the OLS estimates of the two equations we get:

$$b_1^* = \left(\frac{w_1}{w_2}\right) b_1 \qquad b_0^* = w_1 b_0 \qquad \hat{\sigma}^{*2} = w_1^2\,\hat{\sigma}^2$$

$$Var(b_0^*) = w_1^2\, Var(b_0) \qquad Var(b_1^*) = \left(\frac{w_1}{w_2}\right)^2 Var(b_1)$$

Some special cases:

1. Same scaling factor, $w_1 = w_2 = w$:

$$b_1^* = b_1, \quad se(b_1^*) = se(b_1) \qquad \text{and} \qquad b_0^* = w\,b_0, \quad se(b_0^*) = w\,se(b_0)$$

So when we have the same scaling factor, the intercept term and its standard error are multiplied by $w$, while the slope coefficient and its standard error are unchanged.

2. $X$ scale unchanged, i.e. $w_2 = 1$:

$$b_1^* = w_1 b_1, \quad se(b_1^*) = w_1\, se(b_1)$$

$$b_0^* = w_1 b_0, \quad se(b_0^*) = w_1\, se(b_0)$$

The slope and intercept coefficients, along with their respective standard errors, are all multiplied by $w_1$.

3. $Y$ scale unchanged, i.e. $w_1 = 1$:

$$b_1^* = \left(\frac{1}{w_2}\right) b_1, \quad se(b_1^*) = \left(\frac{1}{w_2}\right) se(b_1)$$

$$b_0^* = b_0, \quad se(b_0^*) = se(b_0)$$

The slope and its standard error are multiplied by $1/w_2$, while the intercept and its standard error are unaffected.

4.2. Changing the origin

A change of origin affects the intercept of the regression, but the slope coefficient is unaffected. The origin can be changed by adding or subtracting a constant to $X$ and/or $Y$.

Suppose $X_i^* = X_i + c$ and $Y_i^* = Y_i + d$, and consider

$$Y_i^* = b_0^* + b_1^* X_i^* + e_i^*$$

Then

$$b_1^* = \frac{\sum(X_i^* - \bar{X}^*)(Y_i^* - \bar{Y}^*)}{\sum(X_i^* - \bar{X}^*)^2} = \frac{\sum(X_i + c - \bar{X} - c)(Y_i + d - \bar{Y} - d)}{\sum(X_i + c - \bar{X} - c)^2} = \frac{\sum(X_i - \bar{X})(Y_i - \bar{Y})}{\sum(X_i - \bar{X})^2} = b_1$$

Subtracting a constant $c$ from $X$ changes the intercept in the following manner:

$$\hat{Y}_i = (b_0 + b_1 c) + b_1(X_i - c)$$

Subtracting a constant $d$ from $Y$ changes the intercept in the following manner:

$$\hat{Y}_i - d = (b_0 - d) + b_1 X_i$$
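The scale and origin results above can be checked directly. A minimal sketch (simulated data; the scale factors $w_1 = 100$ and $w_2 = 10$ and the origin shifts are arbitrary):

```python
# Minimal sketch: rescaling Y by w1 and X by w2 multiplies the slope by w1/w2 and the
# intercept by w1; shifting the origin leaves the slope unchanged (assumed data).
import numpy as np

def ols(X, Y):
    x = X - X.mean()
    b1 = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
    return Y.mean() - b1 * X.mean(), b1              # (intercept, slope)

rng = np.random.default_rng(5)
X = rng.uniform(0, 10, size=40)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, size=40)

b0, b1 = ols(X, Y)
w1, w2 = 100.0, 10.0                                 # arbitrary scale factors
b0s, b1s = ols(w2 * X, w1 * Y)
print(b1s, (w1 / w2) * b1)                           # b1* = (w1/w2) * b1
print(b0s, w1 * b0)                                  # b0* = w1 * b0

_, b1o = ols(X + 5.0, Y - 3.0)                       # change of origin only
print(b1o, b1)                                       # slope is unaffected
```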

5. Regression on Standardized Variables

We have seen that OLS estimates are affected by the units in which the dependent and explanatory variables are expressed. To avoid this problem, both variables can be expressed as standardized variables. A standardized variable is one from which the sample mean has been subtracted, the difference then being divided by the sample standard deviation of that variable. Thus the standardized variables are:

$$Y_i^* = \frac{Y_i - \bar{Y}}{s_Y} \qquad X_i^* = \frac{X_i - \bar{X}}{s_X}$$

where $\bar{Y}$ is the sample mean of $Y_i$, $s_Y$ the sample standard deviation of $Y_i$, $\bar{X}$ the sample mean of $X_i$, and $s_X$ the sample standard deviation of $X_i$. The mean of a standardized variable is always zero and its standard deviation is always 1.

So we can run a regression on the standardized variables:

$$Y_i^* = b_0^* + b_1^* X_i^* + e_i^* = b_1^* X_i^* + e_i^*$$

since the intercept term is zero in the standardized regression.

Interpretation of the regression coefficient: if the (standardized) explanatory variable increases by one standard deviation, then on average the (standardized) dependent variable increases by $b_1^*$ standard deviation units.

The standardized regression model has an advantage over the traditional model: when all the regressors are standardized, they are put on an equal basis and can be compared directly. The size of the coefficient of a regressor indicates its contribution to the regressand; a regressor with a larger (standardized) coefficient contributes more to the regressand than one with a smaller coefficient. The standardized regression coefficients therefore measure the relative strength of the various regressors.

Two things should be noted. First, the conventional notion of $R^2$ is not applicable to the standardized regression, as it is a regression through the origin. Second, there is a relationship between the regression coefficients of the standardized model and the traditional model. For the bivariate case it is

$$b_1^* = b_1\left(\frac{s_X}{s_Y}\right)$$

where $s_X$ is the standard deviation of the regressor and $s_Y$ is the standard deviation of the regressand.
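A quick numerical sketch of the relation $b_1^* = b_1(s_X/s_Y)$, again on simulated data:

```python
# Minimal sketch: the standardized slope equals b1 * (s_X / s_Y) (simulated data).
import numpy as np

rng = np.random.default_rng(6)
X = rng.uniform(0, 10, size=40)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, size=40)

Xs = (X - X.mean()) / X.std(ddof=1)                  # standardized: mean 0, sd 1
Ys = (Y - Y.mean()) / Y.std(ddof=1)

b1 = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean()) ** 2).sum()
b1_star = (Xs * Ys).sum() / (Xs ** 2).sum()          # through-origin slope on std. data

print(b1_star, b1 * X.std(ddof=1) / Y.std(ddof=1))   # the two numbers agree
```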

6. Functional Forms
In many economic applications, relationships between variables are non-linear in nature. We therefore extend linear regression models to incorporate non-linearities (in the variables) by appropriately redefining the dependent and independent variables.

Consider the following regression model:

$$Y_i = \beta_0 + \beta_1 \ln X_i + U_i$$

where $X$ is the exogenous variable and $\ln X$ is the regressor.

Take another regression model:

$$\ln Y_i = \beta_0 + \beta_1 X_i + U_i$$

where $Y_i$ is the endogenous variable and $\ln Y_i$ is the regressand.

Both models are linear in the parameters, although they are non-linear in the variables ($X$ in the first model and $Y$ in the second). Since both models are linear in the parameters, the OLS method can be used to estimate them. If a model is not linear in the parameters, however, iterative methods must be used in the estimation.

Certain nonlinear models can be made linear by a suitable transformation. For instance, consider a Cobb-Douglas-type function with the disturbance term in multiplicative form:

$$Y_i = \beta_0 X_i^{\beta_1} e^{U_i}$$

Taking natural logarithms on both sides we have

$$\ln Y_i = \ln\beta_0 + \beta_1 \ln X_i + U_i$$

so the model becomes linear through the logarithmic transformation. However, if the disturbance term enters in additive form,

$$Y_i = \beta_0 X_i^{\beta_1} + U_i$$

the model is non-linearizable: we cannot transform it into a linear model.
Before we proceed to the various functional forms, we look at a few definitions which will help us later.

(1) The proportional change between $Y_t$ and $Y_{t-1}$ is given by

$$\frac{\Delta Y}{Y_{t-1}} = \frac{Y_t - Y_{t-1}}{Y_{t-1}}$$

To obtain the proportional change in percent we multiply the proportional change by 100.

(2) The logarithmic change between $Y_t$ and $Y_{t-1}$ is given by

$$\Delta \ln Y = \ln Y_t - \ln Y_{t-1}$$

In percentage terms, the logarithmic change is $(\Delta \ln Y) \times 100$.

Table 1: Examples of proportional change and logarithmic change

$Y_t$:                        102    110    140    160    200
$Y_{t-1}$:                    100    100    100    100    100
Proportional change (%):      2      10     40     60     100
Logarithmic change (%):       2.0    9.5    33.6   47.0   69.3

When the change is small, the proportional change and the logarithmic change are almost equal, but as the change gets larger the difference between them widens.
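Table 1 can be reproduced in a couple of lines (the base value of 100 is taken from the table):

```python
# Minimal sketch reproducing Table 1: proportional versus logarithmic change.
import numpy as np

base = 100.0
new = np.array([102.0, 110.0, 140.0, 160.0, 200.0])
print(100 * (new - base) / base)                      # [  2.  10.  40.  60. 100.]
print((100 * (np.log(new) - np.log(base))).round(1))  # [ 2.   9.5 33.6 47.  69.3]
```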

Concept of Elasticity

Elasticity measures the ratio of the relative changes of two variables. In terms of proportional changes, the elasticity of $Y$ with respect to $X$ can be defined as

$$E_{Y/X} = \frac{\Delta Y / Y}{\Delta X / X}$$

where $\Delta Y = Y_t - Y_{t-1}$ and $\Delta X = X_t - X_{t-1}$.

In terms of logarithmic changes, the elasticity of $Y$ with respect to $X$ can be defined as

$$E_{Y/X} = \frac{d \ln Y}{d \ln X}$$

The logarithmic definition of elasticity is the one usually used in econometric models. We now consider various functional forms which are linear in the parameters, and in each case we interpret the marginal effect of $X_i$ on $Y_i$.

6.1. Various Functional Forms of the Regression Model

We study some regression models that may be non-linear in the variables but are linear in the parameters, or that can be linearized by a suitable transformation. They are:
1. The log-log model
2. Semilog models
3. Reciprocal models
4. The logarithmic reciprocal model
a. Log-Log Model

One important use of the double-log or log-log model in economics is in estimating demand functions or production functions. Let us begin with an exponential regression model:

$$Y_i = \beta_0 X_i^{\beta_1} e^{U_i}$$

This can be written alternatively as

$$\ln Y_i = \ln\beta_0 + \beta_1 \ln X_i + U_i$$

where ln denotes the natural log (i.e., log to the base $e$, with $e = 2.718$). We rewrite the above equation as

$$\ln Y_i = \alpha + \beta_1 \ln X_i + U_i$$

where $\alpha = \ln\beta_0$. Such models are known as log-log or double-log models.

The log-log model can be estimated by the ordinary least squares method by letting

$$Y_i^* = \alpha + \beta_1 X_i^* + U_i$$

where $Y_i^* = \ln Y_i$ and $X_i^* = \ln X_i$. The OLS estimators are unbiased estimators of $\alpha$ and $\beta_1$.

The popularity of the log-log model lies in the fact that the slope coefficient $\beta_1$ measures the elasticity of $Y$ with respect to $X$. For instance, if $Y$ represents the quantity demanded of a commodity and $X$ represents its price, then $\beta_1$ measures the price elasticity of demand.

The log-log model has two special features. First, the elasticity coefficient between $Y$ and $X$ remains constant throughout, so no matter at which point of the $X$ scale you measure the elasticity, it will be the same. Second, the estimator $\hat{\beta}_0 = \text{antilog}(\hat{\alpha})$ is a biased estimator even though $\hat{\alpha}$ is an unbiased estimator of $\alpha$. However, the intercept term is often not very important in empirical studies.

Illustrative example

Consider an estimated regression of the log of consumption expenditure on the log of income in which the estimated slope coefficient is about 1.5:

$$\ln \hat{Y}_i = \hat{\alpha} + 1.5 \ln X_i$$

The elasticity of consumption expenditure with respect to income is about 1.5, indicating that if income increases by 1 percent, then expenditure increases by about 1.5 percent.
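A sketch of how such an elasticity is estimated in practice: OLS applied to the logs of simulated data in which a true elasticity of 1.5 is built in (the variable names and all numbers are illustrative):

```python
# Minimal sketch: estimating a constant elasticity via the log-log model
# (a true elasticity of 1.5 is assumed in the simulated data).
import numpy as np

rng = np.random.default_rng(7)
income = rng.uniform(50, 500, size=100)
expenditure = 0.8 * income ** 1.5 * np.exp(rng.normal(0, 0.1, size=100))

lx, ly = np.log(income), np.log(expenditure)
b1 = ((lx - lx.mean()) * (ly - ly.mean())).sum() / ((lx - lx.mean()) ** 2).sum()
print(b1)    # close to 1.5: a 1% rise in income raises expenditure by about 1.5%
```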

b. Linear-Log Model

The lin-log model gives the absolute change in the dependent variable $Y_i$ for a percentage change in the independent variable $X_i$. A linear-log model is given by

$$Y_i = \beta_0 + \beta_1 \ln X_i + U_i$$

The slope coefficient of a lin-log model is defined as

$$\beta_1 = \frac{\text{change in } Y}{\text{relative change in } X}$$

Symbolically, the slope coefficient can be represented as

$$\beta_1 = \frac{\Delta Y_i}{\Delta X_i / X_i}$$

where $\Delta$ denotes a small change. Alternatively, the above equation can be written as

$$\Delta Y_i = \beta_1 (\Delta X_i / X_i)$$

This states that the absolute change in $Y_i$ equals the slope times the relative change in $X_i$. If the latter is multiplied by 100, the equation gives the absolute change in $Y_i$ for a percentage change in $X_i$.

The most practical application of the lin-log model is found in Engel expenditure models. According to Engel, the expenditure on food increases in arithmetic progression while total expenditure increases in geometric progression.

c. Log-Linear Model

We are often interested in calculating the growth rate of important economic variables such as GDP, money supply, population or employment. Suppose we want the growth rate of $Y$ using the compound interest formula:

$$Y_t = Y_0(1 + r)^t$$

where $r$ is the compound growth rate of $Y$. Taking logarithms on both sides, we get

$$\ln Y_t = \ln Y_0 + t\ln(1 + r)$$

Letting $\beta_0 = \ln Y_0$ and $\beta_1 = \ln(1 + r)$, we get

$$\ln Y_t = \beta_0 + \beta_1 t$$

Adding an error term to this equation gives

$$\ln Y_t = \beta_0 + \beta_1 t + U_t$$

Such models are known as semilog models because only one variable appears in logarithmic form. Here the explanatory variable enters linearly and the dependent variable is in logarithmic form, so it is called a log-lin model.

In the log-lin model, the slope coefficient measures the relative change in the dependent variable for a given absolute change in the independent variable:

$$\beta_1 = \frac{\text{relative change in } Y}{\text{absolute change in } t}$$

The growth rate of $Y$ is obtained by multiplying $\beta_1$ by 100; $100 \times \beta_1$ is also known as the semi-elasticity of $Y$ with respect to $X$.

The coefficient $\beta_1$ gives the instantaneous growth rate of the dependent variable $Y$, not the compound rate of growth over a period of time. The compound growth rate is found by taking the antilog of the estimated $\beta_1$, subtracting 1 from it, and multiplying the difference by 100.

Linear Trend Model

A linear trend model can be written as

$$Y_t = \beta_0 + \beta_1 t + U_t$$

Here we regress $Y$ itself on time, unlike the log-lin model where we regress $\ln Y$ on time. The time variable $t$ is known as the trend variable. There is a downward trend in $Y$ when the slope coefficient is negative and an upward trend when it is positive.

The choice between the growth-rate (log-lin) model and the linear trend model depends on whether we are interested in the relative or the absolute change in the variable of interest.

Illustrative example

Consider the regression of the log of private consumption expenditure on time over the period 1990 to 2010, in which the estimated trend coefficient is 0.00743:

$$\ln \hat{Y}_t = \hat{\beta}_0 + 0.00743\, t$$

Find (1) the instantaneous growth rate of private consumption expenditure over the period 1990 to 2010, and (2) the compound growth rate of expenditure over the same period.

Solution: (1) The instantaneous growth rate is $0.00743 \times 100 = 0.743$ percent.
(2) The compound growth rate is antilog(0.00743) − 1 = 0.00746, or 0.746 percent.
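The two calculations can be verified in one line each (only the trend coefficient from the example is used):

```python
# Minimal sketch: instantaneous versus compound growth for b1 = 0.00743.
import numpy as np

b1 = 0.00743
print(100 * b1)                  # instantaneous growth rate: 0.743 percent
print(100 * (np.exp(b1) - 1))    # compound growth rate: about 0.746 percent
```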

d. Reciprocal Model

A reciprocal model is given by

$$Y_i = \beta_0 + \beta_1\left(\frac{1}{X_i}\right) + U_i$$

The model is nonlinear in the variable $X$, since $X$ enters reciprocally, but it is linear in the parameters $\beta_0$ and $\beta_1$. As the independent variable $X$ increases indefinitely, the term $\beta_1(1/X_i)$ tends to zero and the dependent variable $Y$ approaches its limiting value $\beta_0$.

Illustrative example

Consider the modern version of the Phillips curve, which can be expressed as

$$\pi_t - \pi_t^e = \beta_1(u_t - u_N) + e_t$$

where
$\pi_t$ is the actual inflation rate at time $t$;
$\pi_t^e$ is the expected inflation rate at time $t$, usually replaced by $\pi_{t-1}$ since expected inflation is unobserved;
$u_t$ is the actual unemployment rate;
$u_N$ is the natural rate of unemployment;
$e_t$ is the disturbance term.

In empirical studies, the relationship between inflation and unemployment can be fitted either as a linear regression model or as a reciprocal model.

Linear model:
$$\pi_t - \pi_{t-1} = 3.467 - 0.5789\, u_t$$

Reciprocal model:
$$\pi_t - \pi_{t-1} = -2.4 + \hat{\beta}_1\left(\frac{1}{u_t}\right), \qquad \hat{\beta}_1 > 0$$

The linear model shows that if the unemployment rate goes down by 1 percentage point, then on average the rate of inflation goes up by 0.5789 percentage points, and vice versa. The reciprocal model shows that even if the rate of unemployment increases indefinitely, the rate of inflation will fall by at most 2.4 percentage points. The natural rate of unemployment can be calculated from the linear model as

$$u_N = \frac{3.467}{0.5789} = 5.989$$

Thus the natural rate of unemployment is approximately 5.989%.

e. Log Reciprocal Model

The logarithmic reciprocal model takes the following form:

$$\ln Y_i = \beta_0 + \beta_1\left(\frac{1}{X_i}\right) + U_i$$

Such a model can be used to capture the production function in the short run. Microeconomics tells us that if labour and capital are the inputs of a production function and we keep capital constant while increasing the labour input, then the short-run output-labour relationship can be captured by the log reciprocal model.

6.2. Choosing the Best Functional Form

The choice of the most appropriate functional form for empirical analysis requires skill, knowledge of economic theory and other prior information. The choice becomes more difficult as the number of explanatory variables in the model increases. Some practical guidelines for choosing the most appropriate form are:

1. Economic theory may help us select a particular functional form (e.g. the modified Phillips curve).

2. It is useful to derive the slope coefficient of each candidate model and the elasticity of the dependent variable with respect to the explanatory variable. The following table may be of help:

Model            Equation                          Slope $\left(\dfrac{dY}{dX}\right)$        Elasticity $\left(\dfrac{dY}{dX}\cdot\dfrac{X}{Y}\right)$

Linear           $Y = \beta_0 + \beta_1 X$             $\beta_1$                 $\beta_1 (X/Y)$
Log-log          $\ln Y = \beta_0 + \beta_1 \ln X$     $\beta_1 (Y/X)$           $\beta_1$
Log-linear       $\ln Y = \beta_0 + \beta_1 X$         $\beta_1 Y$               $\beta_1 X$
Lin-log          $Y = \beta_0 + \beta_1 \ln X$         $\beta_1 (1/X)$           $\beta_1 (1/Y)$
Reciprocal       $Y = \beta_0 + \beta_1 (1/X)$         $-\beta_1 (1/X^2)$        $-\beta_1 \, 1/(XY)$
Log reciprocal   $\ln Y = \beta_0 + \beta_1 (1/X)$     $-\beta_1 (Y/X^2)$        $-\beta_1 (1/X)$

3. The regression coefficients of the chosen model should conform to a priori expectations based on economic theory. For instance, in the demand function of a commodity with respect to its price, the slope coefficient should be negative.

4. It is possible that more than one functional form fits the data well. For instance, both the linear and the reciprocal model can be fitted to the modern version of the Phillips curve.

5. The usefulness of the coefficient of determination should not be over-emphasised. As we add more explanatory variables to the model, the value of $R^2$ will rise. What actually matters is whether the regression coefficients have the correct signs, conform to economic theory, and are statistically significant.

7. Summary
1. If a regression model does not contain an explicit intercept term, it is known as a regression through the origin, and one should be cautious in estimating such models. In such models the sum of the residuals $\sum e_i$ is nonzero, and the conventional $R^2$ may not be meaningful. It is better to introduce the intercept explicitly unless there is a strong theoretical reason not to.

2. The interpretation of the regression coefficients depends on the units and scale in which the dependent and independent variables are expressed.

3. We discussed some important functional forms in this chapter: (a) the log-log or constant elasticity model, (b) semilog regression models, and (c) reciprocal models.

4. In the log-log model both the regressand and the regressor(s) are in log form. The regression coefficient attached to the log of an independent variable is interpreted as the elasticity of the dependent variable with respect to that independent variable.

5. In the semilog model where the dependent variable is logarithmic and the independent variable is time, the estimated slope coefficient (multiplied by 100) measures the (instantaneous) rate of growth of the dependent variable. In the semilog model where the independent variable is logarithmic, its coefficient measures the absolute change in the dependent variable for a given percentage change in the independent variable.

6. In reciprocal models either the dependent variable or the independent variable is expressed in reciprocal or inverse form to capture nonlinear relationships between economic variables.

7. The choice of the most appropriate functional form requires experience and knowledge of economic theory. Rather than focusing on the value of $R^2$, we should look at the signs of the regression coefficients.


Subject BUSINESS ECONOMICS

Paper No and Title 8, Fundamentals of Econometrics

Module No and Title 5, Probability Distribution of Least Square Estimates

Module Tag BSE_P8_M5


TABLE OF CONTENTS

1. Introduction
2. The Probability distribution of disturbances 𝑼𝒊
3. The normality assumption for 𝑼𝒊
4. Why do we need the normality assumption?
5. Properties of OLS estimators under normality assumptions
6. Students’ t-statistics
7. A few important probability distributions
7.1. Normal Distribution
7.2. The Chi-Square Distribution
7.3. Student’s t distribution
7.4. The F-Distribution
7.5 Bernoulli Distribution
7.6. The Binomial Distribution
7.7. The Poisson Distribution
8. Summary


Learning Outcomes

After reading this chapter, the reader will be able to understand the following concepts:

 Classical Normal Linear Regression Model


 Normality assumption of disturbance term
 Probability distribution of disturbance term
 Probability distribution of OLS estimates
 Properties of OLS estimates under normality assumptions
 Normal, chi-square, 𝑡 𝑎𝑛𝑑 𝐹 distributions

1. Introduction

The classical theory of statistical inference consists of two branches, namely estimation and hypothesis testing. Using the method of OLS we were able to estimate the parameters of the (two-variable) regression model, and it was shown that these estimators possess several desirable statistical properties, such as unbiasedness and minimum variance (i.e., the BLUE property). However, the estimates change from sample to sample, and we cannot be sure how well they represent the population parameters.

Hypothesis testing deals with the issue of drawing inferences about the population regression function from the sample regression function. To relate the estimated parameters (𝑏𝑜, 𝑏1, 𝜎̂²) to their true values (𝛽𝑜, 𝛽1, 𝜎²) we need to know the probability distributions of 𝑏𝑜, 𝑏1 and 𝜎̂².

2. The Probability distribution of disturbances 𝑼𝒊

We know that 𝑏1 = ∑ 𝑐𝑖 𝑦𝑖 , where 𝑐𝑖 = 𝑥𝑖 ⁄ ∑ 𝑥𝑖². Since the 𝑥′𝑠 are assumed fixed, 𝑏1 is a linear function of 𝑦𝑖 .

Rewriting the above equation in terms of the population regression function we have

𝑏1 = ∑ 𝑐𝑖 (𝛽0 + 𝛽1 𝑋𝑖 + 𝑈𝑖 )


Since 𝑐𝑖 , 𝛽0 , 𝛽1 and 𝑋𝑖 are fixed, 𝑏1 is a linear function of the disturbance term 𝑈𝑖 , which is assumed to be random. So the probability distributions of 𝑏1 (and 𝑏𝑜 ) depend on the probability distribution of 𝑈𝑖 , and an assumption about that distribution is necessary for hypothesis testing. If we assume that the 𝑈𝑖 follow the normal distribution and add this assumption to the assumptions of the classical linear regression model (CLRM), we get the classical normal linear regression model (CNLRM).

3. The normality assumption for 𝑼𝒊

The classical normal linear regression model assumes that each 𝑈𝑖 is distributed normally with

Mean: 𝐸(𝑈𝑖 ) = 0

Variance: 𝐸[𝑈𝑖 − 𝐸(𝑈𝑖 )]² = 𝐸[𝑈𝑖²] = 𝜎²

Covariance: 𝐶𝑂𝑉(𝑈𝑖 , 𝑈𝑠 ) = E{[𝑈𝑖 − E(𝑈𝑖 )][𝑈𝑠 − E(𝑈𝑠 )]} = E(𝑈𝑖 𝑈𝑠 ) = 0 for 𝑖 ≠ 𝑠

In a more compact form, the assumptions above can be stated as

𝑈𝑖 ~ 𝑁(0, 𝜎²)

where the symbol ~ means 'distributed as' and N stands for the normal distribution; the two terms inside the brackets represent the mean and the variance respectively.

For two normally distributed variables, zero covariance means that the two variables are independent of each other. So the 𝑈𝑖 and 𝑈𝑠 are not only uncorrelated but also independently distributed, and we can write

𝑈𝑖 ~ 𝑁𝐼𝐷(0, 𝜎²)

where NID stands for normally and independently distributed.



4. Why do we need the normality assumption?

1. The disturbance term 𝑈𝑖 represents the combined influence of a large number of explanatory variables that are not introduced in the model but have an influence on the dependent variable. We hope that the influence of these neglected variables is small and at best random. The central limit theorem states that if there are a large number of independent and identically distributed random variables, then the distribution of their sum tends towards a normal distribution as the number of such variables tends toward infinity. The CLT thus provides a theoretical justification for the assumption of normality of 𝑈𝑖 .

2. A less restrictive version of the CLT states that even if the number of variables is not very large, or the variables are not strictly independent, their sum may still be normally distributed.

3. The probability distributions of the OLS estimators are easily derived under the normality assumption for the disturbance term 𝑈𝑖 , because any linear function of normally distributed variables is itself normally distributed. Since the OLS estimators 𝑏𝑜 and 𝑏1 are linear functions of 𝑈𝑖 and 𝑈𝑖 is normally distributed, the two OLS estimators 𝑏0 and 𝑏1 are also normally distributed.

4. The normal distribution involves only two parameters (mean and variance).So it is
comparatively a simple distribution. It is also well known and its properties well-
studied.

5. For small samples of, say, fewer than 100 observations, the normality assumption is important: it not only helps us derive the exact probability distributions of the OLS estimators but also enables us to use the t, F and 𝜒² statistical tests for regression models.

6. In large samples, the t and F tests can still be applied even if we do not assume that the disturbance term is normally distributed.

5. Properties of OLS estimators under normality assumptions


The OLS estimators possess the following properties under the normality assumption:

1. They are unbiased.

2. They have minimum variance among the class of linear estimators which combined
with the unbiasedness property means that they are efficient.

3. They are consistent: as the sample size increases indefinitely, they converge to their true population values.

4. The OLS estimator 𝑏0, being a linear function of 𝑈𝑖 , is normally distributed with

Mean: 𝐸(𝑏0) = 𝛽0

Variance: Var(𝑏0) = 𝜎𝑏0² = 𝜎² ∑𝑋𝑖² ⁄ [𝑛 ∑(𝑋𝑖 − 𝑋̅)²]

Or, in more compact form,

𝑏𝑜 ~ 𝑁(𝛽𝑜 , 𝜎𝑏0²)

This implies that the variable 𝑍 = (𝑏0 − 𝛽𝑜) ⁄ 𝜎𝑏0 follows the standard normal distribution with zero mean and unit (= 1) variance, i.e. 𝑍 ~ 𝑁(0,1).

5. The OLS estimator 𝑏1, being a linear function of 𝑈𝑖 , is normally distributed with

Mean: 𝐸(𝑏1) = 𝛽1

Variance: Var(𝑏1) = 𝜎𝑏1² = 𝜎² ⁄ ∑(𝑋𝑖 − 𝑋̅)²

Or, in more compact form, 𝑏1 ~ 𝑁(𝛽1 , 𝜎𝑏1²). Then the variable 𝑍 = (𝑏1 − 𝛽1) ⁄ 𝜎𝑏1 also follows the standard normal distribution.

The probability distributions of 𝑏𝑜 and 𝑏1 are shown in figure 1.


6. (𝑛 − 2)(𝜎̂² ⁄ 𝜎²) is distributed as the 𝜒² (chi-square) distribution with (𝑛 − 2) degrees of freedom, i.e.

(𝑛 − 2)(𝜎̂² ⁄ 𝜎²) = (𝑛 − 2)𝑠² ⁄ 𝜎² ~ 𝜒𝑛−2²

where 𝜎̂² = 𝑠² is the estimated value and 𝜎² the true value. This knowledge helps us draw inferences about the true 𝜎² from the estimated 𝜎̂².

7. The OLS estimators (𝑏0 ,𝑏1 ) are distributed independently of 𝜎̂ 2 .

8. The OLS estimators (𝑏0 , 𝑏1 ), under the assumption of normality, have minimum variance in the entire class of unbiased estimators, whether linear or not. This result, due to Rao, is more powerful than the Gauss Markov theorem, which is restricted to the class of linear estimators. Thus the OLS estimators are not only BLUE but are best unbiased estimators (BUE) as well: they have minimum variance in the entire class of unbiased estimators.

Conclusion: The assumption that the disturbance term 𝑈𝑖 is normally distributed enables us to derive the sampling distributions of 𝑏0 and 𝑏1 (both normal) and of 𝜎̂² (related to the chi-square). This allows us to construct confidence intervals and conduct hypothesis tests.

Moreover, when we assume 𝑈𝑖 ~ N(0, 𝜎²), then 𝑌𝑖 , being a linear function of 𝑈𝑖 , is itself normally distributed with mean and variance

𝐸(𝑌𝑖 ) = 𝛽0 + 𝛽1 𝑋𝑖

𝑉𝑎𝑟(𝑌𝑖 ) = 𝜎²

In other words, 𝑌𝑖 ~ 𝑁(𝛽0 + 𝛽1 𝑋𝑖 , 𝜎²).

6. Students’ t-statistics

We have 𝑏1 ~ 𝑁(𝛽1 , 𝜎𝑏1²) and 𝑏𝑜 ~ 𝑁(𝛽𝑜 , 𝜎𝑏0²).

We first convert the normally distributed OLS estimators into 𝑍 variables distributed as 𝑁(0,1):

𝑍𝑏𝑜 = (𝑏0 − 𝛽0) ⁄ 𝜎𝑏0 = (𝑏0 − 𝛽0) √[𝑛∑(𝑋𝑖 − 𝑋̅)² ⁄ ∑𝑋𝑖²] ⁄ 𝜎 ~ 𝑁(0,1)

𝑍𝑏1 = (𝑏1 − 𝛽1) ⁄ 𝜎𝑏1 = (𝑏1 − 𝛽1) √[∑(𝑋𝑖 − 𝑋̅)²] ⁄ 𝜎 ~ 𝑁(0,1)

In practice 𝜎² is unknown, so we substitute its estimate 𝑠². We then have

𝑇𝑏0 = (𝑏0 − 𝛽0) ⁄ 𝑠𝑏0 = (𝑏0 − 𝛽0) √[𝑛∑(𝑋𝑖 − 𝑋̅)² ⁄ ∑𝑋𝑖²] ⁄ 𝑠 ~ 𝑡𝑛−2

𝑇𝑏1 = (𝑏1 − 𝛽1) ⁄ 𝑠𝑏1 = (𝑏1 − 𝛽1) √[∑(𝑋𝑖 − 𝑋̅)²] ⁄ 𝑠 ~ 𝑡𝑛−2

The degrees of freedom here are 𝑛 − 2 because we are dealing with the simple linear regression model, in which two parameters are estimated. We get the 𝑡 distribution because the numerator is a standard normal variable while the denominator involves the square root of an independent chi-square variable divided by its degrees of freedom. Recall that 𝑠² = ∑𝑒𝑖² ⁄ (𝑛 − 2) and that (𝑛 − 2)𝑠² ⁄ 𝜎² = ∑𝑒𝑖² ⁄ 𝜎² is distributed as chi-square with 𝑛 − 2 degrees of freedom.
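A sketch of the whole construction on simulated data (scipy assumed; the true 𝛽1 = 0.5, the sample size and the noise level are all illustrative): build 𝑠², the standard error of 𝑏1 and the t-ratio, then read off a two-sided p-value from the t distribution with 𝑛 − 2 degrees of freedom.

```python
# Minimal sketch: forming T_b1 = (b1 - beta1) / s_b1 and its p-value (assumed data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n = 25
X = rng.uniform(0, 10, size=n)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, size=n)

x = X - X.mean()
b1 = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
b0 = Y.mean() - b1 * X.mean()
e = Y - b0 - b1 * X

s2 = (e ** 2).sum() / (n - 2)                 # estimate of sigma^2
se_b1 = np.sqrt(s2 / (x ** 2).sum())          # standard error of b1

T = (b1 - 0.5) / se_b1                        # ~ t with n-2 df at the true beta1
print(T, 2 * stats.t.sf(abs(T), df=n - 2))    # t-ratio and two-sided p-value
```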

7. A few important probability distributions

We now look at a few important probability distributions which are used extensively for statistical inference on the OLS estimates.

(a) Normal Distribution

The normal distribution is the most widely used probability distribution, with its familiar bell-shaped curve. A normally distributed random variable 𝑋 has the following probability density function:

𝑓(𝑥) = [1 ⁄ (𝜎√(2𝜋))] exp(−(𝑥 − 𝜇)² ⁄ (2𝜎²)),  −∞ < 𝑥 < ∞

where 𝜇 and 𝜎² are the parameters of the distribution, representing the mean and the variance respectively. The normal distribution is shown in figure 2.

Properties of normal distribution:

1. The distribution is symmetrical around its mean value


2. The area under the normal curve that lies between the values of 𝜇 ± 𝜎 is
approximately 68%, between 𝜇 ± 2𝜎 is 95% and between 𝜇 ± 3𝜎 is about
99.7%.
3. We can obtain a standardized normal variable 𝑍 from a given normally distributed variable 𝑋 with mean 𝜇 and variance 𝜎² as follows:
𝑥−𝜇
𝑍=
𝜎

The standardized normal variable 𝑍 has zero mean and unit variance. Its probability density function is

𝑓(𝑍) = (1 ⁄ √(2𝜋)) exp(−𝑍² ⁄ 2)

We usually denote a normally distributed variable as


𝑋 ~ 𝑁(𝜇, 𝜎 2 )
Where ~ means ‘distributed as’, 𝑁 stands for normal distribution and 𝜇 𝑎𝑛𝑑 𝜎 2 are
the mean and variance of the distribution. So for standardized normal variable 𝑍 we have
𝑍 ~ 𝑁(0,1)
meaning that 𝑍 is normally distributed with zero mean and unit variance.

4. A linear combination of normally distributed variables is itself normally


distributed.

Let 𝑋1 ~ 𝑁(𝜇1 , 𝜎12 ) and 𝑋2 ~ 𝑁(𝜇2 , 𝜎22 ) be two independent variables. The Linear
Combination of both the variables can be written as follows:
𝑌 = 𝑎𝑋1 + 𝑏𝑋2 Where 𝑎 and 𝑏 are constants.
We can then show that

𝑌 ~ 𝑁[(𝑎𝜇1 + 𝑏𝜇2 ), (𝑎2 𝜎12 + 𝑏 2 𝜎22 )]

5. Central Limit Theorem:

Suppose 𝑥1 , 𝑥2 , … , 𝑥𝑛 are independent random variables, all having the same probability distribution with 𝐸(𝑥𝑖 ) = 𝜇 and 𝑣𝑎𝑟(𝑥𝑖 ) = 𝜎². Then the sample mean 𝑥̅ tends to the normal distribution with mean 𝜇 and variance 𝜎² ⁄ 𝑛 as the number of observations increases to infinity. In symbols,

𝑥̅ ~ 𝑁(𝜇, 𝜎² ⁄ 𝑛)  as 𝑛 → ∞

Consider the random variable

𝑧 = (𝑥̅ − 𝜇) ⁄ (𝜎 ⁄ √𝑛) = √𝑛(𝑥̅ − 𝜇) ⁄ 𝜎

This random variable tends towards the standard normal, so 𝑧 ~ 𝑁(0,1).


6. Test of Normality

A normal distribution will have Skewness =0 and Kurtosis =3. In other words, a normal
distribution is symmetric and mesokurtic. So to find out if a distribution departs from
normal distribution we need to check if computed values of skewness and kurtosis are
different from 0 and 3 respectively. The formal test can be done by the Jarque-Bera test
of normality which is as follows:

𝑆 2 (𝐾 − 3)2
𝐽𝐵 = 𝑛 [ + ]
6 24

Where S means Skewness and K means Kurtosis. Under the null hypothesis, JB is
distributed as a Chi-square statistic with 2 df.

7. The mean and variance of a random variable which follows the normal distribution are independent of each other.
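A sketch of the Jarque-Bera computation above on a simulated normal sample (scipy assumed; note that kurtosis must be requested in its non-excess form so that a normal sample gives K close to 3):

```python
# Minimal sketch: Jarque-Bera statistic and its chi-square(2) p-value (assumed data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
u = rng.normal(0, 1, size=500)

S = stats.skew(u)
K = stats.kurtosis(u, fisher=False)           # fisher=False: plain kurtosis, 3 if normal
n = len(u)
JB = n * (S ** 2 / 6 + (K - 3) ** 2 / 24)
print(JB, stats.chi2.sf(JB, df=2))            # small JB, large p: normality not rejected
```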

Example 1

Suppose that 𝑋 follows a normal distribution with mean of 6 and variance of 9. i.e
𝑋~𝑁(6,9)
Find the probability that 𝑋 will take a value between 𝑋1 = 3 and 𝑋2 = 9.

Solution:
We first calculate the value that 𝑍 takes which is as follows:
𝑋1 − 𝜇 3−6
𝑍1 = = = −1
𝜎 3
𝑋2 − 𝜇 9−6
𝑍2 = = = 1
𝜎 3
We use the standard normal table to find the probabilities for the two values of 𝑍. We have 𝑃𝑟(0 ≤ 𝑍 ≤ 1) = 0.3413, and by symmetry 𝑃𝑟(−1 ≤ 𝑍 ≤ 0) = 0.3413. Therefore the probability that 𝑋 lies between 𝑋1 = 3 and 𝑋2 = 9 is 0.3413 + 0.3413 = 0.6826.
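The same probability can be read off with scipy instead of the printed table (a sketch using the same N(6, 9) assumption, so the standard deviation is 3):

```python
# Minimal sketch verifying Example 1: P(3 <= X <= 9) for X ~ N(6, 9).
from scipy import stats

p = stats.norm.cdf(9, loc=6, scale=3) - stats.norm.cdf(3, loc=6, scale=3)
print(p)    # about 0.6827, matching the table-based answer 0.6826
```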

b. The Chi-Square ( 𝝌𝟐 ) Distribution

If 𝑍1 , 𝑍2 , … , 𝑍𝑘 are independent standardized normal variables (i.e., each 𝑍 ~ 𝑁(0,1)), then the quantity 𝑉 = 𝑍1² + 𝑍2² + 𝑍3² + … + 𝑍𝑘² is said to follow the chi-square distribution (𝜒𝑘²) with 𝑘 degrees of freedom. Here the degrees of freedom are the number of independent variables summed up to make the quantity 𝑉. The chi-square distribution is shown in figure 3.

Properties of Chi-Square distribution

1. The chi-square distribution is skewed, and the degree of skewness depends on the degrees of freedom. When the degrees of freedom are small, the distribution is highly skewed to the right; it becomes more symmetrical as the degrees of freedom increase. When the degrees of freedom exceed 100, the variable
√(2𝜒²) − √(2𝑘 − 1)
becomes a standardized normal variable, where 𝑘 is the degrees of freedom.
2. The Chi-Squared distribution has a mean 𝑘 and variance 2𝑘 where 𝑘 is the
degrees of freedom.
3. For two independent Chi-Squared variables 𝑍1 and 𝑍2 with 𝑘1 and 𝑘2 degrees of
freedom, their sum 𝑍1 + 𝑍2 is also a Chi-Square variable with degrees of freedom
equal to 𝑘1 + 𝑘2

Example 2
Find the probability of getting a Chi-square value of 32 or greater, given that the degree
of freedom is 14.

Solution:
Using the table from Chi-Square distribution we find that the probability of getting a Chi-
Square value of 31.3193 or greater for a degree of freedom of 14 is 0.005. So, the
probability of getting a chi-square value of 32 or greater is less than 0.005 which is a
small probability indeed.


(c) Student’s t Distribution

Suppose 𝑍1 is a standardized normal variable (i.e 𝑍1 ~ 𝑁(0,1)) and 𝑍2 is another variable


which follows Chi-Squared distribution with 𝑘 degrees of freedom. If both 𝑍1 𝑎𝑛𝑑 𝑍2 are
independent of each other, then the variable

𝑍1 𝑍1 √𝑘
𝑡= =
√(𝑍2 ⁄𝑘) √𝑍2
follows Student’s t distribution with 𝑘 degrees of freedom. It is usually denoted by 𝑡𝑘
where 𝑘 denotes the degrees of freedom. The t-distribution is shown in figure 4

Properties of t-distribution

1. The t-distribution is symmetrical and similar in shape to the normal distribution, but it is flatter (it has thicker tails) than the normal distribution. As the degrees of freedom increase, the t-distribution approaches the normal distribution.

2. The t-distribution has a mean of zero and variance 𝑘 ⁄ (𝑘 − 2) for 𝑘 > 2.

Example 3

When the degrees of freedom are 21, what is the probability of getting a 𝑡 value (a) of about 2 or greater, (b) of about −2 or smaller, (c) with |𝑡| of about 2 or greater, where |𝑡| denotes the absolute value?

Solution:
(a) The probability of getting a 𝑡 value of about 2 or greater when the degrees of freedom are 21 is about 0.025 (see the table for the 𝑡 distribution).
(b) Since the 𝑡 distribution is symmetric, the probability of getting a 𝑡 value of about −2 or smaller when the degrees of freedom are 21 is also about 0.025.
(c) The probability of getting a |𝑡| value of about 2 or greater when the degrees of freedom are 21 is about 0.05 (= 0.025 × 2).

(d) The F-Distribution


Suppose we have two independently distributed Chi-Squared variables namely 𝑍1 and 𝑍2
whose degrees of freedom are 𝑘1 and 𝑘2 respectively, then the variable

𝑍1 ⁄𝑘1
𝐹=
𝑍2 ⁄𝑘2

follows (Fisher’s) 𝐹 distribution with 𝑘1 and 𝑘2 degrees of freedom. It is usually denoted


by 𝐹𝑘1 ,𝑘2 where 𝑘1 denotes degrees of freedom of numerator and 𝑘2 denotes degrees of
freedom of denominator. The figure for F distribution is shown in figure 5

Properties of F distribution

1. The 𝐹 distribution, like the chi-square distribution, is skewed to the right. The 𝐹 distribution approaches the normal distribution as 𝑘1 and 𝑘2 get larger.

2. The 𝐹 distribution has mean 𝑘2 ⁄ (𝑘2 − 2) for 𝑘2 > 2 and variance

2𝑘2²(𝑘1 + 𝑘2 − 2) ⁄ [𝑘1(𝑘2 − 2)²(𝑘2 − 4)]  for 𝑘2 > 4

3. An 𝐹 distribution with 1 and 𝑘 degrees of freedom is equivalent to the square of a 𝑡 distribution with 𝑘 degrees of freedom. In other words,

𝑡𝑘² = 𝐹1,𝑘

4. There is a relationship between the 𝐹 and chi-square distributions when the denominator degrees of freedom of the 𝐹 distribution (𝑘2) are large enough:

𝑘1𝐹 ~ 𝜒𝑘1²

Note that all three distributions, namely the 𝑡, 𝐹 and chi-square distributions, are related to the normal distribution: they all approach the normal distribution as the degrees of freedom get larger and larger.

Example 4
1. Find the probability of getting an 𝐹 value of about 1.6 or greater when 𝑘1 = 10 and 𝑘2 = 8. (Answer: from the 𝐹 table, the probability is about 0.25.)
2. Give a numerical example to show that 𝑘1𝐹 ~ 𝜒𝑘1² when 𝑘2 is large.
Answer: Suppose 𝑘1 = 20 and 𝑘2 = 200. The critical value of 𝐹 at 10% is 1.46, so 𝑘1𝐹 = 20 × 1.46 = 29.2. From the chi-square table, the critical chi-square value at 10% for 20 degrees of freedom is 𝜒20²(10%) = 28.412, which is indeed close.
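The table lookups in Examples 2 to 4 can be checked with scipy's survival functions (sf gives the upper-tail probability, 1 − cdf); the commented values are approximate:

```python
# Minimal sketch verifying Examples 2-4 with exact tail probabilities.
from scipy import stats

print(stats.chi2.sf(32, df=14))           # Example 2: below 0.005, as argued
print(stats.t.sf(2, df=21))               # Example 3(a): about 0.03 (table: ~0.025)
print(2 * stats.t.sf(2, df=21))           # Example 3(c): the two-sided probability
print(stats.f.sf(1.6, dfn=10, dfd=8))     # Example 4: roughly 0.25
```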

(e)Bernoulli distribution

A random variable X follows Bernoulli distribution if its probability density function is


as follows:

𝑃(𝑋 = 0) = 1 − 𝑝

𝑃(𝑋 = 1) = 𝑝

Where 𝑝, 0 ≤ 𝑝 ≤ 1, is the probability of success of a particular event. For such a variable,

𝐸(𝑋) = [1 × 𝑝(𝑋 = 1) + 0 × 𝑝(𝑋 = 0)] = 𝑝

𝑉𝑎𝑟(𝑋) = 𝑝𝑞

where 𝑞 = (1 − 𝑝) is the probability of failure.

(f) The binomial distribution

Let 𝑛 represent the number of independent trials, each of which results in a 'success' with probability 𝑝 and a 'failure' with probability 𝑞 = (1 − 𝑝). The number of successes in the 𝑛 trials is represented by 𝑋. Then 𝑋 follows the binomial distribution, whose probability density function is

𝑓(𝑋) = C(𝑛, 𝑥) 𝑝^𝑥 (1 − 𝑝)^(𝑛−𝑥)

where 𝑥 is the number of successes in the 𝑛 trials and C(𝑛, 𝑥) denotes the binomial coefficient

C(𝑛, 𝑥) = 𝑛! ⁄ [𝑥! (𝑛 − 𝑥)!]

Where 𝑛! (reads as 𝑛 factorial) equals to 𝑛(𝑛 − 1)(𝑛 − 2) … … … 1

The binomial distribution has two parameters namely 𝑛 𝑎𝑛𝑑 𝑝. In binomial distribution
we have

𝐸(𝑋) = 𝑛𝑝

𝑉𝑎𝑟(𝑋) = 𝑛𝑝(1 − 𝑝) = 𝑛𝑝𝑞

Example 5

A new treatment for swine flu has a 20 percent probability of complete cure. On a trial basis, 35 patients were given the treatment. What is the probability that at least 10 patients will be cured completely?

Solution:
Let 𝑋 be the number of successes in 35 trials. We need to find 𝑃(𝑋 ≥ 10). From the binomial distribution table we find that 𝑃(𝑋 ≥ 10) = 0.1457.
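A sketch verifying the binomial calculation with scipy (sf(9) gives P(X > 9), which is the same as P(X ≥ 10)):

```python
# Minimal sketch verifying Example 5: P(X >= 10) for X ~ Binomial(35, 0.2).
from scipy import stats

print(stats.binom.sf(9, n=35, p=0.2))     # close to the quoted 0.1457
```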

(g) The Poisson distribution

A random variable X follows a Poisson distribution if its probability distribution function is

f(X) = e^(−λ) λˣ / x!   for x = 0, 1, 2, … , λ > 0

There is only one parameter, λ, in the Poisson distribution. One important property of the Poisson distribution is that its expected value is equal to its variance:

E(X) = Var(X) = λ

The Poisson model is used in modelling infrequent or rare phenomena, such as the number of accidents during festival seasons, the number of telephone calls during the next 10 minutes, or the number of deliveries in a maternity ward on a Sunday.
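A quick illustration, assuming Python with scipy; the arrival rate used below (3 calls per 10 minutes) is an invented figure, not from the module:

from scipy.stats import poisson

lam = 3                                     # assumed rate: 3 calls per 10 minutes
print(poisson.pmf(2, lam))                  # P(exactly 2 calls in the next 10 minutes)
print(poisson.mean(lam), poisson.var(lam))  # both equal lam: E(X) = Var(X) = 3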


8. Summary
1. We study the classical normal linear regression model (CNLRM) in this chapter.

2. The CNLRM is different from the classical linear regression model(CLRM).Under the
CNLRM,the disturbance term 𝑈𝑖 entering the regression model is assumed to be normally
distributed.

3. The CLRM, by contrast, does not require any assumption about the probability distribution of 𝑈𝑖. It only requires that the mean value of 𝑈𝑖 is zero and its variance is a finite constant.

4. The central limit theorem provides the theoretical justification for the normality assumption. According to the central limit theorem, if 𝑥1, 𝑥2, …, 𝑥𝑛 is a random sample with 𝐸(𝑥𝑖) = 𝜇 and 𝑣𝑎𝑟(𝑥𝑖) = 𝜎², then the distribution of the random variable 𝑍𝑛 = √𝑛(𝑥̄ − 𝜇)/𝜎 converges to the standard normal 𝑁(0, 1) as 𝑛 increases to infinity.

5. When we assume that the disturbance term is normally distributed, the OLS estimators are best unbiased estimators (BUE).

6. The OLS estimators under the normality assumption follow well-known probability
distributions. The OLS estimators of the intercept and slope are themselves normally
distributed and the OLS estimator of the variance of 𝑈𝑡 is related to the chi-square
distribution.

7. The normal distribution is the most popular probability distribution and has a bell-shaped curve. Other important distributions include the Chi-square distribution, Student's t distribution, the F distribution, the Bernoulli distribution, the binomial distribution and the Poisson distribution.


Subject BUSINESS ECONOMICS

Paper No and Title 8 , Fundamentals of Econometrics

Module No and Title 6, Hypothesis Testing: Test of significance approach

Module Tag
BSE_P8_M6


TABLE OF CONTENTS
1. Learning Outcomes
2. Introduction
3. What does this hypothesis mean and imply?
4. One-sided tests
5. Summary
6. Appendix


1. Learning Outcomes
After reading this module, the students will be able to:

 Understand the properties of hypothesis testing
 Identify its implications
 Assess fitted econometric models

2. Introduction

Hypothesis Testing
Fitting the regression line is only the first and a very small step in econometric analysis. In applied economics, we are generally interested in testing hypotheses about some hypothesized value of a population parameter. Say we have a random sample 𝑥1, 𝑥2, …, 𝑥𝑛 of a random variable X with PDF 𝑓(𝑥; 𝜇). Regression analysis helps us obtain an estimate of 𝜇, say 𝜇̂. Hypothesis testing means deciding whether 𝜇̂ is compatible with some hypothesized value 𝜇0. To set up a hypothesis test, we formally state the hypotheses as:
𝐻0 : 𝜇 = 𝜇0
𝐻1 : 𝜇 ≠ 𝜇0
Where 𝐻0 is called the null hypothesis and 𝐻1 is called the alternate hypothesis.

3. What does this hypothesis mean and imply?

Let us start with a random variable X, i.e. 𝑋 ~ 𝑁(𝜇, 𝜎²). Suppose we define our null hypothesis to be

𝐻0: 𝜇 = 𝜇0
𝐻1: 𝜇 ≠ 𝜇0

To proceed, the first step is to draw a sample from X and calculate its mean, X̄. The values of X̄ obtained from repeated samples will be normally distributed with mean 𝜇0 and variance 𝜎²/𝑛 if the null hypothesis is true. Such a distribution, with the null being true, is shown below:


If 𝜇̂ is a good estimator of 𝜇, then it will take a value close to 𝜇, i.e. 𝜇̂ − 𝜇 will be small. If the null hypothesis is true, then 𝜇̂ − 𝜇0 = (𝜇̂ − 𝜇) + (𝜇 − 𝜇0) should be small, as 𝜇̂ − 𝜇 is small and the second term is zero. If the alternate hypothesis is true, then 𝜇̂ − 𝜇0 = (𝜇̂ − 𝜇) + (𝜇 − 𝜇0) should be large, as 𝜇̂ − 𝜇 is small but 𝜇 − 𝜇0 ≠ 0. What counts as small or large depends upon the distribution of the estimator.

In the model set up above, we do not expect X̄ to be exactly equal to 𝜇0. There is no reason to deny the possibility, but the chances are rare. If the mean is far off from 𝜇0, there are two possibilities: either reject the null or do not reject the null. Either way, the decision may contain some element of error, i.e. rejecting a true null hypothesis (Type 1) or accepting a false null hypothesis (Type 2). There is no foolproof way of deciding.

If the probability of obtaining a mean as far from 𝜇0 as the one observed is less than the level of significance, we reject the null hypothesis; for example, if the probability is less than 5%, i.e. the mean lies in the upper or lower 2.5% tails (as shown in the figure). Thus, the probability of the mean being more than 1.96 standard deviations away is 5%.


Figure 2: Decision Rule

Source: Dougherty

The figure suggests that the null would be rejected if X̄ lies in the shaded area, i.e. if

X̄ > 𝜇0 + 1.96 sd(X̄)   or
X̄ < 𝜇0 − 1.96 sd(X̄)

A simple rearrangement leads us to

(X̄ − 𝜇0)/sd(X̄) > 1.96   or   (X̄ − 𝜇0)/sd(X̄) < −1.96

We can term the left-hand side the z statistic. Therefore, we reject the null hypothesis when |𝑧| > 1.96.
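A minimal sketch of this decision rule in Python (the sample figures below are invented for illustration; only the 1.96 cut-off comes from the text):

import math

n, x_bar, mu_0, sigma = 25, 52.0, 50.0, 5.0   # hypothetical sample and null value

z = (x_bar - mu_0) / (sigma / math.sqrt(n))   # the z statistic derived above
print(z, abs(z) > 1.96)                       # 2.0, True -> reject H0 at 5%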
As mentioned before, the decision rule established above is not flawless. Let us go back to our initial condition of 𝐻0 being true. There is a 5% probability that X̄ will lie far away from 𝜇0, in the rejection region. It also means that there is a 5% probability of a type 1 error, i.e. the probability of

rejecting a true null hypothesis is 5%.¹ The researcher has the liberty to set the risk of type 1 error and accordingly look up the critical value.
To sum up, there are three possible outcomes:

• Correct decision
• Rejecting a true hypothesis– Type I error.
• Accepting a false hypothesis– Type II error.

To be able to perform the test, we need additional knowledge of the sampling distributions of the estimators. It would be impossible to perform hypothesis testing without this knowledge, a prerequisite for which is the assumption that the error term is normally distributed. Given the following model,

𝑌𝑖 = 𝛼 + 𝛽𝑋𝑖 + 𝑈𝑖

this simply means that 𝑈𝑖 follows a normal distribution with mean zero and variance 𝜎², i.e. N(0, 𝜎²). The principal reason behind this assumption lies in the Central Limit Theorem (CLT).²

Referring back to the basics, the error term is defined as all those factors that affect Y but are not included in the model due to non-availability of data, omission, etc. Supposing these factors are random, U represents a sum of random variables. Hence, by applying the CLT we can impose the normality assumption on the error term. Thus, 𝑈 ~ 𝑁(0, 𝜎²).

From this, we derive the probability distributions of the estimators of 𝛼 and 𝛽. The estimators are in fact linear functions of U.³ Applying the property of the normal distribution that any linear function of a normally distributed variable is itself normally distributed, it can be deduced that

𝛼̂ ~ 𝑁(𝛼, 𝜎²𝛼̂)
𝛽̂ ~ 𝑁(𝛽, 𝜎²𝛽̂)

With this background, we now set the foundation for hypothesis testing. Suppose we wish to test something about 𝛽; as already mentioned, 𝛽̂ follows a normal distribution. By standardizing it, we get

𝑍 = (𝛽̂ − 𝛽)/se(𝛽̂) ~ 𝑁(0, 1)

Since 𝜎² is not known, replacing it by its sample estimator gives a new variable,

(𝛽̂ − 𝛽)/se(𝛽̂) ~ 𝑡(𝑛−2), where se(𝛽̂) is now computed using 𝜎̂²

Hence, we use the t distribution to test the null hypothesis.⁴ The calculated t has a different sampling distribution under the null than under the alternate hypothesis, being higher under the latter. A higher t would be consistent with the alternate hypothesis; thus, a higher t means rejection of the null hypothesis.

¹ A similar claim can be made for the 1% level of significance. If a coefficient is significant at 1%, it will be significant at 5% as well; the converse, however, may not be true.
² Gujarati and Porter (2010) define the CLT in the following way: "If there is a large number of independent and identically distributed random variables, then, with few exceptions, the distribution of their sum tends to be a normal distribution as the number of such variables increases indefinitely".
³ See Know More section 1.

The following examples intend to illustrate the theory above.

Example 1: Suppose we have data on income and expenditure for 10 households.

Table 1: Income (𝒙𝒕) and Expenditure (𝒚𝒕)

𝛽̂ = Σ(𝑥𝑡 − 𝑥̄)(𝑦𝑡 − 𝑦̄) / Σ(𝑥𝑡 − 𝑥̄)² = 0.45

𝛼̂ = 𝑦̄ − 𝛽̂𝑥̄ = 35.72

𝜎̂² = Σ𝑒̂𝑡² / (𝑛 − 2) = 98.98

var(𝛽̂) = 𝜎̂² / Σ(𝑥𝑡 − 𝑥̄)² = 0.0030

⁴ A small difference between 𝛽̂ and 𝛽 implies a small t value. If 𝛽̂ = 𝛽, t will be zero, implying the null hypothesis will not be rejected. Thus, as the difference between 𝛽̂ and 𝛽 increases, the likelihood of rejecting the null hypothesis increases.

After running through the necessary calculations, the next step is to test whether income and expenditure are related, i.e. the null hypothesis is as follows:

𝐻0: 𝛽 = 0
𝐻𝑎: 𝛽 ≠ 0

Test statistic: t = (𝛽̂ − 0)/se(𝛽̂) = 0.455/0.0548 = 8.30

If the level of significance is 5%, the next step is to look up the critical value at that level from the t table, i.e. 𝑡critical at 8 degrees of freedom = 2.306. The absolute value of t exceeds the critical value, which implies that we can reject the null hypothesis, i.e. 𝛽 is significantly different from zero. In this case the test is statistically significant; hence we can reject the null. It simply means that the probability that the difference between 𝛽̂ and zero is due to mere chance is less than the level of significance, so we can reject the null hypothesis. On the contrary, if the test is statistically insignificant, the probability that the difference between the estimator and the hypothesized value is due to chance exceeds the level of significance, and thus we cannot reject the null (Gujarati and Porter, 2010).
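The arithmetic of Example 1 can be reproduced as follows (a sketch assuming Python with scipy; the estimates are those computed above):

import math
from scipy import stats

beta_hat, var_beta, n = 0.455, 0.0030, 10     # estimates from Example 1

t = (beta_hat - 0) / math.sqrt(var_beta)      # test statistic, about 8.30
t_crit = stats.t.ppf(0.975, df=n - 2)         # 2.306: 5% two-sided, 8 d.f.
p_value = 2 * stats.t.sf(abs(t), df=n - 2)    # exact two-sided p value

print(round(t, 2), round(t_crit, 3), p_value) # |t| > t_crit, so reject H0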

Let us look at another example⁵ wherein hourly earnings are a function of years of schooling.

Example 2:

In a similar manner, let us test if the level of schooling affects earnings. Hence, the null
hypothesis is set as
𝐻0 : 𝛽 = 0
𝐻𝑎 : 𝛽 ≠ 0

⁵ Example taken from Christopher Dougherty, Introduction to Econometrics, 4th Edition.

The intention is to reject the null hypothesis, as only then can a relation between earnings and schooling be established. The t statistic can be easily calculated from the regression output:⁶

t = 2.45/0.23 = 10.65

𝑡cal > 𝑡critical (1.96)

The result decides in favour of rejecting the null hypothesis and concluding that earnings are significantly affected by the level of schooling. The column next to the t stat in the regression output is also useful for testing the significance of the coefficients. This P value, or probability value, is the exact probability of committing a Type 1 error under the null. The level of significance⁷ is the largest probability of making a Type 1 error that we are willing to tolerate, whereas the p value is the smallest level of significance at which the null could be rejected, given the t statistic. If the p value is smaller than the level of significance, the probability of making a type 1 error is very small and we can safely reject the null hypothesis.⁸ This method has an edge over the earlier one, as it tells us the exact probability of making a type 1 error.

In our example, the p value for the schooling coefficient is 0, i.e. the exact probability of making a type 1 error here is effectively zero, and thus the coefficient is significant at all levels.

Example 3: Suppose that in Example 1 we want to test whether the marginal propensity to consume is equal to 1. Accordingly, the hypotheses are set as

𝐻0: 𝛽 = 1
𝐻𝑎: 𝛽 < 1

t = (𝛽̂ − 1)/√var(𝛽̂) = (0.455 − 1)/√0.0030 = −9.96

𝑡critical at 8 degrees of freedom = −1.860

Since t = −9.96 < −1.860, we reject the null in favour of the alternate hypothesis.

4. One-sided tests

The above was an example of a one-sided test. The decision to perform one depends upon the question that we wish to answer. We shall see that one-sided tests reduce the risk of a type 1 error. One way of reducing this risk in a two-sided test is to perform the test at the 1% significance level as an alternative to 5%. Suppose now that instead of 𝜇 ≠ 𝜇0, the alternate hypothesis specifies some particular value 𝜇1; for simplicity, let 𝜇1 > 𝜇0. Therefore,

𝐻0: 𝜇 = 𝜇0
𝐻1: 𝜇 = 𝜇1

⁶ It can also be read directly from the t-stat column.
⁷ It is the area to the right of the critical value.
⁸ Please refer to the Know More section 2 for further understanding of Type 1 and Type 2 errors.

At the 5% level of significance, if X̄ lies in the upper or lower 2.5% tail, we reject the null. Given the assumption 𝜇1 > 𝜇0, X̄ should lie in the upper tail if the alternate hypothesis is true. If, on the contrary, it lies in the lower-tail rejection region, the two-sided test would suggest rejecting the null, although the probability of X̄ lying there should be negligible (given the assumption). Thus, in such a case it would be logical to reject 𝐻0 only when X̄ lies in the upper tail, as shown in figure 3 below.

We could also perform a 5% test by increasing the upper rejection region to that extent. Note that the alternative could take another form, 𝜇 > 𝜇0 or 𝜇 < 𝜇0. These are clearly one-sided tests, and it would be appropriate to consider the right and left rejection regions respectively. Hence, the principal reason for applying a one-sided test should rest solely on the relevant question, theory and economic sense, as in Example 3.

The procedure for a two-sided test is as follows:

1. Set the hypotheses: 𝐻0: 𝜇 = 𝜇0, 𝐻1: 𝜇 ≠ 𝜇0.
2. Calculate the test statistic, 𝑡 ~ 𝑡(𝑛−2).
3. Given the level of significance 'a', find the critical value under a/2 and across the degrees of freedom. Call it 𝑡*.


4. If |𝑡| > 𝑡*, reject the null hypothesis. This implies that 𝜇 is significantly different from 𝜇0.
5. The null would also be rejected if the P value < a.

The procedure for a one-sided test is as follows:

1. Set the hypotheses: 𝐻0: 𝜇 = 𝜇0, 𝐻1: 𝜇 > 𝜇0.
2. Calculate the test statistic, 𝑡 ~ 𝑡(𝑛−2).
3. Given the level of significance 'a', find the critical value under 'a' and across the degrees of freedom. Call it 𝑡*.
4. If 𝑡 > 𝑡*, reject the null hypothesis in favour of the alternative and conclude that 𝜇 is significantly greater than 𝜇0.
5. If the alternative was instead set as 𝐻1: 𝜇 < 𝜇0, then we would reject the null if 𝑡 < −𝑡*.
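The only operational difference between the two procedures is the critical value. A small sketch (Python with scipy assumed) contrasts them at a = 5% with 8 degrees of freedom, the setting of Examples 1 and 3:

from scipy import stats

df, a = 8, 0.05

t_two = stats.t.ppf(1 - a / 2, df)   # 2.306: two-sided rule is |t| > 2.306
t_one = stats.t.ppf(1 - a, df)       # 1.860: one-sided rule is t > 1.860
print(t_two, t_one)                  # (or t < -1.860 when H1: mu < mu0)

# Example 3 revisited: t = -9.96 with H1: beta < 1
print(-9.96 < -t_one)                # True, so the null is rejected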

Let us conclude this module by looking at an example⁹ using both two-sided and one-sided tests.

Example 4: In the given model, with a total of 20 observations,

𝑝̂ = 2 + 0.90𝑤

the standard error of 𝛼̂ is 0.10 and that of 𝛽̂ is 0.05. Suppose we want to test whether price inflation moves one-for-one with wage inflation, i.e. whether 𝛽 = 1. Thus, we set the hypotheses as 𝐻0: 𝛽 = 1 and 𝐻1: 𝛽 ≠ 1. This clearly is a two-sided test. Calculating the t statistic,

t = (0.90 − 1)/0.05 = −2

The critical value at the 5% level of significance with 18 degrees of freedom is 𝑡𝑐 = 2.1. Since |𝑡| < 𝑡𝑐, according to the rule we do not reject the null hypothesis. The estimated coefficient is less than the hypothesized value, but according to our test the difference is not significant: the two-sided test does not reject the null. Nevertheless, there could be a possibility that the rate of price inflation really is less than the rate of wage inflation. One plausible reason for this could be an increase in productivity, which would allow wages to rise faster than prices, i.e. keep the rate of price inflation below the rate of wage inflation.

To reiterate the difference between the two tests, let us perform a one-sided test on the above example. The new hypotheses become 𝐻0: 𝛽 = 1 and 𝐻1: 𝛽 < 1. With no other changes in the estimated model, the t statistic remains at −2. The critical value at the 5% level of significance with 18 degrees of freedom now becomes 𝑡𝑐 = 1.73. As 𝑡 = −2 < −1.73, we can reject the null hypothesis and conclude that the coefficient of wage inflation is less than one, i.e. the rate of price inflation is less than the rate of wage inflation. This result might tempt us to say that the one-sided test has an edge over the two-sided one. However, one should refrain from such statements: even though the possibility of 𝛽 > 1 can be excluded, the possibility of 𝛽 = 1 cannot. The statistical significance of a parameter depends fully on the t statistic, while its economic significance depends entirely on the magnitude of the regression coefficient and its sign. While drawing conclusions about the estimated model, one should be careful not to emphasize too much on

⁹ The example has been extracted from Christopher Dougherty, Introduction to Econometrics, 4th edition, chap 2, pg 135.

either. A variable can be economically significant but not statistically significant. There could also be a case where a variable is statistically significant but not very relevant to the estimated model. The researcher has to be careful to find a middle ground before presenting her concluding remarks.

5. Summary
 Hypothesis testing is a vital part of econometric analysis.
 The tests explained in this module are associated with the simple linear regression model and form the foundation for the more general multiple linear regression model.
 A significant result does not mean an important one; it refers only to statistical significance.
 Apart from having a sound knowledge of terminologies like type 1 error, type 2 error, and one-sided and two-sided tests, it is also important to know how to present the results.

6. Appendix
1. 𝛽̂ can be decomposed into two components, random and non-random, i.e. it can be expressed as a linear function of U. Following is the mathematical proof:

𝛽̂ = Σ(𝑋𝑖 − X̄)(𝑌𝑖 − Ȳ) / Σ(𝑋𝑖 − X̄)²

The numerator can be written as

Σ(𝑋𝑖 − X̄)(𝑌𝑖 − Ȳ) = Σ(𝑋𝑖 − X̄)([𝛼 + 𝛽𝑋𝑖 + 𝑢𝑖] − [𝛼 + 𝛽X̄ + ū])
= Σ(𝑋𝑖 − X̄)(𝛽[𝑋𝑖 − X̄] + [𝑢𝑖 − ū])
= 𝛽Σ(𝑋𝑖 − X̄)² + Σ(𝑋𝑖 − X̄)(𝑢𝑖 − ū)

Therefore,

𝛽̂ = [𝛽Σ(𝑋𝑖 − X̄)² + Σ(𝑋𝑖 − X̄)(𝑢𝑖 − ū)] / Σ(𝑋𝑖 − X̄)²
= 𝛽 + Σ(𝑋𝑖 − X̄)(𝑢𝑖 − ū) / Σ(𝑋𝑖 − X̄)²

Further, Σ(𝑋𝑖 − X̄)(𝑢𝑖 − ū) = Σ(𝑋𝑖 − X̄)𝑢𝑖 − Σ(𝑋𝑖 − X̄)ū. Taking the summation inside the brackets,

= Σ(𝑋𝑖 − X̄)𝑢𝑖 − ū(Σ𝑋𝑖 − 𝑛X̄)
= Σ(𝑋𝑖 − X̄)𝑢𝑖 − 0 = Σ(𝑋𝑖 − X̄)𝑢𝑖

Now,

𝛽̂ = 𝛽 + Σ(𝑋𝑖 − X̄)𝑢𝑖 / Σ(𝑋𝑖 − X̄)²
= 𝛽 + Σ𝑎𝑖𝑢𝑖, where 𝑎𝑖 = (𝑋𝑖 − X̄)/Σ(𝑋𝑖 − X̄)²

Hence, proved.
 Type 1 and Type 2 errors. Consider
𝐻0: 𝜇 = 𝜇0
𝐻1: 𝜇 = 𝜇1

 If we test at the 5% level of significance (two-sided test), the risk of a type 1 error is 5%. Suppose now that the null is false and the alternate hypothesis is true. In figure 2 above, if X̄ lies in the acceptance region, we do not reject the null; hence, we commit a type 2 error, i.e. accept a false null hypothesis. The total probability of committing a type 2 error is marked below:

Figure 4: Type 2 error

 Ideally, the power of a test should be high. The power of a test is defined as the probability of rejecting the null hypothesis when it is false; it equals 1 − Probability(type 2 error). Now, instead of 5%, suppose we apply a 1% test. As discussed before, reducing the level of significance is equivalent to reducing the probability of a type 1 error; consequently, the rejection region shrinks.


 As we can see from figure 5, reducing the probability of a type 1 error increases the probability of a type 2 error. Thus, a trade-off exists, which clearly shows that the decision cannot be foolproof.


Subject BUSINESS ECONOMICS

Paper No and Title 8 , Fundamentals of Econometrics

Module No and Title 7, Multiple Linear Regression Model

Module Tag
BSE_P8_M7


TABLE OF CONTENTS
1. Learning Outcomes
2. Introduction
3. Estimation of Regression Coefficient
4. Interpretation of Estimated Parameters
5. Classical Linear Assumptions Revisited
6. Variance of Estimators
7. Goodness of fit
8. Summary
9. Appendix


1. Learning Outcomes
After reading this module, the students will be able to:

 Understand the concept of multiple linear regression model and differentiate it from the
simple linear regression model
 Estimate the parameters using ordinary least squares with a different formula
 Interpret the regression coefficients
 Understand the difference between unadjusted and adjusted 𝑅 2

2. Introduction
Until now, we have been dealing with simple linear regression, where the dependent variable (Y) is related to only one explanatory variable (X). However, it is seldom the case that the variation in Y can be explained by a single explanatory variable alone; there will usually be other variables affecting Y. In the earnings-schooling example from Module 6, we saw that the level of schooling positively affects hourly earnings. Nevertheless, there are variables like parental education, work experience and age that also affect earnings. A multiple regression model is therefore generally preferred to a simple regression model. The analysis has to be carried out carefully, however, as there are dangers of including an irrelevant variable and of excluding a relevant one; either would lead to specification bias (a topic dealt with later in the course). A model where the dependent variable depends on several independent variables is known as a multiple regression model, i.e. a model with k parameters,¹ as below, where i indexes the observations:

𝑌𝑖 = 𝛽1 + 𝛽2𝑋𝑖2 + 𝛽3𝑋𝑖3 + ⋯ + 𝛽𝑘𝑋𝑖𝑘 + 𝑈𝑖 …………………(1)

Here 𝑌𝑖 is the dependent variable, 𝑋𝑖2, …, 𝑋𝑖𝑘 are the explanatory variables, 𝛽1 is the intercept, 𝛽2, …, 𝛽𝑘 are the slope parameters, and 𝑈𝑖 is the error term.

Certainly, there is no upper limit on the number of independent variables (although there is a cost, as mentioned above). There is a lower limit though: 𝑛 ≥ 𝑘, i.e. the number of observations must be at least equal to the number of parameters. It is now time to derive the regression coefficients; we shall deal with their interpretation afterwards.

¹ So, a total of k parameters: the intercept plus k − 1 slope coefficients.

3. Estimation of Regression Coefficient


For simplicity, let us take the case of two explanatory variables,

𝑌𝑖 = 𝛽1 + 𝛽2 𝑋𝑖2 + 𝛽3 𝑋𝑖3 + 𝑈𝑖 ……………………(2)

The estimated model,

𝑌̂𝑖 = 𝛽̂1 + 𝛽̂2 𝑋𝑖2 + 𝛽̂3 𝑋𝑖3…………………………....(3)

As in the simple regression model, the next step is to find the best linear estimators. Clearly, there
are three unknown parameters, 𝛽1 , 𝛽2 𝑎𝑛𝑑 𝛽3 . Applying OLS, the procedure begins with
minimizing the residual sum of squares,

𝑒𝑖 = 𝑌𝑖 − 𝑌̂𝑖 …………………………………….(4)

𝑅𝑆𝑆 = ∑𝑛𝑖=1 𝑒 2 𝑖 = ∑𝑛𝑖=1(𝑌𝑖 − 𝛽̂1 − 𝛽̂2 𝑋𝑖2 − 𝛽̂3 𝑋𝑖3 )2 ………………………(5)

The first-order conditions (FOC) are

∂RSS/∂𝛽̂1 = −2 Σ(𝑌𝑖 − 𝛽̂1 − 𝛽̂2𝑋𝑖2 − 𝛽̂3𝑋𝑖3) = 0 ...............(6)

∂RSS/∂𝛽̂2 = −2 Σ𝑋𝑖2(𝑌𝑖 − 𝛽̂1 − 𝛽̂2𝑋𝑖2 − 𝛽̂3𝑋𝑖3) = 0 ……(7)

∂RSS/∂𝛽̂3 = −2 Σ𝑋𝑖3(𝑌𝑖 − 𝛽̂1 − 𝛽̂2𝑋𝑖2 − 𝛽̂3𝑋𝑖3) = 0 …….(8)

There are three normal equations and three unknowns; therefore, we can easily derive the estimators. Equation 6 can be simplified as

Σ(𝑌𝑖 − 𝛽̂1 − 𝛽̂2𝑋𝑖2 − 𝛽̂3𝑋𝑖3) = 0 ……………….……(9)

Σ𝑌𝑖 − 𝑛𝛽̂1 − 𝛽̂2Σ𝑋𝑖2 − 𝛽̂3Σ𝑋𝑖3 = 0

𝑛Ȳ − 𝑛𝛽̂1 − 𝛽̂2𝑛X̄2 − 𝛽̂3𝑛X̄3 = 0

Dividing the equation throughout by n and rearranging,

Ȳ = 𝛽̂1 + 𝛽̂2X̄2 + 𝛽̂3X̄3

Therefore,

𝛽̂1 = 𝑌̅ − 𝛽̂2 𝑋̅2 − 𝛽̂3 𝑋̅3………………….……(10)

Equation 7 & 8 implies,

∑ 𝑋𝑖2 (𝑌𝑖 − 𝛽̂1 − 𝛽̂2 𝑋𝑖2 − 𝛽̂3 𝑋𝑖3 ) = 0………….(11)

∑ 𝑋𝑖3 (𝑌𝑖 − 𝛽̂1 − 𝛽̂2 𝑋𝑖2 − 𝛽̂3 𝑋𝑖3 ) = 0………….(12)

These are the two normal equations, and solving them directly could be tedious. For simplicity, we can transform the variables into deviations from their means and express them in lower-case letters. Taking the mean of equation 2, we get

Ȳ = 𝛽1 + 𝛽2X̄2 + 𝛽3X̄3 + Ū ……………………(13)

Subtracting (13) from (2) gives the model in deviation form,

𝑦𝑖 = 𝛽2𝑥𝑖2 + 𝛽3𝑥𝑖3 + 𝑒𝑖 ………………….……..(14)

Ignoring the i-subscript, (11) and (12) can be written as

Σ𝑥2(𝑦 − 𝛽̂2𝑥2 − 𝛽̂3𝑥3) = 0 ………….………...(15)

Σ𝑥3(𝑦 − 𝛽̂2𝑥2 − 𝛽̂3𝑥3) = 0 ……………………(16)

The two normal equations are

𝛽̂2Σ𝑥2² + 𝛽̂3Σ𝑥2𝑥3 = Σ𝑦𝑥2 ……………..….(17)

𝛽̂2Σ𝑥2𝑥3 + 𝛽̂3Σ𝑥3² = Σ𝑦𝑥3 ………………….(18)

Let 𝑠22 = Σ𝑥2², 𝑠23 = Σ𝑥2𝑥3, 𝑠33 = Σ𝑥3², 𝑠𝑦2 = Σ𝑦𝑥2 and 𝑠𝑦3 = Σ𝑦𝑥3. Then

𝛽̂2𝑠22 + 𝛽̂3𝑠23 = 𝑠𝑦2 …………………………….(19)

𝛽̂2𝑠23 + 𝛽̂3𝑠33 = 𝑠𝑦3 …………………………….(20)

The solution to these two normal equations is

𝛽̂2 = (𝑠𝑦2 − 𝛽̂3𝑠23)/𝑠22 ……………………..……(21)

Substituting this into equation 20,


[(𝑠𝑦2 − 𝛽̂3𝑠23)/𝑠22] 𝑠23 + 𝛽̂3𝑠33 = 𝑠𝑦3

𝛽̂3[𝑠33 − 𝑠23²/𝑠22] = 𝑠𝑦3 − 𝑠𝑦2𝑠23/𝑠22

𝛽̂3 = (𝑠22𝑠𝑦3 − 𝑠𝑦2𝑠23)/(𝑠22𝑠33 − 𝑠23²) ………………………….(22)

Substituting this into equation 21, we get

𝛽̂2 = (𝑠33𝑠𝑦2 − 𝑠𝑦3𝑠23)/(𝑠22𝑠33 − 𝑠23²) …………………….……(23)

In estimating the intercept, the procedure is the same as in the simple linear regression model: in a k-variable case,

𝛽̂1 = Ȳ − 𝛽̂2X̄2 − 𝛽̂3X̄3 − ⋯ − 𝛽̂𝑘X̄𝑘

The derivations of the slope parameters, however, become very complex as the number of variables grows, and the analysis is generally done using matrix algebra.²
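For the two-regressor case, equations (10), (22) and (23) can be implemented directly. The sketch below assumes Python with numpy; the data are made up purely for illustration:

import numpy as np

# Invented sample of n = 8 observations on y, x2 and x3
y  = np.array([10., 12., 15., 14., 18., 20., 22., 21.])
x2 = np.array([ 1.,  2.,  3.,  3.,  5.,  6.,  7.,  7.])
x3 = np.array([ 2.,  1.,  4.,  3.,  5.,  4.,  6.,  7.])

# Deviation form, as in equation (14)
yd, x2d, x3d = y - y.mean(), x2 - x2.mean(), x3 - x3.mean()

s22, s33, s23 = (x2d**2).sum(), (x3d**2).sum(), (x2d * x3d).sum()
sy2, sy3 = (yd * x2d).sum(), (yd * x3d).sum()

d  = s22 * s33 - s23**2                           # common denominator
b2 = (s33 * sy2 - sy3 * s23) / d                  # equation (23)
b3 = (s22 * sy3 - sy2 * s23) / d                  # equation (22)
b1 = y.mean() - b2 * x2.mean() - b3 * x3.mean()   # equation (10)
print(b1, b2, b3)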

4. Interpretation of Estimated Parameters

In our two-regressor case, 𝛽1 is the intercept: it is the mean value of Y when the explanatory variables are zero. As mentioned earlier, 𝛽2 and 𝛽3 are slope coefficients, as in the simple regression model. However, there is a slight change in terminology: they are now called partial slope coefficients. 𝛽2 represents the average change in Y due to a unit change in 𝑋2, keeping the value of 𝑋3 constant. Hence the name partial: we are interested in the effect of 𝑋2 on Y, and by separating out the effect of 𝑋3, one can find the change in Y attributable to the change in 𝑋2 only. Likewise for 𝛽3. Let us illustrate with an example below:

𝑌𝑖 = 10 + 6𝑋𝑖2 + 2𝑋𝑖3

𝛽2 here is 6. Suppose we hold 𝑋3 constant at 5. We get,

𝑌𝑖 = 10 + 6𝑋𝑖2 + 2(5)

= 10 + 6𝑋𝑖2 + 10

² Please refer to the Know More section for matrix estimation of the slope parameters.

= 20 + 6𝑋𝑖2

𝛽2 = 6 implies that the average value of Y increases by 6 with a unit increase in 𝑋2, holding 𝑋3 constant. The interpretation remains the same irrespective of the value at which 𝑋3 is held constant: that value does not affect the slope parameter, as it is absorbed into the intercept. Now, holding 𝑋2 constant at 2, we get

𝑌𝑖 = 10 + 6(2) + 2𝑋𝑖3

= 10 + 12 + 2𝑋𝑖3 = 22 + 2𝑋𝑖3

𝛽3 = 2 implies that the average value of Y increases by 2 with a unit increase in 𝑋3, holding 𝑋2 constant. Therefore, multiple regression allows us to isolate the effect of each explanatory variable on the dependent variable. Let us continue with Example 2 of Module 6, except that we include an extra explanatory variable, experience. The estimated model³ becomes:

Example 1:

𝐸𝐴𝑅𝑁𝐼𝑁𝐺𝑆̂ = −26.49 + 2.68𝑆 + 0.56𝐸𝑋𝑃

The intercept is negative, implying that when the level of schooling and experience are zero, the average hourly earnings of an individual would be negative. This does not make economic sense and can safely be ignored.⁴ The coefficient of schooling suggests that with an extra year of schooling, the average hourly wage increases by $2.68, keeping experience constant. Similarly, the coefficient of experience suggests that the average hourly wage increases by $0.56 with an additional year of experience, keeping schooling constant.

Example 2⁵:

𝑃𝑅𝐼𝐶𝐸̂ = 130 + 0.24𝑆𝐹𝑇 − 20𝐵𝐸𝐷𝑅 − 14𝐵𝐴𝑇𝐻𝑅

Let us look at another estimated example, where we want to understand what influences property prices (in dollars). We have three explanatory variables: SFT (square feet), BEDR (number of bedrooms) and BATHR (number of bathrooms). Before attempting to interpret, one should look at the signs of the coefficients. Notice that the signs of both BEDR and BATHR are negative, contrary to our expectations: taken at face value, an additional bedroom/bathroom lowers the property price by $20/$14, when an extra room should increase the price. This emphasizes the importance of holding other variables constant, or other things being equal. The interpretation of the coefficients is meaningful only with this qualification. Therefore, keeping other things equal, an extra bathroom leads to a fall in the average price of the property by $14. This is a reasonable result, since keeping SFT and BEDR the

³ Example from Christopher Dougherty.
⁴ Sometimes a meaningless intercept is suggestive of model misspecification.
⁵ Example from Ramu Ramanathan, chapter 4, pg 146.

same, an extra bathroom means splitting up the given area to make room for it, hence smaller rooms commanding a lower price.

5. Classical Linear Assumptions Revisited

The properties of the estimators, i.e. unbiasedness, efficiency and consistency, are contingent on the classical linear assumptions. In this section we revisit the assumptions, now in relation to the multiple linear model.

Assumption 1: The model is linear in parameters

𝑌𝑖 = 𝛽1 + 𝛽2𝑋𝑖2 + 𝛽3𝑋𝑖3 + ⋯ + 𝛽𝑘𝑋𝑖𝑘 + 𝑈𝑖

This is the same as in simple linear regression analysis. In the above model, we assume that the sample contains n random observations. It is important to assume that none of the explanatory variables is correlated with the error term, especially if the Xs are random (the Xs can also be given, i.e. fixed in repeated samples). If such correlation is present, we encounter what is known as the endogeneity problem, wherein the estimates of the regression coefficients are biased. This leads us to our next assumption.

Assumption 2: 𝑪(𝑿, 𝑼) = 𝟎

This is for the case where the Xs are non-random. A simple proof can help comprehension:

C(X, U)⁶ = E(XU) − E(X)E(U) = XE(U) − XE(U) = 0

Assumption 3: The error term has zero conditional mean

𝐸(𝑢|𝑋) = 0

Even though the assumption states that the expectation of the error term given X is zero, the law of iterated expectations⁷ implies that the unconditional mean is zero as well, i.e. 𝐸(𝑢) = 0. This assumption helps in interpreting the deterministic part of the model.

Assumption 4: No perfect multicollinearity

Multicollinearity⁸ is defined as a linear relation between any two explanatory variables. This is the new addition compared with simple linear regression. For an estimator to be BLUE, it is required that there exist no perfect linear relationship between any two explanatory variables. If at all there is one such relationship, it becomes difficult to estimate the population parameters and to separate the partial effect of an explanatory variable on the dependent variable.

⁶ The expectation of a constant is the constant itself: E(a) = a.
⁷ Wooldridge, "Introductory Econometrics: A Modern Approach", 4th edition, Glossary, pg 841.

Assumptions 1-4 are sufficient conditions for the estimated parameters to be unbiased. By now we know that regression coefficients can be decomposed into two components,

𝛽̂ = 𝛽 + Σ𝑎𝑖𝑢𝑖, where 𝑎𝑖 = (𝑋𝑖 − X̄)/Σ(𝑋𝑖 − X̄)² ……………(24)

This is the case with only one explanatory variable. The second term in equation (24) becomes complicated in the multiple linear model, i.e. with more than one explanatory variable. However, matrix algebra helps us solve for and prove the properties of the estimated parameters.⁹ The property of unbiasedness is not affected even in the presence of multicollinearity; any linear relationship among the explanatory variables affects only the precision.

Assumption 5: Homoscedasticity

The error terms all have the same variance, i.e. 𝑣𝑎𝑟(𝑢) = 𝜎².

Assumption 6: No serial correlation

No two error terms are related, i.e. 𝑐𝑜𝑣(𝑢𝑖, 𝑢𝑗) = 0, 𝑖 ≠ 𝑗.

Assumption 7: The error term is normally distributed, i.e. 𝒖 ~ 𝑵(𝟎, 𝝈²)

This is to facilitate hypothesis testing; we shall revisit it in the next module.

Assumptions 1-6 ensure that the OLS estimators are unbiased, efficient and consistent, i.e. they are best linear unbiased estimators, or BLUE. The Gauss-Markov theorem states that among all linear unbiased estimators, the OLS estimators are the most efficient, i.e. have minimum variance. Before proving the theorem, let us find the variance and standard error of the parameters.

6. Variance of the Estimators

Recall that 𝛽̂2 = (𝑠33𝑠𝑦2 − 𝑠𝑦3𝑠23)/(𝑠22𝑠33 − 𝑠23²).

To find the variances of the estimators we need the homoscedasticity assumption, as it simplifies the derivation. We do not show the proof here and simply state the formulas as in Gujarati & Porter; refer to the Know More section for a derivation using matrix algebra. The formulas are as below:

var(𝛽̂1) = [1/𝑛 + (X̄2²Σ𝑥𝑖3² + X̄3²Σ𝑥𝑖2² − 2X̄2X̄3Σ𝑥𝑖2𝑥𝑖3) / (Σ𝑥𝑖2²Σ𝑥𝑖3² − (Σ𝑥𝑖2𝑥𝑖3)²)] 𝜎𝑢²

var(𝛽̂2) = [Σ𝑥𝑖3² / (Σ𝑥𝑖2²Σ𝑥𝑖3² − (Σ𝑥𝑖2𝑥𝑖3)²)] 𝜎𝑢²

var(𝛽̂3) = [Σ𝑥𝑖2² / (Σ𝑥𝑖2²Σ𝑥𝑖3² − (Σ𝑥𝑖2𝑥𝑖3)²)] 𝜎𝑢²

To compute these variances we need the variance of the error term, 𝜎𝑢². Its estimate is

𝜎̂𝑢² = Σ𝑒𝑖² / (𝑛 − 𝑘), where 𝑒𝑖 is the residual, n = number of observations and k = number of parameters (3 in this case).

⁸ This topic is dealt with in detail in later modules.
⁹ For proofs, please refer to the Know More section.
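These formulas are straightforward to apply; the sketch below (Python with numpy assumed, same invented data as the earlier sketch) computes the error variance and the standard errors of the slope estimates:

import numpy as np

# Same invented data as before; n = 8 observations, k = 3 parameters
y  = np.array([10., 12., 15., 14., 18., 20., 22., 21.])
x2 = np.array([ 1.,  2.,  3.,  3.,  5.,  6.,  7.,  7.])
x3 = np.array([ 2.,  1.,  4.,  3.,  5.,  4.,  6.,  7.])
yd, x2d, x3d = y - y.mean(), x2 - x2.mean(), x3 - x3.mean()

s22, s33, s23 = (x2d**2).sum(), (x3d**2).sum(), (x2d * x3d).sum()
d = s22 * s33 - s23**2
b2 = (s33 * (yd * x2d).sum() - (yd * x3d).sum() * s23) / d
b3 = (s22 * (yd * x3d).sum() - (yd * x2d).sum() * s23) / d

e = yd - b2 * x2d - b3 * x3d            # residuals (deviation form)
n, k = len(y), 3
sigma2_hat = (e**2).sum() / (n - k)     # estimate of the error variance
se_b2 = np.sqrt(sigma2_hat * s33 / d)   # from the var(beta2_hat) formula
se_b3 = np.sqrt(sigma2_hat * s22 / d)
print(sigma2_hat, se_b2, se_b3)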

7. Goodness of Fit
In the simple regression model, 𝑅² was defined as the proportion of the variation in the dependent variable explained by the variation in the independent variable, i.e. 𝑅² = ESS/TSS, where ESS = explained sum of squares and TSS = total sum of squares. In the multiple-regression case, ESS captures the variation explained by all the independent variables taken together. Thus, we can apply the concept directly to multiple linear regression; it is now called the Multiple Coefficient of Determination.

It can be shown that,¹⁰ starting from

𝑌𝑖 = 𝛽1 + 𝛽2𝑋𝑖2 + 𝛽3𝑋𝑖3 + 𝑈𝑖

and using the deviation form of equation 14 above,

𝑦𝑖 = 𝛽2𝑥𝑖2 + 𝛽3𝑥𝑖3 + 𝑒𝑖, so that 𝑒𝑖 = 𝑦𝑖 − 𝛽2𝑥𝑖2 − 𝛽3𝑥𝑖3

Σ𝑒𝑖² = Σ𝑒𝑖𝑒𝑖
= Σ𝑒𝑖(𝑦𝑖 − 𝛽2𝑥𝑖2 − 𝛽3𝑥𝑖3)
= Σ𝑒𝑖𝑦𝑖 − 𝛽2Σ𝑒𝑖𝑥𝑖2 − 𝛽3Σ𝑒𝑖𝑥𝑖3
= Σ𝑒𝑖𝑦𝑖

since Σ𝑒𝑖𝑥𝑖 = 0. Indeed, Σ𝑒𝑖𝑥𝑖 = Σ𝑥𝑖(𝑦𝑖 − 𝛽1 − 𝛽2𝑥𝑖) = Σ𝑥𝑖𝑦𝑖 − 𝛽1Σ𝑥𝑖 − 𝛽2Σ𝑥𝑖² = 0, which is just the normal equation derived in the simple regression model. Thus,

Σ𝑒𝑖² = Σ(𝑦𝑖 − 𝛽2𝑥𝑖2 − 𝛽3𝑥𝑖3)𝑦𝑖
= Σ𝑦𝑖² − (𝛽2Σ𝑦𝑖𝑥𝑖2 + 𝛽3Σ𝑦𝑖𝑥𝑖3)
= TSS − ESS

Therefore,

RSS = TSS − ESS

TSS = ESS + RSS, and 𝑅² = ESS/TSS, or equivalently 1 − RSS/TSS.

¹⁰ The derivation is taken from Gujarati & Porter, Essentials of Econometrics, chap 4, pg 129.

Let us illustrate the above with an example using the Gretl software.¹¹ We start with a simple regression model where we suppose that education depends only on aptitude. Below is the regression output:

Model 1: OLS, using observations 1-540


Dependent variable: EDUC

Coefficient Std. Error t-ratio p-value


const 6.06622 0.467226 12.9835 <0.00001 ***
APTITUDE 0.148084 0.00894305 16.5586 <0.00001 ***

Mean dependent var 13.67222 S.D. dependent var 2.438476


Sum squared resid 2123.013 S.E. of regression 1.986484
R-squared 0.337590 Adjusted R-squared 0.336359
F(1, 538) 274.1859 P-value(F) 4.51e-50
Log-likelihood -1135.863 Akaike criterion 2275.726
Schwarz criterion 2284.309 Hannan-Quinn 2279.083

The results show that a one-point increase in aptitude increases the education level by about 0.15 years. The p value of the coefficient is very small, implying that the coefficient is statistically significant. 𝑅² is about 0.34, which means that roughly a third of the variation in education is explained by a student's aptitude. The number is quite low, which indicates that some important variables are missing from the model. Those variables could be both observable and unobservable. Now, let us add another variable that could explain the dependent variable.

¹¹ Gretl is a freely downloadable software package.

Model 2: OLS, using observations 1-540


Dependent variable: EDUC

Coefficient Std. Error t-ratio p-value


const 5.42073 0.493022 10.9949 <0.00001 ***
APTITUDE 0.132807 0.00973893 13.6367 <0.00001 ***
MOTHEREDUC 0.123507 0.0330837 3.7332 0.00021 ***

Mean dependent var 13.67222 S.D. dependent var 2.438476


Sum squared resid 2069.309 S.E. of regression 1.963023
R-squared 0.354347 Adjusted R-squared 0.351942
F(2, 537) 147.3578 P-value(F) 9.66e-52
Log-likelihood -1128.945 Akaike criterion 2263.890
Schwarz criterion 2276.765 Hannan-Quinn 2268.925

Mother's education also has a role to play in influencing an individual's level of education, as is clearly seen in the results above. Adding the variable has also made a difference to the overall fit: 𝑅² has increased to 0.35, i.e. 35% of the variation in the education variable is explained jointly by aptitude and mother's education. Adding yet another variable would further influence the overall fit. In Model 3 we add father's education. Although the increase in 𝑅² is not large, it is enough to show that as we keep adding explanatory variables, the measured fit improves: the addition of any variable, whether or not it is relevant to the model, will never decrease 𝑅² and will typically increase it.

Model 3: OLS, using observations 1-540


Dependent variable: EDUC

Coefficient Std. Error t-ratio p-value


const 5.37063 0.488216 11.0005 <0.00001 ***
APTITUDE 0.125709 0.00985334 12.7580 <0.00001 ***
MOTHEREDUC 0.0492425 0.0390901 1.2597 0.20832
FATHEREDUC 0.107683 0.0309522 3.4790 0.00054 ***

Mean dependent var 13.67222 S.D. dependent var 2.438476


Sum squared resid 2023.614 S.E. of regression 1.943038
R-squared 0.368604 Adjusted R-squared 0.365070
F(3, 536) 104.3042 P-value(F) 3.41e-53
Log-likelihood -1122.916 Akaike criterion 2253.832
Schwarz criterion 2270.998 Hannan-Quinn 2260.546

Although an increased 𝑅² is indicative of a better-fitting model, chasing it in this way is not desirable. To discourage the practice, in multiple linear regression a different measure is used for assessing goodness of fit, called the Adjusted 𝑅². This gives a more accurate picture as far


as the variability of the model is concerned. The measure adjusts for the loss in degrees of freedom that occurs by including an extra variable. The formula is as follows:

R̄² = 1 − [RSS/(𝑛 − 𝑘)] / [TSS/(𝑛 − 1)]

= 1 − [(𝑛 − 1)/(𝑛 − 𝑘)](1 − 𝑅²)

Note that:

1. R̄² ≤ 𝑅². When 𝑅² = 1, R̄² = 1; this follows directly from the formula. And since 𝑘¹² > 1, (𝑛 − 𝑘) < (𝑛 − 1), so R̄² < 𝑅² whenever 𝑅² < 1. Therefore R̄² ≤ 𝑅² always.
2. Even though 𝑅² can never be negative, R̄² can fall below zero.
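The adjustment is simple to compute by hand. A short sketch in plain Python (here k counts all estimated parameters including the intercept, which is how the Gretl figures above work out):

def adjusted_r2(r2, n, k):
    # R-bar-squared: penalizes R2 for the degrees of freedom used up
    return 1 - (n - 1) / (n - k) * (1 - r2)

# Reproducing the Gretl outputs above (540 observations in each model)
print(adjusted_r2(0.337590, 540, 2))   # Model 1: about 0.336359
print(adjusted_r2(0.354347, 540, 3))   # Model 2: about 0.351942
print(adjusted_r2(0.368604, 540, 4))   # Model 3: about 0.365070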

A high R̄² in no way implies statistical significance of the coefficients. Hypothesis testing of the regression coefficients is probably the most important part of econometric analysis, without which the results would be ineffectual. We shall deal with this in detail in the next module.

8. Summary
 This module extends the simple linear regression model to more than one explanatory variable.
 It shows how the regression coefficients are estimated in the general case and how their interpretation differs from the simple linear model.
 The module concludes with the concept of goodness of fit and the adjusted 𝑅².
 The multiple linear model is also the natural setting for studying the reasons for, and consequences of, violations of the classical linear assumptions.

9. Appendix

1. Multivariate model using matrix analysis

Estimating the slope parameters becomes easy with matrix algebra. The k-variable model can be written as a stacked system: y is the n × 1 vector of observations (𝑦1, …, 𝑦𝑛)′ on the dependent variable, x is the n × k matrix whose first column is a column of ones and whose remaining columns hold the observations on the regressors, β = (𝛽1, …, 𝛽𝑘)′, and u = (𝑢1, …, 𝑢𝑛)′ is the vector of disturbances.

¹² Here k is the total number of estimated parameters, including the intercept; this matches the degrees of freedom used in the Gretl outputs above.

That is,

𝑦 = 𝑥𝛽 + 𝑢

The OLS method gives those parameters that minimize the residual sum of squares,

RSS = Σ𝑒𝑡², where 𝑒 = 𝑦 − 𝑥𝛽̂

In matrix algebra,

RSS = 𝑒′𝑒
= (𝑦 − 𝑥𝛽̂)′(𝑦 − 𝑥𝛽̂)
= (𝑦′ − 𝛽̂′𝑥′)(𝑦 − 𝑥𝛽̂)
= 𝑦′𝑦 − 𝑦′𝑥𝛽̂ − 𝛽̂′𝑥′𝑦 + 𝛽̂′𝑥′𝑥𝛽̂
= 𝑦′𝑦 − 2𝑦′𝑥𝛽̂ + 𝛽̂′𝑥′𝑥𝛽̂, since 𝛽̂′𝑥′𝑦 is a scalar

The first-order condition is dRSS/d𝛽̂ = 0. Two results will be handy for the calculation:

I. d(𝐴𝑏)/d𝑏 = 𝐴′    II. d(𝑏′𝐴𝑏)/d𝑏 = 2𝐴𝑏

Therefore,

dRSS/d𝛽̂ = −2𝑥′𝑦 + 2𝑥′𝑥𝛽̂ = 0

𝑥′𝑥𝛽̂ = 𝑥′𝑦

𝛽̂ = (𝑥′𝑥)⁻¹𝑥′𝑦

Therefore, 𝑦̂ = 𝑥𝛽̂ is the fitted value.
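The formula 𝛽̂ = (𝑥′𝑥)⁻¹𝑥′𝑦 is easy to verify numerically. A minimal sketch, assuming Python with numpy and entirely made-up data:

import numpy as np

rng = np.random.default_rng(0)
n = 50
# x: column of ones (intercept) plus two made-up regressors
x = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta = np.array([1.0, 2.0, -0.5])
y = x @ beta + rng.normal(scale=0.1, size=n)

beta_hat = np.linalg.solve(x.T @ x, x.T @ y)   # solves (x'x) b = x'y
print(beta_hat)                                # close to [1.0, 2.0, -0.5]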
Properties of the estimated parameters

a) Unbiasedness. From above,

𝛽̂ = (𝑥′𝑥)⁻¹𝑥′𝑦
= (𝑥′𝑥)⁻¹𝑥′(𝑥𝛽 + 𝑢)
= (𝑥′𝑥)⁻¹𝑥′𝑥𝛽 + (𝑥′𝑥)⁻¹𝑥′𝑢
= 𝛽 + (𝑥′𝑥)⁻¹𝑥′𝑢

Taking expectations,

𝐸(𝛽̂) = 𝐸(𝛽) + 𝐸((𝑥′𝑥)⁻¹𝑥′𝑢)
= 𝛽 + (𝑥′𝑥)⁻¹𝑥′𝐸(𝑢)
= 𝛽, using assumption 3. Hence, proved.

b) Variance

𝑉(𝛽̂) = 𝐸[(𝛽̂ − 𝛽)(𝛽̂ − 𝛽)′]
= 𝐸[(𝑥′𝑥)⁻¹𝑥′𝑢((𝑥′𝑥)⁻¹𝑥′𝑢)′]
= 𝐸[(𝑥′𝑥)⁻¹𝑥′𝑢𝑢′𝑥(𝑥′𝑥)⁻¹]
= (𝑥′𝑥)⁻¹𝑥′𝐸(𝑢𝑢′)𝑥(𝑥′𝑥)⁻¹
= (𝑥′𝑥)⁻¹𝑥′(𝜎²𝐼)𝑥(𝑥′𝑥)⁻¹
= 𝜎²(𝑥′𝑥)⁻¹𝑥′𝑥(𝑥′𝑥)⁻¹
= 𝜎²(𝑥′𝑥)⁻¹


As mentioned in the text above, to determine the variance we need to estimate the error variance, i.e. 𝑉(𝛽̂) = 𝜎̂²(𝑥′𝑥)⁻¹.

c) 𝜎̂²

RSS = 𝑒′𝑒 [as in part (a)]

𝑒 = 𝑦 − 𝑥𝛽̂
= 𝑥𝛽 + 𝑢 − 𝑥𝛽̂
= 𝑥𝛽 + 𝑢 − 𝑥(𝛽 + (𝑥′𝑥)⁻¹𝑥′𝑢) [from part (a)]
= 𝑢 − 𝑥(𝑥′𝑥)⁻¹𝑥′𝑢
= [𝐼𝑛 − 𝑥(𝑥′𝑥)⁻¹𝑥′]𝑢
= 𝑀𝑢, where 𝑀 = 𝐼𝑛 − 𝑥(𝑥′𝑥)⁻¹𝑥′

Now,

𝑒′𝑒 = (𝑀𝑢)′𝑀𝑢 = 𝑢′𝑀′𝑀𝑢 = 𝑢′𝑀𝑢, since M is a symmetric idempotent matrix¹³

Taking the trace and using its cyclical property¹⁴,

Tr(𝑢′𝑀𝑢) = Tr(𝑀𝑢𝑢′)

Taking expectations, 𝐸(Tr(𝑀𝑢𝑢′)) = Tr(𝑀𝐸(𝑢𝑢′)) = Tr(𝑀𝜎²𝐼) = 𝜎²Tr(𝑀)

Next, substituting for M,

Tr(𝑀) = Tr(𝐼𝑛 − 𝑥(𝑥′𝑥)⁻¹𝑥′)
= Tr(𝐼𝑛) − Tr(𝑥(𝑥′𝑥)⁻¹𝑥′)
= Tr(𝐼𝑛) − Tr(𝐼𝑘) = 𝑛 − 𝑘

Thus,

𝐸(𝑒′𝑒) = 𝜎²(𝑛 − 𝑘), so that 𝜎̂² = (𝑒′𝑒)/(𝑛 − 𝑘) is an unbiased estimator of 𝜎².

¹³ An idempotent matrix is a square matrix which fulfils the condition M·M = M.
¹⁴ The trace of a matrix remains constant under cyclical permutations, e.g. Tr(XYZ) = Tr(YZX). However, Tr(XYZ) ≠ Tr(YXZ) in general.

2. Gauss-Markov Theorem

We say that 𝛽̂ is BLUE when, among the class of unbiased estimators that are linear functions of Y, the OLS estimator 𝛽̂ has the minimum variance. Here we prove this using matrix algebra.

Let there be another estimator 𝛽̌ = 𝐾𝑌, a linear function of Y, with K fixed. Then

𝐸(𝛽̌) = 𝐸(𝐾𝑌) = 𝐾𝐸(𝑌) = 𝐾𝐸(𝑋𝛽 + 𝑈) = 𝐾𝑋𝛽 + 𝐾𝐸(𝑈) = 𝐾𝑋𝛽

For 𝛽̌ to be unbiased, we need 𝐾𝑋 = 𝐼, a restriction we must impose. Supposing it is unbiased, then for 𝛽̌ to be the best linear estimator, it needs to have the lowest variance. Since 𝛽̌ = 𝐾(𝑋𝛽 + 𝑈) = 𝐾𝑋𝛽 + 𝐾𝑈 = 𝛽 + 𝐾𝑈, we have 𝛽̌ − 𝛽 = 𝐾𝑈, and

Var(𝛽̌) = 𝐸[(𝛽̌ − 𝛽)(𝛽̌ − 𝛽)′] = 𝐸[𝐾𝑈𝑈′𝐾′] = 𝐾𝐸[𝑈𝑈′]𝐾′ = 𝜎²𝐾𝐾′

By comparing Var(𝛽̂) and Var(𝛽̌), we can show that 𝛽̂ is indeed the best linear unbiased estimator. Proving the theorem requires some simple mathematical manipulations. The matrix K can be written as

𝐾 = 𝐾 + (𝑋′𝑋)⁻¹𝑋′ − (𝑋′𝑋)⁻¹𝑋′ = 𝑀 + (𝑋′𝑋)⁻¹𝑋′, where 𝑀 = 𝐾 − (𝑋′𝑋)⁻¹𝑋′

𝑀𝑋 = 𝐾𝑋 − (𝑋′𝑋)⁻¹𝑋′𝑋 = 𝐼 − 𝐼 = 0

Now,

𝜎²𝐾𝐾′ = 𝜎²[(𝑀 + (𝑋′𝑋)⁻¹𝑋′)(𝑀 + (𝑋′𝑋)⁻¹𝑋′)′]
= 𝜎²[(𝑀 + (𝑋′𝑋)⁻¹𝑋′)(𝑀′ + 𝑋(𝑋′𝑋)⁻¹)]
= 𝜎²[𝑀𝑀′ + 𝑀𝑋(𝑋′𝑋)⁻¹ + (𝑋′𝑋)⁻¹𝑋′𝑀′ + (𝑋′𝑋)⁻¹𝑋′𝑋(𝑋′𝑋)⁻¹]

The second term in the above expression is zero, as 𝑀𝑋 = 0; 𝑋′𝑀′ is also zero, as 𝑋′𝑀′ = (𝑀𝑋)′. Thus the second and third terms vanish, and the variance reduces to

𝜎²𝐾𝐾′ = 𝜎²𝑀𝑀′ + 𝜎²(𝑋′𝑋)⁻¹

Clearly, Var(𝛽̌) ≥ Var(𝛽̂). This proves our theorem.


Subject BUSINESS ECONOMICS

Paper No and Title 8: Fundamentals of Econometrics

Module No and Title 8: MRM: Inference Problem

Module Tag BSE_P8_M8


TABLE OF CONTENTS
1. Learning outcomes
2. Introduction
3. Hypothesis testing
4. Hypothesis testing: Individual regression model
5. Testing overall significance of sample regression
6. Summary


1. Learning Outcomes
After studying this module, you shall be able to

 Know about the term Hypothesis testing


 Learn how to conduct hypothesis testing in multiple regression
 Identify significance of beta coefficients using hypothesis testing
 Evaluate the individual and group significance of independent variables
 Analyze the results of testing.

2. Introduction

Hypothesis testing allows us to carry out inferences about population parameters using data from a sample. In order to test a hypothesis in statistics, we must perform three steps: 1) formulate a null hypothesis and an alternative hypothesis on the population parameters; 2) build a statistic to test the hypothesis made; 3) define a decision rule to reject or not to reject the null hypothesis.

3. Hypothesis testing

Before establishing how to formulate the null and alternative hypotheses, let us distinguish between simple and composite hypotheses. Hypotheses stated through one or more equalities are called simple hypotheses; they are called composite when formulated using the operators "inequality", "greater than" or "smaller than". It is very important to remark that hypothesis testing is always about population parameters. Hypothesis testing implies making a decision, on the basis of sample data, on whether to reject the claim that certain restrictions are satisfied by the basic assumed model. The restrictions we are going to test are known as the null hypothesis, denoted by H0; thus, the null hypothesis is a statement about population parameters. The alternative hypothesis, denoted by H1, will be our conclusion if the experimental test indicates that H0 is false.

4. Hypothesis testing: Individual regression coefficients

If we invoke the assumption that ui ∼ N(0, σ²), then we can use the t test to test a hypothesis about any individual partial regression coefficient. To illustrate the mechanics, consider a child mortality regression of Y (child mortality) on X2 (female literacy rate) and X3 (per capita GNP, PGNP). Let us postulate that

H0: β3 = 0 and H1: β3 ≠ 0

The null hypothesis states that, with X2 held constant, X3 has no (linear) influence on Y. To test the null hypothesis, we use the t


test. If the computed t value exceeds the critical t value at the chosen level of significance, we may reject the null hypothesis; otherwise, we may not reject it.

One does not have to assume a particular value of α to conduct hypothesis testing. One can simply use the p value, i.e. the exact level of significance: if the null hypothesis were true, the p value gives the probability of obtaining a t value as large as, or larger (in absolute terms) than, the one computed, and this may be much smaller than an artificially adopted α such as 5%.
This example also provides an opportunity to decide between a one-tail and a two-tail t test. Since, a priori, child mortality and per capita GNP are expected to be negatively related, we should use the one-tail test. That is, our null and alternative hypotheses should be:

H0: β3 = 0 and H1: β3 < 0
We can reject the null hypothesis on the basis of the one-tail t test in the present instance. There is an intimate connection between hypothesis testing and confidence interval estimation. For our example, the 95% confidence interval for β3 is

β̂3 − tα/2 se(β̂3) ≤ β3 ≤ β̂3 + tα/2 se(β̂3)

that is, an interval that includes the true β3 coefficient with a 95% confidence coefficient. Thus, if 100 samples of size 64 are selected and 100 confidence intervals are constructed, we expect 95 of them to contain the true population parameter β3. If such an interval does not include the null-hypothesized value of zero, we can reject the null hypothesis that the true β3 is zero at the 5% level.
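The interval is mechanical to construct once the estimate and its standard error are known. In the sketch below (Python with scipy assumed), the coefficient, standard error and degrees of freedom are hypothetical stand-ins, since the regression output itself is not reproduced in this module:

from scipy import stats

b3_hat, se_b3, df = -0.0056, 0.0020, 61   # hypothetical estimate, se and d.f.

t_half = stats.t.ppf(0.975, df)           # t_{alpha/2} for a 95% interval
lo, hi = b3_hat - t_half * se_b3, b3_hat + t_half * se_b3
print(lo, hi)   # if zero lies outside (lo, hi), reject H0: beta3 = 0 at 5%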

Thus, whether we use the t test of significance or confidence interval estimation, we reach the same conclusion. This should not be surprising in view of the close connection between confidence interval estimation and hypothesis testing. Following the procedure just described, we can test hypotheses about the other regression coefficients.
Before moving on, remember that the t-testing procedure is based on the assumption that the error term ui follows the normal distribution. Although we cannot directly observe ui,

we can observe their proxy, the ûi, that is, the residuals. The claim of normality is usually made on the basis of the Central Limit Theorem (CLT), but this is restrictive in some cases; that is to say, normality cannot always be assumed. In any application, whether normality of u can be assumed is really an empirical matter. It is often the case that a transformation, e.g. taking logs, yields a distribution that is closer to normality, which is easy to handle from a mathematical point of view. Large samples allow us to drop normality without affecting the results too much.

5. Testing Overall Significance of Sample Regression


The test for significance of regression determines whether a linear relationship exists between the response variable y and a subset of the regressor variables x1, x2, …, xk. The appropriate hypotheses are H0: β1 = β2 = … = βk = 0 and H1: βj ≠ 0 for at least one j.
Rejection of H0 implies that at least one of the regressor variables x1, x2, …, xk contributes significantly to the model. The test for significance of regression is a generalization of the procedure used in simple linear regression. The total sum of squares SST is partitioned into a sum of squares due to regression and a sum of squares due to error. Recall the identity

SST = SSR + SSE

Now, if H0: β1 = β2 = … = βk = 0 is true, SSR/σ² is a chi-square random variable with k degrees of freedom. Note that the number of degrees of freedom for this chi-square random variable equals the number of regressor variables in the model. We can also show that SSE/σ² is a chi-square random variable with n − p degrees of freedom, where p = k + 1 is the number of parameters including the intercept, and that SSE and SSR are independent. The test statistic for H0: β1 = β2 = … = βk = 0 is

F0 = (SSR/k) / (SSE/(n − p)) = MSR/MSE

We should reject H0 if the computed value of the test statistic, F0, is greater than fα,k,n−p. The procedure is usually summarized in an analysis of variance (ANOVA) table.
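A sketch of the computation (Python with scipy assumed; the sums of squares and sample sizes below are invented, as the ANOVA table itself is not reproduced here):

from scipy import stats

SSR, SSE = 250.0, 150.0      # made-up regression and error sums of squares
n, k = 64, 3                 # observations and number of regressors
p = k + 1                    # parameters, including the intercept

F0 = (SSR / k) / (SSE / (n - p))             # MSR / MSE
f_crit = stats.f.ppf(0.95, dfn=k, dfd=n - p) # 5% critical value
p_val = stats.f.sf(F0, dfn=k, dfd=n - p)
print(F0, f_crit, p_val)     # reject H0 if F0 > f_crit (p_val < 0.05)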


A computing formula for SSE follows from the identity SSE = SST − SSR. In matrix
notation, SST = y′y − (Σyi)²/n and SSR = β̂′X′y − (Σyi)²/n; substituting these into
the identity, we obtain

SSE = y′y − β̂′X′y
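To make the mechanics concrete, here is a minimal Python sketch (ours, using simulated data, not the module's) that computes the overall F statistic from the sums of squares:

```python
# Overall significance F test on simulated data with k = 2 regressors.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k = 64, 2
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = 1.0 + 0.5 * X[:, 1] + 0.8 * X[:, 2] + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS estimates
y_hat = X @ beta_hat
SST = np.sum((y - y.mean()) ** 2)    # total sum of squares
SSE = np.sum((y - y_hat) ** 2)       # error (residual) sum of squares
SSR = SST - SSE                      # regression sum of squares

p = k + 1                            # number of estimated coefficients
F0 = (SSR / k) / (SSE / (n - p))     # MSR / MSE
p_value = stats.f.sf(F0, k, n - p)   # Pr(F > F0) under H0
print(F0, p_value)                   # reject H0 if F0 > F(alpha, k, n - p)
```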

Relation between F and t statistics

So far we have seen how to use the F statistic to test several restrictions jointly, but
it can also be used to test a single restriction. In that case, we can choose between the F
statistic and the t statistic to carry out a two-tail test; the conclusions are
exactly the same. But what is the relationship between an F with one degree of freedom
in the numerator (testing a single restriction) and a t? It can be shown that

F(1, n − k) = t²(n − k)

that is, the square of a t statistic with n − k degrees of freedom is an F statistic with
(1, n − k) degrees of freedom, so the tail area of the F corresponds to the two tail areas
of the t. Hence, the two approaches lead to exactly the same outcome,
provided that the alternative hypothesis is two-sided. However, the t statistic is more
flexible for testing a single hypothesis, because it can also be used against one-tail
alternatives.

Moreover, since t statistics are also easier to obtain than F statistics, there is no
compelling reason to use an F statistic to test a hypothesis with a single restriction.
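A quick numerical check of the relation F(1, df) = t²(df), using standard critical values (the degrees of freedom below are chosen only for illustration):

```python
# The 5% critical value of F(1, df) equals the square of the two-tail
# 5% critical value of t(df).
from scipy import stats

df = 40
t_crit = stats.t.ppf(0.975, df)    # two-tail 5% t critical value
f_crit = stats.f.ppf(0.95, 1, df)  # 5% F critical value with (1, df) df
print(t_crit ** 2, f_crit)         # both are approximately 4.08
```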

The normality of the OLS estimators depends crucially on the normality assumption about
the disturbances. What happens if the disturbances are not normally distributed? Under
the Gauss-Markov assumptions, the OLS estimators are asymptotically normally distributed,
i.e., approximately normal in large samples. If the disturbances are not normal, the t
statistic has only an approximate t distribution rather than an exact one; but, as the
Student's t table shows, for a sample of about 60 observations the critical points are
practically equal to those of the standard normal distribution. Similarly, if the
disturbances are not normal, the F statistic has only an approximate F distribution,
valid when the sample size is large enough and the Gauss-Markov assumptions hold.
Therefore, we can use the F statistic to test linear restrictions in linear models as an
approximate test. There are other asymptotic tests (the likelihood ratio, Lagrange
multiplier, and Wald tests), based on the likelihood function, that can be used to test
linear restrictions when the disturbances are non-normal. These three can also be applied
when (a) the restrictions are nonlinear and (b) the model is nonlinear in the parameters.
For nonlinear restrictions, in linear and nonlinear models, the most widely used test is
the Wald test. For testing the assumptions of the model (for example, homoscedasticity
and no autocorrelation), the Lagrange multiplier (LM) test is usually applied. In
applying the LM test, an auxiliary regression is often run. The term auxiliary regression
means that the coefficients themselves are not of direct interest: only the R² is
retained. In an auxiliary regression the regressand is usually the residuals (or
functions of the residuals) from the OLS estimation of the original model, while the
regressors are often the regressors of the original model (and/or functions of them).
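As an illustration of this auxiliary-regression idea, here is a minimal sketch of a Breusch-Pagan-type LM test for homoscedasticity on simulated data (the data and variable names are ours, not the module's):

```python
# LM test via an auxiliary regression: squared residuals on the regressors.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(1, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(scale=x / 5)    # heteroscedastic errors

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ b                              # residuals from the original model

# Auxiliary regression of the squared residuals on the original regressors.
z = u_hat ** 2
g, *_ = np.linalg.lstsq(X, z, rcond=None)
z_hat = X @ g
R2_aux = 1 - np.sum((z - z_hat) ** 2) / np.sum((z - z.mean()) ** 2)

LM = n * R2_aux                                # only the R^2 is retained
p_value = stats.chi2.sf(LM, df=1)              # df = regressors excl. constant
print(LM, p_value)                             # small p-value: reject homoscedasticity
```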

Example:

The Bush Company is engaged in the sale and distribution of gifts imported from the
Near East. The most popular item in the catalog is the Guantanamo bracelet, which is said
to have relaxing properties. The sales agents receive a commission of 30% of total sales.
In order to increase sales without expanding the sales network, the company established
special incentives for those agents who exceeded a sales target during the last year.
Advertising spots were broadcast on radio in different areas to strengthen the promotion
of sales, with special emphasis on the well-being that comes from wearing a Guantanamo
bracelet. The manager of the Bush Company wonders whether a dollar spent on special
incentives has a greater effect on sales than a dollar spent on advertising. To answer
that question, the company's econometrician proposes the following model to explain sales:

Sales = β1 + β2 advert + β3 incent +u

where incent denotes incentives to the salesmen and advert denotes expenditure on
advertising. The variables sales, incent, and advert are all expressed in thousands of
dollars. Using a sample of 18 sales areas (work file advance), we obtained the regression
output and the covariance matrix of the coefficients shown below.

[Table: standard output of the regression (not reproduced)]

[Table: covariance matrix of the coefficient estimates (not reproduced)]

In this model, the coefficient 2 indicates the increase in sales produced by a dollar
increase in spending on advertising, while3 indicates the increase caused by a dollar
increase in the special incentives, holding fixed in both cases the other regressor. To
answer the question posed in this example, the null and the alternative hypothesis are the
following:

The t statistic is built using information about the covariance matrix of the estimators:

For =0.10, we find that t15 0.10  1.341. As t<1.341we do not reject H0 for =0.10, nor
for =0.05 or =0.01. Therefore, there is no empirical evidence that a dollar spent on
special incentives has a higher incidence on sales than a dollar spent on advertising.
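Since the regression table and covariance matrix are not reproduced above, the sketch below uses hypothetical numbers, chosen only to be consistent with the stated conclusion (t < 1.341), to show how the t statistic is assembled from the covariance matrix:

```python
# Hypothetical estimates and covariance entries (NOT the module's actual output).
import math
from scipy import stats

b2, b3 = 0.50, 0.65             # assumed estimates for advert and incent
var_b2, var_b3 = 0.010, 0.015   # assumed diagonal entries of the covariance matrix
cov_b2_b3 = 0.004               # assumed off-diagonal entry

se_diff = math.sqrt(var_b3 + var_b2 - 2 * cov_b2_b3)
t = (b3 - b2) / se_diff
t_crit = stats.t.ppf(0.90, df=15)   # one-tail 10% critical value, n - k = 18 - 3
print(t, t_crit)                    # reject H0 only if t > t_crit
```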


6. Summary

 This chapter extended and refined the ideas of interval estimation and hypothesis
testing first introduced in Chapter 5 in the context of the two-variable linear
regression model.
 In a multiple regression, testing the individual significance of a partial regression
coefficient (using the t test) and testing the overall significance of the regression
(i.e. H0: all partial slope coefficients are zero or R2 = 0) are not the same thing.
 In particular, the finding that one or more partial regression coefficients are
statistically insignificant on the basis of the individual t test does not mean that all
partial regression coefficients are also (collectively) statistically insignificant. The
latter hypothesis can be tested only by the F test.
 The F test is versatile: it can test a variety of hypotheses, such as whether
(1) an individual regression coefficient is statistically significant, (2) all partial
slope coefficients are zero, (3) two or more coefficients are statistically equal, (4)
the coefficients satisfy some linear restrictions, and (5) the regression model is
structurally stable.
 As in the two-variable case, the multiple regression model can be used for the
purpose of mean and/or individual prediction.

Subject BUSINESS ECONOMICS

Paper No and Title 8, Fundamentals of Econometrics

Module No and Title 9: Functional Forms of Regression Models

Module Tag BSE_P8_M9


TABLE OF CONTENTS
1. Learning outcomes
2. Introduction
3. Theory of functional forms
4. The Log-linear model
5. Semi-log models
5.1 Log-lin Model
5.2 Lin-log Model
6. Reciprocal Model
7. Illustrative Examples
8. Choice of functional form
9. Summary


1. Learning Outcomes
After studying this module, you shall be able to

 Know about the term functional form


 Learn how to use functional forms in different models
 Identify the interpretations of coefficients and their significance
 Evaluate the model which best suits a given problem
 Analyze whether the outcomes are consistent with real life situations.

2. Introduction
Through the use of multiplicative interaction terms, we can assess how slopes vary,
conditional on the value of some other covariate. In this sense, we can estimate more
sophisticated linear relationships among our variables. However, one thing that we have not
yet considered is how to incorporate nonlinear relationships within a linear
regression model. Why might we want to do this? The reasons are simple. First,
given the desirable properties of OLS, it would be nice to use our data while staying
within the framework of the linear model. Second, if a relationship is nonlinear, then a
transformation of X may produce a better-fitting regression model than one in which the
relationship between X and Y is forced to be linear. Despite its name, the classical linear
regression model is not limited to a linear relationship between the dependent and the
independent variables. In this module we are concerned primarily with models that are linear in
the parameters; they may or may not be linear in the variables.

3. Theory of functional forms


Consider a vector xi = (xi1 xi2 ... xiK) of K variables for each observation i. The L functions
f1(xi), f2(xi)... fL(xi) map the K-dimensional vector xi into L scalars zi1, zi2, ..., ziL. The function
g(yi) is a univariate function of the dependent variable. The non-linear econometric model

g(yi) = β0 + β1f1(xi) + β2f2(xi) + · · · + βLfL(xi) + ui

can therefore be written as

g (yi) = β0 + β1zi1 + β2zi2 + · · · + βLziL + ui

The latter is the usual multiple linear regression model with L + 1 regressors, as long as all
necessary assumptions about the error term and the transformed independent variables
zi = (1, zi1, zi2, ..., ziL) are satisfied. All properties of OLS are therefore preserved. The
vector approach shows that we can extend the one-equation analysis to many equations in
n-dimensional space and solve for the beta coefficients. Returning to the single-equation
regression model, we consider below some of the other functional forms.
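A minimal sketch of this idea, with transformations of our own choosing (log and square), showing that OLS applies unchanged to the transformed regressors:

```python
# Nonlinear transformations f_l(x) enter a model that is still linear in the
# parameters, so ordinary least squares applies unchanged.
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = rng.uniform(0.5, 5.0, size=n)
y = 1.0 + 2.0 * np.log(x) + 0.5 * x**2 + rng.normal(size=n)

# z_1 = log(x) and z_2 = x^2 are the transformed regressors.
Z = np.column_stack([np.ones(n), np.log(x), x**2])
beta_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)
print(beta_hat)   # estimates of (beta0, beta1, beta2)
```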


In the sections that follow we consider some commonly used regression models that may be
nonlinear in the variables but are linear in the parameters or that can be made so by suitable
transformations of the variables. In particular, we discuss the following regression models:
1. The log-linear model
2. Semi-log models
3. Reciprocal models
4. The logarithmic reciprocal model

4. The Log-Linear Model


Consider the following model, known as the exponential regression model:

Yi = β1 Xi^β2 e^ui

which may be expressed alternatively as

ln Yi = α + β2 ln Xi + ui

where α = ln β1 and ln denotes the natural log (log to the base e, where e = 2.718...).

This model is linear in the parameters α and β2 and linear in the logarithms of the
variables Y and X, and it can be estimated by OLS regression. Because of this linearity,
such models are called log-log, double-log, or log-linear models. If the assumptions of
the classical linear regression model are fulfilled, the parameters can be estimated by
the OLS method by letting

Y*i = α + β2 X*i + ui,  where Y*i = ln Yi and X*i = ln Xi.


The OLS estimators α̂ and β̂2 so obtained are the best linear unbiased estimators of α
and β2, respectively.

The slope coefficient of the log-log model, β2, measures the elasticity of Y with respect to X,
that is, the percentage change in Y for a given (small) percentage change in X. Thus, if Y
represents the quantity of a commodity demanded and X its unit price, β2 measures the price
elasticity of demand, a parameter of considerable economic interest.
Two special features of the log-linear model are worth noting. First, the model assumes that
the elasticity coefficient between Y and X, β2, remains constant throughout. Second, although
α̂ and β̂2 are unbiased estimators of α and β2, β1 (the parameter entering the original
model), when estimated as β̂1 = antilog(α̂), is itself a biased estimator.
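As an illustration, the following sketch simulates a constant-elasticity demand relationship and recovers the elasticity by OLS on the logs (the data are simulated by us; the true elasticity is set to -1.2):

```python
# Log-log regression: the slope on log price is the price elasticity of demand.
import numpy as np

rng = np.random.default_rng(3)
n = 150
price = rng.uniform(1, 20, size=n)
quantity = 50 * price ** (-1.2) * np.exp(rng.normal(scale=0.1, size=n))

X = np.column_stack([np.ones(n), np.log(price)])
beta_hat, *_ = np.linalg.lstsq(X, np.log(quantity), rcond=None)
print(beta_hat[1])   # approximately -1.2: the price elasticity of demand
```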

5. Semilog Models: Log-lin and Lin-log Models

5.1 The Log–Lin Model

Consider the model

ln Yt = β1 + β2 t + ut

This is like any other linear regression model in that it is linear in the parameters β1
and β2. The only difference is that the regressand is the logarithm of Y and the regressor
is "time," which takes the values 1, 2, 3, etc. Models like this are called semi-log models
because only one variable (in this case the regressand) appears in logarithmic form. A model
in which the regressand is logarithmic is called a log-lin model; later we will consider a
model in which the regressand is linear but the regressor(s) are logarithmic, called a
lin-log model. Before presenting regression results, let us examine the properties of this
model. It can be derived from the compound growth formula:

Yt = Y0 (1 + r)^t
ln Yt = ln Y0 + t ln(1 + r)

Letting β1 = ln Y0 and β2 = ln(1 + r), we obtain

ln Yt = β1 + β2 t + ut

so that β2 measures the proportional or relative change in Y for a given absolute change
in the value of the regressor (here the variable t):

β2 = (relative change in regressand) / (absolute change in regressor)


If we multiply the relative change in Y by 100, we get the percentage change, or the
growth rate, in Y for an absolute change in X, the regressor. That is, 100 times β2 gives
the growth rate in Y; 100 times β2 is known in the literature as the semi-elasticity of Y
with respect to X.

Instantaneous versus compound rate of growth. The coefficient of the trend variable in
the growth model, β2, gives the instantaneous (at a point in time) rate of growth, not the
compound (over a period of time) rate of growth. The latter is easily found by taking the
antilog of the estimated β2, subtracting 1, and multiplying the difference by 100. For an
illustrative example, suppose the estimated slope coefficient is 0.00743. Then
[antilog(0.00743) − 1] = 0.00746, or 0.746 percent. Thus, in this example, the compound
rate of growth is about 0.746 percent per quarter, which is slightly higher than the
instantaneous growth rate of 0.743 percent; the difference is due to the compounding effect.
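The conversion from the instantaneous to the compound growth rate, using the module's illustrative slope of 0.00743:

```python
# Instantaneous vs. compound growth rate from a log-lin trend regression.
import math

b2 = 0.00743
instantaneous = 100 * b2              # about 0.743 percent per quarter
compound = 100 * (math.exp(b2) - 1)   # antilog(b2) - 1, about 0.746 percent
print(instantaneous, compound)
```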

5.2 Lin-log Model

Instead of estimating the growth model, researchers sometimes estimate the following model:

Yt = β1 + β2 t + ut

That is, instead of regressing the log of Y on time, they regress Y on time, where Y is the
regressand under consideration. Such a model is called a linear trend model, and the time
variable t is known as the trend variable. If the slope coefficient is positive, there is an
upward trend in Y, whereas if it is negative, there is a downward trend. Unlike the growth
model just discussed, in which we were interested in the percentage growth in Y for an
absolute change in X, suppose we now want to find the absolute change in Y for a percentage
change in X. A model that can accomplish this purpose can be written as:

Yi = β1 + β2 ln Xi + ui

For descriptive purposes we call such a model a lin-log model. Let us interpret the slope
coefficient β2. As usual,

β2 = (change in Y) / (change in ln X) = (change in Y) / (relative change in X)


If X changes by 0.01 unit (i.e., by 1 percent), the absolute change in Y is 0.01(β2); if in an
application one finds that β2 = 500, the absolute change in Y is (0.01)(500) = 5.0. Therefore,
when such a regression is estimated by OLS, do not forget to multiply the estimated slope
coefficient by 0.01 or, what amounts to the same thing, divide it by 100. If you do not keep
this in mind, your interpretation will be highly misleading. The practical question is: when
is a lin-log model useful? An interesting application is found in so-called Engel expenditure
functions. Engel found that food expenditure is an increasing function of income, but that the
food budget share decreases with income, which explains the nonlinearity. Goods with income
elasticity below 1 are necessities, and goods with income elasticity above 1 are luxuries;
Engel found that food is a necessity. For example, an estimated coefficient of 0.37 from a
regression of log food expenditure on log income suggests that a 1% rise in income generates
a 0.37% rise in food expenditure, so a 10% rise in income generates a 3.7% rise in food
expenditure (and hence a fall in the budget share of food). The food income elasticity is
indeed between 0 and 1, so food is a necessity.
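A minimal sketch of a lin-log Engel-type regression on simulated data (the numbers are ours), including the divide-by-100 interpretation just discussed:

```python
# Lin-log model: food expenditure on log income (simulated data).
import numpy as np

rng = np.random.default_rng(4)
n = 120
income = rng.uniform(1_000, 20_000, size=n)
food = 50 + 300 * np.log(income) + rng.normal(scale=40, size=n)

X = np.column_stack([np.ones(n), np.log(income)])
b, *_ = np.linalg.lstsq(X, food, rcond=None)
# A 1% rise in income raises food spending by about 0.01 * b[1] units.
print(b[1], 0.01 * b[1])
```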

6. Reciprocal Models
So far we have considered models written in the linear form

Y = b0 + b1X + u

which implies a straight-line relationship between Y and X. Sometimes economic theory and/or
inspection of the data will suggest that the relationship between variables is not linear.
One way to model a nonlinear relationship is the equation

Y = a + b/X + e

(where the line asymptotes to the value a as X increases to infinity).


However, Y = a + b/X + e does not trace out a straight line between Y and X, and OLS
(which fits a straight line so as to minimize the RSS) applies only if we can somehow make
the relationship linear. In this case we do so by creating a variable equal to the
reciprocal of X, 1/X, so that the relationship between Y and 1/X is linear (a straight line):

Y = a + b/X + e becomes Y = a + b(1/X) + e

which is now "linear in parameters"; i.e., we regress Y on 1/X rather than Y on X.

The only thing we now need to be careful about is how to interpret the coefficients from
this specification:

dY/d(1/X) = b,  but  dY/dX = −b/X²

The first is constant in the transformed regressor; the second shows that the effect of X
on Y is not constant.

Models of this type are known as reciprocal models. Although the model is nonlinear in the
variable X, because X enters inversely or reciprocally, the model is linear in β1 and β2 and
is therefore a linear regression model. The model has this feature: as X increases
indefinitely, the term β2(1/X) approaches zero (note: β2 is a constant) and Y approaches the
limiting or asymptotic value β1. Therefore, models like this have built into them an
asymptote, a limit value that the dependent variable will approach as the value of the X
variable increases indefinitely.
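A sketch of estimating the reciprocal model by OLS on the transformed regressor 1/X (simulated data; the true asymptote is set to 10 by construction):

```python
# Reciprocal model Y = b1 + b2*(1/X) + u, estimated by OLS on 1/X.
import numpy as np

rng = np.random.default_rng(5)
n = 100
x = rng.uniform(1, 50, size=n)
y = 10 + 30 / x + rng.normal(scale=0.5, size=n)   # asymptote at 10 as x grows

X = np.column_stack([np.ones(n), 1.0 / x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)    # b[0] ~ 10 is the asymptote, b[1] ~ 30
# Note the marginal effect dY/dX = -b[1]/X**2 is not constant in X.
```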

Polynomial functional forms express Y as a function of the independent variables, some of
which are raised to powers other than 1. For example, in a second-degree polynomial (also
called a quadratic) equation, at least one independent variable is squared:

Yi = β0 + β1X1i + β2(X1i)² + β3X2i + εi

The slope of Y with respect to X1 in this equation is:

dY/dX1 = β1 + 2β2X1

Note that the slope depends on the level of X1.

7. Illustrative Examples
Example 1

If you are trying to estimate the amount of time it takes a person to complete a complex
physical task as a function of the air temperature (holding humidity constant), you might
suspect that time would be minimized at about 70°F and be higher when the air is either
colder or warmer: the relationship should be U-shaped. You might choose to model this as:

Ti = β0 + β1Fi + β2Fi² + εi


You would expect the estimated coefficient on Fi to be negative and the estimated
coefficient on Fi² to be positive; this produces a curve with the desired U-shape.
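With assumed estimates (hypothetical numbers, not from an actual fit), the turning point of the U-shape follows directly from the slope formula above:

```python
# For T = b0 + b1*F + b2*F^2, the slope is b1 + 2*b2*F, so the turning
# point is F* = -b1 / (2*b2). Illustrative numbers only.
b1, b2 = -5.6, 0.04     # assumed: negative on F, positive on F^2
F_star = -b1 / (2 * b2)
print(F_star)           # 70.0 degrees F: completion time is minimized here
```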

Example 2

One part of my dissertation looked at the cost of operating scrubbers on coal burning power
plants. The dependent variable was the annual operating and maintenance costs (Yi) and the
explanatory variables were such things as the level of efficiency of the scrubbers (E i), the total
amount of coal burned (Ci), the sulfur content of the coal (Si) and the percentage of hours
during which the scrubber was operated (Pi). If any of these explanatory variables took the
value 0, then there would be no scrubbing done and the costs should be zero. Further, if any of
these were to double, I would expect that the total cost would rise by some percentage, perhaps
doubling. As a result, the model I used was:

Yi = a Ei^b Ci^c Si^d Pi^f εi

Taking the natural log of each side allowed me to estimate:

ln Yi = ln a + b ln Ei + c ln Ci + d ln Si + f ln Pi + ε′i

Example 3

In talking about demand relationships, one of the most important considerations is the price
elasticity of demand. Among other things, this reveals the effect that a price change will have
on revenue. If we estimate the equation:

Qt = β0 + β1Pt + εt

We get the slope of the demand curve, which isn't terribly interesting. However, if we estimate:

Qt = a Pt^β εt

the coefficient β is the price elasticity of demand. To do this estimation, we need to rewrite the
equation as:

ln Qt = ln a + β ln Pt + ln εt

or

ln Qt = β0 + β1 ln Pt + ε′t

The estimate of β1 is the estimate of the price elasticity of demand. To be more clear about
doing this, if you have data on prices and quantities at different points in time, all you need to
do is generate new variables equal to the log of the price and the log of the quantity and then do
a linear regression using these new variables. The resulting coefficient on price will be the
estimate of elasticity. In general, anytime you estimate a model of this form, the slope
coefficients are elasticities; that is, they represent the relationship between a percentage
change in the explanatory variable and the resulting percentage change in the dependent
variable. For example, looking through a catalog of bicycle parts, you would notice that
lighter parts carry higher prices. If you regressed ln(price) on ln(weight), you would
estimate the weight elasticity of price, i.e., the percentage increase in price resulting
from a 1% decrease in weight.

8. Choice of functional form


The choice of a particular functional form may be comparatively easy in the two-variable case,
because we can plot the variables and get a rough idea of the appropriate model. The choice
becomes much harder when we consider the multiple regression model involving more than one
regressor. Some guidelines are offered:
1. The underlying theory (e.g., the Phillips curve) may suggest a particular functional form.
2. It is good practice to find the rate of change (i.e., the slope) of the regressand with
respect to the regressor, as well as the elasticity of the regressand with respect to the
regressor.

If we want to compare the goodness of fit of models in which the dependent variable appears
in logs in one and in levels in the other, we cannot use R². The TSS of Y is not the same as
the TSS of ln Y, so comparing R² values is not valid. Instead, the basic idea behind testing
for the appropriate functional form of the dependent variable is to transform the data so as
to make the RSS comparable, as in the sketch below.
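The sketch below illustrates one standard device of this kind, under the assumption that dividing Y by its geometric mean before refitting makes the two RSS comparable; the data are simulated and the variable names are ours.

```python
# Compare a linear and a log specification for Y after scaling Y by its
# geometric mean, so the two residual sums of squares are in comparable units.
import numpy as np

def rss(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ b) ** 2)

rng = np.random.default_rng(6)
n = 100
x = rng.uniform(1, 10, size=n)
y = np.exp(0.5 + 0.3 * np.log(x) + rng.normal(scale=0.1, size=n))

gm = np.exp(np.mean(np.log(y)))   # geometric mean of Y
y_tilde = y / gm

X_lin = np.column_stack([np.ones(n), x])
X_log = np.column_stack([np.ones(n), np.log(x)])
rss_linear = rss(X_lin, y_tilde)             # scaled Y on x
rss_loglog = rss(X_log, np.log(y_tilde))     # log of scaled Y on log x
print(rss_linear, rss_loglog)                # the smaller RSS suggests the better form
```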

9. Summary
A functional form refers to the algebraic form of a relationship between a dependent variable
and regressors or explanatory variables.

The simplest functional form is the linear functional form, where the relationship between the
dependent variable and an independent variable is graphically represented by a straight line.

The interpretation of the coefficients differs across functional forms. In the following
formulations y represents the dependent variable, x the independent variable, a the
y-intercept, b the slope coefficient, ln(y) and ln(x) the natural logarithms of y and x,
respectively, and e an error term.

(1) Linear: y = a + b x + e


In this functional form b represents the change in y (in units of y) that will occur as x changes
one unit.

(2) Semi-log: ln(y) = a + b x + e

In this functional form b is interpreted as follows: a one-unit change in x will cause a
(100·b)% change in y. For example, if the estimated coefficient is 0.05, a one-unit
increase in x generates a 5% increase in y.

(3) Double-log: ln(y) = a + b ln(x) + e

In this functional form b is the elasticity coefficient: a one percent change in x will
cause a b% change in y. For example, if the estimated coefficient is −2, a 1% increase in
x generates a 2% decrease in y.

Sometimes a regression model may not contain an explicit intercept term; this is known as
regression through the origin. In such models the sum of the residuals ûi is nonzero, and
the conventionally computed r² may not be meaningful. Unless there is a strong theoretical
reason, it is better to include the intercept in the model explicitly. The units and scale
in which the regressand and the regressor(s) are expressed are very important, because the
interpretation of the regression coefficients depends critically on them. In the log-linear
model both the regressand and the regressor(s) are expressed in logarithmic form. Models
like the reciprocal model have built into them an asymptote, a limit value that the
dependent variable will approach as the value of the X variable increases indefinitely.

Subject BUSINESS ECONOMICS

Paper No and Title 8: Fundamentals of Econometrics

Module No and Title 10: Simple Regression Model: Interval Estimation

Module Tag BSE_P8_M10


TABLE OF CONTENTS
1. Learning outcomes
2. Introduction
3. Point Estimate Vs Interval Estimate
3.1 Confidence Intervals
3.2 Confidence Levels
3.3 Margin of Errors
4. Confidence Intervals for Regression Coefficients β1 and β2
5. Confidence Interval for σ²
6. Confidence Sets for Multiple Coefficients
7. Application of Regression Analysis: The Problem of Prediction
7.1 Mean Prediction
7.2 Individual Prediction
8. Summary


1. Learning Outcomes
After studying this module, you shall be able to

 Know about the term Estimation


 Learn how to conduct interval estimation in linear regression model
 Identify significance of beta coefficients using interval estimation
 Evaluate the individual and group significance of independent variables
 Analyze the results of testing.

2. Introduction

Estimation theory is a branch of statistics that deals with estimating the values of parameters based
on measured/empirical data that has a random component. The parameters describe an underlying
physical setting in such a way that their value affects the distribution of the measured data. In
statistics, estimation refers to the process by which one makes inferences about a population, based
on information obtained from a sample.

3. Point Estimate vs. Interval Estimate

Statisticians use sample statistics to estimate population parameters. For example, sample means
are used to estimate population means; sample proportions, to estimate population proportions.

An estimate of a population parameter may be expressed in two ways:

 Point estimate. A point estimate of a population parameter is a single value of a
statistic. For example, the sample mean x̄ is a point estimate of the population mean μ.
Similarly, the sample proportion p̂ is a point estimate of the population proportion P.

 Interval estimate. An interval estimate is defined by two numbers, between which a
population parameter is said to lie. For example, a < μ < b is an interval estimate of the
population mean μ; it indicates that the population mean is greater than a but less than b.

3.1 Confidence Intervals

Statisticians use a confidence interval to express the precision and uncertainty associated with a
particular sampling method. A confidence interval consists of three parts.

 A confidence level.
 A statistic.
 A margin of error.

The confidence level describes the uncertainty of a sampling method. The statistic and the
margin of error define an interval estimate that describes the precision of the method. The
interval estimate of a confidence interval is defined by the sample statistic ± margin of error.

For example, suppose we compute an interval estimate of a population parameter. We might
describe this interval estimate as a 95% confidence interval. This means that if we used the
same sampling method to select different samples and compute different interval estimates,
the true population parameter would fall within a range defined by the sample statistic ±
margin of error 95% of the time.

Confidence intervals are preferred to point estimates, because confidence intervals indicate (a) the
precision of the estimate and (b) the uncertainty of the estimate.

3.2 Confidence Level

The probability part of a confidence interval is called a confidence level. The confidence level
describes the likelihood that a particular sampling method will produce a confidence interval that
includes the true population parameter.

Here is how to interpret a confidence level. Suppose we collected all possible samples from a given
population, and computed confidence intervals for each sample. Some confidence intervals would
include the true population parameter; others would not. A 95% confidence level means that 95%
of the intervals contain the true population parameter; a 90% confidence level means that 90% of
the intervals contain the population parameter; and so on.

3.3 Margin of Error

In a confidence interval, the range of values above and below the sample statistic is called
the margin of error. For example, suppose the local newspaper conducts an election survey and
reports that the independent candidate will receive 30% of the vote. The newspaper states that the
survey had a 5% margin of error and a confidence level of 95%. These findings result in the
following confidence interval: We are 95% confident that the independent candidate will receive
between 25% and 35% of the vote.

4. Confidence Intervals for Regression Coefficients β1 and β2


Instead of relying on the point estimate alone, we may construct an interval around the point
estimator, say within two or three standard errors on either side of the point estimator, such that
this interval has, say, 95 percent probability of including the true parameter value. This is roughly
the idea behind interval estimation. To be more specific, assume that we want to find out how
“close” is, say, β̂2 to β2. For this purpose we try to find out two positive numbers δ and α, the latter
lying between 0 and 1, such that the probability that the random interval (β̂2− δ, β̂2+ δ) contains the
true β2 is 1 − α. Symbolically,

Pr (β̂2− δ ≤ β2 ≤ β̂2+ δ) = 1 − α

Such an interval, if it exists, is known as a confidence interval; 1 − α is known as the
confidence coefficient; and α (0 < α < 1) is known as the level of significance. The
endpoints of the confidence interval are known as the confidence limits (also called
critical values), β̂2 − δ being the lower confidence limit and β̂2 + δ the upper confidence
limit. (In passing, note that in practice α and 1 − α are often expressed in percentage
form, as 100α and 100(1 − α) percent.) An interval estimator, in contrast to a point
estimator, is an interval constructed in such a manner that it has a specified probability
1 − α of including within its limits the true value of the parameter. For example, if α =
0.05, or 5 percent, we would read: the probability that the (random) interval shown
includes the true β2 is 0.95, or 95 percent. The interval estimator thus gives a range of
values within which the true β2 may lie.

Recall that in our regression model, we are stating that E (y|x) = β0 + β1x. In this model, β1 represents
the change in the mean of our response variable y, as the predictor variable x increases by 1 unit.
Note that if β1 = 0, we have that E (y|x) = β0 + β1x = β0 + 0x = β0, which implies the mean of our
response variable is the same at all values of x.
In the context of the coffee sales example (below), this would imply that mean sales are the same,
regardless of the amount of shelf space, so a marketer has no reason to purchase extra shelf space.
This is like saying that knowing the level of the predictor variable does not help us predict the
response variable. Under the assumptions stated previously, namely that y ∼ N(β0 + β1x, σ²),
our estimator b1 has a sampling distribution that is normal with mean β1 (the true value of
the parameter) and standard error

σ / √(Σᵢ (xi − x̄)²)

That is:

b1 ∼ N(β1, σ²/SSxx)

First, we obtain the estimated standard error of b1 (the standard deviation of its sampling
distribution):

s_b1 = se / √SSxx

The interval can then be written as:

b1 ± t(α/2, n−2) · s_b1  ≡  b1 ± t(α/2, n−2) · se/√SSxx

Note that se/√SSxx is the estimated standard error of b1, since we use se = √MSE to
estimate σ. Also, we have n − 2 degrees of freedom instead of n − 1, since two estimated
parameters are used in computing se.

Examples:

For the coffee sales example, we have the following results:


Suppose,

b1 = 34.5833, SSxx = 72, se = 51.63, n = 12.

So a 95% confidence interval for the parameter β1 is:

34.5833 ± t.025,12−2 (51.63/ √72) = 34.5833 ± 2.228(6.085) = 34.583 ± 13.557,


which gives us the range (21.026, 48.140). We are 95% confident that the true mean sales increase
by between 21.026 and 48.140 bags of coffee per week for each extra foot of shelf space the brand
gets (within the range of 3 to 9 feet). Note that the entire interval is positive (above 0), so we are
confident that in fact β1 > 0, so the marketer is justified in pursuing extra shelf space.
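The coffee-sales interval can be reproduced from the numbers given in the text:

```python
# Reproducing the 95% confidence interval for b1 in the coffee-sales example.
import math
from scipy import stats

b1, SSxx, se, n = 34.5833, 72.0, 51.63, 12
se_b1 = se / math.sqrt(SSxx)          # estimated standard error of b1
t_crit = stats.t.ppf(0.975, n - 2)    # t(.025, 10) = 2.228
lower, upper = b1 - t_crit * se_b1, b1 + t_crit * se_b1
print(lower, upper)                   # approximately (21.026, 48.140)
```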

Hosiery Mill Cost Function

Suppose,
b1 = 2.0055, SSxx = 7738.94, se = 6.24, n = 48.

For the hosiery mill cost function analysis, we obtain a 95% confidence interval for average unit
variable costs (β1). Note that t.025,48−2 = t.025,46 ≈ 2.015, since t.025,40 = 2.021 and t.025,60 = 2.000 (we
could approximate this with z.025 = 1.96 as well).

2.0055 ± t.025,46(6.24 /√7738.94) = 2.0055 ± 2.015(.0709) = 2.0055 ± 0.1429 = (1.8626, 2.1484)

We are 95% confident that the true average unit variable cost is between $1.86 and $2.15
(this is the incremental cost of increasing production by one unit, assuming that the
production process is already in place).

A confidence interval for the other coefficient can be defined similarly.

5. Confidence Interval for σ2


The chi-square statistic can be written as

χ² = (n − 2) σ̂²/σ²   (1)

which follows the χ² distribution with n − 2 df. Therefore, we can use the χ² distribution
to establish a confidence interval for σ²:

Pr(χ²₁₋α/₂ ≤ χ² ≤ χ²α/₂) = 1 − α   (2)

where the χ² value in the middle of this double inequality is as given by equation (1) and
where χ²₁₋α/₂ and χ²α/₂ are two values of χ² (the critical χ² values) obtained from the
chi-square table for n − 2 df in such a manner that they cut off 100(α/2) percent tail areas
of the χ² distribution. Substituting χ² from (1) into (2) and rearranging the terms, we obtain

Pr[(n − 2) σ̂²/χ²α/₂ ≤ σ² ≤ (n − 2) σ̂²/χ²₁₋α/₂] = 1 − α

which gives the 100(1 − α)% confidence interval for σ². The interpretation of this interval
is: if we establish 95% confidence limits on σ² and maintain a priori that these limits will
include the true σ², we shall be right in the long run 95 percent of the time.
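A sketch of this interval in Python, using as illustrative inputs n = 10 and σ̂² = 42.159 (the values of the consumption example in section 7 below):

```python
# 95% confidence interval for sigma^2 from the chi-square pivot.
from scipy import stats

n, sigma2_hat, alpha = 10, 42.159, 0.05
df = n - 2
chi2_upper = stats.chi2.ppf(1 - alpha / 2, df)   # cuts off alpha/2 in the upper tail
chi2_lower = stats.chi2.ppf(alpha / 2, df)       # cuts off alpha/2 in the lower tail
lower = df * sigma2_hat / chi2_upper
upper = df * sigma2_hat / chi2_lower
print(lower, upper)
```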

6. Confidence Sets for Multiple Coefficients


Yi = β0 + β1X1i + β2X2i + … + βkXki + ui, i = 1… n

For the above regression model, a joint confidence set for β1 and β2, that is, a 95% joint
confidence set, is a set-valued function of the data that contains the true parameter(s) in
95% of hypothetical repeated samples: the set of parameter values that cannot be rejected at
the 5% significance level. We can find a 95% confidence set as the set of (β1, β2) that
cannot be rejected at the 5% level using an F-test.
Let F(β1,0, β2,0) be the (heteroskedasticity-robust) F-statistic testing the hypothesis that
β1 = β1,0 and β2 = β2,0. The 95% confidence set is {β1,0, β2,0: F(β1,0, β2,0) < 3.00}, where
3.00 is the 5% critical value of the F(2, ∞) distribution. This set has a coverage rate of
95% because the test on which it is based (the test it "inverts") has a size of 5%. That is,
5% of the time the test incorrectly rejects the null when the null is true, so 95% of the
time it does not; therefore the confidence set constructed as the non-rejected values
contains the true value 95% of the time (in 95% of all samples).

The confidence set based on the F-statistic is an ellipse

{β1,0, β2,0 : F = (1/2) · (t1² + t2² − 2 ρ̂(t1,t2) t1 t2) / (1 − ρ̂²(t1,t2)) ≤ 3.00}

where

t1 = (β̂1 − β1,0)/SE(β̂1),  t2 = (β̂2 − β2,0)/SE(β̂2)

and ρ̂(t1,t2) is the estimated correlation between t1 and t2. Expanding,

F = [1 / (2(1 − ρ̂²(t1,t2)))] × [((β̂1 − β1,0)/SE(β̂1))² + ((β̂2 − β2,0)/SE(β̂2))²
    − 2 ρ̂(t1,t2) ((β̂1 − β1,0)/SE(β̂1)) ((β̂2 − β2,0)/SE(β̂2))]

This is a quadratic form in β1,0 and β2,0 – thus the boundary of the set F = 3.00 is an ellipse.
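A small sketch of checking whether a candidate point lies inside the 95% set, with assumed values of t1, t2, and their estimated correlation (illustrative numbers only):

```python
# Membership test for the joint 95% confidence set described above.
def joint_F(t1, t2, rho):
    return (t1**2 + t2**2 - 2 * rho * t1 * t2) / (2 * (1 - rho**2))

t1, t2, rho = 1.8, 2.1, 0.3    # assumed values for illustration
F = joint_F(t1, t2, rho)
print(F, F <= 3.00)            # the candidate point is inside iff F <= 3.00
```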

7. Application of Regression Analysis: The Problem of Prediction


Suppose we have below regression showing consumption expenditure (Y) corresponding to level
of income (X):

𝑌̂𝑖 = 24.4545 + 0.5091𝑋𝑖


where Ŷi is the estimator of the true E(Yi) corresponding to given X. What use can be made of this
historical regression? One use is to “predict” or “forecast” the future consumption expenditure Y
corresponding to some given level of income X. Now there are two kinds of predictions: (1)
prediction of the conditional mean value of Y corresponding to a chosen X, say, X0, that is the point
on the population regression line itself and (2) prediction of an individual Y value corresponding
to X0. We shall call these two predictions the mean prediction and individual prediction.

7.1 Mean Prediction


To fix the ideas, assume that X0 = 100 and we want to predict E(Y | X0 = 100).Now it can be shown
that the historical regression above provides the point estimate of this mean prediction as follows:

𝑌̂0 = 𝛽̂1 + 𝛽̂2 𝑋0


= 24.4545+0.5091(100)
=75.3645
where Ŷ0 = estimator of E(Y | X0). It can be proved that this point predictor is a best linear unbiased
estimator (BLUE).Since Ŷ0 is an estimator, it is likely to be different from its true value. The
difference between the two values will give some idea about the prediction or forecast error. To
assess this error, we need to find out the sampling distribution of Ŷ0. Ŷ0 is normally distributed
with mean (β1 + β2X0) and the variance is given by the following formula:

Var(Ŷ0) = σ² [1/n + (X0 − X̄)²/Σxi²]

By replacing the unknown σ² by its unbiased estimator σ̂², we see that the variable
t = [Ŷ0 − (β1 + β2X0)]/se(Ŷ0) follows the t distribution with n − 2 df. The t distribution
can therefore be used to derive confidence intervals for the true E(Y0 | X0) and test
hypotheses about it in the usual manner, namely,

Pr[β̂1 + β̂2X0 − t(α/2) se(Ŷ0) ≤ β1 + β2X0 ≤ β̂1 + β̂2X0 + t(α/2) se(Ŷ0)] = 1 − α

On substituting the values into the above equations:

var(Ŷ0) = 42.159 [1/10 + (100 − 170)²/33,000] = 10.4759

and

se(Ŷ0) = 3.2366

Therefore, the 95% confidence interval for the true E(Y | X0) = β1 + β2X0 is given by
75.3645 − 2.306(3.2366) ≤ E(Y | X = 100) ≤ 75.3645 + 2.306(3.2366), that is,
67.9010 ≤ E(Y | X = 100) ≤ 82.8281
Thus, given X0 = 100, in repeated sampling, 95 out of 100 intervals like this will include
the true mean value; the single best estimate of the true mean value is of course the point
estimate 75.3645.

7.2 Individual Prediction


If our interest lies in predicting an individual Y value, Y0, corresponding to a given X value, say,
X0, then a best linear unbiased estimator of Y0 is also given as in first equation under this topic, but
its variance is as follows:

var(Y0 − Ŷ0) = E[Y0 − Ŷ0]² = σ² [1 + 1/n + (X0 − X̄)²/Σxi²]

It can be shown further that Y0 also follows the normal distribution with mean and variance given
by above formulas, respectively. Substituting σ̂ 2 for the unknown σ2, it follows that

t = (Y0 − Ŷ0) / se(Y0 − Ŷ0)
also follows the t distribution. Therefore, the t distribution can be used to draw inferences about the
true Y0. Continuing with our consumption–income example, we see that the point prediction of Y0
is 75.3645, the same as that of Yˆ0, and its variance is 52.6349. Therefore, the 95% confidence
interval for Y0 corresponding to X0 = 100 is seen to be

(58.6345 ≤ 𝑌0 |𝑋0 = 100 ≤ 92.0945)
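Both intervals can be reproduced from the numbers given in the text:

```python
# Mean and individual prediction intervals for the consumption-income example.
import math
from scipy import stats

b1, b2 = 24.4545, 0.5091
n, X0, X_bar, sum_x2, sigma2_hat = 10, 100.0, 170.0, 33_000.0, 42.159

y0_hat = b0_plus = b1 + b2 * X0                                      # 75.3645
var_mean = sigma2_hat * (1 / n + (X0 - X_bar) ** 2 / sum_x2)         # 10.4759
var_indiv = sigma2_hat * (1 + 1 / n + (X0 - X_bar) ** 2 / sum_x2)    # 52.6349

t_crit = stats.t.ppf(0.975, n - 2)                                   # 2.306
ci = (y0_hat - t_crit * math.sqrt(var_mean), y0_hat + t_crit * math.sqrt(var_mean))
pi = (y0_hat - t_crit * math.sqrt(var_indiv), y0_hat + t_crit * math.sqrt(var_indiv))
print(ci, pi)   # about (67.90, 82.83) and (58.63, 92.09)
```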

8. Summary
Estimation and hypothesis testing constitute the two main branches of classical statistics.
Underlying the confidence-interval approach is the concept of interval estimation. An
interval estimator is an interval or range constructed in such a manner that it has a
specified probability of including within its limits the true value of the unknown
parameter. The interval thus constructed is known as a confidence interval, which is often
stated in percent form, such as 90 or 95%. The confidence interval provides a set of
plausible hypotheses about the value of the unknown parameter. If the null-hypothesized
value lies inside the confidence interval, the null hypothesis is not rejected, whereas if
it lies outside this interval, the null hypothesis can be rejected.


Subject BUSINESS ECONOMICS

Paper No and Title 8: Fundamentals of Econometrics

Module No and Title 11: Introduction to Prediction

Module Tag BSE_P8_M11


TABLE OF CONTENTS
1. Learning Outcome
2. Introduction
3. Mean Prediction
3.1 Variance of Mean Prediction
3.2 Factors affecting the variance of 𝒀̂𝒐
3.3 Confidence Interval for Mean Predictor
3.4 Interpretation of a Confidence Interval
4. Individual Prediction
4.1 Prediction Interval for Individual Predictor
4.2 Interpretation of a Prediction Interval
4.3 Difference between Confidence Interval and Prediction Interval
4.4 Numerical example
5. Forecasting Methodologies
5.1 Econometric Forecasting
5.2 Time series forecasting
5.3 Fitted values, In-sample and Out-of-Sample Forecasts
5.4 Conditional and Unconditional Forecasts
6. Evaluation of forecasts
7. Summary


1. Learning Outcomes
After reading this module, the reader will be able to understand:

 Mean prediction
 Individual prediction
 Forecasting methodologies
 Evaluation of forecast

2. Introduction
Regression analysis is used to estimate economic relationships among variables. The
regression results can then be used to predict the value of the dependent variable when the
independent variables are given or specified; this is known as prediction or forecasting.
Typically we have two types of prediction in the literature, namely mean prediction and
individual prediction. Mean prediction relates to finding the mean value of the dependent
variable for a specified value of the explanatory variable, while individual prediction
relates to finding an individual value of the dependent variable for a specified value of
the explanatory variable.

3. Mean Prediction

The mean prediction calculates the average or expected value of the dependent variable
(say Y) when the independent variable (say X) is given. To illustrate the concept of mean
prediction, we take the following simple regression model:

𝑌 = 𝛽𝑜 + 𝛽1 𝑋 + 𝑈

The fitted value for the regression equation given above is:

𝑌̂ = 𝛽̂𝑜 + 𝛽̂1 𝑋

When 𝑋 = 𝑋𝑜 , the predicted mean value of 𝑌 is

𝑌̂𝑜 = 𝛽̂𝑜 + 𝛽̂1 𝑋𝑜

Where 𝑌̂𝑜 = 𝑒𝑠𝑡𝑖𝑚𝑎𝑡𝑜𝑟 𝑜𝑓 𝐸(𝑌⁄𝑋𝑜 )

3.1 Variance of Mean Prediction


The predicted mean value of Y can differ from its true value; the prediction or forecast
error is the difference between these two values. If we want to assess how good this
prediction is, we have to find the variability of this prediction. So we calculate the
variance of the mean prediction.

First we will show that 𝑌̂𝑜 is an unbiased predictor of 𝐸(𝑌𝑜 ⁄𝑋𝑜 )

E(Ŷo) = E(β̂o + β̂1Xo) = E(β̂o) + E(β̂1)Xo = βo + β1Xo

Please note that 𝛽̂𝑜 𝑎𝑛𝑑 𝛽̂1 are unbiased estimators of 𝛽𝑜 𝑎𝑛𝑑 𝛽1 respectively.
So

𝐸(𝑌̂𝑜 ) = 𝐸(𝑌𝑜 ⁄𝑋𝑜 ) = 𝛽0 + 𝛽1 𝑋0

Therefore the expected value of the predicted value of 𝑌 and the expected value of 𝑌 conditioned
on the given values of 𝑋 are equal.

Recall that, the following property of variance for any two variables 𝐴 𝑎𝑛𝑑 𝐵 which are random

𝑉𝑎𝑟 (𝐴 + 𝐵) = 𝑉𝑎𝑟(𝐴) + 𝑉𝑎𝑟(𝐵) + 2𝐶𝑜𝑣(𝐴, 𝐵)

Thus we find that the variance of the prediction is

𝑉𝑎𝑟 (𝑌̂𝑜 ) = 𝑉𝑎𝑟 ( 𝛽̂𝑜 + 𝛽̂1 𝑋𝑜 )

𝑣𝑎𝑟(𝑌̂𝑜 ) = 𝑣𝑎𝑟 (𝛽̂𝑜 ) + 𝑣𝑎𝑟(𝛽̂1 )𝑋𝑜2 + 2 𝑐𝑜𝑣(𝛽̂𝑜 𝛽̂1 )𝑋𝑜

= σ² ΣXi²/(n Σ(Xi − X̄)²) + [σ²/Σ(Xi − X̄)²] Xo² + 2 [−X̄σ²/Σ(Xi − X̄)²] Xo

By rearranging the terms, we write the equation above as follows:

Var(Ŷo) = σ² [1/n + (Xo − X̄)²/Σ(Xi − X̄)²] = σ² [1/n + (Xo − X̄)²/Σxi²]

We can show that Ŷo is normally distributed, as Ŷo is a linear function of normally
distributed random variables:

Ŷo ∼ N(βo + β1Xo, σ² [1/n + (Xo − X̄)²/Σxi²])


where βo + β1Xo is the mean of the prediction and σ² [1/n + (Xo − X̄)²/Σxi²] is the variance
of the prediction.

3.2 Factors Affecting the Variance of Ŷo

Var(Ŷo) = σ² [1/n + (Xo − X̄)²/Σxi²]
1. An increase in the sample size n decreases the variance of Ŷo: the variability of the
mean prediction falls as the sample size increases.
2. An increase in the variability of X decreases the variance of Ŷo: the variability of the
mean prediction falls as the variability of X increases.
3. The variance of Ŷo increases the farther Xo is from X̄, so the variability of the mean
prediction rises. We should therefore be cautious about the predicted value of Y when Xo is
very far from X̄.

3.3 Confidence Interval for Mean Predictor

The mean predictor's standard error is obtained by taking the square root of Var(Ŷo):

s_Ŷo = √(σ² [1/n + (Xo − X̄)²/Σxi²])

The variance of the error (σ²) is not known, so we replace it with its unbiased estimator
S². Accordingly, we can use the t distribution to make inferences about the predictions:

T = [Ŷo − (βo + β1Xo)] / s_Ŷo

As with the coefficients themselves, this T statistic follows the t distribution with n − 2
degrees of freedom. We can therefore calculate 100(1 − α)% confidence intervals for the mean
predicted value of Yo, i.e., for E(Yo/Xo):

Pr[β̂o + β̂1Xo − t(n−2, α/2) · s_Ŷo ≤ βo + β1Xo ≤ β̂o + β̂1Xo + t(n−2, α/2) · s_Ŷo] = 1 − α

3.4 Interpretation of a Confidence Interval

A confidence interval estimates the mean or average value of the dependent variable Y for a
specified value of the independent variable X. Equivalently, it gives us the range of mean
or average values of the dependent variable, for a specified central percentage of a
population, at a given value of the independent variable X.

We can discuss the results of a regression model with the help of predicted values of the
dependent variable Y and confidence intervals. For example, we can make a statement like:
"The predicted value of Y is 230 when X = 20." It is useful to incorporate the notion of a
confidence interval to assess the accuracy of this statement, which helps us understand the
variability around the prediction. A better statement would be: "With 95% confidence, the
predicted mean value of Y ranges from 190 to 255 when X = 20." Similarly, instead of "the
predicted value of Y increases from 120 to 150 when the value of X decreases from 12 to 9,"
a better statement would be: "When the value of X decreases from 12 to 9, the 95% confidence
interval for the predicted value of Y moves from (90, 135) to (143, 167)." We need to report
the confidence interval around the predicted value to assess the accuracy of any prediction.
One could also plot the predicted values and the confidence intervals over some range of X.

4. Individual Prediction
For individual prediction, we are interested in predicting an individual value of the
dependent variable Y when the independent variable X = Xo. Accordingly, we have

Yo = βo + β1Xo + uo

The individual prediction from the estimated regression equation is

Ŷo = β̂o + β̂1Xo

because the expected value of the error is zero, i.e., E(uo) = 0.


The prediction error measures the difference between the values of 𝑌𝑜 𝑎𝑛𝑑 𝑌̂𝑜

𝑌𝑜 − 𝑌̂𝑜 = 𝛽𝑜 + 𝛽1 𝑋𝑜 + 𝑢𝑜 − (𝛽̂𝑜 + 𝛽̂1 𝑋𝑜 )

We rearrange the above equation to get:

𝑌𝑜 − 𝑌̂𝑜 = (𝛽𝑜 − 𝛽̂𝑜 ) + (𝛽1 − 𝛽̂1 )𝑋𝑜 + 𝑢𝑜

Taking expectations on both sides of this prediction-error equation, we get

𝐸(𝑌𝑜 − 𝑌̂𝑜 ) = 𝐸(𝛽𝑜 − 𝛽̂𝑜 ) + 𝐸(𝛽1 − 𝛽̂1 )𝑋𝑜 + 𝐸(𝑢𝑜 ) = 0

We used the fact that 𝛽̂𝑜 , 𝛽̂1 are unbiased estimator of 𝛽𝑜 𝑎𝑛𝑑 𝛽1 respectively; 𝑋𝑜 is a fixed
number and 𝐸(𝑢𝑜 ) is zero by assumption.


We obtain the variance of the prediction error by squaring the prediction-error equation
and taking expectations on both sides, which gives us:

var(Yo − Ŷo) = var(β̂o) + Xo² var(β̂1) + 2Xo cov(β̂o, β̂1) + var(uo)

Using the formulas for the variance and covariance of β̂o and β̂1, and given that
var(uo) = σ², we get

var(Yo − Ŷo) = σ² [1 + 1/n + (Xo − X̄)²/Σxi²]

Comparing the variance of the individual prediction with the variance of the mean
prediction, we find that the former is larger than the latter by σ². This is because in the
mean prediction we average out the effect of the disturbance term, but not in the
individual prediction.

We can show that the individual value Yo is normally distributed with mean βo + β1Xo and
variance σ² [1 + 1/n + (Xo − X̄)²/Σxi²], i.e.

Yo ∼ N(βo + β1Xo, σ² [1 + 1/n + (Xo − X̄)²/Σxi²])

4.1 Prediction Interval for Individual Predictor

The individual predictor's standard error is obtained by taking the square root of
var(Yo − Ŷo):

s_(Yo−Ŷo) = √(σ² [1 + 1/n + (Xo − X̄)²/Σxi²])

The variance of the error (σ²) is not known, so we replace it with its unbiased estimator
S². We can then use the t distribution to make inferences about the predictions. The variable

t = (Yo − Ŷo) / s_(Yo−Ŷo)

follows the t distribution with n − 2 degrees of freedom. The 100(1 − α)% prediction
interval for the individual value of Yo, i.e., for Yo/Xo, is given by:

Pr[β̂o + β̂1Xo − t(n−2, α/2) · s_(Yo−Ŷo) ≤ Yo/Xo ≤ β̂o + β̂1Xo + t(n−2, α/2) · s_(Yo−Ŷo)] = 1 − α


4.2 Interpretation of a Prediction Interval

A prediction interval gives us the value of the dependent variable Y for a specified value
of the independent variable X. In other words, it gives us the range of values of the
dependent variable, for a specified central percentage of a population, at a specified
value of the independent variable.

4.3. Difference between Confidence Interval and Prediction Interval

The prediction interval predicts one particular value of the dependent variable Y for a
given value of X. The prediction interval is:

ŷ ± t(n−2, α/2) · s_(Yo−Ŷo)

The confidence interval predicts the mean value of the dependent variable Y for a given
value of X. The confidence interval is:

ŷ ± t(n−2, α/2) · s_Ŷo

The width of both intervals is smallest when Xo = X̄; as Xo moves farther from X̄, the width
of the interval increases rapidly. This suggests that predictive accuracy decreases as Xo
differs too much from X̄. We can also see that the prediction interval is wider than the
confidence interval: there is more error in predicting an individual value than in
predicting the mean value of the dependent variable.

4.4. Numerical Example:

The data on weekly consumption expenditure and weekly income of a particular hypothetical
community are given below:

Weekly Consumption Expenditure (in Rupees) | Weekly Income (in Rupees)
80  | 90
86  | 102
90  | 115
94  | 123
99  | 131
105 | 145
109 | 156
115 | 167
121 | 175
128 | 188

1. Find the estimated regression line.
2. Calculate the marginal propensity to consume of this hypothetical community. Does it
conform to Keynes's Psychological Law of Consumption?
3. Calculate the mean prediction of weekly consumption expenditure when weekly income is
Rs 120.
4. Calculate the 95% confidence interval and the 95% prediction interval and give their
interpretation.

Solutions:

Y     X     y = Y−Ȳ   x = X−X̄   x²        x·y       Ŷ          Û = Y−Ŷ    Û²
 80    90   −22.7     −49.2     2420.64   1116.84    78.97374   1.02626   1.05321
 86   102   −16.7     −37.2     1383.84    621.24    84.76064   1.239356  1.536003
 90   115   −12.7     −24.2      585.64    307.34    91.02979  −1.02979   1.060467
 94   123    −8.7     −16.2      262.44    140.94    94.88773  −0.88773   0.788057
 99   131    −3.7      −8.2       67.24     30.34    98.74566   0.254338  0.064688
105   145     2.3       5.8       33.64     13.34   105.4971   −0.49705   0.247059
109   156     6.3      16.8      282.24    105.84   110.8017   −1.80171   3.246166
115   167    12.3      27.8      772.84    341.94   116.1064   −1.10637   1.224063
121   175    18.3      35.8     1281.64    655.14   119.9643    1.03569   1.072654
128   188    25.3      48.8     2381.44   1234.64   126.2335    1.766544  3.120678
Sum  1027  1392        0         0        9471.6    4567.6    1027        0         13.41305
Avg  102.7 139.2       0         0         947.16    456.76    102.7      0

Slope coefficient:

β̂1 = Σxy / Σx² = 4567.6 / 9471.6 = 0.482242

Intercept coefficient:

β̂0 = Ȳ − β̂1 X̄ = 102.7 − 0.482242 × 139.2 = 35.57196

Estimated variance of the error:

σ̂² = ΣÛi² / (n − 2) = 13.41305 / 8 = 1.676631

1. The estimated regression line is

Ŷi = 35.57196 + 0.482242 Xi

2. The marginal propensity to consume is 0.482242, meaning that for every rupee increase in income, expenditure on consumption increases by about 48 paise. Since the marginal propensity to consume lies between 0 and 1, it conforms to Keynes's Psychological Law of Consumption.

3. Mean prediction when X = 120:

E(Ŷ | X = 120) = 35.57196 + 0.482242 × 120 = 93.441

When the weekly income is Rs 120, the mean prediction of consumption is Rs 93.441.
4. The mean prediction has a confidence interval given by:

ŷ ± t(n−2, α/2) · s(Ŷo)

where

s(Ŷo) = σ̂ √[1/n + (Xo − X̄)² / Σxi²] = 0.482616

The individual prediction has a prediction interval given by:

ŷ ± t(n−2, α/2) · s(Yo − Ŷo)

where

s(Yo − Ŷo) = σ̂ √[1 + 1/n + (Xo − X̄)² / Σxi²] = 1.381865

Since ŷ = 93.441 and t(n−2, α/2) = t(8, 0.025) = 2.306 (see the t-distribution table),

the confidence interval at the 95% level of confidence is (92.328088, 94.553912), which means that when income is Rs 120, the mean consumption expenditure will lie between Rs 92.328088 and Rs 94.553912 with 95% confidence. The prediction interval at the 95% level of confidence is (90.25442, 96.62758), which means that when income is Rs 120, an individual consumption expenditure will lie between Rs 90.25442 and Rs 96.62758 with 95% confidence.
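These calculations are easy to verify programmatically. The following is a minimal sketch in Python (assuming numpy and scipy are available) that reproduces the estimates, the mean prediction at X = 120 and both intervals:

```python
import numpy as np
from scipy import stats

# Weekly consumption expenditure (Y) and weekly income (X) from the example
Y = np.array([80, 86, 90, 94, 99, 105, 109, 115, 121, 128], dtype=float)
X = np.array([90, 102, 115, 123, 131, 145, 156, 167, 175, 188], dtype=float)
n = len(Y)

x = X - X.mean()                                  # deviations from the mean
b1 = np.sum(x * (Y - Y.mean())) / np.sum(x**2)    # slope (the MPC)
b0 = Y.mean() - b1 * X.mean()                     # intercept

resid = Y - (b0 + b1 * X)
sigma2 = np.sum(resid**2) / (n - 2)               # unbiased estimate of error variance

X0 = 120.0
y_hat = b0 + b1 * X0                              # mean prediction at X0 = 120

# Standard errors of the mean prediction and the individual prediction
se_mean = np.sqrt(sigma2 * (1/n + (X0 - X.mean())**2 / np.sum(x**2)))
se_indiv = np.sqrt(sigma2 * (1 + 1/n + (X0 - X.mean())**2 / np.sum(x**2)))

t_crit = stats.t.ppf(0.975, df=n - 2)             # 2.306 for 8 degrees of freedom

print(f"Y-hat = {b0:.5f} + {b1:.6f} X")
print(f"Mean prediction at X = 120: {y_hat:.3f}")
print(f"95% confidence interval: ({y_hat - t_crit*se_mean:.6f}, {y_hat + t_crit*se_mean:.6f})")
print(f"95% prediction interval: ({y_hat - t_crit*se_indiv:.5f}, {y_hat + t_crit*se_indiv:.5f})")
```

Running this reproduces the figures derived by hand above.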

5. Forecasting Methodologies


The subject of econometric forecasting is vast, and it is not possible to discuss all the issues involved here. The term forecasting or prediction usually connotes an attempt to predict the future; however, it can also apply to predicting cross-section variables. There are two main forecasting methodologies: the first is known as Econometric Forecasting and the second as Time Series Forecasting.

5.1. Econometric Forecasting

It deals with a regression model in which we try to predict the unknown value of the dependent variable with the help of explanatory variables. This type of forecasting is used for policy making because such models can explain changes in economic and other behavioural variables. For instance, we can predict the housing demand of a particular city based on the city's population, household income and other variables. In time series forecasting, by contrast, our aim is to predict the future value of a variable based on its own past values.

The simplest case for econometric forecasting is when the dependent variable has no lagged values and the error is serially uncorrelated. It is also possible that the dependent variable has lagged values and the errors are serially correlated. When the errors are serially correlated, we model them with an autoregressive process to obtain more efficient forecasts. We illustrate the most general econometric formulation of a single dependent variable with both lagged dependent variables and autocorrelated errors:

Yt = α0 + α1 Yt−1 + … + αp Yt−p + β1 Xt1 + … + βk Xtk + ut

ut = ρ1 ut−1 + ρ2 ut−2 + … + ρq ut−q + εt

The one-step-ahead forecast for given values of Xn+1,1, Xn+1,2, …, Xn+1,k is:

Ŷn+1 = α̂0 + α̂1 Yn + α̂2 Yn−1 + … + α̂p Yn+1−p + β̂1 Xn+1,1 + β̂2 Xn+1,2 + … + β̂k Xn+1,k + ûn+1

where

ûn+1 = ρ̂1 ûn + ρ̂2 ûn−1 + … + ρ̂q ûn+1−q
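The one-step-ahead formula is simple to evaluate once the coefficients are in hand. Below is a minimal sketch with p = 2 lags of Y, k = 2 regressors and an AR(1) error; all numerical values are hypothetical and chosen only for illustration:

```python
import numpy as np

# Hypothetical estimated coefficients (stand-ins, not from any real data set)
alpha0 = 2.0                        # intercept
alpha = np.array([0.5, 0.2])        # alpha_1 ... alpha_p on lagged Y
beta = np.array([1.2, -0.4])        # beta_1 ... beta_k on the X regressors
rho = np.array([0.3])               # rho_1 ... rho_q on lagged residuals

Y_hist = np.array([10.0, 9.4])      # Y_n, Y_{n-1}, ... (most recent first)
u_hist = np.array([0.15])           # u_n, u_{n-1}, ... (most recent first)
X_next = np.array([3.0, 1.5])       # X_{n+1,1} ... X_{n+1,k}

u_next = rho @ u_hist               # forecast of the autocorrelated error
y_next = alpha0 + alpha @ Y_hist + beta @ X_next + u_next
print(f"One-step-ahead forecast: {y_next:.4f}")
```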


5.2. Time series forecasting

In time series forecasting, an attempt is made to forecast the future value of a variable using the past values of that same variable. A time series follows a purely autoregressive (AR) structure if the dependent variable Y is related to its own past values with white-noise errors. It is a purely moving average (MA) process if the dependent variable Y is related to a linear combination of white-noise error terms. A more general model is the ARMA model, which combines the AR and MA models into a single model. The general form of an ARMA(p, q) model is given below:

Yt = α1 Yt−1 + α2 Yt−2 + … + αp Yt−p + vt + β1 vt−1 + … + βq vt−q

The one-step-ahead forecast for an ARMA(p, q) model is:

Ŷt+1 = α̂1 Yt + α̂2 Yt−1 + … + α̂p Yt+1−p + β̂1 v̂t + β̂2 v̂t−1 + … + β̂q v̂t+1−q

A time series possesses the property of stationarity if it has a time-invariant mean and constant variance and covariances. A non-stationary series can be converted into a stationary one by differencing. Estimating a time series model involves three steps: identification, estimation and diagnostic checking. The identification step specifies the order of the ARMA model using correlograms and partial correlograms. Diagnostic tests such as the Box-Pierce and Ljung-Box tests are conducted to see if the model fits the data adequately. Only when the model passes these adequacy tests should it be used to generate forecasts.
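As an illustration, an ARMA model can be estimated and used for one-step-ahead forecasting in a few lines, assuming the statsmodels package is available (an ARMA(p, q) model is specified as an ARIMA model with differencing order d = 0); the simulated series below is only a stand-in for real data:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
# Simulate a stationary AR(1) series as stand-in data
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.7 * y[t - 1] + rng.normal()

# Fit an ARMA(1, 1): ARIMA with d = 0
res = ARIMA(y, order=(1, 0, 1)).fit()
print(res.forecast(steps=1))   # one-step-ahead forecast
```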

It is usually observed that time series forecasts offer better results in the short term, while econometric forecasts are preferable over longer horizons. Models that combine these two methods can give better short- and long-term forecasts, though it is difficult to draw a sharp line between the two methodologies.

5.3. Fitted values, In-sample and Out-of-sample Forecasts

We usually differentiate between two time periods in the forecasting literature. First, we estimate one or more models for the period n1 to n2. We next make forecasts for this same sample period n1 to n2 using the fitted values of these estimates; this is known as the in-sample forecast. It is illustrated by considering the following regression model:

Yt = β0 + β1 Xt1 + β2 Xt2 + … + βk Xtk + ut

The fitted value for time period t is

Ŷt = β̂0 + β̂1 Xt1 + β̂2 Xt2 + … + β̂k Xtk

We next generate forecasts for time periods n2 + 1 onward. Forecasts generated for time periods outside the data used to estimate the model are known as out-of-sample forecasts. The two forecasting periods are illustrated below:

Estimation period (in-sample forecast)      Out-of-sample forecast
⟨———— n1 to n2 ————|———— n2 + 1 to t ————⟩
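The split can be illustrated with a short sketch; the data here are simulated, and statsmodels is assumed to be available:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(size=100)

n2 = 80                                  # estimation period: observations 0 .. n2-1
Xc = sm.add_constant(X)
res = sm.OLS(y[:n2], Xc[:n2]).fit()      # estimate on the first n2 observations

in_sample = res.predict(Xc[:n2])         # fitted values: the in-sample forecast
out_sample = res.predict(Xc[n2:])        # out-of-sample forecast for n2+1 onward
print(in_sample[:3], out_sample[:3])
```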

5.4. Conditional and Unconditional Forecasts

A conditional forecast is a prediction of the dependent variable made when the values of the independent variables are known or given. Suppose we try to predict housing demand in a particular city, and hypothesize that housing demand depends on the city's population. If the population of the city is known and is assumed to remain constant in the future, then we can forecast the demand for housing based on it.

In unconditional forecasts, the explanatory variables are not known in advance but are themselves generated from the model or from auxiliary models. For instance, in the example above, if the population of the city is unknown, it can be obtained from an auxiliary model based on birth rates, death rates and migration, and the demand for housing can then be forecast accordingly.

6. Evaluation of forecasts
It is often the case that more than one model is used to generate forecasts. Since different models generate different forecasts, it is necessary to determine which model provides the most accurate forecast. There is a huge literature on the evaluation of forecasts; historically, the most preferred measure has been the mean squared error (MSE). MSE imposes a quadratic loss function and is useful in situations where large forecast errors are more serious than smaller ones. This may, however, also be viewed as a disadvantage if large errors are not disproportionately more serious. Other measures of forecast accuracy are: Mean Absolute Error (MAE), Mean Absolute Percent Error (MAPE) and Mean Square Percentage Error (MSPE). These measures are defined as follows.
1. Mean Square Error:

MSE = Σ(Yt^f − Yt)² / (n − 2)

2. Mean Absolute Error:

MAE = Σ|Yt^f − Yt| / (n − 2)

3. Mean Absolute Percent Error

MAPE = (1/n) Σ 100 · |Yt^f − Yt| / Yt

4. Mean Square Percent Error


MSPE = (1/n) Σ [100 · (Yt^f − Yt) / Yt]²

where Yt^f denotes the forecast of the dependent variable for observation t, Yt denotes the actual value and n denotes the sample size.

If two or more models are used to predict the dependent variable, then the model with the lower values of these measures is considered the better model for forecasting purposes. Another way of evaluating the forecasting accuracy of a model is to carry out a post-sample forecast: we withhold the last few observations when estimating the model, use the parameter estimates obtained from the first set of observations to predict the dependent variable for the reserved sample, and then calculate the MSE and the other measures of forecast accuracy, choosing the model that gives the lowest values of these measures.
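The four measures are straightforward to compute; the following sketch follows the definitions above (including the n − 2 divisor for MSE and MAE), with hypothetical actual and forecast values:

```python
import numpy as np

def forecast_accuracy(y_actual, y_forecast):
    """Return MSE, MAE, MAPE and MSPE as defined in the text."""
    y_actual = np.asarray(y_actual, dtype=float)
    y_forecast = np.asarray(y_forecast, dtype=float)
    err = y_forecast - y_actual
    n = len(y_actual)
    return {
        "MSE": np.sum(err**2) / (n - 2),
        "MAE": np.sum(np.abs(err)) / (n - 2),
        "MAPE": np.mean(100 * np.abs(err) / y_actual),
        "MSPE": np.mean((100 * err / y_actual) ** 2),
    }

# Compare two candidate models' forecasts against the same actual values
actual = [102, 107, 111, 118, 125]
model_a = [100, 108, 113, 117, 126]
model_b = [99, 104, 109, 121, 130]
print(forecast_accuracy(actual, model_a))
print(forecast_accuracy(actual, model_b))   # the model with lower values is preferred
```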


7. Summary

• Prediction is about finding the unknown value of the dependent variable when the independent variables are given. There are two types of prediction, namely Mean Prediction and Individual Prediction.
• The Mean Prediction finds the average or mean value of the dependent variable for a specified value of the explanatory variable, while the Individual Prediction finds the individual value of the dependent variable for a specified value of the explanatory variable.
• The variance of a prediction assesses its variability and tells us how good the prediction is.
• A Confidence Interval estimates the range of mean values of the dependent variable for a specified central percentage of the population at a given value of the independent variable X.
• A Prediction Interval estimates the range of values of the dependent variable for a specified central percentage of the population at a specified value of the independent variable X.
• The two methodologies of forecasting are Econometric Forecasting and Time Series Forecasting. In the first, we predict the value of the dependent variable with the help of explanatory variables; in time series forecasting, we forecast the future value of a variable using its own past values.
• Some measures of forecast accuracy are: Mean Square Error, Mean Absolute Error, Mean Absolute Percent Error and Mean Square Percentage Error.


Subject BUSINESS ECONOMICS

Paper No and Title 8, Fundamentals of Econometrics

Module No and Title 12, Specification errors and Diagnostic Testing

Module Tag BSE_P8_M12


TABLE OF CONTENTS
1. Learning Outcomes
2. Introduction
3. Types of Errors
4. Consequences of Model Specification Errors
5. Tests of Specification Errors
6. Errors of Measurement
7. Nested vs Non-Nested Models
8. Tests of Non-Nested Hypotheses
9. Model Selection Criteria
10. Summary


1. Learning Outcomes
After studying this module, you shall be able to
• know the different types of errors
• understand the consequences of specification errors
• detect these errors using appropriate tests
• apply remedies to correct these errors
• establish comparisons between different models.

2. Introduction
The classical linear regression model assumes that the regression model under consideration is "correctly specified". If this assumption is not satisfied, there will be a problem of model specification bias or model specification error. It is important to search for the correct model for estimation; otherwise the implications drawn would be spurious. In this module, we explore in considerable detail the different types of errors and their consequences for regression models. The following sections discuss the tests to detect these errors and the different approaches to correcting them. Towards the end, model selection criteria are discussed so as to finally arrive at the correct model and render the estimation of an empirical model meaningful.

3. Types of Errors
To start with, let us consider a model based on a production function. Let Yt be the output produced, Xt the amount of labour used given the technology, and Zt the amount of capital used. The model becomes:
Yt = α + β Xt + γ Zt + ut (3.1)
There are different types of errors that can occur in the estimation of this model. These can be attributed to the following:
1. Specification error with respect to the choice of variables
2. The functional form of the model
3. The error structure of the model

We elaborate on each of these in turn.

3.1 Specification error with respect to choice of variables.

The specification error with respect to choice of the variable occurs when either one of the
variable from the true model is omitted or an irrelevant variable has been included in the model.

3.1.1 Omission of a relevant variable

Suppose that for some reason we forget to include the variable Zt in the model, and therefore our estimated model becomes:
Yt = α + β Xt + ut (3.2)

By estimating equation 3.2, we have committed an error of omission.

3.1.2 Inclusion of an irrelevant variable

Suppose the researcher believes that a variable Qt, denoting the hours labour spends talking, should be included in estimating the production function; the estimated model then becomes:
Yt = α + β Xt + γ Zt + δ Qt + ut (3.3)
By estimating 3.3, the researcher commits the error of including an irrelevant variable.

3.2 Functional form error


Suppose the researcher estimates the following model:
Yt = α + β Xt² + γ Zt² + ut (3.4)
By estimating equation 3.4, the researcher makes the error of using a wrong functional form for the model.

3.3 Error structure of the model

The possible estimation problems that result from the error structure can be listed as:
1. Measurement errors.
2. Violation of the assumptions imposed on the error term by the classical linear regression model.

3.3.1 Measurement errors

Consider a researcher who can obtain the values of the variables Yt, Xt and Zt only measured with some error, so that only proxies for each of these variables are available for estimation. The estimated model thus becomes:
Y*t = α + β X*t + γ Z*t + u*t (3.5)

where Y*t = Yt + et, X*t = Xt + wt and Z*t = Zt + vt.

In estimating equation 3.5, the researcher commits errors of measurement, giving rise to measurement error bias.

3.3.2 Violating the assumptions imposed on the error term by the classical linear regression model
Suppose the researcher estimates the following model:
Yt = Xt^β ut (3.6)
In equation 3.6, the stochastic error term ut enters the model multiplicatively. This can give rise to another specification error, resulting from an incorrect specification of ut. To estimate this model, a log transformation is applied to equation 3.6, and the error term becomes ln ut:
ln Yt = β ln Xt + ln ut (3.7)
In estimating 3.7, the entire model is transformed and may or may not satisfy the basic assumptions of the classical linear regression model.

To sum up, the different types of errors that can be encountered in the empirical investigation of a hypothesis are:

1. Omission of a relevant variable
2. Inclusion of an irrelevant variable
3. Use of a wrong functional form
4. Errors of measurement
5. CLRM assumptions not being satisfied by the error structure of the model.

When we begin estimation, we start from a theoretical hypothesis on the basis of which we develop a model. The first four types of errors in the list occur when we know the correct model but somehow do not estimate it; they are therefore called model specification errors. In the next section, we evaluate the consequences of committing model specification errors.

4. Consequences of Model Specification Errors

Having understood the different types of errors, let us examine the consequences of committing them. To keep the discussion simple, we consider a three-variable model, and in this section we consider the consequences of errors with respect to the choice of variables.

4.1 Consequences of omission of a relevant variable

Suppose the true model is given by:
Yt = α1 + α2 Xt + α3 Zt + ut (4.1)
However, for some reason the researcher estimates the following model:
Yt = β1 + β2 Xt + vt (4.2)
The consequences of using equation 4.2 for estimation, omitting Zt, are as follows:
1. If an independent variable whose regression coefficient is non-zero is omitted from the model, the estimated values of all the other regression coefficients will be biased and inconsistent unless the excluded variable is uncorrelated with every included variable. This means that E(β̂1) ≠ α1 and E(β̂2) ≠ α2, and the bias does not disappear as the sample size becomes larger.

2. The estimated constant term is generally biased, implying that forecasts made from the equation will also be biased. This is true even when the excluded variable is uncorrelated with every included variable.

3. The variance of the stochastic error term is incorrectly estimated.

4. The estimated variances of the regression coefficients of the included variables will generally be biased, implying that the usual confidence intervals and tests of hypothesis are invalid.

4.1.1 Proof of omitted variable bias for the included variable

From estimating equation 4.2, it follows that

vt = α3 Zt + ut
E(vt) = α3 Zt                      [because E(ut) = 0]
Cov(Xt, vt) = Cov(Xt, α3 Zt + ut)
            = α3 Cov(Xt, Zt) + Cov(Xt, ut)
            = α3 Cov(Xt, Zt)       [because Cov(Xt, ut) = 0]
            ≠ 0

E(β̂2) = α2 + α3 b32

where b32 is the slope in the regression of the excluded variable Zt on the included variable Xt. Hence, the coefficient of Xt will be biased and inconsistent unless Xt is uncorrelated with Zt or α3 is zero. The extent of the bias depends on the magnitude of the term α3 b32, and the direction of the bias depends on the sign of α3 and the sign of b32.
Proof of bias in the constant term:

β̂1 = Ȳ − β̂2 X̄
E(β̂1) = E(Ȳ) − E(β̂2) X̄

Since Ȳ = α1 + α2 X̄ + α3 Z̄ + ū, we have E(Ȳ) = α1 + α2 X̄ + α3 Z̄. Therefore

E(β̂1) = α1 + α2 X̄ + α3 Z̄ − (α2 + α3 b32) X̄
       = α1 + α3 (Z̄ − b32 X̄)

For the constant term to be unbiased we would need Z̄ − b32 X̄ = 0. Thus, even if the omitted variable is uncorrelated with the included variable (b32 = 0), the constant term continues to be biased as long as Z̄ ≠ 0. This also shows one reason why a constant term should always be included in the model.

As we can see, the consequences of omitting a relevant variable can be serious. Hence, if the underlying theory suggests the inclusion of a variable, it should never be dropped from the empirical estimation of the model.
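The result E(β̂2) = α2 + α3 b32 can be checked with a small Monte Carlo experiment. The sketch below uses hypothetical parameter values (α2 = 2, α3 = 1.5 and a Z-on-X slope of b32 = 0.6) and repeatedly estimates the misspecified model that omits Z:

```python
import numpy as np

rng = np.random.default_rng(2)
a1, a2, a3 = 1.0, 2.0, 1.5            # true parameters of Y = a1 + a2*X + a3*Z + u
n, reps = 200, 2000
b2_estimates = []

for _ in range(reps):
    X = rng.normal(size=n)
    Z = 0.6 * X + rng.normal(size=n)  # Z correlated with X; slope of Z on X is 0.6
    Y = a1 + a2 * X + a3 * Z + rng.normal(size=n)
    # OLS slope of the misspecified model Y = b1 + b2*X + v
    x = X - X.mean()
    b2_estimates.append(np.sum(x * Y) / np.sum(x**2))

print(f"Mean of b2-hat over replications: {np.mean(b2_estimates):.3f}")
print(f"Theoretical value a2 + a3*b32:    {a2 + a3 * 0.6:.3f}")   # = 2.9, not 2.0
```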

4.2 Consequences of including an irrelevant variable

Suppose the true model is given by:

Yt = β1 + β2 Xt + vt (4.3)

However, for some reason the researcher estimates the following model:

Yt = α1 + α2 Xt + α3 Zt + ut (4.4)

By estimating equation 4.4, the researcher commits the error of including an irrelevant variable in the model. The consequences of such an error are:
1. If an independent variable whose true regression coefficient is zero is included in the model, the estimated values of the other regression coefficients will still be unbiased and consistent.
2. The estimated variances of the regression coefficients will also be unbiased, and hence tests of hypothesis remain valid.
3. The error variance is correctly estimated.
4. The variances of the regression coefficients will be higher than they would be without the irrelevant variable; the coefficients therefore become inefficient, with larger variances.

Caution: Comparing the consequences of the two types of errors with respect to the choice of variables, it appears that including an irrelevant variable has the less serious consequences. It might therefore seem tempting to include more variables in the model so as to avoid omitting any relevant one. This is not the right conclusion: including more variables leads to a loss of efficiency in the estimates and may also create a problem of multicollinearity, along with lost degrees of freedom.

5. Tests of Specification Errors

As Davidson notes, "Because of the non-experimental nature of economics, we are never sure how the observed data were generated. The test of any hypothesis in economics always turns out to depend on additional assumptions necessary to specify a reasonably parsimonious model, which may or may not be justified." Thus, it is in the very nature of empirical data to lead to specification errors of one type or another, given a theory. It therefore becomes important to devise methods to detect such errors in the model; this section is devoted to that purpose.

5.1 Ramsey's RESET Test

Ramsey proposed a general test of specification error called RESET (Regression Specification Error Test). The intuition behind the test is that an omitted influence may show up through the fitted values of the dependent variable: if the fitted values are related to the dependent variable beyond what the original regressors explain, so that including them raises the R² of the model, it is logical to include the fitted values in the model in some form. Suppose we consider the model:

Yt = α + β Xt + ut (5.1)

The steps involved in the Ramsey RESET test are as follows:

1. Estimate the model given by equation 5.1 and obtain the fitted values of the dependent variable, Ŷt.
2. Regress Yt on Xt with Ŷt² and Ŷt³ as additional regressors:

Yt = β1 + β2 Xt + β3 Ŷt² + β4 Ŷt³ + ut (5.2)

3. Save the R² obtained from both models and call them R²old and R²new, respectively.
4. Perform an F test using the formula:

Fc = [(R²new − R²old) / number of new regressors] / [(1 − R²new) / (n − number of parameters in the new model)]

5. If the computed F statistic (Fc) is significant, say at the 5 percent level of significance, we reject the hypothesis that model 5.1 is correctly specified; otherwise we do not reject it.

One advantage of the RESET test is that it is easy to apply; at the same time, it only tells us that the model is not correctly specified and does not help in choosing a better alternative.
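The steps of the RESET test can be sketched as follows; the data are simulated from a deliberately nonlinear model so that the test should signal misspecification (statsmodels is assumed to be available):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
X = rng.uniform(1, 10, size=100)
Y = 2 + 0.5 * X**2 + rng.normal(size=100)       # true relationship is quadratic

# Step 1: estimate the (misspecified) linear model and save fitted values and R^2
old = sm.OLS(Y, sm.add_constant(X)).fit()
yhat = old.fittedvalues

# Step 2: re-estimate with yhat^2 and yhat^3 as additional regressors
Z = sm.add_constant(np.column_stack([X, yhat**2, yhat**3]))
new = sm.OLS(Y, Z).fit()

# Steps 3-4: F statistic from the change in R^2 (2 new regressors, 4 parameters)
n, m, k = len(Y), 2, 4
F = ((new.rsquared - old.rsquared) / m) / ((1 - new.rsquared) / (n - k))
print(f"RESET F statistic: {F:.2f}")            # a large F signals misspecification
```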

5.2 Lagrange Multiplier (LM) Test for Adding Variables

The LM test is used when the researcher wants to know whether additional variables should be included in the model. Consider the following restricted (R) and unrestricted (UR) models:

(R)  Yt = β1 + β2 X2t + β3 X3t + β4 X4t + … + βm Xmt + ut

(UR) Yt = β1 + β2 X2t + β3 X3t + β4 X4t + … + βm Xmt + βm+1 X(m+1)t + … + βk Xkt + vt

In model UR there are (k − m) new variables, i.e. Xm+1, Xm+2, …, Xk. The steps of the LM test are as follows:
1. State the null and alternative hypotheses:
H0: βm+1 = βm+2 = … = βk = 0
H1: At least one of them is non-zero.
2. Estimate the restricted model R and obtain its estimated residuals:

ûR = Yt − (β̂1 + β̂2 X2t + β̂3 X3t + β̂4 X4t + … + β̂m Xmt)

If UR is the correct model and the variables Xm+1, Xm+2, …, Xk should have been included, then in the estimation of model R their effect must have been captured in the residuals, so the residuals must be related to these omitted variables.
3. Regress ûR on a constant and all the X's, including the ones in the restricted model, i.e. on all the independent variables of the unrestricted model. This is called the auxiliary regression.
4. Calculate the LM statistic as:

LM = nR² ∼ χ²(k−m)

This LM statistic follows the chi-square distribution with degrees of freedom equal to the number of restrictions in the null hypothesis, where R² is taken from the auxiliary regression.
5. If nR² > χ²(k−m)(α), we reject the null hypothesis and conclude that at least one of the added variables has a non-zero coefficient, i.e. the added variables belong in the model.
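A minimal sketch of the LM test, assuming statsmodels and scipy are available; here the restricted model wrongly omits one regressor (X4), so the LM statistic should exceed the chi-square critical value:

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(4)
n = 120
X2, X3, X4 = rng.normal(size=(3, n))
Y = 1 + 0.8 * X2 + 0.5 * X3 + 0.4 * X4 + rng.normal(size=n)

# Step 2: estimate the restricted model (X4 omitted) and save the residuals
res_r = sm.OLS(Y, sm.add_constant(np.column_stack([X2, X3]))).fit()
u_r = res_r.resid

# Step 3: auxiliary regression of the residuals on ALL the regressors
aux = sm.OLS(u_r, sm.add_constant(np.column_stack([X2, X3, X4]))).fit()

# Steps 4-5: LM = n * R^2, chi-square with (k - m) = 1 degree of freedom
LM = n * aux.rsquared
crit = stats.chi2.ppf(0.95, df=1)
print(f"LM = {LM:.2f}, 5% critical value = {crit:.2f}")
# Rejecting H0 means the added variable belongs in the model
```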

6. Errors of Measurement
Theoretical arguments help us build an empirical model. Given the set of dependent and independent variables in the empirical model, the data for these variables have to be collected from the real world. We assume that the available data are accurate; however, this may not always be true. The data may suffer from various kinds of errors, including non-response errors, computing errors and reporting errors, and may therefore contain errors of measurement. The consequences of such errors are discussed in this section.

6.1 Errors of Measurement in the Dependent Variable Y

Consider the model

Y*t = α + β Xt + ut (6.1)

In this model let Y* be the variable measured with error, such that

Yt = Y*t + et (6.2)

where et denotes the error of measurement in Y*t. Using 6.2, we estimate the following equation:

Yt = (α + β Xt + ut) + et
Yt = α + β Xt + (ut + et)
Yt = α + β Xt + vt (6.3)

where vt = (ut + et) is a composite error term containing the population disturbance term (ut) and the measurement error term (et).

Assume that E(ut) = E(et) = 0, Cov(Xt, ut) = 0 and, for simplicity, Cov(Xt, et) = 0 (i.e. the errors of measurement in Y*t are uncorrelated with Xt), and Cov(ut, et) = 0 (i.e. the measurement error and the population disturbance term are uncorrelated). Given these assumptions, the β estimated from equation 6.3 will be an unbiased estimator of the true β. Hence, errors of measurement in the dependent variable do not harm the unbiasedness property of the OLS estimators. However, the variances and standard errors estimated from this equation will differ from those of the error-free model.

From estimating equation 6.1:  var(β̂) = σ²u / Σxt²

From estimating equation 6.3:  var(β̂) = σ²v / Σxt² = (σ²u + σ²e) / Σxt²

As can be seen, the estimated variance of β̂ is larger in the second case than in the first. This is one consequence of errors of measurement in the dependent variable.

6.2 Errors of Measurement in the Explanatory Variable X

Consider the model

Yt = α + β X*t + ut (6.4)

In this model let X* be the variable measured with error, such that

Xt = X*t + wt (6.5)

where wt denotes the error of measurement in X*t. Using 6.5, we estimate the following equation:

Yt = α + β (Xt − wt) + ut
Yt = α + β Xt + (ut − β wt)
Yt = α + β Xt + zt (6.6)


where zt = (ut − β wt) is a composite error term containing the population disturbance term (ut) and the measurement error term (wt).

Making the standard assumptions for the error terms, namely E(wt) = 0, E(ut) = 0, Cov(Xt, ut) = 0, Cov(wt, wt−1) = 0 and Cov(ut, wt) = 0, can we also assume that the composite error term is independent of Xt (assuming E(zt) = 0)? The answer is no. Let us see why:

Cov(zt, Xt) = E[zt − E(zt)][Xt − E(Xt)]
            = E[(ut − β wt)][X*t + wt − E(X*t + wt)]
            = E[(ut − β wt)(wt)]        [because E(wt) = 0 and E(X*t) = X*t]
            = E[ut wt] − E[β wt²]
            = −β σ²w
            ≠ 0

This shows that the explanatory variable and the error term are correlated, which violates the crucial CLRM assumption that the explanatory variable is uncorrelated with the stochastic disturbance term. When this assumption is violated, it can be shown that the OLS estimators are biased and inconsistent.

In conclusion, measurement errors pose a serious problem when they are present in the explanatory variable, because they lead to inconsistent estimation. If the errors are present only in the dependent variable, the estimators remain unbiased and consistent. As a solution to the problem of measurement error in an explanatory variable, we can find a proxy variable that is highly correlated with the X variable but uncorrelated with both error terms.

6.3 Incorrect Estimation of the Stochastic Error Term

The error term ut is not directly observable; it is a stochastic variable, and it is therefore not easy to determine in what form it should enter the model. For example, suppose the true model is given by the equation

Yt = β Xt ut (6.7)

where ut is such that ln ut satisfies the CLRM assumptions. However, the researcher estimates the following model instead:

Yt = α Xt + ut

What will be the consequences of such an estimation? Given that ln ut satisfies the OLS assumptions, it must be true that ln ut ∼ N(0, σ²). Then ut follows the lognormal distribution, with mean e^(σ²/2) and variance e^(σ²)(e^(σ²) − 1). As a result,

E(α̂) = β e^(σ²/2)

where e is the base of the natural logarithm.

Proof:

E(α̂) = E(Σ Xt Yt / Σ Xt²) = E(β Σ Xt² ut / Σ Xt²) = β (Σ Xt² E(ut)) / Σ Xt² = β e^(σ²/2)

(because the X's are non-stochastic and ut has expected value e^(σ²/2)).

Hence, α̂ is a biased estimator of β.
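This bias is easy to verify by simulation. The sketch below, with hypothetical values β = 2 and σ = 0.8, repeatedly draws the multiplicative-error model and averages the OLS estimate of α; the average settles near β·e^(σ²/2) ≈ 2.75 rather than β = 2:

```python
import numpy as np

rng = np.random.default_rng(5)
beta, sigma = 2.0, 0.8
n, reps = 500, 2000
alpha_hats = []

for _ in range(reps):
    X = rng.uniform(1, 5, size=n)
    u = np.exp(rng.normal(0, sigma, size=n))         # ln u ~ N(0, sigma^2)
    Y = beta * X * u                                 # multiplicative-error model
    alpha_hats.append(np.sum(X * Y) / np.sum(X**2))  # OLS through the origin

print(f"Mean of alpha-hat:      {np.mean(alpha_hats):.3f}")
print(f"beta * e^(sigma^2 / 2): {beta * np.exp(sigma**2 / 2):.3f}")
```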

7. Nested vs Non-Nested Models

Nested models are those where one model can be derived from the other by imposing certain restrictions, which can be checked with tests of hypothesis. If such a derivation is not possible, the models are non-nested. As an example, consider the following five models:

Model A: Yt = β1 + β2 X2t + β3 X3t + β4 X4t + β5 X5t + ut
Model B: Yt = β1 + β2 X2t + β3 X3t + ut
Model C: Yt = α1 + α2 X2t + α3 X3t + ut
Model D: Yt = β1 + β2 Z2t + β3 Z3t + vt
Model E: Yt = β1 + β2 lnZ2t + β3 lnZ3t + wt

Here Models A and B are nested: Model A reduces to Model B when β4 = β5 = 0, and we can test the hypothesis that the coefficients of X4 and X5 are zero with a t or F test. In this sense, specification error tests are essentially tests of nested hypotheses.

Model C and Model D are non-nested because neither can be derived as a special case of the other, since X and Z are different variables. Even if X3 were added to Model D and Z3 were included in Model C, the models would still be non-nested, because Model C does not contain Z2 and Model D does not contain X2. Likewise, Model E and Model D are non-nested, as one cannot be derived from the other.

8. Tests of Non-Nested Hypotheses

In this section, we discuss the tests devised for non-nested models, namely the non-nested F test and the Davidson-MacKinnon J test.

8.1 Non-nested F test

Given Model C and Model D, how can one choose between the two? Consider another model, Model F:

Model F: Yt = λ1 + λ2 X2t + λ3 X3t + λ4 Z2t + λ5 Z3t + ut

Both Model C and Model D can be derived from Model F, although C is not nested in D or vice versa. The non-nested F test uses Model F to find out whether Model C or Model D is correct: an F test of λ2 = λ3 = 0 tests whether Model D is correct, and an F test of λ4 = λ5 = 0 tests whether Model C is correct.

Problems with the non-nested F test:

1. The X and Z variables may be highly correlated, causing a problem of multicollinearity. As a consequence, one or more of the λ's will be individually statistically insignificant, and it will not be possible to find out whether Model C or Model D is correct.
2. Suppose we start with Model C as the reference hypothesis and find the coefficients of both X variables to be significant. We then add the Z variables to the model and apply an F test to find out whether their addition makes a significant contribution; suppose it does not, so we choose Model C as the correct model to explain Yt. If, however, we begin with Model D and repeat the same procedure, we may find Model D to be the correct model. Hence the choice of reference hypothesis can determine the outcome of the model choice, making the entire empirical investigation questionable.

8.2 Davidson-MacKinnon J test

The steps involved in this test are as follows:

1. Estimate Model C and obtain the fitted values, ŶC.
2. Add the predicted value of Y from Model C to Model D as an additional regressor:

Yt = β1 + β2 Z2t + β3 Z3t + β4 ŶC + vt

3. Conduct a t test on the coefficient of the predicted value of Y from Model C.
4. If the hypothesis that β4 = 0 is not rejected, we can accept Model D as the correct model, since Model C then has no additional explanatory power beyond that of Model D.
5. Now reverse the two models and repeat steps 1 to 4 to decide whether to accept Model C over Model D. That is, estimate

Yt = α1 + α2 X2t + α3 X3t + α4 ŶD + ut

and test α4 = 0. If this hypothesis is not rejected, we choose Model C over Model D; if it is rejected, Model D is chosen.

Problems: It is possible to arrive at a conclusion where both models are accepted, or both models are rejected, in which case it will not be possible to choose between them.
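A sketch of the J test on simulated data (statsmodels assumed available); here the data are generated from Model C, so the fitted values from C should be significant when added to Model D, but not the other way round:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 100
X2, X3 = rng.normal(size=(2, n))
Z2, Z3 = rng.normal(size=(2, n))
Y = 1 + 0.9 * X2 + 0.6 * X3 + rng.normal(size=n)   # Model C is the true model

# Steps 1-3: fitted values of Model C, added to Model D, then a t test
yC = sm.OLS(Y, sm.add_constant(np.column_stack([X2, X3]))).fit().fittedvalues
resD = sm.OLS(Y, sm.add_constant(np.column_stack([Z2, Z3, yC]))).fit()
print("t on fitted Y from C in Model D:", resD.tvalues[-1])

# Step 5: reverse the roles of the two models
yD = sm.OLS(Y, sm.add_constant(np.column_stack([Z2, Z3]))).fit().fittedvalues
resC = sm.OLS(Y, sm.add_constant(np.column_stack([X2, X3, yD]))).fit()
print("t on fitted Y from D in Model C:", resC.tvalues[-1])
```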


9. Model Selection Criteria

The concern of a researcher is how to identify, from a given set of models, the correct model for empirical research. Such research is often intended for forecasting, so it is important to find a correct model in order that the forecasts made on the basis of the selected model are as accurate as possible. The researcher may have two types of forecasting in mind: in-sample forecasting or out-of-sample forecasting. In-sample forecasting tells us how well the chosen model fits the data in a given sample, while out-of-sample forecasting determines how well a fitted model forecasts future values of the regressand, given the values of the regressors.

Developing a set of criteria to choose a model aims at minimising the forecast error. For this reason, most selection criteria aim to minimise the residual sum of squares while imposing a penalty for including a large number of regressors. Thus there is a trade-off between goodness of fit (improved by reducing the residual sum of squares) and the number of additional regressors (discouraged by the penalty). The different model selection criteria are:
1. R² criterion
2. Adjusted R²
3. Akaike's Information Criterion (AIC)
4. Schwarz's Information Criterion (SIC)

Let us discuss these criteria in some more detail.

9.1 R² criterion

As is well known, R² measures the goodness of fit of a regression model. It is calculated as:

R² = ESS / TSS = 1 − RSS / TSS

where ESS denotes the explained sum of squares, TSS the total sum of squares and RSS the residual sum of squares. The value of R² lies between 0 and 1 and can never be negative. It shows how well the data fit a particular model and therefore measures the goodness of the model: the higher the value of R², i.e. the closer it is to 1, the better the fit and the better the model.

Problems
1. R² is a measure of in-sample fit: it tells us how well the estimated Y values fit the actual sample data, but it is of no help in out-of-sample forecasting.
2. A restriction on comparing two models using R² is that the dependent variable must be the same, which limits the use of this criterion to models with the same regressand.
3. Adding more variables to the model can never decrease, and typically increases, R². It is therefore tempting to add variables to improve the apparent goodness of fit, since there is no penalty for doing so; but more regressors may also increase the variance of the forecast error.

9.2 Adjusted R² criterion

While the R² criterion imposes no penalty for adding more variables, the adjusted R² does. It is defined as:

R̄² = 1 − [RSS/(n − k)] / [TSS/(n − 1)] = 1 − (1 − R²)(n − 1)/(n − k)

From the definition itself, R̄² ≤ R², which shows that the adjusted R² penalises the addition of more variables to the model. The adjusted R² will increase only if the absolute t value of the added regressor is greater than 1.

Hence, R̄² is a better measure than R². But, like R², it requires the dependent variable to be the same to enable comparison between models.

9.3 Akaike's Information Criterion (AIC)

The idea of penalising the addition of regressors is carried further by the AIC criterion, which is defined as:

AIC = e^(2k/n) · Σût² / n = e^(2k/n) · RSS / n

where k is the number of regressors and n is the number of observations. For mathematical convenience, the AIC criterion can be rewritten as

ln AIC = (2k/n) + ln(RSS/n)

where ln AIC is the natural log of AIC and 2k/n is the penalty factor.

Advantages:
1. It imposes a harsher penalty than the adjusted R² for adding more regressors.
2. There is no restriction that the dependent variable be the same when models are compared using AIC. In comparing models, the model with the lower value of AIC is preferred.
3. AIC can be used to compare both the in-sample and out-of-sample forecasting performance of regression models.
4. It is useful for both nested and non-nested models.

9.4 Schwarz's Information Criterion (SIC)

The SIC criterion is defined as:

SIC = n^(k/n) · Σût² / n = n^(k/n) · RSS / n

or in log form:

ln SIC = (k/n) ln n + ln(RSS/n)

where (k/n) ln n is the penalty factor, which is even harsher than that of AIC. The lower the value of SIC, the better the model. SIC has the same advantages as AIC while imposing a harsher penalty.
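The log forms of AIC and SIC are easy to compare directly; the sketch below evaluates both criteria for two hypothetical fitted models with made-up residual sums of squares and regressor counts:

```python
import numpy as np

def ln_aic(rss, n, k):
    """ln AIC = (2k/n) + ln(RSS/n), as defined above."""
    return 2 * k / n + np.log(rss / n)

def ln_sic(rss, n, k):
    """ln SIC = (k/n) ln n + ln(RSS/n); a harsher penalty than AIC."""
    return (k / n) * np.log(n) + np.log(rss / n)

# Two hypothetical models fitted to the same data set of n = 50 observations
n = 50
candidates = [("small model", 120.0, 3),   # fewer regressors, larger RSS
              ("big model", 110.0, 6)]     # more regressors, smaller RSS

for name, rss, k in candidates:
    print(f"{name}: ln AIC = {ln_aic(rss, n, k):.4f}, ln SIC = {ln_sic(rss, n, k):.4f}")
# The model with the lower AIC / SIC value is preferred.
```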

10. Summary

• The classical linear regression model assumes that the model is correctly specified. This assumption may not always hold, and errors may exist in the model; these may be specification errors or mis-specification errors.
• Where specification errors exist, they can have serious consequences for the model. The different types of specification errors include omitting a relevant variable, including an irrelevant variable, and errors in the error structure of the model.
• It is important to understand the consequences of these specification errors.
• Having understood the consequences, the model should be checked for these errors with the help of tests such as Ramsey's RESET test and the LM test.
• If errors are found in the model, appropriate approaches should be used to correct them so as to arrive at the correct model for estimation.
• Evaluation of models should be based on model selection criteria such as R², adjusted R², AIC and SIC.


Subject BUSINESS ECONOMICS

Paper No and Title 8, Fundamentals of Econometrics

Module No and Title 13, Multicollinearity

Module Tag BSE_P8_M13


TABLE OF CONTENTS
1. Learning Outcomes
2. Introduction
3. What is Multicollinearity
   3.1 Exact Multicollinearity
   3.2 Near Multicollinearity
   3.3 Multicollinearity - Violation of the Assumption of the Classical Regression Model
4. Consequences of Multicollinearity
   4.1 Consequence of Exact Multicollinearity
   4.2 Consequence of Near Multicollinearity
5. Detection of Multicollinearity
   5.1 High value of covariance between estimated regression coefficients
   5.2 High value of R² and few significant values of t-statistics
   5.3 High values of correlation coefficients
   5.4 Auxiliary Regression
   5.5 Variance Inflation Factor
6. Remedies for Multicollinearity
   6.1 Ignore
   6.2 Drop Variables
   6.3 Obtain Extraneous Data
   6.4 Increase Sample Size
   6.5 Transform the variables
   6.6 Principal Component Analysis
7. Summary


1. Learning Outcomes
After studying this module you shall be able to
 Understand what happens to regression analysis when multicollinearity is present in the
regression model.
 Understand what multicollinearity is and what its consequences on regression estimates
are.
 Learn about tests for detecting the presence of multicollinearity.
 Learn about remedies for carrying out regression analysis in the presence of
multicollinearity.

2. Introduction
Suppose a researcher trying to explain the consumption expenditure (y) of a group of individuals uses both their current income (x1) and their accumulated wealth (x2) as explanatory variables. It is possible that, on running the regression, the coefficient for income turns out to be negative! Or the t-tests for the regression coefficients of income and wealth might suggest that neither of the two predictors is significantly associated with consumption expenditure, while the F-test indicates that the model is useful for predicting consumption expenditure.

Or, for instance, in a model built for analyzing wage differentials between male and female workers, if dummy variables for both males and females are included in the regression (along with an intercept), the computer software will not be able to estimate the regression coefficients. These types of problems arise due to the presence of multicollinearity.

3. What is Multicollinearity
Multicollinearity is a statistical phenomenon and is said to be present when an independent
variable is a linear combination of some or all of the other independent variables in the model. An
example of multicollinear variables could be GDP, money supply and prices. Another example
could be body mass index and heights of children in a given age group.

3.1 Exact Multicollinearity

If two or more explanatory variables have an exact linear relationship between them, then exact or perfect multicollinearity is said to be present. For a k-variable regression model involving the regressors X1, X2, …, Xk (where X1 = 1 to allow for the intercept term), an exact linear relationship is said to exist if the following condition is satisfied:

a1 X1 + a2 X2 + … + ak Xk = 0 (1)

where the constants a1, a2, …, ak are not all zero simultaneously.

The example above, pertaining to dummy variables, suffers from exact multicollinearity. Let Mi be the dummy variable for males, such that Mi = 1 when the worker is male and Mi = 0 when the worker is female. Similarly, let Fi be the dummy variable for females, such that Fi = 1 when the worker is female and Fi = 0 when the worker is male. Obviously, Mi + Fi = 1. So estimating a regression model that contains both dummy variables Mi and Fi as explanatory variables runs into the problem of exact multicollinearity.

3.2 Near Multicollinearity

When the linear relationships among explanatory variables are approximate, then near
multicollinearity is said to be present in the model. This can be represented as:
a1 X1 + a2 X2 +….,+ akXk + vi = 0 (2)

where, vi is a stochastic error term.

The consumption function example above suffers from near multicollinearity: current income and wealth measure similar attributes of individuals' ability to undertake consumption expenditure.

3.3 Multicollinearity - Violation of the Assumption of the Classical Regression Model

One of the assumptions of a classical linear regression model is that there are no exact linear
relationships between two or more explanatory variables and that the number of observations is at
least as many as the number of independent variables. Presence of multicollinearity leads to a
violation of this assumption and has consequences related to uniqueness, statistical significance
and precision of the OLS estimators. The presence of perfect multicollinearity makes it
impossible to compute the OLS estimates of the regression model.

The presence of multicollinearity is a problem in regression analysis because any particular


regression coefficient measures the partial effect of that particular variable on the dependent
variable, holding all other independent variables at a given level. However, when two
independent variables vary together, then it is not possible to isolate the partial effect of a single
independent variable on the dependent variable ceteris paribus.

Note that multicollinearity does not depend on any theoretical linear relationship between the
explanatory variables. It depends on the presence of linear relationships between the regressors in
the data set. So, multicollinearity is a sample phenomenon.

There may be several reasons for the presence of multicollinearity in the data set, for example, the
explanatory variables may all have a common underlying time trend, one of the regressors may be
a lagged value of another regressor, or the data may not have been collected from a wide base so
that the independent variables may tend to vary together.

4. Consequences of Multicollinearity

4.1 Consequence of Exact Multicollinearity



When two or more regressors in a regression model are exactly linearly related, the regression coefficients cannot be estimated. Suppose we have hypothetical data in which X3i = 2X2i for every observation. Using

β̂2 = [(Σ yi x2i)(Σ x3i²) − (Σ yi x3i)(Σ x2i x3i)] / [(Σ x2i²)(Σ x3i²) − (Σ x2i x3i)²]

where yi, x2i and x3i are the deviations of Yi, X2i and X3i from their respective mean values, and substituting x3i = 2x2i, we can easily see that β̂2 = 0/0, which is indeterminate.

4.2 Consequence of Near Multicollinearity

When multicollinearity is not perfect, the model can still be estimated, though with some consequences.

4.2.1 Effect on the properties of estimators


 The OLS estimators are BLUE even in the presence of multicollinearity, i.e. they are still
unbiased, consistent and efficient.
 The OLS estimators and their standard errors can be sensitive to small changes in data,
making them less precise.
 At times, the signs of the regression coefficients may be counter-intuitive due to the presence of multicollinearity.
 The t ratios of one or more coefficients may be statistically insignificant, but the R² of the model could still be very high.

4.2.2 Effect on Standard Errors of Regression coefficients

The numerical values of the standard errors of the regression coefficients get amplified, making the t ratios of one or more coefficients small and thus possibly statistically insignificant, though the hypothesis tests themselves remain valid.

In the regression model

Yi = β1 + β2 X2i + β3 X3i + ui (3)

the variance of the regression coefficient β̂2 is given by equation (4) below:

var(β̂2) = σ² / [Σ x2i² (1 − r23²)] (4)

where x2i = X2i − X̄2, r23² is the squared coefficient of correlation between X2 and X3, and σ² is the variance of the error term (estimated from the residuals ei, with n observations and k = 2 regressors excluding the intercept).

When X2 and X3 are uncorrelated, r23² = 0; but when multicollinearity is present, r23² becomes high and, as a result, the variance of the regression parameter given in equation (4) becomes very high.

Consequently,
 The t ratios of the coefficients become small and confidence intervals become large, tending to make the coefficients statistically insignificant and less precise.
 The regression coefficients and their standard errors can thus be very sensitive to small changes in the data in the presence of multicollinearity.
 Since the OLS estimates are unbiased and consistent, the forecasts based on them are unbiased and the confidence intervals are also valid.

4.2.3 Effect on Covariance between Regression Coefficients

The covariance between the regression coefficients will be very high in the presence of
multicollinearity.
In the regression model given by equation (3), the covariance between β̂2 and β̂3 is given by

Cov(β̂2, β̂3) = −r23 σ² / [(1 − r23²) √(Σ x2i² Σ x3i²)] (5)

where x2i and x3i are defined as before, r23 is the sample correlation coefficient between X2 and X3, and σ² is the variance of the error term. When multicollinearity is present, r23 takes a value close to 1, and as a result the numerical value of the covariance between the estimated regression coefficients becomes very large.


Consequently, the interpretation of the regression coefficients becomes quite difficult. β̂2, the regression coefficient of X2, is supposed to measure the change in the dependent variable Y due to a change in X2, controlling for all other variables. However, since X2 and X3 are correlated, any change in X2 is also likely to bring a change in X3, so β̂2 will not only capture the change in Y due to X2 but will also contain some effect of the change in X3. This implies that if the estimated parameter β̂2 overestimates the true parameter β2, then the other estimated parameter β̂3 is likely to underestimate the true parameter β3, and vice versa.

As a result,
• It becomes difficult to interpret the partial effect of individual regressors on the dependent variable.
• This, at times, makes model selection difficult.

4.2.4 Effect on Hypothesis Testing

The tests of hypotheses are still valid, but since the t ratios of the regression coefficients are small, the confidence intervals tend to be much wider, making the tests less powerful. There is a greater chance of a regression coefficient being found statistically insignificant, although the F statistic may still be significant.

5. Detection of Multicollinearity
Multicollinearity is a sample phenomenon and not a problem with the model, so there are some
informal tests rather than formal tests for detecting multicollinearity.

5.1 High value of covariance between estimated regression coefficients

If a pair of regressors Xk and Xj are correlated with each other, the covariance between the
estimated regression coefficients as given in equation (5) is going to be high.

5.2 High value of R² and few significant values of t-statistics

When we run the OLS regression, if multicollinearity is present, the results will show individual regression coefficients as statistically insignificant even though the value of R² is very high and the Wald F-statistic for the joint significance of the regression coefficients is also highly significant.

5.3 High values of correlation coefficients

In the presence of multicollinearity, the pair-wise correlation coefficients between the regressors are high. While carrying out empirical exercises, however, it should be kept in mind that in models involving only two independent variables the pair-wise correlation coefficients are sufficient to test for multicollinearity, but in models involving more than two regressors this is not a very satisfactory test, because the explanatory variables can be redefined in a number of ways and we may get drastically different measures of correlation.


5.4 Auxiliary Regression

If multicollinearity is suspected, then in order to find out which of the regressors are linearly
related, one may run auxiliary regressions and perform F tests.

Steps in auxiliary regression:

• Step 1: Regress each independent variable on the rest of the independent variables.
• Step 2: Compute the corresponding multiple coefficient of determination, R²k, i.e. the R² from the regression of Xk on the remaining X variables.
• Step 3: Perform the following F test to check for collinearity between the variables:

F = [R²k / (k − 2)] / [(1 − R²k) / (n − k + 1)]

The degrees of freedom of the F statistic are k − 2 and n − k + 1, where n is the number of observations and k is the number of explanatory variables including the intercept.
• Step 4: If the computed value of F exceeds the critical value of F at the selected level of significance, this indicates that Xk is collinear with the other X's.

Klein’s Rule of Thumb

Instead of carrying out the above-mentioned F test, one can use Klein's Rule of Thumb, according to which multicollinearity is a serious problem only if the R²ₖ from an auxiliary regression is greater than the overall R² of the main regression.
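To make the procedure concrete, here is a minimal Python sketch of Steps 1-3 using statsmodels; the data, the variable names and the collinearity between x2 and x3 are simulated and purely hypothetical:

```python
# Auxiliary-regression F test: a minimal sketch on simulated data.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
n = 100
x2 = rng.normal(size=n)
x3 = 0.9 * x2 + 0.1 * rng.normal(size=n)   # x3 nearly collinear with x2
x4 = rng.normal(size=n)
k = 4                                      # parameters in the main model, incl. intercept

# Step 1: regress x3 on the remaining independent variables
aux = sm.OLS(x3, sm.add_constant(np.column_stack([x2, x4]))).fit()
R2_aux = aux.rsquared                      # Step 2: auxiliary R-squared

# Step 3: F statistic per the module's formula, df = (k-2, n-k+1)
F = (R2_aux / (k - 2)) / ((1 - R2_aux) / (n - k + 1))
p = stats.f.sf(F, k - 2, n - k + 1)
print(f"auxiliary R^2 = {R2_aux:.4f}, F = {F:.2f}, p = {p:.4g}")
```

A large F (small p) flags x3 as collinear with the other regressors.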

5.5 Variance Inflation Factor

Variance Inflation Factor (VIF), defined below, can also be used as an indicator of the degree of multicollinearity:

VIF = 1 / (1 − R²ⱼ)

where R²ⱼ is the coefficient of determination in the regression of Xⱼ on the remaining regressors. The larger the value of VIF, the greater the degree of multicollinearity. As a rule of thumb, if the VIF of a variable exceeds 10, it is said to be highly collinear with the other variables.

Tolerance

The inverse of VIF is defined as tolerance (TOL):

TOL = 1 / VIF = 1 − R²ⱼ

When R²ⱼ = 1, i.e. there is perfect multicollinearity, TOL = 0, and when there is no collinearity,
TOL takes a value equal to 1.
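statsmodels ships a ready-made variance_inflation_factor helper. The following minimal sketch, on simulated (hypothetical) data, prints the VIF and TOL of each regressor:

```python
# VIF and TOL: a minimal sketch on simulated collinear data.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
n = 200
x2 = rng.normal(size=n)
x3 = 0.95 * x2 + 0.05 * rng.normal(size=n)   # strongly collinear with x2
x4 = rng.normal(size=n)
X = sm.add_constant(np.column_stack([x2, x3, x4]))

# VIF of each regressor (skipping the constant at column 0)
for j, name in zip(range(1, 4), ["x2", "x3", "x4"]):
    vif = variance_inflation_factor(X, j)
    print(f"{name}: VIF = {vif:8.2f}, TOL = {1.0 / vif:.4f}")
```

Here x2 and x3 should show VIFs far above the rule-of-thumb threshold of 10, while x4 stays near 1.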


6. Remedies for Multicollinearity


It is a matter of judgment by the researcher whether to treat the presence of multicollinearity as a
serious problem or as a benign problem. The problem of multicollinearity can be tackled in the
following ways:

6.1 Ignore

1. If an analyst is less interested in interpreting individual coefficients and wants to forecast the
value of the dependent variable then the problem of multicollinearity may be ignored.
2. It is also worthwhile to ignore the problem if the regression coefficients are significant and
have meaningful signs and magnitudes.
3. Multicollinearity may also be ignored if the R² from the main regression exceeds the R²ₖ from every auxiliary regression (Klein's rule).

6.2 Drop Variables

Multicollinearity may be eliminated by dropping one of the collinearly related variables from the
model.
But it should be kept in mind that this involves some risk of underspecifying the model, if the
variable truly belongs to the model.

6.3 Obtain Extraneous Data

Information from other sources about the variables or values of some regression coefficients from
some related studies may be incorporated in the model.

6.4 Increase Sample Size

Since multicollinearity is essentially a sample phenomenon, using additional data can often solve
the problem.

6.5 Transform the variables

The variables may be transformed. If the data is a time series, one may reformulate the model in first-difference form. So instead of running the regression model stated in equation 9.3, the regression that may be estimated is

Yₜ − Yₜ₋₁ = β2(X2t − X2t−1) + β3(X3t − X3t−1) + vₜ,  where vₜ = uₜ − uₜ₋₁

6.6 Principal Component Analysis

PCA is a method by which a number of collinear variables can be transformed into a set of
uncorrelated or orthogonal variables, called principal components (PCs). These PCs are artificial
variables that are constructed in a manner such that they account for most of the variance that is


caused in the observed set of correlated variables due to the unobserved common factors between
them.

In fact this common factor is what is called a principal component and there are as many principal
components as the number of common factors.

Once the PCs have been extracted, the regression may be run on these orthogonal variables in
place of the original collinear variables.

The characteristic of these PCs is that the first component that is extracted accounts for a maximal
amount of total variance in the observed variables. Similarly, the remaining components that are
extracted in the analysis account for a maximal amount of variance in the observed variables that
was not accounted for by the preceding components, and is uncorrelated with all of the preceding
components.

It may be pointed out that in order to extract the PCs, the observed variables are standardized to
have a mean of zero and a variance of one and therefore the total variance in the data set which is
simply the sum of the variances of these observed variables, will always be equal to the number
of observed variables. This total variance is partitioned among the principal components that are
extracted.

However, one drawback of PCA is that it is generally not possible to interpret the principal components themselves in a meaningful way. For this reason, after the regression on the PCs is run, the regression coefficients of the PCs are transformed back into regression coefficients of the original variables.
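A minimal sketch of this procedure, assuming scikit-learn for the PCA step and statsmodels for the regression; the collinear data are simulated and hypothetical:

```python
# Principal-components regression: a minimal sketch on simulated data.
import numpy as np
import statsmodels.api as sm
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
n = 150
x2 = rng.normal(size=n)
x3 = 0.9 * x2 + 0.1 * rng.normal(size=n)      # collinear pair
X = np.column_stack([x2, x3])
y = 1.0 + 2.0 * x2 + 3.0 * x3 + rng.normal(size=n)

# Standardize the regressors, then extract orthogonal principal components
Z = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
pcs = pca.fit_transform(Z)

# Regress y on the (uncorrelated) principal components
res = sm.OLS(y, sm.add_constant(pcs)).fit()
gamma = res.params[1:]                         # coefficients on the PCs

# Map PC coefficients back to the standardized original variables:
# since pcs = Z @ pca.components_.T, we have beta_std = components_.T @ gamma
beta_std = pca.components_.T @ gamma
print("coefficients on standardized X2, X3:", beta_std)
```

The back-transformation in the last step recovers interpretable coefficients on the original (standardized) variables, as described above.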


7. Summary

 Multicollinearity is a statistical phenomenon and is present when an independent variable is a


linear combination of some or all of the other independent variables in the model.
 If two or more explanatory variables have an exact linear relationship between them, then
exact or perfect multicollinearity is said to be present.
 When the linear relationships among explanatory variables are approximate, then near
multicollinearity is said to be present in the model.
 The presence of multicollinearity is a problem in regression analysis because any particular
regression coefficient measures the partial effect of that particular variable on the dependent
variable, holding all other independent variables at a given level. However, when two
independent variables vary together, then it is not possible to isolate the partial effect of a
single independent variable on the dependent variable ceteris paribus.
 When two or more regressors in a regression model are exactly linearly related, then the
regression coefficients cannot be estimated.
 In case of near multicollinearity, the numerical values of the standard error of the regression
coefficients get amplified, making the values of t ratios of one or more coefficients small and
thus possibly making them statistically insignificant.
 Multicollinearity is a sample phenomenon and not a problem with the model, so there are
some informal tests rather than formal tests for detecting multicollinearity.
 The problem of multicollinearity can be tackled in several ways: ignoring it, dropping variables, obtaining extraneous data, increasing the sample size, transforming the variables, and principal component analysis.


Subject Business Economics

Paper 8, Fundamentals of econometrics

Module No and Title 14, Heteroscedasticity- Nature and Consequences

Module Tag BSE_P8_M14


TABLE OF CONTENTS
1. Learning Outcomes

2. Introduction

3. Why does heteroscedasticity arise?

3.1 Heteroscedasticity as an anticipated part of the model

3.2 Heteroscedasticity due to data issues

3.3 Heteroscedasticity due to specification errors

4. Ordinary least squares in the presence of heteroscedasticity

4.1 Consequences of ignoring heteroscedasticity

4.1.1 Effect on finite sample properties of OLS estimators

4.1.2 Effects on asymptotic [large sample] properties

4.1.3 Effects on forecasts

4.2 Proof of properties of OLS estimators in the presence of heteroscedasticity

4.2.1 Proof of unbiasedness Of OLS estimator of regression coefficient

4.2.2 Proof of consistency of OLS estimator of regression coefficient

4.2.3 Proof of biasedness and inconsistency of the estimated variance of β̂

5. Summary


1. Learning Outcomes
After studying this module, you shall

 Know and understand the meaning of heteroscedasticity


 Learn the reasons of occurrence of heteroscedasticity in regression models
 Analyze and interpret the consequences of using ordinary least squares estimation in the
presence of heteroscedasticity

2. Introduction
A Beginner’s perspective

Suppose a student becomes interested in gender inequality and its negative effects on the development of women's capabilities, their freedom of choice and human development in general. She collects data on the so-called Gender Inequality Index [GII] from UNDP's Human Development Reports for 152 countries for the year 2013. GII is a composite measure reflecting inequality in achievement between women and men in three dimensions: reproductive health, empowerment and economic status. It can take values between 0 and 1, lower numbers reflecting greater gender equality.

From her data, the student finds that the mean GII is 0.3754, median 0.3850 and standard
deviation 0.1899. The student interprets the distribution of GII to be fairly symmetric since mean
is almost equal to median; with wide variation since standard deviation is almost half as large as
the mean. She notices that countries in her sample are very heterogeneous with stark variations in
socio-economic dimensions.

If this student wishes to develop an econometric model in order to determine the factors
influencing GII, she is most likely to encounter the issue of heteroscedasticity- one of the most
common violations of the classical linear regression model in cross sectional data consisting of
heterogeneous units in the sample.

An Econometrician’s Perspective- What is heteroscedasticity?

To understand the meaning of heteroscedasticity, let us recall the homoscedasticity assumption of


the classical linear regression model. Homoscedasticity refers to a phenomenon where the error
terms in the population regression function are identically and independently distributed with
[mean zero and] equal variance σ².

For a two variable population regression function,

Yᵢ = β₁ + β₂Xᵢ + uᵢ,  i = 1, 2, …, n  [1]

Symbolically, homoscedasticity implies that


Var(uᵢ | Xᵢ) = E(uᵢ² | Xᵢ) = σ²,  i = 1, 2, …, n  [2]

Equation [2] reflects the dispersion of the error terms around their mean zero. It is also a measure of the dispersion of the data values of the dependent variable around the linear regression line β₁ + β₂Xᵢ. The constant value σ² implies that this dispersion is the same for all observations on X.

The violation of this assumption is known as heteroscedasticity. Heteroscedasticity is a phenomenon where the stochastic random disturbances exhibit a non-constant conditional variance. Symbolically, heteroscedasticity implies that

Var(uᵢ | Xᵢ) = E(uᵢ² | Xᵢ) = σᵢ²,  i = 1, 2, …, n  [3]

Heteroscedasticity is a result of a data generating process that draws disturbances, for each value
of the independent variable, from distributions that have different variances. It also implies that
dispersion of the dependent variable around the regression line is not constant.

Heteroscedasticity usually arises in cross sectional data where the scale of the dependent variable
tends to vary across observations, and in highly volatile time series data. It is less common in
other time series data where values of explanatory and dependent variables are of similar order of
magnitude at all points of time.

A standard manifestation of heteroscedasticity is that the spread of actual Y values around the
linear regression line will not be constant. Also, the plot of regression residuals against, say, the
predicted Y values will exhibit some characteristic pattern instead of being random. Let us
illustrate with the following example.

Example 1- Annual salaries of professors and years since they completed their Ph.D.

Figure 1 shows the scatter plot of annual salaries against years since Ph.D. of 222 professors from
7 U.S. universities [UC Berkeley, UCLA, UC San Diego, Illinois, Michigan, Stanford and
Virginia] for the year 1995 [Ramanathan, 2002]. It can be seen that the spread of Y values around
the average straight line relation is not uniform- the variance initially increases, then decreases.
As Ramanathan points out, the salaries of recent Ph.D.’s are very competitive in the job market
and hence salary differentials are not expected to be high. Salary of tenured professors might vary
depending on their productivity and reputation. After a number of years, salary increases tend to
stabilize and hence the variance is likely to reduce. Scatter plots like these, point towards an
underlying heteroscedastic regression model.


[Figure: scatter plot of l_salary versus years since Ph.D., with least squares fit Y = 3.97 + 0.0197X]


Figure 1- Scatter plot of annual salaries

A log quadratic regression [in which log of annual salaries is regressed against a constant, years
since Ph.D. and squares of years since Ph.D.] for 222 observations yield squared residuals
presented in Figure 2. We can see that the squared residuals are first increasing then decreasing- a
pattern consistent with the behavior of annual salaries depicted in Figure 1. This residual plot is
characteristic of underlying heteroscedastic population disturbances.

The plan of the module is as follows. In the next section, we present a discussion of some of the common reasons why heteroscedasticity is likely to be present in regression models. Section 4 highlights the serious consequences of using ordinary least squares for estimation and statistical inference in heteroscedastic regression models. The module summary is presented in the last section.


Figure 2: Squared residuals from log quadratic regression of annual salaries

3. Why does heteroscedasticity arise?


There are many reasons why error variances in regression models may be heteroscedastic. Let us take a look at these causes in the following discussion.

3.1 Heteroscedasticity as an anticipated part of the model

If the observations in a cross section are related to heterogeneous units of different sizes/scales,
the assumption of a common disturbance variance for all observations is often violated. Consider
the following examples:

i. Suppose that the profit of a firm at a given time depends on research and development
expenditures. Since large firms engage in higher research and development expenditures
which are associated with greater risk, one can expect variation of profits around the
mean for large firms to be greater than the corresponding variance for smaller firms.
Therefore, the variances of the error terms in such regression models (even after
accounting for variation in firm sizes) are expected to be greater for large firms than for
small firms.
ii. Similarly, consider regression models for household expenditures. Households vary in size as measured by the number of family members and the level of income. Since households with large income have more discretion and choice with respect to consumption, higher-income households will have greater dispersion around their mean expenditure than lower-income households. The residuals in the regression will then be systematically related to income.
iii. Some cross section studies are based on replication of the dependent variable for given
values of the explanatory variable. Suppose we are interested in studying the effect of
fertilizer use on crop yields. For each dosage of fertilizer, a group of n plots is chosen.
As the dosage of fertilizer increases, the variance of the error terms is likely to increase,
although the error variance within a group may be constant.
All the above examples suggest that although we are running a single regression through different heterogeneous units in the cross section of the form

Yᵢ = β₁ + β₂Xᵢ + uᵢ,  i = 1, 2, …, n

the actual regression is

Yᵢ = β₁ + β₂ᵢXᵢ + uᵢ,  i = 1, 2, …, n

that is, a regression in which the slope coefficient β₂ᵢ varies across observations. [In the household expenditures example, it means that the effect of a given change in income on household expenditures will be different for low income and high income families.] To illustrate the concept of heteroscedasticity, let us assume that the slope coefficients vary randomly around some fixed value β₂*. Then we can say

β₂ᵢ = β₂* + εᵢ

i.e., Yᵢ = β₁ + (β₂* + εᵢ)Xᵢ + uᵢ

i.e., Yᵢ = β₁ + β₂*Xᵢ + (εᵢXᵢ + uᵢ)

i.e., Yᵢ = β₁ + β₂*Xᵢ + vᵢ, where vᵢ = εᵢXᵢ + uᵢ  [4]

Equation [4] shows that the error term vᵢ will vary with Xᵢ and will have unequal variance.

3.2 Heteroscedasticity due to data issues

i. Many times we have monthly or quarterly data on an economic variable but need to work with yearly data, and finding averages is a common method of aggregation. Heteroscedasticity can be generated as a consequence of such data aggregation.
ii. Choice of a wrong scale: Sometimes heteroscedasticity arises because variables are
measured on a wrong scale. Measuring the variables in logs, as percentages or as ratios
reduces the scales in which the variables are measured. A homoscedastic disturbance
term in a logarithmic regression, responsible for proportional changes in the dependent
variable, may appear to be heteroscedastic in a linear regression because the absolute
changes in the dependent variable will be proportional to its size.
iii. Presence of outliers: Outliers or extreme observations in the sample make the
homoscedastic assumption difficult to maintain.
iv. Explanatory variables with a large range or skewness in their distribution: If the data on
an explanatory variable has a large range of values, it may lead to large variance of error
terms for larger values of the explanatory variables. Similarly, skewed distributions tend
to generate heteroscedasticity.

3.3 Heteroscedasticity due to specification errors

i. Omitted variables: If a researcher omits a relevant variable from the regression model, the
effect of this variable will be captured by the error term. It will appear that there is a
systematic relation between the disturbances and some exogenous variable and the
variance of the error terms will not be constant across observations.
ii. Non Linearities: The same problem occurs if a researcher estimates a linear relation
instead of a quadratic or any nonlinear model and the error term of the linear model
captures the effect of non linearities.
iii. Wrong functional form: If a linear function is fitted to an underlying quadratic or
logarithmic true model, or in general, a wrong functional form is chosen, then the
disturbances appear heteroscedastic.

4. Ordinary least squares in the presence of heteroscedasticity


If the error terms in a k variable regression model are heteroscedastic, and ordinary least squares
is still used as the estimation procedure, then there are serious consequences for estimation,
interpretation and statistical inference.

4.1 Consequences of ignoring heteroscedasticity

4.1.1 Effect on finite sample properties of OLS estimators


i. OLS estimators of the regression coefficients, β̂ⱼ, j = 1, 2, …, k, are still unbiased in the presence of heteroscedasticity, i.e.

E(β̂ⱼ) = βⱼ,  j = 1, 2, …, k

ii. The ordinary least squares estimators will be inefficient, i.e. they will no longer have the minimum variance in the class of linear unbiased estimators and hence are not BLUE, i.e.

Var(β̂ⱼ) ≥ Var(β̃ⱼ)

where β̂ⱼ is the OLS estimator that ignores heteroscedasticity and β̃ⱼ is another linear unbiased estimator that explicitly takes heteroscedasticity into account.

iii. The conventional estimator of the variance of the error term σ² is biased, i.e.

E(σ̂²) ≠ σ²,  where σ̂² = Σûᵢ² / (n − k)

iv. The conventional formula for the OLS estimators of the variance of the regression coefficients is wrong.
v. The OLS estimators of the variances and covariances of the regression coefficients are biased.
vi. The conventionally constructed confidence intervals are no longer valid.
vii. The t and F statistics based on the OLS regression do not follow the t and F distributions respectively, and hence standard hypothesis tests are invalid.

4.1.2 Effects on asymptotic [large sample] properties

i. Least squares estimators of the model parameters are consistent, i.e.

plim(β̂ⱼ) = βⱼ,  j = 1, 2, …, k

ii. The ordinary least squares estimators are inefficient, and not BLUE, even asymptotically.
iii. The conventional estimator of the variance of the error term σ² is inconsistent, i.e. plim(σ̂²) ≠ σ².
iv. The OLS estimators of the variances and covariances of the regression coefficients are biased, hence hypothesis tests are invalid even in large samples.

4.1.3 Effects on forecasts

Forecasts based on the ordinary least squares estimates will be unbiased and consistent but inefficient.

4.2 Proof of properties of OLS estimators in the presence of heteroscedasticity

In this section, we will prove some of the properties of OLS estimators discussed in the previous section. The proofs of the others are beyond the scope of this module.

Consider the following simple, heteroscedastic regression model-

yᵢ = βxᵢ + uᵢ,  i = 1, 2, …, n  [11]

Var(uᵢ | xᵢ) = σᵢ² = σ²Wᵢ²,  i = 1, 2, …, n  [12]

The model is taken without an intercept; and the dependent and independent variables are
measured as deviations from their respective means. It is assumed that the variance of the error
term is proportional to the square of a variable W whose values are known. It is also assumed that
the model satisfies all other assumptions of the classical linear regression model.
[Ramanathan(2002)]

Note that if Wᵢ takes the value 1 for all observations, we have a case of homoscedasticity. The OLS procedure in fact assumes that Wᵢ = 1 for all observations and yields the estimates

β̂ = Σxᵢyᵢ / Σxᵢ²,  σ̂² = Σûᵢ² / (n − 1),  Var̂(β̂) = σ̂² / Σxᵢ²  [13]

4.2.1 Proof of unbiasedness of OLS estimator of the regression coefficient

Substituting for yᵢ from equation [11] into the formula for β̂, we get

β̂ = Σxᵢyᵢ / Σxᵢ² = Σxᵢ(βxᵢ + uᵢ) / Σxᵢ² = β + Σxᵢuᵢ / Σxᵢ²  [14]

Since the xᵢ's are given and the error terms are assumed to have a zero conditional mean, we can show that E(β̂) = β. Hence OLS estimators are still unbiased. This proof requires only two of the assumptions of the classical linear regression model, namely, that the error terms have zero conditional mean and that they are uncorrelated with the explanatory variables. Therefore, even if heteroscedasticity is ignored, OLS estimators are still unbiased.

4.2.2 Proof of consistency of OLS estimator of the regression coefficient

Since E(β̂) = β, by the law of large numbers the probability limit of β̂ will be equal to its expected value as n tends to infinity, i.e.

plim(β̂) = E(β̂) = β

Hence, the OLS estimator is consistent. Note again that this proof did not require any assumption on the variance of the error term, so the presence of heteroscedasticity does not alter the consistency property of OLS estimators.

4.2.3 Proof of biasedness and inconsistency of the estimated variance of β̂

From equation [14], we can write the variance of β̂ as

Var(β̂) = Var(Σxᵢuᵢ / Σxᵢ²) = Σxᵢ²σᵢ² / (Σxᵢ²)² = σ² Σxᵢ²Wᵢ² / (Σxᵢ²)²  [15]


This formula takes heteroscedasticity explicitly into account.

The OLS procedure that ignores heteroscedasticity instead estimates Var(β̂) as

Var̂(β̂) = σ̂² / Σxᵢ²

Now, E[Var̂(β̂)] = E[σ̂² / Σxᵢ²] ≠ Var(β̂) as given in [15].

The OLS estimator of the variance of the regression coefficient is thus biased and inconsistent.
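The three results above can be illustrated numerically. The following small Monte Carlo sketch (simulated, hypothetical setup, numpy only) shows the OLS slope averaging out to the true β while the conventional variance estimate misses the true variance given by equation [15]:

```python
# Monte Carlo sketch: OLS slope unbiased, conventional variance biased,
# under heteroscedasticity. All data are simulated and hypothetical.
import numpy as np

rng = np.random.default_rng(3)
n, reps, beta = 100, 5000, 2.0
x = rng.uniform(1.0, 5.0, size=n)
x = x - x.mean()                           # deviations-from-mean form
sigma_i = 0.5 * np.abs(x) + 0.2            # error s.d. grows with |x| (the W_i)

betas, conv_vars = [], []
for _ in range(reps):
    u = rng.normal(0.0, sigma_i)           # heteroscedastic errors
    y = beta * x + u
    b = (x @ y) / (x @ x)                  # OLS slope, no intercept
    resid = y - b * x
    s2 = resid @ resid / (n - 1)
    betas.append(b)
    conv_vars.append(s2 / (x @ x))         # conventional Var-hat(beta), eq. [13]

true_var = (x**2 * sigma_i**2).sum() / (x @ x) ** 2   # equation [15]
print("mean of beta-hat:          ", np.mean(betas))  # close to 2.0 (unbiased)
print("true Var(beta-hat):        ", true_var)
print("mean conventional Var-hat: ", np.mean(conv_vars))  # differs from above
```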

5. Summary
In this module, we have discussed the nature of heteroscedasticity and its consequences for ordinary least squares estimation. A brief summary is presented below.

 Heteroscedasticity is a phenomenon where the stochastic random disturbances exhibit a


non-constant conditional variance.
 Heteroscedasticity may be a result of specification errors or data issues. However, it is anticipated in most regression models with cross-sectional data, especially in cases where the scale of the dependent variable varies across observations. It is also observed in highly volatile time series data. Regression models involving time series data in which the values of the explanatory and dependent variables are of a similar order of magnitude at all points of time are less likely to exhibit heteroscedasticity.
 In the presence of heteroscedasticity, ordinary least squares estimators and forecasts
based on them are still unbiased and consistent but they are no longer BLUE. The
estimated variances and covariances of the estimators are biased and inconsistent;
hypothesis testing procedures and statistical inference are not valid anymore.


Subject Business Economics

Paper 8, Fundamentals of Econometrics

Module No and Title 15, Heteroscedasticity- Detection

Module Tag BSE_P8_M15


TABLE OF CONTENTS
1. Learning Outcomes

2. Introduction

3. Different diagnostic tools to identify the problem of heteroscedasticity

4. Informal methods to identify the problem of heteroscedasticity

4.1 Checking Nature of the problem

4.2 Graphical inspection of residuals

5. Formal methods to identify the problem of heteroscedasticity

5.1 Park Test

5.2 Glejser test

5.3 White's test

5.4 Spearman's rank correlation test

5.5 Goldfeld-Quandt test

5.6 Breusch- Pagan test

6. Summary


1. Learning Outcomes

After studying this module, you shall be able to understand

 Different diagnostic tools to detect the problem of heteroscedasticity


 Informal methods to identify the problem of heteroscedasticity
 Formal methods to identify the problem of heteroscedasticity

2. Introduction
So far, in the previous module, we have seen that heteroscedasticity is a violation of one of the assumptions of the classical linear regression model. It occurs when the disturbance terms uᵢ of the observations have a non-constant conditional variance. It may be a result of specification errors or data issues. For instance, in regression models with cross-sectional data, especially in cases where the scale of the dependent variable varies across observations, heteroscedasticity is more likely to occur. It is also observed in highly volatile time series data. In the presence of heteroscedasticity, ordinary least squares estimators and forecasts based on them are still unbiased and consistent, but they are no longer BLUE. The estimated variances and covariances of the estimators are biased and inconsistent; hypothesis testing procedures and statistical inference are not valid anymore. Therefore, if we continue to use the OLS method to estimate parameters and to test hypotheses on data suffering from heteroscedasticity, we are likely to draw misleading conclusions. This makes it necessary to find some diagnostic tools to check for the presence of the heteroscedasticity problem.


3. Diagnostic tools to identify the Problem of Heteroscedasticity

Informal methods:
• Checking the nature of the problem
• Graphical inspection of residuals

Formal methods:
• Park test
• Glejser test
• Spearman's rank correlation test
• Goldfeld-Quandt test
• Breusch-Pagan test
• White's test


4. Informal Methods to Identify the Problem of Heteroscedasticity

4.1 Checking Nature of the Problem

Considering the nature of the problem is one of the simplest methods to detect the presence of heteroscedasticity. For instance, if we take cross-sectional data on households' consumption patterns and income levels in a locality, we find that the residual variance changes across observations. This is because cross-sectional data pool small-income, medium-income and large-income households together in the study. Thus, the possibility of heteroscedasticity is higher in the case of cross-sectional data.

4.2 Graphical inspection of residuals

Before checking for heteroscedasticity by formal methods, a graphical examination of the residuals, obtained by regressing the dependent variable on the explanatory variables, can be very helpful in getting an idea of the presence of heteroscedasticity. We can do this by creating a residual plot in which we take the squared residuals on the y axis and plot them against one or more explanatory variables, or against the fitted values of the dependent variable.

Figure 1 shows different patterns of squared residuals, ûᵢ², plotted against an explanatory variable X. Panel (a) shows no systematic pattern between the squared residuals and the explanatory variable; thus, there is no heteroscedasticity problem. However, panels (b) to (e) show a systematic pattern between the squared residuals and the explanatory variable. For instance, panel (b) shows a linear relationship between the squared residuals and the explanatory variable, while panels (d) and (e) show quadratic relationships between the two. Thus, panels (b) to (e) depict the possibility of heteroscedasticity.

In case we have a multiple regression model with more than one explanatory variable, instead of plotting the squared residuals against each explanatory variable we can plot them simply against Ŷᵢ, the estimated Y. Since Ŷᵢ is a linear combination of all the explanatory variables, we get the same kinds of graphs as above when we plot the squared residuals against Ŷᵢ. This is shown in Figure 2.

However, these graphical plots are just an indication of the problem of heteroscedasticity. We need formal methods to confirm the presence of heteroscedasticity.
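A minimal sketch of such residual plots, on simulated (hypothetical) heteroscedastic data, using statsmodels and matplotlib:

```python
# Residual plots for an informal heteroscedasticity check (simulated data).
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
n = 200
x = rng.uniform(1, 10, size=n)
u = rng.normal(0, 0.3 * x)                   # error variance grows with x
y = 1.0 + 0.5 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()
u2 = res.resid ** 2

fig, axes = plt.subplots(1, 2, figsize=(9, 3.5))
axes[0].scatter(x, u2, s=10)                 # squared residuals vs regressor X
axes[0].set(xlabel="X", ylabel="squared residuals")
axes[1].scatter(res.fittedvalues, u2, s=10)  # squared residuals vs fitted Y
axes[1].set(xlabel="fitted Y", ylabel="squared residuals")
plt.tight_layout()
plt.show()
```

Both panels should show the spread of ûᵢ² fanning out as X (and Ŷ) grows, the pattern described above.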


[Figure 1: patterns of squared residuals plotted against an explanatory variable X]

[Figure 2: patterns of squared residuals plotted against the estimated values Ŷ]


5. Formal Methods to Identify the Problem of Heteroscedasticity

5.1 Park Test

The Park method is based on the assumption that the heteroscedastic variance σᵢ² is some function of the explanatory variable Xᵢ. To relate σᵢ² to Xᵢ, the following functional form is adopted:¹

ln σᵢ² = α + β ln Xᵢ + vᵢ,  i = 1, 2, …, n  [1]

Here σᵢ² is the population error variance. However, we cannot run this regression because the population error variance σᵢ² is unknown; we therefore use ûᵢ² as a proxy for σᵢ² and proceed by the following steps:

1. Run the original OLS regression and obtain the residuals ûᵢ.
2. Square the residuals ûᵢ, take their logs, and run the regression:

ln ûᵢ² = α + β ln Xᵢ + vᵢ,  i = 1, 2, …, n  [2]

3. In case of more than one explanatory variable, run the above regression for each explanatory variable.²
4. Now, if β turns out to be significant, we have a problem of heteroscedasticity and we need to correct it. However, if β turns out to be insignificant, we may accept the hypothesis of homoscedasticity, with constant variance σ².

EXAMPLE 1
The following hypothetical example will enable further understanding of this test:
Let us take data on wages (per hour, in rupees), education (years of schooling) and experience (years on the job) for 523 workers of an industry. On regressing wages, the dependent variable, on education and experience as explanatory variables, we get the following regression results:

Wageᵢ = −4.524472 + 0.913018 Eduᵢ + 0.096810 Expᵢ
se = (1.239348) (0.082190) (0.017719)
t = (−3.650687) (11.10868) (5.463513)
p = (0.003) (0.0000) (0.0000)
r² = 0.194953

The above result shows that there exists a positive relationship between wages and education, and also between wages and experience. The estimated coefficients of education and experience are also significant, as captured by their t values of about 11 and 5 respectively.

1. The functional form adopted is just for simplicity. One can choose some other functional form as well, and the results will differ. For instance, if the values of the explanatory variable Xᵢ are negative, one should not take the log of Xᵢ but simply regress ûᵢ² on Xᵢ.
2. An alternative is to regress ln ûᵢ² on Ŷᵢ (the estimated Y).


However, the real problem arises because this is cross-sectional data, i.e. a sample of 523 workers with diverse backgrounds taken together at a given point of time. Thus, the possibility of heteroscedasticity is higher here. When we plot the squared residuals against each of the explanatory variables, education and experience, or against the estimated value of wages, we get considerable variability in the plot, as depicted in the earlier Figures 1 and 2 [panels (b) to (e)].

Now a formal check for heteroscedasticity can be done by using the Park test. Here, we regress the log of the squared residuals on the log of the estimated value of wages and get the following results:

ln ûᵢ² = −10.35965 + 3.467020 ln Ŵageᵢ
se = (11.79490) (1.255228)
t = (−0.878316) (2.762063)
p = (0.3802) (0.0059)
r² = 0.014432

Now we can see that the coefficient of the estimated value of wages is statistically significant, as it has a very small p value. Thus, the Park test suggests the presence of heteroscedasticity.

NOTE OF CAUTION: The error term vᵢ in equation [2] may itself not be homoscedastic. In that case we are back to the same problem.
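A minimal sketch of the Park test steps in Python, using statsmodels on simulated (hypothetical) data:

```python
# Park test: a minimal sketch on simulated heteroscedastic data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 300
x = rng.uniform(1, 10, size=n)
y = 2.0 + 0.8 * x + rng.normal(0, 0.4 * x)   # error s.d. proportional to x

# Step 1: original regression, save the residuals
step1 = sm.OLS(y, sm.add_constant(x)).fit()

# Step 2: regress ln(u-hat^2) on ln(X), as in equation [2]
ln_u2 = np.log(step1.resid ** 2)
step2 = sm.OLS(ln_u2, sm.add_constant(np.log(x))).fit()

# Step 4: a significant slope signals heteroscedasticity
print("beta =", step2.params[1], " p-value =", step2.pvalues[1])
```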

5.2 Glejser Test

This test suggests that instead of taking the square of the residuals, we take the absolute value of the estimated residuals |ûᵢ| and regress it on the explanatory variable X. Glejser proposed the following functional forms for this regression:

|ûᵢ| = β₁ + β₂Xᵢ + vᵢ

|ûᵢ| = β₁ + β₂√Xᵢ + vᵢ

|ûᵢ| = β₁ + β₂(1/Xᵢ) + vᵢ

|ûᵢ| = β₁ + β₂(1/√Xᵢ) + vᵢ

|ûᵢ| = √(β₁ + β₂Xᵢ) + vᵢ

|ûᵢ| = √(β₁ + β₂Xᵢ²) + vᵢ


For each functional form, if β₂ turns out to be significant then we have a problem of heteroscedasticity and we need to correct it. However, if β₂ turns out to be insignificant, we may accept the hypothesis of homoscedasticity.

Let us continue with the same hypothetical example to get a further understanding of this test. We now regress |ûᵢ| on education and get the following results:

|ûᵢ| = −0.3208 + 0.2829 Eduᵢ
t = (−0.4739) (5.5483)
r² = 0.0557

|ûᵢ| = −3.1905 + 1.8623 √Eduᵢ
t = (−2.5068) (5.1764)
r² = 0.0489

The above results suggest that the data suffer from heteroscedasticity, as the coefficient of education is statistically significant. Similarly, we can regress |ûᵢ| on experience and on the estimated value of wages.

NOTE OF CAUTION: The error term vᵢ in the above functional forms may itself not be homoscedastic. Also, the two functional forms below are nonlinear in the parameters and thus cannot be estimated by the usual OLS method:

|ûᵢ| = √(β₁ + β₂Xᵢ) + vᵢ

|ûᵢ| = √(β₁ + β₂Xᵢ²) + vᵢ

Moreover, Glejser himself found that his first four forms are quite satisfactory in detecting heteroscedasticity in large samples only. Therefore, in our example of 523 workers, the Glejser test has strengthened the result of the Park test.
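A minimal sketch of the Glejser test (first functional form) on the same kind of simulated, hypothetical data:

```python
# Glejser test: regress |u-hat| on X (first functional form), simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 300
x = rng.uniform(1, 10, size=n)
y = 2.0 + 0.8 * x + rng.normal(0, 0.4 * x)   # heteroscedastic errors

resid = sm.OLS(y, sm.add_constant(x)).fit().resid
glejser = sm.OLS(np.abs(resid), sm.add_constant(x)).fit()   # |u-hat| on X
print("beta2 =", glejser.params[1], " p-value =", glejser.pvalues[1])
```

A significant β₂ here points to heteroscedasticity; the other functional forms can be tried by transforming x before the second regression.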

5.3 White’s General HeteroscedasticityTest

To understand this test, let us assume the following model:

Yᵢ = β₁ + β₂X₂ᵢ + β₃X₃ᵢ + uᵢ,  i = 1, 2, …, n

Now follow the steps mentioned below:

1. Run the above regression and obtain the residuals ûᵢ.
2. Now run the auxiliary regression:

ûᵢ² = A₁ + A₂X₂ᵢ + A₃X₃ᵢ + A₄X₂ᵢ² + A₅X₃ᵢ² + A₆X₂ᵢX₃ᵢ + vᵢ,  i = 1, 2, …, n  [3]


Here, the squared residuals are regressed on all the explanatory variables, the squares of the explanatory variables and their cross products. vᵢ is the residual term of the auxiliary regression.

3. Get the R² value from the above regression and multiply it by the sample size n. This product asymptotically follows the χ² distribution with degrees of freedom equal to the number of slope regressors in the auxiliary regression:

n·R² ∼ χ² with k − 1 degrees of freedom  [4]

where k is the number of parameters in the auxiliary regression; in this model, d.f. = 5.

4. Test the null hypothesis that all the slope coefficients are zero.
5. If the value obtained from equation [4] exceeds the critical χ² value at the chosen level of significance, or if the p value is very low, we reject the null hypothesis that all the slope coefficients are zero; in other words, we have a problem of heteroscedasticity. If the value obtained from equation [4] is below the critical χ² value, or if the p value is fairly large, we do not reject the null hypothesis.

Now we continue with Example 1 and see how White's general heteroscedasticity test is used.

              Coeff        std. error   t stat      prob.
Constant      14.38296     71.34726     0.201591    0.8403
EDU           −1.18330     9.137968     −0.12949    0.8970
EDU²          0.168639     0.300676     0.560865    0.5751
EDU*EXP       0.022239     0.104117     0.213591    0.8309
EXP           −1.40113     1.912126     −0.73276    0.4640
EXP²          0.027113     0.020969     1.293039    0.1966

n·R² = 11.23102,  p value [χ²(5)] = 0.047

From the above table, we find that n·R² is 11.23102, which is significant at the 5% level (p = 0.047). Therefore, we reject the null hypothesis of no heteroscedasticity.
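statsmodels provides het_white, which builds the auxiliary regression with squares and cross products internally. A minimal sketch on simulated (hypothetical) wage-style data:

```python
# White's general heteroscedasticity test via statsmodels (simulated data).
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(7)
n = 500
edu = rng.uniform(8, 18, size=n)
exp_ = rng.uniform(0, 30, size=n)
wage = -4.5 + 0.9 * edu + 0.1 * exp_ + rng.normal(0, 0.15 * edu)

X = sm.add_constant(np.column_stack([edu, exp_]))
res = sm.OLS(wage, X).fit()

# het_white returns the LM statistic (n*R^2), its p-value, and an F variant
lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(res.resid, X)
print(f"n*R^2 = {lm_stat:.3f}, p-value = {lm_pvalue:.4f}")
```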


SOME MORE FORMAL TESTS TO CHECK HETEROSCEDASTICITY

5.4 Spearman’s Rank Correlation Test

This test is based on Spearman's rank correlation coefficient. To understand this test, we assume the following model:

Yᵢ = β₁ + β₂Xᵢ + uᵢ,  i = 1, 2, …, n

Now follow the steps mentioned below:

1. Run the above regression, obtain the residuals ûᵢ and take their absolute values |ûᵢ|.
2. Rank both |ûᵢ| and Xᵢ (or the estimated value Ŷᵢ) in either increasing or decreasing order and compute Spearman's rank correlation coefficient using the formula

rₛ = 1 − 6 (Σdᵢ² / (n(n² − 1)))

where dᵢ is the difference between the ranks assigned to the two attributes of the ith individual and n is the total number of observations ranked.
3. Now use a t test to check the null hypothesis of no heteroscedasticity. Assuming that the population rank correlation coefficient ρₛ is zero and n > 8, the statistic

t = rₛ√(n − 2) / √(1 − rₛ²)  [5]

follows the t distribution with df = n − 2.
4. If the computed t value exceeds the critical t value, we reject the null hypothesis of no heteroscedasticity; if the computed t value is below the critical t value, we do not reject the null hypothesis and there is no heteroscedasticity problem.

Note: in case of a multiple regression model, i.e. with more than one explanatory variable, rₛ between |ûᵢ| and Xᵢ can be calculated separately for each Xᵢ, and statistical significance can be tested by the t test in [5].
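A minimal sketch using scipy's spearmanr on simulated (hypothetical) data; scipy also reports a p value, but the t statistic of equation [5] is computed by hand for comparison:

```python
# Spearman's rank correlation test for heteroscedasticity (simulated data).
import numpy as np
import statsmodels.api as sm
from scipy.stats import spearmanr

rng = np.random.default_rng(8)
n = 200
x = rng.uniform(1, 10, size=n)
y = 2.0 + 0.8 * x + rng.normal(0, 0.4 * x)

resid = sm.OLS(y, sm.add_constant(x)).fit().resid
rs, pvalue = spearmanr(np.abs(resid), x)       # rank correlation of |u-hat| and X
t = rs * np.sqrt(n - 2) / np.sqrt(1 - rs**2)   # t statistic from equation [5]
print(f"r_s = {rs:.3f}, t = {t:.2f}, p-value = {pvalue:.4f}")
```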

5.5 Goldfeld-Quandt Test

This test is based on the assumption that the heteroscedastic variance σᵢ² is positively related to one of the explanatory variables Xᵢ. To understand this test, consider the following model:

Yᵢ = β₁ + β₂Xᵢ + uᵢ,  i = 1, 2, …, n

As per the assumption, let us say σᵢ² = σ²Xᵢ², where σ² is a constant. This states that σᵢ² is proportional to the square of the variable X; in other words, the larger the value of Xᵢ, the larger σᵢ² would be. Such a relation makes the presence of heteroscedasticity likely in the model.


Now, to get a better idea of this method, follow the steps mentioned below:
1. Arrange the values of Xᵢ from lowest to highest.
2. Remove c central observations, where c is specified a priori, and divide the remaining (n − c) observations into two sets, each with (n − c)/2 observations.
3. Fit two separate regressions to the two sets obtained above and compute their respective residual sums of squares RSS₁ and RSS₂. Here, RSS₁ represents the residual sum of squares for the set with smaller Xᵢ values (the small variance group) and RSS₂ the residual sum of squares for the set with larger Xᵢ values (the large variance group). The degrees of freedom of each residual sum of squares is (n − c)/2 − k, i.e. (n − c − 2k)/2, where k is the number of parameters (including the intercept) to be estimated.
4. Next, find the ratio

λ = (RSS₂/df) / (RSS₁/df)  [6]

5. If uᵢ is normally distributed and the variance is homoscedastic, λ in [6] follows the F distribution with numerator and denominator degrees of freedom each equal to (n − c − 2k)/2.
6. So if, for a particular regression result, the computed λ (= F) exceeds the critical F at the chosen level of significance, we reject the null hypothesis of no heteroscedasticity.

NOTE: The aim of removing the c central observations is to sharpen the distinction between the small variance group and the large variance group. The results of this test depend on how c is chosen, and this is a major limitation of the test. Also, in case of a multiple regression model with more than one X variable, the ranking of observations can be done with respect to any one of the explanatory variables; if it is difficult to decide which Xᵢ to choose, the test can be run for each Xᵢ in turn.
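statsmodels implements this procedure as het_goldfeldquandt; its split and drop arguments correspond to the sample split and the removal of the c central observations. A minimal sketch on simulated (hypothetical) data:

```python
# Goldfeld-Quandt test via statsmodels (simulated heteroscedastic data).
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_goldfeldquandt

rng = np.random.default_rng(9)
n = 120
x = np.sort(rng.uniform(1, 10, size=n))      # values of X, lowest to highest
y = 2.0 + 0.8 * x + rng.normal(0, 0.4 * x)   # variance grows with x
X = sm.add_constant(x)

# sort by column 1 of X, drop roughly the middle sixth (the c observations)
F, pvalue, ordering = het_goldfeldquandt(y, X, idx=1, split=0.5, drop=1/6)
print(f"lambda (F) = {F:.3f}, p-value = {pvalue:.4f}")
```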

5.6 Breusch–Pagan–Godfrey (BPG) Test.

As we saw with the previous test, the Goldfeld-Quandt test, its major drawbacks were finding the correct explanatory variable X by which to order the observations and deciding how many central observations c to remove. These limitations can be overcome by using this test. To demonstrate this test, first consider the following linear regression model:

Yᵢ = β₁ + β₂X₂ᵢ + ··· + βₖXₖᵢ + uᵢ,  i = 1, 2, …, n  [7]

Also assume that the error variance σᵢ² is a function of some non-stochastic variables Z in the form

σᵢ² = f(α₁ + α₂Z₂ᵢ + ··· + αₘZₘᵢ)

In other words, taking the function to be linear,

σᵢ² = α₁ + α₂Z₂ᵢ + ··· + αₘZₘᵢ

Now if α₂ = α₃ = ··· = αₘ = 0, then σᵢ² = α₁, which is a constant. Therefore, the null hypothesis of no heteroscedasticity is α₂ = α₃ = ··· = αₘ = 0.

Now, to get a better idea of this method, follow the steps mentioned below:
1. Run the regression in equation [7] and obtain the residuals û₁, û₂, …, ûₙ.
2. Obtain the maximum likelihood (ML) estimator of σ², which is σ̃² = Σûᵢ²/n.
3. Define a variable pᵢ = ûᵢ² / σ̃².
4. Now run the following regression:

pᵢ = α₁ + α₂Z₂ᵢ + ··· + αₘZₘᵢ + vᵢ,  i = 1, 2, …, n  [8]

where vᵢ is the residual term.
5. Obtain the ESS (explained sum of squares) from [8] and define Θ as half of the ESS.
6. It can be shown that if there is no heteroscedasticity, and if the sample size n increases indefinitely (under the assumption that the uᵢ are normally distributed), then

Θ ∼ (asymptotically) χ² with (m − 1) degrees of freedom

In other words, Θ asymptotically follows the chi-square distribution with (m − 1) degrees of freedom.
7. Therefore, if in a regression result the computed χ² exceeds the critical χ² value at the chosen level of significance, we can reject the null hypothesis of no heteroscedasticity; otherwise we do not reject it.
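statsmodels offers het_breuschpagan for this family of tests; note that it reports LM and F statistics rather than the Θ = ESS/2 form derived above, though the conclusions agree asymptotically. A minimal sketch on simulated (hypothetical) data, taking Z = X:

```python
# Breusch-Pagan test via statsmodels (simulated data, Z chosen as X).
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(10)
n = 300
x = rng.uniform(1, 10, size=n)
y = 2.0 + 0.8 * x + rng.normal(0, 0.4 * x)
X = sm.add_constant(x)

res = sm.OLS(y, X).fit()
lm, lm_pvalue, fvalue, f_pvalue = het_breuschpagan(res.resid, X)  # Z = X here
print(f"LM = {lm:.3f}, p-value = {lm_pvalue:.4f}")
```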

6. Summary
In this module, we have discussed different diagnostic tools to detect the presence of heteroscedasticity. We have categorized these tools into two groups: informal methods and formal methods. Informal methods include checking the nature of the problem and graphical inspection of residuals. These informal methods only give a clue to the presence of heteroscedasticity, which can then be confirmed by formal methods such as the Park test, Glejser test, White's test, Spearman's rank correlation test, Goldfeld-Quandt test and Breusch-Pagan test. All these methods, with their different steps, have some advantages and disadvantages relative to one another.

Subject BUSINESS ECONOMICS

Paper No and Title 8- Fundamentals of Econometrics

Module No and Title 17; Autocorrelation

Module Tag BSE_P8_M17

TABLE OF CONTENTS
1. Learning Outcomes
2. Introduction
3. Meaning of Autocorrelation
3.1 Structure of Autocorrelation
3.2 Processes Generating Autocorrelation
3.2.1 Autoregressive (AR) Process
3.2.2 Moving Average (MA) Process
3.2.3 Joint Autoregressive Moving Average Process(ARMA)
3.3 Autocorrelation Function
4. Graphical View of Autocorrelation
5. Causes of Autocorrelation
6. Nature of Autocorrelation
6.1 Positive Autocorrelation
6.2 Negative Autocorrelation
7. Order of Autocorrelation
7.1 First Order Autocorrelation
7.2 Higher Order Autocorrelation
7.2.1 Second Order Auto Correlation
7.2.2 pth Order Autocorrelation
8. Consequences of Autocorrelation
9. Summary

1. Learning Outcomes
After studying this module, you shall be able to

 Know what is autocorrelation


 Understand structure of autocorrelation
 Know about processes generating autocorrelation
 Identify the nature of autocorrelation
 Identify the order of autocorrelation
 Learn causes of autocorrelation
 Understand the consequences of autocorrelation

2. Introduction

What is Autocorrelation?
One of the assumptions of the classical linear regression model is that, conditional on X, the successive error terms are independently distributed. If the regression model is

Yᵢ = β₁ + β₂X₂ᵢ + … + βₖXₖᵢ + uᵢ

then the above assumption implies Cov(uᵢ, uⱼ) = E(uᵢuⱼ) = 0, ∀ i ≠ j. This property of the error term is known as serial independence or no autocorrelation.

Autocorrelation (also called serial correlation) is said to be present in a regression model when this assumption is violated, so that

Cov(uᵢ, uⱼ) ≠ 0, i.e. E(uᵢuⱼ) ≠ 0, for some i ≠ j

3. Meaning of Autocorrelation
In an intuitive sense, when no autocorrelation is present, the magnitude and sign of the error term for one observation should not influence the sign and magnitude of the error term for the adjacent observation(s). This implicitly implies that the different observations in a data set are unrelated to each other. So, for example, if we have data on income and consumption expenditure of individuals, then the consumption expenditure of the ith individual has no influence on the consumption expenditure of the (i+1)st individual.

The presence of autocorrelation, on the other hand, implies that the various observations are correlated with each other. So the observations in the data set may be treated not only as data points but also as predictors of the subsequent observations.

For example, in a study of output, an unexpected event like the breakdown of machinery will decrease the output for a particular observation. But this negative disturbance for one observation should be no reason for a negative disturbance in the neighbouring observations; the influence of a random disturbance should not persist over time. If a hangover of the disturbance term is observed from one observation to the neighbouring observations, then autocorrelation is said to be present.

The presence of autocorrelation is troublesome in regression analysis because it reduces the number of independent observations in the data set, complicating the process of statistical inference.

3.1 Structure of Autocorrelation

We know that when autocorrelation is present, E(uₜuₜ₋ₛ) = γₛ, where s = 0, ±1, ±2, … is the length of the lag. At lag 0, E(uₜ²) is the constant variance term:

Var(uₜ) = E(uₜ²) = σ² = γ₀

The autocorrelation at lag s, ρₛ, may be defined as

ρₛ = γₛ / γ₀ = γₛ / σ²,  s = 0, ±1, ±2, …

∴ Cov(uₜ, uₜ₋ₛ) = γₛ = σ²ρₛ

The symbol “±” before the number of lags indicates that the γs and ρs are symmetrical in s
and it does not matter whether the lag is t+s or t-s. So these coefficients are constant over
time and depend only on the length of the lag.

For a sample with n observations, the variance-covariance matrix of the disturbance term may be written as follows:

Var(u) = | Var(u₁)      Cov(u₁, u₂)  ⋯  Cov(u₁, uₙ) |
         | Cov(u₂, u₁)  Var(u₂)      ⋯  Cov(u₂, uₙ) |
         | ⋮            ⋮            ⋱  ⋮           |
         | Cov(uₙ, u₁)  Cov(uₙ, u₂)  ⋯  Var(uₙ)     |

       = | γ₀     γ₁     ⋯  γₙ₋₁ |
         | γ₁     γ₀     ⋯  γₙ₋₂ |
         | ⋮      ⋮      ⋱  ⋮    |
         | γₙ₋₁   γₙ₋₂   ⋯  γ₀   |

       = σ² | 1      ρ₁     ⋯  ρₙ₋₁ |
            | ρ₁     1      ⋯  ρₙ₋₂ |
            | ⋮      ⋮      ⋱  ⋮    |
            | ρₙ₋₁   ρₙ₋₂   ⋯  1    |

Estimation of this matrix is not possible since the number of unknowns is greater than the
number of observations. In the literature, a structure is imposed on the disturbance terms.

3.2 Processes Generating Autocorrelation

In time series data, three types of simplified structures may be imposed.

3.2.1 AUTOREGRESSIVE (AR) PROCESS

An autoregressive process is one where u is a function of lagged values of itself. When uₜ is a function of its own value with a one-period lag, it is known as a first-order autoregressive, or AR(1), scheme, in which case

uₜ = φ₁uₜ₋₁ + εₜ  (1)

where φ₁ is a parameter such that |φ₁| < 1, and εₜ is a white noise disturbance term with the usual properties, i.e.

εₜ ∼ N(0, σ²ε) and Cov(εₜ, εₛ) = 0, t ≠ s

In general, a pth-order autoregressive, AR(p), scheme may be defined as follows:

uₜ = φ₁uₜ₋₁ + φ₂uₜ₋₂ + … + φₚuₜ₋ₚ + εₜ

where |φᵢ| < 1, εₜ ∼ N(0, σ²ε) and Cov(εₜ, εₛ) = 0, t ≠ s.

Usually, an AR(4) process is apt for describing the disturbance term in seasonally non-adjusted quarterly data. Similarly, AR(12) may be quite suitable for describing the disturbance term in seasonally non-adjusted monthly data.

3.2.2 MOVING AVERAGE (MA) PROCESS

A moving average process is one where u is a function of ε. A first-order moving average process, MA(1), is given by

uₜ = θ₁εₜ₋₁ + εₜ

where εₜ ∼ N(0, σ²ε) and Cov(εₜ, εₛ) = 0, t ≠ s.

In general, the pth-order moving average, MA(p), scheme is given by

uₜ = θ₁εₜ₋₁ + θ₂εₜ₋₂ + … + θₚεₜ₋ₚ + εₜ

where εₜ ∼ N(0, σ²ε) and Cov(εₜ, εₛ) = 0, t ≠ s.

3.2.3 JOINT AUTOREGRESSIVE MOVING AVERAGE PROCESS (ARMA)

A general ARMA process is given by

uₜ = φ₁uₜ₋₁ + φ₂uₜ₋₂ + … + φₚuₜ₋ₚ + θ₁εₜ₋₁ + θ₂εₜ₋₂ + … + θₚεₜ₋ₚ + εₜ

3.3 Autocorrelation Function

For the AR(1) process defined in (1), the autocorrelation function (ACF) is given by

ρₛ = φ₁ˢ,  s = 0, 1, 2, …

For an AR(2) process the ACF is given by

ρₛ = φ₁ρₛ₋₁ + φ₂ρₛ₋₂,  s > 0

Stationarity conditions ensure that the ACF is a decaying function of s. A plot of the ACF against s is known as a correlogram.
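A minimal sketch that simulates an AR(1) disturbance and compares the sample correlogram with the theoretical ACF ρₛ = φ₁ˢ, using statsmodels (data simulated, hypothetical):

```python
# Sample ACF of a simulated AR(1) disturbance vs the theoretical phi^s decay.
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(11)
T, phi = 1000, 0.7
u = np.zeros(T)
for t in range(1, T):
    u[t] = phi * u[t - 1] + rng.normal()   # AR(1): u_t = phi*u_{t-1} + eps_t

sample_acf = acf(u, nlags=5)
for s in range(6):
    print(f"lag {s}: sample {sample_acf[s]:6.3f}, theory {phi**s:6.3f}")
```

Plotting sample_acf against the lag s would give the correlogram described above.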

4. Graphical View of Autocorrelation

Graphically, autocorrelation may be understood by plotting the residuals against the observations. The first panel in Figure 1 shows the presence of positive autocorrelation because a positive residual is followed by a positive residual, and a negative residual is followed by a negative residual. The second panel shows the absence of autocorrelation because the distribution of the residuals across observations is completely random and does not follow any pattern. The third panel shows the presence of negative autocorrelation because a positive residual is followed by a negative residual, and a negative residual is followed by a positive residual.

[Figure 1: residual plots against time showing positive autocorrelation (top panel), no autocorrelation (middle panel) and negative autocorrelation (bottom panel)]

5. Causes of Autocorrelation

Although autocorrelation can be present in both time series and cross-section data, it is found more often in time series data. The reason is that when the dependent and independent variables are observed as a sequence of observations over time, they tend to be correlated with each other over time, or some time-related phenomenon may be influencing both variables, causing them to co-vary. There are several factors causing autocorrelation. According to Greene WH (2000), some of these factors are due to a flawed model specification; these are explained in reasons (i), (ii) and (iii) below. There are other factors which give rise to serial correlation as an anticipated part of the model, listed as (iv), (v) and (vi) below.

(i) Model Misspecification: Omitting a Variable

When some relevant explanatory variables have been omitted while specifying the model, their effect is captured by the error term. As a result, the error term, instead of being random, will exhibit a systematic pattern generated by the combined effect of the omitted variables. This creates spurious autocorrelation. Thus a model specification bias is likely to generate an autocorrelated error term.

(ii) Model Misspecification: Incorrect Functional Form

Sometimes a model is erroneously estimated as a linear relationship between Y and X,


when the true underlying relationship is nonlinear. This is likely to generate an
autocorrelated error term because the error term will be consistently overestimated in
some range of the data and underestimated in some other range of data.

(iii) Lagged Dependent Variable

Suppose the dependent variable in a given time period is a function of its own value in the previous time period (so that the dependent variable is a function of its own lagged value), for example

Consumptionₜ = β₁ + β₂Incomeₜ + β₃Consumptionₜ₋₁ + uₜ

If the lagged value is erroneously dropped from the model, the error term will also include the impact of lagged consumption and, instead of being random, will exhibit a systematic pattern. So autocorrelation is likely to be present in autoregressive models.

(iv) Inertia

Many variables, such as consumption expenditure, industrial output, unemployment and price indices, when observed over several periods as time series, tend to suffer from inertia or carryover processes. They may be characterized by persistence, a tendency to remain in the same state from one observation to the next. They may also be characterized by business-cycle behaviour: starting at the bottom of a recession, they tend to move upwards at the start of the economic recovery, so that each observed value is greater than the previous one; the reverse happens at the onset of a recession. The successive observations are therefore likely to be interdependent, and the regression model is likely to be characterized by an autocorrelated error term.

(v) Cobweb Phenomenon

When a variable responds to another variable with a time lag, it reflects what is known as the cobweb phenomenon. For example, the supply of agricultural products may react to price changes with a one-period time lag:

Supplyt = β1 + β2Pricet-1 + ut

The disturbance term ut is then not expected to be random. In fact, the error term displays a pattern of being positive in one period (when the farmers underproduce), followed by a negative value in the successive year (when the farmers overproduce), followed by a positive value, and so on.

(vi) Nature of Empirical Study

Sometimes the empirical relationship being studied is such that autocorrelation gets built into the system. For example, when the expectations-augmented Phillips curve with adaptive expectations is used to explain the phenomenon of inflation, the expected inflation in the current year is forecast as the value of inflation in the previous year. Naturally, the residual in any one time period becomes dependent on the value of the residual in the previous time period.

Another such example is the returns from hedge funds. These returns are strongly correlated with their past values because the securities are not very actively traded in the stock market and their market prices are not always readily available, so reported returns are calculated on the basis of past returns, leading to correlated observations across time.

(vii) Data Transformation

Sometimes the given observations are transformed to generate a data set which is more
suitable for the desired empirical analysis.

 Using averages
A quarterly series may be produced from a monthly series using the process of
averaging.

 Extrapolation/ Interpolation
Data may be obtained by extrapolation or interpolation when it is not available
for certain time periods or observations.

 Difference equations
If a model is given as

Yt = β1 + β2Xt + ut        (2)

then it must also be the case that

Yt-1 = β1 + β2Xt-1 + ut-1        (3)

Equations (2) and (3) are known as level equations. A difference equation may be obtained by subtracting equation (3) from equation (2):

ΔYt = β2ΔXt + Δut        (4)

It can be shown that even if the error term does not exhibit serial correlation in equations (2) and (3), the error term in equation (4) will be serially correlated.

All these data transformation activities may cause the error terms to be correlated with
each other.
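As a quick check of the claim about the difference equation: if ut in the level equation is white noise, then Δut is a first order moving average whose first autocorrelation equals −0.5. A minimal numpy sketch (the series length is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.standard_normal(100_000)   # serially uncorrelated errors of the level equation (2)
du = np.diff(u)                    # errors of the difference equation (4)

# corr(du_t, du_{t-1}) should be close to the theoretical value -0.5
r = np.corrcoef(du[1:], du[:-1])[0, 1]
print(round(r, 3))
```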

(viii) Nature of Time Series Data

If either the dependent variable or the independent variable or both are non-stationary, then the error term in the regression model will exhibit serial correlation.

(ix) Measurement Error in the Dependent Variable

Sometimes the variables in the data are measured inaccurately, so that the reported values differ from the true values. If the dependent variable suffers from measurement error, this effect gets incorporated into the disturbance term, so each successive value of the disturbance term exhibits a systematic relationship with the contiguous error term, causing the model to suffer from autocorrelation.

Let Yi* = Yi + vi, where Yi* is the measured value and Yi is the true value.

The true model is Yi = β1 + β2X2i + … + βkXki + ui, but the estimated model is Yi* = β1 + β2X2i + … + βkXki + ei.

So the disturbance term in the estimated model is ei = ui + vi. If there is a systematic measurement error in Y, the successive vi terms will have a systematic relationship with each other, causing the ei terms to exhibit autocorrelation.

6. Nature of Autocorrelation

Autocorrelation may be classified as positive or negative, depending on the relationship between successive disturbance terms.

6.1 Positive Autocorrelation

If the covariance between ut and ut-1 is positive, the autocorrelation is said to be positive. A positive disturbance term is then typically followed by another positive disturbance, and only after a prolonged time (or sequence of observations) does the disturbance term turn negative; there is again a lingering effect, so that for a long sequence of observations a negative disturbance is followed by another negative disturbance.

6.2 Negative Autocorrelation

If the covariance between ut and ut-1 is negative, a positive disturbance term is followed by a negative disturbance term, and a negative disturbance term by a positive one. In this case the autocorrelation is said to be negative.

7. Order of Autocorrelation

The order of autocorrelation is determined by the relationship between successive error terms: it is the number of lags with which the disturbance in a given time period is a function of the disturbances in the previous time period(s).

7.1 First Order Autocorrelation

When the disturbance in any one time period is a function of the disturbance lagged by one time period, the autocorrelation is said to be of first order. The relationship between a pair of successive disturbance terms is given as

ut = ρut-1 + εt

where ρ is called the coefficient of autocorrelation, with |ρ| < 1, and εt is independently and identically normally distributed with mean 0 and variance σε² for all t. This process of generating the disturbance term as a function of its own one-period lagged value is called the first order autoregressive process, AR(1). The series εt, with zero mean, constant variance and no serial correlation, is known as a white noise series.
The absolute value of ρ, the coefficient of autocorrelation, must be less than one for the stability of the autoregressive process; if the absolute value of ρ is greater than one, the series is explosive. The magnitude of ρ indicates the strength of autocorrelation: the closer the magnitude of ρ is to 1, the stronger the autocorrelation. Further, if ρ > 0 the autocorrelation is positive, and if ρ < 0 it is negative.
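A minimal numpy sketch of an AR(1) disturbance series, and of estimating ρ by regressing ut on ut-1 without an intercept (the value ρ = 0.8 and the sample size are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
rho, n = 0.8, 1000
e = rng.standard_normal(n)         # white noise eps_t
u = np.zeros(n)
for t in range(1, n):
    u[t] = rho * u[t - 1] + e[t]   # AR(1): u_t = rho*u_{t-1} + eps_t, |rho| < 1

# estimate rho by a regression through the origin of u_t on u_{t-1}
rho_hat = np.sum(u[1:] * u[:-1]) / np.sum(u[:-1] ** 2)
print(round(rho_hat, 3))           # close to 0.8
```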

7.2 Higher Order Autocorrelation

The greater the number of lags between the disturbance term in the current time period and the disturbance terms in previous time periods on which it depends, the higher is the order of autocorrelation. The order of autocorrelation typically depends on the nature of the data: with quarterly data a fourth order autocorrelation is more likely, and with monthly data a twelfth order autocorrelation is more likely.

7.2.1 SECOND ORDER AUTOCORRELATION

A model of autocorrelation is called the second-order autoregressive process, AR(2), if the disturbance in period t is related to both the disturbance in period t-1 and the disturbance in period t-2:

ut = ρ1ut-1 + ρ2ut-2 + εt

The assumptions about the distribution of εt and the magnitude of the ρi remain as in the AR(1) process.

7.2.2 pth ORDER AUTOCORRELATION

In general the autocorrelation may be of any order, and a pth order autoregressive process AR(p) may be written as

ut = ρ1ut-1 + ρ2ut-2 + ρ3ut-3 + … + ρput-p + εt

The assumptions about the distribution of εt and the magnitude of the ρi remain as in the AR(1) process.

8. Consequences of Autocorrelation

The presence of autocorrelation has consequences for the estimates of the regression coefficients. These may be stated as follows:

1. The estimates of the regression coefficients are unbiased and consistent but not efficient; hence they are not BLUE.

Proof of unbiasedness of the regression coefficient

The OLS estimator of β may be expressed as

β̂ = Cov(X, Y)/Var(X) = β + Cov(X, u)/Var(X)

⇒ E(β̂) = β, since Cov(X, u) = 0

Hence the estimates of the regression coefficients are unbiased even in the presence of autocorrelation.

Reason for inefficiency of the estimates of the regression coefficients

Any observed value of Y equals the sum of the expected value of Y and the error term. In the presence of positive autocorrelation, a positive error term ut tends to be followed by another positive ut, so once an observed value of Y is above the true value it tends to remain above it, and the observed values of Y continue to be overestimated. Similarly, when ut is negative it tends to stay negative, so an observed value of Y below the true value tends to stay below it, and the observed values of Y continue to be underestimated.

This implies that although the estimated values of Y are not biased (overestimates and underestimates cancel out on balance), the estimates are characterized by large variances.

2. The variances of the regression coefficients are biased and inconsistent.

Proof of bias in the variance of the regression coefficient

We know that

β̂OLS = Σxiyi / Σxi²

∴ Var(β̂OLS) = σ² / Σxi²

So when no autocorrelation is present (ρ = 0), the usual estimator of Var(β̂) is unbiased.

But when autocorrelation is present (ρ ≠ 0), Var(β̂) is biased, and it can be shown that under AR(1),

Var(β̂OLS) = (σ²/Σxt²)·[1 + 2ρ(Σxtxt-1/Σxt²) + 2ρ²(Σxtxt-2/Σxt²) + … + 2ρ^(n-1)(x1xn/Σxt²)]

If ρ > 0, i.e. when positive autocorrelation is present, the usual OLS formula underestimates the true variance.
If negative autocorrelation is present (ρ < 0), the usual OLS formula overestimates the true variance.
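A small Monte Carlo sketch of this consequence (the sample size, ρ and the regressor are all assumed values): under positive AR(1) errors and a trending X, the conventional formula σ̂²/Σx² understates the true sampling variance of the OLS slope.

```python
import numpy as np

rng = np.random.default_rng(3)
n, rho, reps = 50, 0.8, 5000
x = np.linspace(0, 1, n)           # trending regressor
xd = x - x.mean()

betas, naive_vars = [], []
for _ in range(reps):
    u = np.zeros(n)
    e = rng.standard_normal(n)
    for t in range(1, n):
        u[t] = rho * u[t - 1] + e[t]          # AR(1) disturbances
    y = 1.0 + 2.0 * x + u
    b = np.sum(xd * y) / np.sum(xd ** 2)      # OLS slope
    resid = y - y.mean() - b * xd
    s2 = np.sum(resid ** 2) / (n - 2)
    betas.append(b)
    naive_vars.append(s2 / np.sum(xd ** 2))   # textbook Var(beta_hat) = sigma^2 / sum x^2

print("true sampling variance :", round(np.var(betas), 3))
print("mean OLS-formula value :", round(np.mean(naive_vars), 3))   # noticeably smaller
```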

3. The t and F tests are not reliable, because of the underestimation or overestimation of the standard errors of the regression coefficients.

When the true variance of a regression coefficient is underestimated, the t values of the OLS estimates are larger than they should be. As a result, one might conclude that variables are statistically significant when they are not (Type I error). On the other hand, when the true variance of a regression coefficient is overestimated, the t values of the OLS estimates are smaller than they should be. As a result, one might conclude that variables are not statistically significant when they actually are (Type II error). So the t tests are not reliable.

4. The estimate of the variance of the error term is also biased. This can be shown in the following manner. For a simple linear regression model, if the variance of each ui is σu², then its estimator σ̂u² is given as

σ̂u² = Σei² / (n − 2)

In the absence of autocorrelation, E(Σei²) = (n − 2)σu², so σ̂u² is an unbiased estimator of σu².

But if autocorrelation is present, then

E(Σei²) = σu²·[n − (1 + 2ρ(Σxtxt-1/Σxi²) + 2ρ²(Σxtxt-2/Σxi²) + … + 2ρ^(n-1)(x1xn/Σxi²))]

For large n, if X itself follows a first order autoregressive scheme with parameter λ, this reduces to

E(Σei²) = σu²·(n − (1 + ρλ)/(1 − ρλ))

in which case the estimator σ̂u² becomes a biased estimator of σu².

5. As a result of the bias in the estimated variance of the error term, the R² value is not a reliable measure of goodness of fit. If autocorrelation is positive and the independent variable grows over time, the estimated variance of the error term is underestimated and the value of R² is overestimated, implying an exaggerated goodness of fit. Similarly, when negative autocorrelation is present, the goodness of fit is understated.

9. Summary

 Autocorrelation is the presence of covariance between the error terms of contiguous observations in the regression model.
 It may arise from autoregressive error terms or from error terms generated by a moving average process.
 When the error term of a given observation is correlated with the error term of the immediately succeeding observation, autocorrelation of first order is said to be present.
 Higher order autocorrelation is present when the error terms are correlated across observations with a greater time lag.
 Autocorrelation may be positive or negative. Several factors, such as the time series nature of the data, model misspecification, measurement error in the dependent variable, the cobweb phenomenon and lagged dependent variables, may give rise to autocorrelation.
 The estimates of the regression coefficients are unbiased but not efficient in the presence of autocorrelation.
 The variances of the regression coefficients are biased and inconsistent, and hypothesis tests and tests of goodness of fit are unreliable in the presence of autocorrelation.

____________________________________________________________________________________________________

Subject BUSINESS ECONOMICS

Paper No and Title 8: Fundamentals of Econometrics

Module No and Title 18: Test for Autocorrelation

Module Tag BSE_P8_M18

TABLE OF CONTENTS
1. Learning Outcomes
2. Detection of Autocorrelation
2.1 Graphical Method
2.2 Runs Test
3. Durbin Watson tests for Autocorrelation
3.1 Durbin Watson d Test
3.2 Durbin's h Test
3.3 Durbin's m Test
4. Tests for Higher Order Autocorrelation
4.1 Wallis Test
5. Lagrange Multiplier Test
6. Portmanteau Test
6.1 Box Pierce Q Test
6.2 Ljung Box Q Test
7. Summary

1. Learning Outcomes

After studying this module, you shall be able to

 Know how to test for the presence of autocorrelation
 Learn graphical detection of autocorrelation
 Learn about the runs test for detection of autocorrelation
 Use the Durbin Watson test for detection of autocorrelation
 Use the LM test for detection of higher order autocorrelation
 Use the Q test for detection of higher order autocorrelation
 Use the Wallis test for detection of fourth order autocorrelation

2. Detection of Autocorrelation

2.1 Graphical Method

The presence of autocorrelation can be detected by looking at the residual plot. The graph of estimated residuals may be plotted in several ways.

 We can plot the residuals against time.
 We can plot the standardized residuals against time. The residuals are standardized by dividing them by the standard error of the regression: estd = et/σ̂u. These standardized residuals have mean zero and approximately unit variance; for large n, they are also approximately normally distributed¹.
 We can also plot the residuals against their lagged values. So, for an AR(1) scheme, et may be plotted against et-1.

¹ Gujarati, D. & Sangeetha, Basic Econometrics, Tata McGraw Hill India, 4th ed., 2007, p. 474.
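A minimal matplotlib/numpy sketch of the three plots just described, using an illustrative AR(1) series in place of actual OLS residuals:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
e = np.zeros(200)                      # stand-in for estimated OLS residuals
for t in range(1, 200):
    e[t] = 0.7 * e[t - 1] + rng.standard_normal()

sigma_hat = np.sqrt(np.sum(e ** 2) / (len(e) - 2))   # standard error of the regression
e_std = e / sigma_hat                                # standardized residuals

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].plot(e)
axes[0].set_title("residuals vs time")
axes[1].plot(e_std)
axes[1].set_title("standardized residuals vs time")
axes[2].scatter(e[1:], e[:-1], s=8)                  # e_t against e_{t-1} (AR(1) lag plot)
axes[2].set_title("e_t vs e_(t-1)")
plt.tight_layout()
plt.show()
```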


Figure 1

Figure 2

Figure 3

2.2 The Runs Test

This is a non-parametric test: it makes no assumptions about the underlying distribution of the disturbance term. It is also called the Geary test, after R. C. Geary, who first proposed it in 1970². A run is a continuous sequence of residuals carrying the same sign ("+" or "−"), and the length of a run is the number of elements in it. For example, we may observe residuals with the following pattern of signs:

(− − − − − − − −)(+ + + + + + + + + + + + +)(− − − − − − − − −)

Here we observe a run of 8 negative residuals, followed by a run of 13 positive residuals, followed by a run of 9 negative residuals, for a total of 30 observations. When a positive residual is followed by positive residuals for many periods, and similarly a negative residual by negative residuals for many periods, the runs are few but long, and positive autocorrelation is said to be present. On the other hand, if the runs are short and the signs change quickly, negative autocorrelation is present.

Geary also proposed a test of hypothesis to check whether the observed runs are indicative of autocorrelation. The test can be described as follows. Let

N1 = number of residuals with a + sign
N2 = number of residuals with a − sign
N = N1 + N2 = total number of observations
R = number of runs

Then R is asymptotically normally distributed with

Mean: E(R) = 2N1N2/N + 1

Variance: σR² = 2N1N2(2N1N2 − N) / [N²(N − 1)]

The null hypothesis is that the residuals are random; it is rejected when R lies too far from E(R) relative to σR. Tables of critical values for the number of runs are available for cases where N1 or N2 is less than 20.

² R. C. Geary, "Relative efficiency of count of sign changes for assessing residual autoregression in least squares regression", Biometrika, Vol. 57, 1970, pp. 123-127.
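A small Python sketch of this large-sample version of the runs test; the residual series below is hypothetical, and in practice the estimated OLS residuals would be used.

```python
import numpy as np
from scipy.stats import norm

def runs_test(resid):
    """Geary runs test: large-sample z test that the residual signs are random."""
    signs = resid > 0
    n1, n2 = int(signs.sum()), int((~signs).sum())
    n = n1 + n2
    r = 1 + int(np.sum(signs[1:] != signs[:-1]))   # number of runs = sign changes + 1
    mean_r = 2 * n1 * n2 / n + 1
    var_r = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (r - mean_r) / np.sqrt(var_r)
    return r, z, 2 * norm.sf(abs(z))               # two-sided p value

rng = np.random.default_rng(0)
e = rng.standard_normal(60)                        # hypothetical residuals
print(runs_test(e))   # few long runs give a large negative z (positive autocorrelation)
```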

3. Durbin Watson Tests for Autocorrelation

3.1 Durbin Watson (D-W) d Test

Durbin and Watson defined the d statistic, based on the estimated residuals of the regression:

d = Σ(t=2 to n) (ût − ût-1)² / Σ(t=1 to n) ût²        (1)

Assumptions underlying the d statistic

1. The regression model includes an intercept term.
2. The explanatory variables are non-stochastic.
3. The disturbance term ut is generated by a first order autoregressive scheme.
4. The disturbance term ut is normally distributed.
5. The regression model is not autoregressive, i.e., it does not include lagged values of the dependent variable among the explanatory variables.
6. There are no missing observations in the data.


Relationship between the d statistic and autocorrelation

Since Σût² and Σût-1² differ in only one observation, they are approximately equal; therefore the d statistic in (1) may be written as

d ≈ 2(1 − Σûtût-1 / Σût²)        (2)

If we define ρ̂, the estimator of ρ, the first order coefficient of autocorrelation, as

ρ̂ = Σûtût-1 / Σût²        (3)

then the d statistic in (2) becomes

d ≈ 2(1 − ρ̂)        (4)

Since −1 ≤ ρ̂ ≤ 1, it follows that 0 ≤ d ≤ 4        (5)

From (4) and (5) it follows that

if ρ̂ = 0, d = 2, i.e. no autocorrelation;
if ρ̂ = +1, d ≈ 0, i.e. positive autocorrelation;
if ρ̂ = −1, d ≈ 4, i.e. negative autocorrelation.

To summarize: if no autocorrelation is present, the d statistic takes a value close to 2; if positive autocorrelation is present, d takes a value close to 0; and if negative autocorrelation is present, d takes a value close to 4.

However, the sampling distribution of d under the null hypothesis of no autocorrelation depends upon the values of the explanatory variables, so the critical values of d also depend on the values of the explanatory variables. To overcome this problem, Durbin and Watson developed lower and upper bounds of d, namely dL and dU. The probability distributions of dL and dU do not depend on the values of the explanatory variables, and they have the property that dL < d < dU. Using this property, the D-W d test can be carried out to test for autocorrelation.
The values for the lower and upper bounds of d are contained in the Durbin Watson
statistical tables.
Steps in the D-W d Test

1. Run the OLS regression and obtain the estimated residuals.
2. Compute the value of d. (All standard statistical programmes compute this value automatically.)
3. Find the values of dL and dU from the Durbin Watson statistical tables corresponding to the given sample size and number of explanatory variables, at the chosen level of significance.
4. Determine whether autocorrelation exists, and whether it is positive or negative, by carrying out the following hypothesis tests:
(i) Test for positive autocorrelation
H0: ρ = 0 versus H1: ρ > 0
The null hypothesis states that there is no positive autocorrelation.
(ii) Test for negative autocorrelation
H0*: ρ = 0 versus H1: ρ < 0
The null hypothesis states that there is no negative autocorrelation.
(iii) Test for autocorrelation of either sign
H0/H0*: ρ = 0 versus H1: ρ ≠ 0
The null hypothesis states that there is no autocorrelation, positive or negative.
5. Accept or reject the hypotheses according to the following decision rules:

Durbin Watson d test decision rules

Null hypothesis                  Decision        Condition
Hypothesis (i)
No positive autocorrelation     Reject          0 < d < dL
No positive autocorrelation     No decision     dL ≤ d ≤ dU
Hypothesis (ii)
No negative autocorrelation     Reject          4 − dL < d < 4
No negative autocorrelation     No decision     4 − dU ≤ d ≤ 4 − dL
Hypothesis (iii)
No autocorrelation              Do not reject   dU < d < 4 − dU
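A minimal numpy sketch of computing d from a residual series (here an illustrative positively autocorrelated series); the computed d must still be compared with the tabulated dL and dU for the given n and number of regressors.

```python
import numpy as np

def dw_d(resid):
    """Durbin-Watson d, eq. (1): squared successive differences over the sum of squares."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(0)
e = np.zeros(100)
for t in range(1, 100):
    e[t] = 0.6 * e[t - 1] + rng.standard_normal()  # positively autocorrelated residuals

d = dw_d(e)
print(round(d, 2), "implied rho_hat:", round(1 - d / 2, 2))   # d well below 2
```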


Figure 4: Zones of decision for the d statistic

Limitations of the D-W test

1. The test is applicable only when the assumptions underlying the d statistic are fulfilled.
2. The test can only be used to test for first order autocorrelation, not higher order autocorrelation.
3. The test contains zones of indecision, where it cannot conclusively determine the absence or presence of autocorrelation.

3.2 Durbin's h Test

When the regression model is autoregressive, the d statistic of the D-W test has a tendency to be close to 2. Thus it has a built-in bias towards accepting the null hypothesis of no autocorrelation even when it is false. To overcome this limitation, Durbin (1970) proposed the h test. Consider an autoregressive regression model of the following kind:

Yt = β1Yt-1 + β2Yt-2 + … + βpYt-p + βp+1X1t + … + βp+kXkt + ut        (6)

ut = ρut-1 + εt,  εt ∼ N(0, σε²)


Then the h statistic is given by

h = (1 − d/2)·√[n / (1 − n·var(β̂1))] ∼ approx. N(0, 1)

where var(β̂1) is the estimated variance of β̂1, the coefficient of Yt-1, d is the D-W statistic defined in the previous section, and n is the sample size.

Steps in Durbin's h Test

1. Run the OLS regression for the model represented by eq. (6).
2. Find var(β̂1).
3. Compute h.
4. Since h follows a standard normal distribution in large samples, use the normal distribution to carry out the following hypothesis tests:
(i) Test for positive autocorrelation
H0: ρ = 0 versus H1: ρ > 0
The null hypothesis states that there is no positive autocorrelation.
(ii) Test for negative autocorrelation
H0: ρ = 0 versus H1: ρ < 0
The null hypothesis states that there is no negative autocorrelation.

Limitation of Durbin's h test

The test cannot be used if n·var(β̂1) ≥ 1.
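A minimal sketch of the h statistic; the inputs d, n and var(β̂1) below are hypothetical values that would come from the fitted autoregressive model.

```python
import numpy as np

def durbin_h(d, n, var_b1):
    """h = (1 - d/2) * sqrt(n / (1 - n*var(beta1_hat))); undefined if n*var(beta1_hat) >= 1."""
    if n * var_b1 >= 1:
        raise ValueError("h test not applicable: n*var(beta1_hat) >= 1")
    return (1 - d / 2) * np.sqrt(n / (1 - n * var_b1))

# var_b1 is the squared standard error of the coefficient on Y_{t-1}
h = durbin_h(d=1.7, n=100, var_b1=0.004)
print(round(h, 3))   # compare with standard normal critical values, e.g. 1.645 or 1.96
```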

3.3 Durbin's m Test

Durbin (1970) suggested the m test to overcome the limitation of the h test.

Steps in Durbin's m Test

1. Run the OLS regression for the model represented by eq. (6).
2. Obtain the residuals ût.
3. Run the OLS regression

ût = b1Yt-1 + b2Yt-2 + … + bpYt-p + bp+1X1t + … + bp+kXkt + bp+k+1ût-1        (7)

4. Use the t test on the coefficient of ût-1 to carry out the following hypothesis tests:
(i) Test for positive autocorrelation
H0: ρ = 0 versus H1: ρ > 0
The null hypothesis states that there is no positive autocorrelation.
(ii) Test for negative autocorrelation
H0: ρ = 0 versus H1: ρ < 0
The null hypothesis states that there is no negative autocorrelation.
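A sketch of the m test using statsmodels on simulated data (the data generating process and all parameter values are hypothetical); the t statistic on the lagged residual is the test statistic.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 1.0 * x[t] + rng.standard_normal()

# Steps 1-2: fit the autoregressive model by OLS and obtain the residuals u_hat
X = np.column_stack([y[:-1], x[1:]])
res = sm.OLS(y[1:], X).fit()
u = res.resid

# Step 3: regress u_hat_t on the same regressors plus u_hat_{t-1}, as in eq. (7)
Z = np.column_stack([X[1:], u[:-1]])
aux = sm.OLS(u[1:], Z).fit()
print("t on lagged residual:", round(aux.tvalues[-1], 2),
      " p value:", round(aux.pvalues[-1], 3))   # significant -> autocorrelation present
```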

4. Tests for Higher Order Autocorrelation

4.1 Wallis Test

Wallis proposed a test for fourth order autocorrelation. If the disturbance term is characterized by fourth order autocorrelation, then

ut = φ4ut-4 + εt

Hypotheses in the Wallis Test

Null hypothesis: H0: φ4 = 0
Alternative hypothesis: H1: φ4 < 0 or H1: φ4 > 0

Essentially, the null hypothesis states that fourth order autocorrelation is not present, and the alternative hypothesis states that it is.

Steps in the Wallis Test

Wallis proposed a modified d statistic:

d4 = Σ(t=5 to n) (ût − ût-4)² / Σ(t=1 to n) ût²

The test can be used for
a) a model without intercept which contains dummy variables for quarterly data, and
b) a model with intercept and without dummy variables for quarterly data.

Wallis derived upper and lower bounds for d4, and tables of significance points have been derived separately for both these models.

5. Lagrange Multiplier (LM) or Breusch Godfrey Test

This test overcomes two main limitations of the D-W test:
(i) it allows both autoregressive and moving average error structures, and
(ii) it allows testing for higher order autocorrelation.

Hypotheses in the LM test

The null hypothesis is that there is no autocorrelation of any order. The alternative hypothesis can take two forms.

One, in which ut is generated by the pth order autoregressive scheme AR(p):

ut = ρ1ut-1 + ρ2ut-2 + … + ρput-p + εt

where εt ∼ N(0, σε²) and cov(εt, εs) = 0 for t ≠ s.

The other, in which ut is generated by the pth order moving average scheme MA(p):

ut = ρ1εt-1 + ρ2εt-2 + … + ρpεt-p + εt

where εt ∼ N(0, σε²) and cov(εt, εs) = 0 for t ≠ s.

Steps in the LM test

1. Estimate the model by OLS and obtain the estimated residuals ût.
2. Run the auxiliary regression³

ût = α1 + α2Xt + ρ̂1ût-1 + ρ̂2ût-2 + … + ρ̂pût-p + εt        (8)

3. Obtain R² from this auxiliary regression.
4. The test statistic is (n − p)R².
5. For large n, (n − p)R² ∼ χ² with p degrees of freedom.
6. The hypothesis test is a test of the joint significance of the first p autocorrelations of the disturbance terms:
H0: ρ1 = ρ2 = … = ρp = 0 (no autocorrelation)
H1: at least one ρ ≠ 0 (autocorrelation present)
The same test can be used for either of the alternative hypotheses.

³ This auxiliary regression uses only n − p observations.

7. If the test statistic exceeds the critical value of χ² at the chosen level of significance, the null hypothesis is rejected.

Advantage of the LM test

The test can be used whether the autocorrelated disturbance term in the model is autoregressive or moving average.

Limitation of the LM test

A limitation of the test is that the lag length p cannot be specified a priori. The lag length has to be found by inspecting the t statistics on each lagged residual in the auxiliary regression.
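The test is available ready-made in standard software; for instance, statsmodels provides acorr_breusch_godfrey. A minimal sketch on simulated data (all values hypothetical):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(5)
x = rng.standard_normal(200)
y = 1 + 2 * x + rng.standard_normal(200)       # no autocorrelation built in here
res = sm.OLS(y, sm.add_constant(x)).fit()

# jointly test for autocorrelation up to lag p = 4
lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(res, nlags=4)
print(round(lm_stat, 2), round(lm_pval, 4))    # reject H0 of no autocorrelation if p < 0.05
```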

6. Portmanteau Q Tests

6.1 Box-Pierce Q Test

This test is based on the following Q statistic:

Box-Pierce Q = n Σ(k=1 to h) ρ̂k²

where n is the total number of observations, h is the maximum lag, and ρ̂k is the estimated autocorrelation function (ACF), given as

ρ̂k = Σ(t=k+1 to n) ûtût-k / Σ(t=1 to n) ût²

with ût the estimated disturbance for the tth observation. If the residuals are white noise, the Q statistic follows a χ² distribution with h degrees of freedom.

Limitations

The Q statistic of the Box-Pierce test performs very well in large samples but not in small samples.

6.2 Ljung Box Q Test

The Q statistic of the Box-Pierce test has been modified so that it also performs well in small samples; the modified statistic is known as the Ljung Box Q statistic:

Ljung Box Q = n(n + 2) Σ(k=1 to h) ρ̂k²/(n − k)

where ρ̂k is the same as in the Box-Pierce Q statistic and h is the number of lags being tested.

If the residuals are white noise, this Q statistic follows a χ² distribution with h degrees of freedom, just like the Box-Pierce Q statistic.

The Q statistic can be applied to any ARIMA⁴ model. But when it is applied to the residuals of an estimated model, the degrees of freedom of the test statistic have to be adjusted by subtracting the number of estimated parameters.

Hypotheses in the Q Tests

The null and alternative hypotheses are the same as those in the Breusch-Godfrey LM test.

Steps in the Q test

1. Estimate the model by OLS and obtain the estimated residuals ût.
2. Estimate the autocorrelation function ρ̂k.
3. Compute the test statistic Q.
4. Q ∼ χ² with h degrees of freedom.
5. The hypothesis test is a test of the joint significance of the first h autocorrelations of the disturbance terms:
H0: ρ1 = ρ2 = … = ρh = 0 (no autocorrelation)
H1: at least one ρk ≠ 0 (autocorrelation present)
6. If the test statistic exceeds the critical value of χ² at the chosen level of significance, the null hypothesis is rejected.
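A minimal sketch using the acorr_ljungbox function from statsmodels, which can report both the Ljung Box and Box-Pierce statistics; the residual series is hypothetical and the exact return format depends on the statsmodels version.

```python
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(0)
resid = rng.standard_normal(200)      # hypothetical residual series (white noise here)

# Test the first h = 10 autocorrelations jointly; boxpierce=True adds the Box-Pierce Q
out = acorr_ljungbox(resid, lags=[10], boxpierce=True, return_df=True)
print(out)                            # small p values would reject H0 of no autocorrelation
```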

⁴ The Autoregressive Integrated Moving Average (ARIMA) model is a generalization of the ARMA (Autoregressive Moving Average) model.

7. Summary

There are several methods for detecting the presence of autocorrelation. The graphical method and the runs test are non-parametric. The Durbin Watson test and Durbin's h and m tests are parametric tests that can be used to detect first order autocorrelation. The Wallis test, the LM test and the Portmanteau tests are used to detect higher order autocorrelation. The Durbin Watson test is the most widely known test for autocorrelation, but it suffers from some limitations: in some cases the test is inconclusive, it can only be used when the regression model has an intercept term and no lagged dependent variables, and it applies only when the error term is autoregressive of first order. Durbin's h test allows testing for autocorrelation in the presence of lagged dependent variables in the regression model. The LM test can be used when the error term is generated by either an autoregressive or a moving average process.

____________________________________________________________________________________________________

Subject BUSINESS ECONOMICS

Paper No and Title 8: FUNDAMENTALS OF ECONOMETRICS

Module No and Title 19: Remedies of autocorrelation

Module Tag BSE_P8_M19


TABLE OF CONTENTS
1. Learning Outcomes
2. Remedies for Autocorrelation
2.1 Model Specification
2.2 GLS Method
2.3 FGLS Method
2.3.1 First Difference Method
2.3.2 Durbin Watson Method
2.3.3 Cochrane Orcutt Iterative Method
2.3.4 Hildreth Lu Grid Search Method
2.4 Newey West method of Correcting Standard Error
3. Summary


1. Learning Outcomes
After studying this module, you shall be able to

 Know how to estimate a regression model in the presence of autocorrelation
 Learn the Generalised Least Squares (GLS) regression procedure
 Learn the Feasible Generalised Least Squares (FGLS) regression procedure
 Use the Cochrane Orcutt iterative procedure for estimating ρ̂
 Use the Hildreth Lu grid search procedure for estimating ρ̂
 Learn the Newey-West method for correcting the standard errors of estimates in the presence of autocorrelation

2. Remedies for Autocorrelation

If autocorrelation is present in a regression model, the estimation of the model may be carried out in the following ways.

2.1 Model Specification / Introducing a Lagged Dependent Variable

2.1.1 OMITTED VARIABLES

If one suspects that the autocorrelation is due to omitted variables, one may test for omitted variables using Ramsey's RESET test or any other appropriate test, and then respecify the model by including suitable regressors and functional forms.

2.1.2 MISSPECIFIED DYNAMICS

Sometimes the dependent variable is a function of lagged values of Y and X. We can specify the following dynamic model:

Yt = β1Yt-1 + β2Xt + β3Xt-1 + εt,  |β1| < 1

According to Sargan's common factor test, if β1β2 + β3 ≠ 0 then the true model is this dynamic model. So we must test the hypothesis H0: β1β2 + β3 = 0. If this hypothesis is rejected, the true model is dynamic and we should introduce lagged dependent variables into the model¹.

¹ Since this hypothesis is nonlinear, one has to use the Wald, LR or LM test.

2.2 Generalised Least Squares (GLS) Regression

If autocorrelation is present despite correct specification of the model, the Generalised Least Squares (GLS) regression method should be used. In GLS the regression model is transformed so that it fulfils the assumptions of the classical linear regression model (CLRM), after which the OLS procedure may be used to estimate it. The GLS procedure is applicable only when ρ is known.

Let the regression model be

Yt = β1 + β2Xt + ut        (1)

such that

ut = ρut-1 + εt,  −1 < ρ < 1        (2)

The method exploits the fact that the covariance matrix of the error term ut can be obtained in terms of the autocorrelation coefficient and the variance of εt.

2.2.1 GENERALISED DIFFERENCE EQUATION METHOD (COCHRANE ORCUTT METHOD)

From eq. (1) the following relationship also holds:

Yt-1 = β1 + β2Xt-1 + ut-1        (3)

Multiplying eq. (3) by ρ gives

ρYt-1 = ρβ1 + ρβ2Xt-1 + ρut-1        (4)

Subtracting eq. (4) from eq. (1) yields

Yt − ρYt-1 = β1(1 − ρ) + β2(Xt − ρXt-1) + (ut − ρut-1)        (5)

which can be written as

Yt* = β1* + β2*Xt* + εt        (6)

where Yt* = Yt − ρYt-1, β1* = β1(1 − ρ), β2* = β2, Xt* = Xt − ρXt-1 and εt = ut − ρut-1.

The error term εt in eq. (6) satisfies the assumptions of the classical linear regression model, hence eq. (6) can be estimated by the usual OLS procedure.

Limitations
1. This method is suitable when the disturbance follows an AR(1) scheme; otherwise it compounds the errors of estimation.
2. The method can be applied only if the regressors are exogenous, i.e., the model does not include lagged dependent variables.
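A minimal numpy/statsmodels sketch of estimating eq. (6), assuming ρ = 0.7 is known; the data are simulated and all values are hypothetical.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, rho = 200, 0.7                       # GLS requires a known rho
x = rng.standard_normal(n).cumsum()     # a slowly moving regressor
u = np.zeros(n)
for t in range(1, n):
    u[t] = rho * u[t - 1] + rng.standard_normal()
y = 2.0 + 0.5 * x + u                   # true beta1 = 2.0, beta2 = 0.5

# quasi-difference both series as in eq. (6) and estimate by OLS
y_star = y[1:] - rho * y[:-1]
x_star = x[1:] - rho * x[:-1]
res = sm.OLS(y_star, sm.add_constant(x_star)).fit()

b2 = res.params[1]                      # beta2* = beta2
b1 = res.params[0] / (1 - rho)          # recover beta1 from beta1* = beta1(1 - rho)
print(round(b1, 2), round(b2, 2))       # close to 2.0 and 0.5
```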


2.2.2 PRAIS WINSTEN TRANSFORMATION

In carrying out the above differencing, the first observation is lost. When the number of observations is large, this does not matter, but when the number of observations is small, the OLS estimators of eq. (6) are not BLUE. The problem is rectified by transforming the first observations on X and Y as Y1√(1 − ρ²) and X1√(1 − ρ²). With the first observation so transformed and retained, the OLS estimators obtained are BLUE.

2.3 Feasible Generalised Least Squares (FGLS) Regression

The FGLS procedure is used when the coefficient of first order autocorrelation, ρ, is not known; an estimated value ρ̂ is then used in its place when estimating the difference equation (6).

2.3.1 THE FIRST DIFFERENCE METHOD

When one suspects perfect positive autocorrelation, ρ = +1, the difference equation (5) becomes

Yt − Yt-1 = β2(Xt − Xt-1) + (ut − ut-1)

or

ΔYt = β2ΔXt + εt        (7)

where Δ is the first difference operator. This equation may be estimated by the usual OLS procedure, since its disturbance term satisfies the CLRM assumptions. The method may therefore be used when one suspects a very high value of ρ, or when the D-W d statistic is very low.

Some observations about the first difference model:
1. It is a regression without an intercept.
2. If ρ = +1, the underlying series is non-stationary but the first-differenced series is stationary.
3. The method is suitable only if ρ is very high, i.e. close to +1.

To test for ρ = +1, the Berenblutt-Webb test may be used, with test statistic

g = Σ(t=2 to n) ε̂t² / Σ(t=1 to n) et²

where et is the estimated residual of the original (level) model and ε̂t is the estimated residual of the first difference model.


Hypotheses in the test

H0: ρ = +1
H1: ρ = 0

If the calculated value of g lies below the lower limit dL in the Durbin Watson tables, we do not reject H0, i.e. we conclude that ρ = +1.

2.3.2 COMPUTING ρ FROM THE D-W d STATISTIC

Since ρ̂ ≈ 1 − d/2, for large samples we can use this estimated value of ρ to transform the data and run the regression on the generalised difference equation (6).

 Modification by Theil and Nagar

For small samples it is not suitable to use the DW statistic to compute ρ̂. Theil and Nagar proposed the following modification:

ρ̂ ≈ [n²(1 − d/2) + k²] / (n² − k²)

where n is the sample size, d is the DW statistic and k is the number of estimated coefficients. This value of ρ̂ may then be used for estimating the generalised difference equation.

2.3.3 COCHRANE-ORCUTT ITERATIVE METHOD OF ESTIMATING ρ

The iterative method entails starting with an initial value of ρ̂ and refining it through successive approximations. The steps involved are:

Step 1: Estimate the regression by OLS and obtain the residuals.
Step 2: Estimate ρ̂ by regressing the residuals on their lagged values. (This regression has no intercept term, because the residuals sum to zero, which makes the intercept zero.)
Step 3: Transform the original variables using the ρ̂ obtained in step 2: Yt* = Yt − ρ̂Yt-1, Xt* = Xt − ρ̂Xt-1, β1* = β1(1 − ρ̂), β2* = β2, εt = ut − ρ̂ut-1.
Step 4: Run the regression with the transformed variables, Yt* = β1* + β2*Xt* + εt, and obtain new residuals.
Step 5 onwards: Repeat steps 2 to 4 until the iterations converge, i.e., until the estimates of the autocorrelation coefficient ρ̂ from two successive iterations differ by no more than some preselected small value, such as 0.001.
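A compact sketch of these steps on simulated data; the data generating process is hypothetical, and the convergence tolerance of 0.001 follows the text.

```python
import numpy as np
import statsmodels.api as sm

def cochrane_orcutt(y, x, tol=1e-3, max_iter=50):
    """Iterate: quasi-difference with current rho -> OLS -> new rho from level residuals."""
    rho = 0.0                                    # the iteration starts from plain OLS
    for _ in range(max_iter):
        y_s, x_s = y[1:] - rho * y[:-1], x[1:] - rho * x[:-1]
        res = sm.OLS(y_s, sm.add_constant(x_s)).fit()
        b1 = res.params[0] / (1 - rho)           # level intercept from beta1* = beta1(1-rho)
        b2 = res.params[1]
        u = y - b1 - b2 * x                      # residuals of the level equation
        rho_new = np.sum(u[1:] * u[:-1]) / np.sum(u[:-1] ** 2)  # regression through origin
        if abs(rho_new - rho) < tol:             # stop when successive rho_hats agree
            break
        rho = rho_new
    return res, rho_new

rng = np.random.default_rng(0)
x = rng.standard_normal(200)
u = np.zeros(200)
for t in range(1, 200):
    u[t] = 0.7 * u[t - 1] + rng.standard_normal()
y = 2.0 + 0.5 * x + u

res, rho_hat = cochrane_orcutt(y, x)
print(round(rho_hat, 2))                         # close to 0.7
```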


2.3.4 HILDRETH LU GRID SEARCH PROCEDURE

This method finds the value of ρ̂ by a grid search. The steps involved are:

Step 1: Compute the transformed variables Yt* and Xt*, defined in eq. (6), for different values of ρ at intervals of 0.1 in the range −1 ≤ ρ ≤ 1.
Step 2: For each value of ρ, run the regression Yt* = β1* + β2*Xt* + εt.
Step 3: Compute the RSS of each regression and choose the value of ρ for which the RSS is minimum.
Step 4: Repeat the procedure with finer intervals of ρ around the value obtained in step 3.
Step 5: Choose the value of ρ for which (n/2) log RSS(ρ) − (1/2) log(1 − ρ²) is minimum.

It may be pointed out that the FGLS estimators are not unbiased, but they are asymptotically more efficient than the OLS estimators of an AR(1) model. However, if the regression model includes lagged dependent variables or exhibits higher order autocorrelation, the FGLS estimates are neither unbiased nor consistent.
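A minimal sketch of the grid search (steps 1 to 4); the simulated data and the grids are illustrative.

```python
import numpy as np
import statsmodels.api as sm

def hildreth_lu(y, x, grid):
    """Grid search over rho: keep the quasi-differenced regression with the minimum RSS."""
    best_rss, best_rho, best_res = np.inf, None, None
    for rho in grid:
        y_s, x_s = y[1:] - rho * y[:-1], x[1:] - rho * x[:-1]
        res = sm.OLS(y_s, sm.add_constant(x_s)).fit()
        rss = np.sum(res.resid ** 2)
        if rss < best_rss:
            best_rss, best_rho, best_res = rss, rho, res
    return best_rho, best_res

rng = np.random.default_rng(0)
x = rng.standard_normal(200)
u = np.zeros(200)
for t in range(1, 200):
    u[t] = 0.7 * u[t - 1] + rng.standard_normal()
y = 2.0 + 0.5 * x + u

rho1, _ = hildreth_lu(y, x, np.arange(-0.9, 1.0, 0.1))                   # coarse pass
rho2, res = hildreth_lu(y, x, np.arange(rho1 - 0.09, rho1 + 0.1, 0.01))  # finer pass
print(round(rho2, 2))
```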

2.4 Newey-West Method: Correcting the Standard Errors of the OLS Estimators

When the exact form of autocorrelation is not known, or when the independent variables in the regression model are not exogenous, it is better to estimate the model by the usual OLS procedure but correct the standard errors of the estimates for autocorrelation of unknown form.

In the absence of autocorrelation, the OLS estimate of the variance of any coefficient is

Var(β̂) = σu² / (N·Var(X))

In the presence of autocorrelation, the Newey-West (serial-correlation robust) standard error of a coefficient is

se(β*) = [se(β̂)/σ̂u]²·√v

where

v = Σ(t=1 to n) ât² + 2Σ(h=1 to g) [1 − h/(g + 1)]·(Σ(t=h+1 to n) ât ât-h)

ât = r̂t ût, t = 1, 2, …, n,
g is the number of lags, and
r̂t is the residual obtained by running the auxiliary regression of xt1 on xt2, xt3, …, xtk with the intercept.
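With statsmodels, Newey-West corrected standard errors are obtained by fitting OLS with a HAC covariance matrix; a minimal sketch on simulated data, where the lag truncation g = 4 is a hypothetical choice:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x = rng.standard_normal(n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.6 * u[t - 1] + rng.standard_normal()
y = 1.0 + 2.0 * x + u

X = sm.add_constant(x)
res_ols = sm.OLS(y, X).fit()                                        # usual OLS
res_nw = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})  # Newey-West SEs

print(res_ols.bse)   # conventional standard errors
print(res_nw.bse)    # HAC standard errors; the coefficient estimates are identical
```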


Advantages
1. The technique may be used even when the model includes lagged dependent variables among the regressors.

Limitations
1. The method can be used only for large samples.
2. The method corrects only the standard errors; the OLS point estimates are unchanged and remain inefficient.

3. Summary

The presence of autocorrelation in a regression model can be dealt with in several ways. If the autocorrelation has arisen from incorrect model specification, the most appropriate remedy is to specify the model correctly, by introducing the omitted variables or lagged dependent variables. If autocorrelation is present despite correct specification of the model, the Generalised Least Squares (GLS) regression method should be used; however, the GLS procedure is applicable only when the autocorrelation coefficient is known. When ρ is not known, the Feasible Generalised Least Squares (FGLS) regression method should be used, with an estimated value of ρ in place of its true value. There are several ways of estimating ρ, namely the D-W method, the Cochrane Orcutt method and the Hildreth Lu method. When the exact form of autocorrelation is not known, or when the independent variables in the regression model are not exogenous, it is better to estimate the model by the usual OLS procedure and correct the standard errors of the estimates using the Newey West method.
