Econometrics I Lecture Notes
CHAPTER ONE
INTRODUCTION
1.1. Definition and Scope of Econometrics
Ragnar Frisch is credited with coining the term ‘econometrics.’ Literally interpreted,
econometrics means “economic measurement”, but the scope of econometrics is much broader as
described by leading econometricians.
An econometrician has to be a competent mathematician and statistician who is an economist by
training. Fundamental knowledge of mathematics, statistics and economic theory are a necessary
prerequisite for this field. As Ragnar Frisch (1933) explains in the first issue of Econometrica, it
is the unification of statistics, economic theory and mathematics that constitutes econometrics.
Each viewpoint, by itself, is necessary but not sufficient for a real understanding of quantitative
relations in modern economic life.
Econometrics aims at giving empirical content to economic relationships. The three key
ingredients are economic theory, economic data, and statistical methods. Neither ‘theory without
measurement’, nor ‘measurement without theory’ are sufficient for explaining economic
phenomena. It is, as Frisch emphasized, their union that is the key for success in the future
development of econometrics.
In general, Econometrics is the science which integrates economic theory, economic statistics,
and mathematical economics to investigate the empirical support of the general schematic law
established by economic theory. It is a special type of economic analysis and research in which
the general economic theories, formulated in mathematical terms, are combined with empirical
measurements of economic phenomena. Starting from the relationships of economic theory, we
express them in mathematical terms so that they can be measured. We then use specific
methods, called econometric methods, to obtain numerical estimates of the coefficients of the
economic relationships.
For example, the demand for a commodity such as oranges depends not only on its own price but also on many other variables, such as:
Income of consumers
An increase in diet consciousness (e.g. drinking coffee causes cancer; so better switch to
orange juice)
Increase or decrease in the price of substitutes (e.g. that of apple)
However, there is no end to this stream of other variables! Many have argued in favour of
simplicity since simple models are easier:
to understand
to communicate
to test empirically with data
The choice of a simple model to explain complex real-world phenomena leads to two criticisms: that the model is an oversimplification, and that its assumptions are unrealistic. For instance, to say that the demand for oranges depends only on the price of oranges is both an oversimplification and an unrealistic assumption.
This brings us to the distinction between an economic model and econometric model.
i) Economic Models
An economic model is built from three basic structural elements:
1. A set of variables
2. A list of fundamental relationships and
3. A number of strategic coefficients
ii) Econometric Models
The most important characteristic of economic relationships is that they contain a random
element which is ignored by mathematical economic models which postulate exact relationships
between economic variables.
Example: Economic theory postulates that the demand for a commodity (Q) depends on its price
(P), on the prices of other related commodities (P0), on consumers' income (Y) and on tastes (T).
This is an exact relationship which can be written mathematically as:

Q = b0 + b1P + b2P0 + b3Y + b4T

The above demand equation is exact. However, many more factors may affect demand. In
econometrics the influence of these 'other' factors is taken into account by introducing into
the economic relationship a random variable. In our example, the demand function studied
with the tools of econometrics would be of the stochastic form:

Q = b0 + b1P + b2P0 + b3Y + b4T + u

where u stands for the random factors which affect the quantity demanded.
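As a minimal illustration of such a stochastic relationship, the following Python sketch simulates a demand equation of this form and then recovers the coefficients by least squares. All variable names, coefficient values and the sample size are hypothetical, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical data: price (P), price of a related good (P0), income (Y), taste index (T)
P = rng.uniform(1, 10, n)
P0 = rng.uniform(1, 10, n)
Y = rng.uniform(20, 100, n)
T = rng.uniform(0, 1, n)
u = rng.normal(0, 2, n)            # random factors affecting demand

# "True" coefficients chosen only for illustration
Q = 50 - 3.0 * P + 1.5 * P0 + 0.4 * Y + 5.0 * T + u

# Estimate b0..b4 by least squares: regress Q on a constant, P, P0, Y and T
X = np.column_stack([np.ones(n), P, P0, Y, T])
b_hat, *_ = np.linalg.lstsq(X, Q, rcond=None)
print("estimated coefficients (b0..b4):", np.round(b_hat, 3))
```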
Specification of the model is the most important and the most difficult stage of any econometric
research. It is often the weakest point of most econometric applications; in this stage there exists
an enormous likelihood of committing errors by incorrectly specifying the model. Once the model
has been specified and its parameters estimated, the estimates are evaluated on the basis of the following criteria:
i. Economic a priori criteria: These criteria are determined by economic theory and
refer to the size and sign of the parameters of economic relationships.
ii. Statistical criteria (first-order tests): These are determined by statistical theory and
aim at the evaluation of the statistical reliability of the estimates of the parameters of
the model. Correlation coefficient test, standard error test, t-test, F-test, and R2 -test
are some of the most commonly used statistical tests.
iii. Econometric criteria (second-order tests): These are set by the theory of
econometrics and aim at the investigation of whether the assumptions of the
econometric method employed are satisfied or not in any particular case. The
econometric criteria serve as a second order test (as test of the statistical tests) i.e.
they determine the reliability of the statistical criteria; they help us establish whether
the estimates have the desirable properties of unbiasedness, consistency etc.
Econometric criteria aim at the detection of the violation or validity of the
assumptions of the various econometric techniques.
Forecasting is one of the aims of econometric research. However, before using an estimated
model for forecasting by some way or another, the predictive power of the model should be
tested. It is possible that the model may be economically meaningful and statistically and
econometrically correct for the sample period for which the model has been estimated; yet it may
not be suitable for forecasting due to various factors (reasons). Therefore, this stage involves the
investigation of the stability of the estimates and their sensitivity to changes in the size of the
sample. Consequently, we must establish whether the estimated function performs adequately
outside the sample of data, i.e. we must test the extra-sample performance of the model. The steps
discussed above are summarized below diagrammatically.
The data used in empirical analysis may be collected by a governmental agency (e.g., the
Department of Commerce), an international agency (e.g., the International Monetary Fund (IMF)
or the World Bank), a private organization, or an individual. Literally, there are thousands of
such agencies collecting data for one purpose or another.
The Internet: The Internet has literally revolutionized data gathering. If you just “surf the net”
with a keyword (e.g., exchange rates), you will be swamped with all kinds of data sources.
Experimental Data: In experimental settings, an investigator may collect data while holding certain factors constant in order to assess the impact of some factors on a
given phenomenon.
For instance, in assessing the impact of obesity on blood pressure, the researcher would want to
collect data while holding constant the eating, smoking, and drinking habits of the people in
order to minimize the influence of these variables on blood pressure.
In the social sciences, the data that one generally encounters are non-experimental in nature, that
is, not subject to the control of the researcher. For example, the data on GNP, unemployment,
stock prices, etc., are not directly under the control of the investigator. As we shall see, this lack
of control often creates special problems for the researcher in pinning down the exact cause or
causes affecting a particular situation. For example, is it the money supply that determines the
(nominal) GDP or is it the other way round?
Types of Data
Three types of data may be available for empirical analysis: time series, cross-section, and
pooled (i.e., combination of time series and cross section) data.
Time Series Data: A time series is a set of observations on the values that a variable takes at
different times; that is, data for a single entity (person, firm, country) observed at multiple time
periods. Such data may be collected at regular time intervals, such as
daily (e.g., stock prices, weather reports), weekly (e.g., money supply figures), monthly [e.g., the
unemployment rate, the Consumer Price Index (CPI)], quarterly (e.g., GDP), annually (e.g.
government budgets), quinquennially, that is, every 5 years (e.g., the census of manufactures), or
decennially (e.g., the census of population).
Although time series data are used heavily in econometric studies, they present special problems
for econometricians because of stationarity issues.
Cross-Section Data: Cross-sectional data are data on one or more variables collected for multiple entities at the same point in time. Just as time series data create their own special problems (because of the stationarity issue),
cross-sectional data too have their own problems, specifically the problem of heterogeneity.
Pooled Data: Pooled, or combined, data contain elements of both time series and cross-sectional
data.
Panel data (also known as longitudinal data or micropanel) consist of multiple entities where
each entity is observed at two or more time periods. This is a special type of pooled data in which
the same cross-sectional unit (say, a family or a firm) is surveyed over time. The key feature of
panel data that distinguishes it from a pooled cross section is the fact that the same cross-
sectional units (individuals, firms, or counties) are followed over a given time period.
CHAPTER TWO
SIMPLE LINEAR REGRESSION
Economic theories are mainly concerned with the relationships among various economic
variables. These relationships, when phrased in mathematical terms, can predict the effect of one
variable on another. The functional relationships of these variables define the dependence of one
variable upon the other variable (s) in the specific form. The specific functional forms may be
linear, quadratic, logarithmic, exponential, hyperbolic, or any other form.
In this chapter we shall consider the simple linear regression model, i.e. a relationship between two
variables related in a linear form. We shall first discuss the concept of the regression function, followed
by estimation methods and their properties, hypothesis testing, and prediction using the simple linear
regression model.
Much of applied econometric analysis begins with the following premise: y and x are two
variables, representing some population and we are interested in “explaining y in terms of x ,”
or in “studying how y varies with changes in x .”
In writing down a model that will “explain y in terms of x ,” we must confront three issues.
First, since there is never an exact relationship between two variables, how do we allow for other
factors to affect y ? Second, what is the functional relationship between y and x ? And third,
how can we be sure we are capturing a ceteris paribus relationship between y and x (if that is
a desired goal)?
We can resolve these ambiguities by writing down an equation relating y to x. A simple equation
is
y = β0 + β1x + u ................................................(2.1)
Equation (2.1), which is assumed to hold in the population of interest, defines the simple linear
regression model. It is also called the two-variable linear regression model or bivariate linear
regression model because it relates the two variables y and x . We now discuss the meaning of
each of the quantities in (2.1).
When related by (2.1), the variables y and x have several different names used interchangeably,
as follows. y is called the dependent variable, the explained variable, the response variable,
the predicted variable, or the regressand. x is called the independent variable, the
explanatory variable, the control variable, the predictor variable, or the regressor. (The term
covariate is also used for x .) The terms “dependent variable” and “independent variable” are
frequently used in econometrics.
The variable u , called the error term or disturbance term or stochastic term in the relationship,
represents factors other than x that affect y . A simple regression analysis effectively treats all
factors affecting y other than x as being unobserved. You can usefully think of u as standing
for “unobserved.”
EXAMPLE 2.1
(Soybean Yield and Fertilizer)
Suppose that soybean yield is determined by the model

yield = β0 + β1fertilizer + u,

so that y = yield and x = fertilizer. The agricultural researcher is interested in the effect of
fertilizer on yield, holding other factors fixed. This effect is given by β1. The error term u
contains factors such as land quality, rainfall, and so on. The coefficient β1 measures the effect
of fertilizer on yield, holding other factors fixed: Δyield = β1Δfertilizer.
If the average value of u does not depend on the value of x, it is useful to break y into two
components as in (2.2) below. The component β0 + β1x is sometimes called the systematic part
of y (also called the regression line), that is, the part of y explained by x, and u is called the
unsystematic part, or the part of y not explained by x:

Yi = (β0 + β1Xi) + ui ..................................(2.2)
(the dependent variable) = (the regression line) + (random variable)
The classicals made important assumptions in their analysis of regression. The most important of
these assumptions are discussed below.
1. The model is linear in the parameters
A function is said to be linear in the parameter, say β1, if β1 appears with a power of 1 only and
is not multiplied or divided by any other parameter (for example, β1β2, β2/β1, and so on).
EXAMPLE 2.2
Models such as Y = α + βX + U, lnY = β1 + β2 lnX + U, and Y = β0 + β1X² + U are all linear in the
parameters (note that linearity in the variables is not required). By contrast, models in which a
parameter enters with a power other than one, or is multiplied or divided by another parameter
(for example, Y = β0 + β1²X + U), are not linear in the parameters.
2. The error term (U) is a real random variable
This means that the value which U may assume in any one period depends on chance; it may be
positive, negative or zero. Every value has a certain probability of being assumed by U in any
particular instance.
3. The mean value of the random variable (U) in any particular period is zero
This means that for each value of X, the random variable (U) may assume various values, some
greater than zero and some smaller than zero, but if we considered all the possible positive and negative
values of U, for any given value of X, they would have an average value equal to zero. In other
words, the positive and negative values of U cancel each other.
4. The variance of the random variable (U) is constant in each period (homoscedasticity)
This means that for all values of X, the Ui's will show the same dispersion around their mean.
In Fig. 2.a this assumption is denoted by the fact that the values that U can assume lie within the same
limits, irrespective of the value of X. For X1, U can assume any value within the range AB;
for X2, U can assume any value within the range CD, which is equal to AB, and so on.

Mathematically: Var(Ui) = E[Ui − E(Ui)]² = E(Ui²) = σ² (a constant).

This constant variance is called the homoscedasticity assumption and the constant variance itself is
called homoscedastic variance.
5. The random variable (U) has a normal distribution
This means the values of U (for each X) have a bell-shaped symmetrical distribution about their
zero mean and constant variance σ², i.e.

Ui ~ N(0, σ²) ………………………………………………………..(2.5)
6. The random terms of different observations are independent (no autocorrelation)
Given any two X values, Xi and Xj (i ≠ j), the correlation between any two Ui and Uj (i ≠ j) is
zero. Symbolically,

Cov(Ui, Uj) = E{[Ui − E(Ui)][Uj − E(Uj)]} = E(UiUj) = 0 ……………………………(2.6)
7. Xi are non-stochastic
The X i ' s are a set of fixed values in the hypothetical process of repeated sampling which
underlies the linear regression model. This means that, in taking large number of samples on Y
and X, the X i values are the same in all samples, but the U i values do differ from sample to
sample, and so of course do the values of Yi .
8. The random variable (U) is independent of the explanatory variables.
This means there is no correlation between the random variable and the explanatory variable. If
two variables are unrelated, their covariance is zero.
The above assumptions imply that the dependent variable Yi = α + βXi + Ui is normally distributed with mean α + βXi and variance σ².

Proof:
Mean: E(Yi) = E(α + βXi + ui) = α + βXi, since E(ui) = 0

Variance: Var(Yi) = E[Yi − E(Yi)]²
= E[α + βXi + ui − (α + βXi)]²
= E(ui²)
= σ²

∴ Var(Yi) = σ² ……………………………………….(2.9)

The shape of the distribution of Yi is determined by the shape of the distribution of Ui, which is
normal by the normality assumption. Since α and β, being constants, don't affect the distribution of Yi,
and the values of the explanatory variable Xi are a set of fixed values by the non-stochastic X assumption
and therefore don't affect the shape of the distribution of Yi,

Yi ~ N(α + βXi, σ²)
Successive values of the dependent variable are also uncorrelated, i.e. Cov(Yi, Yj) = 0.

Proof:
Cov(Yi, Yj) = E{[Yi − E(Yi)][Yj − E(Yj)]}
= E{[α + βXi + Ui − E(α + βXi + Ui)][α + βXj + Uj − E(α + βXj + Uj)]}
(since Yi = α + βXi + Ui and Yj = α + βXj + Uj)
= E[(α + βXi + Ui − α − βXi)(α + βXj + Uj − α − βXj)], since E(ui) = 0
= E(UiUj) = 0 (from equation (2.6))

Therefore, Cov(Yi, Yj) = 0.
Consider the model Yi = α + βXi + Ui. The assumptions we have made about the error term U imply that E(U) = 0 and Cov(X, U) = 0.

Let α̂ and β̂ be the estimators of α and β, respectively. The sample counterpart of Ui is the
estimated error Ûi (which is also called the residual), defined as

Ûi = Yi − α̂ − β̂Xi …………………………………………………………. (2.10)

The two equations to determine α̂ and β̂ are obtained by replacing the population assumptions by
their sample counterparts:

E(U) = 0  →  (1/n)ΣÛi = 0, or ΣÛi = 0
Cov(X, U) = 0  →  (1/n)ΣXiÛi = 0, or ΣXiÛi = 0

In these and the following equations, Σ denotes the sum from i = 1 to n. From these we get the two equations

Σ(Yi − α̂ − β̂Xi) = 0  →  ΣY − nα̂ − β̂ΣX = 0
   →  α̂ = Ȳ − β̂X̄ …………………………………………………………….(2.13)

ΣYiXi = ȲΣXi − β̂X̄ΣXi + β̂ΣXi²
The method of ordinary least squares (OLS) chooses α̂ and β̂ so as to minimize the residual sum of squares, ΣÛi².

From the estimated relationship Yi = α̂ + β̂Xi + Ûi, we obtain:

Ûi = Yi − (α̂ + β̂Xi) …………………………………………….. (2.17)

ΣÛi² = Σ(Yi − α̂ − β̂Xi)²

To find the values of α̂ and β̂ that minimize this sum, we have to partially differentiate ΣÛi²
with respect to α̂ and β̂ and set the partial derivatives equal to zero.

1. ∂ΣÛi²/∂α̂ = −2Σ(Yi − α̂ − β̂Xi) = 0 …………………………………………… (2.19)

2. ∂ΣÛi²/∂β̂ = −2ΣXi(Yi − α̂ − β̂Xi) = 0 ………………………………………… (2.22)

These two conditions are equivalent to

ΣÛi = 0 and ΣXiÛi = 0 .........................................................(2.23)

Rearranging equation (2.19) gives ΣYi = nα̂ + β̂ΣXi ……………………(2.20), and rearranging equation (2.22) gives

ΣYiXi = α̂ΣXi + β̂ΣXi² ……………………………………….(2.24)

Equations (2.20) and (2.24) are called the Normal Equations. Substituting the expression for α̂ (α̂ = Ȳ − β̂X̄) into (2.24), we get:

ΣYiXi = ΣXi(Ȳ − β̂X̄) + β̂ΣXi²
      = ȲΣXi − β̂X̄ΣXi + β̂ΣXi²

Solving for β̂ gives

β̂ = (ΣXiYi − nX̄Ȳ) / (ΣXi² − nX̄²) = Σxiyi / Σxi² ……………………(2.28)

where xi = Xi − X̄ and yi = Yi − Ȳ. The expression in (2.28) used to estimate the parameter coefficient is termed the formula in deviation
form.
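A minimal numerical sketch of these formulas, using a small hypothetical data set (the numbers are invented purely for illustration): it computes α̂ and β̂ in deviation form and then verifies that the residuals satisfy the normal equations (2.23).

```python
import numpy as np

# Hypothetical sample on X and Y, for illustration only
X = np.array([10.0, 12.0, 15.0, 18.0, 20.0, 25.0, 30.0])
Y = np.array([40.0, 44.0, 50.0, 55.0, 57.0, 70.0, 80.0])

x = X - X.mean()                   # deviations x_i = X_i - X_bar
y = Y - Y.mean()                   # deviations y_i = Y_i - Y_bar

beta_hat = np.sum(x * y) / np.sum(x ** 2)     # equation (2.28)
alpha_hat = Y.mean() - beta_hat * X.mean()    # equation (2.13)

print(f"alpha_hat = {alpha_hat:.4f}, beta_hat = {beta_hat:.4f}")

# The fitted residuals satisfy the normal equations (2.23)
U_hat = Y - (alpha_hat + beta_hat * X)
print("sum of residuals:", round(np.sum(U_hat), 10))
print("sum of X_i * residuals:", round(np.sum(X * U_hat), 10))
```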
2.3. Adequacy of the Regression Model
Fitting a regression model requires several assumptions.
Errors are uncorrelated random variables with mean zero;
Errors have constant variance; and,
Errors are normally distributed.
The analyst should always consider the validity of these assumptions to be doubtful and conduct
analyses to examine the adequacy of the model.
2.3.1. Residuals Analysis
The residuals from a regression model are Ûi = Yi − Ŷi, where Yi is an actual observation and Ŷi is
the corresponding fitted value from the regression model. Analysis of the residuals is frequently
helpful in checking the assumption that the errors are approximately normally distributed with
constant variance, and in determining whether additional terms in the model would be useful.
Residual analysis is used to check goodness of fit for models.
2.3.2. Goodness-of-fit ( R 2 )
The aim of regression analysis is to explain the behavior of the dependent variable Y. In any given
sample, Y is relatively low in some observations and relatively high in others. We want to know
why. The variations in Y in any sample can be summarized by the sample variance, Var(Y). We
should like to be able to account for the size of this variance through the test called R 2 .
R 2 shows the percentage of total variation of the dependent variable that can be explained by the
changes in the explanatory variable(s) included in the model. To elaborate this, let's draw a
horizontal line corresponding to the mean value of the dependent variable, Ȳ (see Figure 2.1
below). By fitting the line Ŷ = β̂0 + β̂1X we try to obtain the explanation of the variation of the
dependent variable Y produced by the changes of the explanatory variable X.
[Figure 2.1 here: the fitted line Ŷ = β̂0 + β̂1X and the horizontal line at Ȳ, showing for a typical observation the decomposition Y − Ȳ = (Ŷ − Ȳ) + (Y − Ŷ).]
Figure 2.1. Actual and estimated values of the dependent variable Y.
As can be seen from Fig. 2.1 above, Yi − Ȳ measures the variation of the sample
observation values of the dependent variable around the mean. However, the variation in Y that can
be attributed to the influence of X (i.e. the regression line) is given by the vertical distance Ŷ − Ȳ.
The part of the total variation in Y about Ȳ that can't be attributed to X is equal to Ûi = Yi − Ŷi,
which is referred to as the residual variation.
In summary:
Ûi = Yi − Ŷi = deviation of the observation Yi from the regression line.
yi = Yi − Ȳ = deviation of Yi from its mean.
ŷi = Ŷi − Ȳ = deviation of the regressed (predicted) value (Ŷi) from the mean.
Now, we may write the observed Yi as the sum of the predicted value (Ŷi) and the residual term (Ûi):

Yi = Ŷi + Ûi …………………………………………(2.29)
(Observed Yi = predicted Yi + residual)

From equation (2.29) we can write the same relation in deviation form: yi = ŷi + Ûi. By squaring and summing both sides, we obtain the following expression:

Σyi² = Σ(ŷi + Ûi)²
     = Σ(ŷi² + Ûi² + 2ŷiÛi)
     = Σŷi² + ΣÛi² + 2ΣŷiÛi

But ΣŷiÛi = ΣÛi(Ŷi − Ȳ) = ΣÛi(α̂ + β̂Xi − Ȳ)
          = α̂ΣÛi + β̂ΣÛiXi − ȲΣÛi = 0
(since ΣÛi = 0 and ΣXiÛi = 0)

∴ ΣŷiÛi = 0 ………………………………………………(2.30)

Therefore:

Σyi² = Σŷi² + ΣÛi² ………………………………...(2.31)
(Total variation = Explained variation + Unexplained variation)
OR,
Total sum of squares = Explained sum of squares + Residual sum of squares,
i.e. TSS = ESS + RSS …………… (2.32)

The breakdown of the total sum of squares TSS into the explained sum of squares ESS and the
residual sum of squares RSS is known as analysis of variance (ANOVA). The purpose of
presenting the ANOVA table is to test the significance of the explained sum of squares.
Mathematically, the explained variation as a percentage of the total variation is:

ESS/TSS = Σŷ²/Σy² ……………………………………….(2.33)
The estimated regression line in deviation form is given by ŷ = β̂x (dear students! You can
perform its proof by yourself). Squaring and summing both sides gives us

Σŷ² = β̂²Σx² ……………………………………………(2.34)

We can substitute (2.34) in (2.33) and obtain:

ESS/TSS = β̂²Σx² / Σy² …………………………………(2.35)
        = (Σxy/Σx²)²·(Σx²/Σy²), since β̂ = Σxiyi/Σxi²
        = (Σxy)² / (Σx²Σy²) ………………………………………(2.36)

Comparing (2.36) with the formula of the correlation coefficient:

r = Cov(X,Y) / (σx σy) = (Σxy/n) / (σx σy) = Σxy / (Σx²Σy²)^1/2 ………(2.37)

Squaring (2.37) will result in: r² = (Σxy)² / (Σx²Σy²) ………….(2.38)

Comparing (2.36) and (2.38), we see exactly the same expressions. Therefore:

ESS/TSS = (Σxy)² / (Σx²Σy²) = r², which in the simple regression model is also written R².

From (2.32), RSS = TSS − ESS. Hence R² can also be written as

R² = 1 − RSS/TSS = 1 − ΣÛi²/Σyi²
Interpretation of R 2
Suppose R² = 0.95; this means that the regression line gives a good fit to the observed data,
since this line explains 95% of the total variation of the Y values around their mean. The
remaining 5% of the total variation in Y is unaccounted for by the regression line and is
attributed to the factors included in the disturbance variable ui.
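The decomposition TSS = ESS + RSS and the equality of R² with the squared correlation coefficient can be checked numerically; the sketch below reuses the same kind of hypothetical data as before.

```python
import numpy as np

# Hypothetical sample, for illustration only
X = np.array([10.0, 12.0, 15.0, 18.0, 20.0, 25.0, 30.0])
Y = np.array([40.0, 44.0, 50.0, 55.0, 57.0, 70.0, 80.0])

x, y = X - X.mean(), Y - Y.mean()
beta_hat = np.sum(x * y) / np.sum(x ** 2)
alpha_hat = Y.mean() - beta_hat * X.mean()
Y_fit = alpha_hat + beta_hat * X

TSS = np.sum((Y - Y.mean()) ** 2)      # total variation
ESS = np.sum((Y_fit - Y.mean()) ** 2)  # explained variation
RSS = np.sum((Y - Y_fit) ** 2)         # unexplained variation

print(f"TSS = {TSS:.3f}, ESS = {ESS:.3f}, RSS = {RSS:.3f}")
print(f"R^2 = ESS/TSS = {ESS / TSS:.4f}")
print(f"R^2 = 1 - RSS/TSS = {1 - RSS / TSS:.4f}")
print(f"r^2 (squared correlation) = {np.corrcoef(X, Y)[0, 1] ** 2:.4f}")
```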
According to the Gauss-Markov theorem, the OLS estimators possess all the BLUE properties.
The detailed proof of these properties is presented below (Dear students! this is your reading
assignment and if you face any difficulty while reading, you are welcome).
a. Linearity: (for ˆ )
Proposition: ˆ & ˆ are linear in Y.
b. Unbiasedness:
Proposition: ˆ & ˆ are the unbiased estimators of the true parameters &
From your statistics course, you may recall that if θ̂ is an estimator of θ, then E(θ̂) − θ is the
amount of bias, and if θ̂ is an unbiased estimator of θ then the bias is zero, i.e. E(θ̂) − θ = 0, or E(θ̂) = θ.
In our case, α̂ and β̂ are estimators of the true parameters α and β. To show that they are the
unbiased estimators of their respective parameters means to prove that:
E(β̂) = β and E(α̂) = α
Proof (1): Prove that β̂ is unbiased, i.e. E(β̂) = β.

We know that β̂ = ΣkiYi = Σki(α + βXi + Ui), where ki = xi/Σxi²
            = αΣki + βΣkiXi + ΣkiUi,
but Σki = 0 and ΣkiXi = 1:

Σki = Σxi/Σxi² = Σ(Xi − X̄)/Σxi² = (ΣX − nX̄)/Σxi² = (nX̄ − nX̄)/Σxi² = 0 …………………(2.41)

ΣkiXi = ΣxiXi/Σxi² = Σ(Xi − X̄)Xi/Σxi² = (ΣX² − X̄ΣX)/(ΣX² − nX̄²) = (ΣX² − nX̄²)/(ΣX² − nX̄²) = 1 ……………(2.42)

Hence β̂ = β + ΣkiUi, and taking expectations, E(β̂) = β + ΣkiE(Ui) = β, since E(Ui) = 0.
Var(β̂) = σ²Σki² = σ²/Σxi² ……………………………………………..(2.47)
Variance of α̂

Var(α̂) = E[α̂ − E(α̂)]² = E(α̂ − α)² ……………………………(2.48)

Since α̂ = α + Σ(1/n − X̄ki)Ui, we have

Var(α̂) = σ²Σ(1/n − X̄ki)²
       = σ²Σ(1/n² − (2/n)X̄ki + X̄²ki²)
       = σ²(1/n − (2X̄/n)Σki + X̄²Σki²)
       = σ²(1/n + X̄²Σki²), since Σki = 0
       = σ²(1/n + X̄²/Σxi²), since Σki² = Σxi²/(Σxi²)² = 1/Σxi²

Again:
1/n + X̄²/Σxi² = (Σxi² + nX̄²)/(nΣxi²) = ΣXi²/(nΣxi²)

∴ Var(α̂) = σ²(1/n + X̄²/Σxi²) = σ²ΣXi²/(nΣxi²) …………………………………………(2.49)
Dear students! We have computed the variances of the OLS estimators. Now, it is time to check
whether these variances of the OLS estimators possess the minimum variance property compared to
the variances of other estimators of the true α and β, other than α̂ and β̂.
To establish that ˆ and ˆ possess minimum variance property, we compare their variances with
that of the variances of some other alternative linear and unbiased estimators of and , say *
and * . Now, we want to prove that any other linear and unbiased estimator of the true
population parameter obtained from any other econometric method has larger variance that that
OLS estimators.
Lets first show minimum variance of ˆ and then that of ̂ .
1. Minimum variance of β̂
Suppose β* is an alternative linear and unbiased estimator of β, and let

β* = ΣwiYi ……………………………………………(2.50)

where wi ≠ ki; but wi = ki + ci.

β* = Σwi(α + βXi + ui), since Yi = α + βXi + Ui
   = αΣwi + βΣwiXi + Σwiui

E(β*) = αΣwi + βΣwiXi, since E(ui) = 0

Since β* is assumed to be an unbiased estimator of β, it must be true that Σwi = 0 and ΣwiXi = 1 in the above equation.

But wi = ki + ci, so
Σwi = Σ(ki + ci) = Σki + Σci
Therefore, Σci = 0, since Σki = Σwi = 0.

Again, ΣwiXi = Σ(ki + ci)Xi = ΣkiXi + ΣciXi
Since ΣwiXi = 1 and ΣkiXi = 1, it follows that ΣciXi = 0.

From these values we can derive Σcixi = 0, where xi = Xi − X̄:
Σcixi = Σci(Xi − X̄) = ΣciXi − X̄Σci = 0

Thus, from the above calculations we can summarize the following results:
Σwi = 0, ΣwiXi = 1, Σci = 0, ΣciXi = 0, Σcixi = 0

To prove whether β̂ has minimum variance or not, let's compute Var(β*) to compare with Var(β̂).

Var(β*) = Var(ΣwiYi) = Σwi²Var(Yi) = σ²Σwi²

Σwi² = Σ(ki + ci)² = Σki² + Σci² + 2Σkici = Σki² + Σci², since Σkici = Σcixi/Σxi² = 0

Therefore, Var(β*) = σ²(Σki² + Σci²) = σ²Σki² + σ²Σci² = Var(β̂) + σ²Σci²

Given that ci is an arbitrary constant, σ²Σci² is positive, i.e. it is greater than or equal to zero. Thus
Var(β*) ≥ Var(β̂). This proves that β̂ possesses the minimum variance property. In a similar way
we can prove that the least squares estimate of the constant intercept (α̂) possesses minimum
variance.
Minimum Variance of α̂

We take a new estimator α*, which we assume to be a linear and unbiased estimator of α. The least squares estimator α̂ is given by:

α̂ = Σ(1/n − X̄ki)Yi

By analogy with the proof of the minimum variance property of β̂, let's use the weights wi = ki + ci. Consequently:

α* = Σ(1/n − X̄wi)Yi

Since we want α* to be an unbiased estimator of the true α, that is, E(α*) = α, we substitute
Yi = α + βXi + ui in α* and find the expected value of α*:

α* = Σ(1/n − X̄wi)(α + βXi + ui)
   = Σ(α/n + βXi/n + ui/n − αX̄wi − βX̄wiXi − X̄wiui)
   = α + βX̄ + Σui/n − αX̄Σwi − βX̄ΣwiXi − X̄Σwiui

For α* to be an unbiased estimator of the true α, the following must hold:
Σwi = 0, ΣwiXi = 1 and E(Σwiui) = 0

These conditions imply that Σci = 0 and ΣciXi = 0.

As in the case of β̂, we need to compute Var(α*) to compare with Var(α̂):

Var(α*) = Var[Σ(1/n − X̄wi)Yi]
        = Σ(1/n − X̄wi)²Var(Yi)
        = σ²Σ(1/n − X̄wi)²
        = σ²Σ(1/n² + X̄²wi² − 2X̄wi/n)
        = σ²(1/n + X̄²Σwi² − (2X̄/n)Σwi)
        = σ²(1/n + X̄²Σwi²), since Σwi = 0

but Σwi² = Σki² + Σci², so

Var(α*) = σ²[1/n + X̄²(Σki² + Σci²)]
        = σ²(1/n + X̄²/Σxi²) + σ²X̄²Σci²
        = σ²ΣXi²/(nΣxi²) + σ²X̄²Σci²

The first term in this expression is Var(α̂), hence Var(α*) ≥ Var(α̂).
To use σ̂² in the expressions for the variances of α̂ and β̂, we have to prove whether σ̂² = ΣÛi²/(n − 2) is an
unbiased estimator of σ², i.e. whether E[ΣÛi²/(n − 2)] = σ².
The maximum likelihood method of estimation is based on the idea that different populations
generate different samples, and that any given sample is more likely to have come from some
populations than from others.
The ML estimator of a parameter is the value of ˆ which would most likely generate the
observed sample observations Y1 , Y2 ,..., Yn . The ML estimator maximizes the likelihood function L
which is the product of the individual probability densities taken over all n observations, given by:

L(α, β, σ²) = Π (1/√(2πσ²)) exp{−(1/(2σ²))[Yi − E(Yi)]²}
            = (2πσ²)^(−n/2) exp{−(1/(2σ²))Σ(Yi − α − βXi)²}

Our aim is to maximize this likelihood function L with respect to the parameters α, β and σ². To
do this, it is more convenient to work with the natural logarithm of L (called the log-likelihood
function) given by:

ln L = −(n/2)ln(2π) − (n/2)ln σ² − (1/(2σ²))Σ(Yi − α − βXi)²
Taking partial derivatives of ln L with respect to α, β and σ² and equating them to zero, we get:

∂ln L/∂α = (1/σ²)Σ(Yi − α − βXi)(1) = 0
∂ln L/∂β = (1/σ²)Σ(Yi − α − βXi)(Xi) = 0

which yield

ΣYi = nα + βΣXi
ΣXiYi = αΣXi + βΣXi²

Note that these equations are similar to the normal equations that we obtained so far (i.e. under
OLS and MM). Solving for α and β, we get:

α̃ = Ȳ − β̃X̄
β̃ = (ΣXiYi − nX̄Ȳ) / (ΣXi² − nX̄²)

By partial differentiation of ln L with respect to σ² and equating it to zero we get:

∂ln L/∂σ² = −n/(2σ²) + (1/(2σ⁴))Σ(Yi − α − βXi)² = 0

σ̃² = (1/n)Σ(Yi − α − βXi)² = (1/n)Σ(Yi − α̃ − β̃Xi)² = (1/n)ΣÛi²
Note:
i. The ML estimators of α and β are identical to the OLS estimators, and are thus best
linear unbiased estimators (BLUE).
ii. The ML estimator σ̃² = ΣÛi²/n of σ² is biased (whereas σ̂² = ΣÛi²/(n − 2) is unbiased).
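The bias of the ML variance estimator can be illustrated by simulation: dividing the residual sum of squares by n systematically underestimates σ², while dividing by n − 2 does not. The parameter values and sample size below are arbitrary, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha, beta, sigma = 20, 2.0, 0.5, 1.0     # illustrative values
X = rng.uniform(0, 10, n)

# Compare the two variance estimators over repeated samples
ml, ols = [], []
for _ in range(5000):
    Y = alpha + beta * X + rng.normal(0, sigma, n)
    b = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    a = Y.mean() - b * X.mean()
    rss = np.sum((Y - a - b * X) ** 2)
    ml.append(rss / n)          # ML estimator of sigma^2 (biased)
    ols.append(rss / (n - 2))   # unbiased estimator of sigma^2

print("true sigma^2      :", sigma ** 2)
print("mean of RSS/n     :", round(np.mean(ml), 4))    # systematically below 1
print("mean of RSS/(n-2) :", round(np.mean(ols), 4))   # close to 1
```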
The estimated variances and standard errors of the OLS estimators are:

V̂(β̂) = σ̂²/Σxi² = Σûi² / [(n − 2)Σxi²]

S.E.(β̂) = √V̂(β̂) = √{Σûi² / [(n − 2)Σxi²]}

Similarly,

V̂(α̂) = σ̂²ΣXi²/(nΣxi²) = [Σûi²/(n − 2)]·ΣXi²/(nΣxi²)

S.E.(α̂) = √V̂(α̂) = √{ΣXi²Σûi² / [n(n − 2)Σxi²]}
Note: The standard error test is an approximated test (which is approximated from the z-test
and t-test) and implies a two tail test conducted at 5% level of significance.
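The following sketch (with invented data, for illustration only) shows how σ̂², S.E.(β̂) and S.E.(α̂) are computed from the residuals using the formulas above.

```python
import numpy as np

# Hypothetical data, for illustration only
X = np.array([10.0, 12.0, 15.0, 18.0, 20.0, 25.0, 30.0])
Y = np.array([40.0, 44.0, 50.0, 55.0, 57.0, 70.0, 80.0])
n = len(Y)

x = X - X.mean()
beta_hat = np.sum(x * (Y - Y.mean())) / np.sum(x ** 2)
alpha_hat = Y.mean() - beta_hat * X.mean()
u_hat = Y - alpha_hat - beta_hat * X

sigma2_hat = np.sum(u_hat ** 2) / (n - 2)          # unbiased estimate of sigma^2
se_beta = np.sqrt(sigma2_hat / np.sum(x ** 2))     # S.E.(beta_hat)
se_alpha = np.sqrt(sigma2_hat * np.sum(X ** 2) / (n * np.sum(x ** 2)))  # S.E.(alpha_hat)

print(f"beta_hat = {beta_hat:.4f}, S.E. = {se_beta:.4f}")
print(f"alpha_hat = {alpha_hat:.4f}, S.E. = {se_alpha:.4f}")
```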
If we accept H0 (i.e. if the test shows that β is not statistically significant), the explanatory variable X
should not be included in the function, since the conducted test provided evidence that changes in
X leave Y unaffected. In other words, acceptance of H0 implies that the relationship between Y and
X is in fact Y = α + (0)X, i.e. there is no relationship between X and Y.
Numerical example: Suppose that from a sample of size n=30, we estimate the following supply
function.
Since we have two parameters in simple linear regression with intercept different from zero, our
degree of freedom is n-2. Like the standard error test we formally test the hypothesis:
Step 4: Obtain the critical value of t, called tc, at α/2 and n-2 degrees of freedom for a two-tail test.
Step 5: Compare t* (the computed value of t) and tc (the critical value of t).
If |t*| > tc, reject H0 and accept H1. The conclusion is that β̂ is statistically significant.
The values in the brackets are standard errors. We want to test the null hypothesis H0: β = 0
against the alternative H1: β ≠ 0 using the t-test at the 5% level of significance.
a. The t-value for the test statistic is:

t* = (β̂ − 0)/SE(β̂) = β̂/SE(β̂) = 0.70/0.21 = 3.3

b. Since the alternative hypothesis (H1) is stated by an inequality sign (≠), it is a two-tail test;
hence we divide α = 0.05 by 2 to obtain α/2 = 0.025, and find the critical value of 't' at α/2 = 0.025 and 18
degrees of freedom (df), i.e. (n-2 = 20-2). From the t-table, tc at the 0.025 level of significance
and 18 df is 2.10.
c. Since t* = 3.3 and tc = 2.10, t* > tc. It implies that β̂ is statistically significant.
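The same computation can be reproduced (and a p-value obtained) with scipy; the figures 0.70, 0.21 and n = 20 are those of the example above.

```python
from scipy import stats

beta_hat, se_beta, n = 0.70, 0.21, 20          # figures from the example above

t_star = beta_hat / se_beta                    # computed t-value
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)   # two-tail critical value at 5%
p_value = 2 * (1 - stats.t.cdf(abs(t_star), df=n - 2))

print(f"t* = {t_star:.2f}, t_c = {t_crit:.2f}, p-value = {p_value:.4f}")
print("reject H0" if abs(t_star) > t_crit else "do not reject H0")
```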
For one-sided alternatives the decision rules are analogous:
Right-tail test: H0: β = β*, H1: β > β*; reject H0 if t ≥ t(α, df).
Left-tail test: H0: β = β*, H1: β < β*; reject H0 if t ≤ −t(α, df).

A confidence interval provides another way to carry out the test. Since (β̂ − β)/se(β̂) follows the t-distribution with n-2 degrees of freedom,

P[−t(α/2, n-2) ≤ t ≤ t(α/2, n-2)] = 1 − α

P[−t(α/2, n-2) ≤ (β̂ − β)/se(β̂) ≤ t(α/2, n-2)] = 1 − α

Rearranging the above expressions gives

P[β̂ − t(α/2, n-2)se(β̂) ≤ β ≤ β̂ + t(α/2, n-2)se(β̂)] = 1 − α. Thus, a (1 − α)100% confidence interval
for β is given by β̂ ± t(α/2, n-2)se(β̂). Now we can use the constructed confidence interval for
hypothesis testing.
The test procedure is outlined as follows.
H0: β = 0
H1: β ≠ 0
Decision rule: If the hypothesized value of β in the null hypothesis is within the confidence
interval, accept H0 and reject H1. The implication is that β̂ is statistically insignificant; while if
the hypothesized value of β in the null hypothesis is outside the limits, reject H0 and accept H1.
This indicates that β̂ is statistically significant.
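A short sketch of the confidence-interval approach, reusing the illustrative figures from the t-test example above:

```python
from scipy import stats

beta_hat, se_beta, n = 0.70, 0.21, 20      # same illustrative figures as above
alpha_level = 0.05

t_half = stats.t.ppf(1 - alpha_level / 2, df=n - 2)
lower = beta_hat - t_half * se_beta
upper = beta_hat + t_half * se_beta

print(f"95% confidence interval for beta: ({lower:.3f}, {upper:.3f})")
# H0: beta = 0 -- since 0 lies outside the interval, beta_hat is statistically significant
```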
If the chosen model does not refute the hypothesis or theory under consideration, we may use it to
predict the future value(s) of the dependent, or forecast, variable Y on the basis of known or
expected future value(s) of the explanatory, or predictor, variable X.
Suppose we want to predict the mean consumption expenditure (C) for 1997 given that the
estimated model (using data from 1963-1996) is Ĉ = 153.48 + 0.876Y. The GDP (Y)
value for 1997 was 68275.40 million Birr. Putting this GDP figure on the right-hand side of the
estimated model, we obtain:

Ĉ(1997) = 153.48 + 0.876(68275.40) = 59962.73
Note that there is discrepancy between the predicted value and the actual value which results in
forecast error. What is important here is to note that such forecast errors are unavoidable given
the statistical nature of our analysis.
CHAPTER THREE
MULTIPLE LINEAR REGRESSION
3.1.Introduction
In Chapter 2, we learned how to use simple regression analysis to explain a dependent variable, Y,
as a function of a single independent variable, X. But in practice, economic models generally
contain one dependent variable and two or more independent variables. Such models are called
multiple regression models.
Examples:
1. Suppose that soybean yield (Y) is determined by the amount of fertilizer (X1) applied,
land quality (X2) and rainfall (X3) and is given by the following model:

Yi = β0 + β1X1i + β2X2i + β3X3i + ui,

The error term u contains unforeseen factors.
2. In a study of the amount of output (product), we are interested to establish a
relationship between output (Q) and labor input (L) & capital input (K). The equation
is often estimated in log-log form as:

ln Qi = β0 + β1 ln Li + β2 ln Ki + εi

3. Economic theory postulates that the quantity demanded for a given commodity (Qd)
depends on its price (X1), the prices of other products (X2), and consumers' income (X3):

Qdi = β0 + β1X1i + β2X2i + β3X3i + εi, where the disturbance term ε (read as
'epsilon') contains unobserved factors such as tastes and so on.
Multiple regression analysis is more amenable to ceteris paribus analysis because it allows us to
explicitly control for many other factors that simultaneously affect the dependent variable.
Naturally, if we add more factors to our model that are useful for explaining Y, then more of the
variation in Y can be explained. Thus, multiple regression analysis can be used to build better
models for predicting the dependent variable.
We continue to operate within the framework of the classical linear regression model
(CLRM) first introduced in Chapter 2. Specifically, we assume the following:
1. Linearity of the model in parameter: The classicals assumed that the model should be
linear in the parameters regardless of whether the explanatory and the dependent variables
are linear or not.
2. Randomness of the error term: The variable ui is a real random variable.
3. Zero mean of the error term: E(ui ) 0
4. Homoscedasticity: The variance of each ui is the same for all the xi values,
i.e. E(ui²) = σu² (a constant).
5. No correlation between the error term and the regressors, i.e. E(Xiui) = 0.
This condition is automatically fulfilled if we assume that the values of the X's are a set of
fixed numbers in all (hypothetical) samples.
In order to understand the nature of multiple regression model easily, we start our analysis with the
case of two explanatory variables, then extend this to the case of k-explanatory variables.
Consider the model

Yi = α + β1X1i + β2X2i + ui .........................................................(3.1)

The expected value of the above model is called the population regression equation, i.e.

E(Y) = α + β1X1i + β2X2i, since E(ui) = 0 …………………................(3.2)

where α, β1 and β2 are the population parameters. α is referred to as the intercept and β1 and β2
are also sometimes known as the regression slopes. Note that β2, for example,
measures the effect on E(Y) of a unit change in X2 when X1 is held constant.
Since the population regression equation is unknown to any investigator, it has to be estimated
from sample data. Let us suppose that the sample data has been used to estimate the population
regression equation. We leave the method of estimation unspecified for the present and merely
assume that equation (3.2) has been estimated by sample regression equation, which we write as:
Ŷi = α̂ + β̂1X1i + β̂2X2i ……………………………………………….(3.3)

where β̂1 and β̂2 are estimates of β1 and β2 respectively, and Ŷ is known as the predicted
value of Y.
Now it is time to state how (3.1) is estimated. Given sample observation on Y , X 1 & X 2 , we
estimate (3.1) using the method of least square (OLS).
Yi = α̂ + β̂1X1i + β̂2X2i + ûi ……………………………………….(3.4)

Equation (3.4) is the estimated relation between Y, X1 and X2. The sum of squared residuals is

Σûi² = Σ(Yi − Ŷi)² = Σ(Yi − α̂ − β̂1X1i − β̂2X2i)² …………………………………..(3.5)
To solve for β̂1 and β̂2, it is convenient to put our model in deviation form:

yi = β̂1x1i + β̂2x2i + ûi  →  ûi = yi − β̂1x1i − β̂2x2i

Σûi² = Σ(yi − β̂1x1i − β̂2x2i)² ……………………………………. (3.9)

Partially differentiating (3.9) with respect to β̂1 and β̂2, equating to zero and simplifying, we get:

∂Σûi²/∂β̂1 = 0  →  Σx1iyi = β̂1Σx1i² + β̂2Σx1ix2i ………………………………………… (3.10)

∂Σûi²/∂β̂2 = 0  →  Σx2iyi = β̂1Σx1ix2i + β̂2Σx2i² ……………………………………… (3.11)

In matrix form, the two normal equations are

| Σx1²    Σx1x2 | | β̂1 |   | Σx1y |
| Σx1x2   Σx2²  | | β̂2 | = | Σx2y | …………………..(3.12)
The estimated variance of the error term is

σ̂² = Σ(Yi − Ŷi)²/(n − 3) = Σûi²/(n − 3) …………………………………………………… (3.15)

where Ŷi = α̂ + β̂1X1i + β̂2X2i.

The variances of the estimated regression coefficients β̂1 and β̂2 are estimated, respectively, as:

V̂(β̂1) = σ̂² / [(1 − r12²)Σx1i²]  and  V̂(β̂2) = σ̂² / [(1 − r12²)Σx2i²], where r12 is the coefficient of correlation
between X1i and X2i.
3.3.Partial Correlation Coefficients & Their Interpretation
A partial correlation coefficient measures the relationship between any two variables, when all
other variables connected with those two are kept constant. For the three-variable regression model
we can compute three correlation coefficients: r12 (correlation between Y and X2), r13 (correlation
coefficient between Y and X3 ), and r23 (correlation coefficient between X2 and X3); notice that we
are letting the subscript 1 represent Y for notational convenience. These correlation coefficients
are called gross or simple correlation coefficients, or correlation coefficients of zero order.
But now consider this question: Does, say, r12 in fact measure the “true” degree of (linear)
association between Y and X2 when a third variable X3 may be associated with both of them? In
general, r12 is not likely to reflect the true degree of association between Y and X2 in the presence
of X3 . As a matter of fact, it is likely to give a false impression of the nature of association
between Y and X2 , as will be shown shortly. Therefore, what we need is a correlation coefficient
that is independent of the influence, if any, of X3 on X2 and Y. Such a correlation coefficient can
be obtained and is known appropriately as the partial correlation coefficient. Conceptually, it is
similar to the partial regression coefficient. We define
r12.3 = partial correlation coefficient between Y and X2, holding X3 constant,
r13.2 = partial correlation coefficient between Y and X3, holding X2 constant, and
r23.1 = partial correlation coefficient between X2 and X3, holding Y constant.
These partial correlations can be easily obtained from the simple or zero-order correlation
coefficients as follows:

r12.3 = (r12 − r13r23) / √[(1 − r13²)(1 − r23²)] ……………………………………………………………….. (3.16a)
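Equation (3.16a) is easy to evaluate directly; the zero-order correlations used below are hypothetical, chosen only for illustration.

```python
import numpy as np

def partial_corr(r12, r13, r23):
    """First-order partial correlation r12.3, equation (3.16a)."""
    return (r12 - r13 * r23) / np.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

# Illustrative zero-order correlations (hypothetical values)
r12, r13, r23 = 0.60, 0.50, 0.70
print(f"r12.3 = {partial_corr(r12, r13, r23):.4f}")

# Note: even when r12 = 0, r12.3 need not be zero if r13 and r23 are nonzero
print(f"r12.3 when r12 = 0: {partial_corr(0.0, r13, r23):.4f}")
```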
The partial correlations given in the above equations are called first order correlation coefficients.
By order we mean the number of secondary subscripts. Thus r12.34 would be the correlation
coefficient of order two, r12.345 would be the correlation coefficient of order three, and so on. As
noted previously, r12 , r13 and so on are called simple or zero-order correlations. The interpretation
of, say, r12.345 is that it gives the coefficient of correlation between Y and X2, holding X3 andX4
constant.
In the two-variable case, the simple r had a straightforward meaning: It measured the degree of
(linear) association (and not causation) between the dependent variable Y and the single
explanatory variable X. But once we go beyond the two-variable case, we need to pay careful
attention to the interpretation of the simple correlation coefficient. From (3.22a), for example,
we observe the following:
1. Even if r12 = 0, r12.3 will not be zero unless r13 or r23 or both are zero.
2. If r12 = 0, and r13 and r23 are nonzero and are of the same sign, r12.3 will be negative,
whereas if they are of opposite signs, it will be positive.
3. The terms r12.3 and r12 (and similar comparisons) need not have the same sign.
4. In the two-variable case we have seen that r 2 lies between 0 and 1. The same property
holds true of the squared partial correlation coefficients. Using this fact, one can obtain the
following expression:
0 ≤ r12² + r13² + r23² − 2r12r13r23 ≤ 1
which gives the interrelationships among the three zero-order correlation coefficients.
5. Suppose that r13 = r23 = 0. This does not mean that Y and X2 are uncorrelated (i.e. that r12 = 0).
In passing, note that the expression r12.3² may be called the coefficient of partial determination and
may be interpreted as the proportion of the variation in Y not explained by the variable X3 that has
been explained by the inclusion of X2 into the model. Conceptually it is similar to R².
Before moving on, note the following relationship between R², simple correlation coefficients,
and partial correlation coefficients:

R² = r12² + (1 − r12²)r13.2²

The above expression states that R² will not decrease if an additional explanatory variable is
introduced into the model, which can be seen clearly from the equation. It states that the proportion
of the variation in Y explained by X2 and X3 jointly is the sum of two parts: the part explained by
X2 alone (r12²) and the part not explained by X2, (1 − r12²), times the proportion that is explained
by X3 after holding the influence of X2 constant. Now R² ≥ r12² as long as r13.2² ≥ 0.
The sign of a partial correlation coefficient is the same as that of the corresponding estimated
parameter. For example, for the estimated regression equation Ŷ = α̂ + β̂1X1 + β̂2X2, r12.3 has the
same sign as β̂1 and r13.2 has the same sign as β̂2.
Partial correlation coefficients are used in multiple regression analysis to determine the relative
importance of each explanatory variable in the model. The independent variable with the highest
partial correlation coefficient with respect to the dependent variable contributes most to the
explanatory power of the model.
3.4. Coefficient of Multiple Determination
In the simple regression model, we introduced R2 as a measure of the proportion of variation in the
dependent variable that is explained by variation in the explanatory variable. In multiple
regression model the same measure is relevant, and the same formulas are valid but now we talk of
the proportion of variation in the dependent variable explained by all explanatory variables
included in the model. The coefficient of determination is:
R² = ESS/TSS = 1 − RSS/TSS = 1 − Σûi²/Σyi² ------------------------------------- (3.17)
In the present model of two explanatory variables given in deviation form:
Σûi² = Σ(yi − β̂1x1i − β̂2x2i)²
     = Σûi(yi − β̂1x1i − β̂2x2i)
     = Σûiyi − β̂1Σûix1i − β̂2Σûix2i
     = Σûiyi, since Σûix1i = Σûix2i = 0
     = Σyi(yi − β̂1x1i − β̂2x2i)

i.e. Σûi² = Σy² − β̂1Σx1iyi − β̂2Σx2iyi

→ Σy² = β̂1Σx1iyi + β̂2Σx2iyi + Σûi² ----------------- (3.18)
(Total sum of squares = Explained sum of squares + Residual sum of squares)

R² = ESS/TSS = (β̂1Σx1iyi + β̂2Σx2iyi)/Σyi² = 1 − Σûi²/Σyi² ----------------------------------(3.19)
As in simple regression, R2 is also viewed as a measure of the prediction ability of the model over
the sample period, or as a measure of how well the estimated regression fits the data. If R2 is high,
the model is said to “fit” the data well. If R2 is low, the model does not fit the data well.
Adjusted Coefficient of Determination ( R 2 )
One difficulty with R 2 is that it can be made large by adding more and more variables, even if the
variables added have no economic justification. Algebraically, it is the fact that as the variables
are added the sum of squared errors (RSS) goes down (it can remain unchanged, but this is rare)
and thus R 2 goes up. If the model contains n-1 variables then R 2 =1. The manipulation of model
just to obtain a high R 2 is not wise. An alternative measure of goodness of fit, called the adjusted
R 2 and often symbolized as R 2 , is usually reported by regression programs. It is computed as:
R̄² = 1 − [Σûi²/(n − k)] / [Σy²/(n − 1)] = 1 − (1 − R²)(n − 1)/(n − k) --------------------------------(3.20)
This measure does not always go up when a variable is added, because of the degrees-of-freedom
term n − k. As the number of variables k increases, RSS goes down, but so does n − k; the net effect
on R̄² depends on which effect dominates. While solving one problem, this
corrected measure of goodness of fit unfortunately introduces another one: it loses its
interpretation, since R̄² is no longer the percent of variation explained. This modified R² is sometimes
used and misused as a device for selecting the appropriate set of explanatory variables.
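A small sketch of equation (3.20); the R² values and sample size are hypothetical, chosen to show that R̄² can fall when an additional regressor adds little explanatory power.

```python
def adjusted_r2(r2, n, k):
    """R-bar-squared from equation (3.20); k counts all estimated parameters
    (intercept included) and n is the sample size."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

# Hypothetical illustration: adding a nearly useless regressor raises R^2 slightly
print(adjusted_r2(r2=0.810, n=30, k=3))   # original model
print(adjusted_r2(r2=0.815, n=30, k=4))   # one more regressor, small R^2 gain, lower R-bar^2
```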
The R² and R̄² tell you whether:
The regressors are good at predicting or "explaining" the values of the dependent
variable in the sample of data on hand. If the R² (or R̄²) is nearly 1, then the regressors
produce good predictions of the dependent variable in that sample, in the sense that
the variance of the OLS residual is small compared to the variance of the dependent
variable.

The R² (or R̄²) do NOT tell you:
Whether an included variable is statistically significant;
Whether the regressors are a true cause of the movements in the dependent variable;
Whether there is omitted variable bias; or
Whether you have chosen the most appropriate set of regressors.
In the general k-variable model, minimizing Σûi² with respect to each coefficient gives the normal equations. For example,

∂Σûi²/∂β̂k = −2Σ(Yi − β̂0 − β̂1X1i − β̂2X2i − ...... − β̂kXki)(Xki) = 0

The general form of the above equations (except the first) may be written as:

∂Σûi²/∂β̂j = −2Σ(Yi − β̂0 − β̂1X1i − ... − β̂kXki)(Xji) = 0, where (j = 1, 2, ...., k)

which yield normal equations of the form

⋮
ΣYiXki = β̂0ΣXki + β̂1ΣX1iXki + β̂2ΣX2iXki + ................ + β̂kΣXki²
Solving the above normal equations will result in algebraic complexity. But we can solve this
easily using matrix. Hence in the next section we will discuss the matrix approach to linear
regression model.
The general linear regression model with k explanatory variables is written in the form:

Yi = β0 + β1X1i + β2X2i + .............. + βkXki + ui

Since i represents the i-th observation, we shall have 'n' equations with 'n' observations on each variable:

Y1 = β0 + β1X11 + β2X21 + β3X31 + ............. + βkXk1 + u1
Y2 = β0 + β1X12 + β2X22 + β3X32 + ............. + βkXk2 + u2
Y3 = β0 + β1X13 + β2X23 + β3X33 + ................. + βkXk3 + u3
…………………………………………………...
Yn = β0 + β1X1n + β2X2n + β3X3n + ............. + βkXkn + un

These equations can be put in matrix form as

Y = Xβ + u ……………………………………………………(3.21)

where Y = [Y1, Y2, ..., Yn]' is the vector of observations on the dependent variable, X is the matrix of
observations on the explanatory variables whose i-th row is [1, X1i, X2i, ..., Xki] (the first column is a
column of ones for the intercept), β = [β0, β1, ..., βk]' and u = [u1, u2, ..., un]'.
The orders of the matrix and vectors involved are:
Y (n × 1), X (n × (k + 1)), β ((k + 1) × 1) and u (n × 1).
To derive the OLS estimator of β under the usual (classical) assumptions mentioned earlier, we minimize the residual sum of squares

Σ ûi² = û1² + û2² + û3² + ......... + ûn² = û'û

û'û = (Y − Xβ̂)'(Y − Xβ̂)
    = Y'Y − β̂'X'Y − Y'Xβ̂ + β̂'X'Xβ̂ ………………….…(3.22)

Since β̂'X'Y is a scalar (1×1), it is equal to its transpose Y'Xβ̂, so

û'û = Y'Y − 2β̂'X'Y + β̂'X'Xβ̂ -------------------------------------(3.23)

Minimizing (3.23) with respect to β̂ gives the OLS estimator β̂ = (X'X)⁻¹X'Y, and its estimated variance-covariance matrix is

V̂(β̂) = σ̂u²(X'X)⁻¹

From the above expression, the variances of the estimates are given by multiplying the main
diagonal elements of (X'X)⁻¹ by σ̂u².
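The matrix formulas β̂ = (X'X)⁻¹X'Y and V̂(β̂) = σ̂u²(X'X)⁻¹ translate directly into code; the sketch below uses simulated data with arbitrary "true" parameters, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100

# Hypothetical regressors and "true" parameters, for illustration only
X1 = rng.normal(5, 2, n)
X2 = rng.normal(10, 3, n)
u = rng.normal(0, 1, n)
Y = 1.0 + 0.5 * X1 + 2.0 * X2 + u

X = np.column_stack([np.ones(n), X1, X2])       # n x (k+1) design matrix

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y                    # beta_hat = (X'X)^-1 X'Y
u_hat = Y - X @ beta_hat
sigma2_hat = u_hat @ u_hat / (n - X.shape[1])   # RSS / (n - 3) here
var_beta = sigma2_hat * XtX_inv                 # V(beta_hat) = sigma^2 (X'X)^-1

print("beta_hat:", np.round(beta_hat, 3))
print("standard errors:", np.round(np.sqrt(np.diag(var_beta)), 3))
```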
When the model is in deviation form we can write the multiple regression estimator in matrix form as:

β̂ = (x'x)⁻¹x'y

where β̂ = [β̂1, β̂2, ..., β̂k]' and (x'x) is the k × k matrix of sums of squares and cross-products of the
explanatory variables in deviation form (with Σx1², Σx2², ..., Σxk² on the main diagonal and the
cross-products Σx1x2, Σx1xk, etc. off the diagonal).

The above column vector β̂ doesn't include the constant term β̂0. Under such conditions the
variances of the slope parameters in deviation form can be written as:

V̂(β̂) = σ̂u²(x'x)⁻¹ ………………………………………………………….. (3.25)
Dear Students! I hope that from the discussion made so far on multiple regression model, in
general, you may make the following summary of results.
(i) Model: Y = Xβ + u
(ii) Estimator: β̂ = (X'X)⁻¹X'Y
If we invoke the assumption that ui ~ N(0, σ²), then we can use one of the following tests: the t-test,
standard error test, confidence interval test or p-value test to test a hypothesis about any individual
partial regression coefficient. The test of a single parameter in multiple linear regression is the same
as the significance test under simple linear regression discussed in Chapter 2.
The t-test
To illustrate consider the following example.
Let Y = β̂0 + β̂1X1 + β̂2X2 + ... + β̂kXk + ûi
Consider the null hypothesis
H0: βj = 0
against H1: βj ≠ 0, j = 1, 2, ..., k
Since βj measures the partial effect of Xj on Y after controlling for the other independent variables,
H0: βj = 0 means that, once the other X's have been accounted for, Xj has no effect on Y. We compute the
t-ratio (t-statistic) for each β̂j as follows:

t*j = β̂j / se(β̂j)
Next find the tabulated value of t ( tc ).
If |t*j| < tc (tabulated), we do not reject the null hypothesis, i.e. we conclude that β̂j is
not significant and hence the regressor does not appear to contribute to the explanation
of the variations in Y.
If |t*j| > tc (tabulated), we reject the null hypothesis and accept the alternative one: β̂j
is statistically significant. Thus, the greater the value of |t*j|, the stronger the evidence
that βj is statistically different from zero.
In many applications we are interested in testing a hypothesis involving more than one of the
population parameters. We can also use the t-statistic to test a single linear combination of the
parameters, where two or more parameters are involved.
There are two different procedures to perform the test with a single linear combination of
parameters. In the first, the standard error of the linear combination of parameters corresponding to
the null hypothesis is calculated using information on the covariance matrix of the estimators. In
the second, the model is reparameterized by introducing a new parameter derived from the null
hypothesis and the reparameterized model is then estimated; testing for the new parameter
indicates whether the null hypothesis is rejected or not. The following example illustrates both
procedures.
To examine whether there are constant returns to scale in the agricultural sector, we are going to
use the Cobb-Douglas production function, given by

ln(output) = β0 + β1 ln(labour) + β2 ln(capital) + u

In the above model the parameters β1 and β2 are elasticities (output/labor and output/capital).
Before making inferences, remember that returns to scale refers to a technical property of the
production function examining changes in output subsequent to a change of the same proportion in
all inputs, which are labor and capital in this case. If output increases by that same proportional
change then there are constant returns to scale. Constant returns to scale imply that if the factors
labor and capital increase at a certain rate (say 10%), output will increase at the same rate (e.g.,
10%). If output increases by more than that proportion, there are increasing returns to scale. If
output increases by less than that proportional change, there are decreasing returns to scale. In the
above model, the following occurs
If β1 + β2 = 1, there are constant returns to scale.
If β1 + β2 > 1, there are increasing returns to scale.
If β1 + β2 < 1, there are decreasing returns to scale.
Two procedures will be used to test this hypothesis. In the first, the covariance matrix of the
estimators is used. In the second, the model is reparameterized by introducing a new parameter.
Procedure 1: using covariance matrix of estimators
The test statistic for H0: β1 + β2 = 1 is

t*(β̂1+β̂2) = (β̂1 + β̂2 − 1) / se(β̂1 + β̂2),

where se(β̂1 + β̂2) = √V̂(β̂1 + β̂2) = √[V̂(β̂1) + V̂(β̂2) + 2cov(β̂1, β̂2)]

and, in the two-regressor case,

cov(β̂1, β̂2) = −σ̂u²Σx1x2 / [Σx1²Σx2² − (Σx1x2)²],

so that

se(β̂1 + β̂2) = √{V̂(β̂1) + V̂(β̂2) − 2σ̂u²Σx1x2 / [Σx1²Σx2² − (Σx1x2)²]}
If |t*(β̂1+β̂2)| > tc we will conclude, in a two-sided alternative test, that there are not constant returns to
scale. On the other hand, if t*(β̂1+β̂2) is positive and large enough, we will reject, in a one-sided
alternative test, the hypothesis of constant returns to scale in favour of increasing returns to scale.
It is easier to perform the test if we apply the second procedure. A different model is estimated in
this procedure, which directly provides the standard error of interest. Thus, let us define:
θ = β1 + β2 − 1

Thus, the null hypothesis that there are constant returns to scale is equivalent to saying that
H0: θ = 0.

From the definition of θ, we have β1 = θ − β2 + 1. Substituting β1 in the original equation:

ln(output) = β0 + (θ − β2 + 1)ln(labour) + β2 ln(capital) + u

Hence,

ln(output/labour) = β0 + θ·ln(labour) + β2·ln(capital/labour) + u

Therefore, to test whether there are constant returns to scale is equivalent to carrying out a
significance test on the coefficient of ln(labour) in the transformed model. The strategy of rewriting
the model so that it contains the parameter of interest works in all cases and is usually easy to
implement. We test H0: θ = 0 against H1: θ ≠ 0.

The t-statistic is: t*θ̂ = θ̂ / se(θ̂)
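A sketch of the reparameterization procedure on simulated Cobb-Douglas data (the data-generating values are arbitrary and chosen to satisfy constant returns): regress ln(Q/L) on ln(L) and ln(K/L) and read off the t-statistic of ln(L).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200

# Simulated data with constant returns (0.6 + 0.4 = 1), for illustration only
ln_L = rng.normal(4, 0.5, n)
ln_K = rng.normal(5, 0.5, n)
ln_Q = 0.3 + 0.6 * ln_L + 0.4 * ln_K + rng.normal(0, 0.1, n)

# Reparameterized model: ln(Q/L) on ln(L) and ln(K/L); the coefficient on ln(L) is theta
y = ln_Q - ln_L
X = np.column_stack([np.ones(n), ln_L, ln_K - ln_L])

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
resid = y - X @ b
se = np.sqrt(resid @ resid / (n - 3) * np.diag(XtX_inv))

t_theta = b[1] / se[1]          # t-statistic for H0: theta = 0 (constant returns)
print(f"theta_hat = {b[1]:.4f}, se = {se[1]:.4f}, t = {t_theta:.2f}")
```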
So far, we have only considered hypotheses involving a single restriction. But frequently, we wish
to test multiple hypotheses about the underlying parameters 1 , 2 , 3 ,..., k . In multiple linear
restrictions, we will distinguish three types: exclusion restrictions, model significance and other
linear restrictions.
Exclusion Restrictions
We begin with the leading case of testing whether a set of independent variables has no partial
effect on the dependent variable, Y. These are called exclusion restrictions. Consider the following
model
Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + β5X5 + u ………………………………(3.26)

The null hypothesis in a typical example of exclusion restrictions could be the following:

H0: β4 = β5 = 0 against H1: H0 is not true
This is an example of a set of multiple restrictions, because we are putting more than one
restriction on the parameters in the above equation. A test of multiple restrictions is called a joint
hypothesis test.
It is important to remark that we test the above H0 jointly, not individually. Now, we are going to
distinguish between unrestricted (UR) and restricted(R) models. The unrestricted model is the
reference model or initial model. In this example the unrestricted model is the model given in
(3.26). The restricted model is obtained by imposing H0 on the original model. In the above
example, the restricted model is
Y = β0 + β1X1 + β2X2 + β3X3 + u ………………………………….(3.27)
By definition, the restricted model always has fewer parameters than the unrestricted one.
Moreover, it is always true that RSSR ≥ RSSUR, where RSSR is the RSS of the restricted model and
RSSUR is the RSS of the unrestricted model. Remember that, because OLS estimates are chosen to
minimize the sum of squared residuals, the RSS never decreases (and generally increases) when
certain restrictions (such as dropping variables) are introduced into the model.
Test statistic: F ratio

F* = [(RSSR − RSSUR)/r] / [RSSUR/(n − k)] = [(R²UR − R²R)/r] / [(1 − R²UR)/(n − k)],

where r is the number of restrictions.
Decision Rule
The F(r, n−k) distribution is tabulated and available in statistical tables, where we look for the critical
value Fα(r, n−k), which depends on α (the significance level), r (the df of the numerator), and n − k
(the df of the denominator). Taking this into account, the decision rule is quite simple:

If F* > Fα(r, n−k), reject H0.
The test procedure for any set of hypothesis can be based on a comparison of the sum of squared
errors from the original, the unrestricted multiple regression model to the sum of squared errors
from a regression model in which the null hypothesis is assumed to be true. When a null
hypothesis is assumed to be true, we in effect place conditions or constraints, on the values that the
parameters can take, and the sum of squared errors increases. The idea of the test is that if these
sum of squared errors are substantially different, then the assumption that the joint null hypothesis
is true has significantly reduced the ability of the model to fit the data, and the data do not support
the null hypothesis.
If the null hypothesis is true, we expect that the data are compatible with the conditions placed on
the parameters. Thus, there would be little change in the sum of squared errors when the null
hypothesis is assumed to be true.
Let the Restricted Residual Sum of Square (RSS R) be the sum of squared errors in the model
obtained by assuming that the null hypothesis is true and RSS UR be the sum of the squared error of
the original unrestricted model i.e. unrestricted residual sum of square (RSS UR). It is always true
that RSSR - RSSUR 0.
Consider Y = β̂0 + β̂1X1 + β̂2X2 + ......... + β̂kXk + ûi.
This model is called the unrestricted model. The test of the joint hypothesis is:

H0: β1 = β2 = β3 = ........... = βk = 0
H1: at least one of the βk is different from zero.

Under H0 the restricted model contains only the intercept, and applying OLS gives

β̂0 = ΣYi/n = Ȳ …………………………….(3.28)

ûi = Yi − β̂0 = Yi − Ȳ, so that

Σûi² = Σ(Yi − Ŷi)² = Σy² = TSS
The sum of squared error when the null hypothesis is assumed to be true is called Restricted
Residual Sum of Square (RSS R) and this is equal to the total sum of square (TSS).
The F-ratio is

F = [(RSSR − RSSUR)/(k − 1)] / [RSSUR/(n − k)] ~ F(k−1, n−k)

(it has an F-distribution with k − 1 and n − k degrees of freedom for the numerator and denominator,
respectively), where

RSSR = TSS
RSSUR = Σûi² = Σy² − β̂1Σyx1 − β̂2Σyx2 − .......... − β̂kΣyxk = RSS

F* = [(TSS − RSS)/(k − 1)] / [RSS/(n − k)]
   = [ESS/(k − 1)] / [RSS/(n − k)]

If we divide the numerator and the denominator by Σy² = TSS, then:

F* = [(ESS/TSS)/(k − 1)] / [(RSS/TSS)/(n − k)]
   = [R²/(k − 1)] / [(1 − R²)/(n − k)]
This implies the computed value of F can be calculated either as a ratio of ESS & TSS or R 2 & 1-
R2 . If the null hypothesis is not true, then the difference between RSSR and RSSUR (TSS & RSS)
becomes large, implying that the constraints placed on the model by the null hypothesis have large
effect on the ability of the model to fit the data, and the value of F tends to be large. Thus, we
reject the null hypothesis if the F test statistic becomes too large. This value is compared with the
critical value of F which leaves a probability of α in the upper tail of the F-distribution with k-1
and n-k degrees of freedom.
If the computed value of F is greater than the critical value of F (k-1, n-k), then the parameters of
the model are jointly significant or the dependent variable Y is linearly related to the independent
variables included in the model.
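To make the mechanics above concrete, here is a minimal Python sketch (not part of the original notes) that computes F* from R² and compares it with the critical value; the figures R² = 0.9, n = 30 and k = 4 parameters are purely hypothetical.

    from scipy import stats

    def overall_f_test(r_squared, n, k, alpha=0.05):
        """F* = [R^2/(k-1)] / [(1-R^2)/(n-k)], compared with F_alpha(k-1, n-k)."""
        f_stat = (r_squared / (k - 1)) / ((1 - r_squared) / (n - k))
        f_crit = stats.f.ppf(1 - alpha, k - 1, n - k)
        return f_stat, f_crit, f_stat > f_crit   # True => reject H0 that all slopes are zero

    print(overall_f_test(0.9, 30, 4))            # hypothetical R^2 = 0.9, n = 30, k = 4 parameters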
In Chapter 2 we showed how the estimated two-variable regression model can be used for (1) mean
prediction, that is, predicting the point on the population regression function (PRF), as well as for
(2) individual prediction, that is, predicting an individual value of Y given the value of the
regressor X = X0, where X0 is a specified numerical value of X.
The estimated multiple regression too can be used for similar purposes, and the procedure for
doing that is a straightforward extension of the two-variable case.
Let the estimated regression equation be:
Ŷ = β̂0 + β̂1X1 + β̂2X2
Now consider the prediction of the value Y0 of Y given values X10 of X1 and X20 of X2. These
could be values at some future date.
Then we have:
Y0 = β0 + β1X10 + β2X20 + u0
Consider Ŷ0 = β̂0 + β̂1X10 + β̂2X20
The prediction error is: Ŷ0 − Y0 = (β̂0 − β0) + (β̂1 − β1)X10 + (β̂2 − β2)X20 − u0
Example
Suppose that the estimated model is: Ŷ = 4.0 + 0.7X1 + 0.3X2
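The worked example stops here in the notes. As a hedged illustration only, the following short Python sketch plugs hypothetical values X10 = 10 and X20 = 20 (not given in the notes) into the fitted equation above to obtain the point prediction Ŷ0.

    import numpy as np

    beta_hat = np.array([4.0, 0.7, 0.3])   # [intercept, coefficient on X1, coefficient on X2]
    x0 = np.array([1.0, 10.0, 20.0])       # [1, X10, X20] -- hypothetical regressor values

    y0_hat = x0 @ beta_hat                 # point prediction of Y at (X10, X20)
    print(y0_hat)                          # 4.0 + 0.7*10 + 0.3*20 = 17.0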
CHAPTER FOUR
4.0. Introduction
In both the simple and multiple regression models, we made important assumptions about the
distribution of Yi and the random error term ui. We assumed that ui is a random variable with
mean zero and constant variance σ², and that the errors corresponding to different observations are
uncorrelated.
Now, we address the following ‘what if’ questions in this chapter. What if the error variance is not
constant over all observations? What if the different errors are correlated? What if the
explanatory variables are correlated? We need to ask whether and when such violations of the
basic classical assumptions are likely to occur. What types of data are likely to lead to
heteroscedasticity (different error variance)? What type of data is likely to lead to autocorrelation
(correlated errors)? What types of data are likely to lead to multicollinearity? What are the
consequences of such violations on least square estimators? How do we detect the presence of
autocorrelation, heteroscedasticity, or multicollinearity? What are the remedial measures? How
do we build an alternative model and an alternative set of assumptions when these violations exist?
Do we need to develop new estimation procedures to tackle the problems? In the subsequent
sections, we attempt to answer such questions.
One of the assumptions of the CLRM is that no exact linear relationship exists between
any of the explanatory variables. When this assumption is violated, we speak of perfect
multicollinearity. If all explanatory variables are uncorrelated with each other, we speak of an
absence of MC. These are two extreme cases and rarely exist in practice. Of particular interest are
the in-between cases of a moderate to high degree of MC.
For a k-variable regression involving the explanatory variables x1, x2, ......, xk, an exact linear
relationship is said to exist if the following condition is satisfied:
λ1x1 + λ2x2 + ... + λkxk = 0 ............................................(4.1)
where λ1, λ2, ....., λk are constants such that not all of them are simultaneously zero.
However, the term multicollinearity is used in a broader sense to include the case of perfect
multicollinearity, as shown by (4.1), as well as the case where the x-variables are inter-correlated
but not perfectly so, as follows:
λ1x1 + λ2x2 + ....... + λkxk + vi = 0 ....................................(4.2)
Note that multicollinearity refers only to linear relationships among the explanatory variables. It
does not rule out non-linear relationships among the explanatory variables.
For example: Y = β1Xi + β2Xi² + β3Xi³ + vi ..........................(4.3)
MC may arise for various reasons. Firstly, there is a tendency of economic variables to move
together over time. Economic magnitudes are influenced by the same factors and in consequence
once these determining factors become operative the economic variables show the same broad
pattern of behavior over time. For example in periods of booms or rapid economic growth the
basic economic magnitudes grow, although some tend to lag behind others. Thus, income,
consumption, savings, investment, prices, employment, tend to rise in periods of economic
expansion and decrease in periods of recession. Growth and trend factors in time series are the
most serious cause of MC. Secondly, regressing on a small sample of values drawn from the population may
result in MC. Thirdly, an over-determined model is another cause of MC. This happens when the
model has more explanatory variables than the number of observations. This could happen in
medical research where there may be a small number of patients about whom information is
collected on a large number of variables.
Why does the classical linear regression model put the assumption of no multicollinearity among
the X’s? It is because of the following consequences of multicollinearity on OLS estimators.
Dear student, do you recall the formulas of β̂1 and β̂2 from our discussion of multiple regression?
Assume x2 = λx1, where λ is a non-zero constant. Substituting this into the β̂1 formula yields an
expression of the form 0/0, i.e. an indeterminate value.
Applying the same procedure, we obtain a similar result (an indeterminate value) for β̂2. Likewise,
from our discussion of the multiple regression model, the variance of β̂1 is given by:
var(β̂1) = σ²Σx2² / [Σx1²Σx2² − (Σx1x2)²]
Substituting x2 = λx1 gives:
var(β̂1) = σ²λ²Σx1² / [λ²(Σx1²)² − λ²(Σx1²)²] = σ²λ²Σx1² / 0 → ∞ (infinite).
These are the consequences of perfect multicollinearity. One may raise the question of the
consequences of less-than-perfect correlation. In cases of near or high multicollinearity, one is
likely to encounter the following consequences.
i. Although still BLUE, the OLS estimators have large variances and covariances (large
standard errors), making precise estimation difficult.
ii. Because of consequence (i), the confidence intervals tend to be much wider,
leading to the acceptance of the “zero null hypothesis” (i.e., the true population
coefficient is zero) more readily.
iii. Also because of consequence (i), there is a high probability of not rejecting null
hypothesis of zero coefficient (using the t-test) when in fact the coefficient is
significantly different from zero.
iv. Although the t-ratios of few (or none) of the coefficients are statistically significant, R², the
overall measure of goodness of fit, can be very high; that is, the regression model as a
whole may appear to fit the data quite well.
v. The OLS estimates and their standard errors may be quite sensitive to small
changes in the data.
4.1.4. Detection of Multicollinearity
MC almost always exists in most applications. So the question is not whether it is present or not;
it is a question of degree. Also, MC is not a statistical problem; it is a data (sample) problem.
Since multicollinearity refers to the condition of the explanatory variables that are assumed to be
non-stochastic, it is a feature of the sample and not of the population. Therefore, we do not “test
for MC”; but measure its degree in any particular sample using some rules of thumb.
1. High R2 but few (or no) significant t-ratios. If R2 is high, say, in excess of 0.8, the F test
in most cases will reject the hypothesis that the partial slope coefficients are
simultaneously equal to zero, but the individual t tests will show that none or very few of
the partial slope coefficients are statistically different from zero.
2. High pair-wise correlation among regressors. Note that high zero-order correlations are a
sufficient but not a necessary condition for the existence of multicollinearity because it
can exist even though the zero-order or simple correlations are comparatively low.
3. Variance Inflation Factor (VIF)
Consider the model Yi = β0 + β1X1 + β2X2 + ... + βkXk + ui. The variance inflation factor of the j-th coefficient is:
VIF(β̂j) = 1/(1 − Rj²);  j = 1, 2, 3, ..., k
where Rj² is the coefficient of determination obtained when the variable Xj is regressed on the
remaining explanatory variables (called the auxiliary regression). For example, VIF(β̂2) is
calculated as:
VIF(β̂2) = 1/(1 − R2²), where R2² is the coefficient of determination of the auxiliary
regression:
X2 = α0 + α1X1 + α3X3 + ... + αkXk + u
Rule of thumb:
If VIF(β̂j) exceeds 10, then β̂j is poorly estimated because of MC (or the j-th regressor is
responsible for MC).
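As a hedged illustration of this rule of thumb, the following minimal Python sketch computes the VIF of each regressor by running the auxiliary regressions described above. The regressor matrix X (an n × k array without a constant column) is an assumed input; statsmodels also provides a ready-made variance_inflation_factor function.

    import numpy as np
    import statsmodels.api as sm

    def vif(X):
        """VIF of each column of X (no constant column) via the auxiliary regressions."""
        X = np.asarray(X, dtype=float)
        vifs = []
        for j in range(X.shape[1]):
            others = sm.add_constant(np.delete(X, j, axis=1))
            r2_j = sm.OLS(X[:, j], others).fit().rsquared   # R_j^2 from regressing X_j on the rest
            vifs.append(1.0 / (1.0 - r2_j))
        return vifs   # values above 10 suggest the j-th regressor is seriously affected by MC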
What can be done if multicollinearity is serious? We have two choices: (1) do nothing or (2)
follow some rules of thumb.
Do Nothing
Rule-of-Thumb Procedures
One can try the following rules of thumb to address the problem of multicollinearity, the success
depending on the severity of the collinearity problem.
Blanchard, O. J., "Comment," Journal of Business and Economic Statistics, vol. 5, 1987, pp. 449–451.
Yi = β0 + β1X1i + β2X2i + ui
where Y = consumption, X1 = income, and X2 = wealth. Income and wealth variables tend to be
highly collinear. But suppose a priori we believe that β2 = 0.10β1; that is, the rate of change of
consumption with respect to wealth is one-tenth the corresponding rate with respect to income.
We can then run the following regression:
Yi = β0 + β1X1i + 0.10β1X2i + ui = β0 + β1Xi + ui, where Xi = X1i + 0.10X2i.
Once we obtain β̂1, we can estimate β̂2 from the postulated relationship between β1 and β2.
However, such a priori information is rarely available.
When faced with severe multicollinearity, one of the “simplest” things to do is to drop one of the
collinear variables. But in dropping a variable from the model we may be committing a
specification bias or specification error. Specification bias arises from incorrect specification of
the model used in the analysis. Thus, if economic theory says that income and wealth should
both be included in the model explaining the consumption expenditure, dropping the wealth
variable would constitute specification bias.
Since multicollinearity is a sample feature, it is possible that in another sample involving the
same variables collinearity may not be as serious as in the first sample. Sometimes simply
increasing the size of the sample (if possible) may reduce the collinearity problem.
4.2. Heteroscedasticity
4.2.1. The Nature of Heteroscedasticity
In the classical linear regression model, one of the basic assumptions is that the probability
distribution of the disturbance term remains the same over all observations of X; i.e. the variance of
each ui is the same for all values of the explanatory variable. Symbolically, var(ui) = σu², a constant, for all i.
If σu² is not constant but its value depends on the value of X, i.e. σui² = f(Xi), we have
heteroscedasticity. Some possible patterns of such heteroscedastic error variances are shown in the figure below.
In panel (a) σu² seems to increase with X. In panel (b) the error variance appears greater in X's
middle range, tapering off toward the extremes. Finally, in panel (c), the variance of the error
term is greater for low values of X, declining and leveling off rapidly as X increases.
The pattern of heteroscedasticity would depend on the signs and values of the coefficients of the
relationship σui² = f(Xi), but the ui's are not observable. As such, in applied research we make
convenient assumptions that the heteroscedasticity is of one of the forms:
i. σui² = K²Xi²
ii. σui² = K²Xi
iii. σui² = K²/Xi, etc.
4.2.2. Reasons for Heteroscedasticity
There are several reasons why the variances of u i may be variable. Some of these are:
Error-learning models, for instance, state that as people learn, their errors of behaviour become smaller
over time. In this case σi² is expected to decrease. Example: as the number of hours of typing
practice increases, the average number of typing errors as well as their variance decreases.
Thus banks that have sophisticated data processing equipment are likely to commit fewer errors
in the monthly or quarterly statements of their customers than banks without such facilities.
An outlier is an observation that is much different (either very small or very large) in relation to
the other observation in the sample.
1. OLS estimators are still linear and unbiased. The least square estimators are unbiased
even under the condition of heteroscedasticity.
2. The variance of the OLS coefficients will not be minimum. Thus, under heteroscedasticity, the
OLS estimators of the regression coefficients are no longer BLUE; they are not efficient.
3. Because of consequence (2), confidence interval, t-test and F-test of significance are
invalid.
4. σ̂² = Σûi²/(n − k) is biased, i.e. E(σ̂²) ≠ σ².
4.2.4.Detection of Heteroscedasticity
I. Graphical Method:
Plot the estimated residuals (ûi) or their squares (ûi²) against the predicted dependent variable (Ŷi) or
any independent variable (Xi), and observe whether there is a systematic pattern in the graph.
In the figure below, ûi² are plotted against Ŷi (or Xi). In fig (a), we see that there is no systematic
pattern between the two variables, suggesting that perhaps no heteroscedasticity is present in the
data. Figures (b) to (e), however, exhibit definite patterns. For instance, (c) suggests a linear
relationship, whereas (d) and (e) indicate a quadratic relationship between ûi² and Ŷi.
Consider: Y = β0 + β1X1 + β2X2 + u. Estimate the model, obtain the residuals û, and run the
auxiliary regression of û² on the regressors, their squares and their cross-products (White's test);
let Rw² be its coefficient of determination and n the sample size.
Decision Rule: Reject H0 (homoscedasticity) if nRw² > χp² (the value of the Chi-square distribution with p degrees of
freedom at the α level of significance). Here, p is the number of regressors in the auxiliary regression.
A related test (the Breusch–Pagan–Godfrey test) runs the auxiliary regression
ûi²/σ̂² = α0 + α1X1 + α2X2 + ... + αkXk + vi
and calculates its explained sum of squares (ESS). The test statistic is:
χ²cal = ESS/2,
which is compared with χ²(k), the critical value from the Chi-square distribution with k degrees of freedom.
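As a hedged illustration (simulated data; all variable names are assumptions, not from the notes), the following Python sketch applies statsmodels' built-in versions of the two tests just described.

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import het_breuschpagan, het_white

    rng = np.random.default_rng(0)
    X1, X2 = rng.uniform(1, 10, 200), rng.uniform(1, 10, 200)
    y = 2 + 0.5 * X1 + 0.3 * X2 + rng.normal(0, X1)      # error spread grows with X1: heteroscedastic

    X = sm.add_constant(np.column_stack([X1, X2]))
    res = sm.OLS(y, X).fit()

    lm_bp, p_bp, _, _ = het_breuschpagan(res.resid, X)   # Breusch-Pagan-Godfrey LM statistic
    lm_w, p_w, _, _ = het_white(res.resid, X)            # White's n*R^2 statistic
    print(p_bp, p_w)                                     # small p-values => reject homoscedasticity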
The remedial measures depend on whether σi² is known or is not known. When σi² is known (or its form
can be assumed), the most straightforward remedy is the method of weighted least squares, for the
estimators thus obtained are BLUE. Assume that our original model is Yi = α + βXi + ui, where ui
satisfies all the assumptions except that it is heteroscedastic, i.e.
E(ui²) = σi² = σ² f(ki)
If we apply OLS to the above model, the estimators are no more BLUE. To make them BLUE
we have to transform the above model. Applying OLS to the transformed variables is known as
the method of Generalized (Weighted) Least Squares (GLS/WLS). In short GLS/WLS is OLS
on the transformed variables that satisfy the standard least squares assumptions. The estimators
thus obtained are known as GLS/WLS estimators, and it is these estimators that are BLUE.
The transforming variable for the above model is σi = √(σi²), so that the variance of the transformed
disturbance term becomes constant. Dividing the model through by σi gives:
Yi/σi = α(1/σi) + β(Xi/σi) + ui/σi
i.e. Yi* = α* + β*Xi* + ui*
As noted earlier, if the true σi² are known, we can use the WLS method to obtain BLUE estimators.
Since the true σi² are rarely known, there is a way of obtaining consistent (in the statistical sense)
estimates of the variances and covariances of the OLS estimators even if there is heteroscedasticity:
White's heteroscedasticity-consistent (robust) standard errors.
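The following is a minimal Python sketch, again on hypothetical simulated data, of the two remedies discussed in this subsection: WLS under the assumed form σi² ∝ Xi², and OLS with White heteroscedasticity-consistent standard errors when σi² is unknown.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    x = rng.uniform(1, 10, 200)
    y = 2 + 0.5 * x + rng.normal(0, x)              # var(u_i) grows with x_i
    X = sm.add_constant(x)

    wls = sm.WLS(y, X, weights=1.0 / x**2).fit()    # WLS with weights 1/sigma_i^2, sigma_i^2 = K^2 * x_i^2
    robust = sm.OLS(y, X).fit(cov_type="HC1")       # OLS with White robust standard errors
    print(wls.params, robust.bse)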
4.3. Autocorrelation
In our discussion of the simple and multiple regression models, one of the assumptions of
the classicalists is that cov(ui, uj) = E(uiuj) = 0 for i ≠ j, which implies that successive values of the
disturbance term u are temporally independent, i.e. a disturbance occurring at one point of
observation is not related to any other disturbance. This means that when observations are made
over time, the effect of a disturbance occurring at one period does not carry over into
another period.
If the above assumption is not satisfied, that is, if the value of u in any particular
period is correlated with its own preceding value(s), we say there is autocorrelation of
the random variables. Autocorrelation is a special case of correlation which refers to the
relationship between successive values of the same variable.
If ut depends on its own two preceding values, ut = f(ut−1, ut−2), this form of autocorrelation is called a
second-order autoregressive scheme, and so on. Generally, when autocorrelation is present, we assume the
simplest, first-order form of autocorrelation, ut = f(ut−1), and, in its linear form:
ut = ρut−1 + vt --------------------------------------------(4.3)
where ρ is the coefficient of autocorrelation and v is a random variable satisfying all the basic
assumptions of ordinary least squares:
E(v) = 0, E(v²) = σv², and E(vivj) = 0 for i ≠ j
The above relationship (4.3) states the simplest possible form of autocorrelation; if we apply
OLS on the model given in (4.3) we obtain:
ρ̂ = Σ(t=2 to n) utut−1 / Σ(t=2 to n) u²t−1 --------------------------------(4.4)
Since, for large samples, Σut² ≈ Σu²t−1, this may equivalently be written as:
ρ̂ ≈ Σ(t=2 to n) utut−1 / √[Σ(t=2 to n) ut² · Σ(t=2 to n) u²t−1] = r(ut, ut−1)  (Why?)---------------------(4.5)
That is, ρ̂ is (approximately) the simple correlation coefficient between ut and ut−1, so that:
−1 ≤ ρ̂ ≤ 1 since −1 ≤ r ≤ 1 ---------------------------------------------(4.6)
This proves the statement “we can treat autocorrelation in the same way as correlation in
general”. From our statistics background we know that:
If the values of u are found to follow the simple (first-order) Markov process, then:
ut = ρut−1 + vt, with |ρ| < 1 and vt fulfilling all the usual assumptions of a disturbance term. Our
objective here is to obtain the value of ut in terms of the autocorrelation coefficient ρ and the random
variable vt. The complete form of the first-order autoregressive scheme may be developed as
follows:
u1 = ρu0 + v1
u2 = ρu1 + v2 = ρ(ρu0 + v1) + v2 = ρ²u0 + ρv1 + v2
u3 = ρu2 + v3 = ρ(ρ²u0 + ρv1 + v2) + v3 = ρ³u0 + ρ²v1 + ρv2 + v3
u4 = ρu3 + v4 = ρ(ρ³u0 + ρ²v1 + ρv2 + v3) + v4 = ρ⁴u0 + ρ³v1 + ρ²v2 + ρv3 + v4
⋮
ut = ρ^t·u0 + ρ^(t−1)·v1 + ρ^(t−2)·v2 + ... + ρvt−1 + vt = ρ^t·u0 + Σ(j=0 to t−1) ρ^j·vt−j .....................................................(4.7)
From (4.6) we have seen that the value of ρ is less than 1 in absolute value, i.e. −1 < ρ < 1. In such
cases, lim(t→∞) ρ^t·u0 = 0, since ρ^t → 0. Thus, ut can be expressed as:
ut = Σ(j=0 to ∞) ρ^j·vt−j = vt + ρvt−1 + ρ²vt−2 + ρ³vt−3 + ... .....................................(4.8)
Now, using this value of u t , let’s compute its mean, variance and covariance
Mean of ut:
E(ut) = Σ(j=0 to ∞) ρ^j·E(vt−j) = 0 …………………………………………………… (4.9)
Variance of ut:
Var(ut) = E(ut²) = E[Σ(j=0 to ∞) ρ^j·vt−j]²
= E[Σ(j) ρ^(2j)·v²t−j + Σ(j≠i) ρ^j·ρ^i·vt−j·vt−i]
= Σ(j) ρ^(2j)·E(v²t−j) + Σ(j≠i) ρ^j·ρ^i·E(vt−j·vt−i), with E(v²t−j) = σv² and E(vt−j·vt−i) = 0 for i ≠ j
= Σ(j=0 to ∞) ρ^(2j)·σv²
If |a| < 1, then Σ(j=0 to ∞) a^j = 1/(1 − a). Thus,
var(ut) = σv²·Σ(j=0 to ∞) ρ^(2j) = σv²/(1 − ρ²) ……………………………. (4.10)
Covariance of ut:
cov(ut, ut−s) = ρ^s·σv²/(1 − ρ²) = ρ^s·σu²;  s = 1, 2, 3, .... ………………………… (4.11)
1. The OLS estimators are still unbiased. Consider the model in deviation form, yt = βxt + ut, so that:
β̂ = Σxtyt/Σxt² = Σxt(βxt + ut)/Σxt² = β + Σxtut/Σxt²
Then we have E(β̂) = β + ΣxtE(ut)/Σxt² = β  (the estimated coefficients are still unbiased).
2. The variances of the OLS estimators are no longer the smallest (OLS is inefficient):
var(β̂1)AR(1) = σ²/Σxt² + (2σ²/Σxt²)·[ρΣxtxt−1/Σxt² + ρ²Σxtxt−2/Σxt² + ρ³Σxtxt−3/Σxt² + ...]
If ρ ≠ 0 (autocorrelation), then var(β̂1)AR(1) > var(β̂1) = σ²/Σxt². The implication is that if we wrongly use
var(β̂) = σ²/Σxt² while the data are autocorrelated, var(β̂) is underestimated.
If var(β̂) is underestimated, SE(β̂) is also underestimated, and this makes the t-ratio large. This large t-
ratio may make β̂ appear statistically significant when it is not. Such a wrong testing procedure will lead to
wrong predictions and inferences about the characteristics of the population.
The most celebrated test for detecting serial correlation is the one developed by the statisticians
Durbin and Watson. It is popularly known as the Durbin–Watson d statistic, which is defined as:
d = Σ(t=2 to n)(ût − ût−1)² / Σ(t=1 to n) ût² ≈ 2(1 − ρ̂) ……………………… (4.12)
a. The regression model includes an intercept term.
b. The disturbances ut are generated by the first-order autoregressive scheme:
ut = ρut−1 + vt, the AR(1) scheme.
c. The regression model does not include lagged value(s) of the dependent variable Y as
explanatory variables. Thus, the test is inapplicable to models of the following
type: yt = β0 + β1X1t + β2X2t + γyt−1 + ut
if ρ̂ = 0, d ≈ 2
if ρ̂ = 1, d ≈ 0
if ρ̂ = −1, d ≈ 4
Whenever the calculated value of d turns out to be sufficiently close to 2, we do not reject the
null hypothesis, and if it is close to zero or four, we reject the null hypothesis.
For the two-tailed Durbin Watson test, we have set five regions to the values of d as depicted in
the figure below.
We do not have a unique critical value of the d-statistic. We have dL (a lower bound) and dU (an upper
bound) for the critical values of d used to reject or not to reject the null hypothesis.
The mechanisms of the D.W test are as follows, assuming that the assumptions underlying the
tests are fulfilled.
Solution: First compute (4 − dL) and (4 − dU), then compare the computed value of d with dL,
dU, (4 − dL) and (4 − dU):
(4 − dL) = 4 − 1.37 = 2.63
(4 − dU) = 4 − 1.50 = 2.50
Since d is less than dL, we reject the null hypothesis of no autocorrelation.
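As a hedged illustration of (4.12), the short Python sketch below computes d and the implied ρ̂ from a vector of OLS residuals (assumed to be available from a previously fitted model); statsmodels also ships an equivalent durbin_watson function.

    import numpy as np

    def durbin_watson_d(resid):
        """d of (4.12) and the implied rho-hat from d = 2(1 - rho-hat)."""
        resid = np.asarray(resid, dtype=float)
        d = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
        return d, 1 - d / 2

    # d near 2 => little evidence of AR(1) autocorrelation;
    # d near 0 => positive autocorrelation; d near 4 => negative autocorrelation.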
The test is not applicable if the regression model does not include an intercept term.
The test is valid for the AR(1) error scheme only.
The test is invalid when lagged values of the dependent variable appear as regressors.
The test is invalid if there are missing values.
There are certain regions where the test is inconclusive.
2. The Breusch- Godfrey (BG) Test
Assume that the error term follows the autoregressive scheme of order p, that is, AR (p) given
by:
ut = ρ1ut−1 + ρ2ut−2 + ρ3ut−3 + ... + ρput−p + vt, where vt satisfies all the assumptions of the CLRM.
Steps:
1. Estimate the model (Yt = α + βXt + ut) using OLS and obtain the residuals, ût.
2. Regress ût on Xt and ût−1, ût−2, ût−3, ..., ût−p, that is, run the following auxiliary regression:
ût = α + βXt + ρ̂1ût−1 + ρ̂2ût−2 + ρ̂3ût−3 + ... + ρ̂pût−p + εt
3. Obtain the coefficient of determination R2 from the auxiliary regression
4. If the sample size T is large, Breusch and Godfrey have shown that (T − p)R² asymptotically
follows the Chi-square distribution with p degrees of freedom.
Decision rule:
Reject the null hypothesis of no autocorrelation if (T − p)R² exceeds the critical value from the
Chi-square distribution with p degrees of freedom at the chosen significance level.
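As a hedged illustration (simulated AR(1) errors; the variable names are assumptions), the following Python sketch runs the Breusch–Godfrey test through statsmodels.

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import acorr_breusch_godfrey

    rng = np.random.default_rng(2)
    T = 200
    x = rng.normal(size=T)
    u = np.zeros(T)
    for t in range(1, T):
        u[t] = 0.6 * u[t - 1] + rng.normal()        # AR(1) errors with rho = 0.6
    y = 1 + 2 * x + u

    res = sm.OLS(y, sm.add_constant(x)).fit()
    lm_stat, lm_pval, _, _ = acorr_breusch_godfrey(res, nlags=2)   # test against AR(p), p = 2
    print(lm_stat, lm_pval)                         # small p-value => reject H0 of no autocorrelation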
Steps (the later rounds of the iterative procedure for correcting autocorrelation; the earlier steps
estimate the original model, obtain ρ̂ from its residuals, and transform the variables accordingly):
d. Run OLS on the transformed model Yt* = α* + β1*Xt* + ut*.
e. If the DW test shows that autocorrelation still exists, the procedure needs to be iterated:
obtain the residuals ût* from the regression in (d).
f. Run OLS on ût* = ρût−1* + vt and obtain ρ̂(2), the second-round estimate of ρ.
g. Use ρ̂(2) to transform the variables.
h. Run OLS on Yt** = α** + β1**Xt** + ut**.
i. Check the DW statistic; if autocorrelation still exists, go into a third round of the
procedure, and so on.
Prais-Winsten transformation
To avoid the loss of the first observation of each variable, the first observation of Y * and X*
should be transformed as:
Y1* = √(1 − ρ̂²)·Y1
X1* = √(1 − ρ̂²)·X1
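The following minimal Python sketch, under the assumption that y and x are one-dimensional arrays, carries out one round of this iterative (Cochrane–Orcutt type) procedure with the Prais–Winsten treatment of the first observation; statsmodels' GLSAR class with iterative_fit() offers a packaged alternative.

    import numpy as np
    import statsmodels.api as sm

    def cochrane_orcutt_round(y, x):
        """One round: estimate rho from the residuals, quasi-difference, re-estimate by OLS."""
        u = sm.OLS(y, sm.add_constant(x)).fit().resid
        rho = np.sum(u[1:] * u[:-1]) / np.sum(u[:-1] ** 2)       # rho-hat from u_t on u_{t-1}
        y_star = y[1:] - rho * y[:-1]                            # quasi-differenced variables
        x_star = x[1:] - rho * x[:-1]
        # Prais-Winsten: keep the first observation, scaled by sqrt(1 - rho^2)
        y_star = np.concatenate(([np.sqrt(1 - rho**2) * y[0]], y_star))
        x_star = np.concatenate(([np.sqrt(1 - rho**2) * x[0]], x_star))
        return rho, sm.OLS(y_star, sm.add_constant(x_star)).fit()
    # Repeat on the new residuals if the DW statistic still signals autocorrelation.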
Until now we have assumed that the multiple linear regression we are estimating includes all the
relevant explanatory variables. In practice, this is rarely the case. Sometimes some relevant
variables are not included due to oversight or lack of measurements. This is often called the
problem of excluding a relevant variable or under-specifying the model. How do our inferences
change when this problem is present?
Suppose the true model (in deviation form) is y = β1x1 + β2x2 + u. Instead, we omit x2 and estimate
the equation y = β1x1 + u. This will be referred to as the "misspecified model." The estimate of β1 we get is:
β̂1 = Σx1y/Σx1²
Substituting the expression for y from the true equation into this, we get:
β̂1 = Σx1(β1x1 + β2x2 + u)/Σx1² = β1 + β2Σx1x2/Σx1² + Σx1u/Σx1²
Hence E(β̂1) = β1 + β2b21, where b21 = Σx1x2/Σx1² is the slope coefficient
from a regression of x2 on x1. That is,
bias = β2Σx1x2/Σx1²
The variance of the estimator from the misspecified model is
var(β̂1) = σ²/Σx1²,
whereas in the correctly specified model var(β̂1) = σ²/[Σx1²(1 − r12²)], where r12 is the correlation
between x1 and x2.
Thus, the β̂1 from the misspecified model is a biased estimator, but it has a smaller variance than the β̂1
estimated from the correctly specified model. This results in a large t-ratio (a narrow confidence interval),
which in turn can lead to rejection of the null hypothesis when it is true.
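As a hedged illustration of the bias formula above, the following Python simulation (hypothetical values β1 = 1, β2 = 0.5 and b21 ≈ 0.8) shows the slope estimate from the misspecified regression drifting toward β1 + β2·b21 rather than β1.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    n = 5000
    x1 = rng.normal(size=n)
    x2 = 0.8 * x1 + rng.normal(size=n)             # x2 correlated with x1, so b21 is about 0.8
    y = 1.0 * x1 + 0.5 * x2 + rng.normal(size=n)

    short = sm.OLS(y, sm.add_constant(x1)).fit()   # misspecified model: x2 omitted
    print(short.params[1])                         # close to 1 + 0.5*0.8 = 1.4 rather than beta1 = 1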
When we estimate a multiple regression equation and use it for predictions at future points of
time, we assume that the parameters are constant over the entire time period of estimation and
prediction. To test this hypothesis of parameter constancy (or stability), some tests have been
proposed. One of these is the Chow (analysis-of-variance) test.
Suppose that we have two independent sets of data with sample sizes n1 and n2, respectively.
The regression equation for data set i (i = 1, 2) is:
Y = αi + βi1X1 + βi2X2 + ... + βikXk + u
A test for stability of the parameters between the populations that generated the two data sets is a
test of the hypothesis:
H0: β11 = β21, β12 = β22, β13 = β23, ..., β1k = β2k, α1 = α2
If this hypothesis is true, we estimate a single equation for the data set obtained by pooling the
two data sets. The F-test we use is the F-test described in Chapter 3 based on RSSUR and RSSR .
To get RSSUR we estimate the regression model of each of the data sets separately.
Define:
RSS1 = residual sum of squares for the first data set
RSS2 = residual sum of squares for the second data set
RSSUR = RSS1 + RSS2
RSSR = residual sum of squares for the pooled (first + second) data set
Fcal = [(RSSR − RSSUR)/(k + 1)] / [RSSUR/(n1 + n2 − 2k − 2)]
Decision Rule: Reject the null hypothesis of identical parameters (parameter stability) if
Fcal > Fα(k + 1, n1 + n2 − 2k − 2).
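As a hedged illustration, the following Python sketch (y1, X1 and y2, X2 denote the two hypothetical sub-samples, with the X's as two-dimensional arrays of k regressors and no constant column) computes the Chow F statistic exactly as defined above.

    import numpy as np
    import statsmodels.api as sm
    from scipy import stats

    def chow_test(y1, X1, y2, X2, alpha=0.05):
        k = X1.shape[1]                                      # number of slope coefficients
        rss_1 = sm.OLS(y1, sm.add_constant(X1)).fit().ssr    # RSS, first data set
        rss_2 = sm.OLS(y2, sm.add_constant(X2)).fit().ssr    # RSS, second data set
        rss_ur = rss_1 + rss_2                               # unrestricted RSS
        pooled = sm.OLS(np.concatenate([y1, y2]),
                        sm.add_constant(np.vstack([X1, X2]))).fit()
        rss_r = pooled.ssr                                   # restricted (pooled) RSS
        df2 = len(y1) + len(y2) - 2 * k - 2
        f_cal = ((rss_r - rss_ur) / (k + 1)) / (rss_ur / df2)
        return f_cal, stats.f.ppf(1 - alpha, k + 1, df2)     # reject stability if f_cal > critical value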